#include "caffe2/sgd/learning_rate_adaption_op.h"

namespace caffe2 {

REGISTER_CPU_OPERATOR(
    LearningRateAdaption,
    LearningRateAdaptionOp<float, CPUContext>);
OPERATOR_SCHEMA(LearningRateAdaption)
    .NumInputs(3)
    .NumOutputs(1)
    .AllowInplace({{0, 0}})
    .SetDoc(R"DOC(
Learning Rate Adaption is an operation that performs one iteration of
gradient descent based on the learning rate:
  lr(k) = lr(k-1) - lr_alpha * df(k-1)/dlr,
where df(k-1)/dlr is the gradient of the objective function f with respect
to lr, and lr_alpha is a learning rate hyperparameter. It can be proven
that df(k-1)/dlr equals INNERPRODUCT(grad(k-1), -grad(k-2)), where
grad(k-1) is the gradient of f(k-1) with respect to the parameters. When
the argument "normalized_lr_adaption" is false, we simply perform the
following update:
  lr(k) = lr(k-1) - lr_alpha * INNERPRODUCT(grad(k-1), grad(k-2)).
If we set "normalized_lr_adaption" to be true, we do not directly apply
INNERPRODUCT(grad(k-1), -grad(k-2)) as the gradient. Instead, we perform
the following update:
  lr(k) = lr(k-1) + lr_alpha * cosineSimilarity(grad(k-1), grad(k-2)).
)DOC")
    .Arg(
        "lr_alpha",
        "the learning rate for performing gradient descent on learning rate lr")
33 "normalized_lr_adaption",
34 "whether to apply normalized lr adaption or not")
    .Input(0, "lr", "Learning rate")
    .Input(1, "grad", "Gradient computed")
    .Input(2, "effgrad", "The effective grad")
    .Output(0, "output_lr", "Updated learning rate");
NO_GRADIENT(LearningRateAdaption);

} // namespace caffe2