REGISTER_CPU_OPERATOR(Wngrad, WngradOp<float, CPUContext>);
OPERATOR_SCHEMA(Wngrad)
    .NumInputs(4)
    .NumOutputs(2, 4)
    .AllowInplace({{0, 0}, {1, 1}})
    .SetDoc(R"DOC(

Computes the WnGrad update for an input gradient and accumulated
history. This operator implements the optimization algorithm
in https://arxiv.org/abs/1803.02865 by Wu, Ward and Bottou.
Concretely, given inputs (param, grad, seq_b, learning_rate),
computes

    new_seq_b = seq_b + 1 / seq_b * norm(grad)^2
    effective_lr = learning_rate / (new_seq_b + epsilon)
    update = learning_rate * grad / (new_seq_b + epsilon)
    new_param = param + update
and returns (new_param, new_seq_b).

Optionally returns effective_lr and update as well.

)DOC")
    .Input(0, "param", "Parameters to be updated")
    .Input(1, "seq_b", "Seq_b history")
    .Input(2, "grad", "Gradient computed")
    .Input(3, "lr", "learning rate")
    .Output(0, "output_param", "Updated parameters")
    .Output(1, "output_seq_b", "Updated seq_b")
    .Output(2, "output_effective_lr", "(optional) Effective learning rate")
    .Output(3, "output_update", "(optional) Actual update that is applied.")
    .Arg("epsilon", "Default 1e-5");
REGISTER_CPU_OPERATOR(SparseWngrad, SparseWngradOp<float, CPUContext>);
OPERATOR_SCHEMA(SparseWngrad)
    .NumInputs(5)
    .NumOutputs(2)
    .EnforceOneToOneInplace()
    .SetDoc(R"DOC(

This operator implements the optimization algorithm
in https://arxiv.org/abs/1803.02865 by Wu, Ward and Bottou.
Given inputs (param, seq_b, indices, grad, lr), runs the dense WnGrad
update on (param, grad, seq_b, lr), and returns (new_param,
new_seq_b) as in the dense case.

)DOC")
    .Input(0, "param", "Parameters to be updated")
    .Input(1, "seq_b", "seq_b history")
    .Input(2, "indices", "Sparse indices")
    .Input(3, "grad", "Gradient computed")
    .Input(4, "lr", "learning rate")
    .Output(0, "output_param", "Updated parameters")
    .Output(1, "output_seq_b", "Updated seq_b")
    .Arg("epsilon", "Default 1e-5");
SHOULD_NOT_DO_GRADIENT(Wngrad);
SHOULD_NOT_DO_GRADIENT(SparseWngrad);