Implements the Adamax algorithm (a variant of Adam based on the infinity norm).
It has been proposed in `Adam: A Method for Stochastic Optimization`__.
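In outline, Adamax keeps Adam's exponential moving average of the gradient but replaces the second-moment estimate with an exponentially weighted infinity norm of past gradient magnitudes. A minimal per-parameter sketch of that update follows; the names ``param``, ``grad``, ``exp_avg``, ``exp_inf``, and ``step`` are placeholders for illustration, not the actual state kept by this file::

    import torch

    def adamax_update(param, grad, exp_avg, exp_inf, step,
                      lr=2e-3, betas=(0.9, 0.999), eps=1e-8):
        # Sketch of one Adamax step under the assumptions above.
        beta1, beta2 = betas
        # Exponential moving average of the gradient (first moment).
        exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
        # Exponentially weighted infinity norm of gradient magnitudes.
        exp_inf.copy_(torch.maximum(exp_inf.mul(beta2), grad.abs().add(eps)))
        # Bias correction applies only to the first moment.
        clr = lr / (1 - beta1 ** step)
        # Parameter update: param <- param - clr * exp_avg / exp_inf.
        param.addcdiv_(exp_avg, exp_inf, value=-clr)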
Arguments:
    params (iterable): iterable of parameters to optimize or dicts defining
        parameter groups
    lr (float, optional): learning rate (default: 2e-3)
    betas (Tuple[float, float], optional): coefficients used for computing
        running averages of the gradient and its square
        (default: (0.9, 0.999))
    eps (float, optional): term added to the denominator to improve
        numerical stability (default: 1e-8)
    weight_decay (float, optional): weight decay (L2 penalty) (default: 0)
__ https://arxiv.org/abs/1412.6980
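A typical way to construct and step this optimizer, sketched with placeholder model and data names (``model``, ``loss_fn``, ``input``, ``target``)::

    import torch
    import torch.nn as nn
    import torch.optim as optim

    model = nn.Linear(10, 1)        # placeholder model
    loss_fn = nn.MSELoss()
    optimizer = optim.Adamax(model.parameters(), lr=2e-3,
                             betas=(0.9, 0.999), eps=1e-8, weight_decay=0)

    input = torch.randn(32, 10)     # placeholder batch
    target = torch.randn(32, 1)

    optimizer.zero_grad()                     # clear stale gradients
    loss_fn(model(input), target).backward()  # backpropagate
    optimizer.step()                          # apply the Adamax update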