Optimizers

class chainer.optimizers.AdaDelta(rho=0.95, eps=1e-06)[source]

Zeiler’s ADADELTA.

See: http://www.matthewzeiler.com/pubs/googleTR2012/googleTR2012.pdf

Parameters:
  • rho (float) – Exponential decay rate of the first and second order moments.
  • eps (float) – Small value for numerical stability.
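
A minimal scalar sketch of the ADADELTA update rule from the cited paper (illustrative only — the helper name and scalar form are not part of Chainer's API):

```python
import math

def adadelta_update(param, grad, msg, msdx, rho=0.95, eps=1e-6):
    # msg / msdx: exponential moving averages of squared gradients
    # and squared parameter updates, respectively.
    msg = rho * msg + (1 - rho) * grad ** 2
    # The step size adapts via the ratio RMS[dx] / RMS[g];
    # note there is no global learning rate.
    dx = math.sqrt((msdx + eps) / (msg + eps)) * grad
    msdx = rho * msdx + (1 - rho) * dx ** 2
    return param - dx, msg, msdx
```

Because the numerator RMS[dx] starts near zero, early steps are tiny and grow as update history accumulates.
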
class chainer.optimizers.AdaGrad(lr=0.001, eps=1e-08)[source]

AdaGrad optimizer.

See: http://jmlr.org/papers/v12/duchi11a.html

Parameters:
  • lr (float) – Learning rate.
  • eps (float) – Small value for numerical stability.
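
A minimal scalar sketch of the AdaGrad rule from Duchi et al. (illustrative; not Chainer's implementation):

```python
import math

def adagrad_update(param, grad, h, lr=0.001, eps=1e-8):
    # h accumulates ALL past squared gradients, so the effective
    # per-parameter rate lr / (sqrt(h) + eps) only ever shrinks.
    h = h + grad ** 2
    return param - lr * grad / (math.sqrt(h) + eps), h
```

The monotone growth of h is what eventually stalls AdaGrad on long runs — the motivation for the moving-average variants below.
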
class chainer.optimizers.Adam(alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-08)[source]

Adam optimizer.

See: http://arxiv.org/abs/1412.6980v8

Parameters:
  • alpha (float) – Step size.
  • beta1 (float) – Exponential decay rate of the first order moment.
  • beta2 (float) – Exponential decay rate of the second order moment.
  • eps (float) – Small value for numerical stability.
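
A scalar sketch of Adam as presented in the paper (illustrative; Chainer's internal formulation may differ in where eps and the bias correction are applied):

```python
import math

def adam_update(param, grad, m, v, t,
                alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # First- and second-moment estimates of the gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction compensates for zero initialization (t starts at 1).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return param - alpha * m_hat / (math.sqrt(v_hat) + eps), m, v
```

On the first step with a unit gradient, the bias-corrected moments are both 1, so the step is almost exactly alpha.
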
class chainer.optimizers.MomentumSGD(lr=0.01, momentum=0.9)[source]

Momentum SGD optimizer.

Parameters:
  • lr (float) – Learning rate.
  • momentum (float) – Exponential decay rate of the first order moment.
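
A scalar sketch of the classical momentum rule (illustrative; the helper name is not Chainer's API):

```python
def momentum_sgd_update(param, grad, v, lr=0.01, momentum=0.9):
    # Velocity is an exponentially decayed sum of past gradient steps.
    v = momentum * v - lr * grad
    return param + v, v
```

With momentum=0.9, a constant gradient drives the velocity toward 10x the plain SGD step.
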
class chainer.optimizers.NesterovAG(lr=0.01, momentum=0.9)[source]

Nesterov’s Accelerated Gradient.

See: http://arxiv.org/abs/1212.0901

Parameters:
  • lr (float) – Learning rate.
  • momentum (float) – Exponential decay rate of the first order moment.
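
A scalar sketch of the reformulated Nesterov update from the linked paper (Bengio et al., arXiv:1212.0901), which evaluates the gradient at the current parameters instead of at a look-ahead point (illustrative; not Chainer's implementation):

```python
def nesterov_update(param, grad, v, lr=0.01, momentum=0.9):
    v = momentum * v - lr * grad
    # Equivalent to classical Nesterov momentum, rewritten so that
    # grad is taken at param rather than at param + momentum * v.
    param = param + momentum * momentum * v - (1 + momentum) * lr * grad
    return param, v
```

Compared with classical momentum, the extra -(1 + momentum) * lr * grad term applies a partial correction before the velocity has caught up.
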
class chainer.optimizers.RMSprop(lr=0.01, alpha=0.99, eps=1e-08)[source]

RMSprop optimizer.

See: T. Tieleman and G. Hinton (2012). Lecture 6.5 - rmsprop, COURSERA: Neural Networks for Machine Learning.

Parameters:
  • lr (float) – Learning rate.
  • alpha (float) – Exponential decay rate of the second order moment.
  • eps (float) – Small value for numerical stability.
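
A scalar sketch of the RMSprop rule from Tieleman and Hinton's lecture (illustrative; not Chainer's implementation):

```python
import math

def rmsprop_update(param, grad, ms, lr=0.01, alpha=0.99, eps=1e-8):
    # ms is a MOVING average of squared gradients; unlike AdaGrad it
    # forgets old gradients, so the step size need not decay to zero.
    ms = alpha * ms + (1 - alpha) * grad ** 2
    return param - lr * grad / (math.sqrt(ms) + eps), ms
```

The forgetting factor alpha is the key difference from AdaGrad, which sums squared gradients without decay.
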
class chainer.optimizers.RMSpropGraves(lr=0.0001, alpha=0.95, momentum=0.9, eps=0.0001)[source]

Alex Graves’s RMSprop.

See: http://arxiv.org/abs/1308.0850

Parameters:
  • lr (float) – Learning rate.
  • alpha (float) – Exponential decay rate of the first and second order moments of the raw gradient.
  • momentum (float) – Exponential decay rate of the first order moment of the adjusted gradient.
  • eps (float) – Small value for numerical stability.
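
A scalar sketch of Graves's variant as given in the cited paper (illustrative; not Chainer's implementation):

```python
import math

def rmsprop_graves_update(param, grad, n, g, delta,
                          lr=0.0001, alpha=0.95, momentum=0.9, eps=0.0001):
    # n and g track the second and first moments of the raw gradient;
    # n - g**2 then estimates the gradient's variance.
    n = alpha * n + (1 - alpha) * grad ** 2
    g = alpha * g + (1 - alpha) * grad
    # delta is a momentum-smoothed step on the variance-normalized gradient.
    delta = momentum * delta - lr * grad / math.sqrt(n - g ** 2 + eps)
    return param + delta, n, g, delta
```

Normalizing by the variance estimate (rather than the raw second moment, as in plain RMSprop) makes the step larger when gradients are consistent and smaller when they are noisy.
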
class chainer.optimizers.SGD(lr=0.01)[source]

Vanilla Stochastic Gradient Descent.

Parameters:
  • lr (float) – Learning rate.
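
The baseline rule, shown as a scalar sketch for comparison with the adaptive methods above (illustrative; not Chainer's API):

```python
def sgd_update(param, grad, lr=0.01):
    # Plain gradient descent: step against the gradient, scaled by lr.
    return param - lr * grad
```
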
class chainer.optimizers.SMORMS3(lr=0.001, eps=1e-16)[source]

Simon Funk’s SMORMS3.

See: http://sifter.org/~simon/journal/20150420.html

Parameters:
  • lr (float) – Learning rate.
  • eps (float) – Small value for numerical stability.
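
A scalar sketch of the SMORMS3 rule from Funk's post (illustrative; not Chainer's implementation):

```python
import math

def smorms3_update(param, grad, mem, g, g2, lr=0.001, eps=1e-16):
    # r blends in new gradient info based on the current "memory" length.
    r = 1.0 / (mem + 1.0)
    g = (1 - r) * g + r * grad
    g2 = (1 - r) * g2 + r * grad ** 2
    # x in [0, 1] measures how consistent recent gradients are.
    x = g ** 2 / (g2 + eps)
    param = param - grad * min(lr, x) / (math.sqrt(g2) + eps)
    # Consistent gradients shrink mem, making the average more responsive.
    mem = 1 + mem * (1 - x)
    return param, mem, g, g2
```

The min(lr, x) clamp caps the step at the configured learning rate while letting noisy directions take much smaller steps.
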