Optimizers

class chainer.optimizers.AdaDelta(rho=0.95, eps=1e-06)[source]

Zeiler’s ADADELTA.

See: http://www.matthewzeiler.com/pubs/googleTR2012/googleTR2012.pdf
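Zeiler's update rule can be sketched in plain Python on a scalar parameter. This is an illustration of the algorithm, not Chainer's implementation; `rho` and `eps` mirror the constructor defaults above.

```python
import math

def adadelta_step(x, grad, state, rho=0.95, eps=1e-6):
    """One AdaDelta update on a scalar parameter (sketch of Zeiler's rule)."""
    msg, msdx = state                      # running averages of grad^2 and dx^2
    msg = rho * msg + (1 - rho) * grad ** 2
    dx = -math.sqrt((msdx + eps) / (msg + eps)) * grad
    msdx = rho * msdx + (1 - rho) * dx ** 2
    return x + dx, (msg, msdx)

# Minimize f(x) = x^2 (gradient 2x) from x = 3.
x, state = 3.0, (0.0, 0.0)
for _ in range(100):
    x, state = adadelta_step(x, 2 * x, state)
```

Note that AdaDelta has no learning-rate hyperparameter: the step size is derived from the ratio of the two running averages.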

class chainer.optimizers.AdaGrad(lr=0.001, eps=1e-08)[source]

AdaGrad implementation.

See: http://jmlr.org/papers/v12/duchi11a.html
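The AdaGrad rule of Duchi et al. can be sketched in plain Python on a scalar parameter (an illustration, not Chainer's implementation; `lr` and `eps` mirror the defaults above).

```python
import math

def adagrad_step(x, grad, h, lr=0.001, eps=1e-8):
    """One AdaGrad update on a scalar parameter (sketch)."""
    h += grad ** 2                          # lifetime sum of squared gradients
    x -= lr * grad / (math.sqrt(h) + eps)   # per-parameter adaptive step
    return x, h

# Minimize f(x) = x^2 (gradient 2x) from x = 3.
x, h = 3.0, 0.0
for _ in range(100):
    x, h = adagrad_step(x, 2 * x, h)
```

Because `h` only grows, the effective step size decays monotonically over training.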

class chainer.optimizers.Adam(alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-08)[source]

Adam optimization algorithm.

See: http://arxiv.org/abs/1412.6980v8
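Algorithm 1 of the paper can be sketched in plain Python on a scalar parameter (an illustration, not Chainer's implementation; the hyperparameters mirror the constructor defaults above).

```python
import math

def adam_step(x, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update on a scalar parameter at timestep t >= 1 (sketch)."""
    m = beta1 * m + (1 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment estimate
    # Fold the bias corrections into the step size, as in the paper.
    alpha_t = alpha * math.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
    x -= alpha_t * m / (math.sqrt(v) + eps)
    return x, m, v

# Minimize f(x) = x^2 (gradient 2x) from x = 3.
x, m, v = 3.0, 0.0, 0.0
for t in range(1, 101):
    x, m, v = adam_step(x, 2 * x, m, v, t)
```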

class chainer.optimizers.MomentumSGD(lr=0.01, momentum=0.9)[source]

Classical momentum SGD.
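The classical momentum rule can be sketched in plain Python on a scalar parameter (an illustration, not Chainer's implementation; `lr` and `momentum` mirror the defaults above).

```python
def momentum_sgd_step(x, grad, v, lr=0.01, momentum=0.9):
    """One classical-momentum update on a scalar parameter (sketch)."""
    v = momentum * v - lr * grad   # velocity accumulates past gradients
    return x + v, v

# Minimize f(x) = x^2 (gradient 2x) from x = 3.
x, v = 3.0, 0.0
for _ in range(100):
    x, v = momentum_sgd_step(x, 2 * x, v)
```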

class chainer.optimizers.NesterovAG(lr=0.01, momentum=0.9)[source]

Nesterov’s Accelerated Gradient.

Implemented in the reformulation of Sutskever et al., which expresses each update as a linear combination of the current velocity and the current gradient.

See: http://arxiv.org/abs/1212.0901
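That reformulation (Eq. 7 of the paper) can be sketched in plain Python on a scalar parameter; this is an illustration, not Chainer's implementation, with `lr` and `mu` mirroring the constructor defaults above.

```python
def nesterov_step(x, grad, v, lr=0.01, mu=0.9):
    """One Nesterov accelerated-gradient step, Sutskever et al. form (sketch)."""
    v = mu * v - lr * grad     # velocity at the new iterate
    x = x + mu * v - lr * grad # = x + mu^2 * v_old - (1 + mu) * lr * grad
    return x, v

# Minimize f(x) = x^2 (gradient 2x) from x = 3.
x, v = 3.0, 0.0
for _ in range(100):
    x, v = nesterov_step(x, 2 * x, v)
```

Unlike classical momentum, the extra `mu * v - lr * grad` correction evaluates the effect of the look-ahead, which typically damps oscillations.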

class chainer.optimizers.RMSprop(lr=0.01, alpha=0.99, eps=1e-08)[source]

Hinton’s RMSprop.
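The RMSprop rule from Hinton's lecture notes can be sketched in plain Python on a scalar parameter (an illustration, not Chainer's implementation; `lr`, `alpha`, and `eps` mirror the defaults above).

```python
import math

def rmsprop_step(x, grad, ms, lr=0.01, alpha=0.99, eps=1e-8):
    """One RMSprop update on a scalar parameter (sketch)."""
    ms = alpha * ms + (1 - alpha) * grad ** 2   # moving average of grad^2
    x -= lr * grad / (math.sqrt(ms) + eps)      # RMS-normalized step
    return x, ms

# Minimize f(x) = x^2 (gradient 2x) from x = 3.
x, ms = 3.0, 0.0
for _ in range(100):
    x, ms = rmsprop_step(x, 2 * x, ms)
```

The decaying average distinguishes RMSprop from AdaGrad, whose accumulator never forgets old gradients.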

class chainer.optimizers.RMSpropGraves(lr=0.0001, alpha=0.95, momentum=0.9, eps=0.0001)[source]

Alex Graves’s RMSprop.

See: http://arxiv.org/abs/1308.0850
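Graves's variant (Section 4 of the paper) keeps moving averages of both the gradient and its square, normalizing by an estimate of the gradient's standard deviation, and adds momentum. A scalar sketch in plain Python (an illustration, not Chainer's implementation; hyperparameters mirror the defaults above):

```python
import math

def rmsprop_graves_step(x, grad, state, lr=0.0001, alpha=0.95,
                        momentum=0.9, eps=0.0001):
    """One update in the style of Graves (2013) on a scalar parameter (sketch)."""
    n, g, d = state                          # E[grad^2], E[grad], previous delta
    n = alpha * n + (1 - alpha) * grad ** 2
    g = alpha * g + (1 - alpha) * grad
    # n - g^2 estimates the gradient's variance.
    d = momentum * d - lr * grad / math.sqrt(n - g ** 2 + eps)
    return x + d, (n, g, d)

# Minimize f(x) = x^2 (gradient 2x) from x = 3.
x, state = 3.0, (0.0, 0.0, 0.0)
for _ in range(100):
    x, state = rmsprop_graves_step(x, 2 * x, state)
```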

class chainer.optimizers.SGD(lr=0.01)[source]

Vanilla Stochastic Gradient Descent.
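The plain SGD rule is a single line; sketched in Python on a scalar parameter (an illustration, not Chainer's implementation; `lr` mirrors the default above).

```python
def sgd_step(x, grad, lr=0.01):
    """One vanilla SGD update (sketch)."""
    return x - lr * grad

# Minimize f(x) = x^2 (gradient 2x) from x = 3:
# each step multiplies x by (1 - 2 * lr) = 0.98.
x = 3.0
for _ in range(100):
    x = sgd_step(x, 2 * x)
```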

class chainer.optimizers.SMORMS3(lr=0.001, eps=1e-16)[source]

Simon Funk’s SMORMS3.

See: http://sifter.org/~simon/journal/20150420.html
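The rule from Funk's blog post can be sketched in plain Python on a scalar parameter (an illustration, not Chainer's implementation; `lr` and `eps` mirror the defaults above).

```python
import math

def smorms3_step(x, grad, state, lr=0.001, eps=1e-16):
    """One SMORMS3 update on a scalar parameter (sketch of Funk's rule)."""
    mem, g, g2 = state
    r = 1.0 / (mem + 1.0)                  # adaptive decay from the memory term
    g = (1 - r) * g + r * grad             # leaky average of the gradient
    g2 = (1 - r) * g2 + r * grad ** 2      # leaky average of the squared gradient
    z = g * g / (g2 + eps)                 # signal-to-noise estimate in [0, 1]
    x -= grad * min(lr, z) / (math.sqrt(g2) + eps)
    mem = 1 + mem * (1 - z)                # noisy gradients reset the memory
    return x, (mem, g, g2)

# Minimize f(x) = x^2 (gradient 2x) from x = 3.
x, state = 3.0, (1.0, 0.0, 0.0)
for _ in range(100):
    x, state = smorms3_step(x, 2 * x, state)
```

The memory term `mem` shortens the averaging window when gradients look noisy, which is the "denoised" idea behind the name.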