Weight Initializers

A weight initializer is an instance of Initializer that destructively edits the contents of a numpy.ndarray or cupy.ndarray. Typically, weight initializers are passed to the __init__ method of a Link, which uses them to initialize its weights and biases.

Base class

class chainer.initializer.Initializer(dtype=None)

Initializes an array.

This is the base class of weight initializers. Calling an instance initializes the given array in place.

Variables: dtype – Data type specifier. It is used for the type check in the __call__ method.
__call__(array)

Initializes the given array.

This method destructively changes the values of array. Derived classes must implement this method; the algorithm used to compute the new values depends on the concrete derived class.

Parameters: array (numpy.ndarray or cupy.ndarray) – An array to be initialized by this initializer.
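To make this contract concrete, here is a minimal sketch of a custom initializer written against plain NumPy. The class name ScaledRange is hypothetical. In Chainer itself such a class would derive from chainer.initializer.Initializer; the sketch leaves that out so it runs without Chainer installed. It stores dtype, checks it in __call__, and overwrites the array in place.

```python
import numpy as np


class ScaledRange:
    """Hypothetical initializer following the contract described above.

    In Chainer it would subclass chainer.initializer.Initializer; this
    sketch only needs NumPy.
    """

    def __init__(self, scale=1.0, dtype=None):
        self.scale = scale
        self.dtype = dtype

    def __call__(self, array):
        # Type check against the stored dtype, as the base class describes.
        if self.dtype is not None and array.dtype != np.dtype(self.dtype):
            raise TypeError("array dtype does not match initializer dtype")
        # Assign through array[...] so the caller's buffer is edited
        # destructively instead of a new array being created.
        values = self.scale * np.arange(array.size).reshape(array.shape)
        array[...] = values


w = np.empty((2, 3), dtype=np.float32)
ScaledRange(scale=0.5, dtype=np.float32)(w)
```

Writing through `array[...]` rather than rebinding the name is what makes the edit visible to the caller, which is the "destructive" behavior the base class requires.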

Concrete initializers

class chainer.initializers.Identity(scale=1.0, dtype=None)

Initializes an array with the identity matrix.

It initializes the given array with a constant multiple of the identity matrix. Note that the arrays passed to it must be 2D square matrices.

Variables: scale (scalar) – A constant by which the identity matrix is multiplied.
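The effect of Identity can be reproduced with NumPy alone. The helper identity_fill below is hypothetical, not part of Chainer; it enforces the square-matrix requirement and writes scale times the identity matrix into the existing array.

```python
import numpy as np


def identity_fill(array, scale=1.0):
    """Hypothetical helper: overwrite a 2D square array with scale * I."""
    if array.ndim != 2 or array.shape[0] != array.shape[1]:
        raise ValueError("identity initialization needs a 2D square matrix")
    array[...] = scale * np.identity(array.shape[0], dtype=array.dtype)


w = np.empty((3, 3), dtype=np.float32)
identity_fill(w, scale=2.0)  # diagonal becomes 2.0, everything else 0.0
```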
class chainer.initializers.Constant(fill_value, dtype=None)

Initializes an array with a constant value.

Variables:
  • fill_value (scalar or numpy.ndarray or cupy.ndarray) – A constant to be assigned to the initialized array. Broadcasting is allowed in this assignment.
  • dtype – Data type specifier.
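The broadcast assignment allowed for fill_value follows NumPy's broadcasting rules. The sketch below uses a hypothetical helper, constant_fill, to show a scalar fill value and a per-column fill value; it illustrates the semantics and is not Chainer's implementation.

```python
import numpy as np


def constant_fill(array, fill_value):
    """Hypothetical helper: assign fill_value to every element of array.

    NumPy broadcasting decides whether fill_value fits; an incompatible
    shape raises ValueError.
    """
    array[...] = fill_value


w = np.empty((2, 3), dtype=np.float32)
constant_fill(w, 0.1)                        # scalar: every element becomes 0.1
constant_fill(w, np.array([1.0, 2.0, 3.0]))  # shape (3,) broadcast over both rows
```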
chainer.initializers.Zero(dtype=None)

Returns an initializer that fills an array with zeros.

Parameters: dtype – Data type specifier.
Returns: An initializer that sets every element of the given array to 0.
Return type: chainer.initializers.Constant
chainer.initializers.One(dtype=None)

Returns an initializer that fills an array with ones.

Parameters: dtype – Data type specifier.
Returns: An initializer that sets every element of the given array to 1.
Return type: chainer.initializers.Constant
class chainer.initializers.Normal(scale=0.05, dtype=None)

Initializes an array with a normal distribution.

Each element of the array is initialized with a value drawn independently from a Gaussian distribution whose mean is 0 and whose standard deviation is scale.

Parameters:
  • scale (float) – Standard deviation of the Gaussian distribution.
  • dtype – Data type specifier.
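A NumPy sketch of the same sampling rule, using the hypothetical helper normal_fill. Passing a seeded numpy.random.Generator is an assumption made here for reproducibility; the Chainer signature above has no such argument.

```python
import numpy as np


def normal_fill(array, scale=0.05, rng=None):
    """Hypothetical helper: fill array in place with i.i.d. N(0, scale**2) draws."""
    rng = np.random.default_rng() if rng is None else rng
    array[...] = rng.normal(loc=0.0, scale=scale, size=array.shape)


w = np.empty((500, 400), dtype=np.float32)
normal_fill(w, scale=0.05, rng=np.random.default_rng(0))
# With 200,000 samples, the empirical mean is near 0 and the std near 0.05.
```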
class chainer.initializers.GlorotNormal(scale=1.0, dtype=None)

Initializes an array with a scaled Gaussian distribution.

Each element of the array is initialized with a value drawn independently from a Gaussian distribution whose mean is 0 and whose standard deviation is \(scale \times \sqrt{\frac{2}{fan_{in} + fan_{out}}}\), where \(fan_{in}\) and \(fan_{out}\) are the numbers of input and output units, respectively.

Reference: Glorot & Bengio, AISTATS 2010

Parameters:
  • scale (float) – A constant that determines the scale of the standard deviation.
  • dtype – Data type specifier.
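As a worked example of the formula, the sketch below computes the standard deviation for a weight with 300 inputs and 100 outputs: \(\sqrt{2 / 400} \approx 0.0707\). It assumes a 2D weight of shape (out_size, in_size), so fan_out is the first dimension and fan_in the second; how Chainer derives the fans for other shapes is not covered here. The helper names are hypothetical.

```python
import math

import numpy as np


def glorot_normal_std(fan_in, fan_out, scale=1.0):
    """Standard deviation given by the GlorotNormal formula."""
    return scale * math.sqrt(2.0 / (fan_in + fan_out))


def glorot_normal_fill(array, scale=1.0, rng=None):
    """Hypothetical helper for a 2D weight of shape (out_size, in_size)."""
    fan_out, fan_in = array.shape  # assumption: 2D (out, in) layout
    std = glorot_normal_std(fan_in, fan_out, scale)
    rng = np.random.default_rng() if rng is None else rng
    array[...] = rng.normal(0.0, std, size=array.shape)


print(glorot_normal_std(300, 100))  # sqrt(2 / 400), about 0.0707
w = np.empty((100, 300), dtype=np.float32)
glorot_normal_fill(w, rng=np.random.default_rng(0))
```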
class chainer.initializers.HeNormal(scale=1.0, dtype=None)

Initializes an array with a scaled Gaussian distribution.

Each element of the array is initialized with a value drawn independently from a Gaussian distribution whose mean is 0 and whose standard deviation is \(scale \times \sqrt{\frac{2}{fan_{in}}}\), where \(fan_{in}\) is the number of input units.

Reference: He et al., https://arxiv.org/abs/1502.01852

Parameters:
  • scale (float) – A constant that determines the scale of the standard deviation.
  • dtype – Data type specifier.
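Unlike GlorotNormal, the He standard deviation depends only on fan_in. The sketch below, with hypothetical helper names and assuming a 2D weight of shape (out_size, in_size), fills two weights that share 512 inputs but differ in outputs; both end up with a standard deviation near \(\sqrt{2/512} = 0.0625\).

```python
import math

import numpy as np


def he_normal_std(fan_in, scale=1.0):
    """Standard deviation given by the HeNormal formula."""
    return scale * math.sqrt(2.0 / fan_in)


def he_normal_fill(array, scale=1.0, rng=None):
    """Hypothetical helper for a 2D weight of shape (out_size, in_size)."""
    fan_in = array.shape[1]  # assumption: input units are the second axis
    rng = np.random.default_rng() if rng is None else rng
    array[...] = rng.normal(0.0, he_normal_std(fan_in, scale), size=array.shape)


rng = np.random.default_rng(0)
narrow = np.empty((64, 512), dtype=np.float32)
wide = np.empty((256, 512), dtype=np.float32)
he_normal_fill(narrow, rng=rng)
he_normal_fill(wide, rng=rng)
# fan_out differs, but both standard deviations are close to 0.0625.
```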
class chainer.initializers.Orthogonal(scale=1.1, dtype=None)

Initializes an array with an orthogonal system.

This initializer first creates a matrix with the same shape as the array to be initialized, whose elements are drawn independently from the standard Gaussian distribution. Next, it applies singular value decomposition (SVD) to this matrix. It then initializes the array with one of the two resulting orthogonal factors (the left or the right one), chosen according to the shape of the input array. Finally, the array is multiplied by the constant scale.

If the input array has more than two dimensions, it is treated as a matrix by flattening all axes except the first one.

The number of vectors in the orthogonal system (i.e., the first dimension of the array) must be equal to or smaller than the dimension of each vector (i.e., the second dimension of the matrix, which is the product of all remaining axes when the array has more than two dimensions).

Variables:
  • scale (float) – A constant by which the orthogonal matrix is multiplied.
  • dtype – Data type specifier.

Reference: Saxe et al., https://arxiv.org/abs/1312.6120
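The procedure can be sketched in NumPy as follows. orthogonal_fill is a hypothetical helper that follows the steps described above (Gaussian matrix, SVD, pick the factor whose shape matches, multiply by scale); it is not claimed to match Chainer's code line for line. Because the rows of the chosen factor are orthonormal, the flattened result \(W\) satisfies \(W W^\top = scale^2 I\).

```python
import numpy as np


def orthogonal_fill(array, scale=1.1, rng=None):
    """Hypothetical sketch of SVD-based orthogonal initialization."""
    rng = np.random.default_rng() if rng is None else rng
    # Treat an N-d array as a matrix: keep the first axis, flatten the rest.
    flat_shape = (array.shape[0], int(np.prod(array.shape[1:])))
    if flat_shape[0] > flat_shape[1]:
        raise ValueError("more vectors than dimensions; cannot be orthogonal")
    a = rng.standard_normal(flat_shape)
    # Reduced SVD: u is (m, m) and vt is (m, n) when m <= n.
    u, _, vt = np.linalg.svd(a, full_matrices=False)
    # Use whichever orthogonal factor already has the target shape.
    q = u if u.shape == flat_shape else vt
    array[...] = scale * q.reshape(array.shape)


w = np.empty((4, 3, 2))  # flattened to a 4 x 6 matrix
orthogonal_fill(w, scale=1.1, rng=np.random.default_rng(0))
flat = w.reshape(4, -1)
```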

class chainer.initializers.Uniform(scale=0.05, dtype=None)

Initializes an array with a scaled uniform distribution.

Each element of the array is initialized with a value drawn independently from the uniform distribution over \([-scale, scale]\).

Variables:
  • scale (float) – A constant that determines the scale of the uniform distribution.
  • dtype – Data type specifier.
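A NumPy sketch using a hypothetical helper, uniform_fill. A useful fact when comparing it with Normal: a uniform distribution over \([-scale, scale]\) has standard deviation \(scale / \sqrt{3}\), so the default scale of 0.05 gives about 0.0289.

```python
import numpy as np


def uniform_fill(array, scale=0.05, rng=None):
    """Hypothetical helper: fill array in place with draws from U(-scale, scale)."""
    rng = np.random.default_rng() if rng is None else rng
    array[...] = rng.uniform(-scale, scale, size=array.shape)


w = np.empty((500, 400))
uniform_fill(w, scale=0.05, rng=np.random.default_rng(0))
```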
class chainer.initializers.LeCunUniform(scale=1.0, dtype=None)

Initializes an array with a scaled uniform distribution.

Each element of the array is initialized with a value drawn independently from the uniform distribution over \([-s, s]\), where \(s = scale \times \sqrt{\frac{3}{fan_{in}}}\). Here, \(fan_{in}\) is the number of input units.

Reference: LeCun 98, Efficient Backprop http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf

Variables:
  • scale (float) – A constant that determines the scale of the uniform distribution.
  • dtype – Data type specifier.
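Since a uniform distribution over \([-s, s]\) has variance \(s^2/3\), the LeCunUniform bound gives a variance of \(scale^2 / fan_{in}\). The sketch below checks this numerically for fan_in = 300, where \(s = \sqrt{3/300} = 0.1\). It assumes a 2D weight of shape (out_size, in_size); lecun_uniform_fill is a hypothetical helper.

```python
import math

import numpy as np


def lecun_uniform_fill(array, scale=1.0, rng=None):
    """Hypothetical helper for a 2D weight of shape (out_size, in_size)."""
    fan_in = array.shape[1]  # assumption: input units are the second axis
    s = scale * math.sqrt(3.0 / fan_in)
    rng = np.random.default_rng() if rng is None else rng
    array[...] = rng.uniform(-s, s, size=array.shape)


w = np.empty((1000, 300))
lecun_uniform_fill(w, rng=np.random.default_rng(0))
# Empirical variance should be close to 1 / fan_in = 1 / 300.
```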
class chainer.initializers.GlorotUniform(scale=1.0, dtype=None)

Initializes an array with a scaled uniform distribution.

Each element of the array is initialized with a value drawn independently from the uniform distribution over \([-s, s]\), where \(s = scale \times \sqrt{\frac{6}{fan_{in} + fan_{out}}}\). Here, \(fan_{in}\) and \(fan_{out}\) are the numbers of input and output units, respectively.

Variables:
  • scale (float) – A constant that determines the scale of the uniform distribution.
  • dtype – Data type specifier.
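The bound \(\sqrt{6/(fan_{in}+fan_{out})}\) makes the variance \(s^2/3\) equal to \(2/(fan_{in}+fan_{out})\), the same variance GlorotNormal uses. The short check below confirms this for 300 inputs and 100 outputs; the helper name is hypothetical.

```python
import math


def glorot_uniform_bound(fan_in, fan_out, scale=1.0):
    """Half-width s of the GlorotUniform interval [-s, s]."""
    return scale * math.sqrt(6.0 / (fan_in + fan_out))


s = glorot_uniform_bound(300, 100)  # sqrt(6 / 400), about 0.1225
uniform_std = s / math.sqrt(3.0)    # standard deviation of U(-s, s)
glorot_normal_std = math.sqrt(2.0 / (300 + 100))
print(math.isclose(uniform_std, glorot_normal_std))  # True
```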
class chainer.initializers.HeUniform(scale=1.0, dtype=None)

Initializes an array with a scaled uniform distribution.

Each element of the array is initialized with a value drawn independently from the uniform distribution over \([-s, s]\), where \(s = scale \times \sqrt{\frac{6}{fan_{in}}}\). Here, \(fan_{in}\) is the number of input units.

Variables:
  • scale (float) – A constant that determines the scale of the uniform distribution.
  • dtype – Data type specifier.
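Because a uniform distribution over \([-s, s]\) has variance \(s^2/3\), the HeUniform bound gives a variance of \(2 \times scale^2 / fan_{in}\), the same as HeNormal. For fan_in = 600 the bound is \(s = \sqrt{6/600} = 0.1\). Below is a sketch with a hypothetical helper, assuming a 2D weight of shape (out_size, in_size); returning the bound is a convenience of the sketch.

```python
import math

import numpy as np


def he_uniform_fill(array, scale=1.0, rng=None):
    """Hypothetical helper for a 2D weight of shape (out_size, in_size)."""
    fan_in = array.shape[1]  # assumption: input units are the second axis
    s = scale * math.sqrt(6.0 / fan_in)
    rng = np.random.default_rng() if rng is None else rng
    array[...] = rng.uniform(-s, s, size=array.shape)
    return s


w = np.empty((500, 600))
bound = he_uniform_fill(w, rng=np.random.default_rng(0))
# bound is about 0.1; the empirical variance should be close to 2 / 600.
```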

Helper function

chainer.init_weight(weights, initializer, scale=1.0)

Helper function for initializing a weight tensor.

This function accepts several types of initializer, prepares an appropriate Initializer if necessary, and performs the initialization.

Parameters:
  • weights (numpy.ndarray or cupy.ndarray) – Weight tensor to be initialized.
  • initializer – The value used to initialize the data. It may be None (in which case HeNormal is used as the initializer), a scalar to which all elements are set, a numpy.ndarray to be assigned, or a callable that takes a numpy.ndarray or cupy.ndarray and edits its values in place.
  • scale (scalar) – A constant by which the initialized weights are multiplied.
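The dispatch described above can be sketched as follows. init_weight_sketch and he_normal_stand_in are hypothetical names. The stand-in applies the HeNormal formula with scale 1.0 to a 2D weight of shape (out_size, in_size), which is an assumption, and multiplying the initialized values by scale is this sketch's reading of the scale parameter rather than a statement about Chainer's source.

```python
import numpy as np


def he_normal_stand_in(array, rng=None):
    """Hypothetical stand-in for HeNormal on a 2D (out_size, in_size) weight."""
    rng = np.random.default_rng() if rng is None else rng
    std = np.sqrt(2.0 / array.shape[1])
    array[...] = rng.normal(0.0, std, size=array.shape)


def init_weight_sketch(weights, initializer, scale=1.0):
    """Hypothetical re-creation of the init_weight dispatch described above."""
    if initializer is None:
        initializer = he_normal_stand_in
    elif np.isscalar(initializer) or isinstance(initializer, np.ndarray):
        value = initializer

        def fill(array):
            array[...] = value  # scalar or broadcastable array

        initializer = fill
    if not callable(initializer):
        raise TypeError("initializer must be None, a scalar, an ndarray or a callable")
    initializer(weights)  # edits weights in place
    weights *= scale      # assumption: scale multiplies the initialized values


w = np.empty((2, 3))
init_weight_sketch(w, 3.0, scale=0.5)             # every element becomes 1.5
init_weight_sketch(w, np.array([1.0, 2.0, 3.0]))  # row broadcast to both rows
init_weight_sketch(w, lambda a: a.fill(7.0))      # any in-place callable
```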