Weight Initializers
A weight initializer is an instance of Initializer that destructively edits the contents of a numpy.ndarray or cupy.ndarray.
Typically, weight initializers are passed to the __init__ of a Link, where they initialize its weights and biases.
Base class
class chainer.initializer.Initializer(dtype=None)
Initializes array.
It initializes the given array.
Variables: dtype – Data type specifier. It is used for type checking in the __call__ function.
__call__(array)
Initializes given array.
This method destructively changes the value of array. Derived classes must implement this method; the algorithm used to generate the new values depends on the concrete derived class.
Parameters: array (numpy.ndarray or cupy.ndarray) – An array to be initialized by this initializer.
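The in-place contract of __call__ can be illustrated with a minimal sketch. The class below is a hypothetical stand-in written with plain NumPy: it does not subclass chainer's Initializer, and the name FillWithRamp is invented for this example.

```python
import numpy as np


class FillWithRamp:
    """Hypothetical initializer-like callable (not part of chainer)."""

    def __init__(self, dtype=None):
        self.dtype = dtype

    def __call__(self, array):
        # Type check, mirroring the documented purpose of ``dtype``.
        if self.dtype is not None:
            assert array.dtype == self.dtype
        # Overwrite the contents in place; nothing is returned.
        array[...] = np.arange(array.size).reshape(array.shape)


w = np.empty((2, 3), dtype=np.float32)
result = FillWithRamp(dtype=np.float32)(w)
```

The caller keeps its reference to w, and the values appear in that same buffer; this is what "destructively" means here.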
Concrete initializers
class chainer.initializers.Identity(scale=1.0, dtype=None)
Initializes array with the identity matrix.
It initializes the given array with a constant multiple of the identity matrix. Note that the arrays passed to it must be 2D square matrices.
Variables: scale (scalar) – A constant by which the identity matrix is multiplied.
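As a rough illustration of the behaviour described above, the following NumPy sketch fills a square matrix with scale times the identity. The helper name identity_init is hypothetical, and raising ValueError on a non-square input is an assumption of this sketch, not a statement about how chainer reports the error.

```python
import numpy as np


def identity_init(array, scale=1.0):
    """Fill a square 2D array in place with scale * I (illustrative helper)."""
    if array.ndim != 2 or array.shape[0] != array.shape[1]:
        raise ValueError("expected a 2D square matrix")
    array[...] = scale * np.identity(array.shape[0], dtype=array.dtype)


w = np.empty((3, 3), dtype=np.float32)
identity_init(w, scale=2.0)
```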
class chainer.initializers.Constant(fill_value, dtype=None)
Initializes array with constant value.
Variables: - fill_value (scalar or numpy.ndarray or cupy.ndarray) – A constant to be assigned to the initialized array. Broadcasting is allowed in this assignment.
- dtype – Data type specifier.
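The broadcasting mentioned for fill_value can be shown with ordinary NumPy assignment. This is only a sketch of the assignment semantics, assuming the fill behaves like array[...] = fill_value; it does not call chainer.

```python
import numpy as np

w = np.empty((2, 3), dtype=np.float32)

# A scalar fill value is broadcast to every element.
w[...] = 0.5
scalar_result = w.copy()

# A length-3 vector is broadcast along the first axis (one copy per row).
w[...] = np.array([1.0, 2.0, 3.0], dtype=np.float32)
row_result = w.copy()
```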
chainer.initializers.Zero(dtype=None)
Returns an initializer that fills the array with zeros.
Parameters: dtype – Data type specifier.
Returns: An initializer that sets every element of the given array to zero.
Return type: Initializer
chainer.initializers.One(dtype=None)
Returns an initializer that fills the array with ones.
Parameters: dtype – Data type specifier.
Returns: An initializer that sets every element of the given array to one.
Return type: Initializer
class chainer.initializers.Normal(scale=0.05, dtype=None)
Initializes array with a normal distribution.
Each element of the array is initialized with a value drawn independently from a Gaussian distribution with mean 0 and standard deviation scale.
Parameters: - scale (float) – Standard deviation of the Gaussian distribution.
- dtype – Data type specifier.
class chainer.initializers.GlorotNormal(scale=1.0, dtype=None)
Initializes array with scaled Gaussian distribution.
Each element of the array is initialized with a value drawn independently from a Gaussian distribution with mean 0 and standard deviation \(scale \times \sqrt{\frac{2}{fan_{in} + fan_{out}}}\), where \(fan_{in}\) and \(fan_{out}\) are the numbers of input and output units, respectively.
Reference: Glorot & Bengio, AISTATS 2010
Parameters: - scale (float) – A constant that determines the scale of the standard deviation.
- dtype – Data type specifier.
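To make the formula concrete, the sketch below computes the standard deviation for two weight shapes using only the standard library. How the fans are derived from a shape is an assumption here: the helper fans (a hypothetical name) treats the shape as (out, in, *receptive_field), which may differ from what chainer does internally.

```python
import math


def fans(shape):
    """Return (fan_in, fan_out) under the assumed (out, in, *receptive) layout."""
    receptive = math.prod(shape[2:])  # 1 for a plain 2D weight matrix
    return shape[1] * receptive, shape[0] * receptive


def glorot_normal_std(shape, scale=1.0):
    fan_in, fan_out = fans(shape)
    return scale * math.sqrt(2.0 / (fan_in + fan_out))


# Fully connected layer with 6 inputs and 4 outputs: sqrt(2 / 10).
std_linear = glorot_normal_std((4, 6))
# 3x3 convolution with 8 input and 16 output channels: sqrt(2 / 216).
std_conv = glorot_normal_std((16, 8, 3, 3))
```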
class chainer.initializers.HeNormal(scale=1.0, dtype=None)
Initializes array with scaled Gaussian distribution.
Each element of the array is initialized with a value drawn independently from a Gaussian distribution with mean 0 and standard deviation \(scale \times \sqrt{\frac{2}{fan_{in}}}\), where \(fan_{in}\) is the number of input units.
Reference: He et al., https://arxiv.org/abs/1502.01852
Parameters: - scale (float) – A constant that determines the scale of the standard deviation.
- dtype – Data type specifier.
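A minimal NumPy sketch of this scheme follows. It is not chainer's implementation: the function name he_normal_sketch and its rng argument are invented for illustration, and taking fan_in as the product of all axes except the first is an assumption.

```python
import numpy as np


def he_normal_sketch(array, scale=1.0, rng=None):
    """Fill ``array`` in place with N(0, std^2), std = scale * sqrt(2 / fan_in)."""
    rng = np.random.default_rng() if rng is None else rng
    # Assumption: fan_in is the product of all axes except the first.
    fan_in = int(np.prod(array.shape[1:]))
    std = scale * np.sqrt(2.0 / fan_in)
    array[...] = rng.normal(0.0, std, size=array.shape)
    return std


w = np.empty((256, 512), dtype=np.float32)
target_std = he_normal_sketch(w, rng=np.random.default_rng(0))
```

With 512 input units the target standard deviation is sqrt(2 / 512) = 0.0625, and the empirical standard deviation of w should be close to that value.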
class chainer.initializers.Orthogonal(scale=1.1, dtype=None)
Initializes array with an orthogonal system.
This initializer first creates a matrix of the same shape as the array to be initialized, with elements drawn independently from the standard Gaussian distribution. Next, it applies Singular Value Decomposition (SVD) to the matrix. Then, it initializes the array with one of the two resulting orthogonal matrices, chosen according to the shape of the input array. Finally, the array is multiplied by the constant scale.
If the ndim of the input array is greater than 2, the array is treated as a matrix by concatenating all axes except the first one.
The number of vectors that make up the orthogonal system (i.e. the first element of the shape of the array) must be less than or equal to the dimension of each vector (i.e. the second element of the shape of the array).
Variables: - scale (float) – A constant by which the array is multiplied.
- dtype – Data type specifier.
Reference: Saxe et al., https://arxiv.org/abs/1312.6120
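The four steps described for this initializer can be sketched with NumPy as follows. This is an illustrative reading of the description, not chainer's code; orthogonal_sketch and its rng argument are hypothetical names, and the ValueError for an invalid shape is an assumption.

```python
import numpy as np


def orthogonal_sketch(array, scale=1.1, rng=None):
    """Fill ``array`` in place with ``scale`` times a set of orthonormal rows."""
    rng = np.random.default_rng() if rng is None else rng
    rows = array.shape[0]
    cols = int(np.prod(array.shape[1:]))
    if rows > cols:
        raise ValueError("need shape[0] <= product of the remaining axes")
    # Step 1: a random Gaussian matrix of the flattened shape.
    a = rng.standard_normal((rows, cols))
    # Step 2: thin SVD; u is (rows, rows) and vh is (rows, cols).
    u, _, vh = np.linalg.svd(a, full_matrices=False)
    # Step 3: keep the factor whose shape matches the flattened array.
    q = u if u.shape == (rows, cols) else vh
    # Step 4: scale and write back in the original shape.
    array[...] = scale * q.reshape(array.shape)


w = np.empty((3, 2, 4))
orthogonal_sketch(w, scale=1.1, rng=np.random.default_rng(0))
flat = w.reshape(3, 8)
gram = flat @ flat.T  # approximately 1.1**2 times the 3x3 identity
```

The Gram matrix of the flattened rows is a quick check: orthonormal rows scaled by scale give scale squared on the diagonal and zeros elsewhere.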
class chainer.initializers.Uniform(scale=0.05, dtype=None)
Initializes array with a scaled uniform distribution.
Each element of the array is initialized with a value drawn independently from the uniform distribution on \([-scale, scale]\).
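Variables: - scale (float) – A constant that determines the scale of the uniform distribution.
- dtype – Data type specifier.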
class chainer.initializers.LeCunUniform(scale=1.0, dtype=None)
Initializes array with a scaled uniform distribution.
Each element of the array is initialized with a value drawn independently from the uniform distribution on \([-s, s]\), where \(s = scale \times \sqrt{\frac{3}{fan_{in}}}\). Here, \(fan_{in}\) is the number of input units.
Reference: LeCun 98, Efficient Backprop http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf
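Variables: - scale (float) – A constant that determines the scale of the uniform distribution.
- dtype – Data type specifier.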
class chainer.initializers.GlorotUniform(scale=1.0, dtype=None)
Initializes array with a scaled uniform distribution.
Each element of the array is initialized with a value drawn independently from the uniform distribution on \([-s, s]\), where \(s = scale \times \sqrt{\frac{6}{fan_{in} + fan_{out}}}\). Here, \(fan_{in}\) and \(fan_{out}\) are the numbers of input and output units, respectively.
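Variables: - scale (float) – A constant that determines the scale of the uniform distribution.
- dtype – Data type specifier.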
class chainer.initializers.HeUniform(scale=1.0, dtype=None)
Initializes array with a scaled uniform distribution.
Each element of the array is initialized with a value drawn independently from the uniform distribution on \([-s, s]\), where \(s = scale \times \sqrt{\frac{6}{fan_{in}}}\). Here, \(fan_{in}\) is the number of input units.
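Variables: - scale (float) – A constant that determines the scale of the uniform distribution.
- dtype – Data type specifier.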
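The bounds of the three fan-based uniform initializers can be compared side by side. The sketch below uses only the standard library; uniform_bound and implied_std are hypothetical helper names. It relies on the fact that a uniform distribution on \([-s, s]\) has variance \(s^2 / 3\), which shows that the GlorotUniform and HeUniform bounds give the same standard deviations as the GlorotNormal and HeNormal formulas.

```python
import math


def uniform_bound(kind, fan_in, fan_out=None, scale=1.0):
    """Return s for the interval [-s, s] of the named scheme."""
    if kind == "lecun":
        return scale * math.sqrt(3.0 / fan_in)
    if kind == "glorot":
        return scale * math.sqrt(6.0 / (fan_in + fan_out))
    if kind == "he":
        return scale * math.sqrt(6.0 / fan_in)
    raise ValueError(kind)


def implied_std(bound):
    # A uniform distribution on [-s, s] has variance s**2 / 3.
    return bound / math.sqrt(3.0)


fan_in, fan_out = 300, 100
# Implied standard deviation: sqrt(1 / fan_in)
lecun_std = implied_std(uniform_bound("lecun", fan_in))
# Implied standard deviation: sqrt(2 / (fan_in + fan_out))
glorot_std = implied_std(uniform_bound("glorot", fan_in, fan_out))
# Implied standard deviation: sqrt(2 / fan_in)
he_std = implied_std(uniform_bound("he", fan_in))
```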
Helper function
chainer.init_weight(weights, initializer, scale=1.0)
Helper function for initialization of the weight tensor.
This function accepts several types of initializer, prepares the appropriate Initializer if necessary, and performs the initialization.
Parameters: - weights (numpy.ndarray or cupy.ndarray) – Weight tensor to be initialized.
- initializer – The value used to initialize the data. May be None (in which case HeNormal is used as an initializer), a scalar to set all values to, a numpy.ndarray to be assigned, or a callable that takes a numpy.ndarray or cupy.ndarray and edits its value.
- scale (scalar) – A constant to multiply initializer by.