chainer.functions.batch_normalization

chainer.functions.batch_normalization(x, gamma, beta, eps=2e-05, running_mean=None, running_var=None, decay=0.9, axis=None)

Batch normalization function.
It takes the input variable x and two parameter variables gamma and beta. The parameter variables must both have the same shape, which is referred to as the channel shape. This channel shape corresponds to the dimensions in the input that are not averaged over. Since the first dimension of the input corresponds to the batch size, the second dimension of x corresponds to the first dimension of the channel shape, the third dimension of x corresponds to the second channel dimension (if it exists), and so on. Therefore, the dimensionality of the input must be at least one plus the number of channel dimensions. The total effective “batch size” is then the product of all dimensions in x except for the channel dimensions.

As an example, if the input is four-dimensional and the parameter variables are one-dimensional, then the first dimension of the input is assumed to be the batch size, the second dimension is the channel size, and the remaining two dimensions are treated as spatial dimensions that are averaged over, along with the batch dimension, in the batch normalization computations. That is, the total batch size is the product of all input dimensions except the second.
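The averaging rule above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not Chainer's implementation: the helper batch_norm_forward is hypothetical, and it assumes the standard training-time formula y = gamma * (x - mean) / sqrt(var + eps) + beta from the batch normalization paper, with the mean and the biased variance taken over the batch axis plus every axis that follows the channel axes.

```python
import numpy as np


def batch_norm_forward(x, gamma, beta, eps=2e-5):
    """Hypothetical helper: batch normalization with a (possibly multi-dimensional) channel shape.

    Assumes axis 0 of x is the batch axis, axes 1 .. gamma.ndim are the channel axes,
    and every remaining axis is averaged over together with the batch axis.
    """
    assert gamma.shape == beta.shape
    assert x.ndim >= 1 + gamma.ndim
    # Reduced axes: the batch axis plus all trailing (e.g. spatial) axes.
    axis = (0,) + tuple(range(1 + gamma.ndim, x.ndim))
    mean = x.mean(axis=axis, keepdims=True)
    var = x.var(axis=axis, keepdims=True)  # biased (population) variance
    x_hat = (x - mean) / np.sqrt(var + eps)
    # Reshape gamma/beta so they broadcast against the channel axes of x.
    param_shape = (1,) + gamma.shape + (1,) * (x.ndim - 1 - gamma.ndim)
    y = gamma.reshape(param_shape) * x_hat + beta.reshape(param_shape)
    # Effective batch size: product of all non-channel dimensions.
    effective_batch = int(np.prod([x.shape[i] for i in axis]))
    return y, axis, effective_batch


rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(8, 3, 4, 5))  # (batch, channel, height, width)
gamma = np.ones(3)
beta = np.zeros(3)
y, axis, m = batch_norm_forward(x, gamma, beta)
print(axis, m)  # (0, 2, 3) 160
print(np.allclose(y.mean(axis=axis), 0.0))  # True
```

Passing a two-dimensional gamma and beta of shape (3, 4) with the same four-dimensional input would treat axes 1 and 2 as channel axes, so only axes 0 and 3 would be averaged over.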
Warning

The train argument is no longer supported since v2. Instead, use chainer.using_config('train', train). See chainer.using_config().

Parameters:
x (Variable or numpy.ndarray or cupy.ndarray) – Input variable.
gamma (Variable or numpy.ndarray or cupy.ndarray) – Scaling parameter of normalized data.
beta (Variable or numpy.ndarray or cupy.ndarray) – Shifting parameter of scaled normalized data.
eps (float) – Epsilon value for numerical stability.
running_mean (numpy.ndarray or cupy.ndarray) – Running average of the mean. This is a running average of the mean over several minibatches using the decay parameter. The function takes the previous running average and updates the array in place with the new running average. If None, the running average is not computed; in that case, running_var must also be None.
running_var (numpy.ndarray or cupy.ndarray) – Running average of the variance. This is a running average of the variance over several minibatches using the decay parameter. The function takes the previous running average and updates the array in place with the new running average. If None, the running average is not computed; in that case, running_mean must also be None.
decay (float) – Decay rate of the moving average. It is used during training.
axis (int, tuple of int or None) – Axis over which normalization is performed. When axis is None, it is determined from the input dimensions. For example, if x.ndim is 4, axis becomes (0, 2, 3) and normalization is performed over the 0th, 2nd and 3rd axes of the input. If x.ndim is 2, axis becomes (0,) and normalization is performed over the 0th axis of the input. When a tuple of int is given, the numbers in the tuple must be sorted in ascending order. For example, (0, 2) is OK, but (2, 0) is not.
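The running_mean, running_var, decay and axis parameters can be illustrated with the following NumPy sketch of an in-place moving-average update. The helper update_running_stats is hypothetical. The rule running = decay * running + (1 - decay) * batch_statistic is the usual exponential moving average; the m / (m - 1) unbiased correction applied to the variance is an assumption about how Chainer computes it, not something stated on this page.

```python
import numpy as np


def update_running_stats(x, running_mean, running_var, axis, decay=0.9):
    """Hypothetical helper: update running statistics in place.

    Assumes an exponential moving average,
        running = decay * running + (1 - decay) * batch_statistic,
    with the batch variance rescaled by m / (m - 1) (unbiased estimate),
    where m is the number of elements reduced per channel.
    """
    if isinstance(axis, int):
        axis = (axis,)
    # Tuple axes must be given in ascending order, e.g. (0, 2) but not (2, 0).
    if list(axis) != sorted(axis):
        raise ValueError("axis must be sorted in ascending order, got %r" % (axis,))
    mean = x.mean(axis=axis)
    var = x.var(axis=axis)  # biased batch variance
    m = x.size // mean.size  # effective batch size per channel
    adjust = m / max(m - 1.0, 1.0)  # assumed unbiased correction
    # In-place updates, so the caller's arrays hold the new running averages.
    running_mean *= decay
    running_mean += (1 - decay) * mean
    running_var *= decay
    running_var += (1 - decay) * adjust * var


# 4-D input with a 1-D channel shape: the default axis would be (0, 2, 3).
x = np.ones((2, 3, 2, 2))
x[:, 1] = 5.0  # channel 1 has mean 5, the other channels have mean 1
running_mean = np.zeros(3)
running_var = np.ones(3)
update_running_stats(x, running_mean, running_var, axis=(0, 2, 3), decay=0.9)
# running_mean is now approximately [0.1, 0.5, 0.1]; running_var approximately [0.9, 0.9, 0.9]
```

Because the arrays are modified in place, the caller keeps passing the same running_mean and running_var objects on every training iteration rather than reassigning them.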
See: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
See also
 x (