chainer.functions.softmax_cross_entropy¶
- chainer.functions.softmax_cross_entropy(x, t, normalize=True, cache_score=True, class_weight=None, ignore_label=-1, reduce='mean', enable_double_backprop=False, soft_target_loss='cross-entropy')[source]¶
Computes cross entropy loss for pre-softmax activations.
- Parameters
  - x (Variable or N-dimensional array) – Variable holding a multidimensional array whose elements indicate unnormalized log probabilities: the first axis of the variable represents the number of samples, and the second axis represents the number of classes. While this function computes the usual softmax cross entropy when the number of dimensions is equal to 2, it computes a cross entropy of the replicated softmax when the number of dimensions is greater than 2.
  - t (Variable or N-dimensional array) – Variable holding a signed integer vector of ground truth labels. If t[i] == ignore_label, the corresponding x[i] is ignored. When the dtype is float, this function treats t as an array holding probability distributions over the labels, in other words, soft targets. In this case, the shape of t must be the same as the shape of x. Note that the loss is then calculated using cross entropy or KL divergence, depending on soft_target_loss.
  - normalize (bool) – If True, this function normalizes the cross entropy loss across all instances. If False, it normalizes only along the batch size.
  - cache_score (bool) – When True, the function stores the result of the forward computation to reuse it in the backward computation. This reduces computational cost but consumes more memory. If the enable_double_backprop option is True, this option is forcibly turned off and the function does not cache the intermediate value.
  - class_weight (N-dimensional array) – An array of constant weights that are multiplied with the loss values along the second dimension. The shape of this array should be (x.shape[1],). If this is not None, each class weight class_weight[i] is multiplied to y[:, i], the corresponding log-softmax output of x (which has the same shape as x), before calculating the actual loss value.
  - ignore_label (int) – Label value you want to ignore. Its default value is -1. See the description of the argument t.
  - reduce (str) – A string that determines whether to reduce the loss values. If 'mean', it computes the sum of the individual cross entropies and normalizes it according to the normalize option. If 'no', this function computes a cross entropy for each instance and does not normalize it (the normalize option is ignored). In this case, the loss value of an ignored instance, which has ignore_label as its target value, is set to 0.
  - enable_double_backprop (bool) – If True, this function uses an implementation that supports higher-order differentiation. If False, it uses the single-backprop implementation. The single-backprop version is the default because it is expected to be faster, so if you need second or higher derivatives, you need to turn this option on explicitly.
  - soft_target_loss (str) – A string that determines which method is used to calculate the soft target loss. If 'cross-entropy', cross entropy is used; if 'kl-divergence', KL divergence is used.
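To make the hard-label semantics of ignore_label and reduce concrete, here is a plain-NumPy sketch of what the function computes for 2-D x and integer t. This is an illustrative reimplementation, not Chainer's actual code:

```python
import numpy as np

def softmax_cross_entropy_np(x, t, ignore_label=-1, reduce="mean"):
    """Plain-NumPy sketch of the hard-label case (2-D x, integer t)."""
    # Numerically stable log-softmax along the class axis.
    m = x.max(axis=1, keepdims=True)
    log_softmax = x - m - np.log(np.exp(x - m).sum(axis=1, keepdims=True))
    losses = np.zeros(t.shape[0], dtype=x.dtype)
    valid = t != ignore_label
    # Pick the negative log-probability of the correct class per valid sample.
    losses[valid] = -log_softmax[valid, t[valid]]
    if reduce == "no":
        return losses  # ignored instances stay 0
    # 'mean': sum of valid losses divided by the number of valid instances.
    n = valid.sum()
    return losses.sum() / n if n > 0 else x.dtype.type(0)
```

With x = [[-1, 0, 1, 2], [2, 0, 1, -1]] and t = [3, 0], this sketch reproduces the 0.44018972 value shown in the Example below; setting one label to -1 zeroes that instance's loss under reduce='no'.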
- Returns
  A variable holding the cross entropy loss. If reduce is 'mean', it is a scalar array. If reduce is 'no', the shape is the same as that of t.
- Return type
  Variable
Note

This function is differentiable only by x.

Example

>>> x = np.array([[-1, 0, 1, 2], [2, 0, 1, -1]]).astype(np.float32)
>>> x
array([[-1.,  0.,  1.,  2.],
       [ 2.,  0.,  1., -1.]], dtype=float32)
>>> t = np.array([3, 0]).astype(np.int32)
>>> t
array([3, 0], dtype=int32)
>>> y = F.softmax_cross_entropy(x, t)
>>> y
variable(0.44018972)
>>> log_softmax = -F.log_softmax(x)
>>> expected_loss = np.mean([log_softmax[row, column].data for row, column in enumerate(t)])
>>> y.array == expected_loss
True
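For the soft-target case (float t), the soft_target_loss argument chooses between cross entropy and KL divergence. The two differ only by the entropy of the target distribution, so they coincide for one-hot targets. A NumPy sketch of the distinction (an illustration, not Chainer's implementation):

```python
import numpy as np

def soft_target_loss_np(x, t, soft_target_loss="cross-entropy"):
    """NumPy sketch of the soft-target case: t holds per-class probabilities."""
    m = x.max(axis=1, keepdims=True)
    log_softmax = x - m - np.log(np.exp(x - m).sum(axis=1, keepdims=True))
    # Cross entropy between the target distribution t and softmax(x).
    loss = -(t * log_softmax).sum(axis=1)
    if soft_target_loss == "kl-divergence":
        # KL(t || softmax(x)) = cross entropy minus the entropy of t.
        entropy = -(t * np.log(np.where(t > 0, t, 1.0))).sum(axis=1)
        loss = loss - entropy
    return loss.mean()
```

Feeding one-hot rows of t reproduces the hard-label loss from the Example above, and 'kl-divergence' gives the same value there because a one-hot distribution has zero entropy.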