chainer.functions.black_out(x, t, W, samples, reduce='mean')

BlackOut loss function.

The BlackOut loss is defined as

\[-\log(p(t)) - \sum_{s \in S} \log(1 - p(s)),\]

where \(t\) is the correct label, \(S\) is a set of negative examples, and \(p(\cdot)\) is the likelihood of a given label. Here \(p\) is defined as

\[p(y) = \frac{\exp(W_y^\top x)}{ \sum_{s \in samples} \exp(W_s^\top x)}.\]
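As a rough illustration of the two formulas above, the loss for a single input vector can be written in plain Python. This is only a sketch, not Chainer's implementation: the name `black_out_single` is hypothetical, and it assumes the normalizing sum runs over the correct label together with the negative samples, which the formula leaves implicit.

```python
import math


def black_out_single(x, t, W, samples):
    """BlackOut loss for one input vector (illustrative sketch only).

    x: list of D floats, t: correct label, W: V rows of D floats,
    samples: list of negative labels for this example.
    """
    def score(label):
        # Inner product W_label^T x.
        return sum(w * xi for w, xi in zip(W[label], x))

    # Assumption: the normalizer covers the correct label plus the negatives.
    scores = [score(label) for label in [t] + list(samples)]

    # Log of the normalizer, computed stably by subtracting the max score.
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))

    def p(label):
        return math.exp(score(label) - log_z)

    # -log p(t) - sum over negatives of log(1 - p(s)).
    return -math.log(p(t)) - sum(math.log(1.0 - p(s)) for s in samples)
```

With an all-zero weight matrix and two negative samples, every candidate gets probability 1/3, so the loss is \(\log 3 - 2\log(2/3) = \log 6.75\).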

The output is a variable whose value depends on the reduce option. If it is 'no', the output holds the unreduced loss value of each example. If it is 'mean', the function returns the mean of those loss values.

Parameters:

  • x (Variable) – Batch of input vectors. Its shape should be \((N, D)\).
  • t (Variable) – Vector of ground truth labels. Its shape should be \((N,)\). Each element \(v\) should satisfy \(0 \leq v < V\) or be \(-1\), where \(V\) is the number of label types.
  • W (Variable) – Weight matrix. Its shape should be \((V, D)\).
  • samples (Variable) – Negative samples. Its shape should be \((N, S)\), where \(S\) here denotes the number of negative samples per example.
  • reduce (str) – Reduction option. Its value must be either 'no' or 'mean'. Otherwise, ValueError is raised.

Returns: A variable object holding loss value(s). If reduce is 'no', the output variable holds an array of shape \((N,)\). If it is 'mean', it holds a scalar.

Return type: Variable

See: BlackOut: Speeding up Recurrent Neural Network Language Models With Very Large Vocabularies
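The argument shapes and the effect of reduce described above can be sketched with NumPy. This is a reference sketch under stated assumptions, not the library's code: `black_out_numpy` is a hypothetical name, the normalizer is assumed to cover the correct label plus the negative samples, and the \(-1\) label value is not handled.

```python
import numpy as np


def black_out_numpy(x, t, W, samples, reduce='mean'):
    """Batched BlackOut loss (illustrative sketch, not Chainer's code)."""
    if reduce not in ('no', 'mean'):
        raise ValueError("reduce must be either 'no' or 'mean'")

    pos = np.einsum('nd,nd->n', W[t], x)           # (N,)   scores of correct labels
    neg = np.einsum('nsd,nd->ns', W[samples], x)   # (N, S) scores of negatives

    # Assumption: normalize over the correct label plus the negatives.
    scores = np.concatenate([pos[:, None], neg], axis=1)   # (N, 1 + S)
    m = scores.max(axis=1, keepdims=True)
    log_z = (m + np.log(np.exp(scores - m).sum(axis=1, keepdims=True)))[:, 0]

    log_p_t = pos - log_z                                    # log p(t)
    log_not_p_s = np.log1p(-np.exp(neg - log_z[:, None]))    # log(1 - p(s))
    loss = -log_p_t - log_not_p_s.sum(axis=1)                # (N,)

    return loss.mean() if reduce == 'mean' else loss


# Example inputs with the documented shapes.
rng = np.random.default_rng(0)
N, D, V, S = 4, 3, 10, 5
x = rng.standard_normal((N, D)).astype(np.float32)
W = rng.standard_normal((V, D)).astype(np.float32)
t = rng.integers(0, V, size=N).astype(np.int32)
samples = rng.integers(0, V, size=(N, S)).astype(np.int32)

per_example = black_out_numpy(x, t, W, samples, reduce='no')   # shape (N,)
mean_loss = black_out_numpy(x, t, W, samples, reduce='mean')   # scalar
```

With reduce='no' the result keeps one loss per row of x, which is useful for per-example weighting or inspection; reduce='mean' collapses it to a single value suitable for backpropagation.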
