chainer.functions.negative_sampling(x, t, W, sampler, sample_size, reduce='sum', *, return_samples=False)

Negative sampling loss function.

In natural language processing, especially language modeling, the number of words in a vocabulary can be very large. Computing the gradient of the embedding matrix for every word in the vocabulary therefore takes a lot of time.

With the negative sampling trick, you only need to calculate the gradient for the positive example and a few sampled negative examples.
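As a rough, hypothetical illustration (the sizes below are assumptions, not taken from this page): with a vocabulary of \(V = 10^5\) words, \(d = 300\)-dimensional vectors and \(k = 5\) negative samples, a full softmax output layer gives every row of W a nonzero gradient for each example, whereas the negative sampling loss touches only the row of the positive word and the rows of the \(k\) sampled words. The number of gradient entries of W computed per example is then at most

\[V d = 10^5 \times 300 = 3 \times 10^7 \qquad \text{versus} \qquad (k + 1)\, d = 6 \times 300 = 1800.\]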

The loss is defined as follows.

\[f(x, p) = - \log \sigma(x^\top w_p) - k\, E_{i \sim P(i)}[\log \sigma(- x^\top w_i)]\]

where \(\sigma(\cdot)\) is the sigmoid function, \(w_i\) is the weight vector for the word \(i\), and \(p\) is a positive example. The expectation is approximated with a set \(N\) of \(k\) examples sampled from the distribution \(P(i)\):

\[f(x, p) \approx - \log \sigma(x^\top w_p) - \sum_{n \in N} \log \sigma(-x^\top w_n)\]
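A minimal NumPy sketch of the approximated loss above for a single input vector, together with its gradient with respect to W. It is not Chainer's implementation; the function names and sizes are hypothetical. It uses \(\frac{d}{dz}[-\log\sigma(z)] = \sigma(z) - 1\) and \(\frac{d}{dz}[-\log\sigma(-z)] = \sigma(z)\), and shows that only the rows of W for \(p\) and for the words in \(N\) receive a nonzero gradient.

```python
import numpy as np


def log_sigmoid(z):
    # Numerically stable log(sigma(z)) = -log(1 + exp(-z)).
    return -np.logaddexp(0.0, -z)


def sigmoid(z):
    return np.exp(log_sigmoid(z))


def loss_and_grad_W(x, W, p, negatives):
    """Approximated negative sampling loss for one input vector.

    x: (d,) input vector, W: (V, d) weight matrix, p: positive word id,
    negatives: (k,) integer array of sampled word ids (the set N).
    Returns the scalar loss and its gradient with respect to W.
    """
    pos_score = x @ W[p]             # x^T w_p
    neg_scores = W[negatives] @ x    # x^T w_n for every n in N
    loss = -log_sigmoid(pos_score) - log_sigmoid(-neg_scores).sum()

    grad_W = np.zeros_like(W)
    # Gradient of -log sigma(x^T w_p) with respect to w_p: (sigma(x^T w_p) - 1) x.
    grad_W[p] += (sigmoid(pos_score) - 1.0) * x
    # Gradient of -log sigma(-x^T w_n) with respect to w_n: sigma(x^T w_n) x.
    # np.add.at accumulates correctly if the same word is sampled twice.
    np.add.at(grad_W, negatives, sigmoid(neg_scores)[:, None] * x)
    return loss, grad_W


rng = np.random.default_rng(0)
V, d, k = 1000, 8, 5                     # hypothetical sizes
W = rng.normal(scale=0.1, size=(V, d))
x = rng.normal(size=d)
negatives = rng.integers(0, V, size=k)
loss, grad_W = loss_and_grad_W(x, W, p=3, negatives=negatives)

# Rows of W that received a nonzero gradient: at most k + 1 of the V rows.
touched = np.flatnonzero(np.abs(grad_W).sum(axis=1))
print(f"loss={loss:.4f}, {len(touched)} of {V} rows touched")
```

In a real model, only these touched rows need to be updated, which is where the savings described above come from.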

Each element of \(N\) is drawn from the word distribution \(P(w) = \frac{1}{Z} c(w)^\alpha\), where \(c(w)\) is the unigram count of the word \(w\), \(\alpha\) is a hyper-parameter, and \(Z\) is the normalization constant.
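A NumPy sketch, under stated assumptions, of this distribution and of the sampling approximation. It builds \(P(w)\) from hypothetical unigram counts with \(\alpha = 0.75\) (the value used in the paper cited below, assumed here rather than required by this function), then checks that the sum over a sampled set \(N\) of \(k\) words matches \(k\, E_{i \sim P(i)}[\log \sigma(-x^\top w_i)]\) on average. All names and sizes are illustrative, not part of Chainer.

```python
import numpy as np


def unigram_power_distribution(counts, alpha=0.75):
    """P(w) = c(w)**alpha / Z, with Z chosen so the probabilities sum to 1."""
    powered = np.asarray(counts, dtype=np.float64) ** alpha
    return powered / powered.sum()


def log_sigmoid(z):
    return -np.logaddexp(0.0, -z)   # stable log(sigma(z))


rng = np.random.default_rng(0)
V, d, k = 20, 4, 5                             # hypothetical sizes
counts = rng.integers(1, 1000, size=V)         # hypothetical unigram counts
P = unigram_power_distribution(counts)
W = rng.normal(size=(V, d))
x = rng.normal(size=d)
neg_terms = log_sigmoid(-(W @ x))              # log sigma(-x^T w_i) for every word i

# Exact value of k * E_{i ~ P}[log sigma(-x^T w_i)].
exact = k * np.dot(P, neg_terms)

# Draw many independent sets N of k words each and average the sampled sums.
N = rng.choice(V, size=(100_000, k), p=P)
sums = neg_terms[N].sum(axis=1)
estimate = sums.mean()
print(exact, estimate)
```

Raising counts to \(\alpha < 1\) flattens the distribution: with \(\alpha = 0.75\), a 100:1 ratio of counts becomes a probability ratio of \(100^{0.75} \approx 31.6\):1, so rare words are drawn as negatives more often than their raw frequency alone would suggest.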

Parameters

  • x (Variable or N-dimensional array) – Batch of input vectors.

  • t (Variable or N-dimensional array) – Vector of ground truth labels.

  • W (Variable or N-dimensional array) – Weight matrix.

  • sampler (FunctionType) – Sampling function. It takes a shape and returns an integer array of that shape, each element of which is a sample from the word distribution. The sample method of a WalkerAlias object built with the power distribution of word frequency is recommended.

  • sample_size (int) – Number of negative samples drawn for each example (\(k\) in the equations above).

  • reduce (str) – Reduction option. Its value must be either 'sum' or 'no'. Otherwise, ValueError is raised.

  • return_samples (bool) – If True, the sample array is also returned. The sample array is an integer array of shape \((\text{batch\_size}, \text{sample\_size} + 1)\) whose first column is fixed to the ground truth labels and whose other columns are drawn from the sampler.
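The sampler contract and the layout of the returned sample array can be illustrated with a short NumPy sketch. make_sampler and make_samples are hypothetical helpers, not Chainer functions, and make_sampler relies on numpy.random.Generator.choice instead of Walker's alias method, which the recommended WalkerAlias object uses to draw each sample in constant time after a linear-time setup.

```python
import numpy as np


def make_sampler(probs, seed=0):
    """Hypothetical NumPy stand-in for a WalkerAlias-style sampler.

    Returns a callable that takes a shape and returns an integer array of
    that shape, each entry a word id drawn from the distribution `probs`.
    """
    probs = np.asarray(probs, dtype=np.float64)
    rng = np.random.default_rng(seed)

    def sampler(shape):
        return rng.choice(len(probs), size=shape, p=probs)

    return sampler


def make_samples(t, sampler, sample_size):
    """Build a (batch_size, sample_size + 1) array laid out like the one
    returned with return_samples=True: column 0 holds the labels t."""
    t = np.asarray(t)
    negatives = sampler((len(t), sample_size))
    return np.concatenate([t[:, None], negatives], axis=1)


sampler = make_sampler([0.4, 0.3, 0.2, 0.1])   # hypothetical 4-word distribution
samples = make_samples([0, 2, 3], sampler, sample_size=4)
print(samples.shape)
print(samples)
```

Because the positive label always sits in column 0, downstream code can split the array into samples[:, 0] (the \(p\) of each example) and samples[:, 1:] (its set \(N\)).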


Returns

If return_samples is False (default), the output variable holding the loss value(s) calculated by the above equation is returned. Otherwise, a tuple of the output variable and the sample array is returned.

If reduce is 'no', the output variable holds an array of per-example losses whose shape is the same as that of t, i.e. \((\text{batch\_size},)\). If it is 'sum', the output variable holds a scalar value, the sum of those losses.
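The output shapes for the two reduce options can be checked with a plain NumPy sketch of the loss over a samples array laid out as described for return_samples. This is an illustrative re-derivation from the equations above, not Chainer's code; negative_sampling_loss is a hypothetical name, and the sketch uses the identity \(\log(1 + e^{s}) = -\log \sigma(-s)\) with the sign of the positive score flipped.

```python
import numpy as np


def negative_sampling_loss(x, W, samples, reduce='sum'):
    """Sketch of the loss over a (batch_size, sample_size + 1) samples array.

    Column 0 of samples holds the positive word p of each example and the
    remaining columns hold the sampled set N.
    """
    if reduce not in ('sum', 'no'):
        raise ValueError("reduce must be either 'sum' or 'no'")
    w = W[samples]                           # (batch_size, sample_size + 1, d)
    scores = np.einsum('bd,bkd->bk', x, w)   # x^T w for every sampled word
    scores[:, 0] *= -1                       # flip the sign of the positive score
    # log(1 + exp(s)) = -log sigma(-s), so each row sums to f(x, p).
    per_example = np.logaddexp(0.0, scores).sum(axis=1)
    return per_example.sum() if reduce == 'sum' else per_example


rng = np.random.default_rng(0)
batch_size, sample_size, V, d = 4, 3, 50, 6   # hypothetical sizes
x = rng.normal(size=(batch_size, d))
W = rng.normal(scale=0.1, size=(V, d))
t = rng.integers(0, V, size=batch_size)
negatives = rng.integers(0, V, size=(batch_size, sample_size))
samples = np.concatenate([t[:, None], negatives], axis=1)

print(negative_sampling_loss(x, W, samples, reduce='no').shape)       # per-example losses
print(np.shape(negative_sampling_loss(x, W, samples, reduce='sum')))  # scalar
```

np.logaddexp(0, s) evaluates \(\log(1 + e^{s})\) without overflowing for large scores, which is why it is preferred here over computing the sigmoid and taking its logarithm.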

Return type

Variable or tuple

See: Distributed Representations of Words and Phrases and their Compositionality

See also

NegativeSampling to manage the model parameter W.