chainer.functions.negative_sampling
chainer.functions.negative_sampling(x, t, W, sampler, sample_size, reduce='sum')
Negative sampling loss function.
In natural language processing, especially language modeling, the vocabulary can be very large, so computing the gradient with respect to the full embedding matrix at every step is expensive.
With the negative sampling trick, you only need to compute the gradient for the positive example and a few sampled negative examples.
The loss is defined as follows.
\[f(x, p) = - \log \sigma(x^\top w_p) - k E_{i \sim P(i)}[\log \sigma(- x^\top w_i)]\]
where \(\sigma(\cdot)\) is the sigmoid function, \(w_i\) is the weight vector for the word \(i\), and \(p\) is a positive example. The expectation is approximated with a set \(N\) of \(k\) examples sampled from the distribution \(P(i)\):
\[f(x, p) \approx - \log \sigma(x^\top w_p) - \sum_{n \in N} \log \sigma(-x^\top w_n)\]
Each sample in \(N\) is drawn from the word distribution \(P(w) = \frac{1}{Z} c(w)^\alpha\), where \(c(w)\) is the unigram count of the word \(w\), \(\alpha\) is a hyper-parameter, and \(Z\) is the normalization constant.
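The approximated loss can be sketched in plain NumPy for a single example. This is only an illustration of the formula, not Chainer's implementation; the names `log_sigmoid`, `negative_sampling_loss`, `w_pos`, and `w_neg` are hypothetical, and the negatives are assumed to have already been sampled.

```python
import numpy as np


def log_sigmoid(z):
    # Numerically stable log(sigmoid(z)) = -log(1 + exp(-z)).
    return -np.logaddexp(0.0, -z)


def negative_sampling_loss(x, w_pos, w_neg):
    """Approximate loss f(x, p) for one example (illustrative helper).

    x:     (d,)   input vector
    w_pos: (d,)   weight vector w_p of the positive word
    w_neg: (k, d) weight vectors w_n of the k sampled negatives
    """
    positive_term = -log_sigmoid(x @ w_pos)
    negative_term = -np.sum(log_sigmoid(-(w_neg @ x)))
    return positive_term + negative_term


# Random toy data: d = 4 dimensions, k = 3 negatives.
rng = np.random.default_rng(0)
x = rng.normal(size=4)
w_pos = rng.normal(size=4)
w_neg = rng.normal(size=(3, 4))
loss = negative_sampling_loss(x, w_pos, w_neg)
```

Both terms are negated log-probabilities, so the loss is always positive; it shrinks as \(x^\top w_p\) grows and as each \(x^\top w_n\) becomes more negative.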
Parameters:
- x (Variable or N-dimensional array) – Batch of input vectors.
- t (Variable or N-dimensional array) – Vector of ground truth labels.
- W (Variable or N-dimensional array) – Weight matrix.
- sampler (FunctionType) – Sampling function. It takes a shape and returns an integer array of that shape. Each element of this array is a sample from the word distribution. A WalkerAlias object built with the power distribution of word frequency is recommended.
- sample_size (int) – Number of samples.
- reduce (str) – Reduction option. Its value must be either 'sum' or 'no'. Otherwise, ValueError is raised.
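As a rough illustration of the sampler contract described above (a callable that takes a shape and returns an integer array of that shape), the following sketch builds one from unigram counts using the power distribution \(P(w) \propto c(w)^\alpha\). It uses NumPy's generic `Generator.choice` as a stand-in for the recommended WalkerAlias object; `make_sampler`, the toy counts, and the default `alpha=0.75` are assumptions made for this example.

```python
import numpy as np


def make_sampler(counts, alpha=0.75, seed=None):
    """Return a function mapping a shape to an integer array of word ids.

    Hypothetical helper: word i is drawn with probability c(i)**alpha / Z.
    """
    counts = np.asarray(counts, dtype=np.float64)
    weights = counts ** alpha
    probs = weights / weights.sum()  # dividing by the normalization constant Z
    rng = np.random.default_rng(seed)

    def sampler(shape):
        # Draw word ids with the given output shape.
        return rng.choice(len(probs), size=shape, p=probs)

    return sampler


# Toy vocabulary of 4 words; the last word never occurs, so it is never drawn.
sampler = make_sampler([100, 10, 1, 0], alpha=0.75, seed=0)
negatives = sampler((2, 5))  # e.g. 2 examples with 5 negatives each
```

An alias-method sampler draws each sample in constant time, which matters for large vocabularies; the generic `choice` call here is only meant to show the expected input and output.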
Returns: A variable holding the loss value(s) calculated by the above equation. If reduce is 'no', the output variable holds an array with one loss value per example in the batch (the same shape as t). If it is 'sum', the output variable holds a scalar value.
Return type: Variable
See: Distributed Representations of Words and Phrases and their Compositionality
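The effect of the reduce option can be sketched as follows, assuming the per-example losses are already available as a one-dimensional array. `apply_reduce` is a hypothetical helper written for this illustration, not part of Chainer.

```python
import numpy as np


def apply_reduce(per_example_loss, reduce='sum'):
    # Mirror the documented contract: only 'sum' and 'no' are accepted.
    if reduce not in ('sum', 'no'):
        raise ValueError("reduce must be 'sum' or 'no', got %r" % (reduce,))
    if reduce == 'sum':
        return per_example_loss.sum()  # scalar total over the batch
    return per_example_loss  # unreduced, one value per example


losses = np.array([0.5, 1.25, 2.0])
total = apply_reduce(losses, 'sum')
unreduced = apply_reduce(losses, 'no')
```

The 'no' option is useful when each example's loss needs to be weighted or inspected individually before being summed.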