chainer.functions.crf1d

chainer.functions.crf1d(cost, xs, ys, reduce='mean')[source]

Calculates the negative log-likelihood of a linear-chain CRF.

It takes a transition cost matrix, a sequence of costs, and a sequence of labels. Let \(c_{st}\) be the transition cost from label \(s\) to label \(t\), \(x_{it}\) the cost of label \(t\) at position \(i\), and \(y_i\) the expected label at position \(i\). The negative log-likelihood of a linear-chain CRF is defined as

\[L = -\left( \sum_{i=1}^{l} x_{i y_i} + \sum_{i=1}^{l-1} c_{y_i y_{i+1}} - \log(Z) \right) ,\]

where \(l\) is the length of the input sequence and \(Z\) is the normalizing constant called the partition function, \(Z = \sum_{t_1, \dots, t_l} \exp\left( \sum_{i=1}^{l} x_{i t_i} + \sum_{i=1}^{l-1} c_{t_i t_{i+1}} \right)\).
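A minimal brute-force sketch of this definition for a single sequence may help: it enumerates all \(K^l\) label sequences to compute \(Z\). This is not part of the Chainer API; crf1d_nll_bruteforce and its arguments are hypothetical names used only for illustration:

>>> import itertools
>>> import numpy as np
>>> def crf1d_nll_bruteforce(cost, x, y):
...     # cost: (K, K) transition costs, x: (l, K) label costs, y: (l,) labels.
...     l, K = x.shape
...     def score(labels):
...         # Path score: label costs plus transition costs along the path.
...         s = sum(x[i, labels[i]] for i in range(l))
...         return s + sum(cost[labels[i], labels[i + 1]]
...                        for i in range(l - 1))
...     # Partition function Z: sum over all K**l label sequences.
...     log_z = np.log(sum(np.exp(score(t))
...                        for t in itertools.product(range(K), repeat=l)))
...     return -(score(y) - log_z)
>>> x = np.random.uniform(-1, 1, (4, 3)).astype('f')
>>> y = np.array([0, 2, 1, 1], dtype='i')
>>> c = np.random.uniform(-1, 1, (3, 3)).astype('f')
>>> nll = crf1d_nll_bruteforce(c, x, y)

Up to floating-point error, this should agree with crf1d applied to the same single sequence (a mini-batch of size one at each position).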

Note

When you want to calculate the negative log-likelihood of sequences that have different lengths, sort the sequences in descending order of length and transpose them. For example, suppose you have three input sequences:

>>> import numpy as np
>>> import chainer
>>> import chainer.functions as F
>>> a1 = a2 = a3 = a4 = np.random.uniform(-1, 1, 3).astype('f')
>>> b1 = b2 = b3 = np.random.uniform(-1, 1, 3).astype('f')
>>> c1 = c2 = np.random.uniform(-1, 1, 3).astype('f')
>>> a = [a1, a2, a3, a4]
>>> b = [b1, b2, b3]
>>> c = [c1, c2]

where a1 and all the other variables are arrays of shape (K,). Transpose the sequences:

>>> x1 = np.stack([a1, b1, c1])
>>> x2 = np.stack([a2, b2, c2])
>>> x3 = np.stack([a3, b3])
>>> x4 = np.stack([a4])

and make a list of the arrays:

>>> xs = [x1, x2, x3, x4]

Make the label sequences in the same fashion, and then call the function:

>>> cost = chainer.Variable(
...     np.random.uniform(-1, 1, (3, 3)).astype('f'))
>>> ys = [np.zeros(x.shape[0:1], dtype='i') for x in xs]
>>> loss = F.crf1d(cost, xs, ys)

It calculates the mean of the negative log-likelihoods of the three sequences.

The output is a variable whose value depends on the reduce option. If it is 'no', the variable holds the elementwise loss values, one per sequence in the mini-batch. If it is 'mean', it holds the mean of the loss values.
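Continuing the example above, reduce='no' should give one loss value per sequence in the mini-batch (here \(B = 3\)), while reduce='mean' gives a scalar:

>>> F.crf1d(cost, xs, ys, reduce='no').shape
(3,)
>>> F.crf1d(cost, xs, ys, reduce='mean').shape
()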

Parameters:
  • cost (Variable) – A \(K \times K\) matrix which holds the transition cost between two labels, where \(K\) is the number of labels.
  • xs (list of Variable) – Input costs for each position. len(xs) denotes the length of the sequence, and each Variable holds a \(B \times K\) matrix, where \(B\) is the mini-batch size and \(K\) is the number of labels. Note that \(B\) is not necessarily the same for all the variables, i.e., the function accepts input sequences with different lengths.
  • ys (list of Variable) – Expected output labels. It needs to have the same length as xs. Each Variable holds a \(B\)-dimensional integer vector. When an x in xs has a different \(B\), the corresponding y has the same \(B\). In other words, ys must satisfy ys[i].shape == xs[i].shape[0:1] for all i.
  • reduce (str) – Reduction option. Its value must be either 'mean' or 'no'. Otherwise, ValueError is raised (see the example below the return type).
Returns:

A variable holding the negative log-likelihood of the input sequences: their mean if reduce is 'mean', or the elementwise loss values if reduce is 'no'.

Return type:

Variable
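
As noted for the reduce parameter above, any value other than 'mean' or 'no' is rejected; for example (the exact error message may vary between Chainer versions):

>>> F.crf1d(cost, xs, ys, reduce='sum')
Traceback (most recent call last):
    ...
ValueError: ...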