chainer.functions.crf1d¶
-
chainer.functions.
crf1d
(cost, xs, ys, reduce='mean')[source]¶ Calculates negative log-likelihood of linear-chain CRF.
It takes a transition cost matrix, a sequence of costs, and a sequence of labels. Let \(c_{st}\) be a transition cost from a label \(s\) to a label \(t\), \(x_{it}\) be a cost of a label \(t\) at position \(i\), and \(y_i\) be an expected label at position \(i\). The negative log-likelihood of linear-chain CRF is defined as
\[L = -\left( \sum_{i=1}^l x_{iy_i} + \ \sum_{i=1}^{l-1} c_{y_i y_{i+1}} - {\log(Z)} \right) ,\]where \(l\) is the length of the input sequence and \(Z\) is the normalizing constant called partition function.
Note
When you want to calculate the negative log-likelihood of sequences which have different lengths, sort the sequences in descending order of lengths and transpose the sequences. For example, you have three input sequences:
>>> a1 = a2 = a3 = a4 = np.random.uniform(-1, 1, 3).astype(np.float32) >>> b1 = b2 = b3 = np.random.uniform(-1, 1, 3).astype(np.float32) >>> c1 = c2 = np.random.uniform(-1, 1, 3).astype(np.float32)
>>> a = [a1, a2, a3, a4] >>> b = [b1, b2, b3] >>> c = [c1, c2]
where
a1
and all other variables are arrays with(K,)
shape. Make a transpose of the sequences:>>> x1 = np.stack([a1, b1, c1]) >>> x2 = np.stack([a2, b2, c2]) >>> x3 = np.stack([a3, b3]) >>> x4 = np.stack([a4])
and make a list of the arrays:
>>> xs = [x1, x2, x3, x4]
You need to make label sequences in the same fashion. And then, call the function:
>>> cost = chainer.Variable( ... np.random.uniform(-1, 1, (3, 3)).astype(np.float32)) >>> ys = [np.zeros(x.shape[0:1], dtype=np.int32) for x in xs] >>> loss = F.crf1d(cost, xs, ys)
It calculates mean of the negative log-likelihood of the three sequences.
The output is a variable whose value depends on the value of the option
reduce
. If it is'no'
, it holds the elementwise loss values. If it is'mean'
, it holds mean of the loss values.Parameters: - cost (Variable) – A \(K \times K\) matrix which holds transition cost between two labels, where \(K\) is the number of labels.
- xs (list of Variable) – Input vector for each label.
len(xs)
denotes the length of the sequence, and eachVariable
holds a \(B \times K\) matrix, where \(B\) is mini-batch size, \(K\) is the number of labels. Note that \(B\)s in all the variables are not necessary the same, i.e., it accepts the input sequences with different lengths. - ys (list of Variable) – Expected output labels. It needs to have the
same length as
xs
. EachVariable
holds a \(B\) integer vector. Whenx
inxs
has the different \(B\), correspodingy
has the same \(B\). In other words,ys
must satisfyys[i].shape == xs[i].shape[0:1]
for alli
. - reduce (str) – Reduction option. Its value must be either
'mean'
or'no'
. Otherwise,ValueError
is raised.
Returns: A variable holding the average negative log-likelihood of the input sequences.
Return type: Note
See detail in the original paper: Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data.