chainer.functions.n_step_birnn

chainer.functions.n_step_birnn(n_layers, dropout_ratio, hx, ws, bs, xs, activation='tanh')

Stacked Bi-directional RNN function for sequence inputs.

This function calculates a stacked bidirectional RNN over sequences. It takes an initial hidden state \(h_0\), an input sequence \(x\), weight matrices \(W\), and bias vectors \(b\), and computes the hidden state \(h_t\) for each time step \(t\) from the input \(x_t\).

\[\begin{split}h^{f}_t &= f(W^{f}_0 x_t + W^{f}_1 h^{f}_{t-1} + b^{f}_0 + b^{f}_1), \\ h^{b}_t &= f(W^{b}_0 x_t + W^{b}_1 h^{b}_{t+1} + b^{b}_0 + b^{b}_1), \\ h_t &= [h^{f}_t; h^{b}_t],\end{split}\]

where \(f\) is an activation function.
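As a rough illustration of the recurrence above, the following NumPy sketch runs one bidirectional layer over a fixed-length, time-major batch with \(f = \tanh\). It is not Chainer's implementation: the name birnn_layer is hypothetical, dropout is omitted, and each weight matrix is assumed to be stored in (out, in) layout, so that \(W x_t\) is computed as x_t @ W.T for a batch of row vectors.

```python
import numpy as np


def birnn_layer(xs, h0_f, h0_b, w_f, b_f, w_b, b_b, activation=np.tanh):
    """One bidirectional RNN layer over a fixed-length batch (sketch).

    xs: list of T arrays of shape (B, I), time-major.
    h0_f, h0_b: initial forward/backward hidden states, shape (B, N).
    w_f, w_b: pairs (W_0, W_1); W_0 is (N, I) and W_1 is (N, N) (assumed layout).
    b_f, b_b: pairs (b_0, b_1), each of shape (N,).
    Returns the per-step outputs [h^f_t; h^b_t] of shape (B, 2N) plus the
    final forward and backward hidden states.
    """
    T = len(xs)
    hf, hb = h0_f, h0_b
    outs_f, outs_b = [None] * T, [None] * T
    # Forward direction: t = 0, 1, ..., T-1, recurring on h^f_{t-1}.
    for t in range(T):
        hf = activation(xs[t] @ w_f[0].T + hf @ w_f[1].T + b_f[0] + b_f[1])
        outs_f[t] = hf
    # Backward direction: t = T-1, ..., 0, recurring on h^b_{t+1}.
    for t in reversed(range(T)):
        hb = activation(xs[t] @ w_b[0].T + hb @ w_b[1].T + b_b[0] + b_b[1])
        outs_b[t] = hb
    # h_t is the concatenation of both directions along the feature axis.
    ys = [np.concatenate([f, b], axis=1) for f, b in zip(outs_f, outs_b)]
    return ys, hf, hb
```

Running the two directions as separate loops keeps the forward state from seeing future inputs and the backward state from seeing past ones; only the per-step concatenation combines them.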

The weight matrices \(W\) consist of two groups, \(W^{f}\) and \(W^{b}\): \(W^{f}\) holds the weight matrices of the forward RNN and \(W^{b}\) those of the backward RNN.

\(W^{f}\) contains \(W^{f}_0\) for an input sequence and \(W^{f}_1\) for a hidden state. \(W^{b}\) contains \(W^{b}_0\) for an input sequence and \(W^{b}_1\) for a hidden state.

The bias vectors \(b\) likewise consist of two groups, \(b^{f}\) and \(b^{b}\). \(b^{f}\) contains \(b^{f}_0\) for the input and \(b^{f}_1\) for the hidden state; \(b^{b}\) contains \(b^{b}_0\) for the input and \(b^{b}_1\) for the hidden state.

Because the function accepts a whole sequence, it calculates \(h_t\) for all \(t\) in one call. Each direction of each layer requires two weight matrices and two bias vectors, so when \(S\) layers exist, you need to prepare \(2S\) lists of two weight matrices and \(2S\) lists of two bias vectors (\(4S\) matrices and \(4S\) vectors in total).

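If the number of layers n_layers is greater than \(1\), the input of the k-th layer is the hidden state h_t of the (k-1)-th layer, that is, the concatenation of its forward and backward states. Note that the inputs of all layers except the first may therefore have a different shape from the input of the first layer.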
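The parameter bookkeeping described above can be sketched as a helper that allocates placeholder arrays with the expected counts and shapes. The name make_birnn_params is hypothetical; the flat indexing ws[2 * i + di] (di = 0 forward, di = 1 backward) and the (out, in) weight layout are assumptions of this sketch, and the random values merely stand in for a real initializer.

```python
import numpy as np


def make_birnn_params(n_layers, in_size, n_units, seed=0):
    """Allocate placeholder ws/bs for a stacked bidirectional RNN (sketch).

    Assumed layout: ws[2 * i + di] = [W_0, W_1] and bs[2 * i + di] = [b_0, b_1]
    for layer i and direction di (0 = forward, 1 = backward).
    """
    rng = np.random.default_rng(seed)
    ws, bs = [], []
    for i in range(n_layers):
        # Layer 0 reads the raw input; deeper layers read [h^f_t; h^b_t].
        w_in = in_size if i == 0 else 2 * n_units
        for di in range(2):
            ws.append([
                rng.standard_normal((n_units, w_in)).astype(np.float32),     # W_0: input
                rng.standard_normal((n_units, n_units)).astype(np.float32),  # W_1: hidden
            ])
            bs.append([np.zeros(n_units, dtype=np.float32) for _ in range(2)])
    return ws, bs
```

For three layers this yields six lists of two matrices each, and only the first two lists see the raw input size; every later input matrix is sized for the 2N-dimensional concatenated state.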

Warning

The train and use_cudnn arguments have not been supported since v2. Instead, use chainer.using_config('train', train) and chainer.using_config('use_cudnn', use_cudnn), respectively. See chainer.using_config().

Parameters:

n_layers (int) – Number of layers.
dropout_ratio (float) – Dropout ratio.
hx (chainer.Variable) – Variable holding stacked hidden states. Its shape is (2S, B, N), where S is the number of layers and is equal to n_layers, B is the mini-batch size, and N is the dimension of the hidden units. Because the RNN is bidirectional, the first dimension has length 2S.
ws (list of list of chainer.Variable) – Weight matrices. ws[2 * i + di] represents the weights for the i-th layer, where di = 0 for the forward RNN and di = 1 for the backward RNN. Each ws[2 * i + di] is a list containing two matrices: ws[2 * i + di][j] corresponds to W^{f}_j in the equation if di = 0 and to W^{b}_j if di = 1. Only ws[0][0] and ws[1][0] have shape (N, I), as they are multiplied with the input variables. The input matrices of deeper layers, ws[2 * i + di][0] for i > 0, have shape (N, 2N) because they take the concatenated forward and backward states, and every ws[2 * i + di][1] has shape (N, N).
bs (list of list of chainer.Variable) – Bias vectors. bs[2 * i + di] represents the biases for the i-th layer, where di = 0 for the forward RNN and di = 1 for the backward RNN. Each bs[2 * i + di] is a list containing two vectors: bs[2 * i + di][j] corresponds to b^{f}_j in the equation if di = 0 and to b^{b}_j if di = 1. Each vector has shape (N,), where N is the dimension of the hidden units.
xs (list of chainer.Variable) – A list of Variable holding input values. Each element xs[t] holds the input value for time t. Its shape is (B_t, I), where B_t is the mini-batch size for time t and I is the size of the input units. Note that this function supports variable-length sequences. When the sequences have different lengths, sort them by length in descending order and then transpose the sorted sequences; transpose_sequence() transposes a list of Variable holding sequences. As a result, xs must satisfy xs[t].shape[0] >= xs[t + 1].shape[0].
activation (str) – Activation function name. Please select tanh or relu.

Returns:

This function returns a tuple containing two elements, hy and ys.

hy is the updated hidden states, whose shape is the same as hx.
ys is a list of Variable. Each element ys[t] holds the hidden states of the last layer corresponding to the input xs[t], that is, the concatenation of its forward and backward states. Its shape is (B_t, 2N), where B_t is the mini-batch size for time t and N is the size of the hidden units. Note that B_t is the same value as xs[t].shape[0].

Return type:

tuple
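To tie the parameters and return values together, here is a dropout-free NumPy sketch of the whole stacked computation with variable-length inputs. Both function names are hypothetical: to_time_major only mimics the sort-then-transpose preparation described for xs (it is not chainer.functions.transpose_sequence), and n_step_birnn_reference is a reference model written for illustration, not Chainer's code. The hx[2 * i + di] / ws[2 * i + di] ordering and the (out, in) weight layout are assumptions.

```python
import numpy as np


def to_time_major(seqs):
    """Convert sequences sorted by length (longest first) to time-major batches.

    seqs: list of arrays of shape (L_k, I) with L_0 >= L_1 >= ...
    Element t stacks step t of every sequence longer than t, so
    result[t].shape[0] >= result[t + 1].shape[0].
    """
    max_len = len(seqs[0])
    return [np.stack([s[t] for s in seqs if len(s) > t]) for t in range(max_len)]


def n_step_birnn_reference(hx, ws, bs, xs, activation=np.tanh):
    """Dropout-free sketch of a stacked bidirectional RNN.

    hx: (2S, B, N), assumed ordered as hx[2 * i + di].
    ws, bs: ws[2 * i + di] = [W_0, W_1], bs[2 * i + di] = [b_0, b_1].
    xs: time-major list with non-increasing batch sizes B_t.
    Returns (hy, ys) with hy.shape == hx.shape and ys[t].shape == (B_t, 2N).
    """
    n_layers = hx.shape[0] // 2
    hy = np.empty_like(hx)
    for i in range(n_layers):
        outs = []
        for di in range(2):
            w0, w1 = ws[2 * i + di]
            b0, b1 = bs[2 * i + di]
            h = hx[2 * i + di].copy()
            out = [None] * len(xs)
            steps = range(len(xs)) if di == 0 else reversed(range(len(xs)))
            for t in steps:
                bt = xs[t].shape[0]
                # Only the first B_t sequences are still active at step t.
                h[:bt] = activation(xs[t] @ w0.T + h[:bt] @ w1.T + b0 + b1)
                out[t] = h[:bt].copy()
            hy[2 * i + di] = h
            outs.append(out)
        # The next layer reads the concatenation [h^f_t; h^b_t].
        xs = [np.concatenate([f, b], axis=1) for f, b in zip(*outs)]
    return hy, xs
```

Because the batch shrinks from the end, only the first B_t rows of each state are updated at step t. A shorter sequence therefore keeps its forward state fixed after its last step, and its backward pass starts from hx at its own last step, which is why the sequences must be sorted by descending length.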