chainer.functions.n_step_bilstm
chainer.functions.n_step_bilstm(n_layers, dropout_ratio, hx, cx, ws, bs, xs)
Stacked Bi-directional Long Short-Term Memory function.
This function computes a stacked bi-directional LSTM over sequences. It takes an initial hidden state \(h_0\), an initial cell state \(c_0\), an input sequence \(x\), weight matrices \(W\), and bias vectors \(b\), and computes the hidden state \(h_t\) and the cell state \(c_t\) for each time \(t\) from the input \(x_t\).
\[\begin{split}
i^{f}_t &=& \sigma(W^{f}_0 x_t + W^{f}_4 h_{t-1} + b^{f}_0 + b^{f}_4), \\
f^{f}_t &=& \sigma(W^{f}_1 x_t + W^{f}_5 h_{t-1} + b^{f}_1 + b^{f}_5), \\
o^{f}_t &=& \sigma(W^{f}_2 x_t + W^{f}_6 h_{t-1} + b^{f}_2 + b^{f}_6), \\
a^{f}_t &=& \tanh(W^{f}_3 x_t + W^{f}_7 h_{t-1} + b^{f}_3 + b^{f}_7), \\
c^{f}_t &=& f^{f}_t \cdot c^{f}_{t-1} + i^{f}_t \cdot a^{f}_t, \\
h^{f}_t &=& o^{f}_t \cdot \tanh(c^{f}_t), \\
i^{b}_t &=& \sigma(W^{b}_0 x_t + W^{b}_4 h_{t-1} + b^{b}_0 + b^{b}_4), \\
f^{b}_t &=& \sigma(W^{b}_1 x_t + W^{b}_5 h_{t-1} + b^{b}_1 + b^{b}_5), \\
o^{b}_t &=& \sigma(W^{b}_2 x_t + W^{b}_6 h_{t-1} + b^{b}_2 + b^{b}_6), \\
a^{b}_t &=& \tanh(W^{b}_3 x_t + W^{b}_7 h_{t-1} + b^{b}_3 + b^{b}_7), \\
c^{b}_t &=& f^{b}_t \cdot c^{b}_{t-1} + i^{b}_t \cdot a^{b}_t, \\
h^{b}_t &=& o^{b}_t \cdot \tanh(c^{b}_t), \\
h_t &=& [h^{f}_t; h^{b}_t]
\end{split}\]
where \(W^{f}\) denotes the weight matrices of the forward LSTM and \(W^{b}\) denotes the weight matrices of the backward LSTM. In each direction, \(h_{t-1}\) and \(c_{t-1}\) refer to that direction's own previous state; the backward LSTM reads the sequence in reverse order.
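
The gate equations above can be sketched in plain Python for a single sequence and a single layer. This is only an illustration under stated assumptions, not Chainer's implementation: the helper names `sigmoid`, `affine`, `lstm_step` and `bilstm_layer` are hypothetical, matrices are assumed to be nested lists with one row per hidden unit, and there is no batching or dropout.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(w_x, x, w_h, h, b_x, b_h):
    """Return W_x x + W_h h + b_x + b_h for list-based matrices and vectors."""
    out = []
    for row_x, row_h, bx, bh in zip(w_x, w_h, b_x, b_h):
        total = bx + bh
        total += sum(wij * xj for wij, xj in zip(row_x, x))
        total += sum(wij * hj for wij, hj in zip(row_h, h))
        out.append(total)
    return out


def lstm_step(x, h_prev, c_prev, ws, bs):
    """One time step of one direction; ws and bs each hold eight entries.

    Entries 0-3 act on the input x, entries 4-7 on the previous hidden state,
    matching the indices W_0..W_7 and b_0..b_7 in the equations.
    """
    i = [sigmoid(v) for v in affine(ws[0], x, ws[4], h_prev, bs[0], bs[4])]
    f = [sigmoid(v) for v in affine(ws[1], x, ws[5], h_prev, bs[1], bs[5])]
    o = [sigmoid(v) for v in affine(ws[2], x, ws[6], h_prev, bs[2], bs[6])]
    a = [math.tanh(v) for v in affine(ws[3], x, ws[7], h_prev, bs[3], bs[7])]
    c = [ft * cp + it * at for ft, cp, it, at in zip(f, c_prev, i, a)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c


def bilstm_layer(xs, h0_f, c0_f, h0_b, c0_b, ws_f, bs_f, ws_b, bs_b):
    """Run both directions over one sequence and concatenate per time step."""
    h, c = h0_f, c0_f
    forward = []
    for x in xs:
        h, c = lstm_step(x, h, c, ws_f, bs_f)
        forward.append(h)

    # Assumption: the backward direction reads the sequence in reverse order.
    h, c = h0_b, c0_b
    backward = []
    for x in reversed(xs):
        h, c = lstm_step(x, h, c, ws_b, bs_b)
        backward.append(h)
    backward.reverse()  # realign backward states with the original time index

    # h_t = [h^f_t; h^b_t]
    return [hf + hb for hf, hb in zip(forward, backward)]
```

Each element of the returned list is the concatenation of the forward and backward hidden states for one time step, so its length is twice the number of hidden units in this sketch.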
As the function accepts a sequence, it calculates \(h_t\) for all \(t\) with one call. Eight weight matrices and eight bias vectors are required for each layer. So, when \(S\) layers exist, you need to prepare \(8S\) weight matrices and \(8S\) bias vectors.
If the number of layers n_layers is greater than \(1\), the input of the k-th layer is the hidden state h_t of the (k-1)-th layer. Note that the input variables of all layers except the first may have a different shape from those of the first layer.
Warning
The train and use_cudnn arguments are no longer supported since v2. Instead, use chainer.using_config('train', train) and chainer.using_config('use_cudnn', use_cudnn) respectively. See chainer.using_config().
Parameters:
- n_layers (int) – Number of layers.
- dropout_ratio (float) – Dropout ratio.
- hx (chainer.Variable) – Variable holding stacked hidden states. Its shape is (S, B, N) where S is the number of layers and is equal to n_layers, B is the mini-batch size, and N is the dimension of the hidden units.
- cx (chainer.Variable) – Variable holding stacked cell states. It has the same shape as hx.
- ws (list of list of chainer.Variable) – Weight matrices. ws[i] represents the weights for the i-th layer. Each ws[i] is a list containing eight matrices. ws[i][j] corresponds to W_j in the equation. Only ws[0][j] where 0 <= j < 4 has shape (I, N), as these are multiplied with the input variables. All other matrices have shape (N, N).
- bs (list of list of chainer.Variable) – Bias vectors. bs[i] represents the biases for the i-th layer. Each bs[i] is a list containing eight vectors. bs[i][j] corresponds to b_j in the equation. The shape of each vector is (N,) where N is the dimension of the hidden units.
- xs (list of chainer.Variable) – A list of Variable holding input values. Each element xs[t] holds the input value for time t. Its shape is (B_t, I), where B_t is the mini-batch size for time t, and I is the size of the input units. Note that this function supports variable-length sequences. When sequences have different lengths, sort them in descending order by length and transpose the sorted sequences. transpose_sequence() transposes a list of Variable objects holding sequences. So xs needs to satisfy xs[t].shape[0] >= xs[t + 1].shape[0].
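
The sort-then-transpose preparation described for xs can be illustrated with plain Python lists. This is a minimal sketch, not the library routine: `sort_and_transpose` is a hypothetical stand-in for sorting followed by transpose_sequence(), and the string items stand in for per-step input vectors.

```python
def sort_and_transpose(sequences):
    """Sort sequences by descending length, then regroup items by time step."""
    ordered = sorted(sequences, key=len, reverse=True)
    max_len = len(ordered[0]) if ordered else 0
    xs = []
    for t in range(max_len):
        # Only sequences that are longer than t contribute to time step t,
        # so the per-step batch size can only shrink as t grows.
        xs.append([seq[t] for seq in ordered if len(seq) > t])
    return xs


# Three sequences of lengths 2, 3 and 1 (placeholder items).
seqs = [["a1", "a2"], ["b1", "b2", "b3"], ["c1"]]
xs = sort_and_transpose(seqs)
batch_sizes = [len(step) for step in xs]
```

Here `xs[t]` collects the t-th item of every sequence that is long enough, and `batch_sizes` plays the role of B_t, which never increases from one time step to the next.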
Returns: This function returns a tuple containing three elements, hy, cy and ys.
- hy is the updated hidden states, whose shape is the same as hx.
- cy is the updated cell states, whose shape is the same as cx.
- ys is a list of Variable. Each element ys[t] holds the hidden states of the last layer corresponding to the input xs[t]. Its shape is (B_t, N) where B_t is the mini-batch size for time t, and N is the size of the hidden units. Note that B_t is the same as the mini-batch size of xs[t].
Return type: