Standard Link implementations¶
Chainer provides many Link
implementations in the
chainer.links
package.
Note
Some of the links are originally defined in the chainer.functions
namespace. They are still left in the namespace for backward compatibility,
though it is strongly recommended to use them via the chainer.links
package.
Learnable connections¶
Bias¶

class
chainer.links.
Bias
(axis=1, shape=None)[source]¶ Broadcasted elementwise summation with learnable parameters.
Computes an elementwise summation as the
bias()
function does, except that its second input is a learnable bias parameter \(b\) held by the link.Parameters:  axis (int) – The first axis of the first input of
bias()
function along which its second input is applied.  shape (tuple of ints) – Shape of the learnable bias parameter. If
None
, this link does not have learnable parameters, so an explicit bias needs to be given to its __call__
method’s second input.
See also
See
bias()
for details.Variables: b (Variable) – Bias parameter if shape
is given. Otherwise, no attributes. axis (int) – The first axis of the first input of the bias() function along which its second input is applied.
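A minimal NumPy sketch of the computation this link performs (illustrative only, not Chainer's implementation): the learnable parameter b is reshaped so that it broadcasts against the input starting at axis.

```python
import numpy as np

def bias(x, b, axis=1):
    # Reshape b so it broadcasts against x starting at `axis`, then add.
    # b's shape must match x.shape[axis:axis + b.ndim].
    shape = [1] * x.ndim
    shape[axis:axis + b.ndim] = b.shape
    return x + b.reshape(shape)

x = np.zeros((2, 3, 4, 4), dtype=np.float32)
b = np.array([1.0, 2.0, 3.0], dtype=np.float32)  # learnable parameter, shape (3,)
y = bias(x, b, axis=1)
```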
Bilinear¶

class
chainer.links.
Bilinear
(left_size, right_size, out_size, nobias=False, initialW=None, initial_bias=None)[source]¶ Bilinear layer that performs tensor multiplication.
Bilinear is a primitive link that wraps the
bilinear()
function. It holds parameters W, V1, V2, and b
corresponding to the arguments of bilinear()
.Parameters:  left_size (int) – Dimension of input vector \(e^1\) (\(J\))
 right_size (int) – Dimension of input vector \(e^2\) (\(K\))
 out_size (int) – Dimension of output vector \(y\) (\(L\))
 nobias (bool) – If
True
, the parameters V1, V2, and b
are omitted.  initialW (3D array) – Initial value of \(W\).
Shape of this argument must be
(left_size, right_size, out_size)
. If None
, the default initializer is used. May also be a callable that takes numpy.ndarray or cupy.ndarray
and edits its value.  initial_bias (tuple) – Initial values of \(V^1\), \(V^2\) and
\(b\). The length of this argument must be 3.
Each element of this tuple must have the shapes of
(left_size, out_size)
, (right_size, out_size), and (out_size,), respectively. If None
, \(V^1\) and \(V^2\) are initialized by the default initializer and \(b\) is set to \(0\). May also be a tuple of callables that take numpy.ndarray or cupy.ndarray
and edit their values.
See also
See
chainer.functions.bilinear()
for details.Variables:
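The tensor product computed by the bilinear layer can be sketched in NumPy with einsum (a hedged illustration, not Chainer's implementation): \(y_l = \sum_{jk} e^1_j e^2_k W_{jkl} + \sum_j e^1_j V^1_{jl} + \sum_k e^2_k V^2_{kl} + b_l\).

```python
import numpy as np

def bilinear(e1, e2, W, V1, V2, b):
    # y_l = sum_{jk} e1_j e2_k W_{jkl} + sum_j e1_j V1_{jl}
    #       + sum_k e2_k V2_{kl} + b_l   (batched over the first axis)
    y = np.einsum('ij,ik,jkl->il', e1, e2, W)
    return y + e1 @ V1 + e2 @ V2 + b

J, K, L_out = 3, 4, 5  # left_size, right_size, out_size
rng = np.random.RandomState(0)
e1 = rng.randn(2, J).astype(np.float32)
e2 = rng.randn(2, K).astype(np.float32)
W = rng.randn(J, K, L_out).astype(np.float32)
V1 = rng.randn(J, L_out).astype(np.float32)
V2 = rng.randn(K, L_out).astype(np.float32)
b = np.zeros(L_out, dtype=np.float32)
y = bilinear(e1, e2, W, V1, V2, b)
```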
Convolution2D¶

class
chainer.links.
Convolution2D
(self, in_channels, out_channels, ksize=None, stride=1, pad=0, nobias=False, initialW=None, initial_bias=None)[source]¶ Two-dimensional convolutional layer.
This link wraps the
convolution_2d()
function and holds the filter weight and bias vector as parameters.The output of this function can be non-deterministic when it uses cuDNN. If
chainer.configuration.config.cudnn_deterministic
is True
and the cuDNN version is >= v3, it forces cuDNN to use a deterministic algorithm.Warning
deterministic
argument is not supported anymore since v2. Instead, use chainer.using_config('cudnn_deterministic', value)
(value is either True or False
). See chainer.using_config()
.Parameters:  in_channels (int or None) – Number of channels of input arrays.
If
None
, parameter initialization will be deferred until the first forward data pass at which time the size will be determined.  out_channels (int) – Number of channels of output arrays.
 ksize (int or pair of ints) – Size of filters (a.k.a. kernels).
ksize=k and ksize=(k, k)
are equivalent.  stride (int or pair of ints) – Stride of filter applications.
stride=s and stride=(s, s)
are equivalent.  pad (int or pair of ints) – Spatial padding width for input arrays.
pad=p and pad=(p, p)
are equivalent.  nobias (bool) – If
True
, then this link does not use the bias term.  initialW (4D array) – Initial weight value. If
None
, the default initializer is used. May also be a callable that takes numpy.ndarray or cupy.ndarray
and edits its value.  initial_bias (1D array) – Initial bias value. If
None
, the bias is set to 0. May also be a callable that takes numpy.ndarray or cupy.ndarray
and edits its value.
See also
See
chainer.functions.convolution_2d()
for the definition of two-dimensional convolution.Variables: Example
There are several ways to make a Convolution2D link.
Let an input vector
x
be:>>> x = np.arange(1 * 3 * 10 * 10, dtype='f').reshape(1, 3, 10, 10)
Give the first three arguments explicitly:
>>> l = L.Convolution2D(3, 7, 5) >>> y = l(x) >>> y.shape (1, 7, 6, 6)
Omit
in_channels
or fill it with None:
The below two cases are the same.
>>> l = L.Convolution2D(7, 5) >>> y = l(x) >>> y.shape (1, 7, 6, 6)
>>> l = L.Convolution2D(None, 7, 5) >>> y = l(x) >>> y.shape (1, 7, 6, 6)
When you omit the first argument, you need to specify the other subsequent arguments from
stride
as keyword arguments. So the below two cases are the same.>>> l = L.Convolution2D(7, 5, stride=1, pad=0) >>> y = l(x) >>> y.shape (1, 7, 6, 6)
>>> l = L.Convolution2D(None, 7, 5, 1, 0) >>> y = l(x) >>> y.shape (1, 7, 6, 6)
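The output shapes in the examples above follow the standard convolution size formula. A small helper (illustrative, not part of Chainer's public API) makes the arithmetic explicit:

```python
def conv_outsize(size, k, s, p):
    # Output size of a convolution along one spatial dimension:
    # floor((size + 2*p - k) / s) + 1
    return (size + 2 * p - k) // s + 1

# Matches the doctests above: a 10x10 input with ksize=5, stride=1, pad=0
# gives a 6x6 output.
h_out = conv_outsize(10, 5, 1, 0)
```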
ConvolutionND¶

class
chainer.links.
ConvolutionND
(ndim, in_channels, out_channels, ksize, stride=1, pad=0, nobias=False, initialW=None, initial_bias=None, cover_all=False)[source]¶ N-dimensional convolution layer.
This link wraps the
convolution_nd()
function and holds the filter weight and bias vector as parameters.Parameters:  ndim (int) – Number of spatial dimensions.
 in_channels (int) – Number of channels of input arrays.
 out_channels (int) – Number of channels of output arrays.
 ksize (int or tuple of ints) – Size of filters (a.k.a. kernels).
ksize=k and ksize=(k, k, ..., k)
are equivalent.  stride (int or tuple of ints) – Stride of filter application.
stride=s and stride=(s, s, ..., s)
are equivalent.  pad (int or tuple of ints) – Spatial padding width for input arrays.
pad=p and pad=(p, p, ..., p)
are equivalent.  nobias (bool) – If
True
, then this function does not use the bias.  initialW (array) – Initial weight array. If
None
, the default initializer is used. May be a callable that takes numpy.ndarray or cupy.ndarray
and edits its value.  initial_bias (array) – Initial bias vector. If
None
, the bias is set to zero. May be a callable that takes numpy.ndarray or cupy.ndarray
and edits its value.  cover_all (bool) – If
True
, all spatial locations are convoluted into some output pixels. It may make the output size larger. cover_all
needs to be False
if you want to use cuDNN.
See also
See
convolution_nd()
for the definition of N-dimensional convolution. See convolution_2d()
for the definition of two-dimensional convolution.Variables:
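The effect of cover_all on the output size can be sketched per spatial axis (a hedged illustration, assuming the usual floor/ceil convention: cover_all rounds up instead of down so that every input location is covered):

```python
def conv_outsize(size, k, s, p, cover_all=False):
    # One spatial dimension. With cover_all, ceiling division enlarges the
    # output so that all input locations contribute to some output pixel.
    if cover_all:
        return (size + 2 * p - k + s - 1) // s + 1
    return (size + 2 * p - k) // s + 1

# Applied independently to each of the ndim spatial axes:
out_shape = tuple(conv_outsize(s, 3, 2, 0) for s in (10, 10, 10))
```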
Deconvolution2D¶

class
chainer.links.
Deconvolution2D
(self, in_channels, out_channels, ksize=None, stride=1, pad=0, nobias=False, outsize=None, initialW=None, initial_bias=None)[source]¶ Two-dimensional deconvolution function.
This link wraps the
deconvolution_2d()
function and holds the filter weight and bias vector as parameters.Warning
deterministic
argument is not supported anymore since v2. Instead, use chainer.using_config('cudnn_deterministic', value)
(value is either True or False
). See chainer.using_config()
.Parameters:  in_channels (int or None) – Number of channels of input arrays.
If
None
, parameter initialization will be deferred until the first forward data pass at which time the size will be determined.  out_channels (int) – Number of channels of output arrays.
 ksize (int or pair of ints) – Size of filters (a.k.a. kernels).
ksize=k and ksize=(k, k)
are equivalent.  stride (int or pair of ints) – Stride of filter applications.
stride=s and stride=(s, s)
are equivalent.  pad (int or pair of ints) – Spatial padding width for input arrays.
pad=p and pad=(p, p)
are equivalent.  nobias (bool) – If
True
, then this function does not use the bias term.  outsize (tuple) – Expected output size of deconvolutional operation.
It should be a pair of height and width \((out_H, out_W)\).
Default value is None
and the outsize is estimated from the input size, stride and pad.  initialW (4D array) – Initial weight value. If
None
, the default initializer is used. May also be a callable that takes numpy.ndarray or cupy.ndarray
and edits its value.  initial_bias (1D array) – Initial bias value. If
None
, the bias vector is set to zero. May also be a callable that takes numpy.ndarray or cupy.ndarray
and edits its value.
The filter weight has four dimensions \((c_I, c_O, k_H, k_W)\) which indicate the number of input channels, output channels, height and width of the kernels, respectively. The filter weight is initialized with i.i.d. Gaussian random samples, each of which has zero mean and deviation \(\sqrt{1/(c_I k_H k_W)}\) by default.
The bias vector is of size \(c_O\). Its elements are initialized by the
bias
argument. If the nobias
argument is set to True, then this function does not hold the bias parameter.The output of this function can be non-deterministic when it uses cuDNN. If
chainer.configuration.config.cudnn_deterministic
is True
and the cuDNN version is >= v3, it forces cuDNN to use a deterministic algorithm.See also
See
chainer.functions.deconvolution_2d()
for the definition of two-dimensional deconvolution.See also
See
chainer.links.Convolution2D()
for the examples of ways to give arguments to this link.Example
There are several ways to make a Deconvolution2D link.
Let an input vector
x
be:>>> x = np.arange(1 * 3 * 10 * 10, dtype='f').reshape(1, 3, 10, 10)
Give the first three arguments explicitly:
In this case, all the other arguments are set to the default values.
>>> l = L.Deconvolution2D(3, 7, 4) >>> y = l(x) >>> y.shape (1, 7, 13, 13)
Omit
in_channels
or fill it with None:
The below two cases are the same.
>>> l = L.Deconvolution2D(7, 4) >>> y = l(x) >>> y.shape (1, 7, 13, 13)
>>> l = L.Deconvolution2D(None, 7, 4) >>> y = l(x) >>> y.shape (1, 7, 13, 13)
When you omit the first argument, you need to specify the other subsequent arguments from
stride
as keyword arguments. So the below two cases are the same.>>> l = L.Deconvolution2D(None, 7, 4, 2, 1) >>> y = l(x) >>> y.shape (1, 7, 20, 20)
>>> l = L.Deconvolution2D(7, 4, stride=2, pad=1) >>> y = l(x) >>> y.shape (1, 7, 20, 20)
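The default outsize estimated from input size, stride and pad follows the standard deconvolution size formula; a small helper (illustrative, not part of Chainer's public API) reproduces the shapes in the doctests above:

```python
def deconv_outsize(size, k, s, p):
    # Default output size of a deconvolution along one spatial dimension:
    # s * (size - 1) + k - 2 * p
    return s * (size - 1) + k - 2 * p

# Matches the doctests above: a 10x10 input with ksize=4 gives 13x13 with
# stride=1, pad=0, and 20x20 with stride=2, pad=1.
h1 = deconv_outsize(10, 4, 1, 0)
h2 = deconv_outsize(10, 4, 2, 1)
```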
DeconvolutionND¶

class
chainer.links.
DeconvolutionND
(ndim, in_channels, out_channels, ksize, stride=1, pad=0, nobias=False, outsize=None, initialW=None, initial_bias=None)[source]¶ N-dimensional deconvolution function.
This link wraps the
deconvolution_nd()
function and holds the filter weight and bias vector as its parameters.Parameters:  ndim (int) – Number of spatial dimensions.
 in_channels (int) – Number of channels of input arrays.
 out_channels (int) – Number of channels of output arrays.
 ksize (int or tuple of ints) – Size of filters (a.k.a. kernels).
ksize=k and ksize=(k, k, ..., k)
are equivalent.  stride (int or tuple of ints) – Stride of filter application.
stride=s and stride=(s, s, ..., s)
are equivalent.  pad (int or tuple of ints) – Spatial padding width for input arrays.
pad=p and pad=(p, p, ..., p)
are equivalent.  nobias (bool) – If
True
, then this function does not use the bias.  outsize (tuple of ints) – Expected output size of deconvolutional
operation. It should be a tuple of ints that represents the output
size of each dimension. Default value is None
and the outsize is estimated from the input size, stride and pad.  initialW (array) – Initial weight array. If
None
, the default initializer is used. May be an initializer instance or another value that the init_weight()
function can take.  initial_bias (array) – Initial bias array. If
None
, the bias vector is set to zero. May be an initializer instance or another value that the init_weight()
function can take.
See also
Variables:
DepthwiseConvolution2D¶

class
chainer.links.
DepthwiseConvolution2D
(in_channels, channel_multiplier, ksize, stride=1, pad=0, nobias=False, initialW=None, initial_bias=None)[source]¶ Two-dimensional depthwise convolutional layer.
This link wraps the
depthwise_convolution_2d()
function and holds the filter weight and bias vector as parameters.Parameters:  in_channels (int) – Number of channels of input arrays. If
None
, parameter initialization will be deferred until the first forward data pass at which time the size will be determined.  channel_multiplier (int) – Channel multiplier number. Number of output
arrays equals
in_channels * channel_multiplier
.  ksize (int or pair of ints) – Size of filters (a.k.a. kernels).
ksize=k and ksize=(k, k)
are equivalent.  stride (int or pair of ints) – Stride of filter applications.
stride=s and stride=(s, s)
are equivalent.  pad (int or pair of ints) – Spatial padding width for input arrays.
pad=p and pad=(p, p)
are equivalent.  nobias (bool) – If
True
, then this link does not use the bias term.  initialW (4D array) – Initial weight value. If
None
, the default initializer is used. May also be a callable that takes numpy.ndarray or cupy.ndarray
and edits its value.  initial_bias (1D array) – Initial bias value. If
None
, the bias is set to 0. May also be a callable that takes numpy.ndarray or cupy.ndarray
and edits its value.
See also
Variables:
DilatedConvolution2D¶

class
chainer.links.
DilatedConvolution2D
(in_channels, out_channels, ksize=None, stride=1, pad=0, dilate=1, nobias=False, initialW=None, initial_bias=None)[source]¶ Two-dimensional dilated convolutional layer.
This link wraps the
dilated_convolution_2d()
function and holds the filter weight and bias vector as parameters.Parameters:  in_channels (int or None) – Number of channels of input arrays.
If
None
, parameter initialization will be deferred until the first forward data pass at which time the size will be determined.  out_channels (int) – Number of channels of output arrays.
 ksize (int or pair of ints) – Size of filters (a.k.a. kernels).
ksize=k and ksize=(k, k)
are equivalent.  stride (int or pair of ints) – Stride of filter applications.
stride=s and stride=(s, s)
are equivalent.  pad (int or pair of ints) – Spatial padding width for input arrays.
pad=p and pad=(p, p)
are equivalent.  dilate (int or pair of ints) – Dilation factor of filter applications.
dilate=d and dilate=(d, d)
are equivalent.  nobias (bool) – If
True
, then this link does not use the bias term.  initialW (4D array) – Initial weight value. If
None
, the default initializer is used. May also be a callable that takes numpy.ndarray or cupy.ndarray
and edits its value.  initial_bias (1D array) – Initial bias value. If
None
, the default initializer is used. May also be a callable that takes numpy.ndarray or cupy.ndarray
and edits its value.
See also
See
chainer.functions.dilated_convolution_2d()
for the definition of two-dimensional dilated convolution.Variables: Example
There are several ways to make a DilatedConvolution2D link.
Let an input vector
x
be:>>> x = np.arange(1 * 3 * 10 * 10, dtype='f').reshape(1, 3, 10, 10)
Give the first three arguments explicitly:
>>> l = L.DilatedConvolution2D(3, 7, 5) >>> y = l(x) >>> y.shape (1, 7, 6, 6)
Omit
in_channels
or fill it with None:
The below two cases are the same.
>>> l = L.DilatedConvolution2D(7, 5) >>> y = l(x) >>> y.shape (1, 7, 6, 6)
>>> l = L.DilatedConvolution2D(None, 7, 5) >>> y = l(x) >>> y.shape (1, 7, 6, 6)
When you omit the first argument, you need to specify the other subsequent arguments from
stride
as keyword arguments. So the below two cases are the same.>>> l = L.DilatedConvolution2D(None, 7, 5, 1, 0, 2) >>> y = l(x) >>> y.shape (1, 7, 2, 2)
>>> l = L.DilatedConvolution2D(7, 5, stride=1, pad=0, dilate=2) >>> y = l(x) >>> y.shape (1, 7, 2, 2)
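Dilation enlarges the effective kernel, which explains the small 2x2 output in the doctest above. A helper (illustrative, not part of Chainer's public API) shows the arithmetic:

```python
def dilated_conv_outsize(size, k, s, p, d):
    # With dilation d, the effective kernel size grows to d * (k - 1) + 1;
    # the rest is the ordinary convolution size formula.
    kd = d * (k - 1) + 1
    return (size + 2 * p - kd) // s + 1

# Matches the doctest above: 10x10 input, ksize=5, stride=1, pad=0,
# dilate=2 gives a 2x2 output.
h_out = dilated_conv_outsize(10, 5, 1, 0, 2)
```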
EmbedID¶

class
chainer.links.
EmbedID
(in_size, out_size, initialW=None, ignore_label=None)[source]¶ Efficient linear layer for one-hot input.
This is a link that wraps the
embed_id()
function. This link holds the ID (word) embedding matrix W
as a parameter.Parameters:  in_size (int) – Number of different identifiers (a.k.a. vocabulary size).
 out_size (int) – Size of embedding vector.
 initialW (2D array) – Initial weight value. If
None
, then the matrix is initialized from the standard normal distribution. May also be a callable that takes numpy.ndarray or cupy.ndarray
and edits its value.  ignore_label (int or None) – If
ignore_label
is an int value, the i-th column of the return value is filled with 0.
See also
Variables: W (Variable) – Embedding parameter matrix.
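The lookup itself is just row indexing into W, with ignored positions zeroed out. A minimal NumPy sketch (illustrative only, not Chainer's implementation):

```python
import numpy as np

def embed_id(ids, W, ignore_label=None):
    # Embedding lookup: the i-th output row is W[ids[i]].
    if ignore_label is None:
        return W[ids]
    safe = np.where(ids == ignore_label, 0, ids)  # avoid indexing with the sentinel
    out = W[safe]
    out[ids == ignore_label] = 0  # rows for ignored ids are filled with 0
    return out

W = np.arange(12, dtype=np.float32).reshape(4, 3)  # vocabulary of 4 ids, 3-dim vectors
out = embed_id(np.array([1, -1, 3]), W, ignore_label=-1)
```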
GRU¶

class
chainer.links.
GRU
(in_size, out_size, init=None, inner_init=None, bias_init=0)[source]¶ Stateful Gated Recurrent Unit function (GRU)
This is an alias of chainer.links.StatefulGRU. Its documented API is identical.
Warning
In Chainer v1,
GRU
was stateless, as opposed to the current implementation. To align with the naming convention of the LSTM links, we have changed the naming convention from Chainer v2 so that the shorthand name points to the stateful link. You can use StatelessGRU
for the stateless version, whose implementation is identical to chainer.links.GRU
in v1.See issue #2537 (https://github.com/pfnet/chainer/issues/2537) for details.
See also
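One step of the stateful GRU update can be sketched in NumPy (a hedged illustration with biases omitted; the weight names are ours, not Chainer's attribute names):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h, Wr, Ur, Wz, Uz, W, U):
    # One GRU step (biases omitted for brevity):
    r = sigmoid(x @ Wr + h @ Ur)          # reset gate
    z = sigmoid(x @ Wz + h @ Uz)          # update gate
    h_bar = np.tanh(x @ W + (r * h) @ U)  # candidate state
    return (1.0 - z) * h + z * h_bar      # new hidden state

in_size, out_size = 4, 3
rng = np.random.RandomState(0)
x = rng.randn(2, in_size).astype(np.float32)
h = np.zeros((2, out_size), dtype=np.float32)
Wr, Wz, W = (rng.randn(in_size, out_size).astype(np.float32) for _ in range(3))
Ur, Uz, U = (rng.randn(out_size, out_size).astype(np.float32) for _ in range(3))
h_new = gru_step(x, h, Wr, Ur, Wz, Uz, W, U)
```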
Highway¶

class
chainer.links.
Highway
(in_out_size, nobias=False, activate=<function relu>, init_Wh=None, init_Wt=None, init_bh=None, init_bt=-1)[source]¶ Highway module.
In a highway network, two gates are added to the ordinary non-linear transformation (\(H(x) = activate(W_h x + b_h)\)). One gate is the transform gate \(T(x) = \sigma(W_t x + b_t)\), and the other is the carry gate \(C(x)\). For simplicity, the authors define \(C = 1 - T\). The Highway module returns \(y\) defined as
\[y = activate(W_h x + b_h) \odot \sigma(W_t x + b_t) + x \odot (1 - \sigma(W_t x + b_t))\]The output array has the same spatial size as the input. In order to satisfy this, \(W_h\) and \(W_t\) must be square matrices.
Parameters:  in_out_size (int) – Dimension of input and output vectors.
 nobias (bool) – If
True
, then this function does not use the bias.  activate – Activation function of plain array. \(tanh\) is also available.
 init_Wh (2D array) – Initial weight value of plain array.
If
None
, the default initializer is used. May also be a callable that takesnumpy.ndarray
orcupy.ndarray
and edits its value.  init_bh (1D array) – Initial bias value of plain array. If
None
, then \(b_h\) is initialized to zero. May also be a callable that takesnumpy.ndarray
orcupy.ndarray
and edits its value.  init_Wt (2D array) – Initial weight value of transform array.
If
None
, the default initializer is used. May also be a callable that takesnumpy.ndarray
orcupy.ndarray
and edits its value.  init_bt (1D array) – Initial bias value of transform array.
Default value is -1 vector.
May also be a callable that takes
numpy.ndarray
orcupy.ndarray
and edits its value. A negative value is recommended by the authors of the paper (e.g. -1, -3, ...).
 See:
 Highway Networks.
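The highway formula above can be sketched directly in NumPy (illustrative only, not Chainer's implementation). A strongly negative transform bias closes the gate, so the module passes the input through almost unchanged, which is why negative init_bt is recommended:

```python
import numpy as np

def highway(x, Wh, bh, Wt, bt, activate=np.tanh):
    # y = H(x) * T(x) + x * (1 - T(x)), with H(x) = activate(Wh x + bh)
    # and transform gate T(x) = sigmoid(Wt x + bt).
    t = 1.0 / (1.0 + np.exp(-(x @ Wt.T + bt)))
    return activate(x @ Wh.T + bh) * t + x * (1.0 - t)

rng = np.random.RandomState(0)
x = rng.randn(2, 5).astype(np.float32)
Wh = rng.randn(5, 5).astype(np.float32)
bh = np.zeros(5, dtype=np.float32)
Wt = np.zeros((5, 5), dtype=np.float32)
bt = np.full(5, -40.0, dtype=np.float32)  # gate ~0: the carry path dominates
y = highway(x, Wh, bh, Wt, bt)
```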
Inception¶

class
chainer.links.
Inception
(in_channels, out1, proj3, out3, proj5, out5, proj_pool, conv_init=None, bias_init=None)[source]¶ Inception module of GoogLeNet.
It applies four different functions to the input array and concatenates their outputs along the channel dimension. Three of them are 2D convolutions of sizes 1x1, 3x3 and 5x5. Convolution paths of 3x3 and 5x5 sizes have 1x1 convolutions (called projections) ahead of them. The other path consists of 1x1 convolution (projection) and 3x3 max pooling.
The output array has the same spatial size as the input. In order to satisfy this, Inception module uses appropriate padding for each convolution and pooling.
See: Going Deeper with Convolutions.
Parameters:  in_channels (int or None) – Number of channels of input arrays.
 out1 (int) – Output size of 1x1 convolution path.
 proj3 (int) – Projection size of 3x3 convolution path.
 out3 (int) – Output size of 3x3 convolution path.
 proj5 (int) – Projection size of 5x5 convolution path.
 out5 (int) – Output size of 5x5 convolution path.
 proj_pool (int) – Projection size of max pooling path.
 conv_init – A callable that takes
numpy.ndarray
orcupy.ndarray
and edits its value. It is used for initialization of the convolution matrix weights. May be None
to use default initialization.  bias_init – A callable that takes
numpy.ndarray
orcupy.ndarray
and edits its value. It is used for initialization of the convolution bias weights. May be None
to use default initialization.

__call__
(x)[source]¶ Computes the output of the Inception module.
Parameters: x (Variable) – Input variable. Returns: Output variable. Its array has the same spatial size and the same minibatch size as the input array. The channel dimension has size out1 + out3 + out5 + proj_pool
.Return type: Variable
InceptionBN¶

class
chainer.links.
InceptionBN
(in_channels, out1, proj3, out3, proj33, out33, pooltype, proj_pool=None, stride=1, conv_init=None, dtype=<type 'numpy.float32'>)[source]¶ Inception module of the new GoogLeNet with BatchNormalization.
This chain acts like
Inception
, while InceptionBN uses theBatchNormalization
on top of each convolution, the 5x5 convolution path is replaced by two consecutive 3x3 convolution applications, and the pooling method is configurable.See: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
Parameters:  in_channels (int or None) – Number of channels of input arrays.
 out1 (int) – Output size of the 1x1 convolution path.
 proj3 (int) – Projection size of the single 3x3 convolution path.
 out3 (int) – Output size of the single 3x3 convolution path.
 proj33 (int) – Projection size of the double 3x3 convolutions path.
 out33 (int) – Output size of the double 3x3 convolutions path.
 pooltype (str) – Pooling type. It must be either
'max'
or'avg'
.  proj_pool (int or None) – Projection size in the pooling path. If
None
, no projection is done.  stride (int) – Stride parameter of the last convolution of each path.
 conv_init – A callable that takes
numpy.ndarray
orcupy.ndarray
and edits its value. It is used for initialization of the convolution matrix weights. May be None
to use default initialization.  dtype (numpy.dtype) – Type to use in
~batch_normalization.BatchNormalization
.
See also
Linear¶

class
chainer.links.
Linear
(in_size, out_size=None, nobias=False, initialW=None, initial_bias=None)[source]¶ Linear layer (a.k.a. fully-connected layer).
This is a link that wraps the
linear()
function, and holds a weight matrix W
and optionally a bias vector b
as parameters.The weight matrix W
is initialized with i.i.d. Gaussian samples, each of which has zero mean and deviation \(\sqrt{1/\text{in_size}}\). The bias vector b
is of size out_size
. Each element is initialized with the bias
value. If the nobias
argument is set to True, then this link does not hold a bias vector.Parameters:  in_size (int or None) – Dimension of input vectors. If
None
, parameter initialization will be deferred until the first forward data pass at which time the size will be determined.  out_size (int) – Dimension of output vectors.
 nobias (bool) – If
True
, then this function does not use the bias.  initialW (2D array) – Initial weight value. If
None
, then the default initializer is used. May also be a callable that takes numpy.ndarray or cupy.ndarray
and edits its value.  initial_bias (1D array) – Initial bias value. If
None
, the bias vector is initialized to zero. May also be a callable that takes numpy.ndarray or cupy.ndarray
and edits its value.
See also
Variables: Example
There are several ways to make a Linear link.
Define an input vector
x
as:>>> x = np.array([[0, 1, 2, 3, 4]], 'f')
Give the first two arguments explicitly:
Those numbers are considered as the input size and the output size.
>>> l = L.Linear(5, 10) >>> y = l(x) >>> y.shape (1, 10)
 Omit
in_size
(give the output size only as the first argument) or fill it with
None
:In this case, the size of the second axis of
x
is used as the input size. So the below two cases are the same.>>> l = L.Linear(10) >>> y = l(x) >>> y.shape (1, 10)
>>> l = L.Linear(None, 10) >>> y = l(x) >>> y.shape (1, 10)
When you omit the first argument, you need to specify the other subsequent arguments from
nobias
as keyword arguments. So the below two cases are the same.>>> l = L.Linear(None, 10, False, None, 0) >>> y = l(x) >>> y.shape (1, 10)
>>> l = L.Linear(10, nobias=False, initialW=None, initial_bias=0) >>> y = l(x) >>> y.shape (1, 10)
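The underlying computation is a plain affine map, sketched here in NumPy (illustrative only, not Chainer's implementation; W stored as (out_size, in_size) to match the link's parameter shape):

```python
import numpy as np

def linear(x, W, b=None):
    # y = x W^T + b, with W of shape (out_size, in_size)
    y = x @ W.T
    return y if b is None else y + b

x = np.array([[0, 1, 2, 3, 4]], dtype=np.float32)
W = np.ones((10, 5), dtype=np.float32)
b = np.zeros(10, dtype=np.float32)
y = linear(x, W, b)  # each output is the sum 0+1+2+3+4 = 10
```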
LSTM¶

class
chainer.links.
LSTM
(in_size, out_size=None, **kwargs)[source]¶ Fully-connected LSTM layer.
This is a fully-connected LSTM layer as a chain. Unlike the
lstm()
function, which is defined as a stateless activation function, this chain holds upward and lateral connections as child links.It also maintains states, including the cell state and the output at the previous time step. Therefore, it can be used as a stateful LSTM.
This link supports variable length inputs. The minibatch size of the current input must be equal to or smaller than that of the previous one. The minibatch size of
c and h
is determined as that of the first input x
. When the minibatch size of the i-th input is smaller than that of the previous input, this link only updates c[0:len(x)]
and h[0:len(x)]
and doesn’t change the rest of c
and h
. So, please sort input sequences in descending order of length before applying the function.Parameters:  in_size (int) – Dimension of input vectors. If it is
None
or omitted, parameter initialization will be deferred until the first forward data pass at which time the size will be determined.  out_size (int) – Dimensionality of output vectors.
 lateral_init – A callable that takes
numpy.ndarray
orcupy.ndarray
and edits its value. It is used for initialization of the lateral connections. May beNone
to use default initialization.  upward_init – A callable that takes
numpy.ndarray
orcupy.ndarray
and edits its value. It is used for initialization of the upward connections. May beNone
to use default initialization.  bias_init – A callable that takes
numpy.ndarray or cupy.ndarray
and edits its value. It is used for initialization of the biases of the cell input, input gate and output gate of the upward connection. May be a scalar, in which case the bias is initialized to this value. If it is None
, the cell-input bias is initialized to zero.  forget_bias_init – A callable that takes
numpy.ndarray or cupy.ndarray
and edits its value. It is used for initialization of the bias of the forget gate of the upward connection. May be a scalar, in which case the bias is initialized to this value. If it is None
, the forget bias is initialized to one.
Variables:
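One step of the LSTM recurrence can be sketched in NumPy (a hedged illustration with a simple gate packing of our choosing; Chainer's actual parameter layout may differ):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x, h, c, Wx, Wh, b):
    # Wx: (4*out, in), Wh: (4*out, out), b: (4*out,), packing the cell
    # input a, input gate i, forget gate f and output gate o.
    n = h.shape[1]
    gates = x @ Wx.T + h @ Wh.T + b
    a, i, f, o = (gates[:, k * n:(k + 1) * n] for k in range(4))
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(a)  # updated cell state
    h_new = sigmoid(o) * np.tanh(c_new)               # updated output
    return h_new, c_new

in_size, out_size = 4, 3
rng = np.random.RandomState(0)
x = rng.randn(2, in_size).astype(np.float32)
h = np.zeros((2, out_size), dtype=np.float32)
c = np.zeros((2, out_size), dtype=np.float32)
Wx = rng.randn(4 * out_size, in_size).astype(np.float32)
Wh = rng.randn(4 * out_size, out_size).astype(np.float32)
b = np.zeros(4 * out_size, dtype=np.float32)
h_new, c_new = lstm_step(x, h, c, Wx, Wh, b)
```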
MLPConvolution2D¶

class
chainer.links.
MLPConvolution2D
(self, in_channels, out_channels, ksize=None, stride=1, pad=0, activation=relu.relu, conv_init=None, bias_init=None)[source]¶ Two-dimensional MLP convolution layer of Network in Network.
This is an “mlpconv” layer from the Network in Network paper. This layer is a two-dimensional convolution layer followed by 1x1 convolution layers and interleaved activation functions.
Note that it does not apply the activation function to the output of the last 1x1 convolution layer.
Parameters:  in_channels (int or None) – Number of channels of input arrays.
If it is
None
or omitted, parameter initialization will be deferred until the first forward data pass at which time the size will be determined.  out_channels (tuple of ints) – Tuple of number of channels. The i-th integer indicates the number of filters of the i-th convolution.
 ksize (int or pair of ints) – Size of filters (a.k.a. kernels) of the
first convolution layer.
ksize=k and ksize=(k, k)
are equivalent.  stride (int or pair of ints) – Stride of filter applications at the
first convolution layer.
stride=s and stride=(s, s)
are equivalent.  pad (int or pair of ints) – Spatial padding width for input arrays at
the first convolution layer.
pad=p and pad=(p, p)
are equivalent.  activation (function) – Activation function for internal hidden units. Note that this function is not applied to the output of this link.
 conv_init – An initializer of weight matrices passed to the convolution layers. This option must be specified as a keyword argument.
 bias_init – An initializer of bias vectors passed to the convolution layers. This option must be specified as a keyword argument.
See: Network in Network.
Variables: activation (function) – Activation function.
NStepBiGRU¶

class
chainer.links.
NStepBiGRU
(self, n_layers, in_size, out_size, dropout)[source]¶ Stacked Bidirectional GRU for sequences.
This link is a stacked version of Bidirectional GRU for sequences. It calculates hidden states of all layers at end-of-string, and all hidden states of the last layer for each time step.
Unlike
chainer.functions.n_step_bigru()
, this link automatically sorts inputs in descending order by length, and transposes the sequence. Users just need to call the link with a list of chainer.Variable
holding sequences.Warning
use_cudnn
argument is not supported anymore since v2. Instead, use chainer.using_config('use_cudnn', use_cudnn)
. See chainer.using_config()
.Parameters: See also
NStepBiLSTM¶

class
chainer.links.
NStepBiLSTM
(self, n_layers, in_size, out_size, dropout)[source]¶ Stacked Bidirectional LSTM for sequences.
This link is a stacked version of Bidirectional LSTM for sequences. It calculates hidden and cell states of all layers at end-of-string, and all hidden states of the last layer for each time step.
Unlike
chainer.functions.n_step_bilstm()
, this link automatically sorts inputs in descending order by length, and transposes the sequence. Users just need to call the link with a list of chainer.Variable
holding sequences.Warning
use_cudnn
argument is not supported anymore since v2. Instead, use chainer.using_config('use_cudnn', use_cudnn)
. See chainer.using_config()
.Parameters: See also
NStepBiRNNReLU¶

class
chainer.links.
NStepBiRNNReLU
(self, n_layers, in_size, out_size, dropout)[source]¶ Stacked Bidirectional RNN for sequences.
This link is a stacked version of Bidirectional RNN for sequences. Note that the activation function is
relu
. It calculates hidden states of all layers at end-of-string, and all hidden states of the last layer for each time step.Unlike
chainer.functions.n_step_birnn()
, this link automatically sorts inputs in descending order by length, and transposes the sequence. Users just need to call the link with a list of chainer.Variable
holding sequences.Warning
use_cudnn
argument is not supported anymore since v2. Instead, use chainer.using_config('use_cudnn', use_cudnn)
. See chainer.using_config()
.Parameters: See also
NStepBiRNNTanh¶

class
chainer.links.
NStepBiRNNTanh
(self, n_layers, in_size, out_size, dropout)[source]¶ Stacked Bidirectional RNN for sequences.
This link is a stacked version of Bidirectional RNN for sequences. Note that the activation function is
tanh
. It calculates hidden states of all layers at end-of-string, and all hidden states of the last layer for each time step.Unlike
chainer.functions.n_step_birnn()
, this link automatically sorts inputs in descending order by length, and transposes the sequence. Users just need to call the link with a list of chainer.Variable
holding sequences.Warning
use_cudnn
argument is not supported anymore since v2. Instead, use chainer.using_config('use_cudnn', use_cudnn)
. See chainer.using_config()
.Parameters: See also
NStepGRU¶

class
chainer.links.
NStepGRU
(self, n_layers, in_size, out_size, dropout)[source]¶ Stacked Unidirectional GRU for sequences.
This link is a stacked version of Unidirectional GRU for sequences. It calculates hidden states of all layers at end-of-string, and all hidden states of the last layer for each time step.
Unlike
chainer.functions.n_step_gru()
, this link automatically sorts inputs in descending order by length, and transposes the sequence. Users just need to call the link with a list of chainer.Variable
holding sequences.Warning
use_cudnn
argument is not supported anymore since v2. Instead, usechainer.using_config('use_cudnn', use_cudnn)
. Seechainer.using_config()
.Parameters: See also
NStepLSTM¶

class
chainer.links.
NStepLSTM
(self, n_layers, in_size, out_size, dropout)[source]¶ Stacked Unidirectional LSTM for sequences.
This link is a stacked version of a unidirectional LSTM for sequences. It calculates the hidden and cell states of all layers at end-of-string, and all hidden states of the last layer for each time step.
Unlike
chainer.functions.n_step_lstm()
, this link automatically sorts inputs in descending order by length and transposes the sequence. Users just need to call the link with a list of chainer.Variable
holding sequences. Warning
use_cudnn
argument is not supported anymore since v2. Instead, use chainer.using_config('use_cudnn', use_cudnn)
. See chainer.using_config()
.Parameters: See also
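The reordering these n-step links perform internally can be sketched in plain NumPy. `sort_and_transpose` below is a hypothetical helper for illustration only, not part of the Chainer API:

```python
import numpy as np

def sort_and_transpose(xs):
    # Sort the minibatch by sequence length (descending), then build
    # time-major slices: at step t, stack the t-th row of every sequence
    # that is still long enough.
    xs = sorted(xs, key=len, reverse=True)
    max_len = len(xs[0])
    return [np.stack([x[t] for x in xs if len(x) > t]) for t in range(max_len)]

# Three sequences of lengths 2, 4 and 3 with 5 features each.
xs = [np.zeros((2, 5)), np.zeros((4, 5)), np.zeros((3, 5))]
ts = sort_and_transpose(xs)  # time-major slices; the batch shrinks as sequences end
```

The links do this reordering for you, which is why a plain Python list of variable-length arrays is an acceptable input.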
NStepRNNReLU¶

class
chainer.links.
NStepRNNReLU
(self, n_layers, in_size, out_size, dropout)[source]¶ Stacked Unidirectional RNN for sequences.
This link is a stacked version of a unidirectional RNN for sequences. Note that the activation function is
relu
. It calculates the hidden states of all layers at end-of-string, and all hidden states of the last layer for each time step. Unlike
chainer.functions.n_step_rnn()
, this link automatically sorts inputs in descending order by length and transposes the sequence. Users just need to call the link with a list of chainer.Variable
holding sequences. Warning
use_cudnn
argument is not supported anymore since v2. Instead, use chainer.using_config('use_cudnn', use_cudnn)
. See chainer.using_config()
.Parameters: See also
NStepRNNTanh¶

class
chainer.links.
NStepRNNTanh
(self, n_layers, in_size, out_size, dropout)[source]¶ Stacked Unidirectional RNN for sequences.
This link is a stacked version of a unidirectional RNN for sequences. Note that the activation function is
tanh
. It calculates the hidden states of all layers at end-of-string, and all hidden states of the last layer for each time step. Unlike
chainer.functions.n_step_rnn()
, this link automatically sorts inputs in descending order by length and transposes the sequence. Users just need to call the link with a list of chainer.Variable
holding sequences. Warning
use_cudnn
argument is not supported anymore since v2. Instead, use chainer.using_config('use_cudnn', use_cudnn)
. See chainer.using_config()
.Parameters: See also
Scale¶

class
chainer.links.
Scale
(axis=1, W_shape=None, bias_term=False, bias_shape=None)[source]¶ Broadcasted elementwise product with learnable parameters.
Computes an elementwise product as
scale()
function does, except that its second input is a learnable weight parameter \(W\) that the link holds.Parameters:  axis (int) – The first axis of the first input of
scale()
function along which its second input is applied.  W_shape (tuple of ints) – Shape of the learnable weight parameter. If
None
, this link does not have a learnable weight parameter, so an explicit weight needs to be given to its __call__
method’s second input.  bias_term (bool) – Whether to also learn a bias (equivalent to Scale link + Bias link).
 bias_shape (tuple of ints) – Shape of learnable bias. If
W_shape
is None
, this should be given to determine the shape. Otherwise, the bias takes the same shape W_shape
as the weight parameter, and bias_shape
is ignored.
See also
See
scale()
for details.Variables:  axis (int) – The first axis of the first input of
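The broadcasted product the Scale link computes can be sketched with a fixed weight; the axis value and shapes here are picked just for illustration:

```python
import numpy as np

# For a (2, 3, 4) input and axis=1, a weight of shape (3,) is broadcast
# along the remaining axes before the elementwise product.
x = np.ones((2, 3, 4), dtype=np.float32)
W = np.array([1.0, 2.0, 3.0], dtype=np.float32)
y = x * W.reshape(1, 3, 1)
```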
StatefulGRU¶

class
chainer.links.
StatefulGRU
(in_size, out_size, init=None, inner_init=None, bias_init=0)[source]¶ Stateful Gated Recurrent Unit function (GRU).
Stateful GRU function has six parameters \(W_r\), \(W_z\), \(W\), \(U_r\), \(U_z\), and \(U\). All these parameters are \(n \times n\) matrices, where \(n\) is the dimension of hidden vectors.
Given input vector \(x\), Stateful GRU returns the next hidden vector \(h'\) defined as
\[\begin{split}r &=& \sigma(W_r x + U_r h), \\ z &=& \sigma(W_z x + U_z h), \\ \bar{h} &=& \tanh(W x + U (r \odot h)), \\ h' &=& (1 - z) \odot h + z \odot \bar{h},\end{split}\]where \(h\) is the current hidden vector.
As the name indicates,
StatefulGRU
is stateful, meaning that it also holds the next hidden vector \(h'\) as a state. Use GRU
as a stateless version of GRU.Parameters:  in_size (int) – Dimension of input vector \(x\).
 out_size (int) – Dimension of hidden vector \(h\).
 init – Initializer for GRU’s input units (\(W\)).
It is a callable that takes
numpy.ndarray
orcupy.ndarray
and edits its value. If it isNone
, the default initializer is used.  inner_init – Initializer for the GRU’s inner
recurrent units (\(U\)).
It is a callable that takes
numpy.ndarray
orcupy.ndarray
and edits its value. If it isNone
, the default initializer is used.  bias_init – Bias initializer.
It is a callable that takes
numpy.ndarray
orcupy.ndarray
and edits its value. IfNone
, the bias is set to zero.
Variables: h (Variable) – Hidden vector that indicates the state of
StatefulGRU
.See also
GRU
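The update rule above can be written out in NumPy; `gru_step` is an illustrative helper, while the link itself holds the six matrices as its parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h, W_r, W_z, W, U_r, U_z, U):
    # Reset gate r, update gate z, candidate state hbar, then an
    # interpolation between the old state h and the candidate.
    r = sigmoid(W_r @ x + U_r @ h)
    z = sigmoid(W_z @ x + U_z @ h)
    hbar = np.tanh(W @ x + U @ (r * h))
    return (1 - z) * h + z * hbar

n = 4
rng = np.random.default_rng(0)
mats = [rng.standard_normal((n, n)) for _ in range(6)]
h_new = gru_step(rng.standard_normal(n), np.zeros(n), *mats)
```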
StatefulPeepholeLSTM¶

class
chainer.links.
StatefulPeepholeLSTM
(in_size, out_size)[source]¶ Fully-connected LSTM layer with peephole connections.
This is a fully-connected LSTM layer with peephole connections as a chain. Unlike the
LSTM
link, this chain holdspeep_i
,peep_f
andpeep_o
as child links besidesupward
andlateral
. Given an input vector \(x\), the peephole LSTM returns the next hidden vector \(h'\) defined as
\[\begin{split}a &=& \tanh(upward x + lateral h), \\ i &=& \sigma(upward x + lateral h + peep_i c), \\ f &=& \sigma(upward x + lateral h + peep_f c), \\ c' &=& a \odot i + f \odot c, \\ o &=& \sigma(upward x + lateral h + peep_o c'), \\ h' &=& o \odot \tanh(c'),\end{split}\]where \(\sigma\) is the sigmoid function, \(\odot\) is the elementwise product, \(c\) is the current cell state, \(c'\) is the next cell state and \(h\) is the current hidden vector.
Parameters: Variables:  upward (Linear) – Linear layer of upward connections.
 lateral (Linear) – Linear layer of lateral connections.
 peep_i (Linear) – Linear layer of peephole connections to the input gate.
 peep_f (Linear) – Linear layer of peephole connections to the forget gate.
 peep_o (Linear) – Linear layer of peephole connections to the output gate.
 c (Variable) – Cell states of LSTM units.
 h (Variable) – Output at the current time step.
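The equations above translate directly to NumPy. In this sketch, `upward`, `lateral` and the `peep_*` terms are plain matrices standing in for the Linear child links; in the actual link, `upward` and `lateral` produce separate slices per gate:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def peephole_lstm_step(x, c, h, upward, lateral, peep_i, peep_f, peep_o):
    a = np.tanh(upward @ x + lateral @ h)                   # candidate input
    i = sigmoid(upward @ x + lateral @ h + peep_i @ c)      # input gate peeks at c
    f = sigmoid(upward @ x + lateral @ h + peep_f @ c)      # forget gate peeks at c
    c_new = a * i + f * c
    o = sigmoid(upward @ x + lateral @ h + peep_o @ c_new)  # output gate peeks at c'
    return c_new, o * np.tanh(c_new)

n = 3
rng = np.random.default_rng(1)
mats = [rng.standard_normal((n, n)) for _ in range(5)]
c_new, h_new = peephole_lstm_step(rng.standard_normal(n), np.zeros(n), np.zeros(n), *mats)
```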
StatelessLSTM¶

class
chainer.links.
StatelessLSTM
(in_size, out_size=None, lateral_init=None, upward_init=None, bias_init=0, forget_bias_init=1)[source]¶ Stateless LSTM layer.
This is a fully-connected LSTM layer as a chain. Unlike the
lstm()
function, this chain holds upward and lateral connections as child links. This link doesn’t keep cell and hidden states.Parameters: Variables:  upward (chainer.links.Linear) – Linear layer of upward connections.
 lateral (chainer.links.Linear) – Linear layer of lateral connections.
Example
There are several ways to make a StatelessLSTM link.
Let a two-dimensional input array \(x\), a cell state array \(c\), and the output array of the previous step \(h\) be:
>>> x = np.zeros((1, 10), dtype='f')
>>> c = np.zeros((1, 20), dtype='f')
>>> h = np.zeros((1, 20), dtype='f')
Give both
in_size
andout_size
arguments:
>>> l = L.StatelessLSTM(10, 20)
>>> c_new, h_new = l(c, h, x)
>>> c_new.shape
(1, 20)
>>> h_new.shape
(1, 20)
Omit
in_size
argument or fill it withNone
: The two cases below are equivalent.
>>> l = L.StatelessLSTM(20)
>>> c_new, h_new = l(c, h, x)
>>> c_new.shape
(1, 20)
>>> h_new.shape
(1, 20)
>>> l = L.StatelessLSTM(None, 20)
>>> c_new, h_new = l(c, h, x)
>>> c_new.shape
(1, 20)
>>> h_new.shape
(1, 20)
Activation/loss/normalization functions with parameters¶
BatchNormalization¶

class
chainer.links.
BatchNormalization
(size, decay=0.9, eps=2e-05, dtype=<type 'numpy.float32'>, use_gamma=True, use_beta=True, initial_gamma=None, initial_beta=None)[source]¶ Batch normalization layer on outputs of linear or convolution functions.
This link wraps the
batch_normalization()
and fixed_batch_normalization()
functions. It runs in three modes: training mode, fine-tuning mode, and testing mode.
In training mode, it normalizes the input by batch statistics. It also maintains approximated population statistics by moving averages, which can be used for instant evaluation in testing mode.
In fine-tuning mode, it accumulates the input to compute population statistics. In order to compute the population statistics correctly, a user must use this mode to feed mini-batches running through the whole training dataset.
In testing mode, it uses precomputed population statistics to normalize the input variable. The population statistics are approximated if computed in training mode, and accurate if correctly computed in fine-tuning mode.
Parameters:  size (int or tuple of ints) – Size (or shape) of channel dimensions.
 decay (float) – Decay rate of moving average. It is used on training.
 eps (float) – Epsilon value for numerical stability.
 dtype (numpy.dtype) – Type to use in computing.
 use_gamma (bool) – If
True
, use the scaling parameter. Otherwise, a unit value (1) is used, which has no effect.  use_beta (bool) – If
True
, use the shifting parameter. Otherwise, a zero value (0) is used, which has no effect.
See: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Variables:  gamma (Variable) – Scaling parameter.
 beta (Variable) – Shifting parameter.
 avg_mean (Variable) – Population mean.
 avg_var (Variable) – Population variance.
 N (int) – Count of batches given for fine-tuning.
 decay (float) – Decay rate of moving average. It is used on training.
 eps (float) – Epsilon value for numerical stability. This value is added to the batch variances.

__call__
(self, x, finetune=False)[source]¶ Invokes the forward propagation of BatchNormalization.
In training mode, BatchNormalization normalizes the input using batch statistics, and computes moving averages of the mean and variance for evaluation.
Warning
test
argument is not supported anymore since v2. Instead, usechainer.using_config('train', train)
. Seechainer.using_config()
.Parameters:
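The two main modes can be sketched as follows; `batch_norm` is an illustrative stand-in for the link, assuming a 2-D (batch, channels) input:

```python
import numpy as np

def batch_norm(x, gamma, beta, avg_mean, avg_var, decay=0.9, eps=2e-5, train=True):
    if train:
        # Normalize by batch statistics and update the moving averages.
        mean, var = x.mean(axis=0), x.var(axis=0)
        avg_mean[:] = decay * avg_mean + (1 - decay) * mean
        avg_var[:] = decay * avg_var + (1 - decay) * var
    else:
        # Testing mode: use the stored population statistics instead.
        mean, var = avg_mean, avg_var
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

x = np.random.default_rng(0).standard_normal((8, 4))
avg_mean, avg_var = np.zeros(4), np.ones(4)
y = batch_norm(x, np.ones(4), np.zeros(4), avg_mean, avg_var)
```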
LayerNormalization¶

class
chainer.links.
LayerNormalization
(size=None, eps=1e-06, initial_gamma=None, initial_beta=None)[source]¶ Layer normalization layer on outputs of linear functions.
This link implements a “layer normalization” layer, which normalizes the input units by statistics computed along the second axis, then scales and shifts them.
Parameters:  size (int) – Size of input units. If
None
, parameter initialization will be deferred until the first forward data pass at which time the size will be determined.  eps (float) – Epsilon value for numerical stability of normalization.
 initial_gamma (Initializer) – Initializer for scaling vector.
If
None
, then the vector is filled by 1. If a scalar, the vector is filled by it. Ifnumpy.ndarray
, the vector is set by it.  initial_beta (Initializer) – Initializer for shifting vector.
If
None
, then the vector is filled by 0. If a scalar, the vector is filled by it. Ifnumpy.ndarray
, the vector is set by it.
Variables:  gamma (Parameter) – Scaling parameter.
 beta (Parameter) – Shifting parameter.
 eps (float) – Epsilon value for numerical stability.
See: Layer Normalization
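The normalization itself is simple to state in NumPy; this sketch assumes a 2-D (batch, units) input and fixed gamma and beta vectors:

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-6):
    # Normalize each sample along the second axis, then scale and shift.
    mean = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

x = np.random.default_rng(0).standard_normal((2, 5))
y = layer_norm(x, np.ones(5), np.zeros(5))
```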
BinaryHierarchicalSoftmax¶

class
chainer.links.
BinaryHierarchicalSoftmax
(in_size, tree)[source]¶ Hierarchical softmax layer over binary tree.
In natural language applications, the vocabulary size is too large to use softmax loss. Instead, hierarchical softmax uses a product of sigmoid functions. It costs only \(O(\log(n))\) time on average, where \(n\) is the vocabulary size.
First, a user needs to prepare a binary tree in which each leaf corresponds to a word in the vocabulary. When a word \(x\) is given, exactly one path from the root of the tree to the leaf of the word exists. Let \(\mbox{path}(x) = ((e_1, b_1), \dots, (e_m, b_m))\) be the path of \(x\), where \(e_i\) is an index of the \(i\)-th internal node, and \(b_i \in \{-1, 1\}\) indicates the direction to move at the \(i\)-th internal node (-1 is left, and 1 is right). Then, the probability of \(x\) is given as below:
\[\begin{split}P(x) &= \prod_{(e_i, b_i) \in \mbox{path}(x)}P(b_i  e_i) \\ &= \prod_{(e_i, b_i) \in \mbox{path}(x)}\sigma(b_i x^\top w_{e_i}),\end{split}\]where \(\sigma(\cdot)\) is a sigmoid function, and \(w\) is a weight matrix.
This function costs \(O(\log(n))\) time as the average length of paths is \(O(\log(n))\), and \(O(n)\) memory as the number of internal nodes equals \(n - 1\).
Parameters:  in_size (int) – Dimension of input vectors.
 tree – A binary tree made with tuples like ((1, 2), 3).
Variables: W (Variable) – Weight parameter matrix.
See: Hierarchical Probabilistic Neural Network Language Model [Morin+, AISTAT2005].
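The path probability above can be evaluated directly; `word_probability` is an illustrative helper, not part of the link's API:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def word_probability(x, path, W):
    # path: [(node_index, direction)] with direction in {-1, +1};
    # P(x) is the product of branch probabilities along the path.
    return float(np.prod([sigmoid(b * (x @ W[e])) for e, b in path]))

W = np.zeros((3, 4))  # 3 internal nodes, 4-dimensional input
p = word_probability(np.ones(4), [(0, -1), (2, 1)], W)
# With zero weights every branch has probability 0.5, so p == 0.25.
```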

__call__
(x, t)[source]¶ Computes the loss value for given input and ground truth labels.
Parameters: Returns: Loss value.
Return type: Variable

static
create_huffman_tree
(word_counts)[source]¶ Makes a Huffman tree from a dictionary containing word counts.
This method creates a binary Huffman tree, which is required for
BinaryHierarchicalSoftmax
. For example,{0: 8, 1: 5, 2: 6, 3: 4}
is converted to((3, 1), (2, 0))
.Parameters: word_counts (dict of int key and int or float values) – Dictionary representing counts of words. Returns: Binary Huffman tree with tuples and keys of word_counts
.
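A minimal re-implementation of this conversion with the standard library's heapq reproduces the documented example; tie-breaking details may differ from Chainer's actual implementation:

```python
import heapq
from itertools import count

def create_huffman_tree(word_counts):
    # Repeatedly merge the two lightest nodes; the counter breaks ties
    # between nodes of equal weight.
    tiebreak = count()
    heap = [(w, next(tiebreak), word) for word, w in word_counts.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))
    return heap[0][2]

tree = create_huffman_tree({0: 8, 1: 5, 2: 6, 3: 4})  # ((3, 1), (2, 0))
```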
BlackOut¶

class
chainer.links.
BlackOut
(in_size, counts, sample_size)[source]¶ BlackOut loss layer.
See also
black_out()
for more detail.Parameters: Variables: W (Parameter) – Weight parameter matrix.
CRF1d¶

class
chainer.links.
CRF1d
(n_label)[source]¶ Linear-chain conditional random field loss layer.
This link wraps the
crf1d()
function. It holds a transition cost matrix as a parameter.Parameters: n_label (int) – Number of labels. See also
crf1d()
for more detail.Variables: cost (Variable) – Transition cost parameter. 
argmax
(xs)[source]¶ Computes a state that maximizes a joint probability.
Parameters: xs (list of Variable) – Input vector for each label. Returns:  A tuple of
Variable
representing each log-likelihood and a list representing the argmax path.
Return type: tuple See also
See
crf1d_argmax()
for more detail. A tuple of

SimplifiedDropconnect¶

class
chainer.links.
SimplifiedDropconnect
(in_size, out_size, ratio=0.5, nobias=False, initialW=None, initial_bias=None)[source]¶ Fully-connected layer with simplified dropconnect regularization.
Notice: this implementation cannot be used to reproduce the paper. There is a difference between the current implementation and the original one: the original version samples from a Gaussian distribution before applying the activation function, whereas the current implementation averages before the activation.
Parameters:  in_size (int) – Dimension of input vectors. If
None
, parameter initialization will be deferred until the first forward data pass at which time the size will be determined.  out_size (int) – Dimension of output vectors.
 nobias (bool) – If
True
, then this link does not use the bias term.  initialW (3D array or None) – Initial weight value.
If
None
, the default initializer is used. May also be a callable that takesnumpy.ndarray
orcupy.ndarray
and edits its value.  initial_bias (2D array, float or None) – Initial bias value.
If
None
, the bias is set to 0. May also be a callable that takesnumpy.ndarray
orcupy.ndarray
and edits its value.
Variables: See also
See also
Li, W., Matthew Z., Sixin Z., Yann L., Rob F. (2013). Regularization of Neural Network using DropConnect. International Conference on Machine Learning. URL

__call__
(x, train=True, mask=None)[source]¶ Applies the simplified dropconnect layer.
Parameters:  x (chainer.Variable or
numpy.ndarray
or cupy.ndarray) – Batch of input vectors. Its first dimension n
is assumed to be the minibatch dimension.  train (bool) – If
True
, executes simplified dropconnect. Otherwise, the simplified dropconnect link works as a linear unit.  mask (None or chainer.Variable or numpy.ndarray or cupy.ndarray) – If
None
, a randomized simplified dropconnect mask is generated. Otherwise, the mask must be an (n, M, N)
-shaped array. The main purpose of this option is debugging; the mask array is used as the dropconnect mask.
Returns: Output of the simplified dropconnect layer.
Return type: Variable
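The masked linear computation can be sketched as below; whether the actual link rescales by 1/(1 - ratio) is an assumption of this sketch:

```python
import numpy as np

def simplified_dropconnect(x, W, b, ratio=0.5, mask=None, rng=None):
    # Each weight is dropped independently with probability `ratio`.
    if mask is None:
        rng = rng or np.random.default_rng()
        mask = (rng.random(W.shape) >= ratio).astype(W.dtype)
    # Rescaling keeps the expected pre-activation unchanged (assumption).
    return x @ (W * mask).T / (1.0 - ratio) + b

x = np.ones((2, 4))
W = np.ones((3, 4))
y = simplified_dropconnect(x, W, np.zeros(3), ratio=0.5, mask=np.ones_like(W))
```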
PReLU¶

class
chainer.links.
PReLU
(shape=(), init=0.25)[source]¶ Parametric ReLU function as a link.
Parameters:  shape (tuple of ints) – Shape of the parameter array.
 init (float) – Initial parameter value.
See the paper for details: Delving Deep into Rectifiers: Surpassing HumanLevel Performance on ImageNet Classification.
See also
Variables: W (Parameter) – Coefficient of parametric ReLU.
Maxout¶

class
chainer.links.
Maxout
(in_size, out_size, pool_size, initialW=None, initial_bias=0)[source]¶ Fully-connected maxout layer.
Let
M
,P
andN
be an input dimension, a pool size, and an output dimension, respectively. For an input vector \(x\) of sizeM
, it computes\[Y_{i} = \mathrm{max}_{j} (W_{ij\cdot}x + b_{ij}).\]Here \(W\) is a weight tensor of shape
(M, P, N)
, \(b\) an optional bias vector of shape (M, P)
and \(W_{ij\cdot}\) is a sub-vector extracted from \(W\) by fixing the first and second dimensions to \(i\) and \(j\), respectively. The minibatch dimension is omitted in the above equation. As for the actual implementation, this chain has a Linear link with a
(M * P, N)
weight matrix and an optional M * P
-dimensional bias vector. Parameters:  in_size (int) – Dimension of input vectors.
 out_size (int) – Dimension of output vectors.
 pool_size (int) – Number of channels.
 initialW (3D array or None) – Initial weight value.
If
None
, the default initializer is used to initialize the weight matrix.  initial_bias (2D array, float or None) – Initial bias value.
If it is float, initial bias is filled with this value.
If
None
, bias is omitted.
Variables: linear (Link) – The Linear link that performs affine transformation.
See also
See also
Goodfellow, I., Warde-Farley, D., Mirza, M., Courville, A., & Bengio, Y. (2013). Maxout Networks. In Proceedings of the 30th International Conference on Machine Learning (ICML-13) (pp. 1319-1327). URL
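The maxout computation can be sketched in NumPy; the (M, P, N) weight and (P, N) bias shapes here are assumptions chosen to make the equation dimensionally consistent:

```python
import numpy as np

def maxout(x, W, b):
    # x: (M,), W: (M, P, N), b: (P, N).
    pre = np.einsum('m,mpn->pn', x, W) + b  # one affine output per pool unit
    return pre.max(axis=0)                  # elementwise max over the pool axis

M, P, N = 2, 2, 1
W = np.zeros((M, P, N))
W[:, 0, 0] = [1.0, 0.0]  # pool unit 0 passes x[0]
W[:, 1, 0] = [0.0, 1.0]  # pool unit 1 passes x[1]
y = maxout(np.array([3.0, 5.0]), W, np.zeros((P, N)))  # picks max(3.0, 5.0)
```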
NegativeSampling¶

class
chainer.links.
NegativeSampling
(in_size, counts, sample_size, power=0.75)[source]¶ Negative sampling loss layer.
This link wraps the
negative_sampling()
function. It holds the weight matrix as a parameter. It also builds a sampler internally given a list of word counts.Parameters: See also
negative_sampling()
for more detail.Variables: W (Variable) – Weight parameter matrix. 
__call__
(x, t, reduce='sum')[source]¶ Computes the loss value for given input and ground truth labels.
Parameters:  x (Variable) – Input of the weight matrix multiplication.
 t (Variable) – Batch of ground truth labels.
 reduce (str) – Reduction option. Its value must be either
'sum'
or'no'
. Otherwise,ValueError
is raised.
Returns: Loss value.
Return type: Variable

Machine learning models¶
Classifier¶

class
chainer.links.
Classifier
(predictor, lossfun=<function softmax_cross_entropy>, accfun=<function accuracy>)[source]¶ A simple classifier model.
This is an example of a chain that wraps another chain. It computes the loss and accuracy based on a given input/label pair.
Parameters: Variables:  predictor (Link) – Predictor network.
 lossfun (function) – Loss function.
 accfun (function) – Function that computes accuracy.
 y (Variable) – Prediction for the last minibatch.
 loss (Variable) – Loss value for the last minibatch.
 accuracy (Variable) – Accuracy for the last minibatch.
 compute_accuracy (bool) – If
True
, compute accuracy on the forward computation. The default value isTrue
.

__call__
(*args)[source]¶ Computes the loss value for an input and label pair.
It also computes accuracy and stores it to the attribute.
Parameters: args (list of ~chainer.Variable) – Input minibatch. All elements of
args
but the last one are features, and the last element corresponds to the ground truth labels. It feeds the features to the predictor and compares the result with the ground truth labels.Returns: Loss value. Return type: Variable
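What the wrapper does per minibatch can be sketched without Chainer; `classify` and the helpers below are illustrative stand-ins for the default lossfun and accfun:

```python
import numpy as np

def softmax_cross_entropy(y, t):
    # Numerically stable log-softmax, then mean negative log-likelihood.
    y = y - y.max(axis=1, keepdims=True)
    logp = y - np.log(np.exp(y).sum(axis=1, keepdims=True))
    return float(-logp[np.arange(len(t)), t].mean())

def accuracy(y, t):
    return float((y.argmax(axis=1) == t).mean())

def classify(predictor, x, t):
    # Run the predictor on the features, then score against the labels.
    y = predictor(x)
    return softmax_cross_entropy(y, t), accuracy(y, t)

# A "predictor" that already outputs confident, correct logits.
loss, acc = classify(lambda x: x, np.eye(3) * 10.0, np.array([0, 1, 2]))
```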
Pre-trained models¶
Pre-trained models are mainly used to achieve good performance with a small
dataset, or extract a semantic feature vector. Although CaffeFunction
automatically loads a pretrained model released as a caffemodel,
the following link models provide an interface for automatically converting
caffemodels, and easily extracting semantic feature vectors.
For example, to extract the feature vectors with VGG16Layers
, which is
a common pretrained model in the field of image recognition,
users need to write the following few lines:
from chainer.links import VGG16Layers
from PIL import Image
model = VGG16Layers()
img = Image.open("path/to/image.jpg")
feature = model.extract([img], layers=["fc7"])["fc7"]
where fc7
denotes a layer before the last fully-connected layer.
Unlike the usual links, these classes automatically load all the
parameters from the pretrained models during initialization.
VGG16Layers¶

class
chainer.links.
VGG16Layers
(pretrained_model='auto')[source]¶ A pre-trained CNN model with 16 layers provided by the VGG team.
During initialization, this chain model automatically downloads the pre-trained caffemodel, converts it to a chainer model, stores it in a local directory, and initializes all the parameters with it. This model is useful when you want to extract a semantic feature vector from a given image, or fine-tune the model on a different dataset. Note that this pre-trained model is released under the Creative Commons Attribution License.
If you want to manually convert the pretrained caffemodel to a chainer model that can be specified in the constructor, please use
convert_caffemodel_to_npz
classmethod instead.See: K. Simonyan and A. Zisserman, Very Deep Convolutional Networks for LargeScale Image Recognition
Parameters: pretrained_model (str) – the destination of the pretrained chainer model serialized as a .npz
file. If this argument is specified asauto
, it automatically downloads the caffemodel from the internet. Note that in this case the converted chainer model is stored on$CHAINER_DATASET_ROOT/pfnet/chainer/models
directory, where$CHAINER_DATASET_ROOT
is set as$HOME/.chainer/dataset
unless you specify another value as an environment variable. The converted chainer model is automatically used from the second time. If the argument is specified as None
, none of the parameters are initialized by the pre-trained model; instead, the default initializer used in the original paper is used, i.e., chainer.initializers.Normal(scale=0.01)
.Variables: available_layers (list of str) – The list of available layer names used by __call__
andextract
methods.
__call__
(self, x, layers=['prob'])[source]¶ Computes all the feature maps specified by
layers
.Warning
test
argument is not supported anymore since v2. Instead, usechainer.using_config('train', train)
. Seechainer.using_config()
.Parameters:  x (Variable) – Input variable.
 layers (list of str) – The list of layer names you want to extract.
Returns: A dictionary whose keys are the layer names and whose values are the corresponding feature map variables.
Return type: Dictionary of ~chainer.Variable

classmethod
convert_caffemodel_to_npz
(path_caffemodel, path_npz)[source]¶ Converts a pretrained caffemodel to a chainer model.
Parameters:

extract
(self, images, layers=['fc7'], size=(224, 224))[source]¶ Extracts all the feature maps of given images.
The difference from directly executing
__call__
is that this method accepts images as an input and automatically transforms them to a proper variable. That is, it is a shortcut method that implicitly calls the prepare
and__call__
functions.Warning
test
andvolatile
arguments are not supported anymore since v2. Instead, usechainer.using_config('train', train)
andchainer.using_config('enable_backprop', not volatile)
respectively. Seechainer.using_config()
.Parameters:  images (iterable of PIL.Image or numpy.ndarray) – Input images.
 layers (list of str) – The list of layer names you want to extract.
 size (pair of ints) – The resolution of resized images used as
an input of the CNN. None of the given images are resized
if this argument is
None
, but the resolutions of all the images should be the same.
Returns: A dictionary whose keys are the layer names and whose values are the corresponding feature map variables.
Return type: Dictionary of ~chainer.Variable

predict
(images, oversample=True)[source]¶ Computes all the probabilities of given images.
Parameters:  images (iterable of PIL.Image or numpy.ndarray) – Input images.
 oversample (bool) – If
True
, it averages results across center, corners, and mirrors. Otherwise, it uses only the center.
Returns: Output that contains the class probabilities of given images.
Return type:


chainer.links.model.vision.vgg.
prepare
(image, size=(224, 224))[source]¶ Converts the given image to the numpy array for VGG models.
Note that you have to call this method before
__call__
because the pre-trained VGG model requires resizing the given image, converting RGB to BGR, subtracting the mean, and permuting the dimensions before calling.Parameters:  image (PIL.Image or numpy.ndarray) – Input image.
If an input is
numpy.ndarray
, its shape must be(height, width)
,(height, width, channels)
, or(channels, height, width)
, and the order of the channels must be RGB.  size (pair of ints) – Size of converted images.
If
None
, the given image is not resized.
Returns: The converted output array.
Return type: numpy.ndarray
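The preprocessing described above (minus resizing, which needs PIL) can be sketched as follows; the BGR mean values are the ones commonly used with Caffe VGG models and are an assumption of this sketch:

```python
import numpy as np

VGG_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def prepare_like(image):
    # image: RGB uint8 array of shape (height, width, channels).
    x = image.astype(np.float32)[:, :, ::-1]  # RGB -> BGR
    x -= VGG_MEAN_BGR                         # subtract the per-channel mean
    return x.transpose(2, 0, 1)               # HWC -> CHW

arr = prepare_like(np.zeros((224, 224, 3), dtype=np.uint8))
```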
GoogLeNet¶

class
chainer.links.
GoogLeNet
(pretrained_model='auto')[source]¶ A pretrained GoogLeNet model provided by BVLC.
When you specify the path of the pretrained chainer model serialized as a
.npz
file in the constructor, this chain model automatically initializes all the parameters with it. This model is useful when you want to extract a semantic feature vector per image, or fine-tune the model on a different dataset. If you want to manually convert the pre-trained caffemodel to a chainer model that can be specified in the constructor, please use
convert_caffemodel_to_npz
classmethod instead.GoogLeNet, which is also called Inceptionv1, is an architecture of convolutional neural network proposed in 2014. This model is relatively lightweight and requires small memory footprint during training compared with modern architectures such as ResNet. Therefore, if you finetune your network based on a model pretrained by Imagenet and need to train it with large batch size, GoogLeNet may be useful. On the other hand, if you just want an offtheshelf classifier, we recommend you to use ResNet50 or other models since they are more accurate than GoogLeNet.
The original model is provided here: https://github.com/BVLC/caffe/tree/master/models/bvlc_googlenet
Parameters: pretrained_model (str) – the destination of the pretrained chainer model serialized as a .npz
file. If this argument is specified asauto
, it automatically downloads the caffemodel from the internet. Note that in this case the converted chainer model is stored on$CHAINER_DATASET_ROOT/pfnet/chainer/models
directory, where$CHAINER_DATASET_ROOT
is set as$HOME/.chainer/dataset
unless you specify another value as an environment variable. The converted chainer model is automatically used from the second time. If the argument is specified as None
, none of the parameters are initialized by the pre-trained model; instead, the default initializer used in BVLC is used, i.e., chainer.initializers.LeCunUniform(scale=1.0)
. Note that, in Caffe, when weight_filler is specified as the “xavier” type without the variance_norm parameter, the weights are initialized by Uniform(-s, s), where \(s = \sqrt{\frac{3}{fan_{in}}}\) and \(fan_{in}\) is the number of input units. This corresponds to LeCunUniform in Chainer but not GlorotUniform.
andextract
methods.
__call__
(self, x, layers=['prob'])[source]¶ Computes all the feature maps specified by
layers
.Warning
train
argument is not supported anymore since v2. Instead, usechainer.using_config('train', train)
. Seechainer.using_config()
.Parameters:  x (Variable) – Input variable. It should be prepared by
prepare
function.  layers (list of str) – The list of layer names you want to extract.
Returns: A dictionary whose keys are the layer names and whose values are the corresponding feature map variables.
Return type: Dictionary of ~chainer.Variable

classmethod
convert_caffemodel_to_npz
(path_caffemodel, path_npz)[source]¶ Converts a pretrained caffemodel to a chainer model.
Parameters:

extract
(self, images, layers=['pool5'], size=(224, 224))[source]¶ Extracts all the feature maps of given images.
The difference from directly executing
__call__
is that this method accepts images as an input and automatically transforms them to a proper variable. That is, it is a shortcut method that implicitly calls the prepare
and__call__
functions.Warning
train
andvolatile
arguments are not supported anymore since v2. Instead, usechainer.using_config('train', train)
andchainer.using_config('enable_backprop', not volatile)
respectively. Seechainer.using_config()
.Parameters:  images (iterable of PIL.Image or numpy.ndarray) – Input images.
 layers (list of str) – The list of layer names you want to extract.
 size (pair of ints) – The resolution of resized images used as
an input of the CNN. None of the given images are resized
if this argument is
None
, but the resolutions of all the images should be the same.
Returns: A dictionary whose keys are the layer names and whose values are the corresponding feature map variables.
Return type: Dictionary of ~chainer.Variable

predict
(images, oversample=True)[source]¶ Computes all the probabilities of given images.
Parameters:  images (iterable of PIL.Image or numpy.ndarray) – Input images.
 oversample (bool) – If
True
, it averages results across center, corners, and mirrors. Otherwise, it uses only the center.
Returns: Output that contains the class probabilities of given images.
Return type:


chainer.links.model.vision.googlenet.
prepare
(image, size=(224, 224))[source]¶ Converts the given image to the numpy array for GoogLeNet.
Note that you have to call this method before
__call__
because the pre-trained GoogLeNet model requires resizing the given image, converting RGB to BGR, subtracting the mean, and permuting the dimensions before calling.Parameters:  image (PIL.Image or numpy.ndarray) – Input image.
If an input is
numpy.ndarray
, its shape must be(height, width)
,(height, width, channels)
, or(channels, height, width)
, and the order of the channels must be RGB.  size (pair of ints) – Size of converted images.
If
None
, the given image is not resized.
Returns: The converted output array.
Return type: numpy.ndarray
Residual Networks¶

class
chainer.links.model.vision.resnet.
ResNetLayers
(pretrained_model, n_layers)[source]¶ A pretrained CNN model provided by MSRA.
When you specify the path of the pretrained chainer model serialized as a
.npz
file in the constructor, this chain model automatically initializes all the parameters with it. This model is useful when you want to extract a semantic feature vector per image, or fine-tune the model on a different dataset. Note that unlike VGG16Layers
, it does not automatically download a pretrained caffemodel. This caffemodel can be downloaded at GitHub.If you want to manually convert the pretrained caffemodel to a chainer model that can be specified in the constructor, please use
convert_caffemodel_to_npz
classmethod instead.See: K. He et al., Deep Residual Learning for Image Recognition
Parameters:  pretrained_model (str) – the destination of the pretrained
chainer model serialized as a
.npz
file. If this argument is specified asauto
, it automatically loads and converts the caffemodel from$CHAINER_DATASET_ROOT/pfnet/chainer/models/ResNet{nlayers}model.caffemodel
, where$CHAINER_DATASET_ROOT
is set as$HOME/.chainer/dataset
unless you specify another value by modifying the environment variable and {n_layers} is replaced with the specified number of layers given as the first argument to this constructor. Note that in this case the converted chainer model is stored in the same directory and automatically used from the next time. If this argument is specified as None
, none of the parameters are initialized from the pretrained model; instead, the default initializer used in the original paper is applied, i.e., chainer.initializers.HeNormal(scale=1.0)
.  n_layers (int) – The number of layers of this model. It should be either 50, 101, or 152.
Variables: available_layers (list of str) – The list of available layer names used by
__call__
andextract
methods.
__call__
(self, x, layers=['prob'])[source]¶ Computes all the feature maps specified by
layers
.Warning
test
argument is not supported anymore since v2. Instead, usechainer.using_config('train', train)
. Seechainer.using_config()
.Parameters:  x (Variable) – Input variable.
 layers (list of str) – The list of layer names you want to extract.
Returns: A dictionary in which each key is a layer name and each value is the corresponding feature map variable.
Return type: Dictionary of ~chainer.Variable

classmethod
convert_caffemodel_to_npz
(path_caffemodel, path_npz, n_layers=50)[source]¶ Converts a pretrained caffemodel to a chainer model.
Parameters:  path_caffemodel (str) – Path of the pretrained caffemodel.  path_npz (str) – Path of the converted chainer model.  n_layers (int) – The number of layers of this model. It should be either 50, 101, or 152.

extract
(self, images, layers=['pool5'], size=(224, 224))[source]¶ Extracts all the feature maps of given images.
The difference from directly executing
__call__
is that it directly accepts images as an input and automatically transforms them to a proper variable. That is, it is also interpreted as a shortcut method that implicitly callsprepare
and__call__
functions.Warning
test
andvolatile
arguments are not supported anymore since v2. Instead, usechainer.using_config('train', train)
andchainer.using_config('enable_backprop', not volatile)
respectively. Seechainer.using_config()
.Parameters:  images (iterable of PIL.Image or numpy.ndarray) – Input images.
 layers (list of str) – The list of layer names you want to extract.
 size (pair of ints) – The resolution of resized images used as
an input of the CNN. The given images are not resized
if this argument is
None
, but the resolutions of all the images should be the same.
Returns: A dictionary in which each key is a layer name and each value is the corresponding feature map variable.
Return type: Dictionary of ~chainer.Variable

predict
(images, oversample=True)[source]¶ Computes all the probabilities of given images.
Parameters:  images (iterable of PIL.Image or numpy.ndarray) – Input images.
 oversample (bool) – If
True
, it averages results across center, corners, and mirrors. Otherwise, it uses only the center.
Returns: Output that contains the class probabilities of given images.
Return type: Variable

class
chainer.links.
ResNet50Layers
(pretrained_model='auto')[source]¶ A pretrained CNN model with 50 layers provided by MSRA.
When you specify the path of the pretrained chainer model serialized as a
.npz
file in the constructor, this chain model automatically initializes all the parameters with it. This model would be useful when you want to extract a semantic feature vector per image, or finetune the model on a different dataset. Note that unlikeVGG16Layers
, it does not automatically download a pretrained caffemodel. This caffemodel can be downloaded at GitHub.If you want to manually convert the pretrained caffemodel to a chainer model that can be specified in the constructor, please use
convert_caffemodel_to_npz
classmethod instead.ResNet50 has 25,557,096 trainable parameters, which is 58% and 43% fewer than ResNet101 and ResNet152, respectively. On the other hand, its top-5 classification accuracy on the ImageNet dataset drops only 0.7% and 1.1% from ResNet101 and ResNet152, respectively. Therefore, ResNet50 may offer the best balance between accuracy and model size. It is sufficient for many cases, but some advanced models for object detection or semantic segmentation use deeper ResNets as their building blocks, so the deeper variants are provided to ease reproduction work.
See: K. He et al., Deep Residual Learning for Image Recognition
Parameters: pretrained_model (str) – the destination of the pretrained chainer model serialized as a .npz
file. If this argument is specified asauto
, it automatically loads and converts the caffemodel from$CHAINER_DATASET_ROOT/pfnet/chainer/models/ResNet50model.caffemodel
, where$CHAINER_DATASET_ROOT
is set as$HOME/.chainer/dataset
unless you specify another value by modifying the environment variable. Note that in this case the converted chainer model is stored in the same directory and automatically used from the next time. If this argument is specified as None
, none of the parameters are initialized from the pretrained model; instead, the default initializer used in the original paper is applied, i.e., chainer.initializers.HeNormal(scale=1.0)
.Variables: available_layers (list of str) – The list of available layer names used by __call__
andextract
methods.

class
chainer.links.
ResNet101Layers
(pretrained_model='auto')[source]¶ A pretrained CNN model with 101 layers provided by MSRA.
When you specify the path of the pretrained chainer model serialized as a
.npz
file in the constructor, this chain model automatically initializes all the parameters with it. This model would be useful when you want to extract a semantic feature vector per image, or finetune the model on a different dataset. Note that unlikeVGG16Layers
, it does not automatically download a pretrained caffemodel. This caffemodel can be downloaded at GitHub.If you want to manually convert the pretrained caffemodel to a chainer model that can be specified in the constructor, please use
convert_caffemodel_to_npz
classmethod instead.ResNet101 has 44,549,224 trainable parameters, which is 43% fewer than the ResNet152 model, while its top-5 classification accuracy on the ImageNet dataset drops only 1.1% from ResNet152. For many cases, ResNet50 may offer the best balance between accuracy and model size.
See: K. He et al., Deep Residual Learning for Image Recognition
Parameters: pretrained_model (str) – the destination of the pretrained chainer model serialized as a .npz
file. If this argument is specified asauto
, it automatically loads and converts the caffemodel from$CHAINER_DATASET_ROOT/pfnet/chainer/models/ResNet101model.caffemodel
, where$CHAINER_DATASET_ROOT
is set as$HOME/.chainer/dataset
unless you specify another value by modifying the environment variable. Note that in this case the converted chainer model is stored in the same directory and automatically used from the next time. If this argument is specified as None
, none of the parameters are initialized from the pretrained model; instead, the default initializer used in the original paper is applied, i.e., chainer.initializers.HeNormal(scale=1.0)
.Variables: available_layers (list of str) – The list of available layer names used by __call__
andextract
methods.

class
chainer.links.
ResNet152Layers
(pretrained_model='auto')[source]¶ A pretrained CNN model with 152 layers provided by MSRA.
When you specify the path of the pretrained chainer model serialized as a
.npz
file in the constructor, this chain model automatically initializes all the parameters with it. This model would be useful when you want to extract a semantic feature vector per image, or finetune the model on a different dataset. Note that unlikeVGG16Layers
, it does not automatically download a pretrained caffemodel. This caffemodel can be downloaded at GitHub.If you want to manually convert the pretrained caffemodel to a chainer model that can be specified in the constructor, please use
convert_caffemodel_to_npz
classmethod instead.ResNet152, with 60,192,872 trainable parameters, is the deepest ResNet model, and it achieved the best result on the ImageNet classification task in ILSVRC 2015.
See: K. He et al., Deep Residual Learning for Image Recognition
Parameters: pretrained_model (str) – the destination of the pretrained chainer model serialized as a .npz
file. If this argument is specified asauto
, it automatically loads and converts the caffemodel from$CHAINER_DATASET_ROOT/pfnet/chainer/models/ResNet152model.caffemodel
, where$CHAINER_DATASET_ROOT
is set as$HOME/.chainer/dataset
unless you specify another value by modifying the environment variable. Note that in this case the converted chainer model is stored in the same directory and automatically used from the next time. If this argument is specified as None
, none of the parameters are initialized from the pretrained model; instead, the default initializer used in the original paper is applied, i.e., chainer.initializers.HeNormal(scale=1.0)
.Variables: available_layers (list of str) – The list of available layer names used by __call__
andextract
methods.

chainer.links.model.vision.resnet.
prepare
(image, size=(224, 224))[source]¶ Converts the given image to the numpy array for ResNets.
Note that you have to call this method before
__call__
because the pretrained ResNet model requires the given image to be resized, converted from RGB to BGR, mean-subtracted, and dimension-permuted before calling.Parameters:  image (PIL.Image or numpy.ndarray) – Input image.
If an input is
numpy.ndarray
, its shape must be(height, width)
,(height, width, channels)
, or(channels, height, width)
, and the order of the channels must be RGB.  size (pair of ints) – Size of converted images.
If
None
, the given image is not resized.
Returns: The converted output array.
Return type: numpy.ndarray