Standard Function implementations

Chainer provides basic Function implementations in the chainer.functions package. Most of them are wrapped by plain Python functions, which users should use.

Note

As of v1.5, the concept of parameterized functions is gone; they are replaced by corresponding Link implementations. They are still put in the functions namespace for backward compatibility, though it is strongly recommended to use them via the chainer.links package.

Activation functions

clipped_relu

chainer.functions.clipped_relu(x, z=20.0)[source]

Clipped Rectifier Unit function.

For a clipping value \(z (> 0)\), it computes

\[{\rm ClippedReLU}(x, z) = \min(\max(0, x), z).\]

Parameters:
  • x (Variable or numpy.ndarray or cupy.ndarray) – Input variable. A \((s_1, s_2, ..., s_N)\)-shaped float array.
  • z (float) – Clipping value \(z\).
Returns:

Output variable. A \((s_1, s_2, ..., s_N)\)-shaped float array.

Return type:

Variable

Example

>>> x = np.random.uniform(-100, 100, (10, 20)).astype('f')
>>> z = 10.0
>>> np.any(x < 0)
True
>>> np.any(x > z)
True
>>> y = F.clipped_relu(x, z=z)
>>> np.any(y.data < 0)
False
>>> np.any(y.data > z)
False

crelu

chainer.functions.crelu(x, axis=1)[source]

Concatenated Rectified Linear Unit function.

This function is expressed as follows.

\[f(x) = (\max(0, x), \max(0, -x)).\]

Here, two output values are concatenated along an axis.

See: https://arxiv.org/abs/1603.05201

Parameters:
  • x (Variable or numpy.ndarray or cupy.ndarray) – Input variable. A \((s_1, s_2, ..., s_N)\)-shaped float array.
  • axis (int) – Axis that the output values are concatenated along. Default is 1.
Returns:

Output variable of concatenated array. If axis is 1, it is a \((s_1, s_2 \times 2, ..., s_N)\)-shaped float array.

Return type:

Variable

Example

>>> x = np.array([[-1, 0], [2, -3]], 'f')
>>> x
array([[-1.,  0.],
       [ 2., -3.]], dtype=float32)
>>> y = F.crelu(x, axis=1)
>>> y.data
array([[ 0.,  0.,  1.,  0.],
       [ 2.,  0.,  0.,  3.]], dtype=float32)

elu

chainer.functions.elu(x, alpha=1.0)[source]

Exponential Linear Unit function.

For a parameter \(\alpha\), it is expressed as

\[\begin{split}f(x) = \left \{ \begin{array}{ll} x & {\rm if}~ x \ge 0 \\ \alpha (\exp(x) - 1) & {\rm if}~ x < 0, \end{array} \right.\end{split}\]

See: https://arxiv.org/abs/1511.07289

Parameters:
  • x (Variable or numpy.ndarray or cupy.ndarray) – Input variable. A \((s_1, s_2, ..., s_N)\)-shaped float array.
  • alpha (float) – Parameter \(\alpha\). Default is 1.0.
Returns:

Output variable. A \((s_1, s_2, ..., s_N)\)-shaped float array.

Return type:

Variable

Example

>>> x = np.array([[-1, 0], [2, -3]], 'f')
>>> x
array([[-1.,  0.],
       [ 2., -3.]], dtype=float32)
>>> y = F.elu(x, alpha=1.)
>>> y.data
array([[-0.63212055,  0.        ],
       [ 2.        , -0.95021296]], dtype=float32)

hard_sigmoid

chainer.functions.hard_sigmoid(x)[source]

Element-wise hard-sigmoid function.

This function is defined as

\[\begin{split}f(x) = \left \{ \begin{array}{ll} 0 & {\rm if}~ x < -2.5 \\ 0.2 x + 0.5 & {\rm if}~ -2.5 < x < 2.5 \\ 1 & {\rm if}~ 2.5 < x. \end{array} \right.\end{split}\]
Parameters:x (Variable or numpy.ndarray or cupy.ndarray) – Input variable. A \((s_1, s_2, ..., s_N)\)-shaped float array.
Returns:Output variable. A \((s_1, s_2, ..., s_N)\)-shaped float array.
Return type:Variable

Example

It maps the input values into the range of \([0, 1]\).

>>> x = np.array([-2.6, -1, 0, 1, 2.6])
>>> x
array([-2.6, -1. ,  0. ,  1. ,  2.6])
>>> F.hard_sigmoid(x).data
array([ 0. ,  0.3,  0.5,  0.7,  1. ])

leaky_relu

chainer.functions.leaky_relu(x, slope=0.2)[source]

Leaky Rectified Linear Unit function.

This function is expressed as

\[f(x)=\max(x, ax),\]

where \(a\) is a configurable slope value.

Parameters:
  • x (Variable or numpy.ndarray or cupy.ndarray) – Input variable. A \((s_1, s_2, ..., s_N)\)-shaped float array.
  • slope (float) – Slope value \(a\). Default is 0.2.
Returns:

Output variable. A \((s_1, s_2, ..., s_N)\)-shaped float array.

Return type:

Variable

Example

>>> x = np.array([[-1, 0], [2, -3], [-2, 1]], 'f')
>>> x
array([[-1.,  0.],
       [ 2., -3.],
       [-2.,  1.]], dtype=float32)
>>> F.leaky_relu(x, slope=0.2).data
array([[-0.2       ,  0.        ],
       [ 2.        , -0.60000002],
       [-0.40000001,  1.        ]], dtype=float32)

log_softmax

chainer.functions.log_softmax(x, use_cudnn=True)[source]

Channel-wise log-softmax function.

This function computes the logarithm of the softmax along the second axis. Let \(c = (c_1, c_2, \dots, c_D)\) be the slice of x along the second axis. For each slice \(c\), it computes the logarithm of the function \(f(c)\) defined as

\[f(c) = {\exp(c) \over \sum_{d} \exp(c_d)}.\]

This method is theoretically equivalent to log(softmax(x)) but is more stable.

Note

log(softmax(x)) may cause underflow when x is too small, because softmax(x) may return 0. The log_softmax method is more stable.

Parameters:
  • x (Variable or numpy.ndarray or cupy.ndarray) – Input variable. A \(n\)-dimensional (\(n \geq 2\)) float array.
  • use_cudnn (bool) – If True and cuDNN is enabled, then this function uses cuDNN as the core implementation.
Returns:

Output variable. A \(n\)-dimensional (\(n \geq 2\)) float array, which has the same shape as x.

Return type:

Variable

See also

softmax()

Example

>>> x = np.array([[0, 1, 2], [0, 2, 4]], 'f')
>>> x
array([[ 0.,  1.,  2.],
       [ 0.,  2.,  4.]], dtype=float32)
>>> F.log_softmax(x).data
array([[-2.40760589, -1.40760589, -0.40760589],
       [-4.14293146, -2.14293146, -0.14293146]], dtype=float32)
>>> np.allclose(F.log_softmax(x).data, F.log(F.softmax(x)).data)
True

lstm

chainer.functions.lstm(c_prev, x)[source]

Long Short-Term Memory units as an activation function.

This function implements LSTM units with forget gates. Let the previous cell state be c_prev and the input array be x.

First, the input array x is split into four arrays \(a, i, f, o\) of the same shape along the second axis. This means that the second axis of x must be four times as long as the second axis of c_prev.

The split input arrays correspond to:

  • \(a\) : sources of cell input
  • \(i\) : sources of input gate
  • \(f\) : sources of forget gate
  • \(o\) : sources of output gate

Second, it computes the updated cell state c and the outgoing signal h as:

\[\begin{split}c &= \tanh(a) \sigma(i) + c_{\text{prev}} \sigma(f), \\ h &= \tanh(c) \sigma(o),\end{split}\]

where \(\sigma\) is the elementwise sigmoid function. These are returned as a tuple of two variables.

This function supports variable-length inputs. The mini-batch size of the current input must be equal to or smaller than that of the previous one. When the mini-batch size of x is smaller than that of c, this function only updates c[0:len(x)] and does not change the rest of c, c[len(x):]. So, please sort input sequences in descending order of length before applying the function.

Parameters:
  • c_prev (Variable or numpy.ndarray or cupy.ndarray) – Variable that holds the previous cell state. The cell state should be a zero array or the output of the previous call of LSTM.
  • x (Variable or numpy.ndarray or cupy.ndarray) – Variable that holds the sources of cell input, input gate, forget gate and output gate. Its second dimension must be four times the size of the second dimension of the cell state.
Returns:

Two Variable objects c and h. c is the updated cell state. h indicates the outgoing signal.

Return type:

tuple

See the original paper proposing LSTM with forget gates: Long Short-Term Memory in Recurrent Neural Networks.

See also

LSTM

Example

Suppose y is the current incoming signal, c is the previous cell state, and h is the previous outgoing signal of an lstm function. Each of y, c and h has n_units channels. The most typical preparation of x is:

>>> n_units = 100
>>> y = chainer.Variable(np.zeros((1, n_units), 'f'))
>>> h = chainer.Variable(np.zeros((1, n_units), 'f'))
>>> c = chainer.Variable(np.zeros((1, n_units), 'f'))
>>> model = chainer.Chain(w=L.Linear(n_units, 4 * n_units),
...                       v=L.Linear(n_units, 4 * n_units),)
>>> x = model.w(y) + model.v(h)
>>> c, h = F.lstm(c, x)

This computes the input array x, i.e. the input sources \(a, i, f, o\), from the current incoming signal y and the previous outgoing signal h. Different parameters are used for different kinds of input sources.

Note

We use the naming rule below.

  • incoming signal
    The formal input of the LSTM formulation (e.g., in NLP, a word vector or the output of a lower RNN layer). The input of chainer.links.LSTM is the incoming signal.
  • input array
    The array which is linearly transformed from the incoming signal and the previous outgoing signal. The input array contains four sources: the sources of cell input, input gate, forget gate and output gate. The input of chainer.functions.lstm() is the input array.

maxout

chainer.functions.maxout(x, pool_size, axis=1)[source]

Maxout activation function.

It accepts an input tensor x, reshapes the axis dimension (say its size is M * pool_size) into two dimensions (M, pool_size), and takes the maximum along the pool_size dimension.

Parameters:
  • x (Variable or numpy.ndarray or cupy.ndarray) – Input variable. A \(n\)-dimensional (\(n \ge\) axis) float array. In general, its first dimension is assumed to be the minibatch dimension. The other dimensions are treated as one concatenated dimension.
  • pool_size (int) – The size used for downsampling of pooling layer.
  • axis (int) – The axis dimension to be reshaped. The size of axis dimension should be M * pool_size.
Returns:

Output variable. The shape of the output is the same as that of x, except that the axis dimension is transformed from M * pool_size to M.

Return type:

Variable

See also

Maxout

Example

Typically, x is the output of a linear layer or a convolution layer. The following is an example where we use maxout() in combination with a Linear link.

>>> in_size, out_size, pool_size = 10, 10, 10
>>> bias = np.arange(out_size * pool_size).astype('f')
>>> l = L.Linear(in_size, out_size * pool_size, initial_bias=bias)
>>> x = np.zeros((1, in_size), 'f')  # prepare data
>>> x = l(x)
>>> y = F.maxout(x, pool_size)
>>> x.shape
(1, 100)
>>> y.shape
(1, 10)
>>> x.reshape((out_size, pool_size)).data
array([[  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.],
       [ 10.,  11.,  12.,  13.,  14.,  15.,  16.,  17.,  18.,  19.],
       [ 20.,  21.,  22.,  23.,  24.,  25.,  26.,  27.,  28.,  29.],
       [ 30.,  31.,  32.,  33.,  34.,  35.,  36.,  37.,  38.,  39.],
       [ 40.,  41.,  42.,  43.,  44.,  45.,  46.,  47.,  48.,  49.],
       [ 50.,  51.,  52.,  53.,  54.,  55.,  56.,  57.,  58.,  59.],
       [ 60.,  61.,  62.,  63.,  64.,  65.,  66.,  67.,  68.,  69.],
       [ 70.,  71.,  72.,  73.,  74.,  75.,  76.,  77.,  78.,  79.],
       [ 80.,  81.,  82.,  83.,  84.,  85.,  86.,  87.,  88.,  89.],
       [ 90.,  91.,  92.,  93.,  94.,  95.,  96.,  97.,  98.,  99.]], dtype=float32)
>>> y.data
array([[  9.,  19.,  29.,  39.,  49.,  59.,  69.,  79.,  89.,  99.]], dtype=float32)

prelu

chainer.functions.prelu(x, W)[source]

Parametric ReLU function.

It accepts two arguments: an input x and a weight array W and computes the output as \(PReLU(x) = \max(x, W*x)\), where \(*\) is an elementwise multiplication for each sample in the batch.

When the PReLU function is combined with two-dimensional convolution, the elements of the parameter \(W\) are typically shared across the same filter at different pixel positions. In order to support such usage, this function allows the parameter array to have a shape that matches the leading dimensions of the input array, excluding the batch dimension.

For example, if \(W\) has the shape \((2, 3, 4)\), then \(x\) must have a shape of the form \((B, 2, 3, 4, S_1, ..., S_N)\), where \(B\) is the batch size and the number of trailing \(S\)'s is an arbitrary non-negative integer.

Parameters:
  • x (Variable) – Input variable. Its first dimension is assumed to be the minibatch dimension.
  • W (Variable) – Weight variable.
Returns:

Output variable

Return type:

Variable

See also

PReLU
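
Example

The following is an illustrative sketch, not part of the original documentation; it assumes the np and F aliases used in the other examples, with an arbitrary per-channel slope array W matching the non-batch dimension of x:

>>> x = np.array([[-1, 0, 2], [2, -3, 1]], 'f')
>>> W = np.array([0.1, 0.2, 0.3], 'f')  # hypothetical slopes, one per channel
>>> y = F.prelu(x, W)
>>> y.shape
(2, 3)
>>> np.allclose(y.data, np.maximum(x, W * x))
True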

relu

chainer.functions.relu(x, use_cudnn=True)[source]

Rectified Linear Unit function.

\[f(x)=\max(0, x).\]
Parameters:
  • x (Variable or numpy.ndarray or cupy.ndarray) – Input variable. A \((s_1, s_2, ..., s_N)\)-shaped float array.
  • use_cudnn (bool) – If True and cuDNN is enabled, then this function uses cuDNN as the core implementation.
Returns:

Output variable. A \((s_1, s_2, ..., s_N)\)-shaped float array.

Return type:

Variable

Example

>>> x = np.array([[-1, 0], [2, -3], [-2, 1]], 'f')
>>> np.any(x < 0)
True
>>> y = F.relu(x)
>>> np.any(y.data < 0)
False
>>> y.shape
(3, 2)

sigmoid

chainer.functions.sigmoid(x, use_cudnn=True)[source]

Element-wise sigmoid logistic function.

\[f(x)=(1 + \exp(-x))^{-1}.\]
Parameters:
  • x (Variable or numpy.ndarray or cupy.ndarray) – Input variable. A \((s_1, s_2, ..., s_N)\)-shaped float array.
  • use_cudnn (bool) – If True and cuDNN is enabled, then this function uses cuDNN as the core implementation.
Returns:

Output variable. A \((s_1, s_2, ..., s_N)\)-shaped float array.

Return type:

Variable

Example

It maps the input values into the range of \([0, 1]\).

>>> x = np.arange(-2, 3, 2).astype('f')
>>> x
array([-2.,  0.,  2.], dtype=float32)
>>> F.sigmoid(x).data
array([ 0.11920291,  0.5       ,  0.88079709], dtype=float32)

slstm

chainer.functions.slstm(c_prev1, c_prev2, x1, x2)[source]

S-LSTM units as an activation function.

This function implements the S-LSTM unit, an extension of the LSTM unit for tree structures. The function is applied to binary trees, where each node has two child nodes. It takes four arguments: the previous cell states c_prev1 and c_prev2, and the input arrays x1 and x2.

First, both input arrays x1 and x2 are split into eight arrays \(a_1, i_1, f_1, o_1\) and \(a_2, i_2, f_2, o_2\) of the same shape along the second axis. This means that the second axis of x1 and x2 must be four times as long as the second axis of c_prev1 and c_prev2.

The split input arrays correspond to:

  • \(a_i\) : sources of cell input
  • \(i_i\) : sources of input gate
  • \(f_i\) : sources of forget gate
  • \(o_i\) : sources of output gate

It computes the updated cell state c and the outgoing signal h as:

\[\begin{split}c &= \tanh(a_1 + a_2) \sigma(i_1 + i_2) + c_{\text{prev}1} \sigma(f_1) + c_{\text{prev}2} \sigma(f_2), \\ h &= \tanh(c) \sigma(o_1 + o_2),\end{split}\]

where \(\sigma\) is the elementwise sigmoid function. The function returns c and h as a tuple.

Parameters:
  • c_prev1 (Variable or numpy.ndarray or cupy.ndarray) – Variable that holds the previous cell state of the first child node. The cell state should be a zero array or the output of the previous call of LSTM.
  • c_prev2 (Variable or numpy.ndarray or cupy.ndarray) – Variable that holds the previous cell state of the second child node.
  • x1 (Variable or numpy.ndarray or cupy.ndarray) – Variable that holds the sources of cell input, input gate, forget gate and output gate from the first child node. It must have the second dimension whose size is four times of that of the cell state.
  • x2 (Variable or numpy.ndarray or cupy.ndarray) – Variable that holds the input sources from the second child node.
Returns:

Two Variable objects c and h. c is the cell state. h indicates the outgoing signal.

Return type:

tuple

See detail in paper: Long Short-Term Memory Over Tree Structures.

Example

Suppose c1 and c2 are the previous cell states of the child nodes, and h1 and h2 are their previous outgoing signals. Each of c1, c2, h1 and h2 has n_units channels. The most typical preparation of x1 and x2 is:

>>> n_units = 100
>>> h1 = chainer.Variable(np.zeros((1, n_units), 'f'))
>>> h2 = chainer.Variable(np.zeros((1, n_units), 'f'))
>>> c1 = chainer.Variable(np.zeros((1, n_units), 'f'))
>>> c2 = chainer.Variable(np.zeros((1, n_units), 'f'))
>>> model1 = chainer.Chain(w=L.Linear(n_units, 4 * n_units),
...                        v=L.Linear(n_units, 4 * n_units))
>>> model2 = chainer.Chain(w=L.Linear(n_units, 4 * n_units),
...                        v=L.Linear(n_units, 4 * n_units))
>>> x1 = model1.w(c1) + model1.v(h1)
>>> x2 = model2.w(c2) + model2.v(h2)
>>> c, h = F.slstm(c1, c2, x1, x2)

This computes the input array x1, i.e. the input sources \(a_1, i_1, f_1, o_1\), from the previous cell state c1 and the previous outgoing signal h1 of the first child node. Different parameters are used for different kinds of input sources.

softmax

chainer.functions.softmax(x, use_cudnn=True, axis=1)[source]

Softmax function.

This function computes the softmax along the given axis. Let \(c = (c_1, c_2, \dots, c_D)\) be a slice of x along that axis. For each slice \(c\), it computes the probability \(p(c)\) defined as \(p(c) = {\exp(c) \over \sum_{d} \exp(c_d)}\).

Parameters:
  • x (Variable) – Input variable.
  • use_cudnn (bool) – If True and cuDNN is enabled, then this function uses cuDNN as the core implementation.
  • axis – The axis along which the softmax is to be computed.
Returns:

Output variable.

Return type:

Variable
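
Example

A minimal sketch for illustration (not from the original documentation); each slice along the softmax axis sums to one:

>>> x = np.array([[0, 1, 2], [0, 2, 4]], 'f')
>>> y = F.softmax(x)
>>> y.shape
(2, 3)
>>> np.allclose(y.data.sum(axis=1), 1.0)
True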

softplus

chainer.functions.softplus(x, beta=1.0)[source]

Element-wise softplus function.

The softplus function is a smooth approximation of ReLU.

\[f(x)=\frac{1}{\beta}\log(1 + \exp(\beta x)),\]

where \(\beta\) is a parameter. The function approaches ReLU as \(\beta\) increases.

Parameters:
  • x (Variable or numpy.ndarray or cupy.ndarray) – Input variable. A \((s_1, s_2, ..., s_N)\)-shaped float array.
  • beta (float) – Parameter \(\beta\). Default is 1.0.
Returns:

Output variable. A \((s_1, s_2, ..., s_N)\)-shaped float array.

Return type:

Variable

Example

>>> x = np.arange(-2, 3, 2).astype('f')
>>> x
array([-2.,  0.,  2.], dtype=float32)
>>> F.softplus(x, beta=1.0).data
array([ 0.126928  ,  0.69314718,  2.12692809], dtype=float32)

tanh

chainer.functions.tanh(x, use_cudnn=True)[source]

Elementwise hyperbolic tangent function.

\[f(x)=\tanh(x).\]
Parameters:
  • x (Variable or numpy.ndarray or cupy.ndarray) – Input variable. A \((s_1, s_2, ..., s_N)\)-shaped float array.
  • use_cudnn (bool) – If True and cuDNN is enabled, then this function uses cuDNN as the core implementation.
Returns:

Output variable. A \((s_1, s_2, ..., s_N)\)-shaped float array.

Return type:

Variable

Example

>>> x = np.arange(-1, 4, 2).astype('f')
>>> x
array([-1.,  1.,  3.], dtype=float32)
>>> F.tanh(x).data
array([-0.76159418,  0.76159418,  0.99505478], dtype=float32)

Array manipulations

broadcast

chainer.functions.broadcast(*args)[source]

Broadcast given variables.

Parameters:args (Variable or numpy.ndarray or cupy.ndarray) – Input variables to be broadcast. The shapes of the input variables must be compatible under broadcasting rules.
Returns:Variable or tuple of Variable objects which are broadcasted from given arguments.
Return type:Variable

Example

>>> x = np.random.uniform(0, 1, (3, 2)).astype('f')
>>> y = F.broadcast(x)
>>> np.all(x == y.data)
True
>>> z = np.random.uniform(0, 1, (3, 2)).astype('f')
>>> y, w = F.broadcast(x, z)
>>> np.all(x == y.data) & np.all(z == w.data)
True

broadcast_to

chainer.functions.broadcast_to(x, shape)[source]

Broadcast a given variable to a given shape.

Parameters:
  • x (Variable or numpy.ndarray or cupy.ndarray) – Input variable to be broadcast.
  • shape (tuple of int) – The shape of the output variable.
Returns:

Output variable broadcasted to the given shape.

Return type:

Variable

Example

>>> x = np.arange(0, 3)
>>> x
array([0, 1, 2])
>>> y = F.broadcast_to(x, (3, 3))
>>> y.data
array([[0, 1, 2],
       [0, 1, 2],
       [0, 1, 2]])

cast

chainer.functions.cast(x, typ)[source]

Cast an input variable to a given type.

Parameters:
  • x (Variable or numpy.ndarray or cupy.ndarray) – Input variable.
  • typ (str or numpy.dtype) – Typecode or data type to cast to.
Returns:

Variable holding a casted array.

Return type:

Variable

Example

>>> x = np.arange(0, 3, dtype=np.float64)
>>> x.dtype
dtype('float64')
>>> y = F.cast(x, np.float32)
>>> y.dtype
dtype('float32')
>>> y = F.cast(x, 'float16')
>>> y.dtype
dtype('float16')

concat

chainer.functions.concat(xs, axis=1)[source]

Concatenates given variables along an axis.

Parameters:
  • xs (tuple of Variable or numpy.ndarray or cupy.ndarray) – Input variables to be concatenated. The variables must have the same shape, except in the dimension corresponding to axis.
  • axis (int) – The axis along which the arrays will be joined. Default is 1.
Returns:

The concatenated variable.

Return type:

Variable

Example

>>> x = np.arange(0, 12).reshape(3, 4)
>>> x
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> y = np.arange(0, 3).reshape(3, 1)
>>> y
array([[0],
       [1],
       [2]])
>>> z = F.concat((x, y), axis=1)
>>> z.data
array([[ 0,  1,  2,  3,  0],
       [ 4,  5,  6,  7,  1],
       [ 8,  9, 10, 11,  2]])

copy

chainer.functions.copy(x, dst)[source]

Copies the input variable onto the specified device.

This function copies the array of the input variable onto the device specified by dst. When dst == -1, it copies the array onto the host memory. This function supports copies from host to host, from host to device, from device to device and from device to host.

Parameters:
  • x (Variable or numpy.ndarray or cupy.ndarray) – Variable to be copied.
  • dst (int) – Target device specifier. When dst == -1, the array is copied to the host memory.
Returns:

Output variable.

Return type:

Variable

Example

>>> import chainer.cuda as cuda
>>> x = np.random.uniform(-1, 1, (5, 10))
>>> cuda.get_device_from_array(x).id
-1
>>> y = F.copy(x, 0) # from host to device0
>>> cuda.get_device_from_array(y.data).id
0
>>> z = F.copy(y, -1) # from device0 to host
>>> cuda.get_device_from_array(z.data).id
-1

depth2space

chainer.functions.depth2space(X, r)[source]

Computes the depth2space transformation for subpixel calculations.

Parameters:
  • X (Variable or numpy.ndarray or cupy.ndarray) – Variable holding a 4d array of shape (batch, channel * r * r, dim1, dim2).
  • r (int) – The upscaling factor.
Returns:

A variable holding the upscaled array from interspersed depth layers. The shape is (batch, channel, dim1 * r, dim2 * r).

Return type:

Variable

Note

This can be used to compute super-resolution transformations. See https://arxiv.org/abs/1609.05158 for details.

See also

space2depth()

Example

>>> X = np.arange(24).reshape(1, 4, 2, 3).astype('f')
>>> X.shape
(1, 4, 2, 3)
>>> X
array([[[[  0.,   1.,   2.],
         [  3.,   4.,   5.]],

        [[  6.,   7.,   8.],
         [  9.,  10.,  11.]],

        [[ 12.,  13.,  14.],
         [ 15.,  16.,  17.]],

        [[ 18.,  19.,  20.],
         [ 21.,  22.,  23.]]]], dtype=float32)
>>> y = F.depth2space(X, 2)
>>> y.shape
(1, 1, 4, 6)
>>> y.data
array([[[[  0.,   6.,   1.,   7.,   2.,   8.],
         [ 12.,  18.,  13.,  19.,  14.,  20.],
         [  3.,   9.,   4.,  10.,   5.,  11.],
         [ 15.,  21.,  16.,  22.,  17.,  23.]]]], dtype=float32)

dstack

chainer.functions.dstack(xs)[source]

Concatenate variables along third axis (depth wise).

Parameters:xs (list of Variable or numpy.ndarray or cupy.ndarray) – Input variables to be concatenated. The variables must have the same ndim. When the variables have the third axis (i.e. \(ndim \geq 3\)), the variables must have the same shape along all but the third axis. When the variables do not have the third axis (i.e. \(ndim < 3\)), the variables must have the same shape.
Returns:Output variable. When the input variables have the third axis (i.e. \(ndim \geq 3\)), the shapes of inputs and output are the same along all but the third axis. The length of the third axis is the sum of the lengths of the inputs’ third axes. When the shapes of the variables are (N1, N2) (i.e. \(ndim = 2\)), the shape of the output is (N1, N2, 2). When the shapes of the variables are (N1,) (i.e. \(ndim = 1\)), the shape of the output is (1, N1, 2). When the shape of the variables is () (i.e. \(ndim = 0\)), the shape of the output is (1, 1, 2).
Return type:Variable

Example

>>> x1 = np.array((1, 2, 3))
>>> x1.shape
(3,)
>>> x2 = np.array((2, 3, 4))
>>> x2.shape
(3,)
>>> y = F.dstack((x1, x2))
>>> y.shape
(1, 3, 2)
>>> y.data
array([[[1, 2],
        [2, 3],
        [3, 4]]])
>>> x1 = np.arange(0, 6).reshape(3, 2)
>>> x1.shape
(3, 2)
>>> x1
array([[0, 1],
       [2, 3],
       [4, 5]])
>>> x2 = np.arange(6, 12).reshape(3, 2)
>>> x2.shape
(3, 2)
>>> x2
array([[ 6,  7],
       [ 8,  9],
       [10, 11]])
>>> y = F.dstack([x1, x2])
>>> y.shape
(3, 2, 2)
>>> y.data
array([[[ 0,  6],
        [ 1,  7]],

       [[ 2,  8],
        [ 3,  9]],

       [[ 4, 10],
        [ 5, 11]]])
>>> x1 = np.arange(0, 12).reshape(3, 2, 2)
>>> x2 = np.arange(12, 18).reshape(3, 2, 1)
>>> y = F.dstack([x1, x2])
>>> y.shape
(3, 2, 3)
>>> y.data
array([[[ 0,  1, 12],
        [ 2,  3, 13]],

       [[ 4,  5, 14],
        [ 6,  7, 15]],

       [[ 8,  9, 16],
        [10, 11, 17]]])

expand_dims

chainer.functions.expand_dims(x, axis)[source]

Expands dimensions of an input variable without copy.

Parameters:
  • x (Variable or numpy.ndarray or cupy.ndarray) – Input variable.
  • axis (int) – Position where the new axis is to be inserted. The axis parameter is acceptable when \(-ndim - 1 \leq axis \leq ndim\) (ndim is the dimension of the input variable). When \(axis < 0\), the result is the same as \(ndim + 1 - |axis|\).
Returns:

Variable that holds an expanded input. The ndim of the output is one greater than that of x.

Return type:

Variable

Example

>>> x = np.array([1, 2, 3])
>>> x.shape
(3,)
>>> y = F.expand_dims(x, axis=0)
>>> y.shape
(1, 3)
>>> y.data
array([[1, 2, 3]])
>>> y = F.expand_dims(x, axis=1)
>>> y.shape
(3, 1)
>>> y.data
array([[1],
       [2],
       [3]])
>>> y = F.expand_dims(x, axis=-2)
>>> y.shape
(1, 3)
>>> y.data
array([[1, 2, 3]])

flatten

chainer.functions.flatten(x)[source]

Flatten a given array.

Parameters:x (Variable) – Input variable.
Returns:Output variable.
Return type:Variable
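
Example

An illustrative sketch (not part of the original documentation), showing that the output is a one-dimensional view of the input:

>>> x = np.arange(6).reshape(2, 3).astype('f')
>>> y = F.flatten(x)
>>> y.shape
(6,)
>>> y.data
array([ 0.,  1.,  2.,  3.,  4.,  5.], dtype=float32)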

fliplr

chainer.functions.fliplr(a)[source]

Flip array in the left/right direction.

Parameters:a (Variable) – Input variable.
Returns:Output variable.
Return type:Variable
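
Example

An illustrative sketch (assumed, not from the original documentation); the result matches numpy.fliplr:

>>> x = np.arange(6).reshape(2, 3).astype('f')
>>> y = F.fliplr(x)
>>> np.all(y.data == np.fliplr(x))
True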

flipud

chainer.functions.flipud(a)[source]

Flip array in the up/down direction.

Parameters:a (Variable) – Input variable.
Returns:Output variable.
Return type:Variable
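
Example

An illustrative sketch (assumed, not from the original documentation); the result matches numpy.flipud:

>>> x = np.arange(6).reshape(2, 3).astype('f')
>>> y = F.flipud(x)
>>> np.all(y.data == np.flipud(x))
True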

get_item

chainer.functions.get_item(x, slices)[source]

Extract elements from array with specified shape, axes and offsets.

Parameters:
  • x (Variable) – A variable to be sliced.
  • slices (int, slice, Ellipsis, None, integer array-like, boolean array-like or tuple of them) – It is an integer, a slice, an ellipsis, a numpy.newaxis, an integer array-like, a boolean array-like or tuple of them.
Returns:

A Variable object which contains the sliced array of x.

Return type:

Variable

Note

It only supports types that are supported by CUDA’s atomicAdd when an integer array is included in slices. The supported types are numpy.float32, numpy.int32, numpy.uint32, numpy.uint64 and numpy.ulonglong.

Note

It does not support slices that contain multiple boolean arrays.

Note

See the NumPy documentation for details of indexing.
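
Example

A minimal sketch (not part of the original documentation), showing that the result is equivalent to basic NumPy indexing on the underlying array:

>>> x = np.arange(12).reshape(3, 4).astype('f')
>>> y = F.get_item(x, (slice(None), 1))  # same as x[:, 1]
>>> y.shape
(3,)
>>> np.all(y.data == x[:, 1])
True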

hstack

chainer.functions.hstack(xs)[source]

Concatenate variables horizontally (column wise).

Parameters:xs (list of Variable or numpy.ndarray or cupy.ndarray) – Input variables to be concatenated. The variables must have the same ndim. When the variables have the second axis (i.e. \(ndim \geq 2\)), the variables must have the same shape along all but the second axis. When the variables do not have the second axis (i.e. \(ndim < 2\)), the variables need not have the same shape.
Returns:Output variable. When the input variables have the second axis (i.e. \(ndim \geq 2\)), the shapes of inputs and output are the same along all but the second axis. The length of the second axis is the sum of the lengths of the inputs’ second axes. When the variables do not have the second axis (i.e. \(ndim < 2\)), the shape of the output is (N, ) (N is the sum of the input variables’ sizes).
Return type:Variable

Example

>>> x1 = np.array((1, 2, 3))
>>> x1.shape
(3,)
>>> x2 = np.array((2, 3, 4))
>>> x2.shape
(3,)
>>> y = F.hstack((x1, x2))
>>> y.shape
(6,)
>>> y.data
array([1, 2, 3, 2, 3, 4])
>>> x1 = np.arange(0, 12).reshape(3, 4)
>>> x1.shape
(3, 4)
>>> x1
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> x2 = np.arange(12, 18).reshape(3, 2)
>>> x2.shape
(3, 2)
>>> x2
array([[12, 13],
       [14, 15],
       [16, 17]])
>>> y = F.hstack([x1, x2])
>>> y.shape
(3, 6)
>>> y.data
array([[ 0,  1,  2,  3, 12, 13],
       [ 4,  5,  6,  7, 14, 15],
       [ 8,  9, 10, 11, 16, 17]])

im2col

chainer.functions.im2col(x, ksize, stride=1, pad=0, cover_all=False, dilate=1)[source]

Extract patches from an image based on the filter.

This function rearranges patches of an image and puts them in the channel dimension of the output.

Patches are extracted at positions shifted by multiples of stride from the first position -pad for each spatial axis. The right-most (or bottom-most) patches do not run over the padded spatial size.

Notation: here is a notation for dimensionalities.

  • \(n\) is the batch size.
  • \(c\) is the number of the input channels.
  • \(h\) and \(w\) are the height and width of the input image, respectively.
  • \(k_H\) and \(k_W\) are the height and width of the filters, respectively.
  • \(s_Y\) and \(s_X\) are the strides of the filter.
  • \(p_H\) and \(p_W\) are the spatial padding sizes.
  • \(d_Y\) and \(d_X\) are the dilation factors of filter application.

The output size \((h_O, w_O)\) is determined by the following equations when cover_all = False:

\[\begin{split}h_O &= (h + 2p_H - k_H - (k_H - 1) * (d_Y - 1)) / s_Y + 1,\\ w_O &= (w + 2p_W - k_W - (k_W - 1) * (d_X - 1)) / s_X + 1.\end{split}\]

When cover_all = True, the output size is determined by the following equations:

\[\begin{split}h_O &= (h + 2p_H - k_H - (k_H - 1) * (d_Y - 1) + s_Y - 1) / s_Y + 1,\\ w_O &= (w + 2p_W - k_W - (k_W - 1) * (d_X - 1) + s_X - 1) / s_X + 1.\end{split}\]
Parameters:
  • x (Variable) – Input variable of shape \((n, c, h, w)\).
  • ksize (int or pair of ints) – Size of filters (a.k.a. kernels). ksize=k and ksize=(k, k) are equivalent.
  • stride (int or pair of ints) – Stride of filter applications. stride=s and stride=(s, s) are equivalent.
  • pad (int or pair of ints) – Spatial padding width for input arrays. pad=p and pad=(p, p) are equivalent.
  • cover_all (bool) – If True, all spatial locations are rearranged into some output pixels. It may make the output size larger.
  • dilate (int or pair of ints) – Dilation factor of filter applications. dilate=d and dilate=(d, d) are equivalent.
Returns:

Output variable whose shape is \((n, c \cdot k_H \cdot k_W, h_O, w_O)\)

Return type:

Variable
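
Example

An illustrative sketch (not from the original documentation), checking the output shape against the equations above for ksize=3, stride=1 and pad=0:

>>> x = np.random.uniform(0, 1, (1, 3, 10, 10)).astype('f')
>>> y = F.im2col(x, ksize=3)
>>> y.shape  # (n, c * k_H * k_W, h_O, w_O) with h_O = w_O = (10 - 3) / 1 + 1 = 8
(1, 27, 8, 8)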

pad

chainer.functions.pad(x, pad_width, mode, **keywords)[source]

Pad an input variable.

Parameters:
  • x (chainer.Variable or numpy.ndarray or cupy.ndarray) – Input data.
  • pad_width (int or array-like) – Number of values padded to the edges of each axis.
  • mode (str) – Specifies how the function fills the periphery of the array. constant: pads with a constant value.
  • constant_values (int or array-like) – The values padded for each axis.
Returns:

Output variable.

Return type:

Variable
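
Example

A minimal sketch (not part of the original documentation); the behaviour follows numpy.pad with mode 'constant':

>>> x = np.arange(6).reshape(2, 3).astype('f')
>>> y = F.pad(x, 1, 'constant', constant_values=0)
>>> y.shape
(4, 5)
>>> np.all(y.data == np.pad(x, 1, 'constant', constant_values=0))
True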

permutate

chainer.functions.permutate(x, indices, axis=0, inv=False)[source]

Permutates a given variable along an axis.

This function permutes x with the given indices. That means y[i] = x[indices[i]] for all i. Note that this result is the same as y = x.take(indices). indices must be a permutation of [0, 1, ..., len(x) - 1].

When inv is True, indices is treated as its inverse. That means y[indices[i]] = x[i].

Parameters:
  • x (Variable) – Variable to permutate.
  • indices (Variable) – Indices to extract from the variable.
  • axis (int) – Axis along which the input array is permuted.
  • inv (bool) – If True, indices is treated as its inverse.
Returns:

Output variable.

Return type:

Variable
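
Example

An illustrative sketch (not from the original documentation), using an int32 index array (an assumption about the expected index dtype):

>>> x = np.arange(6).reshape(3, 2).astype('f')
>>> indices = np.array([2, 0, 1], 'i')
>>> y = F.permutate(x, indices)
>>> np.all(y.data == x[indices])
True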

reshape

chainer.functions.reshape(x, shape)[source]

Reshapes an input variable without copy.

Parameters:
  • x (Variable) – Input variable.
  • shape (tuple of ints) – Target shape.
Returns:

Variable that holds a reshaped version of the input variable.

Return type:

Variable
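
Example

A minimal sketch (not part of the original documentation):

>>> x = np.arange(6).astype('f')
>>> y = F.reshape(x, (2, 3))
>>> y.shape
(2, 3)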

resize_images

chainer.functions.resize_images(x, output_shape)[source]

Resize images to the given shape.

This function resizes 2D data to output_shape. Currently, only bilinear interpolation is supported as the sampling method.

Notation: here is a notation for dimensionalities.

  • \(n\) is the batch size.
  • \(c_I\) is the number of the input channels.
  • \(h\) and \(w\) are the height and width of the input image, respectively.
  • \(h_O\) and \(w_O\) are the height and width of the output image.
Parameters:
  • x (Variable) – Input variable of shape \((n, c_I, h, w)\).
  • output_shape (tuple) – This is a tuple of length 2 whose values are (h_O, w_O). Note that the order of height and width is opposite of the one in OpenCV.
Returns:

Resized image whose shape is \((n, c_I, h_O, w_O)\).

Return type:

Variable
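
Example

An illustrative sketch (not from the original documentation), checking only the output shape of the bilinear resizing:

>>> x = np.arange(16).reshape(1, 1, 4, 4).astype('f')
>>> y = F.resize_images(x, (8, 8))
>>> y.shape
(1, 1, 8, 8)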

rollaxis

chainer.functions.rollaxis(x, axis, start=0)[source]

Roll the axis backwards to the given position.

Parameters:
  • x (Variable) – Input variable.
  • axis (int) – The axis to roll backwards.
  • start (int) – The place to which the axis is moved.
Returns:

Variable whose axis is rolled.

Return type:

Variable
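
Example

A minimal sketch (not part of the original documentation); the behaviour mirrors numpy.rollaxis:

>>> x = np.zeros((2, 3, 4), 'f')
>>> y = F.rollaxis(x, 2)  # roll axis 2 to position 0
>>> y.shape
(4, 2, 3)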

select_item

chainer.functions.select_item(x, t)[source]

Select elements stored in given indices.

This function returns t.choose(x.T), which means y[i] == x[i, t[i]] for all i.

Parameters:
  • x (Variable) – Variable storing arrays.
  • t (Variable) – Variable storing index numbers.
Returns:

Variable that holds t-th element of x.

Return type:

Variable
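
Example

An illustrative sketch (not from the original documentation); t[i] selects the column of row i:

>>> x = np.array([[0, 1, 2], [3, 4, 5]], 'f')
>>> t = np.array([2, 0], 'i')
>>> y = F.select_item(x, t)
>>> y.data
array([ 2.,  3.], dtype=float32)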

separate

chainer.functions.separate(x, axis=0)[source]

Separates an array along a given axis.

This function separates an array along a given axis. For example, if the shape of an array is (2, 3, 4) and it is separated along axis=1, the result is three arrays of shape (2, 4).

This function is an inverse of chainer.functions.stack().

Parameters:
  • x (chainer.Variable) – Variable to be separated.
  • axis (int) – Axis along which variables are separated.
Returns:

Output variables.

Return type:

tuple of chainer.Variable
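
Example

A minimal sketch (not part of the original documentation); separating a (2, 3) array along axis 0 yields two (3,) variables:

>>> x = np.arange(6).reshape(2, 3).astype('f')
>>> ys = F.separate(x, axis=0)
>>> len(ys)
2
>>> ys[0].shape
(3,)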

space2depth

chainer.functions.space2depth(X, r)[source]

Computes the space2depth transformation for subpixel calculations.

Parameters:
  • X (Variable or numpy.ndarray or cupy.ndarray) – Variable holding a 4d array of shape (batch, channel, dim1 * r, dim2 * r).
  • r (int) – The downscaling factor.
Returns:

A variable holding the downscaled layer array from subpixel array sampling. The shape is (batch, channel * r * r, dim1, dim2).

Return type:

Variable

Note

This can be used to compute inverse super-resolution transformations. See https://arxiv.org/abs/1609.05158 for details.

See also

depth2space()

Example

>>> X = np.arange(24).reshape(1, 1, 4, 6).astype('f')
>>> X.shape
(1, 1, 4, 6)
>>> X
array([[[[  0.,   1.,   2.,   3.,   4.,   5.],
         [  6.,   7.,   8.,   9.,  10.,  11.],
         [ 12.,  13.,  14.,  15.,  16.,  17.],
         [ 18.,  19.,  20.,  21.,  22.,  23.]]]], dtype=float32)
>>> y = F.space2depth(X, 2)
>>> y.shape
(1, 4, 2, 3)
>>> y.data
array([[[[  0.,   2.,   4.],
         [ 12.,  14.,  16.]],

        [[  1.,   3.,   5.],
         [ 13.,  15.,  17.]],

        [[  6.,   8.,  10.],
         [ 18.,  20.,  22.]],

        [[  7.,   9.,  11.],
         [ 19.,  21.,  23.]]]], dtype=float32)

spatial_transformer_grid

chainer.functions.spatial_transformer_grid(theta, output_shape, use_cudnn=True)[source]

2D Spatial Transformer grid.

This function generates coordinates of the points sampled from an image to perform warping described in Spatial Transformer Networks.

Given a coordinate in the warped image \((x_i^t, y_i^t)\), the point sampled from the source image \((x_i^s, y_i^s)\) is calculated by the following equation.

\[\begin{split}\left(\begin{matrix} x_i^s \\ y_i^s \end{matrix}\right) = \left(\begin{matrix} \theta_{11} & \theta_{12} & \theta_{13} \\ \theta_{21} & \theta_{22} & \theta_{23} \end{matrix}\right) \left(\begin{matrix} x_i^t \\ y_i^t \\ 1 \end{matrix}\right)\end{split}\]

Notation: here is a notation for dimensionalities.

  • \(n\) is the batch size.
  • \(h_O\) and \(w_O\) are the height and the width of the output image.
Parameters:
  • theta (Variable) – An array of shape \((n, 2, 3)\). This is a batch of \(2 \times 3\) matrix used for the warping described above.
  • output_shape (tuple) – A tuple of 2 elements: \(h_O, w_O\).
  • use_cudnn (bool) – If True, then this function uses cuDNN if available. Note that, cuDNN supports SpatialTransformerGrid from version 5.0.0.
Returns:

A variable of shape \((n, 2, h_O, w_O)\). In the 2nd dimension, the first element is the coordinate along the x axis, and the second element is the coordinate along the y axis. All the coordinates in the image are scaled to fit range \([-1, 1]\). This means that the coordinate \((-1, -1)\) corresponds to the upper-left corner of the input image.

Return type:

Variable

spatial_transformer_sampler

chainer.functions.spatial_transformer_sampler(x, grid, use_cudnn=True)[source]

2D Spatial Transformer sampler.

This is a differentiable image sampler. With a set of sampling points grid and an input feature map x, this produces a sampled output feature map.

This function currently only supports bilinear interpolation as a sampling kernel.

When a coordinate in grid is outside the range \([-1, 1]\), the value is sampled from a zero-padded input image.

Notation: here is a notation for dimensionalities.

  • \(n\) is the batch size.
  • \(c_I\) is the number of the input channels.
  • \(h\) and \(w\) are the height and width of the input image, respectively.
  • \(h_O\) and \(w_O\) are the height and width of the output image.

See detail in the following paper: Spatial Transformer Networks.

Parameters:
  • x (Variable) – Input variable of shape \((n, c_I, h, w)\).
  • grid (Variable) –

    Coordinate variable of shape \((n, 2, h_O, w_O)\). Each coordinate defines the spatial location in the input where a sampling kernel is applied to get the value at a particular pixel in the output. grid[idx, :, i, j] corresponds to the coordinate that is used to sample the values for an output pixel at location \((i, j)\).

    In the second dimension, the first coordinate corresponds to the location along the horizontal axis, and the second coordinate corresponds to the location along the vertical axis.

    The coordinate \((-1, -1)\) corresponds to the upper-left corner of the input image.

  • use_cudnn (bool) – If True, then this function uses cuDNN if available. Note that, cuDNN supports SpatialTransformerSampler from version 5.0.0.
Returns:

Output feature map of shape \((n, c_I, h_O, w_O)\).

Return type:

Variable

split_axis

chainer.functions.split_axis(x, indices_or_sections, axis, force_tuple=False)[source]

Splits given variables along an axis.

Parameters:
  • x (Variable or numpy.ndarray or cupy.ndarray) – A variable to be split.
  • indices_or_sections (int or 1-D array) – If this argument is an integer, N, the array will be divided into N equal arrays along axis. If it is a 1-D array of sorted integers, it indicates the positions where the array is split.
  • axis (int) – Axis that the input array is split along.
  • force_tuple (bool) – If True, this method returns a tuple even when the number of outputs is one.
Returns:

Tuple of Variable objects if the number of outputs is more than 1, or a Variable otherwise. When force_tuple is True, the returned value is always a tuple regardless of the number of outputs.

Return type:

tuple or Variable

Note

This function raises ValueError if at least one of the outputs is split to zero-size (i.e. axis-th value of its shape is zero).
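
Example

An illustrative sketch (not from the original documentation), splitting a (3, 4) array into two equal parts along the second axis:

>>> x = np.arange(12).reshape(3, 4).astype('f')
>>> y1, y2 = F.split_axis(x, 2, axis=1)
>>> y1.shape
(3, 2)
>>> y2.shape
(3, 2)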

squeeze

chainer.functions.squeeze(x, axis=None)[source]

Remove dimensions of size one from the shape of an ndarray.

Parameters:
  • x (chainer.Variable or numpy.ndarray or cupy.ndarray) – Input data.
  • axis (None or int or tuple of ints) – A subset of the single-dimensional entries in the shape to remove. If None is supplied, all of them are removed. The dimension index starts at zero. If an axis with dimension greater than one is selected, an error is raised.
Returns:

Variable whose dimensions of size 1 are removed.

Return type:

Variable
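
Example

A minimal sketch (not part of the original documentation):

>>> x = np.zeros((1, 3, 1, 2), 'f')
>>> F.squeeze(x).shape
(3, 2)
>>> F.squeeze(x, axis=0).shape
(3, 1, 2)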

stack

chainer.functions.stack(xs, axis=0)[source]

Concatenate variables along a new axis.

Parameters:
  • xs (list of Variable or numpy.ndarray or cupy.ndarray) – Input variables to be concatenated. The variables must have the same shape.
  • axis (int) – The axis along which the arrays will be stacked. The axis parameter is acceptable when \(-ndim - 1 \leq axis \leq ndim\) (ndim is the dimension of input variables). When \(axis < 0\), the result is the same as \(ndim + 1 - |axis|\).
Returns:

Output variable. Let x_1, x_2, ..., x_n and y be the input variables and the output variable. Then y[:, ..., 0, ..., :] is x_1, y[:, ..., 1, ..., :] is x_2, and y[:, ..., n-1, ..., :] is x_n, where the shown index is taken along axis.

Return type:

Variable

Example

>>> x1 = np.arange(0, 12).reshape(3, 4)
>>> x1.shape
(3, 4)
>>> x1
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> x2 = np.arange(12, 24).reshape(3, 4)
>>> x2.shape
(3, 4)
>>> x2
array([[12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])
>>> y = F.stack([x1, x2], axis=0)
>>> y.shape
(2, 3, 4)
>>> y.data
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])
>>> y = F.stack([x1, x2], axis=1)
>>> y.shape
(3, 2, 4)
>>> y.data
array([[[ 0,  1,  2,  3],
        [12, 13, 14, 15]],

       [[ 4,  5,  6,  7],
        [16, 17, 18, 19]],

       [[ 8,  9, 10, 11],
        [20, 21, 22, 23]]])
>>> y = F.stack([x1, x2], axis=2)
>>> y.shape
(3, 4, 2)
>>> y.data
array([[[ 0, 12],
        [ 1, 13],
        [ 2, 14],
        [ 3, 15]],

       [[ 4, 16],
        [ 5, 17],
        [ 6, 18],
        [ 7, 19]],

       [[ 8, 20],
        [ 9, 21],
        [10, 22],
        [11, 23]]])
>>> y = F.stack([x1, x2], axis=-1)
>>> y.shape
(3, 4, 2)

swapaxes

chainer.functions.swapaxes(x, axis1, axis2)[source]

Swap two axes of a variable.

Parameters:
  • x (Variable) – Input variable.
  • axis1 (int) – The first axis to swap.
  • axis2 (int) – The second axis to swap.
Returns:

Variable whose axes are swapped.

Return type:

Variable
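
Example

A minimal sketch (not part of the original documentation):

>>> x = np.zeros((2, 3, 4), 'f')
>>> y = F.swapaxes(x, 0, 2)
>>> y.shape
(4, 3, 2)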

tile

chainer.functions.tile(x, reps)[source]

Construct an array by tiling a given array.

Parameters:
  • x (chainer.Variable or numpy.ndarray or cupy.ndarray) – Input data.
  • reps (int or tuple of ints) – The number of times for each axis with which x is replicated.
Returns:

Variable tiled the given array.

Return type:

Variable
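
Example

An illustrative sketch (not from the original documentation); the behaviour mirrors numpy.tile:

>>> x = np.array([[1, 2], [3, 4]], 'f')
>>> y = F.tile(x, (2, 3))
>>> y.shape
(4, 6)
>>> np.all(y.data == np.tile(x, (2, 3)))
True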

transpose

chainer.functions.transpose(x, axes=None)[source]

Permute the dimensions of an input variable without copy.

Parameters:
  • x (Variable) – Input variable.
  • axes (tuple of ints) – By default, reverse the dimensions, otherwise permute the axes according to the values given.
Returns:

Variable whose axes are permuted.

Return type:

Variable
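
Example

A minimal sketch (not part of the original documentation):

>>> x = np.zeros((2, 3, 4), 'f')
>>> F.transpose(x).shape        # reverse all dimensions
(4, 3, 2)
>>> F.transpose(x, (1, 0, 2)).shape
(3, 2, 4)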

transpose_sequence

chainer.functions.transpose_sequence(xs)[source]

Transpose a list of Variables.

This function transposes a list of Variables and returns a list of Variables. For example, if a user gives [(0, 1, 2, 3), (4, 5), (6)], the function returns [(0, 4, 6), (1, 5), (2), (3)]. Note that the given list needs to be sorted by the lengths of the Variables.

Parameters:xs (list of ~chainer.Variable) – Variables to transpose.
Returns:Transposed list.
Return type:tuple or Variable
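
Example

An illustrative sketch (not from the original documentation), transposing three sequences of lengths 4, 2 and 1:

>>> x1 = np.array([0, 1, 2, 3], 'f')
>>> x2 = np.array([4, 5], 'f')
>>> x3 = np.array([6], 'f')
>>> ys = F.transpose_sequence([x1, x2, x3])
>>> [y.shape for y in ys]
[(3,), (2,), (1,), (1,)]
>>> ys[0].data
array([ 0.,  4.,  6.], dtype=float32)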

vstack

chainer.functions.vstack(xs)[source]

Concatenate variables vertically (row wise).

Parameters:xs (list of Variable or numpy.ndarray or cupy.ndarray) – Input variables to be concatenated. The variables must have the same ndim. When the variables have the second axis (i.e. \(ndim \geq 2\)), the variables must have the same shape along all but the first axis. When the variables do not have the second axis (i.e. \(ndim < 2\)), the variables must have the same shape.
Returns:Output variable. When the input variables have the second axis (i.e. \(ndim \geq 2\)), the shapes of inputs and output are the same along all but the first axis. The length of the first axis is the sum of the lengths of the inputs’ first axes. When the variables do not have the second axis (i.e. \(ndim < 2\)), the shape of the output is (2, N) (N is the size of the input variable).
Return type:Variable

Example

>>> x1 = np.array((1, 2, 3))
>>> x1.shape
(3,)
>>> x2 = np.array((2, 3, 4))
>>> x2.shape
(3,)
>>> y = F.vstack((x1, x2))
>>> y.shape
(2, 3)
>>> y.data
array([[1, 2, 3],
       [2, 3, 4]])
>>> x1 = np.arange(0, 12).reshape(3, 4)
>>> x1.shape
(3, 4)
>>> x1
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> x2 = np.arange(12, 20).reshape(2, 4)
>>> x2.shape
(2, 4)
>>> x2
array([[12, 13, 14, 15],
       [16, 17, 18, 19]])
>>> y = F.vstack([x1, x2])
>>> y.shape
(5, 4)
>>> y.data
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])

where

chainer.functions.where(condition, x, y)[source]

Choose elements depending on condition.

This function chooses values depending on a given condition. condition, x, and y must all have the same shape.

Parameters:
  • condition (Variable) – Variable containing the condition. Only boolean array is permitted.
  • x (Variable) – Variable chosen when condition is True.
  • y (Variable) – Variable chosen when condition is False.
Returns:

Variable containing chosen values.

Return type:

Variable
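
Example

A minimal sketch (not part of the original documentation):

>>> condition = np.array([[True, False], [False, True]])
>>> x = np.array([[1, 2], [3, 4]], 'f')
>>> y = np.array([[5, 6], [7, 8]], 'f')
>>> z = F.where(condition, x, y)
>>> z.data
array([[ 1.,  6.],
       [ 7.,  4.]], dtype=float32)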

Neural network connections

bilinear

chainer.functions.bilinear(e1, e2, W, V1=None, V2=None, b=None)[source]

Applies a bilinear function based on given parameters.

This is a building block of Neural Tensor Network (see the reference paper below). It takes two input variables and one or four parameters, and outputs one variable.

To be precise, denote six input arrays mathematically by \(e^1\in \mathbb{R}^{I\cdot J}\), \(e^2\in \mathbb{R}^{I\cdot K}\), \(W\in \mathbb{R}^{J \cdot K \cdot L}\), \(V^1\in \mathbb{R}^{J \cdot L}\), \(V^2\in \mathbb{R}^{K \cdot L}\), and \(b\in \mathbb{R}^{L}\), where \(I\) is mini-batch size. In this document, we call \(V^1\), \(V^2\), and \(b\) linear parameters.

The output of forward propagation is calculated as

\[y_{il} = \sum_{jk} e^1_{ij} e^2_{ik} W_{jkl} + \ \sum_{j} e^1_{ij} V^1_{jl} + \sum_{k} e^2_{ik} V^2_{kl} + b_{l}.\]

Note that V1, V2, b are optional. If these are not given, then this function omits the last three terms in the above equation.

Note

This function accepts an input variable e1 or e2 of a non-matrix array. In this case, the leading dimension is treated as the batch dimension, and the other dimensions are reduced to one dimension.

Note

In the original paper, \(J\) and \(K\) must be equal and the author denotes \([V^1 V^2]\) (concatenation of matrices) by \(V\).

Parameters:
  • e1 (Variable) – Left input variable.
  • e2 (Variable) – Right input variable.
  • W (Variable) – Quadratic weight variable.
  • V1 (Variable) – Left coefficient variable.
  • V2 (Variable) – Right coefficient variable.
  • b (Variable) – Bias variable.
Returns:

Output variable.

Return type:

Variable

See:
Reasoning With Neural Tensor Networks for Knowledge Base Completion [Socher+, NIPS2013].
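
Example

An illustrative sketch (not from the original documentation), checking only the output shape for random inputs; the sizes used here are arbitrary:

>>> e1 = np.random.uniform(-1, 1, (2, 3)).astype('f')  # (I, J)
>>> e2 = np.random.uniform(-1, 1, (2, 4)).astype('f')  # (I, K)
>>> W = np.random.uniform(-1, 1, (3, 4, 5)).astype('f')  # (J, K, L)
>>> y = F.bilinear(e1, e2, W)
>>> y.shape  # (I, L)
(2, 5)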

convolution_2d

chainer.functions.convolution_2d(x, W, b=None, stride=1, pad=0, use_cudnn=True, cover_all=False, deterministic=False)[source]

Two-dimensional convolution function.

This is an implementation of two-dimensional convolution in ConvNets. It takes three variables: the input image x, the filter weight W, and the bias vector b.

Notation: here is a notation for dimensionalities.

  • \(n\) is the batch size.
  • \(c_I\) and \(c_O\) are the number of the input and output channels, respectively.
  • \(h_I\) and \(w_I\) are the height and width of the input image, respectively.
  • \(h_K\) and \(w_K\) are the height and width of the filters, respectively.
  • \(h_P\) and \(w_P\) are the height and width of the spatial padding size, respectively.

Then the Convolution2D function computes correlations between filters and patches of size \((h_K, w_K)\) in x. Note that correlation here is equivalent to the inner product between expanded vectors. Patches are extracted at positions shifted by multiples of stride from the first position (-h_P, -w_P) for each spatial axis. The right-most (or bottom-most) patches do not run over the padded spatial size.

Let \((s_Y, s_X)\) be the stride of filter application. Then, the output size \((h_O, w_O)\) is determined by the following equations:

\[\begin{split}h_O &= (h_I + 2h_P - h_K) / s_Y + 1,\\ w_O &= (w_I + 2w_P - w_K) / s_X + 1.\end{split}\]

If the cover_all option is True, the filter will cover all spatial locations. So, if the last stride of the filter does not cover the end of spatial locations, an additional stride will be applied to the end part of spatial locations. In this case, the output size \((h_O, w_O)\) is determined by the following equations:

\[\begin{split}h_O &= (h_I + 2h_P - h_K + s_Y - 1) / s_Y + 1,\\ w_O &= (w_I + 2w_P - w_K + s_X - 1) / s_X + 1.\end{split}\]

If the bias vector is given, then it is added to all spatial locations of the output of convolution.

The two-dimensional convolution function is defined as follows.

Parameters:
  • x (Variable or numpy.ndarray or cupy.ndarray) – Input variable of shape \((n, c_I, h_I, w_I)\).
  • W (Variable or numpy.ndarray or cupy.ndarray) – Weight variable of shape \((c_O, c_I, h_K, w_K)\).
  • b (Variable or numpy.ndarray or cupy.ndarray) – Bias variable of length \(c_O\) (optional).
  • stride (int or pair of int s) – Stride of filter applications. stride=s and stride=(s, s) are equivalent.
  • pad (int or pair of int s) – Spatial padding width for input arrays. pad=p and pad=(p, p) are equivalent.
  • use_cudnn (bool) – If True, then this function uses cuDNN if available.
  • cover_all (bool) – If True, all spatial locations are convoluted into some output pixels.
  • deterministic (bool) – The output of this function can be non-deterministic when it uses cuDNN. If this option is True, then it forces cuDNN to use a deterministic algorithm. This option is only available for cuDNN version >= v3.
Returns:

Output variable of shape \((n, c_O, h_O, w_O)\).

Return type:

Variable

See also

Convolution2D

Example

>>> n = 10
>>> c_i, c_o = 3, 1
>>> h_i, w_i = 30, 40
>>> h_k, w_k = 10, 10
>>> h_p, w_p = 5, 5
>>> x = np.random.uniform(0, 1, (n, c_i, h_i, w_i)).astype('f')
>>> x.shape
(10, 3, 30, 40)
>>> W = np.random.uniform(0, 1, (c_o, c_i, h_k, w_k)).astype('f')
>>> W.shape
(1, 3, 10, 10)
>>> b = np.random.uniform(0, 1, (c_o,)).astype('f')
>>> b.shape
(1,)
>>> s_y, s_x = 5, 7
>>> y = F.convolution_2d(x, W, b, stride=(s_y, s_x), pad=(h_p, w_p))
>>> y.shape
(10, 1, 7, 6)
>>> h_o = int((h_i + 2 * h_p - h_k) / s_y + 1)
>>> w_o = int((w_i + 2 * w_p - w_k) / s_x + 1)
>>> y.shape == (n, c_o, h_o, w_o)
True
>>> y = F.convolution_2d(x, W, b, stride=(s_y, s_x), pad=(h_p, w_p), cover_all=True)
>>> y.shape == (n, c_o, h_o, w_o + 1)
True

convolution_nd

chainer.functions.convolution_nd(x, W, b=None, stride=1, pad=0, use_cudnn=True, cover_all=False)[source]

N-dimensional convolution function.

This is an implementation of N-dimensional convolution which is generalized two-dimensional convolution in ConvNets. It takes three variables: the input x, the filter weight W and the bias vector b.

Notation: here is a notation for dimensionalities.

  • \(N\) is the number of spatial dimensions.
  • \(n\) is the batch size.
  • \(c_I\) and \(c_O\) are the number of the input and output channels, respectively.
  • \(d_1, d_2, ..., d_N\) are the size of each axis of the input’s spatial dimensions, respectively.
  • \(k_1, k_2, ..., k_N\) are the size of each axis of the filters, respectively.
  • \(l_1, l_2, ..., l_N\) are the size of each axis of the output’s spatial dimensions, respectively.
  • \(p_1, p_2, ..., p_N\) are the size of each axis of the spatial padding size, respectively.

Then the convolution_nd function computes correlations between filters and patches of size \((k_1, k_2, ..., k_N)\) in x. Note that correlation here is equivalent to the inner product between expanded tensors. Patches are extracted at positions shifted by multiples of stride from the first position (-p_1, -p_2, ..., -p_N) for each spatial axis.

Let \((s_1, s_2, ..., s_N)\) be the stride of filter application. Then, the output size \((l_1, l_2, ..., l_N)\) is determined by the following equations:

\[l_n = (d_n + 2p_n - k_n) / s_n + 1 \ \ (n = 1, ..., N)\]

If the cover_all option is True, the filter will cover all spatial locations. So, if the last stride of the filter does not cover the end of spatial locations, an additional stride will be applied to the end part of spatial locations. In this case, the output size is determined by the following equations:

\[l_n = (d_n + 2p_n - k_n + s_n - 1) / s_n + 1 \ \ (n = 1, ..., N)\]

The N-dimensional convolution function is defined as follows.

Parameters:
  • x (Variable or numpy.ndarray or cupy.ndarray) – Input variable of shape \((n, c_I, d_1, d_2, ..., d_N)\).
  • W (Variable or numpy.ndarray or cupy.ndarray) – Weight variable of shape \((c_O, c_I, k_1, k_2, ..., k_N)\).
  • b (Variable or numpy.ndarray or cupy.ndarray) – One-dimensional bias variable with length \(c_O\) (optional).
  • stride (int or tuple of int s) – Stride of filter applications \((s_1, s_2, ..., s_N)\). stride=s is equivalent to (s, s, ..., s).
  • pad (int or tuple of int s) – Spatial padding width for input arrays \((p_1, p_2, ..., p_N)\). pad=p is equivalent to (p, p, ..., p).
  • use_cudnn (bool) – If True, then this function uses cuDNN if available. See below for the exact conditions.
  • cover_all (bool) – If True, all spatial locations are convoluted into some output pixels. It may make the output size larger. cover_all needs to be False if you want to use cuDNN.
Returns:

Output variable of shape \((n, c_O, l_1, l_2, ..., l_N)\).

Return type:

Variable

Note

This function uses cuDNN implementation for its forward and backward computation if ALL of the following conditions are satisfied:

  • cuda.cudnn_enabled is True
  • use_cudnn is True
  • The number of spatial dimensions is more than one.
  • cover_all is False
  • The input’s dtype is equal to the filter weight’s.
  • The dtype is FP16, FP32 or FP64. (FP16 is only available when cuDNN version \(\geq\) v3.)

Example

>>> n = 10
>>> c_i, c_o = 3, 1
>>> d1, d2, d3 = 30, 40, 50
>>> k1, k2, k3 = 10, 10, 10
>>> p1, p2, p3 = 5, 5, 5
>>> x = np.random.uniform(0, 1, (n, c_i, d1, d2, d3)).astype('f')
>>> x.shape
(10, 3, 30, 40, 50)
>>> W = np.random.uniform(0, 1, (c_o, c_i, k1, k2, k3)).astype('f')
>>> W.shape
(1, 3, 10, 10, 10)
>>> b = np.random.uniform(0, 1, (c_o)).astype('f')
>>> b.shape
(1,)
>>> s1, s2, s3 = 2, 4, 6
>>> y = F.convolution_nd(x, W, b, stride=(s1, s2, s3), pad=(p1, p2, p3))
>>> y.shape
(10, 1, 16, 11, 9)
>>> l1 = int((d1 + 2 * p1 - k1) / s1 + 1)
>>> l2 = int((d2 + 2 * p2 - k2) / s2 + 1)
>>> l3 = int((d3 + 2 * p3 - k3) / s3 + 1)
>>> y.shape == (n, c_o, l1, l2, l3)
True
>>> y = F.convolution_nd(x, W, b, stride=(s1, s2, s3), pad=(p1, p2, p3), cover_all=True)
>>> y.shape == (n, c_o, l1, l2, l3 + 1)
True

deconvolution_2d

chainer.functions.deconvolution_2d(x, W, b=None, stride=1, pad=0, outsize=None, use_cudnn=True, deterministic=False)[source]

Two-dimensional deconvolution function.

This is an implementation of two-dimensional deconvolution. In most deep learning frameworks and papers, this function is called transposed convolution. But because of historical reasons (e.g. the paper by Zeiler, Deconvolutional Networks) and backward compatibility, this function is called deconvolution in Chainer.

It takes three variables: input image x, the filter weight W, and the bias vector b.

Notation: here is a notation for dimensionalities.

  • \(n\) is the batch size.
  • \(c_I\) and \(c_O\) are the number of the input and output channels, respectively.
  • \(h_I\) and \(w_I\) are the height and width of the input image, respectively.
  • \(h_K\) and \(w_K\) are the height and width of the filters, respectively.
  • \(h_P\) and \(w_P\) are the height and width of the spatial padding size, respectively.

Let \((s_Y, s_X)\) be the stride of filter application. Then, the output size \((h_O, w_O)\) is estimated by the following equations:

\[\begin{split}h_O &= s_Y (h_I - 1) + h_K - 2h_P,\\ w_O &= s_X (w_I - 1) + w_K - 2w_P.\end{split}\]
Parameters:
  • x (Variable or numpy.ndarray or cupy.ndarray) – Input variable of shape \((n, c_I, h_I, w_I)\).
  • W (Variable or numpy.ndarray or cupy.ndarray) – Weight variable of shape \((c_I, c_O, h_K, w_K)\).
  • b (Variable or numpy.ndarray or cupy.ndarray) – Bias variable of length \(c_O\) (optional).
  • stride (int or pair of int s) – Stride of filter applications. stride=s and stride=(s, s) are equivalent.
  • pad (int or pair of int s) – Spatial padding width for input arrays. pad=p and pad=(p, p) are equivalent.
  • outsize (tuple of int) – Expected output size of deconvolutional operation. It should be pair of height and width \((h_O, w_O)\). Default value is None and the outsize is estimated by input size, stride and pad.
  • use_cudnn (bool) – If True, then this function uses cuDNN if available.
  • deterministic (bool) – The output of this function can be non-deterministic when it uses cuDNN. If this option is True, then it forces cuDNN to use a deterministic algorithm. This option is only available for cuDNN version >= v3.
Returns:

Output variable of shape \((n, c_O, h_O, w_O)\).

Return type:

Variable

Example

>>> n = 10
>>> c_i, c_o = 1, 3
>>> h_i, w_i = 5, 10
>>> h_k, w_k = 10, 10
>>> h_p, w_p = 5, 5
>>> x = np.random.uniform(0, 1, (n, c_i, h_i, w_i)).astype('f')
>>> x.shape
(10, 1, 5, 10)
>>> W = np.random.uniform(0, 1, (c_i, c_o, h_k, w_k)).astype('f')
>>> W.shape
(1, 3, 10, 10)
>>> b = np.random.uniform(0, 1, c_o).astype('f')
>>> b.shape
(3,)
>>> s_y, s_x = 5, 5
>>> y = F.deconvolution_2d(x, W, b, stride=(s_y, s_x), pad=(h_p, w_p))
>>> y.shape
(10, 3, 20, 45)
>>> h_o = s_y * (h_i - 1) + h_k - 2 * h_p
>>> w_o = s_x * (w_i - 1) + w_k - 2 * w_p
>>> y.shape == (n, c_o, h_o, w_o)
True

deconvolution_nd

chainer.functions.deconvolution_nd(x, W, b=None, stride=1, pad=0, outsize=None, use_cudnn=True)[source]

N-dimensional deconvolution function.

This is an implementation of N-dimensional deconvolution which generalizes the two-dimensional one. In most deep learning frameworks and papers, this function is called transposed convolution. But for historical reasons (e.g. the paper Deconvolutional Networks by Zeiler et al.) and backward compatibility, this function is called deconvolution in Chainer.

It takes three variables: the input x, the filter weight W, and the bias vector b.

Notation: here is a notation for dimensionalities.

  • \(N\) is the number of spatial dimensions.
  • \(n\) is the batch size.
  • \(c_I\) and \(c_O\) are the number of the input and output channels, respectively.
  • \(d_1, d_2, ..., d_N\) are the size of each axis of the input’s spatial dimensions, respectively.
  • \(k_1, k_2, ..., k_N\) are the size of each axis of the filters, respectively.
  • \(p_1, p_2, ..., p_N\) are the size of each axis of the spatial padding size, respectively.
  • \(s_1, s_2, ..., s_N\) are the stride of each axis of filter application, respectively.

If outsize option is None, the output size \((l_1, l_2, ..., l_N)\) is determined by the following equations with the items in the above list:

\[l_n = s_n (d_n - 1) + k_n - 2 p_n \ \ (n = 1, ..., N)\]

If outsize option is given, the output size is determined by outsize. In this case, the outsize \((l_1, l_2, ..., l_N)\) must satisfy the following equations:

\[d_n = \lfloor (l_n + 2p_n - k_n) / s_n \rfloor + 1 \ \ (n = 1, ..., N)\]
Parameters:
  • x (Variable or numpy.ndarray or cupy.ndarray) – Input variable of shape \((n, c_I, d_1, d_2, ..., d_N)\).
  • W (Variable or numpy.ndarray or cupy.ndarray) – Weight variable of shape \((c_I, c_O, k_1, k_2, ..., k_N)\).
  • b (Variable or numpy.ndarray or cupy.ndarray) – One-dimensional bias variable with length \(c_O\) (optional).
  • stride (int or tuple of int s) – Stride of filter applications \((s_1, s_2, ..., s_N)\). stride=s is equivalent to (s, s, ..., s).
  • pad (int or tuple of int s) – Spatial padding width for input arrays \((p_1, p_2, ..., p_N)\). pad=p is equivalent to (p, p, ..., p).
  • outsize (tuple of int s) – Expected output size of deconvolutional operation. It should be a tuple of ints \((l_1, l_2, ..., l_N)\). Default value is None and the outsize is estimated by input size, stride and pad.
  • use_cudnn (bool) – If True, then this function uses cuDNN if available. Note that cuDNN supports more than one-dimensional deconvolution operations only.
Returns:

Output variable of shape \((n, c_O, l_1, l_2, ..., l_N)\).

Return type:

Variable

See also

links.DeconvolutionND, deconvolution_2d()

Example

Example1: the case when outsize is not given.

>>> n = 10
>>> c_i, c_o = 3, 1
>>> d1, d2, d3 = 5, 10, 15
>>> k1, k2, k3 = 10, 10, 10
>>> p1, p2, p3 = 5, 5, 5
>>> x = np.random.uniform(0, 1, (n, c_i, d1, d2, d3)).astype('f')
>>> x.shape
(10, 3, 5, 10, 15)
>>> W = np.random.uniform(0, 1, (c_i, c_o, k1, k2, k3)).astype('f')
>>> W.shape
(3, 1, 10, 10, 10)
>>> b = np.random.uniform(0, 1, (c_o)).astype('f')
>>> b.shape
(1,)
>>> s1, s2, s3 = 2, 4, 6
>>> y = F.deconvolution_nd(x, W, b, stride=(s1, s2, s3), pad=(p1, p2, p3))
>>> y.shape
(10, 1, 8, 36, 84)
>>> l1 = s1 * (d1 - 1) + k1 - 2 * p1
>>> l2 = s2 * (d2 - 1) + k2 - 2 * p2
>>> l3 = s3 * (d3 - 1) + k3 - 2 * p3
>>> y.shape == (n, c_o, l1, l2, l3)
True

Example2: the case when outsize is given.

>>> n = 10
>>> c_i, c_o = 3, 1
>>> d1, d2, d3 = 5, 10, 15
>>> k1, k2, k3 = 10, 10, 10
>>> p1, p2, p3 = 5, 5, 5
>>> x = np.random.uniform(0, 1, (n, c_i, d1, d2, d3)).astype('f')
>>> x.shape
(10, 3, 5, 10, 15)
>>> W = np.random.uniform(0, 1, (c_i, c_o, k1, k2, k3)).astype('f')
>>> W.shape
(3, 1, 10, 10, 10)
>>> b = np.random.uniform(0, 1, (c_o)).astype('f')
>>> b.shape
(1,)
>>> s1, s2, s3 = 2, 4, 6
>>> l1, l2, l3 = 9, 38, 87
>>> d1 == int((l1 + 2 * p1 - k1) / s1) + 1
True
>>> d2 == int((l2 + 2 * p2 - k2) / s2) + 1
True
>>> d3 == int((l3 + 2 * p3 - k3) / s3) + 1
True
>>> y = F.deconvolution_nd(x, W, b, stride=(s1, s2, s3), pad=(p1, p2, p3), outsize=(l1, l2, l3))
>>> y.shape
(10, 1, 9, 38, 87)
>>> y.shape == (n, c_o, l1, l2, l3)
True

depthwise_convolution_2d

chainer.functions.depthwise_convolution_2d(x, W, b=None, stride=1, pad=0)[source]

Two-dimensional depthwise convolution function.

This is an implementation of two-dimensional depthwise convolution. It takes two or three variables: the input image x, the filter weight W, and optionally, the bias vector b.

Notation: here is a notation for dimensionalities.

  • \(n\) is the batch size.
  • \(c_I\) is the number of input channels.
  • \(c_M\) is the channel multiplier.
  • \(h\) and \(w\) are the height and width of the input image, respectively.
  • \(h_O\) and \(w_O\) are the height and width of the output image, respectively.
  • \(k_H\) and \(k_W\) are the height and width of the filters, respectively.
Parameters:
  • x (chainer.Variable or numpy.ndarray or cupy.ndarray) – Input variable of shape \((n, c_I, h, w)\).
  • W (Variable) – Weight variable of shape \((c_M, c_I, k_H, k_W)\).
  • b (Variable) – Bias variable of length \(c_M * c_I\) (optional).
  • stride (int or pair of ints) – Stride of filter applications. stride=s and stride=(s, s) are equivalent.
  • pad (int or pair of ints) – Spatial padding width for input arrays. pad=p and pad=(p, p) are equivalent.
Returns:

Output variable. Its shape is \((n, c_I * c_M, h_O, w_O)\).

Return type:

Variable

Like Convolution2D, DepthwiseConvolution2D computes correlations between filters and patches of size \((k_H, k_W)\) in x. But unlike Convolution2D, DepthwiseConvolution2D does not add up the input channels of filters but concatenates them. For that reason, the shape of the output of depthwise convolution is \((n, c_I * c_M, h_O, w_O)\), where \(c_M\) is called the channel multiplier.

\((h_O, w_O)\) is determined by the same equations as for Convolution2D.

If the bias vector is given, then it is added to all spatial locations of the output of convolution.

See: L. Sifre. Rigid-motion scattering for image classification

Example

>>> x = np.random.uniform(0, 1, (2, 3, 4, 7))
>>> W = np.random.uniform(0, 1, (2, 3, 3, 3))
>>> b = np.random.uniform(0, 1, (6,))
>>> y = F.depthwise_convolution_2d(x, W, b)
>>> y.shape
(2, 6, 2, 5)

dilated_convolution_2d

chainer.functions.dilated_convolution_2d(x, W, b=None, stride=1, pad=0, dilate=1, use_cudnn=True, cover_all=False)[source]

Two-dimensional dilated convolution function.

This is an implementation of two-dimensional dilated convolution in ConvNets. It takes three variables: the input image x, the filter weight W, and the bias vector b.

Notation: here is a notation for dimensionalities.

  • \(n\) is the batch size.
  • \(c_I\) and \(c_O\) are the number of input and output channels, respectively.
  • \(h\) and \(w\) are the height and width of the input image, respectively.
  • \(k_H\) and \(k_W\) are the height and width of the filters, respectively.
Parameters:
  • x (Variable) – Input variable of shape \((n, c_I, h, w)\).
  • W (Variable) – Weight variable of shape \((c_O, c_I, k_H, k_W)\).
  • b (Variable) – Bias variable of length \(c_O\) (optional).
  • stride (int or pair of ints) – Stride of filter applications. stride=s and stride=(s, s) are equivalent.
  • pad (int or pair of ints) – Spatial padding width for input arrays. pad=p and pad=(p, p) are equivalent.
  • dilate (int or pair of ints) – Dilation factor of filter applications. dilate=d and dilate=(d, d) are equivalent.
  • use_cudnn (bool) – If True, then this function uses cuDNN if available.
  • cover_all (bool) – If True, all spatial locations are convoluted into some output pixels. It may make the output size larger.
Returns:

Output variable.

Return type:

Variable

The two-dimensional dilated convolution is defined as follows. The DilatedConvolution2D function computes correlations between filters and patches of size \((k_H, k_W)\) in x, where the elements of each patch are extracted at intervals of the dilation factor. Note that correlation here is equivalent to the inner product between expanded vectors. Patches are extracted at positions shifted by multiples of stride from the first position -pad for each spatial axis. The right-most (or bottom-most) patches do not run over the padded spatial size.

Let \((s_Y, s_X)\) be the stride of filter application, \((p_H, p_W)\) the spatial padding size, and \((d_Y, d_X)\) the dilation factor of filter application. Then, the output size \((h_O, w_O)\) is determined by the following equations:

\[\begin{split}h_O &= (h + 2p_H - k_H - (k_H - 1) * (d_Y - 1)) / s_Y + 1,\\ w_O &= (w + 2p_W - k_W - (k_W - 1) * (d_X - 1)) / s_X + 1.\end{split}\]

If the bias vector is given, then it is added to all spatial locations of the output of convolution.

See also

DilatedConvolution2D
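
Example

The following is an illustrative sketch rather than an official doctest; the expected shape is computed from the output-size equations above, assuming NumPy inputs as in the other examples.

>>> x = np.random.uniform(0, 1, (10, 3, 30, 40)).astype('f')
>>> W = np.random.uniform(0, 1, (1, 3, 5, 5)).astype('f')
>>> b = np.random.uniform(0, 1, (1,)).astype('f')
>>> y = F.dilated_convolution_2d(x, W, b, stride=1, pad=4, dilate=2)
>>> h_o = int((30 + 2 * 4 - 5 - (5 - 1) * (2 - 1)) / 1 + 1)
>>> w_o = int((40 + 2 * 4 - 5 - (5 - 1) * (2 - 1)) / 1 + 1)
>>> y.shape == (10, 1, h_o, w_o)
True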

embed_id

chainer.functions.embed_id(x, W, ignore_label=None)[source]

Efficient linear function for one-hot input.

This function implements so-called word embedding. It takes two arguments: a set of IDs (words) x in a \(B\)-dimensional integer vector, and a set of all ID (word) embeddings W in a \(V \times d\) float32 matrix. It outputs a \(B \times d\) matrix whose i-th row is the x[i]-th row of W.

This function is only differentiable on the input W.

Parameters:
  • x (Variable) – Batch vectors of IDs.
  • W (Variable) – Representation of each ID (a.k.a. word embeddings).
  • ignore_label (int or None) – If ignore_label is an int value, the i-th row of the return value is filled with 0 when x[i] equals ignore_label.
Returns:

Output variable.

Return type:

Variable

See also

EmbedID
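
Example

A minimal sketch (not an official doctest): each row of the output is the row of W selected by the corresponding ID in x.

>>> x = np.array([2, 1], 'i')
>>> W = np.arange(12).reshape(4, 3).astype('f')
>>> y = F.embed_id(x, W)
>>> y.shape
(2, 3)
>>> np.array_equal(y.data, W[x])
True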

linear

chainer.functions.linear(x, W, b=None)[source]

Linear function, or affine transformation.

It accepts two or three arguments: an input minibatch x, a weight matrix W, and optionally a bias vector b. It computes

\[Y = xW^\top + b.\]
Parameters:
  • x (Variable or numpy.ndarray or cupy.ndarray) – Input variable, which is a \((s_B, s_1, s_2, ..., s_n)\)-shaped float array. Its first dimension \((s_B)\) is assumed to be the minibatch dimension. The other dimensions are treated as one concatenated dimension whose size must be \(N = s_1 * ... * s_n\).
  • W (Variable or numpy.ndarray or cupy.ndarray) – Weight variable of shape \((M, N)\), where \((N = s_1 * ... * s_n)\).
  • b (Variable or numpy.ndarray or cupy.ndarray) – Bias variable (optional) of shape \((M,)\).
Returns:

Output variable. A float array with shape of \((s_B, M)\).

Return type:

Variable

See also

Linear

Example

>>> x = np.random.uniform(0, 1, (3, 4)).astype('f')
>>> W = np.random.uniform(0, 1, (5, 4)).astype('f')
>>> b = np.random.uniform(0, 1, (5,)).astype('f')
>>> y = F.linear(x, W, b)
>>> y.shape
(3, 5)

n_step_bigru

chainer.functions.n_step_bigru(n_layers, dropout_ratio, hx, ws, bs, xs, train=True, use_cudnn=True)[source]

Stacked Bi-directional Gated Recurrent Unit function.

This function calculates stacked Bi-directional GRU with sequences. This function gets an initial hidden state \(h_0\), an input sequence \(x\), weight matrices \(W\), and bias vectors \(b\). This function calculates hidden states \(h_t\) for each time \(t\) from input \(x_t\).

\[\begin{split}r^{f}_t &= \sigma(W^{f}_0 x_t + W^{f}_3 h_{t-1} + b^{f}_0 + b^{f}_3) \\ z^{f}_t &= \sigma(W^{f}_1 x_t + W^{f}_4 h_{t-1} + b^{f}_1 + b^{f}_4) \\ h^{f'}_t &= \tanh(W^{f}_2 x_t + b^{f}_2 + r^{f}_t \cdot (W^{f}_5 h_{t-1} + b^{f}_5)) \\ h^{f}_t &= (1 - z^{f}_t) \cdot h^{f'}_t + z^{f}_t \cdot h_{t-1} \\ r^{b}_t &= \sigma(W^{b}_0 x_t + W^{b}_3 h_{t-1} + b^{b}_0 + b^{b}_3) \\ z^{b}_t &= \sigma(W^{b}_1 x_t + W^{b}_4 h_{t-1} + b^{b}_1 + b^{b}_4) \\ h^{b'}_t &= \tanh(W^{b}_2 x_t + b^{b}_2 + r^{b}_t \cdot (W^{b}_5 h_{t-1} + b^{b}_5)) \\ h^{b}_t &= (1 - z^{b}_t) \cdot h^{b'}_t + z^{b}_t \cdot h_{t-1} \\ h_t &= [h^{f}_t; h^{b}_t] \\\end{split}\]

where \(W^{f}\) denotes the weight matrices for the forward GRU and \(W^{b}\) denotes those for the backward GRU.

As the function accepts a sequence, it calculates \(h_t\) for all \(t\) with one call. Six weight matrices and six bias vectors are required for each layer. So, when \(S\) layers exist, you need to prepare \(6S\) weight matrices and \(6S\) bias vectors.

If the number of layers n_layers is greater than \(1\), the input of the k-th layer is the hidden state h_t of the (k-1)-th layer. Note that all input variables except those of the first layer may have a different shape from the first layer's.

Parameters:
  • n_layers (int) – Number of layers.
  • dropout_ratio (float) – Dropout ratio.
  • hx (chainer.Variable) – Variable holding stacked hidden states. Its shape is (S, B, N) where S is the number of layers and is equal to n_layers, B is the mini-batch size, and N is the dimension of the hidden units.
  • ws (list of list of chainer.Variable) – Weight matrices. ws[i] represents the weights for the i-th layer. Each ws[i] is a list containing six matrices. ws[i][j] corresponds to W_j in the equation. Only ws[0][j] where 0 <= j < 3 are of shape (I, N), as they are multiplied with the input variables. All other matrices are of shape (N, N).
  • bs (list of list of chainer.Variable) – Bias vectors. bs[i] represents the biases for the i-th layer. Each bs[i] is a list containing six vectors. bs[i][j] corresponds to b_j in the equation. The shape of each vector is (N,), where N is the dimension of the hidden units.
  • xs (list of chainer.Variable) – A list of Variable holding input values. Each element xs[t] holds the input value for time t. Its shape is (B_t, I), where B_t is the mini-batch size for time t and I is the size of the input units. Note that this function supports variable-length sequences. When sequences have different lengths, sort them in descending order by length and transpose the sorted sequences; transpose_sequence() transposes a list of Variable s holding sequences (see the preparation sketch after this function's description). So xs needs to satisfy xs[t].shape[0] >= xs[t + 1].shape[0].
  • train (bool) – If True, this function executes dropout.
  • use_cudnn (bool) – If True, this function uses cuDNN if available.
Returns:

This function returns a tuple containing two elements, hy and ys.

  • hy is the updated hidden states whose shape is the same as hx.
  • ys is a list of Variable. Each element ys[t] holds the hidden states of the last layer corresponding to the input xs[t]. Its shape is (B_t, N), where B_t is the mini-batch size for time t and N is the size of the hidden units. Note that B_t is the same as the mini-batch size of xs[t].

Return type:

tuple
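
Example

The weight and bias lists are omitted here for brevity; the sketch below only illustrates how variable-length sequences are prepared for the xs argument with transpose_sequence(), as described above (an illustrative sketch, not an official doctest).

>>> a = np.arange(4 * 3).reshape(4, 3).astype('f')  # sequence of length 4, 3 input units
>>> b = np.arange(2 * 3).reshape(2, 3).astype('f')  # sequence of length 2, 3 input units
>>> xs = F.transpose_sequence([a, b])  # sequences sorted by length in descending order
>>> [x.shape for x in xs]
[(2, 3), (2, 3), (1, 3), (1, 3)]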

n_step_bilstm

chainer.functions.n_step_bilstm(n_layers, dropout_ratio, hx, cx, ws, bs, xs, train=True, use_cudnn=True)[source]

Stacked Bi-directional Long Short-Term Memory function.

This function calculates stacked Bi-directional LSTM with sequences. This function gets an initial hidden state \(h_0\), an initial cell state \(c_0\), an input sequence \(x\), weight matrices \(W\), and bias vectors \(b\). This function calculates hidden states \(h_t\) and \(c_t\) for each time \(t\) from input \(x_t\).

\[\begin{split}i^{f}_t &=& \sigma(W^{f}_0 x_t + W^{f}_4 h_{t-1} + b^{f}_0 + b^{f}_4), \\ f^{f}_t &=& \sigma(W^{f}_1 x_t + W^{f}_5 h_{t-1} + b^{f}_1 + b^{f}_5), \\ o^{f}_t &=& \sigma(W^{f}_2 x_t + W^{f}_6 h_{t-1} + b^{f}_2 + b^{f}_6), \\ a^{f}_t &=& \tanh(W^{f}_3 x_t + W^{f}_7 h_{t-1} + b^{f}_3 + b^{f}_7), \\ c^{f}_t &=& f^{f}_t \cdot c^{f}_{t-1} + i^{f}_t \cdot a^{f}_t, \\ h^{f}_t &=& o^{f}_t \cdot \tanh(c^{f}_t), \\ i^{b}_t &=& \sigma(W^{b}_0 x_t + W^{b}_4 h_{t-1} + b^{b}_0 + b^{b}_4), \\ f^{b}_t &=& \sigma(W^{b}_1 x_t + W^{b}_5 h_{t-1} + b^{b}_1 + b^{b}_5), \\ o^{b}_t &=& \sigma(W^{b}_2 x_t + W^{b}_6 h_{t-1} + b^{b}_2 + b^{b}_6), \\ a^{b}_t &=& \tanh(W^{b}_3 x_t + W^{b}_7 h_{t-1} + b^{b}_3 + b^{b}_7), \\ c^{b}_t &=& f^{b}_t \cdot c^{b}_{t-1} + i^{b}_t \cdot a^{b}_t, \\ h^{b}_t &=& o^{b}_t \cdot \tanh(c^{b}_t), \\ h_t &=& [h^{f}; h^{b}]\end{split}\]

where \(W^{f}\) denotes the weight matrices for the forward LSTM and \(W^{b}\) denotes those for the backward LSTM.

As the function accepts a sequence, it calculates \(h_t\) for all \(t\) with one call. Eight weight matrices and eight bias vectors are required for each layer. So, when \(S\) layers exist, you need to prepare \(8S\) weight matrices and \(8S\) bias vectors.

If the number of layers n_layers is greater than \(1\), the input of the k-th layer is the hidden state h_t of the (k-1)-th layer. Note that all input variables except those of the first layer may have a different shape from the first layer's.

Parameters:
  • n_layers (int) – Number of layers.
  • dropout_ratio (float) – Dropout ratio.
  • hx (chainer.Variable) – Variable holding stacked hidden states. Its shape is (S, B, N) where S is the number of layers and is equal to n_layers, B is the mini-batch size, and N is the dimension of the hidden units.
  • cx (chainer.Variable) – Variable holding stacked cell states. It has the same shape as hx.
  • ws (list of list of chainer.Variable) – Weight matrices. ws[i] represents the weights for the i-th layer. Each ws[i] is a list containing eight matrices. ws[i][j] corresponds to W_j in the equation. Only ws[0][j] where 0 <= j < 4 are of shape (I, N), as they are multiplied with the input variables. All other matrices are of shape (N, N).
  • bs (list of list of chainer.Variable) – Bias vectors. bs[i] represents the biases for the i-th layer. Each bs[i] is a list containing eight vectors. bs[i][j] corresponds to b_j in the equation. The shape of each vector is (N,), where N is the dimension of the hidden units.
  • xs (list of chainer.Variable) – A list of Variable holding input values. Each element xs[t] holds the input value for time t. Its shape is (B_t, I), where B_t is the mini-batch size for time t and I is the size of the input units. Note that this function supports variable-length sequences. When sequences have different lengths, sort them in descending order by length and transpose the sorted sequences; transpose_sequence() transposes a list of Variable s holding sequences. So xs needs to satisfy xs[t].shape[0] >= xs[t + 1].shape[0].
  • train (bool) – If True, this function executes dropout.
  • use_cudnn (bool) – If True, this function uses cuDNN if available.
Returns:

This function returns a tuple containing three elements, hy, cy and ys.

  • hy is the updated hidden states whose shape is the same as hx.
  • cy is the updated cell states whose shape is the same as cx.
  • ys is a list of Variable. Each element ys[t] holds the hidden states of the last layer corresponding to the input xs[t]. Its shape is (B_t, N), where B_t is the mini-batch size for time t and N is the size of the hidden units. Note that B_t is the same as the mini-batch size of xs[t].

Return type:

tuple

n_step_birnn

chainer.functions.n_step_birnn(n_layers, dropout_ratio, hx, ws, bs, xs, train=True, use_cudnn=True, activation='tanh')[source]

Stacked Bi-directional RNN function for sequence inputs.

This function calculates stacked Bi-directional RNN with sequences. It takes an initial hidden state \(h_0\), an input sequence \(x\), weight matrices \(W\), and bias vectors \(b\), and computes the hidden states \(h_t\) for each time \(t\) from the input \(x_t\).

\[\begin{split}h^{f}_t &=& f(W^{f}_0 x_t + W^{f}_1 h_{t-1} + b^{f}_0 + b^{f}_1), \\ h^{b}_t &=& f(W^{b}_0 x_t + W^{b}_1 h_{t-1} + b^{b}_0 + b^{b}_1), \\ h_t &=& [h^{f}_t; h^{b}_t], \\\end{split}\]

where \(f\) is an activation function.

Weight matrices \(W\) contain two sets of matrices, \(W^{f}\) and \(W^{b}\). \(W^{f}\) holds the weight matrices for the forward RNN, and \(W^{b}\) holds those for the backward RNN.

\(W^{f}\) contains \(W^{f}_0\) for the input sequence and \(W^{f}_1\) for the hidden state. \(W^{b}\) contains \(W^{b}_0\) for the input sequence and \(W^{b}_1\) for the hidden state.

Bias vectors \(b\) similarly contain two sets, \(b^{f}\) and \(b^{b}\). \(b^{f}\) contains \(b^{f}_0\) for the input sequence and \(b^{f}_1\) for the hidden state. \(b^{b}\) contains \(b^{b}_0\) for the input sequence and \(b^{b}_1\) for the hidden state.

As the function accepts a sequence, it calculates \(h_t\) for all \(t\) with one call. Two weight matrices and two bias vectors are required for each layer. So, when \(S\) layers exist, you need to prepare \(2S\) weight matrices and \(2S\) bias vectors.

If the number of layers n_layers is greater than \(1\), the input of the k-th layer is the hidden state h_t of the (k-1)-th layer. Note that all input variables except those of the first layer may have a different shape from the first layer's.

Parameters:
  • n_layers (int) – Number of layers.
  • dropout_ratio (float) – Dropout ratio.
  • hx (chainer.Variable) – Variable holding stacked hidden states. Its shape is (S, B, N) where S is the number of layers and is equal to n_layers, B is the mini-batch size, and N is the dimension of the hidden units.
  • ws (list of list of chainer.Variable) – Weight matrices. ws[i + di] represents the weights for the i-th layer. Note that di = 0 for the forward RNN and di = 1 for the backward RNN. Each ws[i + di] is a list containing two matrices. ws[i + di][j] corresponds to W^{f}_j if di = 0 and to W^{b}_j if di = 1 in the equation. Only ws[0][j] and ws[1][j] where 0 <= j < 1 are of shape (I, N), as they are multiplied with the input variables. All other matrices are of shape (N, N).
  • bs (list of list of chainer.Variable) – Bias vectors. bs[i + di] represents the biases for the i-th layer. Note that di = 0 for the forward RNN and di = 1 for the backward RNN. Each bs[i + di] is a list containing two vectors. bs[i + di][j] corresponds to b^{f}_j if di = 0 and to b^{b}_j if di = 1 in the equation. The shape of each vector is (N,), where N is the dimension of the hidden units.
  • xs (list of chainer.Variable) – A list of Variable holding input values. Each element xs[t] holds the input value for time t. Its shape is (B_t, I), where B_t is the mini-batch size for time t and I is the size of the input units. Note that this function supports variable-length sequences. When sequences have different lengths, sort them in descending order by length and transpose the sorted sequences; transpose_sequence() transposes a list of Variable s holding sequences. So xs needs to satisfy xs[t].shape[0] >= xs[t + 1].shape[0].
  • train (bool) – If True, this function executes dropout.
  • use_cudnn (bool) – If True, this function uses cuDNN if available.
  • activation (str) – Activation function name. Please select tanh or relu.
Returns:

This function returns a tuple containing two elements, hy and ys.

  • hy is the updated hidden states whose shape is the same as hx.
  • ys is a list of Variable. Each element ys[t] holds the hidden states of the last layer corresponding to the input xs[t]. Its shape is (B_t, N), where B_t is the mini-batch size for time t and N is the size of the hidden units. Note that B_t is the same as the mini-batch size of xs[t].

Return type:

tuple

n_step_gru

chainer.functions.n_step_gru(n_layers, dropout_ratio, hx, ws, bs, xs, train=True, use_cudnn=True)[source]

Stacked Uni-directional Gated Recurrent Unit function.

This function calculates stacked Uni-directional GRU with sequences. This function gets an initial hidden state \(h_0\), an input sequence \(x\), weight matrices \(W\), and bias vectors \(b\). This function calculates hidden states \(h_t\) for each time \(t\) from input \(x_t\).

\[\begin{split}r_t &= \sigma(W_0 x_t + W_3 h_{t-1} + b_0 + b_3) \\ z_t &= \sigma(W_1 x_t + W_4 h_{t-1} + b_1 + b_4) \\ h'_t &= \tanh(W_2 x_t + b_2 + r_t \cdot (W_5 h_{t-1} + b_5)) \\ h_t &= (1 - z_t) \cdot h'_t + z_t \cdot h_{t-1}\end{split}\]

As the function accepts a sequence, it calculates \(h_t\) for all \(t\) with one call. Six weight matrices and six bias vectors are required for each layer. So, when \(S\) layers exist, you need to prepare \(6S\) weight matrices and \(6S\) bias vectors.

If the number of layers n_layers is greater than \(1\), the input of the k-th layer is the hidden state h_t of the (k-1)-th layer. Note that all input variables except those of the first layer may have a different shape from the first layer's.

Parameters:
  • n_layers (int) – Number of layers.
  • dropout_ratio (float) – Dropout ratio.
  • hx (chainer.Variable) – Variable holding stacked hidden states. Its shape is (S, B, N) where S is the number of layers and is equal to n_layers, B is the mini-batch size, and N is the dimension of the hidden units.
  • ws (list of list of chainer.Variable) – Weight matrices. ws[i] represents the weights for the i-th layer. Each ws[i] is a list containing six matrices. ws[i][j] corresponds to W_j in the equation. Only ws[0][j] where 0 <= j < 3 are of shape (I, N), as they are multiplied with the input variables. All other matrices are of shape (N, N).
  • bs (list of list of chainer.Variable) – Bias vectors. bs[i] represents the biases for the i-th layer. Each bs[i] is a list containing six vectors. bs[i][j] corresponds to b_j in the equation. The shape of each vector is (N,), where N is the dimension of the hidden units.
  • xs (list of chainer.Variable) – A list of Variable holding input values. Each element xs[t] holds the input value for time t. Its shape is (B_t, I), where B_t is the mini-batch size for time t and I is the size of the input units. Note that this function supports variable-length sequences. When sequences have different lengths, sort them in descending order by length and transpose the sorted sequences; transpose_sequence() transposes a list of Variable s holding sequences. So xs needs to satisfy xs[t].shape[0] >= xs[t + 1].shape[0].
  • train (bool) – If True, this function executes dropout.
  • use_cudnn (bool) – If True, this function uses cuDNN if available.
Returns:

This function returns a tuple containing two elements, hy and ys.

  • hy is the updated hidden states whose shape is the same as hx.
  • ys is a list of Variable. Each element ys[t] holds the hidden states of the last layer corresponding to the input xs[t]. Its shape is (B_t, N), where B_t is the mini-batch size for time t and N is the size of the hidden units. Note that B_t is the same as the mini-batch size of xs[t].

Return type:

tuple

n_step_lstm

chainer.functions.n_step_lstm(n_layers, dropout_ratio, hx, cx, ws, bs, xs, train=True, use_cudnn=True)[source]

Stacked Uni-directional Long Short-Term Memory function.

This function calculates stacked Uni-directional LSTM with sequences. This function gets an initial hidden state \(h_0\), an initial cell state \(c_0\), an input sequence \(x\), weight matrices \(W\), and bias vectors \(b\). This function calculates hidden states \(h_t\) and \(c_t\) for each time \(t\) from input \(x_t\).

\[\begin{split}i_t &= \sigma(W_0 x_t + W_4 h_{t-1} + b_0 + b_4) \\ f_t &= \sigma(W_1 x_t + W_5 h_{t-1} + b_1 + b_5) \\ o_t &= \sigma(W_2 x_t + W_6 h_{t-1} + b_2 + b_6) \\ a_t &= \tanh(W_3 x_t + W_7 h_{t-1} + b_3 + b_7) \\ c_t &= f_t \cdot c_{t-1} + i_t \cdot a_t \\ h_t &= o_t \cdot \tanh(c_t)\end{split}\]

As the function accepts a sequence, it calculates \(h_t\) for all \(t\) with one call. Eight weight matrices and eight bias vectors are required for each layer. So, when \(S\) layers exist, you need to prepare \(8S\) weight matrices and \(8S\) bias vectors.

If the number of layers n_layers is greater than \(1\), the input of the k-th layer is the hidden state h_t of the (k-1)-th layer. Note that all input variables except those of the first layer may have a different shape from the first layer's.

Parameters:
  • n_layers (int) – Number of layers.
  • dropout_ratio (float) – Dropout ratio.
  • hx (chainer.Variable) – Variable holding stacked hidden states. Its shape is (S, B, N) where S is the number of layers and is equal to n_layers, B is the mini-batch size, and N is the dimension of the hidden units.
  • cx (chainer.Variable) – Variable holding stacked cell states. It has the same shape as hx.
  • ws (list of list of chainer.Variable) – Weight matrices. ws[i] represents the weights for the i-th layer. Each ws[i] is a list containing eight matrices. ws[i][j] corresponds to W_j in the equation. Only ws[0][j] where 0 <= j < 4 are of shape (I, N), as they are multiplied with the input variables. All other matrices are of shape (N, N).
  • bs (list of list of chainer.Variable) – Bias vectors. bs[i] represents the biases for the i-th layer. Each bs[i] is a list containing eight vectors. bs[i][j] corresponds to b_j in the equation. The shape of each vector is (N,), where N is the dimension of the hidden units.
  • xs (list of chainer.Variable) – A list of Variable holding input values. Each element xs[t] holds the input value for time t. Its shape is (B_t, I), where B_t is the mini-batch size for time t and I is the size of the input units. Note that this function supports variable-length sequences. When sequences have different lengths, sort them in descending order by length and transpose the sorted sequences; transpose_sequence() transposes a list of Variable s holding sequences. So xs needs to satisfy xs[t].shape[0] >= xs[t + 1].shape[0].
  • train (bool) – If True, this function executes dropout.
  • use_cudnn (bool) – If True, this function uses cuDNN if available.
Returns:

This function returns a tuple containing three elements, hy, cy and ys.

  • hy is the updated hidden states whose shape is the same as hx.
  • cy is the updated cell states whose shape is the same as cx.
  • ys is a list of Variable. Each element ys[t] holds the hidden states of the last layer corresponding to the input xs[t]. Its shape is (B_t, N), where B_t is the mini-batch size for time t and N is the size of the hidden units. Note that B_t is the same as the mini-batch size of xs[t].

Return type:

tuple

n_step_rnn

chainer.functions.n_step_rnn(n_layers, dropout_ratio, hx, ws, bs, xs, train=True, use_cudnn=True, activation='tanh')[source]

Stacked Uni-directional RNN function for sequence inputs.

This function calculates stacked Uni-directional RNN with sequences. It takes an initial hidden state \(h_0\), an input sequence \(x\), weight matrices \(W\), and bias vectors \(b\), and computes the hidden states \(h_t\) for each time \(t\) from the input \(x_t\).

\[h_t = f(W_0 x_t + W_1 h_{t-1} + b_0 + b_1)\]

where \(f\) is an activation function.

Weight matrices \(W\) contain two matrices \(W_0\) and \(W_1\). \(W_0\) is a parameter for the input sequence and \(W_1\) is a parameter for the hidden state. Bias vectors \(b\) contain two vectors \(b_0\) and \(b_1\). \(b_0\) is a parameter for the input sequence and \(b_1\) is a parameter for the hidden state.

As the function accepts a sequence, it calculates \(h_t\) for all \(t\) with one call. Two weight matrices and two bias vectors are required for each layer. So, when \(S\) layers exist, you need to prepare \(2S\) weight matrices and \(2S\) bias vectors.

If the number of layers n_layers is greater than \(1\), the input of the k-th layer is the hidden state h_t of the (k-1)-th layer. Note that all input variables except those of the first layer may have a different shape from the first layer's.

Parameters:
  • n_layers (int) – Number of layers.
  • dropout_ratio (float) – Dropout ratio.
  • hx (chainer.Variable) – Variable holding stacked hidden states. Its shape is (S, B, N) where S is the number of layers and is equal to n_layers, B is the mini-batch size, and N is the dimension of the hidden units.
  • ws (list of list of chainer.Variable) – Weight matrices. ws[i] represents the weights for the i-th layer. Each ws[i] is a list containing two matrices. ws[i][j] corresponds to W_j in the equation. Only ws[0][j] where 0 <= j < 1 are of shape (I, N), as they are multiplied with the input variables. All other matrices are of shape (N, N).
  • bs (list of list of chainer.Variable) – Bias vectors. bs[i] represents the biases for the i-th layer. Each bs[i] is a list containing two vectors. bs[i][j] corresponds to b_j in the equation. The shape of each vector is (N,), where N is the dimension of the hidden units.
  • xs (list of chainer.Variable) – A list of Variable holding input values. Each element xs[t] holds the input value for time t. Its shape is (B_t, I), where B_t is the mini-batch size for time t and I is the size of the input units. Note that this function supports variable-length sequences. When sequences have different lengths, sort them in descending order by length and transpose the sorted sequences; transpose_sequence() transposes a list of Variable s holding sequences. So xs needs to satisfy xs[t].shape[0] >= xs[t + 1].shape[0].
  • train (bool) – If True, this function executes dropout.
  • use_cudnn (bool) – If True, this function uses cuDNN if available.
  • activation (str) – Activation function name. Please select tanh or relu.
Returns:

This function returns a tuple containing two elements, hy and ys.

  • hy is the updated hidden states whose shape is the same as hx.
  • ys is a list of Variable. Each element ys[t] holds the hidden states of the last layer corresponding to the input xs[t]. Its shape is (B_t, N), where B_t is the mini-batch size for time t and N is the size of the hidden units. Note that B_t is the same as the mini-batch size of xs[t].

Return type:

tuple

Evaluation functions

accuracy

chainer.functions.accuracy(y, t, ignore_label=None)[source]

Computes multiclass classification accuracy of the minibatch.

Parameters:
  • y (Variable or numpy.ndarray or cupy.ndarray) – Array whose (i, j, k, ...)-th element indicates the score of the class j at the (i, k, ...)-th sample. The prediction label \(\hat t\) is calculated by the formula \(\hat t(i, k, ...) = \operatorname{\mathrm{argmax}}_j y(i, j, k, ...)\).
  • t (Variable or numpy.ndarray or cupy.ndarray of numpy.int32) – Array of ground truth labels.
  • ignore_label (int or None) – Skip calculating accuracy if the true label is ignore_label.
Returns:

A variable holding a scalar array of the accuracy.

Return type:

Variable

Note

This function is non-differentiable.

Example

We show the most common case, where y is a two-dimensional array.

>>> y = np.array([[0.1, 0.7, 0.2], # prediction label is 1
...               [8.0, 1.0, 2.0], # prediction label is 0
...               [-8.0, 1.0, 2.0], # prediction label is 2
...               [-8.0, -1.0, -2.0]]) # prediction label is 1
>>> t = np.array([1, 0, 2, 1], 'i')
>>> F.accuracy(y, t).data # 100% accuracy because all samples are correct
array(1.0)
>>> t = np.array([1, 0, 0, 0], 'i')
>>> F.accuracy(y, t).data # 50% accuracy because 1st and 2nd samples are correct.
array(0.5)
>>> F.accuracy(y, t, ignore_label=0).data # 100% accuracy because of ignoring the 2nd, 3rd and 4th samples.
array(1.0)

binary_accuracy

chainer.functions.binary_accuracy(y, t)[source]

Computes binary classification accuracy of the minibatch.

Parameters:
  • y (Variable) – Variable holding a matrix whose i-th element indicates the score of positive at the i-th example.
  • t (Variable) – Variable holding an int32 vector of ground truth labels. If t[i] == -1, corresponding x[i] is ignored. Accuracy is zero if all ground truth labels are -1.
Returns:

A variable holding a scalar array of the accuracy.

Return type:

Variable

Note

This function is non-differentiable.
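
Example

An illustrative sketch (not an official doctest), assuming a prediction is counted as positive when its score is non-negative, as with pre-sigmoid scores.

>>> y = np.array([[2.0, -0.5], [-1.0, 1.0]], 'f')
>>> t = np.array([[1, 0], [0, 0]], 'i')
>>> float(F.binary_accuracy(y, t).data)  # 3 out of 4 predictions are correct
0.75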

classification_summary

chainer.functions.classification_summary(y, t, label_num=None, beta=1.0, ignore_label=-1)[source]

Calculates Precision, Recall, F beta Score, and support.

This function calculates the following quantities for each class.

  • Precision: \(\frac{\mathrm{tp}}{\mathrm{tp} + \mathrm{fp}}\)
  • Recall: \(\frac{\mathrm{tp}}{\mathrm{tp} + \mathrm{fn}}\)
  • F beta Score: The weighted harmonic average of Precision and Recall.
  • Support: The number of instances of each ground truth label.

Here, tp, fp, and fn stand for the number of true positives, false positives, and false negatives, respectively.

label_num specifies the number of classes; that is, each value in t must be an integer in the range of [0, label_num). If label_num is None, this function regards label_num as the maximum value in t plus one.

ignore_label determines which instances should be ignored. Specifically, instances with the given label are not taken into account for calculating the above quantities. By default, it is set to -1 so that all instances are taken into consideration, as labels are supposed to be non-negative integers. Setting ignore_label to a non-negative integer less than label_num is illegal and yields undefined behavior. In the current implementation, it raises a RuntimeWarning and the ignore_label-th entries in the output arrays do not contain correct quantities.

Parameters:
  • y (Variable) – Variable holding a vector of scores.
  • t (Variable) – Variable holding a vector of ground truth labels.
  • label_num (int) – The number of classes.
  • beta (float) – The parameter which determines the weight of precision in the F-beta score.
  • ignore_label (int) – Instances with this label are ignored.
Returns:

4-tuple of ~chainer.Variable of size (label_num,). Each element represents precision, recall, F beta score, and support of this minibatch.
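
Example

An illustrative sketch (not an official doctest), assuming y holds per-class scores so that the predicted label is the argmax along the second axis.

>>> y = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]], 'f')
>>> t = np.array([0, 1, 1], 'i')
>>> precision, recall, fbeta, support = F.classification_summary(y, t, label_num=2)
>>> float(precision.data[0])  # class 0 is predicted twice but correct only once
0.5
>>> float(recall.data[1])  # one of the two class-1 instances is found
0.5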

r2_score

chainer.functions.r2_score(pred, true, sample_weight=None, multioutput='uniform_average')[source]

Computes the R^2 (coefficient of determination) regression score.

Parameters:
  • pred (Variable) – Variable holding a vector, matrix or tensor of estimated target values.
  • true (Variable) – Variable holding a vector, matrix or tensor of correct target values.
  • sample_weight – This argument is for compatibility with scikit-learn’s implementation of r2_score. Current implementation admits None only.
  • multioutput (string) – ['uniform_average', 'raw_values']. If 'uniform_average', this function returns the average of the R^2 scores of multiple outputs. If 'raw_values', this function returns a set of R^2 scores, one per output.
Returns:

A Variable holding a scalar array of the R^2 score if ‘multioutput’ is ‘uniform_average’ or a vector of R^2 scores if ‘multioutput’ is ‘raw_values’.

Return type:

Variable

Note

This function is non-differentiable.
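
Example

An illustrative sketch (not an official doctest); the value follows the usual definition \(R^2 = 1 - \mathrm{SS}_{\rm res} / \mathrm{SS}_{\rm tot}\).

>>> true = np.array([0.0, 4.0], 'f')
>>> pred = np.array([1.0, 5.0], 'f')
>>> float(F.r2_score(pred, true).data)  # 1 - 2 / 8
0.75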

Loss functions

bernoulli_nll

chainer.functions.bernoulli_nll(x, y, reduce='sum')[source]

Computes the negative log-likelihood of a Bernoulli distribution.

This function calculates the negative log-likelihood of a Bernoulli distribution.

\[-\log B(x; p) = -\sum_i \{x_i \log(p_i) + (1 - x_i)\log(1 - p_i)\},\]

where \(p = \sigma(y)\), \(\sigma(\cdot)\) is a sigmoid function, and \(B(x; p)\) is a Bernoulli distribution.

The output is a variable whose value depends on the value of the option reduce. If it is 'no', it holds the elementwise loss values. If it is 'sum', loss values are summed up.

Note

As this function uses a sigmoid function, you can pass a result of fully-connected layer (that means Linear) to this function directly.

Parameters:
  • x (Variable) – Input variable.
  • y (Variable) – A variable representing the parameter of the Bernoulli distribution; \(p = \sigma(y)\).
  • reduce (str) – Reduction option. Its value must be either 'sum' or 'no'. Otherwise, ValueError is raised.
Returns:

A variable representing the negative log-likelihood. If reduce is 'no', the output variable holds array whose shape is same as one of (hence both of) input variables. If it is 'sum', the output variable holds a scalar value.

Return type:

Variable
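
Example

An illustrative sketch (not an official doctest) showing how the reduce option changes the output shape.

>>> x = np.random.uniform(0, 1, (3, 4)).astype('f')
>>> y = np.random.uniform(-1, 1, (3, 4)).astype('f')
>>> F.bernoulli_nll(x, y).shape  # reduce='sum' gives a scalar
()
>>> F.bernoulli_nll(x, y, reduce='no').shape
(3, 4)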

black_out

chainer.functions.black_out(x, t, W, samples, reduce='mean')[source]

BlackOut loss function.

BlackOut loss function is defined as

\[-\log(p(t)) - \sum_{s \in S} \log(1 - p(s)),\]

where \(t\) is the correct label, \(S\) is a set of negative examples and \(p(\cdot)\) is likelihood of a given label. And, \(p\) is defined as

\[p(y) = \frac{\exp(W_y^\top x)}{ \sum_{s \in samples} \exp(W_s^\top x)}.\]

The output is a variable whose value depends on the value of the option reduce. If it is 'no', it holds the samplewise loss values. If it is 'mean', this function takes the mean of the loss values.

Parameters:
  • x (Variable) – Batch of input vectors. Its shape should be \((N, D)\).
  • t (Variable) – Vector of ground truth labels. Its shape should be \((N,)\). Each element \(v\) should satisfy \(0 \leq v < V\) or \(v = -1\), where \(V\) is the number of label types.
  • W (Variable) – Weight matrix. Its shape should be \((V, D)\)
  • samples (Variable) – Negative samples. Its shape should be \((N, S)\) where \(S\) is the number of negative samples.
  • reduce (str) – Reduction option. Its value must be either 'no' or 'mean'. Otherwise, ValueError is raised.

Returns:

A variable object holding the loss value(s). If reduce is 'no', the output variable holds an array whose shape is \((N,)\). If it is 'mean', it holds a scalar.

Return type:

Variable

See: BlackOut: Speeding up Recurrent Neural Network Language Models With Very Large Vocabularies

See also

BlackOut.

connectionist_temporal_classification

chainer.functions.connectionist_temporal_classification(x, t, blank_symbol, input_length=None, label_length=None, reduce='mean')[source]

Connectionist Temporal Classification loss function.

Connectionist Temporal Classification (CTC) [Graves2006] is a loss function for sequence labeling where the alignment between the inputs and the targets is unknown. See also [Graves2012]

The output is a variable whose value depends on the value of the option reduce. If it is 'no', it holds the samplewise loss values. If it is 'mean', it takes the mean of loss values.

Parameters:
  • x (sequence of Variable) – RNN output at each time. x must be a list of Variable s. Each element of x, x[i] is a Variable representing output of RNN at time i.
  • t (Variable) – Expected label sequence.
  • blank_symbol (int) – Index of blank_symbol. This value must be non-negative.
  • input_length (Variable) – Length of the valid sequence for each sample in the mini-batch x (optional). If input_length is omitted, all of x is regarded as valid input.
  • label_length (Variable) – Length of the valid sequence for each sample in the mini-batch t (optional). If label_length is omitted, all of t is regarded as valid input.
  • reduce (str) – Reduction option. Its value must be either 'mean' or 'no'. Otherwise, ValueError is raised.
Returns:

A variable holding the CTC loss value(s). If reduce is 'no', the output variable holds an array whose shape is (B,), where B is the number of samples. If it is 'mean', it holds a scalar.

Return type:

Variable

Note

You need to input x without applying any activation function (e.g. softmax), because this function applies softmax to x before calculating the CTC loss to avoid numerical limitations. You also need to apply softmax to the forwarded values before you decode them.

Note

This function is differentiable only by x.

Note

This function supports (batch, sequence, 1-dimensional input)-data.

[Graves2006]Alex Graves, Santiago Fernandez, Faustino Gomez, Jurgen Schmidhuber, Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks
[Graves2012]Alex Graves, Supervised Sequence Labelling with Recurrent Neural Networks

contrastive

chainer.functions.contrastive(x0, x1, y, margin=1, reduce='mean')[source]

Computes contrastive loss.

It takes a pair of samples and a label as inputs. The label is \(1\) when those samples are similar, or \(0\) when they are dissimilar.

Let \(N\) and \(K\) denote mini-batch size and the dimension of input variables, respectively. The shape of both input variables x0 and x1 should be (N, K). The loss value of the \(n\)-th sample pair \(L_n\) is

\[L_n = \frac{1}{2} \left( y_n d_n^2 + (1 - y_n) \max ({\rm margin} - d_n, 0)^2 \right)\]

where \(d_n = \| {\bf x_0}_n - {\bf x_1}_n \|_2\), \({\bf x_0}_n\) and \({\bf x_1}_n\) are \(n\)-th K-dimensional vectors of x0 and x1.

The output is a variable whose value depends on the value of the option reduce. If it is 'no', it holds the elementwise loss values. If it is 'mean', this function takes a mean of loss values.

Parameters:
  • x0 (Variable) – The first input variable. The shape should be (N, K), where N denotes the mini-batch size, and K denotes the dimension of x0.
  • x1 (Variable) – The second input variable. The shape should be the same as x0.
  • y (Variable) – Labels. All values should be 0 or 1. The shape should be (N,), where N denotes the mini-batch size.
  • margin (float) – A parameter for contrastive loss. It should be a positive value.
  • reduce (str) – Reduction option. Its value must be either 'mean' or 'no'. Otherwise, ValueError is raised.
Returns:

A variable holding the loss value(s) calculated by the above equation. If reduce is 'no', the output variable holds array whose shape is same as one of (hence both of) input variables. If it is 'mean', the output variable holds a scalar value.

Return type:

Variable

Note

This cost can be used to train siamese networks. See Learning a Similarity Metric Discriminatively, with Application to Face Verification for details.
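
Example

An illustrative sketch (not an official doctest); the values follow the equation above with margin=1 and \(d_n = 1\).

>>> x0 = np.array([[0.0, 0.0]], 'f')
>>> x1 = np.array([[1.0, 0.0]], 'f')
>>> y = np.array([1], 'i')
>>> float(F.contrastive(x0, x1, y).data)  # similar pair: 0.5 * d^2
0.5
>>> y = np.array([0], 'i')
>>> float(F.contrastive(x0, x1, y).data)  # dissimilar pair exactly at the margin
0.0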

crf1d

chainer.functions.crf1d(cost, xs, ys, reduce='mean')[source]

Calculates negative log-likelihood of linear-chain CRF.

It takes a transition cost matrix, a sequence of costs, and a sequence of labels. Let \(c_{st}\) be a transition cost from a label \(s\) to a label \(t\), \(x_{it}\) be a cost of a label \(t\) at position \(i\), and \(y_i\) be an expected label at position \(i\). The negative log-likelihood of linear-chain CRF is defined as

\[L = -\left( \sum_{i=1}^l x_{iy_i} + \ \sum_{i=1}^{l-1} c_{y_i y_{i+1}} - {\log(Z)} \right) ,\]

where \(l\) is the length of the input sequence and \(Z\) is the normalizing constant called partition function.

Note

When you want to calculate the negative log-likelihood of sequences which have different lengths, sort the sequences in descending order of lengths and transpose the sequences. For example, you have three input sequences:

>>> a1 = a2 = a3 = a4 = np.random.uniform(-1, 1, 3).astype('f')
>>> b1 = b2 = b3 = np.random.uniform(-1, 1, 3).astype('f')
>>> c1 = c2 = np.random.uniform(-1, 1, 3).astype('f')
>>> a = [a1, a2, a3, a4]
>>> b = [b1, b2, b3]
>>> c = [c1, c2]

where a1 and all other variables are arrays with (K,) shape. Make a transpose of the sequences:

>>> x1 = np.stack([a1, b1, c1])
>>> x2 = np.stack([a2, b2, c2])
>>> x3 = np.stack([a3, b3])
>>> x4 = np.stack([a4])

and make a list of the arrays:

>>> xs = [x1, x2, x3, x4]

You need to make label sequences in the same fashion. And then, call the function:

>>> cost = chainer.Variable(
...     np.random.uniform(-1, 1, (3, 3)).astype('f'))
>>> ys = [np.zeros(x.shape[0:1], dtype='i') for x in xs]
>>> loss = F.crf1d(cost, xs, ys)

It calculates mean of the negative log-likelihood of the three sequences.

The output is a variable whose value depends on the value of the option reduce. If it is 'no', it holds the elementwise loss values. If it is 'mean', it holds mean of the loss values.

Parameters:
  • cost (Variable) – A \(K \times K\) matrix which holds transition cost between two labels, where \(K\) is the number of labels.
  • xs (list of Variable) – Input vector for each label. len(xs) denotes the length of the sequence, and each Variable holds a \(B \times K\) matrix, where \(B\) is the mini-batch size and \(K\) is the number of labels. Note that the \(B\) s in all the variables are not necessarily the same, i.e., it accepts input sequences with different lengths.
  • ys (list of Variable) – Expected output labels. It needs to have the same length as xs. Each Variable holds a \(B\)-dimensional integer vector. When an x in xs has a different \(B\), the corresponding y has the same \(B\). In other words, ys must satisfy ys[i].shape == xs[i].shape[0:1] for all i.
  • reduce (str) – Reduction option. Its value must be either 'mean' or 'no'. Otherwise, ValueError is raised.
Returns:

A variable holding the average negative log-likelihood of the input sequences.

Return type:

Variable

chainer.functions.argmax_crf1d(cost, xs)[source]

Computes a state that maximizes a joint probability of the given CRF.

Parameters:
  • cost (Variable) – A \(K \times K\) matrix which holds transition cost between two labels, where \(K\) is the number of labels.
  • xs (list of Variable) – Input vector for each label. len(xs) denotes the length of the sequence, and each Variable holds a \(B \times K\) matrix, where \(B\) is the mini-batch size and \(K\) is the number of labels. Note that the \(B\) s in all the variables are not necessarily the same, i.e., it accepts input sequences with different lengths.
Returns:

A tuple of a Variable object s and a list ps. The shape of s is (B,), where B is the mini-batch size. The i-th element of s, s[i], represents the log-likelihood of the i-th data. ps is a list of numpy.ndarray or cupy.ndarray, and denotes the state that maximizes the joint probability. len(ps) is equal to len(xs), and the shape of each ps[i] is the mini-batch size of the corresponding xs[i]. That means ps[i].shape == xs[i].shape[0:1].

Return type:

tuple

cross_covariance

chainer.functions.cross_covariance(y, z, reduce='half_squared_sum')[source]

Computes the sum-squared cross-covariance penalty between y and z.

The output is a variable whose value depends on the value of the option reduce. If it is 'no', it holds the covariance matrix that has as many rows (resp. columns) as the dimension of y (resp. z). If it is 'half_squared_sum', it holds half of the squared Frobenius norm (i.e. half of the squared L2 norm of the matrix flattened to a vector) of the covariance matrix.

Parameters:
  • y (Variable) – Variable holding a matrix where the first dimension corresponds to the batches.
  • z (Variable) – Variable holding a matrix where the first dimension corresponds to the batches.
  • reduce (str) – Reduction option. Its value must be either 'half_squared_sum' or 'no'. Otherwise, ValueError is raised.
Returns:

A variable holding the cross covariance loss. If reduce is 'no', the output variable holds 2-dimensional array matrix of shape (M, N) where M (resp. N) is the number of columns of y (resp. z). If it is 'half_squared_sum', the output variable holds a scalar value.

Return type:

Variable

Note

This cost can be used to disentangle variables. See https://arxiv.org/abs/1412.6583v3 for details.
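
Example

An illustrative sketch (not an official doctest) showing the output shapes described above.

>>> y = np.random.uniform(-1, 1, (5, 3)).astype('f')
>>> z = np.random.uniform(-1, 1, (5, 2)).astype('f')
>>> F.cross_covariance(y, z).shape  # 'half_squared_sum' gives a scalar
()
>>> F.cross_covariance(y, z, reduce='no').shape
(3, 2)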

gaussian_kl_divergence

chainer.functions.gaussian_kl_divergence(mean, ln_var, reduce='sum')[source]

Computes the KL-divergence of Gaussian variables from the standard one.

Given two variables mean representing \(\mu\) and ln_var representing \(\log(\sigma^2)\), this function calculates the KL-divergence in an elementwise manner between the given multi-dimensional Gaussian \(N(\mu, S)\) and the standard Gaussian \(N(0, I)\)

\[D_{\mathbf{KL}}(N(\mu, S) \| N(0, I)),\]

where \(S\) is a diagonal matrix such that \(S_{ii} = \sigma_i^2\) and \(I\) is an identity matrix.

The output is a variable whose value depends on the value of the option reduce. If it is 'no', it holds the elementwise loss values. If it is 'sum', loss values are summed up.

Parameters:
  • mean (Variable) – A variable representing the mean of the Gaussian distribution, \(\mu\).
  • ln_var (Variable) – A variable representing the logarithm of the variance of the Gaussian distribution, \(\log(\sigma^2)\).
  • reduce (str) – Reduction option. Its value must be either 'sum' or 'no'. Otherwise, ValueError is raised.
Returns:

A variable representing KL-divergence between given gaussian distribution and the standard gaussian. If reduce is 'no', the output variable holds array whose shape is same as one of (hence both of) input variables. If it is 'sum', the output variable holds a scalar value.

Return type:

Variable
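
Example

An illustrative sketch (not an official doctest): the KL-divergence of the standard Gaussian from itself is zero.

>>> mean = np.zeros(3, 'f')
>>> ln_var = np.zeros(3, 'f')
>>> float(F.gaussian_kl_divergence(mean, ln_var).data)
0.0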

gaussian_nll

chainer.functions.gaussian_nll(x, mean, ln_var, reduce='sum')[source]

Computes the negative log-likelihood of a Gaussian distribution.

Given two variables mean representing \(\mu\) and ln_var representing \(\log(\sigma^2)\), this function computes in an elementwise manner the negative log-likelihood of \(x\) on a Gaussian distribution \(N(\mu, S)\),

\[-\log N(x; \mu, \sigma^2) = \log\left(\sqrt{(2\pi)^D |S|}\right) + \frac{1}{2}(x - \mu)^\top S^{-1}(x - \mu),\]

where \(D\) is a dimension of \(x\) and \(S\) is a diagonal matrix where \(S_{ii} = \sigma_i^2\).

The output is a variable whose value depends on the value of the option reduce. If it is 'no', it holds the elementwise loss values. If it is 'sum', loss values are summed up.

Parameters:
  • x (Variable) – Input variable.
  • mean (Variable) – A variable representing the mean of the Gaussian distribution, \(\mu\).
  • ln_var (Variable) – A variable representing the logarithm of the variance of the Gaussian distribution, \(\log(\sigma^2)\).
  • reduce (str) – Reduction option. Its value must be either 'sum' or 'no'. Otherwise, ValueError is raised.
Returns:

A variable representing the negative log-likelihood. If reduce is 'no', the output variable holds an array whose shape is same as one of (hence both of) input variables. If it is 'sum', the output variable holds a scalar value.

Return type:

Variable
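
Example

An illustrative sketch (not an official doctest) showing how the reduce option changes the output shape.

>>> x = np.random.uniform(-1, 1, (2, 3)).astype('f')
>>> mean = np.zeros((2, 3), 'f')
>>> ln_var = np.zeros((2, 3), 'f')
>>> F.gaussian_nll(x, mean, ln_var).shape  # reduce='sum' gives a scalar
()
>>> F.gaussian_nll(x, mean, ln_var, reduce='no').shape
(2, 3)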

hinge

chainer.functions.hinge(x, t, norm='L1', reduce='mean')[source]

Computes the hinge loss for a one-of-many classification task.

\[L = \frac{1}{N} \sum_{n=1}^N \sum_{k=1}^K \left[ \max(0, 1 - \delta\{t_n = k\} x_{nk}) \right]^p\]

where \(N\) denotes the batch size and \(K\) is the number of classes of interest,

\[\begin{split}\delta \{ {\rm condition} \} = \left \{ \begin{array}{cc} 1 & {\rm if~condition\ is\ true} \\ -1 & {\rm otherwise,} \end{array} \right.\end{split}\]

and

\[\begin{split}p = \left \{ \begin{array}{cc} 1 & {\rm if~norm} = {\rm L1} \\ 2 & {\rm if~norm} = {\rm L2.} \end{array} \right.\end{split}\]

The output is a variable whose value depends on the value of the option reduce. If it is 'no', it holds the elementwise loss values. If it is 'mean', it takes the mean of loss values.

Parameters:
  • x (Variable) – Input variable. The shape of x should be (\(N\), \(K\)).
  • t (Variable) – The \(N\)-dimensional label vector with values \(t_n \in \{0, 1, 2, \dots, K-1\}\). The shape of t should be (\(N\),).
  • norm (string) – Specifies norm type. Either 'L1' or 'L2' is acceptable.
  • reduce (str) – Reduction option. Its value must be either 'mean' or 'no'. Otherwise, ValueError is raised.
Returns:

A variable object holding a scalar array of the hinge loss \(L\). If reduce is 'no', the output variable holds array whose shape is same as one of (hence both of) input variables. If it is 'mean', the output variable holds a scalar value.

Return type:

Variable
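
Example

An illustrative sketch (not an official doctest); the value follows the L1 hinge loss equation above.

>>> x = np.array([[0.5, 0.0]], 'f')
>>> t = np.array([0], 'i')
>>> float(F.hinge(x, t).data)  # max(0, 1 - 0.5) + max(0, 1 + 0.0)
1.5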

huber_loss

chainer.functions.huber_loss(x, t, delta, reduce='sum_along_second_axis')[source]

Loss function which is less sensitive to outliers in data than MSE.

\[a = x - t\]

and

\[\begin{split}L_{\delta}(a) = \left \{ \begin{array}{cc} \frac{1}{2} a^2 & {\rm if~|a| \leq \delta} \\ \delta (|a| - \frac{1}{2} \delta) & {\rm otherwise,} \end{array} \right.\end{split}\]

The output is a variable whose value depends on the value of the option reduce. If it is 'no', it holds the elementwise loss values. If it is 'sum_along_second_axis', loss values are summed up along the second axis (i.e. axis=1).

Parameters:
  • x (Variable) – Input variable. The shape of x should be (\(N\), \(K\)).
  • t (Variable) – Target variable for regression. The shape of t should be (\(N\), \(K\)).
  • delta (float) – Constant value for the Huber loss function, as used in the definition above.
  • reduce (str) – Reduction option. Its value must be either 'sum_along_second_axis' or 'no'. Otherwise, ValueError is raised.
Returns:

A variable object holding a scalar array of the huber loss \(L_{\delta}\). If reduce is 'no', the output variable holds array whose shape is same as one of (hence both of) input variables. If it is 'sum_along_second_axis', the shape of the array is same as the input variables, except the second axis is removed.

Return type:

Variable

See:
Huber loss - Wikipedia.
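
Example

An illustrative sketch (not an official doctest); with delta=1, the element 2.0 falls in the linear regime, so its loss is 1 * (2 - 0.5) = 1.5.

>>> x = np.array([[0.0, 2.0]], 'f')
>>> t = np.array([[0.0, 0.0]], 'f')
>>> float(F.huber_loss(x, t, delta=1.0).data[0])
1.5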

mean_absolute_error

chainer.functions.mean_absolute_error(x0, x1)[source]

Mean absolute error function.

This function computes mean absolute error between two variables. The mean is taken over the minibatch.

mean_squared_error

chainer.functions.mean_squared_error(x0, x1)[source]

Mean squared error function.

This function computes mean squared error between two variables. The mean is taken over the minibatch. Note that the error is not scaled by 1/2.
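
Example

An illustrative sketch (not an official doctest) covering both functions; every element differs by exactly 1, so both means are 1.

>>> x0 = np.array([1.0, 2.0, 3.0], 'f')
>>> x1 = np.array([2.0, 3.0, 4.0], 'f')
>>> float(F.mean_absolute_error(x0, x1).data)
1.0
>>> float(F.mean_squared_error(x0, x1).data)
1.0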

negative_sampling

chainer.functions.negative_sampling(x, t, W, sampler, sample_size, reduce='sum')[source]

Negative sampling loss function.

In natural language processing, especially language modeling, the number of words in a vocabulary can be very large. Therefore, you need to spend a lot of time calculating the gradient of the embedding matrix.

By using the negative sampling trick you only need to calculate the gradient for a few sampled negative examples.

The objective function is below:

\[f(x, p) = \log \sigma(x^\top w_p) + \ k E_{i \sim P(i)}[\log \sigma(- x^\top w_i)],\]

where \(\sigma(\cdot)\) is a sigmoid function, \(w_i\) is the weight vector for the word \(i\), and \(p\) is a positive example. It is approximated with a set \(N\) of \(k\) samples drawn from the probability \(P(i)\), like this:

\[f(x, p) \approx \log \sigma(x^\top w_p) + \ \sum_{n \in N} \log \sigma(-x^\top w_n).\]

Each sample of \(N\) is drawn from the word distribution \(P(w)\). This is calculated as \(P(w) = \frac{1}{Z} c(w)^\alpha\), where \(c(w)\) is the unigram count of the word \(w\), \(\alpha\) is a hyper-parameter, and \(Z\) is the normalization constant.

Parameters:
  • x (Variable) – Batch of input vectors.
  • t (Variable) – Vector of ground truth labels.
  • W (Variable) – Weight matrix.
  • sampler (FunctionType) – Sampling function. It takes a shape and returns an integer array of the shape. Each element of this array is a sample from the word distribution. A WalkerAlias object built with the power distribution of word frequency is recommended.
  • sample_size (int) – Number of samples.
  • reduce (str) – Reduction option. Its value must be either 'sum' or 'no'. Otherwise, ValueError is raised.
Returns:

A variable holding the loss value(s) calculated by the above equation. If reduce is 'no', the output variable holds an array whose shape is the same as that of t. If it is 'sum', the output variable holds a scalar value.

Return type:

Variable

See: Distributed Representations of Words and Phrases and their Compositionality

See also

NegativeSampling.
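
Example

A rough usage sketch with toy shapes. The unigram counts, the power 0.75, and the chainer.utils.WalkerAlias-based sampler below are illustrative assumptions; any callable that maps a shape to an integer array of samples can be used.

>>> batch, in_size, n_vocab = 2, 3, 5
>>> x = chainer.Variable(np.random.uniform(-1, 1, (batch, in_size)).astype('f'))
>>> t = np.array([0, 2], 'i')
>>> W = np.random.uniform(-1, 1, (n_vocab, in_size)).astype('f')
>>> counts = np.array([5, 4, 3, 2, 1], 'f')
>>> sampler = chainer.utils.WalkerAlias(counts ** 0.75).sample
>>> loss = F.negative_sampling(x, t, W, sampler, sample_size=2)
>>> loss.shape
()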

sigmoid_cross_entropy

chainer.functions.sigmoid_cross_entropy(x, t, use_cudnn=True, normalize=True, reduce='mean')[source]

Computes cross entropy loss for pre-sigmoid activations.

Parameters:
  • x (Variable) – A variable object holding a matrix whose (i, j)-th element indicates the unnormalized log probability of the j-th unit at the i-th example.
  • t (Variable) – Variable holding an int32 vector of ground truth labels. If t[i] == -1, corresponding x[i] is ignored. Loss is zero if all ground truth labels are -1.
  • normalize (bool) – If True, this function normalizes the cross entropy loss across all instances. Otherwise, it only normalizes along the batch size.
  • reduce (str) – A string that determines whether to reduce the loss values. If it is 'mean', it computes the sum of the cross entropy and normalizes it according to the normalize option. If it is 'no', this function computes cross entropy for each instance and does not normalize it (the normalize option is ignored). In this case, the loss value of an ignored instance, which has -1 as its target value, is set to 0.
Returns:

A variable object holding an array of the cross entropy. If reduce is 'mean', it is a scalar array. If reduce is 'no', the shape is same as x.

Return type:

Variable

Note

This function is differentiable only by x.
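
Example

A minimal usage sketch with toy logits and 0/1 labels; the (1, 2) element uses -1 and is therefore ignored. With the default reduce='mean' the result is a scalar, while with reduce='no' the loss keeps the shape of x.

>>> x = np.array([[-2.0, 3.0, 0.5], [5.0, 2.0, -0.5]], 'f')
>>> t = np.array([[0, 1, 1], [1, 0, -1]], 'i')
>>> F.sigmoid_cross_entropy(x, t).shape
()
>>> F.sigmoid_cross_entropy(x, t, reduce='no').shape
(2, 3)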

softmax_cross_entropy

chainer.functions.softmax_cross_entropy(x, t, use_cudnn=True, normalize=True, cache_score=True, class_weight=None, ignore_label=-1, reduce='mean')[source]

Computes cross entropy loss for pre-softmax activations.

Parameters:
  • x (Variable) – Variable holding a multidimensional array whose element indicates unnormalized log probability: the first axis of the variable represents the number of samples, and the second axis represents the number of classes. While this function computes a usual softmax cross entropy if the number of dimensions is equal to 2, it computes a cross entropy of the replicated softmax if the number of dimensions is greater than 2.
  • t (Variable) – Variable holding an int32 vector of ground truth labels. If t[i] == ignore_label, corresponding x[i] is ignored.
  • normalize (bool) – If True, this function normalizes the cross entropy loss across all instances. If False, it only normalizes along a batch size.
  • cache_score (bool) – When it is True, the function stores the result of the forward computation to use it in the backward computation. It reduces computational cost, though it consumes more memory.
  • class_weight (numpy.ndarray or cupy.ndarray) – An array that contains constant weights that will be multiplied with the loss values along the second dimension. The shape of this array should be (x.shape[1],). If this is not None, each class weight class_weight[i] is multiplied with y[:, i], where y is the log-softmax output of x and has the same shape as x, before calculating the actual loss value.
  • ignore_label (int) – Label value you want to ignore. Its default value is -1. See description of the argument t.
  • reduce (str) – A string that determines whether to reduce the loss values. If it is 'mean', it computes the sum of the individual cross entropy and normalize it according to normalize option. If it is 'no', this function computes cross entropy for each instance and does not normalize it (normalize option is ignored). In this case, the loss value of the ignored instance, which has ignore_label as its target value, is set to 0.
Returns:

A variable holding the cross entropy loss. If reduce is 'mean', it is a scalar array. If reduce is 'no', the shape is the same as that of t.

Return type:

Variable

Note

This function is differentiable only by x.
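
Example

A minimal usage sketch with two samples and three classes. With the default reduce='mean' the result is a scalar; with reduce='no' it holds one loss value per instance.

>>> x = np.array([[1.0, 2.0, 0.5], [3.0, 0.0, 1.0]], 'f')
>>> t = np.array([1, 0], 'i')
>>> F.softmax_cross_entropy(x, t).shape
()
>>> F.softmax_cross_entropy(x, t, reduce='no').shape
(2,)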

triplet

chainer.functions.triplet(anchor, positive, negative, margin=0.2, reduce='mean')[source]

Computes triplet loss.

It takes a triplet of variables as inputs, \(a\), \(p\) and \(n\): anchor, positive example and negative example respectively. The triplet defines a relative similarity between samples. Let \(N\) and \(K\) denote mini-batch size and the dimension of input variables, respectively. The shape of all input variables should be \((N, K)\).

\[L(a, p, n) = \frac{1}{N} \left( \sum_{i=1}^N \max \{d(a_i, p_i) - d(a_i, n_i) + {\rm margin}, 0\} \right)\]

where \(d(x_i, y_i) = \| {\bf x}_i - {\bf y}_i \|_2^2\).

The output is a variable whose value depends on the value of the option reduce. If it is 'no', it holds the elementwise loss values. If it is 'mean', this function takes a mean of loss values.

Parameters:
  • anchor (Variable) – The anchor example variable. The shape should be \((N, K)\), where \(N\) denotes the minibatch size, and \(K\) denotes the dimension of the anchor.
  • positive (Variable) – The positive example variable. The shape should be the same as anchor.
  • negative (Variable) – The negative example variable. The shape should be the same as anchor.
  • margin (float) – A parameter for triplet loss. It should be a positive value.
  • reduce (str) – Reduction option. Its value must be either 'mean' or 'no'. Otherwise, ValueError is raised.
Returns:

A variable holding the loss value(s) calculated by the above equation. If reduce is 'no', the output variable holds an array of the loss values. If it is 'mean', the output variable holds a scalar value.

Return type:

Variable

Note

This cost can be used to train triplet networks. See Learning Fine-grained Image Similarity with Deep Ranking for details.
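
Example

A minimal usage sketch with \(N = 2\) and \(K = 2\); the inputs are arbitrary toy values.

>>> a = np.array([[0., 0.], [1., 1.]], 'f')
>>> p = np.array([[0., 1.], [1., 2.]], 'f')
>>> n = np.array([[0., 3.], [1., 1.5]], 'f')
>>> F.triplet(a, p, n, margin=0.2).shape
()
>>> F.triplet(a, p, n, reduce='no').shape
(2,)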

Mathematical functions

arccos

chainer.functions.arccos(x)[source]

Elementwise arccosine function.

\[y_i = \arccos x_i.\]
Parameters:x (Variable) – Input variable.
Returns:Output variable.
Return type:Variable

arcsin

chainer.functions.arcsin(x)[source]

Elementwise arcsine function.

\[y_i = \arcsin x_i.\]
Parameters:x (Variable) – Input variable.
Returns:Output variable.
Return type:Variable

arctan

chainer.functions.arctan(x)[source]

Elementwise arctangent function.

\[y_i = \arctan x_i.\]
Parameters:x (Variable) – Input variable.
Returns:Output variable.
Return type:Variable

argmax

chainer.functions.argmax(x, axis=None)[source]

Returns the index of the maximum array elements over a given axis.

Parameters:
  • x (Variable) – Array to find maximum elements.
  • axis (None or int) – Axis over which a max is performed. The default (axis = None) is to perform a max over all the dimensions of the input array.
Returns:

Output variable.

Return type:

Variable

argmin

chainer.functions.argmin(x, axis=None)[source]

Returns the index of the minimum array elements over a given axis.

Parameters:
  • x (Variable) – Array to find minimum elements.
  • axis (None or int) – Axis over which a min is performed. The default (axis = None) is to perform a min over all the dimensions of the input array.
Returns:

Output variable.

Return type:

Variable

average

chainer.functions.average(x, axis=None, weights=None, keepdims=False)[source]

Calculate weighted average of array elements over a given axis.

Parameters:
  • x (Variable) – Elements to average.
  • axis (None or int) – Axis along which the average is performed. With the default (axis = None) it performs a mean over all the dimensions of the input array.
  • weights (None or chainer.Variable) – An array holding weights to calculate the weighted average. If it is None, all weights are assumed to be one. When axis is None, weights must have the same shape as x. When axis is an int, it must be a 1-D array satisfying weights.shape == (x.shape[axis],).
  • keepdims (bool) – If True, the reduced axes are retained in the result as dimensions with size one.
Returns:

Output variable.

Return type:

Variable
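
Example

An illustrative sketch of a weighted average along axis 1; with weights (1, 1, 2) the first row gives (1 + 2 + 2*3) / 4 = 2.25 and the second row gives (4 + 5 + 2*6) / 4 = 5.25.

>>> x = np.array([[1., 2., 3.], [4., 5., 6.]], 'f')
>>> w = np.array([1., 1., 2.], 'f')
>>> y = F.average(x, axis=1, weights=w)
>>> np.allclose(y.data, [2.25, 5.25])
True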

batch_inv

chainer.functions.batch_inv(a)[source]

Computes the inverse of a batch of square matrices.

Parameters:a (Variable) – Input array to compute the inverse for. Shape of the array should be (m, n, n) where m is the number of matrices in the batch, and n is the dimensionality of a square matrix.
Returns:Inverse of every matrix in the batch of matrices.
Return type:Variable

batch_l2_norm_squared

chainer.functions.batch_l2_norm_squared(x)[source]

L2 norm (a.k.a. Euclidean norm) squared.

This function implements the square of L2 norm on a vector. No reduction along batch axis is done.

Parameters:x (Variable) – Input variable. The first dimension is assumed to be the minibatch dimension. If x has more than two dimensions all but the first dimension are flattened to one dimension.
Returns:Output variable of shape \((N,)\) holding the squared L2 norm of each example in the minibatch.
Return type:Variable

batch_matmul

chainer.functions.batch_matmul(a, b, transa=False, transb=False)[source]

Computes the batch matrix multiplications of two sets of arrays.

Parameters:
  • a (Variable) – The left operand of the batch matrix multiplications. A 2-D array of shape (B, N) is considered as B \(N \times 1\) matrices. A 3-D array of shape (B, M, N) is considered as B \(M \times N\) matrices.
  • b (Variable) – The right operand of the batch matrix multiplications. Its array is treated as matrices in the same way as a‘s array.
  • transa (bool) – If True, transpose each matrix in a.
  • transb (bool) – If True, transpose each matrix in b.
Returns:

The result of the batch matrix multiplications as a 3-D array.

Return type:

Variable
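
Example

A minimal shape sketch: a batch of five 3x4 matrices multiplied by a batch of five 4x2 matrices yields a batch of five 3x2 matrices.

>>> a = np.random.rand(5, 3, 4).astype('f')
>>> b = np.random.rand(5, 4, 2).astype('f')
>>> F.batch_matmul(a, b).shape
(5, 3, 2)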

bias

chainer.functions.bias(x, y, axis=1)[source]

Elementwise summation with broadcasting.

Computes an elementwise summation of two input variables, with the shape of the latter variable broadcast to match the shape of the former. axis is the first axis of the first variable along which the second variable is applied.

The term “broadcasting” here comes from Caffe’s bias layer, so the “broadcasting” with the following arguments:

   x : 100 x 3 x 40 x 60
   y : 3 x 40
axis : 1

is equivalent to the following numpy broadcasting:

x : 100 x 3 x 40 x 60
y :   1 x 3 x 40 x 1

Note how axis indicates the axis of x to which y is applied.

Parameters:
  • x (Variable) – Input variable to be summed.
  • y (Variable) – Input variable to sum, broadcasted.
  • axis (int) – The first axis of x along which y is applied.
Returns:

Output variable.

Return type:

Variable
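
Example

A minimal sketch of the broadcasting behavior with axis=1: y of shape (3,) is added along the second axis of x.

>>> x = np.zeros((2, 3, 4), 'f')
>>> y = np.array([1., 2., 3.], 'f')
>>> z = F.bias(x, y, axis=1)
>>> z.shape
(2, 3, 4)
>>> np.allclose(z.data[0, :, 0], [1., 2., 3.])
True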

ceil

chainer.functions.ceil(x)[source]

Elementwise ceil function.

\[y_i = \lceil x_i \rceil\]
Parameters:x (Variable) – Input variable.
Returns:Output variable.
Return type:Variable

clip

chainer.functions.clip(x, x_min, x_max)[source]

Clips (limits) elements of input variable.

Given an interval [x_min, x_max], elements outside the interval are clipped to the interval edges.

Parameters:
  • x (Variable) – Input variable to be clipped.
  • x_min (float) – Minimum value.
  • x_max (float) – Maximum value.
Returns:

Output variable.

Return type:

Variable
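
Example

A minimal sketch: values outside [-1, 1] are clipped to the interval edges.

>>> x = np.array([-3.0, 0.5, 4.0], 'f')
>>> y = F.clip(x, -1.0, 1.0)
>>> np.allclose(y.data, [-1.0, 0.5, 1.0])
True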

cos

chainer.functions.cos(x)[source]

Elementwise cos function.

cosh

chainer.functions.cosh(x)[source]

Elementwise hyperbolic cosine function.

\[y_i = \cosh x_i.\]
Parameters:x (Variable) – Input variable.
Returns:Output variable.
Return type:Variable

exp

chainer.functions.exp(x)[source]

Elementwise exponential function.

fmod

chainer.functions.fmod(x, divisor)[source]

Elementwise mod function.

\[y_i = x_i \bmod \mathrm{divisor}.\]
Parameters:
Returns:

Output variable.

Return type:

Variable

floor

chainer.functions.floor(x)[source]

Elementwise floor function.

\[y_i = \lfloor x_i \rfloor\]
Parameters:x (Variable) – Input variable.
Returns:Output variable.
Return type:Variable

identity

chainer.functions.identity(*inputs)[source]

Just returns input variables.

inv

chainer.functions.inv(a)[source]

Computes the inverse of a square matrix.

Parameters:a (Variable) – Input array to compute the inverse for. Shape of the array should be (n, n) where n is the dimensionality of a square matrix.
Returns:Matrix inverse of a.
Return type:Variable
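
Example

A minimal sketch with an arbitrary invertible 2x2 matrix; the product of the result and the input is close to the identity up to float32 precision.

>>> a = np.array([[1., 2.], [3., 4.]], 'f')
>>> a_inv = F.inv(a)
>>> np.allclose(a_inv.data.dot(a), np.eye(2), atol=1e-5)
True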

linear_interpolate

chainer.functions.linear_interpolate(p, x, y)[source]

Elementwise linear-interpolation function.

This function is defined as

\[f(p, x, y) = p x + (1 - p) y.\]
Parameters:
Returns:

Output variable.

Return type:

Variable

log

chainer.functions.log(x)[source]

Elementwise natural logarithm function.

log10

chainer.functions.log10(x)[source]

Elementwise logarithm function to the base 10.

\[y_i = \log_{10} x_i.\]
Parameters:x (Variable) – Input variable.
Returns:Output variable.
Return type:Variable

log1p

chainer.functions.log1p(x)[source]

Elementwise natural logarithm of one plus the input, i.e. \(y_i = \log(1 + x_i)\).

log2

chainer.functions.log2(x)[source]

Elementwise logarithm function to the base 2.

\[y_i = \log_2 x_i.\]
Parameters:x (Variable) – Input variable.
Returns:Output variable.
Return type:Variable

logsumexp

chainer.functions.logsumexp(x, axis=None)[source]

Log-sum-exp of array elements over a given axis.

This function calculates logarithm of sum of exponential of array elements.

\[y_i = \log\left(\sum_j \exp(x_{ij})\right)\]
Parameters:
  • x (Variable) – Elements to log-sum-exp.
  • axis (None, int, or tuple of int) – Axis over which the sum is performed. The default (axis = None) is to perform a sum over all the dimensions of the input array.
Returns:

Output variable.

Return type:

Variable
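
Example

A minimal sketch comparing the result with the naive computation.

>>> x = np.array([[1., 2., 3.], [4., 5., 6.]], 'f')
>>> y = F.logsumexp(x, axis=1)
>>> np.allclose(y.data, np.log(np.exp(x).sum(axis=1)))
True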

matmul

chainer.functions.matmul(a, b, transa=False, transb=False)[source]

Computes the matrix multiplication of two arrays.

Parameters:
  • a (Variable) – The left operand of the matrix multiplication. A 1-D array of shape (N,) is considered as an \(N \times 1\) matrix. A 2-D array of shape (M, N) is considered as an \(M \times N\) matrix.
  • b (Variable) – The right operand of the matrix multiplication. Its array is treated as a matrix in the same way as a‘s array.
  • transa (bool) – If True, transpose a.
  • transb (bool) – If True, transpose b.
Returns:

The result of the matrix multiplication as a 2-D array.

Return type:

Variable
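
Example

A minimal shape sketch: a 3x4 matrix multiplied by a 4x2 matrix yields a 3x2 matrix.

>>> a = np.random.rand(3, 4).astype('f')
>>> b = np.random.rand(4, 2).astype('f')
>>> F.matmul(a, b).shape
(3, 2)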

max

chainer.functions.max(x, axis=None, keepdims=False)[source]

Maximum of array elements over a given axis.

Parameters:
  • x (Variable) – Array to be maximized.
  • axis (None, int, or tuple of int) – Axis over which a max is performed. The default (axis = None) is to perform a max over all the dimensions of the input array.
Returns:

Output variable.

Return type:

Variable

maximum

chainer.functions.maximum(x1, x2)[source]

Element-wise maximum of input variables.

Parameters:
  • x1 (Variable) – Input variables to be compared.
  • x2 (Variable) – Input variables to be compared.
Returns:

Output variable.

Return type:

Variable

mean

chainer.functions.mean(x, axis=None, weights=None, keepdims=False)

Calculate weighted average of array elements over a given axis.

Parameters:
  • x (Variable) – Elements to average.
  • axis (None or int) – Axis along which the average is performed. With the default (axis = None) it performs a mean over all the dimensions of the input array.
  • weights (None or chainer.Variable) – An array holding weights to calculate the weighted average. If it is None, all weights are assumed to be one. When axis is None, weights must have the same shape as x. When axis is an int, it must be a 1-D array satisfying weights.shape == (x.shape[axis],).
  • keepdims (bool) – If True, the reduced axes are retained in the result as dimensions with size one.
Returns:

Output variable.

Return type:

Variable

min

chainer.functions.min(x, axis=None, keepdims=False)[source]

Minimum of array elements over a given axis.

Parameters:
  • x (Variable) – Array to be minimized.
  • axis (None, int, or tuple of int) – Axis over which a min is performed. The default (axis = None) is to perform a min over all the dimensions of the input array.
Returns:

Output variable.

Return type:

Variable

minimum

chainer.functions.minimum(x1, x2)[source]

Element-wise minimum of input variables.

Parameters:
  • x1 (Variable) – Input variables to be compared.
  • x2 (Variable) – Input variables to be compared.
Returns:

Output variable.

Return type:

Variable

rsqrt

chainer.functions.rsqrt(x)[source]

Computes elementwise reciprocal of square root of input \(x_i\).

\[y_i = \frac{1}{\sqrt{x_i}}.\]
Parameters:x (Variable) – Input variable.
Returns:Output variable.
Return type:Variable

See also

sqrt()

scale

chainer.functions.scale(x, y, axis=1)[source]

Elementwise product with broadcasting.

Computes an elementwise product of two input variables, with the shape of the latter variable broadcast to match the shape of the former. axis is the first axis of the first variable along which the second variable is applied.

The term “broadcasting” here comes from Caffe’s scale layer, so the “broadcasting” with the following arguments:

   x : 100 x 3 x 40 x 60
   y : 3 x 40
axis : 1

is equivalent to the following numpy broadcasting:

x : 100 x 3 x 40 x 60
y :   1 x 3 x 40 x 1

Note how axis indicates the axis of x to which y is applied.

Parameters:
  • x (Variable) – Input variable to be scaled.
  • y (Variable) – Input variable to scale, broadcasted.
  • axis (int) – The first axis of x along which y is applied.
Returns:

Output variable.

Return type:

Variable

sin

chainer.functions.sin(x)[source]

Elementwise sin function.

sinh

chainer.functions.sinh(x)[source]

Elementwise hyperbolic sine function.

\[y_i = \sinh x_i.\]
Parameters:x (Variable) – Input variable.
Returns:Output variable.
Return type:Variable

sqrt

chainer.functions.sqrt(x)[source]

Elementwise square root function.

\[y_i = \sqrt{x_i}.\]

If the value of \(x_i\) is negative, it returns NaN for \(y_i\), following the underlying NumPy and CuPy specifications.

Parameters:x (Variable) – Input variable.
Returns:Output variable.
Return type:Variable

square

chainer.functions.square(x)[source]

Elementwise square function.

\[y_i = x_i ^ 2.\]
Parameters:x (chainer.Variable or numpy.ndarray or cupy.ndarray) – Input variable.
Returns:Output variable.
Return type:Variable

squared_difference

chainer.functions.squared_difference(x1, x2)[source]

Squared difference of input variables.

Parameters:
  • x1 (Variable) – Input variables to be compared.
  • x2 (Variable) – Input variables to be compared.
Returns:

(x1 - x2) ** 2 element-wise.

Return type:

Variable

sum

chainer.functions.sum(x, axis=None, keepdims=False)[source]

Sum of array elements over a given axis.

Parameters:
  • x (Variable) – Elements to sum.
  • axis (None, int, or tuple of int) – Axis over which the sum is performed. The default (axis = None) is to perform a sum over all the dimensions of the input array.
  • keepdims (bool) – If True, the reduced axes are retained in the result as dimensions with size one.
Returns:

Output variable.

Return type:

Variable

tanh

The hyperbolic tangent function is described in the “Activation functions” section.

See also

tanh()

tan

chainer.functions.tan(x)[source]

Elementwise tan function.

Noise injections

dropout

chainer.functions.dropout(x, ratio=0.5, train=True)[source]

Drops elements of input variable randomly.

This function drops input elements randomly with probability ratio and scales the remaining elements by factor 1 / (1 - ratio). In testing mode, it does nothing and just returns x.

Parameters:
  • x (Variable) – Input variable.
  • ratio (float) – Dropout ratio.
  • train (bool) – If True, executes dropout. Otherwise, does nothing.
Returns:

Output variable.

Return type:

Variable

See the paper by G. Hinton: Improving neural networks by preventing co-adaptation of feature detectors.
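
Example

A minimal sketch: in training mode elements are dropped at random, while with train=False the input is returned unchanged.

>>> x = chainer.Variable(np.array([1., 2., 3.], 'f'))
>>> y = F.dropout(x, ratio=0.5)  # training mode: elements are dropped at random
>>> y_test = F.dropout(x, ratio=0.5, train=False)
>>> np.array_equal(y_test.data, x.data)
True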

gaussian

chainer.functions.gaussian(mean, ln_var)[source]

Gaussian sampling function.

It takes the mean \(\mu\) and the logarithm of the variance \(\log(\sigma^2)\) as inputs and outputs a sample drawn from the Gaussian distribution \(N(\mu, \sigma^2)\).

Parameters:
  • mean (Variable) – Input variable representing mean \(\mu\).
  • ln_var (Variable) – Input variable representing logarithm of variance \(\log(\sigma^2)\).
Returns:

Output variable.

Return type:

Variable
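
Example

A minimal sketch of the sampling behavior: with zero mean and zero log-variance, one sample of the same shape is drawn from the standard normal distribution.

>>> mean = np.zeros((2, 3), 'f')
>>> ln_var = np.zeros((2, 3), 'f')
>>> F.gaussian(mean, ln_var).shape
(2, 3)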

simplified_dropconnect

chainer.functions.simplified_dropconnect(x, W, b=None, ratio=0.5, train=True, mask=None)[source]

Linear unit regularized by simplified dropconnect.

Simplified dropconnect drops weight matrix elements randomly with probability ratio and scales the remaining elements by factor 1 / (1 - ratio). Which elements are dropped is determined independently for each sample. It accepts two or three arguments: an input minibatch x, a weight matrix W, and optionally a bias vector b. It computes \(Y = xW^\top + b\).

In testing mode, zero will be used as simplified dropconnect ratio instead of ratio.

Note: This implementation cannot be used to reproduce the results of the paper, because it differs from the original one: the original version samples from a Gaussian distribution before applying the activation function, whereas the current implementation averages before the activation.

Parameters:
  • x (chainer.Variable or numpy.ndarray or cupy.ndarray) – Input variable. Its first dimension n is assumed to be the minibatch dimension. The other dimensions are treated as concatenated one dimension whose size must be N.
  • W (Variable) – Weight variable of shape (M, N).
  • b (Variable) – Bias variable (optional) of shape (M,).
  • ratio (float) – Dropconnect ratio.
  • train (bool) – If True, executes simplified dropconnect. Otherwise, simplified dropconnect function works as a linear function.
  • mask (None or chainer.Variable or numpy.ndarray or cupy.ndarray) – If None, a randomized dropconnect mask is generated. Otherwise, the mask must be an (n, M, N) shaped array, which will be used as the dropconnect mask. The main purpose of this option is debugging.
Returns:Output variable.
Return type:Variable

See also

Dropconnect

See also

Li, W., Matthew Z., Sixin Z., Yann L., Rob F. (2013). Regularization of Neural Network using DropConnect. International Conference on Machine Learning. URL
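
Example

A rough usage sketch with arbitrary toy shapes: a minibatch of four 3-dimensional inputs and a weight matrix of shape (2, 3) produce an output of shape (4, 2).

>>> x = np.random.uniform(-1, 1, (4, 3)).astype('f')
>>> W = np.random.uniform(-1, 1, (2, 3)).astype('f')
>>> b = np.zeros(2, 'f')
>>> F.simplified_dropconnect(x, W, b, ratio=0.5).shape
(4, 2)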

Normalization functions

batch_normalization

chainer.functions.batch_normalization(x, gamma, beta, eps=2e-05, running_mean=None, running_var=None, decay=0.9, use_cudnn=True)[source]

Batch normalization function.

It takes the input variable x and two parameter variables gamma and beta. The parameter variables must both have the same dimensionality, which is referred to as the channel shape. This channel shape corresponds to the dimensions in the input which are not averaged over. Since the first dimension of the input corresponds to the batch size, the second dimension of x will correspond to the first dimension of the channel shape, the third dimension of x will correspond to the second channel dimension (if it exists) and so on. Therefore, the dimensionality of the input must be at least one plus the number of channel dimensions. The total effective “batch size” will then be considered to be the product of all dimensions in x except for the channel dimensions.

As an example, if the input is four dimensional and the parameter variables are one dimensional, then it is assumed that the first dimension of the input is the batch size, the second dimension is the channel size, and the remaining two dimensions are considered to be spatial dimensions that will be averaged over along with the batch size in the batch normalization computations. That is, the total batch size will be considered to be the product of all input dimensions except the second dimension.

Note: If this function is called, it will not be possible to access the updated running mean and variance statistics, because they are members of the function object, which cannot be accessed by the caller. If it is desired to access the updated running statistics, it is necessary to get a new instance of the function object, call the object, and then access the running_mean and/or running_var attributes. See the corresponding Link class for an example of how to do this.

Parameters:
  • x (Variable) – Input variable.
  • gamma (Variable) – Scaling parameter of normalized data.
  • beta (Variable) – Shifting parameter of scaled normalized data.
  • eps (float) – Epsilon value for numerical stability.
  • running_mean (array) – Running average of the mean. This is a running average of the mean over several mini-batches using the decay parameter. If None, the running average is not computed. If this is None, then running_var must also be None.
  • running_var (array) – Running average of the variance. This is a running average of the variance over several mini-batches using the decay parameter. If None, the running average is not computed. If this is None, then running_mean must also be None.
  • decay (float) – Decay rate of moving average. It is used during training.
  • use_cudnn (bool) – If True and cuDNN is enabled, then this function uses cuDNN as the core implementation.

See: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

See also

links.BatchNormalization
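
Example

A minimal sketch with a four-dimensional input and one-dimensional parameters: each of the three channels is normalized over the batch and spatial dimensions, so with beta = 0 the per-channel means of the output are close to zero.

>>> x = np.random.randn(5, 3, 4, 4).astype('f')
>>> gamma = np.ones(3, 'f')
>>> beta = np.zeros(3, 'f')
>>> y = F.batch_normalization(x, gamma, beta)
>>> y.shape
(5, 3, 4, 4)
>>> np.allclose(y.data.mean(axis=(0, 2, 3)), 0.0, atol=1e-3)
True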

fixed_batch_normalization

chainer.functions.fixed_batch_normalization(x, gamma, beta, mean, var, eps=2e-05, use_cudnn=True)[source]

Batch normalization function with fixed statistics.

This is a variant of batch normalization, where the mean and variance statistics are given by the caller as fixed variables. This is used in testing mode of the batch normalization layer, where batch statistics cannot be used for prediction consistency.

Parameters:
  • x (Variable) – Input variable.
  • gamma (Variable) – Scaling parameter of normalized data.
  • beta (Variable) – Shifting parameter of scaled normalized data.
  • mean (Variable) – Shifting parameter of input.
  • var (Variable) – Square of scaling parameter of input.
  • eps (float) – Epsilon value for numerical stability.
  • use_cudnn (bool) – If True and cuDNN is enabled, then this function uses cuDNN as the core implementation.

See also

functions.batch_normalization(), links.BatchNormalization

local_response_normalization

chainer.functions.local_response_normalization(x, n=5, k=2, alpha=0.0001, beta=0.75)[source]

Local response normalization across neighboring channels.

This function implements normalization across channels. Let \(x\) be an input image with \(N\) channels. Then, this function computes an output image \(y\) by the following formula:

\[y_i = \frac{x_i}{\left( k + \alpha \sum_{j=\max(1, i - n/2)}^{\min(N, i + n/2)} x_j^2 \right)^\beta}.\]
Parameters:
  • x (Variable) – Input variable.
  • n (int) – Normalization window width.
  • k (float) – Smoothing parameter.
  • alpha (float) – Normalizer scaling parameter.
  • beta (float) – Normalizer power parameter.
Returns:

Output variable.

Return type:

Variable

See: Section 3.3 of ImageNet Classification with Deep Convolutional Neural Networks

normalize

chainer.functions.normalize(x, eps=1e-05, axis=1)[source]

Normalization by the L2 norm (a.k.a. Euclidean norm).

This function implements L2 normalization on a vector along the given axis. No reduction is done along the normalization axis.

In the case when axis=1 and \(x\) is an array of shape \((N, K)\), where \(N\) and \(K\) denote the mini-batch size and the dimension of the input variable, this function computes an output array \(y\) by the following equation:

\[y_i = {x_i \over \| x_i \|_2 + \epsilon}\]

eps is used to avoid division by zero when norm of \(x\) along the given axis is zero.

The default value of axis is determined for backward compatibility.

Parameters:
  • x (Variable) – Two dimensional input variable. The first dimension is assumed to be the mini-batch dimension.
  • eps (float) – Epsilon value for numerical stability.
  • axis (int) – Axis along which to normalize.
Returns:

The output variable which has the same shape as \(x\).

Return type:

Variable
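
Example

A minimal sketch: after normalization each row of the output has (approximately, because of eps) unit L2 norm.

>>> x = np.array([[3., 4.], [0., 2.]], 'f')
>>> y = F.normalize(x, axis=1)
>>> np.allclose(np.linalg.norm(y.data, axis=1), 1.0, atol=1e-4)
True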

Spatial pooling

average_pooling_2d

chainer.functions.average_pooling_2d(x, ksize, stride=None, pad=0, use_cudnn=True)[source]

Spatial average pooling function.

This function acts similarly to Convolution2D, but it computes the average of the input spatial patch for each channel, without any parameters, instead of computing the inner products.

Parameters:
  • x (Variable) – Input variable.
  • ksize (int or pair of ints) – Size of pooling window. ksize=k and ksize=(k, k) are equivalent.
  • stride (int or pair of ints or None) – Stride of pooling applications. stride=s and stride=(s, s) are equivalent. If None is specified, then it uses same stride as the pooling window size.
  • pad (int or pair of ints) – Spatial padding width for the input array. pad=p and pad=(p, p) are equivalent.
  • use_cudnn (bool) – If True and cuDNN is enabled, then this function uses cuDNN as the core implementation.
Returns:

Output variable.

Return type:

Variable

Note

This function currently does not support the cover_all mode that max_pooling_2d() supports. Average pooling runs in non-cover-all mode.
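
Example

A minimal sketch on a 4x4 single-channel input with a 2x2 window and stride 2; each output value is the average of one 2x2 block.

>>> x = np.arange(16, dtype='f').reshape(1, 1, 4, 4)
>>> y = F.average_pooling_2d(x, ksize=2, stride=2)
>>> y.shape
(1, 1, 2, 2)
>>> np.allclose(y.data.flatten(), [2.5, 4.5, 10.5, 12.5])
True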

average_pooling_nd

chainer.functions.average_pooling_nd(x, ksize, stride=None, pad=0, use_cudnn=True)[source]

N-dimensional spatial average pooling function.

This function provides an N-dimensionally generalized version of average_pooling_2d(). It acts similarly to ConvolutionND, but it computes the average of the input spatial patch for each channel, without any parameters, instead of computing the inner products.

Parameters:
  • x (Variable) – Input variable.
  • ksize (int or tuple of ints) – Size of pooling window. ksize=k and ksize=(k, k, ..., k) are equivalent.
  • stride (int or tuple of ints or None) – Stride of pooling applications. stride=s and stride=(s, s, ..., s) are equivalent. If None is specified, then it uses same stride as the pooling window size.
  • pad (int or tuple of ints) – Spatial padding width for the input array. pad=p and pad=(p, p, ..., p) are equivalent.
  • use_cudnn (bool) – If True and cuDNN is enabled, then this function uses cuDNN as the core implementation. cuDNN supports more than one-dimensional pooling.
Returns:

Output variable.

Return type:

Variable

Note

This function currently does not support the cover_all mode that max_pooling_nd() supports. Average pooling runs in non-cover-all mode.

max_pooling_2d

chainer.functions.max_pooling_2d(x, ksize, stride=None, pad=0, cover_all=True, use_cudnn=True)[source]

Spatial max pooling function.

This function acts similarly to Convolution2D, but it computes the maximum of the input spatial patch for each channel, without any parameters, instead of computing the inner products.

Parameters:
  • x (Variable) – Input variable.
  • ksize (int or pair of ints) – Size of pooling window. ksize=k and ksize=(k, k) are equivalent.
  • stride (int or pair of ints or None) – Stride of pooling applications. stride=s and stride=(s, s) are equivalent. If None is specified, then it uses same stride as the pooling window size.
  • pad (int or pair of ints) – Spatial padding width for the input array. pad=p and pad=(p, p) are equivalent.
  • cover_all (bool) – If True, all spatial locations are pooled into some output pixels. It may make the output size larger.
  • use_cudnn (bool) – If True and cuDNN is enabled, then this function uses cuDNN as the core implementation.
Returns:

Output variable.

Return type:

Variable
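
Example

A minimal sketch on a 4x4 single-channel input with a 2x2 window and stride 2; each output value is the maximum of one 2x2 block.

>>> x = np.arange(16, dtype='f').reshape(1, 1, 4, 4)
>>> y = F.max_pooling_2d(x, ksize=2, stride=2)
>>> y.shape
(1, 1, 2, 2)
>>> np.allclose(y.data.flatten(), [5., 7., 13., 15.])
True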

max_pooling_nd

chainer.functions.max_pooling_nd(x, ksize, stride=None, pad=0, cover_all=True, use_cudnn=True)[source]

N-dimensional spatial max pooling function.

This function provides an N-dimensionally generalized version of max_pooling_2d(). It acts similarly to ConvolutionND, but it computes the maximum of the input spatial patch for each channel, without any parameters, instead of computing the inner products.

Parameters:
  • x (Variable) – Input variable.
  • ksize (int or tuple of ints) – Size of pooling window. ksize=k and ksize=(k, k, ..., k) are equivalent.
  • stride (int or tuple of ints or None) – Stride of pooling applications. stride=s and stride=(s, s, ..., s) are equivalent. If None is specified, then it uses same stride as the pooling window size.
  • pad (int or tuple of ints) – Spatial padding width for the input array. pad=p and pad=(p, p, ..., p) are equivalent.
  • cover_all (bool) – If True, all spatial locations are pooled into some output pixels. It may make the output size larger.
  • use_cudnn (bool) – If True and cuDNN is enabled, then this function uses cuDNN as the core implementation. cuDNN supports more than one-dimensional pooling.
Returns:

Output variable.

Return type:

Variable

roi_pooling_2d

chainer.functions.roi_pooling_2d(x, rois, outh, outw, spatial_scale)[source]

Spatial Region of Interest (ROI) pooling function.

This function acts similarly to MaxPooling2D, but it computes the maximum of the input spatial patch for each channel within the region of interest.

Parameters:
  • x (Variable) – Input variable. The shape is expected to be 4 dimensional: (n: batch, c: channel, h: height, w: width).
  • rois (Variable) – Input ROI variable. The shape is expected to be (n: data size, 5), and each datum is set as below: (batch_index, x_min, y_min, x_max, y_max).
  • outh (int) – Height of the output image after pooling.
  • outw (int) – Width of the output image after pooling.
  • spatial_scale (float) – Scale by which the ROI coordinates are resized.
Returns:

Output variable.

Return type:

Variable

See the original paper proposing ROIPooling: Fast R-CNN.

spatial_pyramid_pooling_2d

chainer.functions.spatial_pyramid_pooling_2d(x, pyramid_height, pooling_class, use_cudnn=True)[source]

Spatial pyramid pooling function.

It outputs a fixed-length vector regardless of input feature map size.

It performs the pooling operation on the input 4D array x with different kernel sizes and padding sizes, then flattens all dimensions except the first dimension of each pooling result, and finally concatenates them along the second dimension.

At \(i\)-th pyramid level, the kernel size \((k_h^{(i)}, k_w^{(i)})\) and padding size \((p_h^{(i)}, p_w^{(i)})\) of pooling operation are calculated as below:

\[\begin{split}k_h^{(i)} &= \lceil b_h / 2^i \rceil, \\ k_w^{(i)} &= \lceil b_w / 2^i \rceil, \\ p_h^{(i)} &= (2^i k_h^{(i)} - b_h) / 2, \\ p_w^{(i)} &= (2^i k_w^{(i)} - b_w) / 2,\end{split}\]

where \(\lceil \cdot \rceil\) denotes the ceiling function, and \(b_h, b_w\) are height and width of input variable x, respectively. Note that index of pyramid level \(i\) is zero-based.

See detail in paper: Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition.

Parameters:
  • x (Variable) – Input variable. The shape of x should be (batchsize, # of channels, height, width).
  • pyramid_height (int) – Number of pyramid levels
  • pooling_class (MaxPooling2D or AveragePooling2D) – Only the MaxPooling2D class is available for now.
  • use_cudnn (bool) – If True and cuDNN is enabled, then this function uses cuDNN as the core implementation.
Returns:

Output variable. The shape of the output variable will be \((batchsize, c \sum_{h=0}^{H-1} 2^{2h}, 1, 1)\), where \(c\) is the number of channels of input variable x and \(H\) is the number of pyramid levels.

Return type:

Variable

Note

This function uses some pooling classes as components to perform spatial pyramid pooling. It currently supports only MaxPooling2D as the elemental pooling operator.
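
Example

A minimal shape sketch assuming MaxPooling2D as the pooling class: with three channels and pyramid_height=3, the output has 3 * (1 + 4 + 16) = 63 channels.

>>> x = np.random.rand(2, 3, 32, 32).astype('f')
>>> y = F.spatial_pyramid_pooling_2d(x, 3, F.MaxPooling2D)
>>> y.shape
(2, 63, 1, 1)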

unpooling_2d

chainer.functions.unpooling_2d(x, ksize, stride=None, pad=0, outsize=None, cover_all=True)[source]

Inverse operation of pooling for 2d array.

This function acts similarly to Deconvolution2D, but it spreads the input 2D array’s values without any parameters, instead of computing the inner products.

Parameters:
  • x (Variable) – Input variable.
  • ksize (int or pair of ints) – Size of pooling window. ksize=k and ksize=(k, k) are equivalent.
  • stride (int, pair of ints or None) – Stride of pooling applications. stride=s and stride=(s, s) are equivalent. If None is specified, then it uses same stride as the pooling window size.
  • pad (int or pair of ints) – Spatial padding width for the input array. pad=p and pad=(p, p) are equivalent.
  • outsize (None or pair of ints) – Expected output size (height, width) of the array after the operation. If None, the size (height or width) is estimated from the size of the input array in the first batch with get_deconv_outsize(). If outsize is not None, the result of outsize applied to get_conv_outsize() must be equal to the shape of the 2d array in the input batch x.
  • cover_all (bool) – If True, the output size may be smaller than when cover_all is False. This flag serves to align the behavior to the pooling functions which can cover all input locations, see max_pooling_2d() and convolution_2d().
Returns:

Output variable.

Return type:

Variable
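
Example

A minimal shape sketch with an explicit outsize: a 2x2 input unpooled with a 2x2 window and stride 2 yields a 4x4 output.

>>> pooled = np.arange(4, dtype='f').reshape(1, 1, 2, 2)
>>> F.unpooling_2d(pooled, ksize=2, stride=2, outsize=(4, 4)).shape
(1, 1, 4, 4)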

upsampling_2d

chainer.functions.upsampling_2d(x, indexes, ksize, stride=None, pad=0, outsize=None, cover_all=True)[source]

Upsampling using pooling indices.

This function produces an upsampled image using pooling indices.

Example

It should be noted that you need to specify use_cudnn=False when you create the MaxPooling2D object, because if cuDNN is used for the max pooling operation, indexes is never created and stored in the MaxPooling2D object.

>>> p = F.MaxPooling2D(2, 2, use_cudnn=False)
>>> x = np.arange(1, 37).reshape(1, 1, 6, 6).astype('f')
>>> x = chainer.Variable(x)
>>> x.data
array([[[[  1.,   2.,   3.,   4.,   5.,   6.],
         [  7.,   8.,   9.,  10.,  11.,  12.],
         [ 13.,  14.,  15.,  16.,  17.,  18.],
         [ 19.,  20.,  21.,  22.,  23.,  24.],
         [ 25.,  26.,  27.,  28.,  29.,  30.],
         [ 31.,  32.,  33.,  34.,  35.,  36.]]]], dtype=float32)

This is the original x before max pooling.

>>> pooled_x = p(x)
>>> pooled_x.data
array([[[[  8.,  10.,  12.],
         [ 20.,  22.,  24.],
         [ 32.,  34.,  36.]]]], dtype=float32)

This is the output of the max pooling operation. upsampling_2d needs the indexes array stored in the max pooling object p.

>>> upsampled_x = F.upsampling_2d(
...     pooled_x, p.indexes, p.kh, p.sy, p.ph, x.shape[2:])
>>> upsampled_x.shape
(1, 1, 6, 6)
>>> upsampled_x.data
array([[[[  0.,   0.,   0.,   0.,   0.,   0.],
         [  0.,   8.,   0.,  10.,   0.,  12.],
         [  0.,   0.,   0.,   0.,   0.,   0.],
         [  0.,  20.,   0.,  22.,   0.,  24.],
         [  0.,   0.,   0.,   0.,   0.,   0.],
         [  0.,  32.,   0.,  34.,   0.,  36.]]]], dtype=float32)
Parameters:
  • x (Variable) – Input variable.
  • indexes (numpy.ndarray or cupy.ndarray) – Index array that was used to calculate x with MaxPooling2D.
  • ksize (int or (int, int)) – ksize attribute of MaxPooling2D object that is used to calculate x
  • stride (int or (int, int)) – stride attribute of MaxPooling2D object that is used to calculate x
  • pad (int or (int, int)) – pad attribute of MaxPooling2D object that is used to calculate x
  • outsize ((int, int)) – Expected output size (height, width).
  • cover_all (bool) – Whether cover_all is used in the MaxPooling2D object or not.
Returns:

Output variable.

Return type:

Variable

Utility functions

forget

chainer.functions.forget(func, *xs)[source]

Call a function without storing internal results.

On forward propagation, Chainer stores all internal results of a Function on the computational graph, as they are required for back-propagation. These results can consume too much memory when they are large. This method forgets such internal results on forward propagation, and still supports back-propagation with recalculation.

In forward propagation, this method calls a given function with the given variables without creating a computational graph. That means no internal results are stored. In backward propagation this method calls the given function again to create a computational graph and execute back-propagation.

This method reduces internal memory usage. Instead it requires more calculation time as it calls the function twice.

Example

Let f be a function defined as:

>>> def f(a, b):
...   return a + b + a

and, x and y be Variable:

>>> x = chainer.Variable(np.random.uniform(-1, 1, 5).astype('f'))
>>> y = chainer.Variable(np.random.uniform(-1, 1, 5).astype('f'))

When z is calculated as z = f(x, y), its internal result x + y is stored in memory. Instead if you call f with forget():

>>> z = F.forget(f, x, y)

internal x + y is forgotten.

Note

The method does not support functions behaving randomly, such as dropout() and negative_sampling(). This is because the results of the first call of such a function would differ from those of the second call.

Parameters:
  • func (callable) – A function to call. It needs to be called with Variable object(s) and to return a Variable object or a tuple of Variable objects.
  • xs (Variable) – Argument variables of the function.
Returns:

The variable that func returns. If it returns a tuple, this method returns a tuple too.

Return type:

Variable