chainer.functions.convolution_2d
- chainer.functions.convolution_2d(x, W, b=None, stride=1, pad=0, cover_all=False, *, dilate=1, groups=1)
Two-dimensional convolution function.
This is an implementation of two-dimensional convolution in ConvNets. It takes three variables: the input image x, the filter weight W, and the bias vector b.
Notation: the following symbols denote dimensionalities.
\(n\) is the batch size.
\(c_I\) and \(c_O\) are the numbers of input and output channels, respectively.
\(h_I\) and \(w_I\) are the height and width of the input image, respectively.
\(h_K\) and \(w_K\) are the height and width of the filters, respectively.
\(h_P\) and \(w_P\) are the height and width of the spatial padding size, respectively.
Then the Convolution2D function computes correlations between filters and patches of size \((h_K, w_K)\) in x. Note that correlation here is equivalent to the inner product between expanded vectors. Patches are extracted at positions shifted by multiples of stride from the first position (-h_P, -w_P) for each spatial axis. The right-most (or bottom-most) patches do not run over the padded spatial size.
Let \((s_Y, s_X)\) be the stride of filter application. Then, the output size \((h_O, w_O)\) is determined by the following equations:
\[\begin{split}h_O &= (h_I + 2h_P - h_K) / s_Y + 1,\\ w_O &= (w_I + 2w_P - w_K) / s_X + 1.\end{split}\]
Here, / denotes division rounded down to an integer.
If the cover_all option is True, the filter covers all spatial locations: if the last stride of the filter does not reach the end of the spatial locations, an additional stride is applied to the end part. In this case, the output size \((h_O, w_O)\) is determined by the following equations:
\[\begin{split}h_O &= (h_I + 2h_P - h_K + s_Y - 1) / s_Y + 1,\\ w_O &= (w_I + 2w_P - w_K + s_X - 1) / s_X + 1.\end{split}\]
If the bias vector is given, then it is added to all spatial locations of the output of convolution.
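The two output-size formulas above can be checked with a few lines of plain Python. This is a minimal sketch, not Chainer's own code: the helper name conv_out_size is hypothetical, the division is assumed to round down, and dilation is not taken into account.

```python
def conv_out_size(size, k, s, p, cover_all=False):
    """Output length along one spatial axis (hypothetical helper).

    size: input length, k: filter length, s: stride, p: padding.
    Assumes the division in the formulas rounds down (floor division).
    """
    if cover_all:
        # An extra stride is added when the last patch would not
        # reach the end of the padded input.
        return (size + 2 * p - k + s - 1) // s + 1
    return (size + 2 * p - k) // s + 1


# Same sizes as in the example on this page: 30x40 input,
# 10x10 filter, stride (5, 7), padding (5, 5).
h_o = conv_out_size(30, 10, 5, 5)
w_o = conv_out_size(40, 10, 7, 5)
h_o_cover = conv_out_size(30, 10, 5, 5, cover_all=True)
w_o_cover = conv_out_size(40, 10, 7, 5, cover_all=True)
print(h_o, w_o, h_o_cover, w_o_cover)
```

With these sizes, only the width grows under cover_all, because the height is already covered exactly by the stride.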
The output of this function can be non-deterministic when it uses cuDNN. If chainer.configuration.config.cudnn_deterministic is True and the cuDNN version is >= v3, it forces cuDNN to use a deterministic algorithm.
Convolution links can use a feature of cuDNN called autotuning, which selects the most efficient CNN algorithm for images of fixed size and can provide a significant performance boost for fixed neural nets. To enable it, set chainer.using_config('autotune', True).
When the dilation factor is greater than one, cuDNN is not used unless the version is 6.0 or higher.
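The patch-wise correlation described earlier on this page can be illustrated with a naive, single-channel version in plain Python. This is only a sketch under stated assumptions: the function name correlate_2d is hypothetical, the padding is assumed to be zero padding, and batching, channels, bias, cover_all, dilation and groups are all left out.

```python
def correlate_2d(x, w, stride=(1, 1), pad=(0, 0)):
    """Naive single-channel 2D correlation on nested lists (illustration only)."""
    h_i, w_i = len(x), len(x[0])
    h_k, w_k = len(w), len(w[0])
    s_y, s_x = stride
    h_p, w_p = pad
    # Output size from the non-cover_all formula, with floor division.
    h_o = (h_i + 2 * h_p - h_k) // s_y + 1
    w_o = (w_i + 2 * w_p - w_k) // s_x + 1

    def at(i, j):
        # Assumed zero padding: positions outside the input contribute 0.
        if 0 <= i < h_i and 0 <= j < w_i:
            return x[i][j]
        return 0.0

    out = []
    for oy in range(h_o):
        row = []
        for ox in range(w_o):
            # Top-left corner of the patch; the first patch starts at (-h_p, -w_p).
            top, left = oy * s_y - h_p, ox * s_x - w_p
            # Inner product between the filter and the patch (no kernel flip).
            acc = 0.0
            for ky in range(h_k):
                for kx in range(w_k):
                    acc += w[ky][kx] * at(top + ky, left + kx)
            row.append(acc)
        out.append(row)
    return out


x = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
w = [[1, 0],
     [0, 1]]
y = correlate_2d(x, w)                      # 2x2 output
y_padded = correlate_2d(x, w, pad=(1, 1))   # 4x4 output
```

The filter is applied without flipping, which is what distinguishes correlation from convolution in the strict mathematical sense.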
- Parameters
x (Variable or N-dimensional array) – Input variable of shape \((n, c_I, h_I, w_I)\).
W (Variable or N-dimensional array) – Weight variable of shape \((c_O, c_I, h_K, w_K)\).
b (None or Variable or N-dimensional array) – Bias variable of length \(c_O\) (optional).
stride (int or pair of ints) – Stride of filter applications. stride=s and stride=(s, s) are equivalent.
pad (int or pair of ints) – Spatial padding width for input arrays. pad=p and pad=(p, p) are equivalent.
cover_all (bool) – If True, all spatial locations are convolved into some output pixels.
dilate (int or pair of ints) – Dilation factor of filter applications. dilate=d and dilate=(d, d) are equivalent.
groups (int) – Number of groups of channels. If the number is greater than 1, input tensor \(W\) is divided into some blocks by this value. For each tensor block, the convolution operation is executed independently. Input channel size \(c_I\) and output channel size \(c_O\) must be exactly divisible by this value.
- Returns
Output variable of shape \((n, c_O, h_O, w_O)\).
- Return type
Variable
See also
Convolution2D to manage the model parameters W and b.
Example
>>> import numpy as np
>>> import chainer.functions as F
>>> n = 10
>>> c_i, c_o = 3, 1
>>> h_i, w_i = 30, 40
>>> h_k, w_k = 10, 10
>>> h_p, w_p = 5, 5
>>> x = np.random.uniform(0, 1, (n, c_i, h_i, w_i)).astype(np.float32)
>>> x.shape
(10, 3, 30, 40)
>>> W = np.random.uniform(0, 1, (c_o, c_i, h_k, w_k)).astype(np.float32)
>>> W.shape
(1, 3, 10, 10)
>>> b = np.random.uniform(0, 1, (c_o,)).astype(np.float32)
>>> b.shape
(1,)
>>> s_y, s_x = 5, 7
>>> y = F.convolution_2d(x, W, b, stride=(s_y, s_x), pad=(h_p, w_p))
>>> y.shape
(10, 1, 7, 6)
>>> h_o = int((h_i + 2 * h_p - h_k) / s_y + 1)
>>> w_o = int((w_i + 2 * w_p - w_k) / s_x + 1)
>>> y.shape == (n, c_o, h_o, w_o)
True
>>> y = F.convolution_2d(x, W, b, stride=(s_y, s_x), pad=(h_p, w_p), cover_all=True)
>>> y.shape == (n, c_o, h_o, w_o + 1)
True