chainerx.conv
chainerx.conv(x, w, b=None, stride=1, pad=0, cover_all=False)
N-dimensional convolution.
This is an implementation of N-dimensional convolution, a generalization of the two-dimensional convolution used in ConvNets. It takes three arrays: the input x, the filter weight w, and the bias vector b.
Notation: the following symbols denote dimensionalities.
\(N\) is the number of spatial dimensions.
\(n\) is the batch size.
\(c_I\) and \(c_O\) are the number of the input and output channels, respectively.
\(d_1, d_2, ..., d_N\) are the sizes of the input's spatial dimensions along each axis.
\(k_1, k_2, ..., k_N\) are the sizes of the filters along each axis.
\(l_1, l_2, ..., l_N\) are the sizes of the output's spatial dimensions along each axis.
\(p_1, p_2, ..., p_N\) are the spatial padding sizes along each axis.
Then the conv function computes correlations between the filters and patches of size \((k_1, k_2, ..., k_N)\) in x. Note that correlation here is equivalent to the inner product between expanded tensors. Patches are extracted at positions shifted by multiples of stride from the first position (-p_1, -p_2, ..., -p_N) along each spatial axis.
Let \((s_1, s_2, ..., s_N)\) be the stride of filter application. Then the output size \((l_1, l_2, ..., l_N)\) is determined by the following equations:
\[l_n = (d_n + 2p_n - k_n) / s_n + 1 \ \ (n = 1, ..., N)\]
If the cover_all option is True, the filter covers all spatial locations: if the last stride of the filter does not reach the end of the spatial locations, an additional stride is applied to the end part. In this case, the output size is determined by the following equations:
\[l_n = (d_n + 2p_n - k_n + s_n - 1) / s_n + 1 \ \ (n = 1, ..., N)\]
In both cases the division is rounded down to an integer.
- Parameters
x (ndarray) – Input array of shape \((n, c_I, d_1, d_2, ..., d_N)\).
w (ndarray) – Weight array of shape \((c_O, c_I, k_1, k_2, ..., k_N)\).
b (None or ndarray) – One-dimensional bias array with length \(c_O\) (optional).
stride (int or tuple of ints) – Stride of filter applications \((s_1, s_2, ..., s_N)\). stride=s is equivalent to (s, s, ..., s).
pad (int or tuple of ints) – Spatial padding width for input arrays \((p_1, p_2, ..., p_N)\). pad=p is equivalent to (p, p, ..., p).
cover_all (bool) – If True, all spatial locations are convolved into some output pixels. This may make the output larger. cover_all must be False if you want to use the cuda backend.
- Returns
Output array of shape \((n, c_O, l_1, l_2, ..., l_N)\).
- Return type
ndarray
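The output size equations above can be sketched in plain Python. This is only an illustration: the helper names `conv_output_size` and `conv_output_shape` are hypothetical and not part of ChainerX, and the floored division is an assumption chosen to agree with the int(...) truncation used in the Example on this page.

```python
def conv_output_size(d, k, s=1, p=0, cover_all=False):
    """Output length along one spatial axis (hypothetical helper).

    d: input size, k: filter size, s: stride, p: padding.
    The division is floored (assumption, consistent with the Example).
    """
    if cover_all:
        return (d + 2 * p - k + s - 1) // s + 1
    return (d + 2 * p - k) // s + 1


def conv_output_shape(dims, ksize, stride=1, pad=0, cover_all=False):
    """Apply conv_output_size to every spatial axis.

    An int stride or pad is expanded to one value per axis, mirroring
    the documented rule that stride=s means (s, s, ..., s).
    """
    n = len(dims)
    strides = (stride,) * n if isinstance(stride, int) else tuple(stride)
    pads = (pad,) * n if isinstance(pad, int) else tuple(pad)
    return tuple(
        conv_output_size(d, k, s, p, cover_all)
        for d, k, s, p in zip(dims, ksize, strides, pads)
    )


# Same sizes as the Example: d = (30, 40, 50), k = 10, s = (2, 4, 6), p = 5.
print(conv_output_shape((30, 40, 50), (10, 10, 10), (2, 4, 6), 5))
print(conv_output_shape((30, 40, 50), (10, 10, 10), (2, 4, 6), 5, cover_all=True))
```

With these sizes only the last axis grows under cover_all=True, because it is the only axis where the final stride stops short of the padded input's end.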
Note
In the cuda backend, this function uses the cuDNN implementation for its forward and backward computation.
Note
In the cuda backend, this function currently has the following limitations:
The cover_all=True option is not supported yet.
The dtype must be float32 or float64 (float16 is not supported yet).
Note
During backpropagation, this function propagates the gradient of the output array to the input arrays x, w, and b.
See also
Example
>>> import numpy as np
>>> import chainerx
>>> n = 10
>>> c_i, c_o = 3, 1
>>> d1, d2, d3 = 30, 40, 50
>>> k1, k2, k3 = 10, 10, 10
>>> p1, p2, p3 = 5, 5, 5
>>> x = chainerx.random.uniform(0, 1, (n, c_i, d1, d2, d3)).astype(np.float32)
>>> x.shape
(10, 3, 30, 40, 50)
>>> w = chainerx.random.uniform(0, 1, (c_o, c_i, k1, k2, k3)).astype(np.float32)
>>> w.shape
(1, 3, 10, 10, 10)
>>> b = chainerx.random.uniform(0, 1, (c_o,)).astype(np.float32)
>>> b.shape
(1,)
>>> s1, s2, s3 = 2, 4, 6
>>> y = chainerx.conv(x, w, b, stride=(s1, s2, s3), pad=(p1, p2, p3))
>>> y.shape
(10, 1, 16, 11, 9)
>>> l1 = int((d1 + 2 * p1 - k1) / s1 + 1)
>>> l2 = int((d2 + 2 * p2 - k2) / s2 + 1)
>>> l3 = int((d3 + 2 * p3 - k3) / s3 + 1)
>>> y.shape == (n, c_o, l1, l2, l3)
True
>>> y = chainerx.conv(x, w, b, stride=(s1, s2, s3), pad=(p1, p2, p3), cover_all=True)
>>> y.shape == (n, c_o, l1, l2, l3 + 1)
True
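To make the patch extraction and correlation described above more concrete, here is a minimal pure-Python sketch for the simplest case: one spatial dimension, a single channel, and no batch axis. It is not how ChainerX computes the result. The function name `conv1d_reference` is hypothetical, and zero padding (including the extra zeros appended on the right for cover_all) is an assumption of this sketch.

```python
def conv1d_reference(x, w, b=0.0, stride=1, pad=0, cover_all=False):
    """Single-channel 1-D correlation (illustrative sketch, not ChainerX code)."""
    d, k = len(x), len(w)
    # Output length from the documented equations, with floored division.
    extra = stride - 1 if cover_all else 0
    out_len = (d + 2 * pad - k + extra) // stride + 1

    # Zero-pad both ends (assumption: padding value is zero).
    padded = [0.0] * pad + list(x) + [0.0] * pad
    # With cover_all the last window may run past the padded input,
    # so append just enough zeros on the right for it to fit.
    needed = (out_len - 1) * stride + k
    padded += [0.0] * max(0, needed - len(padded))

    y = []
    for i in range(out_len):
        # Index `start` in the padded list is position -pad + i * stride
        # in the original input.
        start = i * stride
        patch = padded[start:start + k]
        # Correlation: inner product of patch and filter, plus bias.
        y.append(sum(a * c for a, c in zip(patch, w)) + b)
    return y


print(conv1d_reference([1, 2, 3, 4, 5], [1, 0, -1]))
print(conv1d_reference([1, 2, 3, 4, 5, 6], [1, 1, 1], stride=2))
print(conv1d_reference([1, 2, 3, 4, 5, 6], [1, 1, 1], stride=2, cover_all=True))
```

In the last two calls, the stride-2 windows start at positions 0 and 2 and never touch the final input element. Setting cover_all=True adds a third window that does, which is why the output gains one entry.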