# chainer.functions.spatial_pyramid_pooling_2d

chainer.functions.spatial_pyramid_pooling_2d(x, pyramid_height, pooling=None)

Spatial pyramid pooling function.

It outputs a fixed-length vector regardless of input feature map size.

It applies a pooling operation to the 4D input array x several times, once per pyramid level, each with its own kernel size and padding size. It then flattens every dimension except the first of each pooling result, and finally concatenates the flattened results along the second dimension.
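The pool, flatten, and concatenate steps can be sketched in plain Python for a single feature map (one sample, one channel). This is an illustrative sketch, not the library's implementation: the helper names `pool_level` and `spp_single_map` are hypothetical, the kernel and padding sizes follow the per-level rule given in this section, padding is assumed to be rounded down to an integer, and padded cells are assumed never to win a max.

```python
import math


def pool_level(fmap, level):
    """Max-pool one 2D feature map (a list of rows) into 2**level x 2**level bins."""
    height, width = len(fmap), len(fmap[0])
    bins = 2 ** level
    k_h = math.ceil(height / bins)
    k_w = math.ceil(width / bins)
    # Assumption: padding is rounded down to an integer.
    p_h = (bins * k_h - height) // 2
    p_w = (bins * k_w - width) // 2

    pooled = []
    for i in range(bins):
        # Clip each window to the valid area, which is equivalent to
        # padding with a value that can never be the maximum.
        r0 = max(i * k_h - p_h, 0)
        r1 = min((i + 1) * k_h - p_h, height)
        for j in range(bins):
            c0 = max(j * k_w - p_w, 0)
            c1 = min((j + 1) * k_w - p_w, width)
            window = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            if not window:
                raise ValueError("feature map too small for this pyramid level")
            pooled.append(max(window))
    return pooled  # already flat, length bins * bins


def spp_single_map(fmap, pyramid_height):
    """Concatenate the flattened pooling results of all pyramid levels."""
    out = []
    for level in range(pyramid_height):
        out.extend(pool_level(fmap, level))
    return out


small = [[r * 8 + c for c in range(8)] for r in range(6)]    # 6 x 8 map
large = [[r * 9 + c for c in range(9)] for r in range(13)]   # 13 x 9 map
vec_small = spp_single_map(small, 3)
vec_large = spp_single_map(large, 3)
print(len(vec_small), len(vec_large))  # prints: 21 21
```

Both inputs give a vector of length 1 + 4 + 16 = 21, even though their spatial sizes differ. Very small maps (smaller than the number of bins at the finest level) can leave a window empty, which the sketch reports as an error rather than guessing a value.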

At the $$i$$-th pyramid level, the kernel size $$(k_h^{(i)}, k_w^{(i)})$$ and padding size $$(p_h^{(i)}, p_w^{(i)})$$ of the pooling operation are calculated as follows:

$\begin{split}k_h^{(i)} &= \lceil b_h / 2^i \rceil, \\ k_w^{(i)} &= \lceil b_w / 2^i \rceil, \\ p_h^{(i)} &= (2^i k_h^{(i)} - b_h) / 2, \\ p_w^{(i)} &= (2^i k_w^{(i)} - b_w) / 2,\end{split}$

where $$\lceil \cdot \rceil$$ denotes the ceiling function, and $$b_h, b_w$$ are the height and width of the input variable x, respectively. Note that the pyramid level index $$i$$ is zero-based.
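As a worked example of these formulas, the following sketch evaluates them along one axis for an input of size 13. The helper name `level_params` is hypothetical. Note that the padding formula as written can produce a half-integer; the third returned value shows the result of rounding down, which is an assumption here and is not stated in the text above.

```python
import math


def level_params(size, level):
    """Return (kernel, exact_padding, integer_padding) along one axis."""
    bins = 2 ** level
    kernel = math.ceil(size / bins)
    excess = bins * kernel - size  # cells added by rounding the kernel up
    # exact_padding follows the formula; integer_padding assumes rounding down.
    return kernel, excess / 2, excess // 2


for level in range(3):
    print(level, level_params(13, level))
# prints:
# 0 (13, 0.0, 0)
# 1 (7, 0.5, 0)
# 2 (4, 1.5, 1)
```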

For details, see the paper: Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition.

Parameters:

- x (Variable) – Input variable. The shape of x should be (batchsize, # of channels, height, width).
- pyramid_height (int) – Number of pyramid levels.
- pooling (str) – Currently, only max is supported, which performs a 2d max pooling operation.

Returns: Output variable. The shape of the output variable will be $$(batchsize, c \sum_{h=0}^{H-1} 2^{2h}, 1, 1)$$, where $$c$$ is the number of channels of input variable x and $$H$$ is the number of pyramid levels.

Return type: Variable
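The length of the second output axis depends only on the channel count and the number of pyramid levels. A small arithmetic check of the formula above, using a hypothetical helper name `spp_output_channels`:

```python
def spp_output_channels(channels, pyramid_height):
    """Second-axis length: channels * sum of 2**(2*h) for h in 0..H-1."""
    return channels * sum(2 ** (2 * h) for h in range(pyramid_height))


print(spp_output_channels(256, 3))  # 256 * (1 + 4 + 16) = 5376
```

Because the sum is geometric, it also equals `channels * (4 ** pyramid_height - 1) // 3`, so each extra level roughly quadruples the output length.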