Reduction

Image size can be easily reduced by subsampling, e.g., getting rid of every other pixel in each row and column:

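$\displaystyle f_{4 \times 4}=\left[ \begin{array}{cccc} 1 & 2 & 4 & 3 \\ 2 & 99 & 3 & 8 \\ 1 & 4 & 3 & 2 \\ 3 & 2 & 1 & 4 \end{array} \right] \Rightarrow \left[ \begin{array}{cc} 1 & 4 \\ 1 & 3 \end{array} \right], \left[ \begin{array}{cc} 2 & 3 \\ 4 & 2 \end{array} \right], \left[ \begin{array}{cc} 2 & 3 \\ 3 & 1 \end{array} \right],$ or $\displaystyle \left[ \begin{array}{cc} 99 & 8 \\ 2 & 4 \end{array} \right]$ (9)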
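The four subsampling cases can be sketched with NumPy slicing. This is only an illustration: the array name `f`, the dictionary `subsamples`, and the use of NumPy are assumptions, and the last two rows of the example image are chosen to agree with the averaged results quoted in this section.

```python
import numpy as np

# Example image; the last two rows are an assumption chosen to agree
# with the averaged results quoted in this section.
f = np.array([[1, 2, 4, 3],
              [2, 99, 3, 8],
              [1, 4, 3, 2],
              [3, 2, 1, 4]])

# Keep every other pixel in each row and column.
# (r, c) is the starting offset, which gives the four possible cases.
subsamples = {(r, c): f[r::2, c::2] for r in (0, 1) for c in (0, 1)}

for offset, g in subsamples.items():
    print(offset, g.tolist())
# (0, 0) [[1, 4], [1, 3]]
# (0, 1) [[2, 3], [4, 2]]
# (1, 0) [[2, 3], [3, 1]]
# (1, 1) [[99, 8], [2, 4]]
```

Each result keeps 4 of the 16 pixels, so which values survive depends entirely on the chosen offset; the outlier 99 appears in only one of the four cases.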

In each of the four possible subsampling cases, three fourths of the pixels in the original image are simply discarded. A better way is to replace each non-overlapping $2 \times 2$ neighborhood by the average of its four pixels:

$\displaystyle \left[ \begin{array}{cccc} 1 & 2 & 4 & 3 \\ 2 & 99 & 3 & 8 \\ 1 & 4 & 3 & 2 \\ 3 & 2 & 1 & 4 \end{array} \right] \Rightarrow \left[ \begin{array}{cc} 26 & 4.5 \\ 2.5 & 2.5 \end{array} \right]$ (10)

This is called average pooling in convolutional neural networks (CNNs). Alternatively, the maximum of each neighborhood can be used, which is called maximum pooling in CNNs.
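Both kinds of pooling can be sketched in a few lines of NumPy. The helper name `pool2x2` is hypothetical, the sketch assumes the image height and width are even, and the last two rows of the example image are an assumption chosen to agree with the averaged result above.

```python
import numpy as np

def pool2x2(image, reduce_fn):
    """Reduce each non-overlapping 2x2 block of `image` with `reduce_fn`.

    Assumes both image dimensions are even.
    """
    h, w = image.shape
    # blocks[i, a, j, b] == image[2*i + a, 2*j + b]
    blocks = image.reshape(h // 2, 2, w // 2, 2)
    return reduce_fn(blocks, axis=(1, 3))

# Example image; the last two rows are an assumption.
f = np.array([[1, 2, 4, 3],
              [2, 99, 3, 8],
              [1, 4, 3, 2],
              [3, 2, 1, 4]])

avg = pool2x2(f, np.mean)  # average pooling
mx = pool2x2(f, np.max)    # maximum pooling

print(avg.tolist())  # [[26.0, 4.5], [2.5, 2.5]]
print(mx.tolist())   # [[99, 8], [4, 4]]
```

The single outlier 99 pulls the top-left average up to 26, while maximum pooling passes it through unchanged.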

Again, average pooling can be implemented as a two-step process:

Regional averaging by convolving with

$\displaystyle H_{2 \times 2}=\frac{1}{4}\left[ \begin{array}{cc} 1 & 1 \\ 1 & 1 \end{array} \right]$ (11)

to get

$\displaystyle f_{4 \times 4} * H_{2 \times 2}=f'_{4 \times 4} =\left[ \begin{array}{cccc} 26 & 27 & 4.5 & 2.75 \\ 26.5 & 27.25 & 4 & 2.5 \\ 2.5 & 2.5 & 2.5 & 1.5 \\ 1.25 & 0.75 & 1.25 & 1.0 \end{array} \right]$ (12)

Subsampling $f'_{4 \times 4}$ to get

$\displaystyle \left[ \begin{array}{cc} 26 & 4.5 \\ 2.5 & 2.5 \end{array} \right]$ (13)
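The two-step process can be sketched as follows. Several details here are assumptions rather than statements from the text: the helper name `average_2x2_full`, the choice to anchor the $2 \times 2$ kernel at its top-left pixel, and zero padding below and to the right of the image (which is consistent with the last row of $f'_{4 \times 4}$). The last two rows of the example image are likewise an assumption.

```python
import numpy as np

# Example image; the last two rows are an assumption.
f = np.array([[1, 2, 4, 3],
              [2, 99, 3, 8],
              [1, 4, 3, 2],
              [3, 2, 1, 4]], dtype=float)

def average_2x2_full(image):
    """Step 1: average every 2x2 neighborhood, one output per input pixel.

    The neighborhood of pixel (m, n) covers rows m..m+1 and columns
    n..n+1; pixels outside the image are treated as zero.
    """
    padded = np.pad(image, ((0, 1), (0, 1)), mode="constant")
    return (padded[:-1, :-1] + padded[:-1, 1:]
            + padded[1:, :-1] + padded[1:, 1:]) / 4.0

f_prime = average_2x2_full(f)

# Step 2: subsample, keeping every other pixel in each row and column.
pooled = f_prime[::2, ::2]

print(f_prime.tolist())
print(pooled.tolist())  # [[26.0, 4.5], [2.5, 2.5]]
```

Because the kernel is symmetric, flipping it (convolution versus correlation) makes no difference to the values; only the alignment and the treatment of the border depend on the assumptions above. The pixels kept in step 2 never touch the padded border, so the pooled result is the same under any padding choice.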