To show how the convolution (in the context of CNNs) can be viewed as matrix-vector multiplication, let's suppose that we want to apply a $3 \times 3$ kernel to a $4 \times 4$ input, with no padding and with unit stride.

Here's an illustration of this convolutional layer: the input is shown in blue, the kernel in dark blue, and the feature map (the output of the convolution) in green.

Now, let the kernel be defined as follows

$$
\mathbf{W} =
\begin{bmatrix}
w_{0, 0} & w_{0, 1} & w_{0, 2} \\
w_{1, 0} & w_{1, 1} & w_{1, 2} \\
w_{2, 0} & w_{2, 1} & w_{2, 2}
\end{bmatrix}
\in
\mathbb{R}^{3 \times 3}
$$

Similarly, let the input be defined as

$$
\mathbf{I} =
\begin{bmatrix}
i_{0, 0} & i_{0, 1} & i_{0, 2} & i_{0, 3} \\
i_{1, 0} & i_{1, 1} & i_{1, 2} & i_{1, 3} \\
i_{2, 0} & i_{2, 1} & i_{2, 2} & i_{2, 3} \\
i_{3, 0} & i_{3, 1} & i_{3, 2} & i_{3, 3} \\
\end{bmatrix}
\in
\mathbb{R}^{4 \times 4}
$$

Then the convolution above (without padding and with stride 1) can be computed as a matrix-vector multiplication as follows. First, we unroll the kernel $\mathbf{W}$ into a sparse matrix $\mathbf{W}' \in \mathbb{R}^{4 \times 16}$ (a *doubly block Toeplitz matrix*: each row contains the kernel entries shifted to one of the four positions the kernel can take on the flattened input; it would be *circulant* only if the convolution wrapped around the borders).

$$
{\scriptscriptstyle
\mathbf{W}' =
\begin{bmatrix}
w_{0, 0} & w_{0, 1} & w_{0, 2} & 0 & w_{1, 0} & w_{1, 1} & w_{1, 2} & 0 & w_{2, 0} & w_{2, 1} & w_{2, 2} & 0 & 0 & 0 & 0 & 0 \\
0 & w_{0, 0} & w_{0, 1} & w_{0, 2} & 0 & w_{1, 0} & w_{1, 1} & w_{1, 2} & 0 & w_{2, 0} & w_{2, 1} & w_{2, 2} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & w_{0, 0} & w_{0, 1} & w_{0, 2} & 0 & w_{1, 0} & w_{1, 1} & w_{1, 2} & 0 & w_{2, 0} & w_{2, 1} & w_{2, 2} & 0 \\
0 & 0 & 0 & 0 & 0 & w_{0, 0} & w_{0, 1} & w_{0, 2} & 0 & w_{1, 0} & w_{1, 1} & w_{1, 2} & 0 & w_{2, 0} & w_{2, 1} & w_{2, 2}
\end{bmatrix}
}
$$
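The construction of $\mathbf{W}'$ generalizes to any square kernel and input. The helper below is a minimal sketch (the function name is illustrative, not from any library), assuming a $k \times k$ kernel, an $n \times n$ input, no padding, and unit stride:

```python
import numpy as np

def conv2d_as_matrix(kernel, n):
    """Build the matrix W' that applies a valid, stride-1 2D convolution
    with a k x k `kernel` to a flattened (row-major) n x n input.
    Returns an array of shape ((n - k + 1)**2, n**2)."""
    k = kernel.shape[0]
    out = n - k + 1  # output side length (no padding, stride 1)
    W = np.zeros((out * out, n * n))
    for r in range(out):            # output row index
        for c in range(out):        # output column index
            for i in range(k):      # kernel row index
                for j in range(k):  # kernel column index
                    # input pixel (r + i, c + j) flattens to (r + i) * n + (c + j)
                    W[r * out + c, (r + i) * n + (c + j)] = kernel[i, j]
    return W
```

For the $3 \times 3$ kernel and $4 \times 4$ input above, `conv2d_as_matrix(kernel, 4)` has shape $(4, 16)$ and reproduces exactly the four rows of $\mathbf{W}'$ shown.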
Similarly, we flatten the input $\mathbf{I}$ (in row-major order, matching the layout of $\mathbf{W}'$) into a 16-dimensional vector $\mathbf{I}' \in \mathbb{R}^{16}$.

$$
{\scriptstyle
\mathbf{I}' =
\begin{bmatrix}
i_{0, 0} & i_{0, 1} & i_{0, 2} & i_{0, 3} & i_{1, 0} & i_{1, 1} & i_{1, 2} & i_{1, 3} & i_{2, 0} & i_{2, 1} & i_{2, 2} & i_{2, 3} & i_{3, 0} & i_{3, 1} & i_{3, 2} & i_{3, 3}
\end{bmatrix}^T
}
$$

Then the convolution of $\mathbf{W}$ and $\mathbf{I}$, that is

$$\mathbf{W} \circledast \mathbf{I} = \mathbf{O} \in \mathbb{R}^{2 \times 2},$$
where $\circledast$ is the convolution operator, is equivalently computed as $$\mathbf{W}' \cdot \mathbf{I}' = \mathbf{O}' \in \mathbb{R}^{4},$$ where $\cdot$ denotes matrix-vector multiplication. The resulting vector $\mathbf{O}'$ can then be reshaped into a $2 \times 2$ feature map.
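This equivalence is easy to check numerically. The sketch below (pure NumPy, with illustrative variable names) builds $\mathbf{W}'$ row by row and compares $\mathbf{W}' \cdot \mathbf{I}'$ against a direct sliding-window computation of the $2 \times 2$ feature map:

```python
import numpy as np

rng = np.random.default_rng(0)
kernel = rng.standard_normal((3, 3))   # W
image = rng.standard_normal((4, 4))    # I

# Direct sliding-window convolution (no padding, stride 1):
# O[r, c] = sum of the element-wise product of W with the patch at (r, c).
direct = np.array([[np.sum(kernel * image[r:r + 3, c:c + 3])
                    for c in range(2)] for r in range(2)])

# Build W': each row is the kernel placed at one output position,
# embedded in a zero 4 x 4 canvas and flattened row-major.
W_prime = np.zeros((4, 16))
for r in range(2):
    for c in range(2):
        canvas = np.zeros((4, 4))
        canvas[r:r + 3, c:c + 3] = kernel
        W_prime[2 * r + c] = canvas.ravel()

I_prime = image.ravel()                # flatten the input row-major
O_prime = W_prime @ I_prime            # matrix-vector product
assert np.allclose(O_prime.reshape(2, 2), direct)
```

The `canvas` construction makes the structure of $\mathbf{W}'$ explicit: each row is just the kernel "stamped" at one of the four valid positions and flattened.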

You can easily verify that this representation is correct by taking, for example, the dot product of the first row of $\mathbf{W}'$ with the 16-dimensional input vector $\mathbf{I}'$, which yields the top-left entry of the feature map.

$$w_{0, 0} i_{0, 0} + w_{0, 1} i_{0, 1} + w_{0, 2} i_{0, 2} + 0 i_{0, 3} + w_{1, 0} i_{1, 0} + w_{1, 1} i_{1, 1} + w_{1, 2}i_{1, 2} + 0 i_{1, 3} + w_{2, 0} i_{2, 0} + w_{2, 1}i_{2, 1} + w_{2, 2} i_{2, 2} + 0 i_{2, 3} + 0 i_{3, 0} + 0 i_{3, 1} + 0 i_{3, 2} + 0 i_{3, 3} = \\
w_{0, 0} i_{0, 0} + w_{0, 1} i_{0, 1} + w_{0, 2} i_{0, 2} + w_{1, 0} i_{1, 0} + w_{1, 1} i_{1, 1} + w_{1, 2}i_{1, 2} + w_{2, 0} i_{2, 0} + w_{2, 1}i_{2, 1} + w_{2, 2} i_{2, 2} = \\
\mathbf{O}'_{0} \in \mathbb{R} ,$$ which is equivalent to an element-wise multiplication of $\mathbf{W}$ with the top-left $3 \times 3$ sub-matrix of the input followed by a summation over all elements (i.e. convolution), that is

$$
\sum \left(
\begin{bmatrix}
w_{0, 0} & w_{0, 1} & w_{0, 2} \\
w_{1, 0} & w_{1, 1} & w_{1, 2} \\
w_{2, 0} & w_{2, 1} & w_{2, 2}
\end{bmatrix}
\odot
\begin{bmatrix}
i_{0, 0} & i_{0, 1} & i_{0, 2} \\
i_{1, 0} & i_{1, 1} & i_{1, 2} \\
i_{2, 0} & i_{2, 1} & i_{2, 2}
\end{bmatrix}
\right) = \mathbf{O}_{0, 0} = \mathbf{O}'_{0} \in \mathbb{R},
$$
where $\odot$ is the element-wise multiplication and $\sum$ is the summation over all elements of the resulting matrix.

The advantage of this representation (and computation) is that the backward pass also becomes a matrix-vector multiplication: the gradient of the loss with respect to the input is obtained by multiplying the gradient with respect to the output by the transpose $\mathbf{W}'^T$.
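As a sketch of that backward pass (same $\mathbf{W}'$ construction as above, illustrative names): multiplying an output gradient $\partial L / \partial \mathbf{O}'$ by $\mathbf{W}'^T$ scatters each output's gradient back onto the input pixels that produced it, weighted by the kernel.

```python
import numpy as np

rng = np.random.default_rng(1)
kernel = rng.standard_normal((3, 3))

# W' for the 4 x 4 input / 3 x 3 kernel case, built as before.
W_prime = np.zeros((4, 16))
for r in range(2):
    for c in range(2):
        canvas = np.zeros((4, 4))
        canvas[r:r + 3, c:c + 3] = kernel
        W_prime[2 * r + c] = canvas.ravel()

grad_out = rng.standard_normal(4)       # dL/dO', gradient at the 2 x 2 output
grad_in = W_prime.T @ grad_out          # dL/dI' = W'^T (dL/dO')
grad_in_image = grad_in.reshape(4, 4)   # gradient w.r.t. the 4 x 4 input

# Sanity check against the chain rule: each output gradient contributes
# grad_out[r, c] * kernel to the 3 x 3 input patch it was computed from.
check = np.zeros((4, 4))
for r in range(2):
    for c in range(2):
        check[r:r + 3, c:c + 3] += grad_out[2 * r + c] * kernel
assert np.allclose(grad_in_image, check)
```

This scatter-by-the-transpose operation is exactly the *transposed convolution* used, for example, in decoder and upsampling layers.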