
Kernel Mapping

The algorithm above converges only for linearly separable data. If the data set is not linearly separable, we can map each sample ${\bf x}$ into a higher-dimensional feature space:

\begin{displaymath}
{\bf x} \longrightarrow \phi({\bf x})
\end{displaymath}

in which the classes can be linearly separated. The decision function in the new space becomes:

\begin{displaymath}
f({\bf x})=\phi({\bf x})^T {\bf w}+b=
\sum_{j=1}^m \alpha_j y_j (\phi({\bf x})^T \phi({\bf x}_j))+b
\end{displaymath}

where

\begin{displaymath}{\bf w}=\sum_{j=1}^m \alpha_j y_j \phi({\bf x}_j) \end{displaymath}

and $b$ are the parameters of the decision plane in the new space. As the vectors ${\bf x}_i$ appear only in inner products, in both the decision function and the learning law, the mapping function $\phi({\bf x})$ never needs to be specified explicitly; all we need is the inner product of the mapped vectors in the new space. The mapping $\phi({\bf x})$ therefore remains implicit, induced by a kernel function defined below.
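
As a quick numerical check (with a hypothetical feature map and made-up values of $\alpha_j$, $y_j$ and $b$, chosen for illustration only), the Python sketch below confirms that the two forms of the decision function above, $\phi({\bf x})^T{\bf w}+b$ and $\sum_{j=1}^m\alpha_j y_j(\phi({\bf x})^T\phi({\bf x}_j))+b$, give the same value:

import numpy as np

# Hypothetical feature map used only for this check (any explicit map would do)
def phi(x):
    return np.array([x[0], x[1], x[0]*x[1], x[0]**2])

# Made-up training samples, labels, multipliers alpha_j and offset b (illustration only)
X = np.array([[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3]])
y = np.array([1, -1, 1])
alpha = np.array([0.4, 0.7, 0.3])
b = 0.1

x_new = np.array([0.8, -0.6])

# Form 1: build w = sum_j alpha_j y_j phi(x_j) explicitly, then evaluate phi(x)^T w + b
w = sum(a * yj * phi(xj) for a, yj, xj in zip(alpha, y, X))
f1 = phi(x_new) @ w + b

# Form 2: use only the inner products phi(x)^T phi(x_j)
f2 = sum(a * yj * (phi(x_new) @ phi(xj)) for a, yj, xj in zip(alpha, y, X)) + b

print(f1, f2)   # identical up to floating-point rounding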

Definition: A kernel is a function that takes two vectors ${\bf x}_i$ and ${\bf x}_j$ as arguments and returns the value of the inner product of their images $\phi({\bf x}_i)$ and $\phi({\bf x}_j)$:

\begin{displaymath}
K({\bf x}_i,{\bf x}_j)=\phi({\bf x}_i)^T\phi({\bf x}_j)
\end{displaymath}

As the kernel returns only the scalar inner product of the two mapped vectors, the dimensionality of the new space never needs to be handled explicitly.

The learning algorithm in the kernel space is obtained by replacing every inner product in the original-space algorithm with the kernel:

\begin{displaymath}
f({\bf x})=\phi({\bf x})^T {\bf w}+b=\sum_{j=1}^m \alpha_j y_j K({\bf x},{\bf x}_j)+b
\end{displaymath}

The parameter $b$ can be found from any support vector ${\bf x}_i$:

\begin{displaymath}
b=y_i-\phi({\bf x}_i)^T {\bf w}
 =y_i-\sum_{j=1}^m \alpha_j y_j (\phi({\bf x}_i)^T \phi({\bf x}_j))
 =y_i-\sum_{j=1}^m \alpha_j y_j K({\bf x}_i,{\bf x}_j)
\end{displaymath}
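
The Python sketch below evaluates this kernel form of the decision function. The training samples, labels and multipliers $\alpha_j$ are made up for illustration (in practice the $\alpha_j$ come from solving the dual problem), and the Gaussian kernel of Example 2 below is used as the kernel:

import numpy as np

def K(x, z, sigma=1.0):
    # Gaussian kernel from Example 2 below; any valid kernel could be used instead
    return np.exp(-np.sum((x - z)**2) / (2.0 * sigma**2))

# Hypothetical training data and dual solution; the alpha_j are simply made up here
X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 2.0], [3.0, 1.0]])
y = np.array([1, 1, -1, -1])
alpha = np.array([0.8, 0.0, 0.9, 0.0])   # nonzero alpha_j mark the support vectors

# b = y_i - sum_j alpha_j y_j K(x_i, x_j), evaluated at a support vector x_i
i = 0
b = y[i] - sum(alpha[j] * y[j] * K(X[i], X[j]) for j in range(len(X)))

def f(x):
    # Decision function f(x) = sum_j alpha_j y_j K(x, x_j) + b
    return sum(alpha[j] * y[j] * K(x, X[j]) for j in range(len(X))) + b

print(f(np.array([0.5, 0.5])), f(np.array([2.5, 1.5])))   # positive vs. negative side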

Example 0: linear kernel

Assume ${\bf x}=[x_1,\cdots,x_n]^T$, ${\bf z}=[z_1,\cdots,z_n]^T$,


\begin{displaymath}K({\bf x},{\bf z})={\bf x}^T {\bf z}=\sum_{i=1}^n x_iz_i \end{displaymath}
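
A minimal check in Python that the linear kernel is just the ordinary dot product:

import numpy as np

x = np.array([1.0, 2.0, 3.0])
z = np.array([4.0, 5.0, 6.0])

# Linear kernel: K(x, z) = x^T z = sum_i x_i z_i (here phi is simply the identity map)
K_sum = sum(xi * zi for xi, zi in zip(x, z))
K_dot = x @ z

print(K_sum, K_dot)   # both give 32.0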

Example 1: polynomial kernels

Assume ${\bf x}=[x_1, x_2]^T$, ${\bf z}=[z_1,z_2]^T$,

\begin{eqnarray*}
K({\bf x},{\bf z})=({\bf x}^T{\bf z})^2
&=& (x_1z_1+x_2z_2)^2=x_1^2z_1^2+x_2^2z_2^2+2x_1z_1x_2z_2 \\
&=& \langle (x_1^2,\,x_2^2,\,\sqrt{2}x_1x_2),\;(z_1^2,\,z_2^2,\,\sqrt{2}z_1z_2)\rangle
=\phi({\bf x})^T \phi({\bf z})
\end{eqnarray*}

This is a mapping from the 2-D input space to a 3-D feature space, with $\phi({\bf x})=[x_1^2,\;x_2^2,\;\sqrt{2}x_1x_2]^T$. The degree can be generalized from 2 to any $d$, giving the polynomial kernel $K({\bf x},{\bf z})=({\bf x}^T{\bf z})^d$.
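
The Python sketch below verifies this identity numerically: evaluating $({\bf x}^T{\bf z})^2$ in the original 2-D space gives the same value as the inner product of the explicit 3-D images $\phi({\bf x})$ and $\phi({\bf z})$:

import numpy as np

def phi(x):
    # Explicit 3-D feature map for the degree-2 polynomial kernel in 2-D
    return np.array([x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

K_implicit = (x @ z)**2        # (x^T z)^2, evaluated in the original 2-D space
K_explicit = phi(x) @ phi(z)   # phi(x)^T phi(z), evaluated in the 3-D feature space

print(K_implicit, K_explicit)  # both equal 1 for these vectors (up to rounding)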

Example 2: Gaussian (radial basis function) kernel

\begin{displaymath}K({\bf x},{\bf z})=e^{-\vert\vert{\bf x}-{\bf z}\vert\vert^2/2\sigma^2} \end{displaymath}
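
A minimal Python sketch of this Gaussian (radial basis function) kernel; the width $\sigma$ is a free parameter and the value $\sigma=1$ below is an arbitrary choice:

import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    # K(x, z) = exp(-||x - z||^2 / (2 sigma^2)); sigma is a width parameter chosen by the user
    return np.exp(-np.linalg.norm(x - z)**2 / (2.0 * sigma**2))

x = np.array([1.0, 2.0])
z = np.array([2.0, 0.0])

print(gaussian_kernel(x, z))   # in (0, 1], decays with the distance between x and z
print(gaussian_kernel(x, x))   # always exactly 1.0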

Example 3: normalized kernel

Given any kernel $K$, a normalized kernel can be defined:

\begin{displaymath}
\tilde{K}({\bf x},{\bf z})
=K({\bf x},{\bf z})K({\bf x},{\bf x})^{-1/2}K({\bf z},{\bf z})^{-1/2}
=\frac{K({\bf x},{\bf z})}{\sqrt{K({\bf x},{\bf x})\,K({\bf z},{\bf z})}}
\end{displaymath}

which satisfies $\tilde{K}({\bf x},{\bf x})=1$ for all ${\bf x}$.
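
A minimal Python sketch of this normalization, assuming (arbitrarily) that the base kernel $K$ is the degree-2 polynomial kernel of Example 1:

import numpy as np

def base_kernel(x, z):
    # Any existing kernel K can be normalized; the degree-2 polynomial kernel is an arbitrary choice
    return (x @ z)**2

def normalized_kernel(x, z):
    # K~(x, z) = K(x, z) / sqrt(K(x, x) K(z, z)), so that K~(x, x) = 1 for every x
    return base_kernel(x, z) / np.sqrt(base_kernel(x, x) * base_kernel(z, z))

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

print(normalized_kernel(x, z))   # 1 / sqrt(25 * 100) = 0.02
print(normalized_kernel(x, x))   # exactly 1.0

This normalization corresponds to replacing each image $\phi({\bf x})$ by the unit vector $\phi({\bf x})/\vert\vert\phi({\bf x})\vert\vert$ in the feature space.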

