
The learning problem:

Given a set of $m$ training samples from two linearly separable classes P and N:

\begin{displaymath}\{ ({\bf x}_i, y_i), i=1,\cdots,m \} \end{displaymath}

where $y_i \in \{1,-1\}$ labels ${\bf x}_i$ as belonging to one of the two classes. We want to find a hyper-plane, defined by ${\bf w}$ and $b$, that linearly separates the two classes.

Before ${\bf w}$ is properly trained, the actual output $y'=sign(f({\bf x}))$ may not be the same as the desired output $y$. There are four possible cases:


\begin{displaymath}\begin{tabular}{c\vert l\vert l\vert l} \hline
 & Input & Output & Result \\ \hline
from P & $({\bf x},y=1)$ & $y'=1=y$ & correct \\ \hline
from P & $({\bf x},y=1)$ & $y'=-1\ne y$ & incorrect \\ \hline
from N & $({\bf x},y=-1)$ & $y'=1\ne y$ & incorrect \\ \hline
from N & $({\bf x},y=-1)$ & $y'=-1=y$ & correct \\ \hline
\end{tabular} \end{displaymath}
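
To make the four cases concrete, here is a minimal Python sketch; the weight vector, bias, and toy points below are assumptions chosen only for illustration, not values from the text:

\begin{verbatim}
import numpy as np

def f(x, w, b):
    # decision function f(x) = x^T w + b
    return np.dot(x, w) + b

# hypothetical, untrained parameters (for illustration only)
w = np.array([1.0, -1.0])
b = 0.0

samples = [                            # (x, desired label y)
    (np.array([ 2.0,  0.5]),  1),      # from class P, classified correctly
    (np.array([-0.5,  1.0]),  1),      # from class P, misclassified
    (np.array([ 1.5, -1.0]), -1),      # from class N, misclassified
    (np.array([-2.0,  1.0]), -1),      # from class N, classified correctly
]

for x, y in samples:
    y_pred = 1 if f(x, w, b) >= 0 else -1     # y' = sign(f(x))
    print("correct" if y_pred == y else "incorrect")
\end{verbatim}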

The weight vector ${\bf w}$ is updated whenever the result is incorrect (mistake driven). If a sample from P ($y=1$) is misclassified as negative, ${\bf w}$ is moved toward ${\bf x}$; if a sample from N ($y=-1$) is misclassified as positive, ${\bf w}$ is moved away from ${\bf x}$:

\begin{displaymath}\mbox{if}\;\;y=1\;\;\mbox{but}\;\;f({\bf x})={\bf x}^T{\bf w}^{old}+b<0,
\;\;\mbox{then}\;\;{\bf w}^{new}={\bf w}^{old}+\eta {\bf x} \end{displaymath}

\begin{displaymath}\mbox{if}\;\;y=-1\;\;\mbox{but}\;\;f({\bf x})={\bf x}^T{\bf w}^{old}+b>0,
\;\;\mbox{then}\;\;{\bf w}^{new}={\bf w}^{old}-\eta {\bf x} \end{displaymath}

Summarizing the two cases, we get the learning law:

\begin{displaymath}\mbox{if} \;\;y f({\bf x})= y ({\bf x}^T{\bf w}^{old}+b)<0,\;\;\mbox{then}\;\;
{\bf w}^{new}={\bf w}^{old}+\eta y {\bf x} \end{displaymath}

The two correct cases can also be summarized as

\begin{displaymath}y f({\bf x})= y ({\bf x}^T {\bf w}+b)\ge 0 \end{displaymath}

which is the condition a successful classifier should satisfy.
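
Putting the learning law and this correctness condition together, the following is a minimal sketch of the resulting training loop in Python. The toy data, the learning rate, and the use of $y f({\bf x})\le 0$ as the update test (so that the all-zero initialization also triggers updates; the notes write a strict $<$) are assumptions made only for illustration:

\begin{verbatim}
import numpy as np

def train_primal(X, y, eta=1.0, epochs=100):
    """Mistake-driven training of w (b is held fixed, since only w
    is updated in the learning law above)."""
    w = np.zeros(X.shape[1])                # start from w = 0
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:      # misclassified (or on boundary)
                w = w + eta * yi * xi       # w_new = w_old + eta * y * x
                mistakes += 1
        if mistakes == 0:                   # y * f(x) >= 0 for all samples
            break
    return w, b

# toy data, linearly separable through the origin (assumption)
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
w, b = train_primal(X, y)
print(w, [int(np.sign(x @ w + b)) for x in X])   # predictions match y
\end{verbatim}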

Assume ${\bf w}={\bf 0}$ initially and let the $m$ training samples be presented repeatedly. Since every update adds a multiple of some $y_j{\bf x}_j$ to ${\bf w}$, the learning law will eventually yield

\begin{displaymath}{\bf w}=\sum_{j=1}^m \alpha_j y_j {\bf x}_j \end{displaymath}

where $\alpha_j\ge 0$ (a sample that has never caused an update has $\alpha_j=0$). Note that ${\bf w}$ is expressed as a linear combination of the training samples. When a sample $({\bf x}_i,y_i)$ is presented, ${\bf w}$ is updated by
\begin{eqnarray*}
&& \mbox{if} \;\;y_i f({\bf x}_i)=y_i ({\bf x}_i^T{\bf w}^{old}+b)
   =y_i\left(\sum_{j=1}^m \alpha_j y_j({\bf x}_i^T{\bf x}_j)+b\right)<0, \\
&& \mbox{then}\;\; {\bf w}^{new}={\bf w}^{old}+\eta y_i {\bf x}_i
   =\sum_{j=1}^m \alpha_j y_j {\bf x}_j+\eta y_i {\bf x}_i,
   \;\;\;\mbox{i.e.}\;\;\; \alpha_i^{new}=\alpha_i^{old}+\eta
\end{eqnarray*}
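
In this dual form, learning only updates the coefficients $\alpha_i$, and only pairwise inner products ${\bf x}_i^T{\bf x}_j$ of the training samples are needed. The sketch below implements this in Python; the toy data, the learning rate, and the $\le 0$ update test are again assumptions for illustration:

\begin{verbatim}
import numpy as np

def train_dual(X, y, eta=1.0, epochs=100):
    """Dual-form mistake-driven learning: store alpha_j instead of w."""
    m = X.shape[0]
    alpha = np.zeros(m)
    b = 0.0
    G = X @ X.T                        # Gram matrix of inner products x_i^T x_j
    for _ in range(epochs):
        mistakes = 0
        for i in range(m):
            # f(x_i) = sum_j alpha_j y_j (x_i^T x_j) + b
            f_i = np.sum(alpha * y * G[i]) + b
            if y[i] * f_i <= 0:        # misclassified (or on boundary)
                alpha[i] += eta        # alpha_i_new = alpha_i_old + eta
                mistakes += 1
        if mistakes == 0:
            break
    return alpha, b

X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
alpha, b = train_dual(X, y)
print(alpha)                           # equivalent w = sum_j alpha_j y_j x_j
\end{verbatim}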

Now both the decision function

\begin{displaymath}f({\bf x})={\bf x}^T {\bf w}+b=\sum_{j=1}^m \alpha_j y_j ({\bf x}^T {\bf x}_j)+b \end{displaymath}

and the learning law

\begin{displaymath}\mbox{if} \;\;y_i\left(\sum_{j=1}^m \alpha_j y_j({\bf x}_i^T{\bf x}_j)+b\right)<0,\;\;\;
\mbox{then}\;\; \alpha_i^{new}=\alpha_i^{old}+\eta \end{displaymath}

are expressed entirely in terms of inner products of the input vectors.
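
For example, once the $\alpha_j$ have been trained, a new point can be classified without ever forming ${\bf w}$ explicitly; only its inner products with the training samples are needed. This short sketch continues the hypothetical train_dual example above (X, y, alpha, b and numpy are reused from that block, and the test point is made up):

\begin{verbatim}
def predict_dual(x, X, y, alpha, b):
    """Decision function f(x) = sum_j alpha_j y_j (x^T x_j) + b,
    computed from inner products only (w is never formed)."""
    f = sum(a * yj * (x @ xj) for a, yj, xj in zip(alpha, y, X)) + b
    return 1 if f >= 0 else -1

x_new = np.array([1.0, 0.5])                 # hypothetical test point
print(predict_dual(x_new, X, y, alpha, b))   # -> 1
\end{verbatim}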


Ruye Wang 2015-08-13