
The learning problem

Given a set of $K$ training samples from two linearly separable classes P and N:

\begin{displaymath}\{ ({\bf x}_k, y_k), k=1,\cdots,K \} \end{displaymath}

where $y_k \in \{1,-1\}$ labels ${\bf x}_k$ as belonging to one of the two classes. We want to find a hyperplane, defined by ${\bf w}$ and $b$, that linearly separates the two classes.

Before the classifier is properly trained, the actual output $y'=sign(f({\bf x}))$ may not be the same as the desired output $y$. There are four possible cases:


\begin{displaymath}
\begin{tabular}{c\vert l\vert l\vert l} \hline
Case & Input & Output & Result \\ \hline
1 & $({\bf x},y=1)$  & $y'=1=y$     & correct   \\ \hline
2 & $({\bf x},y=1)$  & $y'=-1\ne y$ & incorrect \\ \hline
3 & $({\bf x},y=-1)$ & $y'=1\ne y$  & incorrect \\ \hline
4 & $({\bf x},y=-1)$ & $y'=-1=y$    & correct   \\ \hline
\end{tabular} \end{displaymath}

The weight vector ${\bf w}$ is updated whenever the result is incorrect (mistake driven):

\begin{displaymath}
\mbox{if}\;\;y=1\;\;\mbox{but}\;\;y'=-1,\;\;\mbox{then}\;\;
{\bf w}^{new}={\bf w}^{old}+\eta {\bf x}
\end{displaymath}

\begin{displaymath}
\mbox{if}\;\;y=-1\;\;\mbox{but}\;\;y'=1,\;\;\mbox{then}\;\;
{\bf w}^{new}={\bf w}^{old}-\eta {\bf x}
\end{displaymath}

where $\eta>0$ is the learning rate. Summarizing the two incorrect cases (cases 2 and 3), we get the learning law:

\begin{displaymath}\mbox{if} \;\;y f({\bf x})= y ({\bf x}^T{\bf w}^{old}+b)<0,\;\;\mbox{then}\;\;
{\bf w}^{new}={\bf w}^{old}+\eta y {\bf x} \end{displaymath}

The two correct cases (cases 1 and 4) can also be summarized as

\begin{displaymath}y f({\bf x})= y ({\bf x}^T {\bf w}+b)\ge 0 \end{displaymath}

which is the condition a successful classifier should satisfy.
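
As a concrete illustration, here is a minimal Python sketch of this mistake-driven training loop. The toy data, the learning rate `eta`, and the bias update `b += eta*y[k]` are illustrative assumptions rather than details from the text, and the update test uses $y f({\bf x})\le 0$ so that the all-zero initial weights also count as a mistake:

import numpy as np

def train_perceptron(X, y, eta=1.0, max_epochs=100):
    """Mistake-driven (primal) perceptron training sketch."""
    K, n = X.shape
    w = np.zeros(n)                       # initial weights w = 0
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for k in range(K):
            f = X[k] @ w + b              # f(x) = x^T w + b
            if y[k] * f <= 0:             # misclassified (<= also covers the initial w = 0)
                w += eta * y[k] * X[k]    # w_new = w_old + eta * y * x
                b += eta * y[k]           # bias updated analogously (assumed convention)
                mistakes += 1
        if mistakes == 0:                 # every sample now satisfies y * f(x) > 0
            return w, b
    return w, b

# Toy linearly separable data: class P (y = +1) and class N (y = -1)
X = np.array([[2.0, 2.0], [1.5, 2.5], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
w, b = train_perceptron(X, y)
print(w, b, np.sign(X @ w + b))           # predictions should match y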

Assuming ${\bf w}={\bf 0}$ initially and presenting the $K$ training samples repeatedly, the learning law will eventually yield a weight vector of the form:

\begin{displaymath}
{\bf w}=\sum_{i=1}^K \alpha_i y_i {\bf x}_i
\end{displaymath}

where $\alpha_i\ge 0$ (positive for those samples that have triggered at least one update). Note that ${\bf w}$ is expressed as a linear combination of the training samples. Each time a sample $({\bf x}_i,y_i)$ is presented, ${\bf w}$ is updated by
\begin{displaymath}
\mbox{if} \;\;y_i f({\bf x}_i)=y_i ({\bf x}_i^T{\bf w}^{old}+b)
=y_i\left(\sum_{j=1}^K \alpha_j y_j({\bf x}_i^T{\bf x}_j)+b\right)<0,
\end{displaymath}

\begin{displaymath}
\mbox{then}\;\;{\bf w}^{new}={\bf w}^{old}+\eta y_i {\bf x}_i
=\sum_{j=1}^K \alpha_j y_j {\bf x}_j+\eta y_i {\bf x}_i,
\;\;\;\mbox{i.e.}\;\;\;\alpha_i^{new}=\alpha_i^{old}+\eta
\end{displaymath}
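
A quick numerical check of this equivalence (toy values, names illustrative): adding $\eta y_i {\bf x}_i$ to ${\bf w}$ produces the same vector as incrementing $\alpha_i$ by $\eta$ and re-forming ${\bf w}=\sum_j \alpha_j y_j {\bf x}_j$.

import numpy as np

X = np.array([[2.0, 2.0], [1.5, 2.5], [-1.0, -1.5]])   # toy samples x_j
y = np.array([1, 1, -1])                                # labels y_j
alpha = np.array([0.0, 1.0, 2.0])                       # current coefficients alpha_j
eta, i = 1.0, 0                                         # sample i was just misclassified

w_old = (alpha * y) @ X                 # w_old = sum_j alpha_j y_j x_j
w_primal = w_old + eta * y[i] * X[i]    # primal update: w_old + eta y_i x_i

alpha[i] += eta                         # dual update: alpha_i_new = alpha_i_old + eta
w_dual = (alpha * y) @ X                # re-form w from the updated coefficients

print(np.allclose(w_primal, w_dual))    # True: both updates give the same w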

Now both the decision function

\begin{displaymath}f({\bf x})={\bf x}^T {\bf w}+b=\sum_{j=1}^K \alpha_j y_j ({\bf x}^T {\bf x}_j)+b \end{displaymath}

and the learning law

\begin{displaymath}\mbox{if} \;\;y_i\left(\sum_{j=1}^K \alpha_j y_j({\bf x}_i^T{\bf x}_j)+b\right)<0,\;\;\;
\mbox{then}\;\; \alpha_i^{new}=\alpha_i^{old}+\eta \end{displaymath}

are expressed in terms of the inner products of the input vectors.
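
Because the samples enter only through the inner products ${\bf x}_i^T{\bf x}_j$, training can also be run entirely in the dual, on the coefficients $\alpha_j$ and a precomputed Gram matrix. A minimal sketch, under the same illustrative toy-data and bias-update assumptions as above:

import numpy as np

def train_dual_perceptron(X, y, eta=1.0, max_epochs=100):
    """Dual perceptron sketch: only inner products x_i^T x_j are used."""
    K = X.shape[0]
    G = X @ X.T                               # Gram matrix, G[i, j] = x_i^T x_j
    alpha = np.zeros(K)
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(K):
            f_i = (alpha * y) @ G[i] + b      # f(x_i) = sum_j alpha_j y_j (x_i^T x_j) + b
            if y[i] * f_i <= 0:               # misclassified
                alpha[i] += eta               # alpha_i_new = alpha_i_old + eta
                b += eta * y[i]               # assumed bias update, as in the primal sketch
                mistakes += 1
        if mistakes == 0:
            return alpha, b
    return alpha, b

X = np.array([[2.0, 2.0], [1.5, 2.5], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
alpha, b = train_dual_perceptron(X, y)

# The decision function for a new x also needs only the inner products x^T x_j:
x_new = np.array([1.0, 1.0])
f_new = (alpha * y) @ (X @ x_new) + b
print(alpha, np.sign(f_new))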


Ruye Wang 2016-08-24