
The learning problem

Given a set of $K$ training samples from two linearly separable classes P and N:

\begin{displaymath}\{ ({\bf x}_k, y_k), k=1,\cdots,K \} \end{displaymath}

where $y_k \in \{1,-1\}$ labels ${\bf x}_k$ as belonging to one of the two classes. We want to find a hyperplane, defined by ${\bf w}$ and $b$, that linearly separates the two classes.

Before the classifier is properly trained, the actual output $y'=sign(f({\bf x}))$ may not be the same as the desired output $y$. There are four possible cases:


\begin{displaymath}
\begin{tabular}{c\vert l\vert l\vert l} \hline
Case & Input & Output & Result \\ \hline
1 & $({\bf x},y=1)$  & $y'=1=y$     & correct   \\ \hline
2 & $({\bf x},y=1)$  & $y'=-1\ne y$ & incorrect \\ \hline
3 & $({\bf x},y=-1)$ & $y'=1\ne y$  & incorrect \\ \hline
4 & $({\bf x},y=-1)$ & $y'=-1=y$    & correct   \\ \hline
\end{tabular} \end{displaymath}

The weight vector ${\bf w}$ is updated whenever the result is incorrect (mistake driven):

\begin{displaymath}
\mbox{if}\;\;y=1\;\;\mbox{but}\;\;y'=-1,\;\;\mbox{then}\;\;
{\bf w}^{new}={\bf w}^{old}+\eta {\bf x}
\end{displaymath}

\begin{displaymath}
\mbox{if}\;\;y=-1\;\;\mbox{but}\;\;y'=1,\;\;\mbox{then}\;\;
{\bf w}^{new}={\bf w}^{old}-\eta {\bf x}
\end{displaymath}

where $\eta>0$ is the learning rate. Summarizing the two incorrect cases (cases 2 and 3), we get the learning law:

\begin{displaymath}\mbox{if} \;\;y f({\bf x})= y ({\bf x}^T{\bf w}^{old}+b)<0,\;\;\mbox{then}\;\;
{\bf w}^{new}={\bf w}^{old}+\eta y {\bf x} \end{displaymath}

The two correct cases (cases 1 and 4) can also be summarized as

\begin{displaymath}y f({\bf x})= y ({\bf x}^T {\bf w}+b)\ge 0 \end{displaymath}

which is the condition a successful classifier should satisfy.
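
As a concrete illustration, here is a minimal Python sketch of this mistake-driven training loop. The toy data, the learning rate `eta`, and the bias update `b += eta*y[k]` are illustrative assumptions rather than details from the text, and the update test uses $y f({\bf x})\le 0$ so that the all-zero initial weights also count as a mistake:

import numpy as np

def train_perceptron(X, y, eta=1.0, max_epochs=100):
    """Mistake-driven (primal) perceptron training sketch."""
    K, n = X.shape
    w = np.zeros(n)                       # initial weights w = 0
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for k in range(K):
            f = X[k] @ w + b              # f(x) = x^T w + b
            if y[k] * f <= 0:             # misclassified (<= also covers the initial w = 0)
                w += eta * y[k] * X[k]    # w_new = w_old + eta * y * x
                b += eta * y[k]           # bias updated analogously (assumed convention)
                mistakes += 1
        if mistakes == 0:                 # every sample now satisfies y * f(x) > 0
            return w, b
    return w, b

# Toy linearly separable data: class P (y = +1) and class N (y = -1)
X = np.array([[2.0, 2.0], [1.5, 2.5], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
w, b = train_perceptron(X, y)
print(w, b, np.sign(X @ w + b))           # predictions should match y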

Assuming ${\bf w}={\bf 0}$ initially and presenting the $K$ training samples repeatedly, the learning law will eventually yield a weight vector of the form:

\begin{displaymath}
{\bf w}=\sum_{i=1}^K \alpha_i y_i {\bf x}_i
\end{displaymath}

where $\alpha_i\ge 0$ (positive for those samples that have triggered at least one update). Note that ${\bf w}$ is expressed as a linear combination of the training samples. Each time a sample $({\bf x}_i,y_i)$ is presented, ${\bf w}$ is updated by
\begin{displaymath}
\mbox{if} \;\;y_i f({\bf x}_i)=y_i ({\bf x}_i^T{\bf w}^{old}+b)
=y_i\left(\sum_{j=1}^K \alpha_j y_j({\bf x}_i^T{\bf x}_j)+b\right)<0,
\end{displaymath}

\begin{displaymath}
\mbox{then}\;\;{\bf w}^{new}={\bf w}^{old}+\eta y_i {\bf x}_i
=\sum_{j=1}^K \alpha_j y_j {\bf x}_j+\eta y_i {\bf x}_i,
\;\;\;\mbox{i.e.}\;\;\;\alpha_i^{new}=\alpha_i^{old}+\eta
\end{displaymath}
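
A quick numerical check of this equivalence (toy values, names illustrative): adding $\eta y_i {\bf x}_i$ to ${\bf w}$ produces the same vector as incrementing $\alpha_i$ by $\eta$ and re-forming ${\bf w}=\sum_j \alpha_j y_j {\bf x}_j$.

import numpy as np

X = np.array([[2.0, 2.0], [1.5, 2.5], [-1.0, -1.5]])   # toy samples x_j
y = np.array([1, 1, -1])                                # labels y_j
alpha = np.array([0.0, 1.0, 2.0])                       # current coefficients alpha_j
eta, i = 1.0, 0                                         # sample i was just misclassified

w_old = (alpha * y) @ X                 # w_old = sum_j alpha_j y_j x_j
w_primal = w_old + eta * y[i] * X[i]    # primal update: w_old + eta y_i x_i

alpha[i] += eta                         # dual update: alpha_i_new = alpha_i_old + eta
w_dual = (alpha * y) @ X                # re-form w from the updated coefficients

print(np.allclose(w_primal, w_dual))    # True: both updates give the same w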

Now both the decision function

\begin{displaymath}f({\bf x})={\bf x}^T {\bf w}+b=\sum_{j=1}^K \alpha_j y_j ({\bf x}^T {\bf x}_j)+b \end{displaymath}

and the learning law

\begin{displaymath}\mbox{if} \;\;y_i\left(\sum_{j=1}^K \alpha_j y_j({\bf x}_i^T{\bf x}_j)+b\right)<0,\;\;\;
\mbox{then}\;\; \alpha_i^{new}=\alpha_i^{old}+\eta \end{displaymath}

are expressed in terms of the inner products of the input vectors.
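
Because the samples enter only through the inner products ${\bf x}_i^T{\bf x}_j$, training can also be run entirely in the dual, on the coefficients $\alpha_j$ and a precomputed Gram matrix. A minimal sketch, under the same illustrative toy-data and bias-update assumptions as above:

import numpy as np

def train_dual_perceptron(X, y, eta=1.0, max_epochs=100):
    """Dual perceptron sketch: only inner products x_i^T x_j are used."""
    K = X.shape[0]
    G = X @ X.T                               # Gram matrix, G[i, j] = x_i^T x_j
    alpha = np.zeros(K)
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(K):
            f_i = (alpha * y) @ G[i] + b      # f(x_i) = sum_j alpha_j y_j (x_i^T x_j) + b
            if y[i] * f_i <= 0:               # misclassified
                alpha[i] += eta               # alpha_i_new = alpha_i_old + eta
                b += eta * y[i]               # assumed bias update, as in the primal sketch
                mistakes += 1
        if mistakes == 0:
            return alpha, b
    return alpha, b

X = np.array([[2.0, 2.0], [1.5, 2.5], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
alpha, b = train_dual_perceptron(X, y)

# The decision function for a new x also needs only the inner products x^T x_j:
x_new = np.array([1.0, 1.0])
f_new = (alpha * y) @ (X @ x_new) + b
print(alpha, np.sign(f_new))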


Ruye Wang 2016-08-24