Soft Margin SVM

When the two classes are not linearly separable, the condition for the optimal hyper-plane can be relaxed by including a slack variable $\xi_i$ for each training sample:

$y_i ({\bf x}_i^T{\bf w} +b) \ge 1-\xi_i,\;\;\;(i=1,\cdots,m)$

To keep the classification error small, the slack variables $\xi_i \ge 0$ should be minimized along with $\vert\vert{\bf w}\vert\vert$ , and the objective function becomes:

	$\textstyle \mbox{minimize}$	$\displaystyle {\bf w}^T {\bf w}+C\sum_{i=1}^m \xi_i^p$
	$\textstyle \mbox{subject to}$	$\displaystyle y_i ({\bf x}_i^T {\bf w}+b) \ge 1-\xi_i, \;\;\;\mbox{and}\;\;\;\xi_i \ge 0;\;\;\;(i=1,\cdots,m)$
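
As a rough illustration of this objective, the sketch below evaluates it for a fixed, hand-picked $({\bf w}, b)$ on a toy data set. The data, the names `slacks` and `soft_margin_objective`, and the choice $C=1$ are assumptions made only for this example. It uses the fact that, for a given $({\bf w}, b)$, the smallest slack satisfying both constraints is $\xi_i=\max(0,\,1-y_i({\bf x}_i^T{\bf w}+b))$.

```python
def slacks(X, y, w, b):
    """Smallest feasible slack per sample: max(0, 1 - y_i (x_i^T w + b))."""
    out = []
    for x_vec, y_i in zip(X, y):
        score = sum(w_j * x_j for w_j, x_j in zip(w, x_vec)) + b
        out.append(max(0.0, 1.0 - y_i * score))
    return out


def soft_margin_objective(X, y, w, b, C, p=1):
    """Value of w^T w + C * sum_i xi_i^p at the smallest feasible slacks."""
    xi = slacks(X, y, w, b)
    return sum(w_j * w_j for w_j in w) + C * sum(s ** p for s in xi)


# Toy data (hypothetical): the last point is labelled -1 but lies on the +1 side.
X = [[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0], [1.0, 0.0]]
y = [1, 1, -1, -1]
w, b = [1.0, 0.0], 0.0

xi = slacks(X, y, w, b)
print(xi)                                              # [0.0, 0.5, 0.0, 2.0]
print(soft_margin_objective(X, y, w, b, C=1.0, p=1))   # 3.5
print(soft_margin_objective(X, y, w, b, C=1.0, p=2))   # 5.25
```

A sample outside the margin on the correct side has zero slack, a sample inside the margin has a slack between 0 and 1, and a misclassified sample has a slack greater than 1.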

Here $C$ is a regularization parameter that controls the trade-off between maximizing the margin and minimizing the training error. A small $C$ tends to emphasize the margin while ignoring the outliers in the training data, while a large $C$ may tend to overfit the training data.
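
The effect of $C$ can be seen on a small one-dimensional example. The sketch below is not a practical SVM solver: it simply searches a coarse grid of $(w, b)$ values for the minimizer of $w^2 + C\sum_i \xi_i$ (the case $p=1$). The data set, the grid resolution, and the function names are all assumptions chosen for illustration. The negative sample at $x=0.2$ lies close to the positive class and plays the role of an outlier.

```python
def hinge_slack(x, y, w, b):
    """Smallest feasible slack for a 1-D sample: max(0, 1 - y (x w + b))."""
    return max(0.0, 1.0 - y * (x * w + b))


def fit_by_grid(xs, ys, C, per_unit=20, w_max=3, b_max=2):
    """Brute-force minimizer of w^2 + C * sum(xi) over a (w, b) grid."""
    best_obj, best_w, best_b = float("inf"), None, None
    for i in range(1, w_max * per_unit + 1):       # w > 0 so 2 / w is defined
        w = i / per_unit
        for j in range(-b_max * per_unit, b_max * per_unit + 1):
            b = j / per_unit
            total_slack = sum(hinge_slack(x, y, w, b) for x, y in zip(xs, ys))
            obj = w * w + C * total_slack
            if obj < best_obj:
                best_obj, best_w, best_b = obj, w, b
    return best_w, best_b


# Hypothetical data: three clean samples per class plus one negative near x = 0.2.
xs = [-3.0, -2.0, -1.0, 0.2, 1.0, 2.0, 3.0]
ys = [-1, -1, -1, -1, 1, 1, 1]

w_small, b_small = fit_by_grid(xs, ys, C=0.1)
w_large, b_large = fit_by_grid(xs, ys, C=100.0)

for C, w, b in ((0.1, w_small, b_small), (100.0, w_large, b_large)):
    print(f"C={C}: w={w:.2f}, b={b:.2f}, margin width={2.0 / w:.2f}")
```

With the large $C$ the search settles on the hard-margin solution $w=2.5$, $b=-1.5$, whose narrow margin is squeezed between $x=0.2$ and $x=1$. With the small $C$ it accepts some slack in exchange for a smaller $w$ and therefore a wider margin $2/\vert w\vert$.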

When $p=2$ , it is called the 2-norm soft margin problem:

	$\textstyle \mbox{minimize}$	$\displaystyle {\bf w}^T {\bf w}+C\sum_{i=1}^m \xi_i^2$
	$\textstyle \mbox{subject to}$	$\displaystyle y_i ({\bf x}_i^T {\bf w}+b) \ge 1-\xi_i,\;\;\;(i=1,\cdots,m)$

(Note that the condition $\xi_i \ge 0$ is dropped: if $\xi_i<0$ for some $i$, the constraint is still satisfied after setting $\xi_i=0$, and the objective function is reduced further.) Alternatively, if we let $p=1$ , the problem can be formulated as

	$\textstyle \mbox{minimize}$	$\displaystyle {\bf w}^T {\bf w}+C\sum_{i=1}^m \xi_i$
	$\textstyle \mbox{subject to}$	$\displaystyle y_i ({\bf x}_i^T {\bf w}+b) \ge 1-\xi_i \;\;\;\mbox{and}\;\;\;\xi_i \ge 0;\;\;\;(i=1,\cdots,m)$

This is called the 1-norm soft margin problem. Compared to the 2-norm algorithm, the algorithm based on the 1-norm setup is less sensitive to outliers in the training data, because a large slack is penalized linearly rather than quadratically. When the data is noisy, the 1-norm method is therefore generally preferred.
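
One way to see the difference in sensitivity is to compare how much of the total penalty $\sum_i \xi_i^p$ comes from a single large violation. The slack values below are made up for illustration, and `outlier_share` is a hypothetical helper, not part of either algorithm.

```python
def outlier_share(slacks, p):
    """Fraction of sum(xi^p) contributed by the largest slack."""
    penalties = [s ** p for s in slacks]
    return max(penalties) / sum(penalties)


# Three mild margin violations and one gross outlier (hypothetical values).
slacks = [0.5, 0.5, 0.25, 4.0]

share_1 = outlier_share(slacks, p=1)   # 4 / 5.25
share_2 = outlier_share(slacks, p=2)   # 16 / 16.5625
print(f"1-norm: {share_1:.3f}, 2-norm: {share_2:.3f}")   # 1-norm: 0.762, 2-norm: 0.966
```

Under the 2-norm penalty the single outlier accounts for nearly all of the slack term, so the optimizer has a stronger incentive to move the hyper-plane toward it.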

Ruye Wang 2016-08-24