The support vector machine (SVM) is a supervised binary classifier trained by a data set $\{(\mathbf{x}_n, y_n),\; n = 1,\cdots,N\}$, of which each data sample $\mathbf{x}_n \in \mathbb{R}^d$, a vector in the d-dimensional feature space, belongs to either class $C_+$ if labeled by $y_n = 1$, or class $C_-$ if labeled by $y_n = -1$. The result of the training process is the decision boundary, or decision plane, in the feature space, described by the linear decision equation:

$$f(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + b = 0 \tag{62}$$

A sample $\mathbf{x}$ is then classified by the sign of the decision function:

$$\hat{y} = \mathrm{sign}\big(f(\mathbf{x})\big) = \begin{cases} 1 & \text{if } f(\mathbf{x}) > 0,\;\text{ i.e., } \mathbf{x}\in C_+ \\ -1 & \text{if } f(\mathbf{x}) < 0,\;\text{ i.e., } \mathbf{x}\in C_- \end{cases} \tag{63}$$
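As a minimal numerical illustration of Eqs. (62) and (63) in Matlab, the weight vector, bias, and test points below are arbitrary values assumed only for demonstration:

w=[2; -1];                 % assumed weight vector (illustration only)
b=-1;                      % assumed bias
Xnew=[3 0 2; 1 4 1];       % three test points as columns of a 2-by-3 array
f=w'*Xnew+b;               % decision function values of Eq. (62): f=[4 -5 2]
yhat=sign(f);              % predicted labels of Eq. (63): yhat=[1 -1 1]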
The initial setup of the SVM algorithm seems the same as that of linear regression used as a linear binary classifier, where the linear regression function $f(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + b$ is thresholded by zero to become $\mathrm{sign}(f(\mathbf{x}))$. But here for the SVM, the parameters $\mathbf{w}$ and $b$ are determined in such a way that the corresponding decision plane separates the training samples belonging to the two classes optimally, i.e., its distance to the closest samples on either side, called the support vectors, is maximized. Also, as kernel mapping can be applied to the SVM, it can also be used for nonlinear classification.
We rewrite the decision equation $f(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + b = 0$ as

$$\frac{f(\mathbf{x})}{\|\mathbf{w}\|} = \frac{\mathbf{w}^T\mathbf{x}}{\|\mathbf{w}\|} + \frac{b}{\|\mathbf{w}\|} = 0 \tag{64}$$

The first term is the projection of $\mathbf{x}$ onto the unit normal direction $\mathbf{w}/\|\mathbf{w}\|$ of the decision plane, showing that the distance from the origin to the plane is $|b|/\|\mathbf{w}\|$, and, more generally, that the signed distance from any point $\mathbf{x}$ in the feature space to the decision plane is

$$d(\mathbf{x}) = \frac{f(\mathbf{x})}{\|\mathbf{w}\|} = \frac{\mathbf{w}^T\mathbf{x} + b}{\|\mathbf{w}\|} \tag{66}$$
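For instance, the signed distance of a point to the decision plane in Eq. (66) can be checked numerically; the plane parameters and the point below are assumed values used only to illustrate the formula:

w=[3; 4]; b=-5;            % assumed plane 3*x1+4*x2-5=0, so norm(w)=5
x=[4; 2];                  % an assumed test point
d=(w'*x+b)/norm(w);        % signed distance of Eq. (66): d=(12+8-5)/5=3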
We desire to find the optimal decision plane that separates the training samples belonging to the two different classes, assumed to be linearly separable, in such a way that its distance to the support vectors, denoted by $d$, is maximized. As $\mathbf{w}$ and $b$ can be scaled by any common factor without changing the decision plane, they can be rescaled so that $|f(\mathbf{x}_{sv})| = 1$ for all support vectors $\mathbf{x}_{sv}$, which therefore lie on one of two planes parallel to the decision plane:

$$H_+:\;\mathbf{w}^T\mathbf{x} + b = 1, \qquad H_-:\;\mathbf{w}^T\mathbf{x} + b = -1 \tag{67}$$

The distance between these two planes, twice the distance $d$ from the decision plane to either of them, is the margin to be maximized:

$$2d = \frac{2}{\|\mathbf{w}\|} \tag{69}$$
For these planes to correctly separate all samples in the training set $\{(\mathbf{x}_n, y_n),\; n=1,\cdots,N\}$, they have to satisfy the following two conditions:

$$\mathbf{w}^T\mathbf{x}_n + b \ge 1 \;\;\text{if } y_n = 1, \qquad \mathbf{w}^T\mathbf{x}_n + b \le -1 \;\;\text{if } y_n = -1 \tag{71}$$

which can be combined into a single condition:

$$y_n(\mathbf{w}^T\mathbf{x}_n + b) \ge 1, \qquad n = 1,\cdots,N \tag{72}$$

where the equality holds for the support vectors. Maximizing the margin $2/\|\mathbf{w}\|$ in Eq. (69) is equivalent to minimizing $\|\mathbf{w}\|^2/2$, so the optimal $\mathbf{w}$ and $b$ are the solution of the following constrained minimization problem:

$$\min_{\mathbf{w},\,b}\;\frac{1}{2}\|\mathbf{w}\|^2 \qquad \text{s.t.}\quad y_n(\mathbf{w}^T\mathbf{x}_n + b) - 1 \ge 0, \quad n = 1,\cdots,N \tag{73}$$
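Although the text below solves this problem through its dual, the primal problem in Eq. (73) is itself a standard QP problem and could, in principle, be solved directly; the following is only a sketch of one possible formulation, assuming the Optimization Toolbox function quadprog is available and that X (d-by-N, training vectors as columns) and y (1-by-N row of labels) are already defined:

[d,N]=size(X);                     % feature dimension and number of training samples
H=blkdiag(eye(d),0);               % quadratic term (1/2)*||w||^2, no cost on the bias b
f=zeros(d+1,1);                    % no linear term in the objective
A=-[(X.*repmat(y,d,1))' y'];       % constraints y_n*(w'*x_n+b)>=1 rewritten as A*z<=c
c=-ones(N,1);
z=quadprog(H,f,A,c);               % z=[w; b] minimizes (1/2)*z'*H*z subject to A*z<=c
w=z(1:d); b=z(end);                % optimal normal vector and bias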
Example:

The straight line in the 2-D space shown above, denoted by $L$, is described by the following linear equation:

$$f(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + b = w_1 x_1 + w_2 x_2 + b = 0 \tag{74}$$

with normal direction $\mathbf{w} = [w_1,\,w_2]^T$, so that the signed distance from any point $\mathbf{x}$ to $L$ is

$$d(\mathbf{x}) = \frac{f(\mathbf{x})}{\|\mathbf{w}\|} \tag{75}$$

These two points $\mathbf{x}_+$ and $\mathbf{x}_-$ with equal distance to $L$, the straight line, can be assumed to be two support vectors on either side of $L$, and

$$f(\mathbf{x}_+) = \mathbf{w}^T\mathbf{x}_+ + b > 0 \tag{76}$$

$$f(\mathbf{x}_-) = \mathbf{w}^T\mathbf{x}_- + b < 0, \qquad |f(\mathbf{x}_+)| = |f(\mathbf{x}_-)| \tag{77}$$

As $\mathbf{w}$ and $b$ can be scaled by any common factor without changing $L$, they can be rescaled so that

$$f(\mathbf{x}_+) = 1, \qquad f(\mathbf{x}_-) = -1 \tag{78}$$

and the distance from either support vector to $L$ becomes

$$d = \frac{|f(\mathbf{x}_\pm)|}{\|\mathbf{w}\|} = \frac{1}{\|\mathbf{w}\|} \tag{79}$$
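As a numerical check of Eqs. (74) through (79), the line and the two points below are assumed values chosen only for demonstration (they are not the ones in the figure):

w=[1; 2]; b=-4;                    % assumed line L: x1+2*x2-4=0
xp=[3; 2]; xm=[1; 0];              % assumed points on either side of L
fp=w'*xp+b;                        % f(x+)=3>0
fm=w'*xm+b;                        % f(x-)=-3<0, equal distance on the other side
s=1/abs(fp);                       % common scaling factor
ws=s*w; bs=s*b;                    % rescaled parameters describe the same line, f(x+/-)=+/-1
d=1/norm(ws);                      % distance from either point to L, Eq. (79): d=3/sqrt(5)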
For reasons to be discussed later, instead of directly solving the constrained minimization problem in Eq. (73), now called the primal problem, we actually solve the dual problem. Specifically, we first construct the Lagrangian function of the primal problem:
$$L_p(\mathbf{w}, b, \boldsymbol{\alpha}) = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{n=1}^N \alpha_n \big[ y_n(\mathbf{w}^T\mathbf{x}_n + b) - 1 \big] \tag{80}$$

where $\alpha_n \ge 0\;(n = 1,\cdots,N)$ are the Lagrange multipliers.
We next find the minimum (or infimum) as the lower bound of the primal function $L_p(\mathbf{w}, b, \boldsymbol{\alpha})$ in Eq. (80), by setting to zero its partial derivatives with respect to both $\mathbf{w}$ and $b$:

$$\frac{\partial}{\partial \mathbf{w}} L_p(\mathbf{w}, b, \boldsymbol{\alpha}) = \mathbf{w} - \sum_{n=1}^N \alpha_n y_n \mathbf{x}_n = \mathbf{0} \tag{82}$$

$$\Longrightarrow\quad \mathbf{w} = \sum_{n=1}^N \alpha_n y_n \mathbf{x}_n \tag{83}$$

$$\frac{\partial}{\partial b} L_p(\mathbf{w}, b, \boldsymbol{\alpha}) = -\sum_{n=1}^N \alpha_n y_n = 0 \quad\Longrightarrow\quad \sum_{n=1}^N \alpha_n y_n = 0 \tag{84}$$
Substituting Eqs. (83) and (84) back into the primal function $L_p$ in Eq. (80), we obtain the dual function, which depends only on the Lagrange multipliers:

$$L_d(\boldsymbol{\alpha}) = \sum_{n=1}^N \alpha_n - \frac{1}{2}\sum_{n=1}^N\sum_{m=1}^N \alpha_n \alpha_m y_n y_m \mathbf{x}_n^T\mathbf{x}_m \tag{85}$$

The dual problem is to maximize $L_d(\boldsymbol{\alpha})$, a quadratic programming (QP) problem in the variables $\alpha_1,\cdots,\alpha_N$:

$$\max_{\boldsymbol{\alpha}}\; L_d(\boldsymbol{\alpha}) \qquad \text{s.t.}\quad \sum_{n=1}^N \alpha_n y_n = 0, \quad \alpha_n \ge 0 \;\;(n = 1,\cdots,N) \tag{86}$$

Having solved this QP problem for the optimal $\alpha_1,\cdots,\alpha_N$, we can now find the normal direction of the optimal decision plane based on Eq. (83):

$$\mathbf{w} = \sum_{n=1}^N \alpha_n y_n \mathbf{x}_n$$
Having found $\mathbf{w}$, we can also find $b$ based on the equality of Eq. (72) for any of the support vectors:

$$y_{sv}(\mathbf{w}^T\mathbf{x}_{sv} + b) = 1 \quad\Longrightarrow\quad b = \frac{1}{y_{sv}} - \mathbf{w}^T\mathbf{x}_{sv} = y_{sv} - \mathbf{w}^T\mathbf{x}_{sv} \tag{87}$$

In practice, $b$ is computed more robustly as the average of this value over all support vectors.
We note that both $\mathbf{w}$ and $b$ depend only on the support vectors, which lie on the planes $H_+$ and $H_-$ and correspond to a positive $\alpha_n > 0$, while all remaining samples, corresponding to $\alpha_n = 0$, are farther away from the decision plane, behind either $H_+$ or $H_-$. We see that the SVM is solely based on those support vectors in the training set: once they are identified during the training process, all other non-support vectors become irrelevant.
Having obtained $\mathbf{w}$ and $b$ in the training process described above, any unlabeled sample $\mathbf{x}$ can be classified into either of the two classes depending on whether the following decision function is greater or smaller than zero:

$$f(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + b = \sum_{\mathbf{x}_n \in \text{sv}} \alpha_n y_n \mathbf{x}_n^T\mathbf{x} + b \tag{90}$$

$$\hat{y} = \mathrm{sign}\big(f(\mathbf{x})\big) = \begin{cases} 1 & \text{if } f(\mathbf{x}) > 0,\;\text{ i.e., } \mathbf{x}\in C_+ \\ -1 & \text{if } f(\mathbf{x}) < 0,\;\text{ i.e., } \mathbf{x}\in C_- \end{cases} \tag{91}$$
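A minimal sketch of this classification step in Matlab, assuming that asv (non-zero alphas), Xsv (support vectors), ysv (their labels), and bias have been obtained by the training code listed below, and that Xnew holds the unlabeled samples as its columns:

f=(asv.*ysv')'*(Xsv'*Xnew)+bias;   % decision function of Eq. (90) for every column of Xnew
yhat=sign(f);                      % predicted labels of Eq. (91)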
The SVM algorithm for binary classification can now be summarized as the following steps:

1. Given the training set $\{(\mathbf{x}_n, y_n),\; n=1,\cdots,N\}$, solve the dual QP problem in Eq. (86) for the Lagrange multipliers $\alpha_1,\cdots,\alpha_N$.
2. Identify the support vectors, i.e., the training samples $\mathbf{x}_n$ with $\alpha_n > 0$.
3. Compute the normal direction $\mathbf{w}$ of the decision plane from the support vectors by Eq. (83).
4. Compute the bias $b$ from the support vectors by Eq. (87).
5. Classify any unlabeled sample $\mathbf{x}$ by the sign of the decision function in Eqs. (90) and (91).
We prefer to solve this dual problem not only because it is easier than the original primal problem, but also, more importantly, because all data points appear only in the form of an inner product $\mathbf{x}_n^T\mathbf{x}_m$ in the dual problem, which allows the kernel method to be used, as discussed later.
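In the training code listed below, these inner products indeed appear only in the matrix X'*X used to build the quadratic matrix Q; as a preview of the kernel method, the following sketch shows how a kernel matrix could take its place (the Gaussian kernel and its width sigma are assumptions for illustration only):

sigma=1;                                             % assumed kernel width (illustration only)
D2=bsxfun(@plus,sum(X.^2,1)',sum(X.^2,1))-2*(X'*X);  % squared distances between all pairs of samples
K=exp(-D2/(2*sigma^2));                              % Gaussian kernel matrix replacing X'*X
Q=(y'*y).*K;                                         % quadratic matrix of the dual QP problem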
The Matlab code for the essential part of the algorithm is listed below, where X and y are respectively the data array composed of the training vectors $\mathbf{x}_1,\cdots,\mathbf{x}_N$ as its columns and the row vector of their corresponding labels $y_1,\cdots,y_N$, and IPmethod is a function that implements the interior point method for solving a general QP problem of minimizing $\frac{1}{2}\boldsymbol{\alpha}^T\mathbf{Q}\boldsymbol{\alpha} + \mathbf{c}^T\boldsymbol{\alpha}$, subject to the linear equality constraint $\mathbf{A}\boldsymbol{\alpha} = \mathbf{b}$ and the non-negativity constraint $\boldsymbol{\alpha} \ge \mathbf{0}$.
[X y]=getData;                        % get the training data
[m n]=size(X);                        % feature dimension m and number of samples n
Q=(y'*y).*(X'*X);                     % compute the quadratic matrix
c=-ones(n,1);                         % coefficients of linear term
A=y;                                  % coefficient matrix and
b=0;                                  % constant of linear equality constraint
alpha=0.1*ones(n,1);                  % initial guess of solution
[alpha mu lambda]=IPmethod(Q,A,c,b,alpha);  % solve QP by interior point method to find alphas
I=find(abs(alpha)>10^(-3));           % indices of non-zero alphas
asv=alpha(I);                         % non-zero alphas
Xsv=X(:,I);                           % support vectors
ysv=y(:,I);                           % their labels
w=sum(repmat(ysv.*asv',m,1).*Xsv,2);  % normal vector (not needed)
bias=mean(ysv-w'*Xsv);                % bias
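Since IPmethod is a user-supplied routine, the same dual QP problem could alternatively be solved with the Optimization Toolbox function quadprog, if it is available; a possible sketch reusing the variables defined above:

lb=zeros(n,1);                         % lower bounds alpha>=0
alpha=quadprog(Q,c,[],[],A,b,lb,[]);   % minimize (1/2)*alpha'*Q*alpha+c'*alpha
                                       % subject to A*alpha=b (i.e., y*alpha=0) and alpha>=lb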
Example: The training set contains two classes of 2-D points with Gaussian distributions, generated based on a pair of mean vectors and a pair of covariance matrices, one for each of the two classes.
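One possible way to generate such a training set in Matlab is sketched below; the mean vectors, covariance matrices, and sample count are assumed values for illustration only, chosen far apart so that the two classes are very likely linearly separable, as assumed above:

N=100;                                  % assumed number of samples per class
mp=[3; 3];   Cp=[1 0.5; 0.5 1];         % assumed mean and covariance of class C+
mm=[-3; -3]; Cm=[1 -0.3; -0.3 2];       % assumed mean and covariance of class C-
Xp=chol(Cp,'lower')*randn(2,N)+repmat(mp,1,N);   % 2-D Gaussian samples of class C+
Xm=chol(Cm,'lower')*randn(2,N)+repmat(mm,1,N);   % 2-D Gaussian samples of class C-
X=[Xp Xm];                              % data array with samples as columns
y=[ones(1,N) -ones(1,N)];               % corresponding labels +1 and -1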