The SVM method is inherently a binary classifier, but it can be adapted to classification problems of more than two classes (Crammer & Singer; Weston & Watkins; Wang & Xue; Bredensteiner). In the following, we consider a general $K$-class classifier based on a training set $\{(\mathbf{x}_n,\,y_n),\;n=1,\dots,N\}$, of which each sample $\mathbf{x}_n$ is labeled by $y_n\in\{1,\dots,K\}$ to indicate $\mathbf{x}_n\in C_{y_n}$.
We first consider two straightforward and empirical methods for multiclass classification based directly on the binary SVM.
In the first method, often called one-versus-one, any unlabeled sample $\mathbf{x}$ is classified to the class that receives the maximum number of votes out of the $K(K-1)/2$ binary classifications between every pair of the $K$ classes.
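As a concrete illustration of this voting scheme, here is a minimal sketch in Python. It assumes scikit-learn's `SVC` as the underlying binary SVM and numpy arrays `X_train`, `y_train`, `X_test` with integer labels $0,\dots,K-1$; none of these names come from the text above.

```python
import itertools
import numpy as np
from sklearn.svm import SVC   # binary SVM used as the building block (an assumption)

def one_vs_one_predict(X_train, y_train, X_test, K):
    """Train K(K-1)/2 pairwise binary SVMs and classify by majority vote."""
    votes = np.zeros((len(X_test), K), dtype=int)
    for i, j in itertools.combinations(range(K), 2):
        mask = (y_train == i) | (y_train == j)            # keep only classes i and j
        clf = SVC(kernel="linear").fit(X_train[mask], y_train[mask])
        pred = clf.predict(X_test)                        # each prediction votes for i or j
        votes[:, i] += (pred == i)
        votes[:, j] += (pred == j)
    return np.argmax(votes, axis=1)                       # class with the maximum votes
```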
The second method, often called one-versus-rest, converts a $K$-class problem ($K>2$) into $K$ binary problems. First, we regroup the $K$ classes $\{C_1,\dots,C_K\}$ into two classes, $C_k$ and $\bar{C}_k=\bigcup_{j\ne k}C_j$, and find the corresponding decision function $f_k(\mathbf{x})$, representing quantitatively to what extent a given $\mathbf{x}$ belongs to class $C_k$ (if $f_k(\mathbf{x})>0$), instead of to $\bar{C}_k$ containing all $K-1$ remaining classes (if $f_k(\mathbf{x})<0$). After this process has been carried out for all $k=1,\dots,K$, any unlabeled $\mathbf{x}$ can be classified as below:

$$\mathbf{x}\in C_k\quad\text{if}\quad f_k(\mathbf{x})\ge f_j(\mathbf{x})\;\;\forall\,j\ne k,\qquad\text{i.e.,}\quad k=\arg\max_{j=1,\dots,K}f_j(\mathbf{x})\tag{159}$$
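A minimal sketch of this one-versus-rest scheme, under the same illustrative assumptions as the previous snippet (scikit-learn's `SVC` as the binary classifier, numpy arrays, integer labels $0,\dots,K-1$):

```python
import numpy as np
from sklearn.svm import SVC

def one_vs_rest_predict(X_train, y_train, X_test, K):
    """Train K binary SVMs (class k vs. the rest) and pick the class whose
    decision value is largest, as in Eq. (159)."""
    scores = np.zeros((len(X_test), K))
    for k in range(K):
        clf = SVC(kernel="linear").fit(X_train, (y_train == k).astype(int))
        scores[:, k] = clf.decision_function(X_test)   # signed score playing the role of f_k(x)
    return np.argmax(scores, axis=1)
```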
We further consider another method for multiclass classification, obtained by directly generalizing the binary SVM so that the decision margin between any two of the classes is maximized. First, we define a linear function for each of the $K$ classes:

$$f_k(\mathbf{x})=\mathbf{w}_k^T\mathbf{x}+b_k,\qquad k=1,\dots,K\tag{160}$$

and classify any $\mathbf{x}$ to the class whose linear function is the greatest:

$$\mathbf{x}\in C_k\quad\text{if}\quad f_k(\mathbf{x})\ge f_j(\mathbf{x})\;\;\forall\,j\ne k\tag{161}$$
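For example, the classification rule of Eqs. (160) and (161) amounts to a single matrix product followed by an argmax; the weight matrix and bias values below are arbitrary placeholders for a $K=3$, 2-D case:

```python
import numpy as np

# Hypothetical parameters: one row of W and one entry of b per class (Eq. 160)
W = np.array([[ 1.0,  0.0],
              [ 0.0,  1.0],
              [-1.0, -1.0]])      # w_1, w_2, w_3 for K = 3 classes in 2-D
b = np.array([0.0, 0.2, -0.1])

x = np.array([0.8, 0.3])
scores = W @ x + b                # f_k(x) = w_k^T x + b_k for all k
k = np.argmax(scores)             # classification rule of Eq. (161)
print(scores, "->", k)
```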
According to the classification rule in Eq. (161), the 2-D feature space is partitioned into three regions, one for each of the $K=3$ classes, by the decision boundaries $H_{kl}$, of which the boundary $H_{kl}$ between classes $C_k$ and $C_l$ is composed of all points $\mathbf{x}$ satisfying $f_k(\mathbf{x})=f_l(\mathbf{x})$, i.e., $(\mathbf{w}_k-\mathbf{w}_l)^T\mathbf{x}+(b_k-b_l)=0$. Same as in the binary SVM, we further define for each pair of the $K$ classes two additional straight lines (planes or hyperplanes if the feature space has more than two dimensions) parallel to $H_{kl}$, denoted by $H_{kl}^+$ and $H_{kl}^-$ and represented by the equations $(\mathbf{w}_k-\mathbf{w}_l)^T\mathbf{x}+(b_k-b_l)=\pm 1$. Their distances to $H_{kl}$ can be found as (same as in Eq. (70)):

$$d_{kl}=\frac{1}{\|\mathbf{w}_k-\mathbf{w}_l\|}\tag{162}$$
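The numerical sketch below checks Eq. (162) for one pair of classes with arbitrary placeholder parameters: a point on $H_{kl}$ shifted by $d_{kl}$ along the unit normal of the boundary lands exactly on $H_{kl}^+$:

```python
import numpy as np

# Hypothetical 2-D parameters for classes k and l
w_k, b_k = np.array([2.0, 1.0]), 0.5
w_l, b_l = np.array([-1.0, 0.5]), -0.2

dw, db = w_k - w_l, b_k - b_l                 # boundary H_kl: dw^T x + db = 0
d = 1.0 / np.linalg.norm(dw)                  # distance from H_kl to H_kl^± (Eq. 162)

x0 = -db * dw / dw.dot(dw)                    # the point of H_kl closest to the origin
x_plus = x0 + d * dw / np.linalg.norm(dw)     # move distance d along the unit normal
print(d, dw.dot(x_plus) + db)                 # second value is 1, i.e., x_plus lies on H_kl^+
```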
Now the multiclass classification problem can be formulated as finding the parameters $\mathbf{w}_k$ and $b_k$ for all $k=1,\dots,K$ so that $\sum_{k=1}^K\|\mathbf{w}_k\|^2/2$ is minimized, and thereby the decision margins for all boundaries $H_{kl}$ between $C_k$ and $C_l$ are maximized, under the constraint that every training sample $\mathbf{x}_n$ labeled by $y_n$ is correctly classified with a decision margin of 1 (the distance between the support vectors on the two sides of $H_{kl}$ is $2/\|\mathbf{w}_k-\mathbf{w}_l\|$):

$$\begin{aligned}
\text{minimize:}\quad & \frac{1}{2}\sum_{k=1}^K\|\mathbf{w}_k\|^2\\
\text{subject to:}\quad & (\mathbf{w}_{y_n}-\mathbf{w}_k)^T\mathbf{x}_n+(b_{y_n}-b_k)\ge 1,\qquad \forall\,k\ne y_n,\;\;n=1,\dots,N
\end{aligned}\tag{163}$$
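The problem in Eq. (163) is a quadratic program and can be handed to a generic QP solver. The sketch below sets it up with cvxpy on a small synthetic, linearly separable data set; cvxpy, the toy data, and all variable names are assumptions made for illustration, not part of the text.

```python
import numpy as np
import cvxpy as cp

# Toy data: N samples in d dimensions, labels 0..K-1, well-separated clusters
rng = np.random.default_rng(0)
K, d, N = 3, 2, 30
X = np.vstack([rng.normal(loc=4.0 * k, scale=0.5, size=(N // K, d)) for k in range(K)])
y = np.repeat(np.arange(K), N // K)

W = cp.Variable((K, d))     # one weight vector w_k per class
b = cp.Variable(K)          # one bias b_k per class

constraints = []
for n in range(N):
    yn = int(y[n])
    for k in range(K):
        if k != yn:
            # (w_{y_n} - w_k)^T x_n + (b_{y_n} - b_k) >= 1   (constraint of Eq. 163)
            constraints.append(X[n] @ (W[yn] - W[k]) + b[yn] - b[k] >= 1)

prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(W)), constraints)
prob.solve()

pred = np.argmax(X @ W.value.T + b.value, axis=1)   # classification rule of Eq. (161)
print("training accuracy:", np.mean(pred == y))
```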
Similar to Eq. (115) in the soft margin SVM, the constraints of the optimization problem above can be relaxed by including in each constraint an extra error term $\xi_n^k\ge 0$, which is to be minimized as well. Now the problem can be reformulated as

$$\begin{aligned}
\text{minimize:}\quad & \frac{1}{2}\sum_{k=1}^K\|\mathbf{w}_k\|^2+C\sum_{n=1}^N\sum_{k\ne y_n}\xi_n^k\\
\text{subject to:}\quad & (\mathbf{w}_{y_n}-\mathbf{w}_k)^T\mathbf{x}_n+(b_{y_n}-b_k)\ge 1-\xi_n^k,\\
& \xi_n^k\ge 0,\qquad \forall\,k\ne y_n,\;\;n=1,\dots,N
\end{aligned}\tag{164}$$
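Under the same illustrative assumptions, and reusing `X`, `y`, `W`, `b`, `N`, `K` from the sketch above, the soft-margin problem of Eq. (164) only requires adding nonnegative slack variables and their penalty (entries $\xi_n^{y_n}$ are never used by a constraint and are driven to zero by the penalty):

```python
C_penalty = 1.0                            # an assumed value of the penalty weight C
xi = cp.Variable((N, K), nonneg=True)      # slack terms xi_n^k

soft_constraints = []
for n in range(N):
    yn = int(y[n])
    for k in range(K):
        if k != yn:
            # relaxed constraint of Eq. (164)
            soft_constraints.append(
                X[n] @ (W[yn] - W[k]) + b[yn] - b[k] >= 1 - xi[n, k]
            )

soft_prob = cp.Problem(
    cp.Minimize(0.5 * cp.sum_squares(W) + C_penalty * cp.sum(xi)),
    soft_constraints,
)
soft_prob.solve()
```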
The Lagrangian function of this constrained optimization problem is
$$\begin{aligned}
L=\;&\frac{1}{2}\sum_{k=1}^K\|\mathbf{w}_k\|^2+C\sum_{n=1}^N\sum_{k\ne y_n}\xi_n^k\\
&-\sum_{n=1}^N\sum_{k\ne y_n}\alpha_n^k\left[(\mathbf{w}_{y_n}-\mathbf{w}_k)^T\mathbf{x}_n+(b_{y_n}-b_k)-1+\xi_n^k\right]
-\sum_{n=1}^N\sum_{k\ne y_n}\mu_n^k\,\xi_n^k
\end{aligned}\tag{165}$$

where $\alpha_n^k\ge 0$ and $\mu_n^k\ge 0$ are the Lagrange multipliers of the two sets of constraints (with the convention $\alpha_n^{y_n}=0$, and $A_n=\sum_{k=1}^K\alpha_n^k$ below). Setting to zero the derivatives of $L$ with respect to $\mathbf{w}_k$, $b_k$ and $\xi_n^k$, we get

$$\frac{\partial L}{\partial\mathbf{w}_k}=\mathbf{0}\;\;\Longrightarrow\;\;\mathbf{w}_k=\sum_{n=1}^N\left(\delta_{y_n,k}\,A_n-\alpha_n^k\right)\mathbf{x}_n\tag{166}$$

$$\frac{\partial L}{\partial b_k}=0\;\;\Longrightarrow\;\;\sum_{n=1}^N\left(\delta_{y_n,k}\,A_n-\alpha_n^k\right)=0\tag{167}$$

$$\frac{\partial L}{\partial\xi_n^k}=0\;\;\Longrightarrow\;\;C-\alpha_n^k-\mu_n^k=0\tag{168}$$

As $\mu_n^k\ge 0$, Eq. (168) implies

$$0\le\alpha_n^k\le C\tag{169}$$

Substituting Eqs. (166) and (167) back into Eq. (165), all terms containing $b_k$ and $\xi_n^k$ cancel and the dual problem becomes

$$\max_{\{\alpha_n^k\}}\;\;\sum_{n=1}^N\sum_{k\ne y_n}\alpha_n^k-\frac{1}{2}\sum_{k=1}^K\|\mathbf{w}_k\|^2,
\qquad\text{subject to Eqs. (167) and (169)}\tag{170}$$

where each $\mathbf{w}_k$ stands for the expression in Eq. (166), so that the objective depends on the training samples only through the inner products $\mathbf{x}_n^T\mathbf{x}_m$. Once the optimal $\alpha_n^k$ are found, the decision functions are

$$f_k(\mathbf{x})=\mathbf{w}_k^T\mathbf{x}+b_k=\sum_{n=1}^N\left(\delta_{y_n,k}\,A_n-\alpha_n^k\right)\mathbf{x}_n^T\mathbf{x}+b_k\tag{171}$$
The training of the classifier is to find all weight vectors in $\mathbf{W}=[\mathbf{w}_1,\dots,\mathbf{w}_K]$ based on the training dataset. To do so, we first define a cost function