The SVM method is inherently a binary classifier, but it can be adapted to classification problems of more than two classes. In the following, we consider a general $K$-class classifier based on a training set $\{(\mathbf{x}_n,\,y_n),\;n=1,\cdots,N\}$, of which each sample $\mathbf{x}_n$ is labeled by $y_n\in\{1,\cdots,K\}$ to indicate the class it belongs to.
We first consider two straightforward and empirical methods for multiclass classification based directly on the binary SVM.
Any unlabeled $\mathbf{x}$ is classified to the class that receives the maximum number of votes out of the $K(K-1)/2$ binary classifications between every pair of the classes.
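This one-versus-one voting scheme can be sketched as follows, assuming scikit-learn's binary `SVC` as the underlying two-class classifier; the function names and the linear kernel are illustrative choices, not part of the text.

```python
# Sketch: one-versus-one voting with one binary SVM per pair of classes.
import numpy as np
from itertools import combinations
from sklearn.svm import SVC

def train_ovo(X, y):
    """Train one binary SVM for every pair of class labels."""
    classifiers = {}
    for k, l in combinations(np.unique(y), 2):
        mask = (y == k) | (y == l)
        classifiers[(k, l)] = SVC(kernel="linear").fit(X[mask], y[mask])
    return classifiers

def predict_ovo(classifiers, X):
    """Each pairwise SVM casts one vote per sample; the class with most votes wins."""
    classes = sorted({c for pair in classifiers for c in pair})
    index = {c: i for i, c in enumerate(classes)}
    votes = np.zeros((X.shape[0], len(classes)), dtype=int)
    for clf in classifiers.values():
        for i, label in enumerate(clf.predict(X)):   # prediction is one of the two classes
            votes[i, index[label]] += 1
    return np.array([classes[i] for i in votes.argmax(axis=1)])
```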
This method converts a $K$-class problem ($K>2$) into $K$ binary problems. For each $k=1,\cdots,K$, we regroup the samples into two classes $C_k$ and $\overline{C}_k$, and find the corresponding decision function $f_k(\mathbf{x})=\mathbf{w}_k^T\mathbf{x}+b_k$ representing quantitatively to what extent a given $\mathbf{x}$ belongs to class $C_k$ (if $f_k(\mathbf{x})>0$), instead of $\overline{C}_k$ containing all remaining classes (if $f_k(\mathbf{x})<0$). After this process has been carried out for all $K$ classes, $\mathbf{x}$ can be classified as below:
$$\text{if}\;\;f_k(\mathbf{x})=\max_{l=1,\cdots,K} f_l(\mathbf{x}),\;\;\text{then}\;\;\mathbf{x}\in C_k \tag{159}$$
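A minimal sketch of this one-versus-rest scheme and the rule in Eq. (159), again assuming a scikit-learn binary SVM (the classifier's decision function plays the role of $f_k(\mathbf{x})$; all names are illustrative):

```python
# Sketch: one-versus-rest -- train K binary SVMs, classify by the largest f_k(x).
import numpy as np
from sklearn.svm import LinearSVC

def train_ovr(X, y):
    """One binary SVM per class: class k (label +1) versus all remaining classes (label -1)."""
    return {k: LinearSVC().fit(X, np.where(y == k, 1, -1)) for k in np.unique(y)}

def predict_ovr(models, X):
    """Assign each sample to the class whose decision function value f_k(x) is greatest."""
    classes = sorted(models)
    scores = np.column_stack([models[k].decision_function(X) for k in classes])
    return np.array(classes)[scores.argmax(axis=1)]
```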
We further consider another method for multiclass classification by directly generalizing the binary SVM so that the decision margin between any two of the classes is maximized. First, we define a linear function for each of the $K$ classes:
$$f_k(\mathbf{x})=\mathbf{w}_k^T\mathbf{x}+b_k,\qquad k=1,\cdots,K \tag{160}$$

and classify any $\mathbf{x}$ to the class whose function value is the greatest:

$$\mathbf{x}\in C_k\quad\text{if}\quad f_k(\mathbf{x})>f_l(\mathbf{x})\;\;\text{for all}\;\;l\ne k \tag{161}$$
According to the classification rule in Eq. (161), the 2-D feature space is partitioned into three regions, each for one of the classes, by the decision boundaries $B_{kl}$, of which the boundary between classes $C_k$ and $C_l$ is composed of all points satisfying $f_k(\mathbf{x})=f_l(\mathbf{x})$, i.e., $(\mathbf{w}_k-\mathbf{w}_l)^T\mathbf{x}+(b_k-b_l)=0$. Same as in binary SVM, we further define for each pair of classes two additional straight lines (planes or hyperplanes if $d>2$) parallel to $B_{kl}$, denoted by $B_{kl}^+$ and $B_{kl}^-$ and represented by the equations $(\mathbf{w}_k-\mathbf{w}_l)^T\mathbf{x}+(b_k-b_l)=\pm 1$. Their distances to $B_{kl}$ can be found as (same as in Eq. (70)):
$$d=\frac{1}{\|\mathbf{w}_k-\mathbf{w}_l\|} \tag{162}$$
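As a quick numerical check of Eq. (162), assuming two weight vectors $\mathbf{w}_k$ and $\mathbf{w}_l$ have already been found (the values below are made up for illustration):

```python
# Distance from B_kl^+ (or B_kl^-) to the boundary B_kl is 1 / ||w_k - w_l||, per Eq. (162).
import numpy as np

w_k = np.array([2.0, 1.0])
w_l = np.array([0.5, -1.0])
half_margin = 1.0 / np.linalg.norm(w_k - w_l)   # distance of B_kl^+ to B_kl
full_margin = 2.0 * half_margin                 # distance between B_kl^+ and B_kl^-
print(half_margin, full_margin)
```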
Now the multiclass classification problem can be formulated as follows: find the parameters $\mathbf{w}_k$ and $b_k$ for all $k=1,\cdots,K$ so that $\sum_{k=1}^K\|\mathbf{w}_k\|^2$ is minimized, and thereby the decision margins for all boundaries $B_{kl}$ between $C_k$ and $C_l$ are maximized, under the constraint that every training sample $\mathbf{x}_n$ labeled by $y_n$ is correctly classified with a decision margin of 1 (the distance between the support vectors on both sides of $B_{kl}$ is $2/\|\mathbf{w}_k-\mathbf{w}_l\|$):
$$\begin{aligned}\text{minimize:}\quad & \frac{1}{2}\sum_{k=1}^K\|\mathbf{w}_k\|^2\\ \text{subject to:}\quad & (\mathbf{w}_{y_n}-\mathbf{w}_l)^T\mathbf{x}_n+(b_{y_n}-b_l)\ge 1,\qquad \forall\,l\ne y_n,\;\;n=1,\cdots,N\end{aligned} \tag{163}$$
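The problem in Eq. (163) is a quadratic program in the variables $\mathbf{w}_k$ and $b_k$; a small sketch using the cvxpy modeling library (an assumed tool, not part of the text) on linearly separable data with integer labels $y_n\in\{0,\cdots,K-1\}$:

```python
# Sketch: hard-margin multiclass SVM primal of Eq. (163) solved as a QP with cvxpy.
import numpy as np
import cvxpy as cp

def fit_multiclass_hard_margin(X, y, K):
    N, d = X.shape
    W = cp.Variable((K, d))     # row k is the weight vector w_k
    b = cp.Variable(K)
    constraints = []
    for n in range(N):
        k = int(y[n])
        for l in range(K):
            if l != k:
                # (w_k - w_l)^T x_n + (b_k - b_l) >= 1
                constraints.append(W[k] @ X[n] + b[k] - W[l] @ X[n] - b[l] >= 1)
    objective = cp.Minimize(0.5 * cp.sum_squares(W))   # (1/2) sum_k ||w_k||^2
    cp.Problem(objective, constraints).solve()
    return W.value, b.value
```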
Similar to Eq. (115) in soft margin SVM, the constraints of the optimization problem above can be relaxed by including in each constraint an extra error term $\xi_n^l\ge 0$, which is to be minimized. Now the problem can be reformulated as
$$\begin{aligned}\text{minimize:}\quad & \frac{1}{2}\sum_{k=1}^K\|\mathbf{w}_k\|^2 + C\sum_{n=1}^N\sum_{l\ne y_n}\xi_n^l\\ \text{subject to:}\quad & (\mathbf{w}_{y_n}-\mathbf{w}_l)^T\mathbf{x}_n+(b_{y_n}-b_l)\ge 1-\xi_n^l,\qquad \xi_n^l\ge 0,\qquad \forall\,l\ne y_n,\;\;n=1,\cdots,N\end{aligned} \tag{164}$$
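The soft-margin problem in Eq. (164) only adds the slack variables $\xi_n^l$ and the penalty term to the previous sketch; a hedged cvxpy version (the penalty weight `C` is a hyperparameter chosen by the user):

```python
# Sketch: soft-margin multiclass SVM primal of Eq. (164) with cvxpy.
import numpy as np
import cvxpy as cp

def fit_multiclass_soft_margin(X, y, K, C=1.0):
    N, d = X.shape
    W = cp.Variable((K, d))
    b = cp.Variable(K)
    xi = cp.Variable((N, K), nonneg=True)   # slack xi_n^l; entries with l == y_n are unused and driven to 0
    constraints = []
    for n in range(N):
        k = int(y[n])
        for l in range(K):
            if l != k:
                constraints.append(W[k] @ X[n] + b[k] - W[l] @ X[n] - b[l] >= 1 - xi[n, l])
    objective = cp.Minimize(0.5 * cp.sum_squares(W) + C * cp.sum(xi))
    cp.Problem(objective, constraints).solve()
    return W.value, b.value
```

In practice, a closely related all-together formulation (due to Crammer and Singer) is available in scikit-learn as `LinearSVC(multi_class='crammer_singer')`.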
The Lagrangian function of this constrained optimization problem is
$$\begin{aligned}L(\{\mathbf{w}_k,b_k,\xi_n^l\},\{\alpha_n^l,\mu_n^l\})=\;&\frac{1}{2}\sum_{k=1}^K\|\mathbf{w}_k\|^2+C\sum_{n=1}^N\sum_{l\ne y_n}\xi_n^l\\ &-\sum_{n=1}^N\sum_{l\ne y_n}\alpha_n^l\left[(\mathbf{w}_{y_n}-\mathbf{w}_l)^T\mathbf{x}_n+(b_{y_n}-b_l)-1+\xi_n^l\right]-\sum_{n=1}^N\sum_{l\ne y_n}\mu_n^l\,\xi_n^l\end{aligned} \tag{165}$$

where $\alpha_n^l\ge 0$ and $\mu_n^l\ge 0$ are the Lagrange multipliers. Setting to zero the derivatives of $L$ with respect to the primal variables, we get

$$\frac{\partial L}{\partial \mathbf{w}_k}=\mathbf{w}_k-\sum_{n:\,y_n=k}\Big(\sum_{l\ne k}\alpha_n^l\Big)\mathbf{x}_n+\sum_{n:\,y_n\ne k}\alpha_n^k\,\mathbf{x}_n=\mathbf{0} \tag{166}$$

$$\Longrightarrow\quad \mathbf{w}_k=\sum_{n:\,y_n=k}\Big(\sum_{l\ne k}\alpha_n^l\Big)\mathbf{x}_n-\sum_{n:\,y_n\ne k}\alpha_n^k\,\mathbf{x}_n \tag{167}$$

$$\frac{\partial L}{\partial b_k}=-\sum_{n:\,y_n=k}\sum_{l\ne k}\alpha_n^l+\sum_{n:\,y_n\ne k}\alpha_n^k=0 \tag{168}$$

$$\frac{\partial L}{\partial \xi_n^l}=C-\alpha_n^l-\mu_n^l=0 \tag{169}$$

As $\mu_n^l\ge 0$, the last condition implies

$$0\le \alpha_n^l\le C \tag{170}$$

Substituting Eq. (167) back into $f_k(\mathbf{x})=\mathbf{w}_k^T\mathbf{x}+b_k$, the decision functions can be expressed in terms of inner products with the training samples:

$$f_k(\mathbf{x})=\sum_{n:\,y_n=k}\Big(\sum_{l\ne k}\alpha_n^l\Big)\mathbf{x}_n^T\mathbf{x}-\sum_{n:\,y_n\ne k}\alpha_n^k\,\mathbf{x}_n^T\mathbf{x}+b_k \tag{171}$$
The training of the classifier is to find all weight vectors in $\{\mathbf{w}_k,\;k=1,\cdots,K\}$ based on the training dataset. To do so, we first define a cost function