In feature selection, we need to evaluate how separable a set of $C$ classes $\omega_i$ ($i=1,\cdots,C$) is in an $M$-dimensional feature space, according to some scalar criterion. We first define the following quantities:
- Total number of samples: $N=\sum_{i=1}^C N_i$, where $N_i$ is the number of samples in class $\omega_i$.
- Overall mean vector: $\mathbf{m}=\frac{1}{N}\sum_{i=1}^C\sum_{\mathbf{x}\in\omega_i}\mathbf{x}=\sum_{i=1}^C P_i\,\mathbf{m}_i$, where $P_i=N_i/N$ is the a priori probability of class $\omega_i$ and $\mathbf{m}_i=\frac{1}{N_i}\sum_{\mathbf{x}\in\omega_i}\mathbf{x}$ is the mean vector of class $\omega_i$.
- Scatter matrix of class $\omega_i$ (same as the covariance matrix of the class): $\mathbf{S}_i=\frac{1}{N_i}\sum_{\mathbf{x}\in\omega_i}(\mathbf{x}-\mathbf{m}_i)(\mathbf{x}-\mathbf{m}_i)^T$
- Within-class scatter matrix: $\mathbf{S}_W=\sum_{i=1}^C P_i\,\mathbf{S}_i$
- Between-class scatter matrix: $\mathbf{S}_B=\sum_{i=1}^C P_i\,(\mathbf{m}_i-\mathbf{m})(\mathbf{m}_i-\mathbf{m})^T$
- Total scatter matrix: $\mathbf{S}_T=\frac{1}{N}\sum_{i=1}^C\sum_{\mathbf{x}\in\omega_i}(\mathbf{x}-\mathbf{m})(\mathbf{x}-\mathbf{m})^T$
We can show that $\mathbf{S}_T=\mathbf{S}_W+\mathbf{S}_B$, i.e., the total scatter is the sum of the within-class scatter and the between-class scatter.
* Proof: Write $\mathbf{x}-\mathbf{m}=(\mathbf{x}-\mathbf{m}_i)+(\mathbf{m}_i-\mathbf{m})$ for each sample $\mathbf{x}\in\omega_i$ and expand:
$$\mathbf{S}_T=\frac{1}{N}\sum_{i=1}^C\sum_{\mathbf{x}\in\omega_i}(\mathbf{x}-\mathbf{m})(\mathbf{x}-\mathbf{m})^T
=\frac{1}{N}\sum_{i=1}^C\sum_{\mathbf{x}\in\omega_i}(\mathbf{x}-\mathbf{m}_i)(\mathbf{x}-\mathbf{m}_i)^T
+\frac{1}{N}\sum_{i=1}^C N_i(\mathbf{m}_i-\mathbf{m})(\mathbf{m}_i-\mathbf{m})^T
=\sum_{i=1}^C P_i\,\mathbf{S}_i+\sum_{i=1}^C P_i\,(\mathbf{m}_i-\mathbf{m})(\mathbf{m}_i-\mathbf{m})^T
=\mathbf{S}_W+\mathbf{S}_B$$
where the cross terms vanish because $\sum_{\mathbf{x}\in\omega_i}(\mathbf{x}-\mathbf{m}_i)=\mathbf{0}$ for every class.
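The following is a minimal NumPy sketch, not part of the original notes, that computes the scatter matrices defined above from a labeled data set and numerically checks the identity $\mathbf{S}_T=\mathbf{S}_W+\mathbf{S}_B$; the function name scatter_matrices and the random toy data are assumptions made only for illustration.

# A sketch (not from the original notes): scatter matrices of labeled data
# X (N samples, M features) with class labels y, using the definitions above.
import numpy as np

def scatter_matrices(X, y):
    """Return S_W, S_B, S_T for data X (N x M) and class labels y (length N)."""
    N, M = X.shape
    m = X.mean(axis=0)                      # overall mean vector m
    S_W = np.zeros((M, M))
    S_B = np.zeros((M, M))
    for c in np.unique(y):
        Xc = X[y == c]
        Nc = Xc.shape[0]
        Pc = Nc / N                         # a priori probability P_i = N_i / N
        mc = Xc.mean(axis=0)                # class mean m_i
        Dc = Xc - mc
        S_W += Pc * (Dc.T @ Dc) / Nc        # within-class scatter: sum of P_i S_i
        d = (mc - m).reshape(-1, 1)
        S_B += Pc * (d @ d.T)               # between-class scatter
    D = X - m
    S_T = (D.T @ D) / N                     # total scatter
    return S_W, S_B, S_T

# Toy example: 3 classes in a 4-dimensional feature space.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=i, size=(50, 4)) for i in range(3)])
y = np.repeat(np.arange(3), 50)
S_W, S_B, S_T = scatter_matrices(X, y)
print(np.allclose(S_T, S_W + S_B))          # True: S_T = S_W + S_B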
We can use the traces of these scatter matrices as scalar criteria to measure the separability between the classes:
$$\mathrm{tr}(\mathbf{S}_W)=\sum_{i=1}^C P_i\,\mathrm{tr}(\mathbf{S}_i)=\sum_{i=1}^C P_i\,\frac{1}{N_i}\sum_{\mathbf{x}\in\omega_i}\|\mathbf{x}-\mathbf{m}_i\|^2$$
and
$$\mathrm{tr}(\mathbf{S}_B)=\sum_{i=1}^C P_i\,\|\mathbf{m}_i-\mathbf{m}\|^2$$
We see that $\mathrm{tr}(\mathbf{S}_W)$ is the weighted average of the squared Euclidean distances between the samples $\mathbf{x}$ and their class means $\mathbf{m}_i$ over all $C$ classes, and $\mathrm{tr}(\mathbf{S}_B)$ is the weighted average of the squared Euclidean distances between the class means $\mathbf{m}_i$ and the overall mean $\mathbf{m}$ over all $C$ classes.
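To see why the traces take these forms, note that for any column vector $\mathbf{a}$ of dimension $M$,
$$\mathrm{tr}\!\left(\mathbf{a}\mathbf{a}^T\right)=\sum_{k=1}^M a_k^2=\mathbf{a}^T\mathbf{a}=\|\mathbf{a}\|^2$$
Applying this to $\mathbf{a}=\mathbf{x}-\mathbf{m}_i$ in $\mathbf{S}_W$ and to $\mathbf{a}=\mathbf{m}_i-\mathbf{m}$ in $\mathbf{S}_B$, together with the linearity of the trace, gives the two expressions above.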
It is obviously desirable to maximize $\mathrm{tr}(\mathbf{S}_B)$ while at the same time minimizing $\mathrm{tr}(\mathbf{S}_W)$, so that the classes are maximally separated and the classification can be carried out most effectively. We can therefore construct a new scalar criterion
$$J=\frac{\mathrm{tr}(\mathbf{S}_B)}{\mathrm{tr}(\mathbf{S}_W)}$$
to be used in feature selection or, equivalently, due to the relationship $\mathrm{tr}(\mathbf{S}_T)=\mathrm{tr}(\mathbf{S}_W)+\mathrm{tr}(\mathbf{S}_B)$,
$$J'=\frac{\mathrm{tr}(\mathbf{S}_T)}{\mathrm{tr}(\mathbf{S}_W)}=1+J$$
which is maximized by the same set of features.
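As one possible illustration of how such a criterion can drive feature selection, here is a sketch (my own construction under assumed toy data, not the author's implementation) that evaluates $J=\mathrm{tr}(\mathbf{S}_B)/\mathrm{tr}(\mathbf{S}_W)$ for every candidate pair of features and keeps the pair with the largest $J$; the helper trace_criterion and the exhaustive search over pairs are hypothetical choices for this example.

# A sketch (not from the original notes): select the feature pair that
# maximizes J = tr(S_B) / tr(S_W) on a small toy problem.
from itertools import combinations
import numpy as np

def trace_criterion(X, y):
    """Compute J = tr(S_B) / tr(S_W) for data X (N x M) and labels y."""
    N, M = X.shape
    m = X.mean(axis=0)
    tr_SW = 0.0
    tr_SB = 0.0
    for c in np.unique(y):
        Xc = X[y == c]
        Pc = Xc.shape[0] / N
        mc = Xc.mean(axis=0)
        tr_SW += Pc * np.mean(np.sum((Xc - mc) ** 2, axis=1))  # avg ||x - m_i||^2
        tr_SB += Pc * np.sum((mc - m) ** 2)                     # ||m_i - m||^2
    return tr_SB / tr_SW

# Toy data: features 0 and 1 carry class information, features 2-4 are noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = np.repeat(np.arange(3), 100)
X[:, 0] += 3.0 * y          # feature 0 separates the classes
X[:, 1] -= 2.0 * y          # feature 1 separates the classes

best = max(combinations(range(5), 2),
           key=lambda S: trace_criterion(X[:, list(S)], y))
print(best)                  # typically the informative pair (0, 1)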
Ruye Wang
2016-11-30