Next: Information conservation in feature
Up: Feature Selection
Previous: Optimal transformation for maximizing
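The previous method only maximizes the between-class scatteredness
$\mathrm{tr}\,\mathbf{S}_B$ without taking into consideration the within-class
scatteredness $\mathrm{tr}\,\mathbf{S}_W$, or, equivalently, the total
scatteredness $\mathrm{tr}\,\mathbf{S}_T$. It is therefore possible that in the
space after the transform $\mathbf{y}=\mathbf{A}^T\mathbf{x}$ the between-class
scatteredness $\mathrm{tr}\,\mathbf{S}_B$ is maximized but so is the total
scatteredness $\mathrm{tr}\,\mathbf{S}_T$, whereas it is desirable to maximize
$\mathrm{tr}\,\mathbf{S}_B$ and minimize $\mathrm{tr}\,\mathbf{S}_T$ at the same
time in order to maximize the separability. A better criterion for feature
selection is $\mathrm{tr}\,(\mathbf{S}_T^{-1}\mathbf{S}_B)$, as it measures the
relative separability.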
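In order to find the optimal transform matrix $\mathbf{A}$,
we consider the maximization of the objective function $J(\mathbf{A})$:
$$J(\mathbf{A})=\mathrm{tr}\,\big[(\mathbf{S}_T^{(y)})^{-1}\mathbf{S}_B^{(y)}\big]
=\mathrm{tr}\,\big[(\mathbf{A}^T\mathbf{S}_T\mathbf{A})^{-1}(\mathbf{A}^T\mathbf{S}_B\mathbf{A})\big]$$
where
$$\mathbf{S}_B^{(y)}=\mathbf{A}^T\mathbf{S}_B\mathbf{A}$$
and
$$\mathbf{S}_T^{(y)}=\mathbf{A}^T\mathbf{S}_T\mathbf{A}$$
are the scatter matrices in the space after the transform.
Note that as $\mathbf{S}_T^{-1}\mathbf{S}_B$ is not symmetric (the product of two
symmetric matrices is in general not symmetric), the KLT method used for the
maximization of $\mathrm{tr}\,\mathbf{S}_B$ considered previously can no longer be
used to maximize $\mathrm{tr}\,(\mathbf{S}_T^{-1}\mathbf{S}_B)$.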
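As a minimal sketch of this objective, the snippet below evaluates $J(\mathbf{A})$ numerically. The function name `relative_separability` and the randomly generated scatter matrices are assumptions made for illustration only; they are not part of the original notes.

```python
import numpy as np

def relative_separability(A, S_T, S_B):
    """J(A) = tr[(A^T S_T A)^{-1} (A^T S_B A)] for an N x M matrix A."""
    S_T_y = A.T @ S_T @ A   # total scatter after the transform
    S_B_y = A.T @ S_B @ A   # between-class scatter after the transform
    # Solve S_T_y X = S_B_y rather than forming the inverse explicitly.
    return np.trace(np.linalg.solve(S_T_y, S_B_y))

# Hypothetical scatter matrices, generated at random for illustration.
rng = np.random.default_rng(0)
N = 4
G = rng.standard_normal((N, N))
H = rng.standard_normal((N, 2))
S_W = G @ G.T + N * np.eye(N)   # within-class scatter, positive definite
S_B = H @ H.T                   # between-class scatter, rank 2
S_T = S_W + S_B                 # total scatter

J_identity = relative_separability(np.eye(N), S_T, S_B)          # no transform
J_first_two = relative_separability(np.eye(N)[:, :2], S_T, S_B)  # keep x_1, x_2 only

# S_T^{-1} S_B is generally not symmetric, which is why KLT does not apply directly.
P = np.linalg.solve(S_T, S_B)
is_symmetric = np.allclose(P, P.T)
```

With the identity transform, `J_identity` is simply $\mathrm{tr}\,(\mathbf{S}_T^{-1}\mathbf{S}_B)$ in the original space; keeping an arbitrary pair of components, as in `J_first_two`, cannot exceed it.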
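To find the transform matrix $\mathbf{A}$ so that $J(\mathbf{A})$ is maximized,
we first consider the case of $M=1$ to find a single feature
$y=\mathbf{a}^T\mathbf{x}$, obtained by an $N$-D vector $\mathbf{a}$. The
objective function above becomes
$$J(\mathbf{a})=\frac{\mathbf{a}^T\mathbf{S}_B\mathbf{a}}{\mathbf{a}^T\mathbf{S}_T\mathbf{a}}$$
This function of $\mathbf{a}$ is the Rayleigh quotient of the two symmetric
matrices $\mathbf{S}_B$ and $\mathbf{S}_T$. The optimal transform vector
$\mathbf{a}$ that maximizes $J(\mathbf{a})$ can be found by solving the
corresponding generalized eigenvalue problem
$$\mathbf{S}_B\boldsymbol{\phi}_i=\lambda_i\,\mathbf{S}_T\boldsymbol{\phi}_i,\qquad i=1,\cdots,N$$
or in matrix form:
$$\mathbf{S}_B\boldsymbol{\Phi}=\mathbf{S}_T\boldsymbol{\Phi}\boldsymbol{\Lambda}$$
where $\boldsymbol{\Phi}=[\boldsymbol{\phi}_1,\cdots,\boldsymbol{\phi}_N]$ and
$\boldsymbol{\Lambda}=\mathrm{diag}(\lambda_1,\cdots,\lambda_N)$. Here
$\boldsymbol{\phi}_i$ is the ith eigenvector of $\mathbf{S}_T^{-1}\mathbf{S}_B$,
and the corresponding eigenvalue is
$\lambda_i=J(\boldsymbol{\phi}_i)=\boldsymbol{\phi}_i^T\mathbf{S}_B\boldsymbol{\phi}_i/\boldsymbol{\phi}_i^T\mathbf{S}_T\boldsymbol{\phi}_i$,
which is maximized by the eigenvector
($\mathbf{a}=\boldsymbol{\phi}_{max}$) corresponding to the greatest eigenvalue
$\lambda_{max}$.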
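The step from the Rayleigh quotient $J(\mathbf{a})=\mathbf{a}^T\mathbf{S}_B\mathbf{a}/\mathbf{a}^T\mathbf{S}_T\mathbf{a}$ to the generalized eigenvalue problem can be made explicit by setting the gradient with respect to $\mathbf{a}$ to zero. The sketch below assumes $\mathbf{S}_T$ is positive definite, so that the denominator never vanishes for $\mathbf{a}\ne\mathbf{0}$:

```latex
\begin{align*}
\frac{\partial J}{\partial \mathbf{a}}
  &= \frac{2\,\mathbf{S}_B\mathbf{a}\,(\mathbf{a}^T\mathbf{S}_T\mathbf{a})
         - 2\,\mathbf{S}_T\mathbf{a}\,(\mathbf{a}^T\mathbf{S}_B\mathbf{a})}
          {(\mathbf{a}^T\mathbf{S}_T\mathbf{a})^2} = \mathbf{0} \\
\Longrightarrow\quad
\mathbf{S}_B\mathbf{a}
  &= \frac{\mathbf{a}^T\mathbf{S}_B\mathbf{a}}{\mathbf{a}^T\mathbf{S}_T\mathbf{a}}\;
     \mathbf{S}_T\mathbf{a}
   = J(\mathbf{a})\,\mathbf{S}_T\mathbf{a}
\end{align*}
```

Every stationary point of $J$ is therefore a generalized eigenvector, and the value of $J$ at that point is the corresponding eigenvalue, so the maximum of $J$ is the greatest eigenvalue.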
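By solving the above generalized eigen-equation, we get the eigenvector matrix
$\boldsymbol{\Phi}=[\boldsymbol{\phi}_1,\cdots,\boldsymbol{\phi}_N]$, which can be
used as the linear transform matrix $\mathbf{A}=\boldsymbol{\Phi}$, by which both
$\mathbf{S}_B$ and $\mathbf{S}_T$ can be diagonalized at the same time (with the
eigenvectors normalized so that
$\boldsymbol{\phi}_i^T\mathbf{S}_T\boldsymbol{\phi}_i=1$, and
$\boldsymbol{\Lambda}$ the diagonal matrix of the eigenvalues):
$$\boldsymbol{\Phi}^T\mathbf{S}_T\boldsymbol{\Phi}=\mathbf{I},\qquad
\boldsymbol{\Phi}^T\mathbf{S}_B\boldsymbol{\Phi}=\boldsymbol{\Lambda}$$
and we get
$$J(\boldsymbol{\Phi})=\mathrm{tr}\,\big[(\boldsymbol{\Phi}^T\mathbf{S}_T\boldsymbol{\Phi})^{-1}(\boldsymbol{\Phi}^T\mathbf{S}_B\boldsymbol{\Phi})\big]
=\mathrm{tr}\,\boldsymbol{\Lambda}=\sum_{i=1}^N\lambda_i$$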
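The simultaneous diagonalization can be checked numerically. The following is a minimal sketch, not the notes' own implementation: it solves the generalized eigenvalue problem by whitening with a Cholesky factor of $\mathbf{S}_T$ (one of several possible solution routes, chosen here so that only NumPy is needed), and the helper name `generalized_eigh` and the random test matrices are assumptions for illustration.

```python
import numpy as np

def generalized_eigh(S_B, S_T):
    """Solve S_B phi = lambda S_T phi for symmetric S_B and positive definite S_T.

    Returns the eigenvalues in descending order and the eigenvectors as
    columns of Phi, scaled so that Phi^T S_T Phi = I.
    """
    L = np.linalg.cholesky(S_T)        # S_T = L L^T
    L_inv = np.linalg.inv(L)
    C = L_inv @ S_B @ L_inv.T          # symmetric, same eigenvalues as S_T^{-1} S_B
    lam, V = np.linalg.eigh(C)         # ascending eigenvalues, orthonormal columns
    order = np.argsort(lam)[::-1]      # reorder to descending
    lam, V = lam[order], V[:, order]
    Phi = L_inv.T @ V                  # map back: phi = L^{-T} v
    return lam, Phi

# Hypothetical scatter matrices, generated at random for illustration.
rng = np.random.default_rng(1)
N = 5
G = rng.standard_normal((N, N))
H = rng.standard_normal((N, 3))
S_W = G @ G.T + N * np.eye(N)
S_B = H @ H.T
S_T = S_W + S_B

lam, Phi = generalized_eigh(S_B, S_T)
diag_T_ok = np.allclose(Phi.T @ S_T @ Phi, np.eye(N))       # Phi^T S_T Phi = I
diag_B_ok = np.allclose(Phi.T @ S_B @ Phi, np.diag(lam))    # Phi^T S_B Phi = Lambda
trace_ok = np.isclose(lam.sum(), np.trace(np.linalg.solve(S_T, S_B)))
```

The whitening step turns the non-symmetric problem for $\mathbf{S}_T^{-1}\mathbf{S}_B$ into an ordinary symmetric eigenvalue problem, which is why a standard symmetric solver can be reused here.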
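We can therefore construct a transform matrix
$\mathbf{A}=[\boldsymbol{\phi}_1,\cdots,\boldsymbol{\phi}_M]$ composed of the $M$
eigenvectors corresponding to the $M$ largest eigenvalues. The Rayleigh quotient
$\lambda_m=\boldsymbol{\phi}_m^T\mathbf{S}_B\boldsymbol{\phi}_m/\boldsymbol{\phi}_m^T\mathbf{S}_T\boldsymbol{\phi}_m$
represents the separability in the mth dimension for
$y_m=\boldsymbol{\phi}_m^T\mathbf{x}$ ($m=1,\cdots,M$), so that in the resulting
M-D space,
- the signal components in $\mathbf{y}=\mathbf{A}^T\mathbf{x}$ are completely
decorrelated, i.e., they each carry some separability information independent
of others.
- the separability $\lambda_m$ along each signal component $y_m$ is
maximized.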
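The construction above can be sketched end to end on labeled data. The scatter-matrix definitions used below (class priors $P_c=K_c/K$, so that $\mathbf{S}_T=\mathbf{S}_W+\mathbf{S}_B$ is the covariance of all samples) are an assumption about the normalization, and the function names `scatter_matrices` and `select_features` as well as the synthetic three-class data are hypothetical.

```python
import numpy as np

def scatter_matrices(X, labels):
    """Return (S_W, S_B, S_T) for samples stored in the rows of X.

    Assumed definitions: S_W = sum_c P_c Cov_c and
    S_B = sum_c P_c (m_c - m)(m_c - m)^T with P_c = K_c / K.
    """
    K, N = X.shape
    m = X.mean(axis=0)
    S_W = np.zeros((N, N))
    S_B = np.zeros((N, N))
    for c in np.unique(labels):
        Xc = X[labels == c]
        P_c = len(Xc) / K
        m_c = Xc.mean(axis=0)
        D = Xc - m_c
        S_W += P_c * (D.T @ D) / len(Xc)   # weighted class covariance
        d = (m_c - m)[:, None]
        S_B += P_c * (d @ d.T)             # weighted scatter of class means
    return S_W, S_B, S_W + S_B

def select_features(X, labels, M):
    """Project onto the M generalized eigenvectors with the largest eigenvalues."""
    S_W, S_B, S_T = scatter_matrices(X, labels)
    L_inv = np.linalg.inv(np.linalg.cholesky(S_T))   # S_T = L L^T
    lam, V = np.linalg.eigh(L_inv @ S_B @ L_inv.T)   # ascending eigenvalues
    order = np.argsort(lam)[::-1][:M]                # indices of the M largest
    A = L_inv.T @ V[:, order]                        # N x M transform matrix
    return X @ A, A, lam[order]                      # each row is y = A^T x

# Synthetic data: three Gaussian classes in 4-D, 50 samples each.
rng = np.random.default_rng(2)
means = np.array([[0, 0, 0, 0], [3, 0, 0, 0], [0, 3, 0, 0]], dtype=float)
X = np.vstack([rng.standard_normal((50, 4)) + mu for mu in means])
labels = np.repeat(np.arange(3), 50)

Y, A, lam = select_features(X, labels, M=2)
cov_Y = np.cov(Y, rowvar=False, bias=True)   # covariance of the new features
```

Under these assumptions `cov_Y` should be close to the identity matrix, which is the decorrelation property stated in the first bullet, and `lam` holds the per-component separabilities $\lambda_1\ge\lambda_2$.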
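The generalized eigen-equation above can be further written, using
$\mathbf{S}_T=\mathbf{S}_W+\mathbf{S}_B$, as
$$(\mathbf{S}_W+\mathbf{S}_B)\boldsymbol{\Phi}\boldsymbol{\Lambda}=\mathbf{S}_B\boldsymbol{\Phi},
\;\;\;\mbox{i.e.,}\;\;\;
\mathbf{S}_W\boldsymbol{\Phi}\boldsymbol{\Lambda}=\mathbf{S}_B\boldsymbol{\Phi}(\mathbf{I}-\boldsymbol{\Lambda}),
\;\;\;\mbox{i.e.,}\;\;\;
\mathbf{S}_B\boldsymbol{\Phi}=\mathbf{S}_W\boldsymbol{\Phi}\boldsymbol{\Lambda}'$$
where $\boldsymbol{\Lambda}'=\boldsymbol{\Lambda}(\mathbf{I}-\boldsymbol{\Lambda})^{-1}$
is a diagonal matrix composed of $\lambda'_i$ on its diagonal:
$$\lambda'_i=\frac{\lambda_i}{1-\lambda_i},\qquad i=1,\cdots,N$$
Here $\mathbf{S}_W$ is assumed to be nonsingular, so that $0\le\lambda_i<1$. The
eigenvectors of $\mathbf{S}_W^{-1}\mathbf{S}_B$ are the same as those of
$\mathbf{S}_T^{-1}\mathbf{S}_B$, and as $\lambda'_i$ increases monotonically with
$\lambda_i$, the ordering of the eigenvalues is preserved.
We see that using the $M$ greatest eigenvalues $\lambda_i$ of
$\mathbf{S}_T^{-1}\mathbf{S}_B$ to maximize
$\mathrm{tr}\,(\mathbf{S}_T^{-1}\mathbf{S}_B)$ is equivalent to using the $M$
greatest eigenvalues $\lambda'_i$ of $\mathbf{S}_W^{-1}\mathbf{S}_B$ to maximize
$\mathrm{tr}\,(\mathbf{S}_W^{-1}\mathbf{S}_B)$.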
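The relation between the two sets of eigenvalues can be verified numerically. This is a small sketch under the assumption that $\mathbf{S}_T=\mathbf{S}_W+\mathbf{S}_B$ with $\mathbf{S}_W$ positive definite; the helper `generalized_eigenvalues` and the random matrices are illustrative only.

```python
import numpy as np

def generalized_eigenvalues(S_num, S_den):
    """Eigenvalues of S_den^{-1} S_num in descending order (S_den positive definite)."""
    L_inv = np.linalg.inv(np.linalg.cholesky(S_den))   # S_den = L L^T
    # L^{-1} S_num L^{-T} is symmetric and has the same eigenvalues.
    return np.linalg.eigvalsh(L_inv @ S_num @ L_inv.T)[::-1]

# Hypothetical scatter matrices, generated at random for illustration.
rng = np.random.default_rng(3)
N = 4
G = rng.standard_normal((N, N))
H = rng.standard_normal((N, 2))
S_W = G @ G.T + N * np.eye(N)
S_B = H @ H.T
S_T = S_W + S_B

lam = generalized_eigenvalues(S_B, S_T)         # eigenvalues of S_T^{-1} S_B
lam_prime = generalized_eigenvalues(S_B, S_W)   # eigenvalues of S_W^{-1} S_B
relation_ok = np.allclose(lam_prime, lam / (1 - lam))
```

Because the map from $\lambda_i$ to $\lambda_i/(1-\lambda_i)$ is monotonic on $[0,1)$, sorting either set of eigenvalues selects the same eigenvectors.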
Ruye Wang
2016-11-30