The main goal of pattern classification is to classify a
set of patterns, i.e., objects of interest each described
by $d$ features represented as a $d$-dimensional vector
${\bf x}=[x_1,\cdots,x_d]^T$ in the feature space, into
one of $K$ classes $\{C_1,\cdots,C_K\}$. In supervised
classification, we partition the feature space into
regions, each corresponding to one of the classes, based on
the provided training set ${\cal D}=\{{\bf X},\,{\bf y}\}$, in
which each training sample in ${\bf X}=[{\bf x}_1,\cdots,{\bf x}_N]$
is labeled by the corresponding component in
${\bf y}=[y_1,\cdots,y_N]^T$. For example, if $y_n=k$,
then ${\bf x}_n\in C_k$. However, different types of labeling
can be used depending on the specific classification algorithm,
such as the binary labeling used in certain regression algorithms
considered previously.
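
The labeling convention above can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the toy data, the choice to store samples as rows of `X` (the notes may store them as columns), the helper names `split_by_class` and `binary_labels`, and the use of +1/-1 as the binary labels.

```python
import numpy as np

# Hypothetical toy training set: N = 6 samples, d = 2 features, K = 3 classes.
# Assumption: each ROW of X is one sample x_n.
X = np.array([[0.1, 0.2],
              [0.0, 0.4],
              [2.1, 1.9],
              [1.8, 2.2],
              [4.0, 0.1],
              [3.9, 0.3]])
y = np.array([1, 1, 2, 2, 3, 3])   # y[n] = k means X[n] belongs to class C_k


def split_by_class(X, y):
    """Return a dict mapping each class label k to the rows of X labeled k."""
    return {int(k): X[y == k] for k in np.unique(y)}


def binary_labels(y, k):
    """One-vs-rest relabeling (assumed +1/-1 coding): +1 for class k, -1 otherwise."""
    return np.where(y == k, 1, -1)


classes = split_by_class(X, y)
for k, Xk in classes.items():
    print(f"C_{k}: {Xk.shape[0]} samples of dimension {Xk.shape[1]}")
print(binary_labels(y, 2))
```

The same feature matrix can thus be paired with either integer class labels or a binary relabeling, depending on what a given algorithm expects.
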
When the dimensionality $d$ of the feature space is high, it
often needs to be reduced significantly to $d'\ll d$ while still
maintaining most of the information relevant to the task of
classification, i.e., the separability of the classes. This is
called feature selection, which can be done by either
selecting $d'$ features directly from the $d$
original ones (there are $\binom{d}{d'}=\frac{d!}{d'!\,(d-d')!}$
ways to do so), or generating $d'$
new features as certain linear combinations of the $d$
original ones. In either case, the separability of the data is
to be maximally maintained in the resulting space spanned by the
$d'$ features, so that the classification can be carried out both
efficiently and effectively in the much lower-dimensional feature
space.
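
The two routes described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of these notes: the separability score (ratio of between-class to within-class scatter traces) is one assumed criterion among many, the exhaustive subset search is only feasible for small $d$, and principal component analysis is used merely as one familiar way of forming linear combinations (it ignores the labels, so it does not by itself maximize separability). The function names and the synthetic data are hypothetical.

```python
import itertools
import math
import numpy as np


def separability(X, y):
    """Assumed criterion: trace(S_B) / trace(S_W), between over within scatter."""
    mean_all = X.mean(axis=0)
    s_b = 0.0
    s_w = 0.0
    for k in np.unique(y):
        Xk = X[y == k]
        mean_k = Xk.mean(axis=0)
        s_b += len(Xk) * np.sum((mean_k - mean_all) ** 2)
        s_w += np.sum((Xk - mean_k) ** 2)
    return s_b / s_w


def select_features(X, y, d_new):
    """Route 1: score every subset of d_new original features, keep the best."""
    d = X.shape[1]
    best = max(itertools.combinations(range(d), d_new),
               key=lambda idx: separability(X[:, list(idx)], y))
    return list(best)


def pca_features(X, d_new):
    """Route 2: d_new new features as linear combinations of the d originals."""
    Xc = X - X.mean(axis=0)                      # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:d_new].T                             # d x d_new combination weights
    return Xc @ W, W


# Synthetic two-class data (rows are samples): only features 1 and 4
# carry class information, the rest are pure noise.
rng = np.random.default_rng(0)
N, d, d_new = 200, 6, 2
y = rng.integers(0, 2, size=N)
X = rng.normal(size=(N, d))
X[:, 1] += 3.0 * y
X[:, 4] -= 3.0 * y

print(math.comb(d, d_new))        # 15 candidate subsets for d = 6, d' = 2
chosen = select_features(X, y, d_new)
Z, W = pca_features(X, d_new)
print(chosen, Z.shape, W.shape)
```

The subset count grows combinatorially with $d$, which is why exhaustive search is shown here only on a six-feature toy problem; for realistic dimensionalities a greedy or criterion-driven search would typically replace it.
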