As one of the most important tasks in machine learning, pattern classification is to classify objects of interest, generically referred to as patterns and described by a set of features or attributes that characterize the patterns, into one of a set of classes or categories. Each pattern is represented by a vector $\mathbf{x}=[x_1,\dots,x_d]^T$ (or a point) in a $d$-dimensional feature space, where $x_i$ is a variable for the measurement of the $i$th feature. Symbolically, the $K$ classes can be denoted $C_1,\dots,C_K$, and a pattern belonging to the $k$th class is denoted by $\mathbf{x}\in C_k$. Pattern classification can therefore be considered as the process by which the $d$-dimensional feature space is partitioned into regions, each corresponding to one of the classes. The boundaries between these regions, called decision boundaries, are to be determined by the specific algorithm, called a classifier, used for the classification.
Pattern classification can be carried out as either a supervised or an unsupervised learning process, depending on the availability of a training set containing patterns of known class identities. Specifically, the training set contains a set of $N$ patterns in $\mathcal{X}=\{\mathbf{x}_1,\dots,\mathbf{x}_N\}$, labeled respectively by the corresponding components in $\mathbf{y}=[y_1,\dots,y_N]^T$, which represent the class identities of the corresponding patterns in some way. For example, we can use $y_n=k$ to indicate $\mathbf{x}_n\in C_k$. In the special case when $K=2$, there are only two classes $C_+$ and $C_-$, and the classifier becomes binary, based on training patterns $\mathbf{x}_1,\dots,\mathbf{x}_N$, each labeled by $y_n=1$ if $\mathbf{x}_n\in C_+$ or $y_n=-1$ if $\mathbf{x}_n\in C_-$.
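The two labeling conventions above can be sketched as follows. The feature values and class identities below are made up purely for illustration.

```python
# Hypothetical labeled training set: four 2-d patterns and their classes.
patterns = [[1.0, 2.0], [2.5, 0.5], [0.2, 3.1], [3.0, 1.0]]
classes = ["C+", "C+", "C-", "C+"]  # class identity of each pattern

# Multi-class convention: y_n = k indicates pattern n belongs to class C_k.
class_index = {"C+": 0, "C-": 1}
y_multiclass = [class_index[c] for c in classes]

# Binary convention: y_n = +1 for class C+, y_n = -1 for class C-.
y_binary = [1 if c == "C+" else -1 for c in classes]

print(y_multiclass)  # [0, 0, 1, 0]
print(y_binary)      # [1, 1, -1, 1]
```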
We assume there are $N_k$ training samples all labeled to belong to $C_k$, and in total $N=\sum_{k=1}^K N_k$ samples in the training set. If the training set is a fair representation of all patterns of the different classes in the entire dataset, then $P(C_k)\approx N_k/N$ can be treated as an estimate of the a priori probability that any randomly selected pattern happens to belong to class $C_k$, without any prior knowledge of the pattern.
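Estimating the prior of each class as its relative frequency in the training set amounts to a simple count. A minimal sketch, with made-up labels:

```python
from collections import Counter

# Hypothetical training labels; only the class identities matter here.
labels = ["C1", "C1", "C2", "C3", "C1", "C2"]  # N = 6 samples

counts = Counter(labels)                         # N_k for each class C_k
N = len(labels)
priors = {k: n / N for k, n in counts.items()}   # P(C_k) ~ N_k / N

print(priors)  # {'C1': 0.5, 'C2': 0.333..., 'C3': 0.166...}
```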
Once a classifier is properly trained according to a specific algorithm based on the training set, the feature space is partitioned into regions corresponding to the different classes, and any unlabeled pattern $\mathbf{x}$ of unknown class, as a vector in the feature space, can be classified into one of the classes.
Supervised classification can be considered as a process of establishing the corresponding relationship between the patterns, treated as the independent or input variables to the classifier, and the classes to which the input patterns belong, treated as the dependent or output variables. Therefore regression and classification can be considered as the same supervised learning process: modeling the relationship between the data points in $\mathcal{X}$ and their corresponding labelings (or targets) in $\mathbf{y}$. This process is regression when the labelings take continuous real values, but it is classification when they are discrete categorical values representing different classes. Some methods in the previous chapter on regression analysis are actually used as classifiers, such as logistic and softmax regressions, and the Gaussian process method can also be used for classification.
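To illustrate how a regression-style model produces a discrete classification, the sketch below applies the softmax function to a set of raw class scores (the scores themselves are made up; in softmax regression they would come from a fitted linear model):

```python
import math

def softmax(scores):
    """Convert raw class scores into a discrete class distribution."""
    m = max(scores)                           # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for K = 3 classes produced by some linear model.
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)
predicted_class = probs.index(max(probs))     # classification: pick the argmax

print(probs)            # a valid distribution: entries sum to 1
print(predicted_class)  # 0
```

The continuous output (the probabilities) is turned into a categorical decision only at the final argmax step, which is what separates classification from regression here.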
If training data of labeled patterns are unavailable, various unsupervised learning methods can be used to assign each unlabeled pattern to one of a number of groups, called clusters, according to its position in the feature space, based on the overall spatial structure and distribution of the dataset in the feature space. This process is called cluster analysis or simply clustering.
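A minimal clustering sketch using k-means, one common clustering method; the 2-d points below are made up, and the simple deterministic initialization is an assumption for illustration:

```python
def kmeans(points, k, iters=20):
    """Assign each point to its nearest centroid, then recompute the
    centroids as cluster means, repeating for a fixed number of iterations."""
    def nearest(p, cents):
        return min(range(len(cents)),
                   key=lambda j: sum((a - b) ** 2 for a, b in zip(p, cents[j])))
    centroids = [list(p) for p in points[:k]]  # simple deterministic init
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for j, cl in enumerate(clusters):
            if cl:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(coords) / len(cl) for coords in zip(*cl)]
    return [nearest(p, centroids) for p in points]

# Two well-separated, made-up groups in a 2-d feature space.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels = kmeans(points, k=2)
print(labels)  # points in the same group share a cluster index
```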
There exist a variety of methods for learning, including both regression and classification, based on the different models assumed. One way to characterize these methods is to put them all in a probabilistic framework, in terms of the probabilities of the given dataset including the data points $\mathcal{X}$ and the corresponding labelings $\mathbf{y}$. A method can then be categorized into one of the following two groups:
A discriminative method establishes a model that maps a data point $\mathbf{x}$ to a class labeling $y$. Such a model is either a pure or traditional discriminative model if it is deterministic and aims to fit the training set in some optimal way, or a conditional model if it is probabilistic in nature, such as the conditional probability $p(y|\mathbf{x})$. The model parameters in $p(y|\mathbf{x})$ are obtained in some optimal way based on the training set. Then a prediction can be made for any unlabeled $\mathbf{x}$ in terms of the corresponding $y$. As a discriminative method aims at finding the decision boundaries between different classes based on the training set, only those data samples that are close to the boundaries play an important role, while all other samples farther away from the boundaries are mostly ignored.
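As a sketch of a conditional (probabilistic) discriminative model, the following assumes a logistic form for $p(y=1|\mathbf{x})$; the weight vector and bias are made-up stand-ins for parameters that would normally be fitted to the training set:

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

# Hypothetical, already-fitted parameters of a conditional model p(y=1|x);
# in practice they would be learned from the training set.
w = [1.5, -2.0]
b = 0.25

def p_y_given_x(x):
    """Conditional probability p(y = 1 | x) under a logistic model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def classify(x):
    """Deterministic decision: the boundary is where p(y = 1 | x) = 0.5."""
    return 1 if p_y_given_x(x) >= 0.5 else -1

print(classify([2.0, 0.0]))  # well on the positive side of the boundary: 1
print(classify([0.0, 2.0]))  # well on the negative side: -1
```

Note that the decision boundary is the set of points where the conditional probability equals 0.5; points far from it receive near-certain probabilities either way.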
Typical discriminative methods include:
A generative method first assumes a certain probabilistic model for the underlying structure of the observed data, such as the joint probability $p(\mathbf{x},y)$ based on all data samples available. It then estimates the parameters of the model based on the training dataset, and obtains the conditional probability $p(y|\mathbf{x})$ (by Bayes' theorem), based on which a prediction can be made for any unlabeled $\mathbf{x}$ to find the corresponding $y$. As in general a generative method is based on some probabilistic model of the data, it can be used for unsupervised as well as supervised learning.
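The two generative steps, fitting a class-conditional model and then inverting it with Bayes' theorem, can be sketched with one-dimensional Gaussian class-conditional densities. The training values below are made up for illustration:

```python
import math

def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical 1-d training data for two classes (made-up values).
data = {"C1": [1.0, 1.2, 0.8, 1.1], "C2": [3.0, 3.2, 2.9, 3.1]}

# Step 1: estimate the model parameters (mean, variance, prior) per class.
params, N = {}, sum(len(xs) for xs in data.values())
for c, xs in data.items():
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    params[c] = (mu, max(var, 1e-6), len(xs) / N)  # floor var for stability

# Step 2: Bayes' theorem gives the posteriors p(C_k|x), proportional
# to the class-conditional likelihood p(x|C_k) times the prior P(C_k).
def posterior(x):
    joint = {c: gaussian_pdf(x, mu, var) * prior
             for c, (mu, var, prior) in params.items()}
    z = sum(joint.values())
    return {c: j / z for c, j in joint.items()}

post = posterior(1.05)
print(max(post, key=post.get))  # the class with the largest posterior
```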
Typical generative methods include:
Here are some comparisons between the two approaches: