Both supervised classification and unsupervised clustering can be carried out in a hierarchical fashion to classify the input patterns or group them into clusters, very much like the hierarchy of biological classifications with different taxonomic ranks (domain, kingdom, phylum, class, order, family, genus, and species).
The hierarchical clustering can be obtained in either a top-down or bottom up manner.
All patterns in the data set are initially treated as a single cluster as the root of the tree, which is then subdivided (split) into a set of two or more smaller clusters, each represented as a node in the tree structure. This process is carried out recursively until eventually each cluster contains only one pattern, represented as a leaf node of the tree.
every pattern in the data set is initially treated as a cluster as a leaf node of the tree, which will then be merged to form larger clusters. Again, this process is carried out recursively until eventually all patterns are merged into a single cluster at the root of the tree.
In either the top-down or the bottom-up method, the specific method for the splitting or merging at each tree node is based on certain similarity measurement such as the distance between two clusters. The resulting tree structure obtained by either method can then be truncated at any level between the root and the leaf nodes to obtain a set of clusters, depending on the desired number and sizes of these clusters.
If labeled training data are available, both the top-down and the bottom-up clustering methods can also be used in the training stage of the supervised classification methods, with the only difference that now the splitting or merging is applied to labeled classes instead of individual patterns, and each leaf node represents one of the classes, rather than a single pattern. After the tree structure is obtained, the training is complete and any unlabeled pattern can be classified at the tree root and then subsequently the tree nodes at lower levels until it is classified into one of the leaf nodes of the tree, corresponding to a specific class.
This hierarchical classification method is especially useful when the number of classes and the number of feature are both large. In this case it may be very difficult to select a subset of features good for separating all classes for a single-level classifier, by which all classes need to be classified at the same time, requiring, most likely, all features. However, for a tree classifier, since each node is a two-class classifier, it is possible to select a small number of features that are most relevant and suitable to represent the two subsets of classes.
In the following we consider both the bottom-up and top-down methods for hierarchical clustering/classification.
Bottom-Up method
The bottom-up hierarchical classifier is trained based on classes , each containing ( ) labeled patterns .
(194) |
(195) |
(196) |
Top-Down method
Generate a binary tree by recursively partitioning all classes into two sub-groups with the maximum Bhattacharyya distance
(197) |
Once the hierarchical structure is constructed by either the bottom-up or top-down method, we need to build a binary classifier at each node of structure, by which any given pattern is classified into either the left group or right group :
(198) |
Example
The hierarchical clustering method is applied to a dataset composed of seven normally distributed clusters each containing 25 sample vectors in an dimensional space. The PCA method is used to project the data in 4-D space into a 2-D space spanned by the first two principal components, as shown below:
The clustering result is shown below. Each column in the display represents the four components of a 4-D vector, color coded by a spectrum from red (low values) through green (middle) to blue (high values).
See more examples in clustering analysis applied to gene data analysis in bioinformatics.
An example of this method is available here.