Hierarchical (Tree) Classifiers

Both supervised classification and unsupervised clustering can be carried out in a hierarchical fashion to classify the input patterns or group them into clusters, very much like the hierarchy of biological classifications with different taxonomic ranks (domain, kingdom, phylum, class, order, family, genus, and species).

In the following we consider both the bottom-up and top-down methods for hierarchical clustering/classification.

Bottom-Up method

The bottom-up hierarchical classifier is trained based on $K$ classes $C_1,\cdots,C_K$, each containing $n_k$ ( $k=1,\cdots,K$) labeled patterns ${\bf x}\in C_k$.

  1. Compute the $K(K-1)/2$ pairwise Bhattacharyya distances between each pair of classes $C_i$ and $C_j$:

    $\displaystyle d_B(C_i,C_j)
=\frac{1}{8}({\bf m}_i-{\bf m}_j)^T\left[\frac{{\bf\Sigma}_i+{\bf\Sigma}_j}{2}\right]^{-1}({\bf m}_i-{\bf m}_j)
+\frac{1}{2}\ln\left[\frac{\left\vert\frac{{\bf\Sigma}_i+{\bf\Sigma}_j}{2}\right\vert}{\left(\left\vert{\bf\Sigma}_i\right\vert\;\left\vert{\bf\Sigma}_j\right\vert\right)^{1/2}}\right]$ (194)

  2. Merge the two classes with the smallest $d_B$ into a new class $C_k=C_i\cup C_j$ containing $n_k=n_i+n_j$ patterns, and compute its mean and covariance:

    $\displaystyle {\bf m}_k=\frac{1}{n_i+n_j}[n_i {\bf m}_i+n_j {\bf m}_j]$ (195)

    and

    $\displaystyle {\bf\Sigma}_k=\frac{1}{n_i+n_j}
\left[n_i\left({\bf\Sigma}_i+({\bf m}_i-{\bf m}_k)({\bf m}_i-{\bf m}_k)^T\right)+
n_j\left({\bf\Sigma}_j+({\bf m}_j-{\bf m}_k)({\bf m}_j-{\bf m}_k)^T\right)\right]$ (196)

    Delete the old classes $C_i$ and $C_j$. Now there are $K-1$ classes left.

  3. Compute the distance between the new class $C_k$ and all $K-2$ remaining classes.

  4. Repeat the previous steps until all classes are merged into a single root group containing all $K$ classes; the binary tree structure is thus obtained.
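The bottom-up procedure above can be sketched as follows. This is a minimal illustration, not the text's implementation: each class is summarized by a hypothetical triple `(n, mean, cov)`, the pairwise distance uses the standard Gaussian form of the Bhattacharyya distance (Eq. 194), the merged statistics follow Eqs. (195) and (196), and the nested tuples returned encode the binary tree.

```python
import numpy as np

def bhattacharyya(m_i, S_i, m_j, S_j):
    """Bhattacharyya distance between two Gaussian classes (Eq. 194)."""
    S = (S_i + S_j) / 2.0
    dm = m_i - m_j
    term1 = dm @ np.linalg.inv(S) @ dm / 8.0
    term2 = 0.5 * np.log(np.linalg.det(S) /
                         np.sqrt(np.linalg.det(S_i) * np.linalg.det(S_j)))
    return term1 + term2

def merge(n_i, m_i, S_i, n_j, m_j, S_j):
    """Size, mean, and covariance of the merged class (Eqs. 195-196)."""
    n = n_i + n_j
    m = (n_i * m_i + n_j * m_j) / n
    d_i, d_j = (m_i - m)[:, None], (m_j - m)[:, None]
    S = (n_i * (S_i + d_i @ d_i.T) + n_j * (S_j + d_j @ d_j.T)) / n
    return n, m, S

def bottom_up_tree(classes):
    """classes: list of (n, mean, cov) per class.
    Returns the merge tree of class indices as nested tuples."""
    nodes = [(i, n, m, S) for i, (n, m, S) in enumerate(classes)]
    while len(nodes) > 1:
        # find the pair of current nodes with the smallest Bhattacharyya distance
        pairs = [(bhattacharyya(a[2], a[3], b[2], b[3]), p, q)
                 for p, a in enumerate(nodes) for q, b in enumerate(nodes) if p < q]
        _, p, q = min(pairs, key=lambda t: t[0])
        a, b = nodes[p], nodes[q]
        n, m, S = merge(a[1], a[2], a[3], b[1], b[2], b[3])
        # the merged node replaces the two old classes
        nodes = [nodes[k] for k in range(len(nodes)) if k not in (p, q)]
        nodes.append(((a[0], b[0]), n, m, S))
    return nodes[0][0]
```

With three classes where the first two are close and the third is far away, the first merge joins the close pair and the root joins in the outlier last.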

Top-Down method

Generate a binary tree by recursively partitioning all classes into two subgroups separated by the maximum Bhattacharyya distance:

  1. Compute the between-class scatter matrix ${\bf S}_B$ of the $K$ classes, and find its largest eigenvalue $\lambda_1$ and the corresponding eigenvector ${\bf v}_1$;

  2. Project all data points onto ${\bf v}_1$:

    $\displaystyle y_n={\bf x}^T_n {\bf v}_1\;\;\;\;\;\;(n=1,\cdots,N)$ (197)

  3. Sort all data points $\{ y_1,\cdots,y_N\}$ along this 1-D space and partition them into two subgroups with maximum Bhattacharyya (between-group) distance.

  4. Carry out the steps above recursively on each of the two subgroups, until eventually every subgroup contains only one class.
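The top-down steps can be sketched in the same spirit. This is a sketch under stated assumptions: the helper names `top_eigvec` and `best_split` are hypothetical, `top_eigvec` builds ${\bf S}_B$ from labeled data, and `best_split` scores every candidate cut of the sorted 1-D projections with a 1-D Bhattacharyya distance between the two sides.

```python
import numpy as np

def top_eigvec(X, labels):
    """Leading eigenvector of the between-class scatter matrix S_B."""
    m = X.mean(axis=0)
    S_B = sum(np.sum(labels == k) * np.outer(X[labels == k].mean(0) - m,
                                             X[labels == k].mean(0) - m)
              for k in np.unique(labels))
    w, V = np.linalg.eigh(S_B)
    return V[:, -1]                      # eigenvector of the largest eigenvalue

def bhat_1d(a, b):
    """1-D Bhattacharyya distance between two groups of projected points."""
    ma, mb = a.mean(), b.mean()
    va, vb = a.var() + 1e-12, b.var() + 1e-12   # guard against zero variance
    v = (va + vb) / 2.0
    return (ma - mb) ** 2 / (8 * v) + 0.5 * np.log(v / np.sqrt(va * vb))

def best_split(y):
    """Cut the sorted 1-D projections where the between-group distance peaks."""
    y = np.sort(y)
    scores = [(bhat_1d(y[:c], y[c:]), c) for c in range(2, len(y) - 1)]
    _, c = max(scores)
    return y[:c], y[c:]
```

Applying `best_split` recursively to each side reproduces step 4; the recursion stops once a subgroup holds a single class.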

Once the hierarchical structure is constructed by either the bottom-up or top-down method, we need to build a binary classifier at each node of the structure, by which any given pattern is classified into either the left group $G_l$ or the right group $G_r$.
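The text does not specify which binary classifier is used at each node; one simple choice is to send the pattern to whichever child group has the smaller Mahalanobis distance to its mean, descending until a leaf is reached. A sketch under that assumption (the dictionary-based tree representation is hypothetical):

```python
import numpy as np

def mahalanobis(x, m, S):
    """Squared Mahalanobis distance of x from a group with mean m, covariance S."""
    d = x - m
    return float(d @ np.linalg.inv(S) @ d)

def classify(x, node):
    """Descend the tree: at each internal node send x to the nearer child group
    (one possible node classifier, not the only choice)."""
    while not node["leaf"]:
        dl = mahalanobis(x, node["left"]["m"], node["left"]["S"])
        dr = mahalanobis(x, node["right"]["m"], node["right"]["S"])
        node = node["left"] if dl <= dr else node["right"]
    return node["label"]
```

Each pattern thus traverses at most $\log_2 K$ node classifiers instead of being compared against all $K$ classes at once.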

Example

The hierarchical clustering method is applied to a dataset composed of seven normally distributed clusters, each containing 25 sample vectors in an $N=4$ dimensional space. The PCA method is used to project the 4-D data into the 2-D space spanned by the first two principal components, as shown below:

TreeClustering1.png

The clustering result is shown below. Each column in the display represents the four components of a 4-D vector, color coded by a spectrum from red (low values) through green (middle) to blue (high values).

TreeClustering.png

See more examples of clustering analysis applied to gene expression data in bioinformatics.

An example of this method is available here.