Tree Classifiers
When both the number of classes c and the number of features n are large,
the feature selection and classification methods discussed before encounter
difficulties because
- feature selection is no longer effective, as it is difficult to find
m out of the n features that are suitable for separating all c classes
(a feature may be good for some classes but not for others);
- classification is costly, as a large number of features are needed.
The solution is to carry out the classification in several steps, implemented as a tree classifier. One method to design the tree classifier is the bottom-up
merging algorithm described in the following steps, which constitutes the
training process.
- From the training samples $\{X^{(i)}_k,\; k=1,\cdots,N_i\}$ of each class $\omega_i$, $i=1,\cdots,c$,
estimate the mean and covariance:
$M_i=\frac{1}{N_i}\sum_{k=1}^{N_i} X^{(i)}_k$ and
$\Sigma_i=\frac{1}{N_i}\sum_{k=1}^{N_i}(X^{(i)}_k-M_i)(X^{(i)}_k-M_i)^T$
- Compute the pair-wise Bhattacharyya distance between all classes $\omega_i$ and $\omega_j$
($c(c-1)/2$ of them in total):
$D_{ij}=\frac{1}{8}(M_i-M_j)^T\left[\frac{\Sigma_i+\Sigma_j}{2}\right]^{-1}(M_i-M_j)
+\frac{1}{2}\ln\frac{\left|\frac{\Sigma_i+\Sigma_j}{2}\right|}{\sqrt{|\Sigma_i|\,|\Sigma_j|}}$
- Merge the two classes $\omega_i$ and $\omega_j$ with the smallest $D_{ij}$ to form a combined
class $\omega_k=\omega_i\cup\omega_j$ with the following
mean and covariance:
$M_k=\frac{N_iM_i+N_jM_j}{N_i+N_j}$ and $\Sigma_k=\frac{N_i\Sigma_i+N_j\Sigma_j}{N_i+N_j}$
Delete the old classes $\omega_i$ and $\omega_j$.
- Compute the distance $D_{kl}$ between the new class $\omega_k$ and all the
remaining classes $\omega_l$ (excluding $\omega_i$ and $\omega_j$).
- Repeat the above steps until eventually all classes are merged into one
and a binary tree structure is thus obtained.
- At each node of the tree build a 2-class classifier to be used to
classify a sample into one of the two children $G_l$ and $G_r$
representing the two groups of classes. According to the classification
method used, we find the discriminant functions $D_l(X)$ and $D_r(X)$.
- At each node of the tree adaptively select the features that are best
for separating the two groups of classes $G_l$ and $G_r$. Any feature
selection method can be used here, such as between-class distance
(Mahalanobis distance) or an orthogonal transform (KLT, DFT, WHT, etc.).
Only a small number of selected features may be needed, as only
two groups of classes have to be distinguished at each node.
(A code sketch of this bottom-up training procedure is given after this list.)
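
To make the training procedure concrete, the following Python/NumPy sketch implements the bottom-up merging that builds the binary class tree. It is only an illustration under some assumptions, not the original implementation: the node representation is invented for this sketch, biased (1/N) covariance estimates are used, and the mean and covariance of a merged class are taken to be the sample-weighted combinations of those of its two members. The per-node 2-class classifiers and per-node feature selection described in the last two steps would be added to each internal node afterwards.

import numpy as np

def bhattacharyya(m1, s1, m2, s2):
    """Bhattacharyya distance between two Gaussian classes
    with means m1, m2 and covariances s1, s2."""
    s = (s1 + s2) / 2.0
    d = m1 - m2
    term1 = 0.125 * d @ np.linalg.solve(s, d)
    term2 = 0.5 * np.log(np.linalg.det(s) /
                         np.sqrt(np.linalg.det(s1) * np.linalg.det(s2)))
    return term1 + term2

def build_tree(samples):
    """Bottom-up design of the binary class tree.

    samples: dict mapping class label -> (N_i, n) array of training samples.
    Returns the root node; each node is a dict with keys
    'classes', 'mean', 'cov', 'n', 'left', 'right' (a leaf has left = right = None).
    """
    # one leaf node per class, with estimated mean and covariance
    nodes = []
    for label, X in samples.items():
        nodes.append({'classes': {label},
                      'mean': X.mean(axis=0),
                      'cov': np.cov(X, rowvar=False, bias=True),
                      'n': len(X),
                      'left': None, 'right': None})

    # repeatedly merge the pair of nodes with the smallest Bhattacharyya distance
    while len(nodes) > 1:
        best = None
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                d = bhattacharyya(nodes[i]['mean'], nodes[i]['cov'],
                                  nodes[j]['mean'], nodes[j]['cov'])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        a, b = nodes[i], nodes[j]
        n = a['n'] + b['n']
        merged = {'classes': a['classes'] | b['classes'],
                  # sample-weighted combination of means and covariances (assumed)
                  'mean': (a['n'] * a['mean'] + b['n'] * b['mean']) / n,
                  'cov': (a['n'] * a['cov'] + b['n'] * b['cov']) / n,
                  'n': n,
                  'left': a, 'right': b}
        # delete the two old classes and insert the combined one
        nodes = [nd for k, nd in enumerate(nodes) if k not in (i, j)] + [merged]

    return nodes[0]  # root of the binary class tree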
After the classifier is built and trained, the classification is carried out
in the following manner:
A test sample X of unknown class enters the classifier at the root of
the tree and is classified into either the left or the right child of the node
according to
$X \rightarrow \left\{ \begin{array}{ll} G_l & \mbox{if } D_l(X)>D_r(X) \\ G_r & \mbox{otherwise} \end{array} \right.$
This process is repeated recursively in the same fashion at the child node
(either $G_l$ or $G_r$), then at its child, and so on, until eventually X reaches
a leaf node corresponding to a single class, to which the sample X is
therefore assigned. In this classification method, a sample is classified
through a sequence of simple two-class decisions, each using only a small
number of features, instead of one decision among all c classes using a large
number of features.
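
A minimal Python sketch of this recursive descent, continuing the tree representation assumed above: each internal node is assumed to also store the discriminant functions of its two children, 'D_l' and 'D_r', produced when that node's 2-class classifier (using its selected features) was trained.

def classify(node, x):
    """Classify sample x by descending the class tree from the root node.

    Each internal node is assumed to carry two discriminant functions,
    node['D_l'] and node['D_r'], for its left and right child.
    """
    while node['left'] is not None:          # descend until a leaf is reached
        if node['D_l'](x) > node['D_r'](x):  # decision rule at this node
            node = node['left']
        else:
            node = node['right']
    return next(iter(node['classes']))       # a leaf holds a single class label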
Ruye Wang
1999-06-10