
Unsupervised Classification - Clustering

A clustering algorithm groups the given samples, each represented as a vector ${\bf x}=[x_1,\cdots,x_N]^T $ in the N-dimensional feature space, into a set of clusters according to their spatial distribution in the N-D space. Clustering is an unsupervised classification method, as no a priori knowledge (such as samples of known classes) is assumed to be available.

The K-Means Algorithm

This method is simple, but it has a main drawback: the number of clusters $K$ needs to be estimated based on some prior knowledge, and it stays fixed throughout the clustering process, even though it may turn out later that more or fewer clusters would fit the data better. One way to resolve this is to carry out the algorithm multiple times with different values of $K$, and then evaluate each result by some separability criterion, such as $tr({\bf S}_T^{-1}{\bf S}_B)$.
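
The K-means iteration itself (assign each sample to the nearest mean, then recompute each mean as the centroid of its members) can be summarized in a short sketch. The following is a minimal NumPy sketch, not a reference implementation; the names kmeans and separability are illustrative, and the separability function computes the $tr({\bf S}_T^{-1}{\bf S}_B)$ score mentioned above for comparing runs with different $K$.

import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    # X is an (n, N) array of n samples in the N-dimensional feature space
    rng = np.random.default_rng(seed)
    m = X[rng.choice(len(X), size=K, replace=False)]      # initial means: K random samples
    for _ in range(max_iter):
        # assign each sample to the cluster with the closest mean (Euclidean distance)
        d = np.linalg.norm(X[:, None, :] - m[None, :, :], axis=2)   # (n, K) distances
        labels = d.argmin(axis=1)
        # update each mean as the centroid of its assigned samples
        new_m = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else m[k]
                          for k in range(K)])
        if np.allclose(new_m, m):        # converged: means no longer change
            break
        m = new_m
    return m, labels

def separability(X, labels, m):
    # tr(S_T^{-1} S_B), used to compare clusterings obtained with different K
    mu = X.mean(axis=0)
    S_T = np.cov(X, rowvar=False, bias=True)              # total scatter (per-sample average)
    S_B = sum((labels == k).sum() * np.outer(mk - mu, mk - mu)
              for k, mk in enumerate(m)) / len(X)          # between-class scatter
    return np.trace(np.linalg.solve(S_T, S_B))

A typical use is to run kmeans for several values of $K$ and keep the result with the largest separability score.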

The ISODATA Algorithm

In the K-means method, the number of clusters $K$ remains the same throughout the iteration, although it may turn out later that more or fewer clusters would fit the data better. This drawback is overcome by the ISODATA algorithm (Iterative Self-Organizing Data Analysis Technique Algorithm), which allows the number of clusters to be adjusted automatically during the iteration by merging similar clusters and splitting clusters with large standard deviations. The algorithm is highly heuristic and relies on the following pre-specified parameters:

  * $K_0$: the initial (desired) number of clusters;
  * $n_{min}$: the minimum number of members a cluster must have;
  * $\sigma_{max}^2$: the maximum variance allowed along any dimension of a cluster before it is split;
  * $d_{min}$: the minimum distance allowed between two clusters before they are merged;
  * the maximum number of iterations.

Here are the steps of the algorithm:

  1. Randomly choose $K=K_0$ initial mean vectors $\{{\bf m}_1,\cdots,{\bf m}_K\}$ from the data set.
  2. Assign each data point ${\bf x}$ to the cluster with the closest mean:

    \begin{displaymath}
{\bf x} \in \omega_i\;\;\;\;\mbox{if}\;\;\;\;d({\bf x},{\bf m}_i)
=\min\;\{\;d({\bf x},{\bf m}_1),\cdots,d({\bf x},{\bf m}_K)\;\}
\end{displaymath}

  3. Discard clusters containing too few members, i.e., if $n_j<n_{min} $, discard $\omega_j$, reassign its members to other clusters, and set $K \leftarrow K-1$.
  4. For each cluster $\omega_j$ $(j=1,\cdots,K)$, update the mean vector

    \begin{displaymath}
{\bf m}_j=\frac{1}{n_j} \sum_{{\bf x} \in \omega_j} {\bf x},
\end{displaymath}

    and the covariance matrix:

    \begin{displaymath}
{\bf\Sigma}_j=\frac{1}{n_j}\sum_{{\bf x}\in\omega_j}({\bf x}-{\bf m}_j)
({\bf x}-{\bf m}_j)^T
\end{displaymath}

    The diagonal elements of ${\bf\Sigma}_j$ are the variances $\sigma_1^2,\cdots,\sigma_N^2$ along the $N$ dimensions.

  5. If $K \leq K_0/2$ (too few clusters), go to Step 6 for splitting;

    else if $K>2K_0$ (too many clusters), go to Step 7 for merging;

    else go to Step 8.

  6. (split) For each cluster $\omega_j$ $(j=1,\cdots,K)$, find the greatest variance $\sigma_m^2=\max\{\sigma_1^2,\cdots,\sigma_N^2 \}$ among the $N$ dimensions.

    If $\sigma_m^2 > \sigma_{max}^2$ and $n_j > 2 n_{min}$, then split ${\bf m}_j$ into two new cluster centers by perturbing its $m$-th component:

    \begin{displaymath}
{\bf m}_j^+={\bf m}_j+\sigma_m{\bf e}_m,\;\;\;\;\;\;{\bf m}_j^-={\bf m}_j-\sigma_m{\bf e}_m
\end{displaymath}

    where ${\bf e}_m$ is the unit vector along the $m$-th dimension. Alternatively, carry out PCA of ${\bf\Sigma}_j$ to find the greatest eigenvalue $\lambda_{max}$ and split the cluster along the direction of the corresponding eigenvector.

    Set $K \leftarrow K+1$.

    Go to Step 8.

  7. (merge) Compute the $K(K-1)/2$ pairwise Bhattacharyya distances between every pair of clusters, based on their mean vectors and covariance matrices:

    \begin{displaymath}
d_B(\omega_i, \omega_j)=\frac{1}{4}({\bf m}_i-{\bf m}_j)^T
\left[{\bf\Sigma}_i+{\bf\Sigma}_j\right]^{-1}({\bf m}_i-{\bf m}_j)
+\frac{1}{2}\ln\left[\frac{\left\vert({\bf\Sigma}_i+{\bf\Sigma}_j)/2\right\vert}
{(\left\vert{\bf\Sigma}_i\right\vert\,\left\vert{\bf\Sigma}_j\right\vert)^{1/2}}\right],
\;\;\;\;\;(1\le i,j\le K,\;\;i>j)
\end{displaymath}

    For each of the distances satisfying $d_B(\omega_i,\omega_j)<d_{min}$, merge the corresponding clusters to form a new one with mean:

    \begin{displaymath}
{\bf m}_i=\frac{1}{n_i+n_j} [ n_i {\bf m}_i+n_j{\bf m}_j]
\end{displaymath}

    Delete ${\bf m}_j$, set $K \leftarrow K-1$.

  8. Terminate if the maximum number of iterations is reached; otherwise go to Step 2.
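
The steps above leave several implementation choices open (how discarded members are reassigned, how the split along the dimension of largest variance is carried out, which variant of the Bhattacharyya distance is used). The following NumPy sketch is one possible reading of Steps 1-8, not a definitive implementation; the names isodata, K0, n_min, sigma_max2, d_min, and max_iter simply mirror the pre-specified parameters listed earlier, and edge cases such as empty or singleton clusters are ignored.

import numpy as np

def bhattacharyya(m_i, S_i, m_j, S_j):
    # Bhattacharyya distance between two clusters modeled as Gaussians (Step 7)
    dm = m_i - m_j
    term1 = 0.25 * dm @ np.linalg.solve(S_i + S_j, dm)
    term2 = 0.5 * np.log(np.linalg.det(0.5 * (S_i + S_j)) /
                         np.sqrt(np.linalg.det(S_i) * np.linalg.det(S_j)))
    return term1 + term2

def isodata(X, K0, n_min, sigma_max2, d_min, max_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    means = list(X[rng.choice(len(X), size=K0, replace=False)])        # Step 1
    for _ in range(max_iter):                                          # Step 8
        M = np.array(means)
        labels = np.linalg.norm(X[:, None] - M[None], axis=2).argmin(axis=1)   # Step 2
        # Step 3: discard clusters with too few members and reassign their samples
        keep = [k for k in range(len(means)) if (labels == k).sum() >= n_min]
        M = M[keep]
        labels = np.linalg.norm(X[:, None] - M[None], axis=2).argmin(axis=1)
        # Step 4: update means and covariance matrices
        means, covs, counts = [], [], []
        for k in range(len(M)):
            Xk = X[labels == k]
            means.append(Xk.mean(axis=0))
            covs.append(np.cov(Xk, rowvar=False, bias=True))
            counts.append(len(Xk))
        K = len(means)
        if K <= K0 / 2:                                                # Step 5 -> Step 6 (split)
            for k in range(K):
                var = np.diag(covs[k])
                m_dim = int(var.argmax())                              # dimension of largest variance
                if var[m_dim] > sigma_max2 and counts[k] > 2 * n_min:
                    offset = np.zeros(X.shape[1])
                    offset[m_dim] = np.sqrt(var[m_dim])
                    means.append(means[k] + offset)                    # m_j^+
                    means[k] = means[k] - offset                       # m_j^-
        elif K > 2 * K0:                                               # Step 5 -> Step 7 (merge)
            deleted = set()
            for i in range(K):
                for j in range(i):
                    if i in deleted or j in deleted:
                        continue
                    if bhattacharyya(means[i], covs[i], means[j], covs[j]) < d_min:
                        means[i] = (counts[i] * means[i] + counts[j] * means[j]) \
                                   / (counts[i] + counts[j])
                        deleted.add(j)                                 # delete m_j, keep merged m_i
            means = [m for k, m in enumerate(means) if k not in deleted]
    return np.array(means)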

As the number of clusters $K$ can be dynamically adjusted during the process, the ISODATA algorithm is more flexible than the K-means algorithm. However, all of the additional parameters listed above have to be chosen empirically.
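
For instance, a call to the sketch above might look as follows; the data are synthetic and the parameter values are arbitrary illustrative choices, which only underline the kind of empirical tuning required.

# three synthetic 2-D blobs; all ISODATA parameters below are arbitrary illustrative choices
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.3, size=(100, 2)) for c in [(0, 0), (3, 3), (0, 4)]])
means = isodata(X, K0=4, n_min=10, sigma_max2=1.0, d_min=0.5, max_iter=20)
print(len(means), "clusters found:", means)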


Ruye Wang 2016-11-30