
Unsupervised Classification - Clustering

A clustering algorithm groups the given samples, each represented as a vector ${\bf x}=[x_1,\cdots,x_N]^T $ in the N-dimensional feature space, into a set of clusters according to their spatial distribution in the N-D space. Clustering is an unsupervised classification method, as no a priori knowledge (such as samples of known classes) is assumed to be available.

The K-Means Algorithm

This method is simple, but it has a main drawback: the number of clusters $K$ needs to be estimated based on some prior knowledge, and it stays fixed throughout the clustering process, even though it may turn out later that more or fewer clusters would fit the data better. One way to resolve this is to carry out the algorithm multiple times with different values of $K$, and then evaluate each result by some separability criterion, such as $tr({\bf S}_T^{-1}{\bf S}_B)$.
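
The K-means iteration itself (assign each sample to the nearest mean, then recompute each mean as the centroid of its members) can be summarized in a short sketch. The following is a minimal NumPy sketch, not a reference implementation; the names kmeans and separability are illustrative, and the separability function computes the $tr({\bf S}_T^{-1}{\bf S}_B)$ score mentioned above for comparing runs with different $K$.

import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    # X is an (n, N) array of n samples in the N-dimensional feature space
    rng = np.random.default_rng(seed)
    m = X[rng.choice(len(X), size=K, replace=False)]      # initial means: K random samples
    for _ in range(max_iter):
        # assign each sample to the cluster with the closest mean (Euclidean distance)
        d = np.linalg.norm(X[:, None, :] - m[None, :, :], axis=2)   # (n, K) distances
        labels = d.argmin(axis=1)
        # update each mean as the centroid of its assigned samples
        new_m = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else m[k]
                          for k in range(K)])
        if np.allclose(new_m, m):        # converged: means no longer change
            break
        m = new_m
    return m, labels

def separability(X, labels, m):
    # tr(S_T^{-1} S_B), used to compare clusterings obtained with different K
    mu = X.mean(axis=0)
    S_T = np.cov(X, rowvar=False, bias=True)              # total scatter (per-sample average)
    S_B = sum((labels == k).sum() * np.outer(mk - mu, mk - mu)
              for k, mk in enumerate(m)) / len(X)          # between-class scatter
    return np.trace(np.linalg.solve(S_T, S_B))

A typical use is to run kmeans for several values of $K$ and keep the result with the largest separability score.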

The ISODATA Algorithm

In the K-means method, the number of clusters $K$ remains the same throughout the iteration, although it may turn out later that more or fewer clusters would fit the data better. This drawback is overcome by the ISODATA algorithm (Iterative Self-Organizing Data Analysis Technique Algorithm), which allows the number of clusters to be adjusted automatically during the iteration by merging similar clusters and splitting clusters with large standard deviations. The algorithm is highly heuristic and relies on the following pre-specified parameters:

  * $K_0$: the initial (desired) number of clusters;
  * $n_{min}$: the minimum number of members a cluster must have;
  * $\sigma_{max}^2$: the maximum variance allowed along any dimension of a cluster before it is split;
  * $d_{min}$: the minimum distance allowed between two clusters before they are merged;
  * the maximum number of iterations.

Here are the steps of the algorithm:

  1. Randomly choose $K=K_0$ initial mean vectors $\{{\bf m}_1,\cdots,{\bf m}_K\}$ from the data set.
  2. Assign each data point ${\bf x}$ to the cluster with the closest mean:

    \begin{displaymath}
{\bf x} \in \omega_i\;\;\;\;\mbox{if}\;\;\;\;d({\bf x},{\bf m}_i)
=\min\;\{\;d({\bf x},{\bf m}_1),\cdots,d({\bf x},{\bf m}_K)\;\}
\end{displaymath}

  3. Discard clusters containing too few members, i.e., if $n_j<n_{min} $, discard $\omega_j$, reassign its members to other clusters, and set $K \leftarrow K-1$.
  4. For each cluster $\omega_j$ $(j=1,\cdots,K)$, update the mean vector

    \begin{displaymath}
{\bf m}_j=\frac{1}{n_j} \sum_{{\bf x} \in \omega_j} {\bf x},
\end{displaymath}

    and the covariance matrix:

    \begin{displaymath}
{\bf\Sigma}_j=\frac{1}{n_j}\sum_{{\bf x}\in\omega_j}({\bf x}-{\bf m}_j)
({\bf x}-{\bf m}_j)^T
\end{displaymath}

    The diagonal elements of ${\bf\Sigma}_j$ are the variances $\sigma_1^2,\cdots,\sigma_N^2$ along the $N$ dimensions.

  5. If $K \leq K_0/2$ (too few clusters), go to Step 6 for splitting;

    else if $K>2K_0$ (too many clusters), go to Step 7 for merging;

    else go to Step 8.

  6. (split) For each cluster $\omega_j$ $(j=1,\cdots,K)$, find the greatest variance $\sigma_m^2=\max\{\sigma_1^2,\cdots,\sigma_N^2 \}$ among the $N$ dimensions.

    If $\sigma_m^2 > \sigma_{max}^2$ and $n_j > 2 n_{min}$, then split ${\bf m}_j$ into two new cluster centers by perturbing its $m$-th component:

    \begin{displaymath}
{\bf m}_j^+={\bf m}_j+\sigma_m{\bf e}_m,\;\;\;\;\;\;{\bf m}_j^-={\bf m}_j-\sigma_m{\bf e}_m
\end{displaymath}

    where ${\bf e}_m$ is the unit vector along the $m$-th dimension. Alternatively, carry out PCA of ${\bf\Sigma}_j$ to find the greatest eigenvalue $\lambda_{max}$ and split the cluster along the direction of the corresponding eigenvector.

    Set $K \leftarrow K+1$.

    Go to Step 8.

  7. (merge) Compute the $K(K-1)/2$ pairwise Bhattacharyya distances between every pair of clusters, based on their mean vectors and covariance matrices:

    \begin{displaymath}
d_B(\omega_i, \omega_j)=\frac{1}{4}({\bf m}_i-{\bf m}_j)^T
\left[{\bf\Sigma}_i+{\bf\Sigma}_j\right]^{-1}({\bf m}_i-{\bf m}_j)
+\frac{1}{2}\ln\left[\frac{\left\vert({\bf\Sigma}_i+{\bf\Sigma}_j)/2\right\vert}
{(\left\vert{\bf\Sigma}_i\right\vert\,\left\vert{\bf\Sigma}_j\right\vert)^{1/2}}\right],
\;\;\;\;\;(1\le i,j\le K,\;\;i>j)
\end{displaymath}

    For each of the distances satisfying $d_B(\omega_i,\omega_j)<d_{min}$, merge the corresponding clusters to form a new one with mean:

    \begin{displaymath}
{\bf m}_i=\frac{1}{n_i+n_j} [ n_i {\bf m}_i+n_j{\bf m}_j]
\end{displaymath}

    Delete ${\bf m}_j$, set $K \leftarrow K-1$.

  8. Terminate if the maximum number of iterations is reached; otherwise go to Step 2.
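
The steps above leave several implementation choices open (how discarded members are reassigned, how the split along the dimension of largest variance is carried out, which variant of the Bhattacharyya distance is used). The following NumPy sketch is one possible reading of Steps 1-8, not a definitive implementation; the names isodata, K0, n_min, sigma_max2, d_min, and max_iter simply mirror the pre-specified parameters listed earlier, and edge cases such as empty or singleton clusters are ignored.

import numpy as np

def bhattacharyya(m_i, S_i, m_j, S_j):
    # Bhattacharyya distance between two clusters modeled as Gaussians (Step 7)
    dm = m_i - m_j
    term1 = 0.25 * dm @ np.linalg.solve(S_i + S_j, dm)
    term2 = 0.5 * np.log(np.linalg.det(0.5 * (S_i + S_j)) /
                         np.sqrt(np.linalg.det(S_i) * np.linalg.det(S_j)))
    return term1 + term2

def isodata(X, K0, n_min, sigma_max2, d_min, max_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    means = list(X[rng.choice(len(X), size=K0, replace=False)])        # Step 1
    for _ in range(max_iter):                                          # Step 8
        M = np.array(means)
        labels = np.linalg.norm(X[:, None] - M[None], axis=2).argmin(axis=1)   # Step 2
        # Step 3: discard clusters with too few members and reassign their samples
        keep = [k for k in range(len(means)) if (labels == k).sum() >= n_min]
        M = M[keep]
        labels = np.linalg.norm(X[:, None] - M[None], axis=2).argmin(axis=1)
        # Step 4: update means and covariance matrices
        means, covs, counts = [], [], []
        for k in range(len(M)):
            Xk = X[labels == k]
            means.append(Xk.mean(axis=0))
            covs.append(np.cov(Xk, rowvar=False, bias=True))
            counts.append(len(Xk))
        K = len(means)
        if K <= K0 / 2:                                                # Step 5 -> Step 6 (split)
            for k in range(K):
                var = np.diag(covs[k])
                m_dim = int(var.argmax())                              # dimension of largest variance
                if var[m_dim] > sigma_max2 and counts[k] > 2 * n_min:
                    offset = np.zeros(X.shape[1])
                    offset[m_dim] = np.sqrt(var[m_dim])
                    means.append(means[k] + offset)                    # m_j^+
                    means[k] = means[k] - offset                       # m_j^-
        elif K > 2 * K0:                                               # Step 5 -> Step 7 (merge)
            deleted = set()
            for i in range(K):
                for j in range(i):
                    if i in deleted or j in deleted:
                        continue
                    if bhattacharyya(means[i], covs[i], means[j], covs[j]) < d_min:
                        means[i] = (counts[i] * means[i] + counts[j] * means[j]) \
                                   / (counts[i] + counts[j])
                        deleted.add(j)                                 # delete m_j, keep merged m_i
            means = [m for k, m in enumerate(means) if k not in deleted]
    return np.array(means)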

As the number of clusters $K$ can be dynamically adjusted during the process, the ISODATA algorithm is more flexible than the K-means algorithm. However, all of the additional parameters listed above have to be chosen empirically.
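
For instance, a call to the sketch above might look as follows; the data are synthetic and the parameter values are arbitrary illustrative choices, which only underline the kind of empirical tuning required.

# three synthetic 2-D blobs; all ISODATA parameters below are arbitrary illustrative choices
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.3, size=(100, 2)) for c in [(0, 0), (3, 3), (0, 4)]])
means = isodata(X, K0=4, n_min=10, sigma_max2=1.0, d_min=0.5, max_iter=20)
print(len(means), "clusters found:", means)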


Ruye Wang 2016-11-30