Clustering Analysis

All classification algorithms discussed above are supervised: they assume the availability of training data in which each data pattern ${\bf x}_n\in {\bf X}=[{\bf x}_1,\cdots,{\bf x}_N]$ is labeled by $y_n\in{\bf y}=[y_1,\cdots,y_N]^T$, i.e., the class to which each pattern belongs is known. However, when such a training set is not available, there is still a need to explore the potential structure of the data in the form of a set of clusters $\{C_1,\cdots,C_K\}$, where the number $K$ is unknown and each cluster is composed of similar data points close to each other in the feature space. This can be accomplished by the unsupervised method of clustering analysis, based only on the given dataset ${\bf X}=[{\bf x}_1,\cdots,{\bf x}_N]$, without any additional prior knowledge such as the labeling of the data samples. We will now consider a few algorithms for clustering analysis.

Subsections

K-means clustering
Gaussian mixture model
Mixture of Bernoulli
General EM Algorithm