Competitive learning is a neural network algorithm for unsupervised
clustering, similar to the K-means algorithm considered previously.
The competitive learning takes place in a two-layer network composed
of an input layer of $d$ nodes that receives an input vector
$\mathbf{x}=[x_1,\cdots,x_d]^T$, a point in the $d$-dimensional feature
space, and an output layer of $K$ nodes that produces an output
pattern $\mathbf{y}=[y_1,\cdots,y_K]^T$ representing the $K$ clusters of
interest. Each input variable $x_i$ may take either a continuous
real value or a binary value depending on the specific application,
while the outputs are binary: only one of them is 1
and all others are 0 (one-hot), as the result of a winner-take-all
competition based on their activation values.
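In terms of data structures, such a network is specified by a $d\times K$ weight matrix whose $k$-th column is the weight vector of output node $k$. A minimal Matlab sketch, with hypothetical sizes $d=2$ and $K=4$ (not values from the text):

d=2; K=4;                        % assumed sizes: d input nodes, K output nodes
W=rand(d,K);                     % weight matrix; column W(:,k) is the weight vector of output node k
for k=1:K
    W(:,k)=W(:,k)/norm(W(:,k));  % optionally normalize each weight vector to unit length
end
x=rand(d,1);                     % one input vector, a point in the d-dimensional feature space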
In each iteration of the learning process, a pattern $\mathbf{x}$
randomly chosen from the unlabeled dataset
is presented to the input
layer of the network, and each node $k$ of the output layer gets an
activation value, a linear combination of all $d$ inputs, i.e.,
the inner product of its weight vector $\mathbf{w}_k=[w_{k1},\cdots,w_{kd}]^T$
and the input vector $\mathbf{x}$:

$$a_k = \mathbf{w}_k^T \mathbf{x} = \sum_{i=1}^d w_{ki}\,x_i, \qquad k=1,\cdots,K \tag{76}$$
The output of the network is determined by a winner-take-all competition. The node in the output layer that is maximally activated will output 1 while all others output 0:
$$m = \arg\max_{1\le k\le K} a_k = \arg\max_{1\le k\le K} \mathbf{w}_k^T\mathbf{x} \tag{77}$$

$$y_k = \begin{cases} 1, & k = m \\ 0, & k \ne m \end{cases} \tag{78}$$

In particular, if all weight vectors are normalized to unit length, $\|\mathbf{w}_k\|=1$, maximizing the inner product $\mathbf{w}_k^T\mathbf{x}$ is equivalent to minimizing the Euclidean distance between $\mathbf{w}_k$ and $\mathbf{x}$:

$$\|\mathbf{x}-\mathbf{w}_k\|^2 = \|\mathbf{x}\|^2 - 2\,\mathbf{w}_k^T\mathbf{x} + \|\mathbf{w}_k\|^2 = \|\mathbf{x}\|^2 - 2\,\mathbf{w}_k^T\mathbf{x} + 1 \tag{79}$$

$$m = \arg\min_{1\le k\le K} \|\mathbf{x}-\mathbf{w}_k\| \tag{80}$$
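As a minimal sketch of this winner-take-all step (assuming the weight matrix W and input vector x defined as above; an illustration, not code from the text):

a=W'*x;                 % activations of the K output nodes: inner products w_k'*x
[~,m]=max(a);           % index m of the maximally activated node, the winner
y=zeros(K,1); y(m)=1;   % one-hot output pattern: the winner outputs 1, all others 0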
The competitive learning is illustrated in the figure above, where
the modified weight vector of the winner $\mathbf{w}_m^{new}$ lies between
the input $\mathbf{x}$ and the old $\mathbf{w}_m^{old}$, i.e., the effect of the
learning process is that the winner's weight vector, which is closest
to the current input pattern $\mathbf{x}$, is pulled even closer to it.
Here are the iterative steps in the competitive learning:
$$\mathbf{w}_m^{new} = \mathbf{w}_m^{old} + \eta\,(\mathbf{x} - \mathbf{w}_m^{old}) \tag{81}$$

$$\mathbf{w}_m^{new} \leftarrow \frac{\mathbf{w}_m^{new}}{\|\mathbf{w}_m^{new}\|} \tag{82}$$

where $\eta$ is the learning rate, and the second step, which renormalizes the winner's weight vector to unit length, is optional.
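For illustration, one such update of the winner's weights might look like the following sketch (the learning rate value 0.5 is only an example):

eta=0.5;                       % learning rate (example value)
w_old=W(:,m);                  % the winner's current weight vector
w_new=w_old+eta*(x-w_old);     % pull the winner's weights toward the input x, equation (81)
W(:,m)=w_new/norm(w_new);      % optional renormalization to unit length, equation (82)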
This iterative process can be more intuitively understood as shown
in the figure below. Every time a sample $\mathbf{x}$ (a red dot) is
presented to the input layer, one of the output nodes will become
the winner, and its weight vector $\mathbf{w}_m$ (a blue ×), the one closest to
the current input $\mathbf{x}$, is drawn even closer to $\mathbf{x}$. As
this process is carried out iteratively many times, each cluster
of similar sample points in the space will draw one of the weight
vectors towards its central area, and the corresponding output node
will always win and output 1 whenever any member of the cluster is
presented to the network in the future. In other words, after this
unsupervised learning, the feature space is partitioned into $K$
regions, each corresponding to one of the $K$ clusters and represented
by one of the output nodes, whose weight vector is in the central
area of the region. We see that this process is very similar to the
K-means clustering algorithm, in which each cluster is represented
by the mean vector of all of its members.
It is possible in some cases that the data samples are not distributed in such a way that they form a set of clearly separable clusters in the feature space. In the extreme case, they may even form a continuum. In such cases, a small number of output nodes (possibly even just one) may become frequent winners, while the others become “dead nodes” that never win and consequently never get the chance to modify their weight vectors. To avoid such a meaningless outcome, a mechanism is needed to ensure that all nodes have some chance to win. This can be implemented by including an extra bias term in the learning rule:
$$m = \arg\min_{1\le k\le K}\left(\|\mathbf{x}-\mathbf{w}_k\| - b_k\right) \tag{83}$$

$$b_k = c\left(\frac{1}{K} - \frac{f_k}{n}\right), \qquad k=1,\cdots,K \tag{84}$$

where $f_k$ is the number of times node $k$ has won among the $n$ iterations carried out so far, and $c>0$ is a constant controlling the strength of the bias. A node that has won less often than average ($f_k/n<1/K$) receives a positive bias that makes it more likely to win in the future.
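A brief sketch of how this biased competition and bias update could be carried out (here f is a vector of winning counts, n the number of iterations so far, and c an assumed constant; all names are hypothetical):

dk=zeros(1,K);
for k=1:K
    dk(k)=norm(x-W(:,k))-b(k);   % distance to node k, discounted by its bias, equation (83)
end
[~,m]=min(dk);                   % the node with the smallest biased distance wins
f(m)=f(m)+1;                     % update winning count of the winner
b=c*(1/K-f/n);                   % nodes that win less often get a larger bias, equation (84)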
This process of competitive learning can also be viewed as a
vector quantization process, by which the continuous vector
space is quantized into a set of discrete regions, called a
Voronoi diagram (tessellation).
Vector quantization can be used for data compression. A cluster of
similar signals in the vector space can all be approximately
represented by the weight vector $\mathbf{w}_m$ of one of a small number of
output nodes located in the neighborhood of these signals, so that the data
size can be significantly reduced.
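As a sketch of this use of the learned weight vectors as a codebook for compression (the variable names code and Xhat are illustrative, not from the text):

code=zeros(1,N);                      % one small integer per sample instead of a d-dimensional vector
for n=1:N                             % encode: find the index of the nearest weight (codebook) vector
    dmin=inf;
    for k=1:K
        dk=norm(X(:,n)-W(:,k));
        if dk<dmin, dmin=dk; code(n)=k; end
    end
end
Xhat=W(:,code);                       % decode: approximate each sample by its codebook vector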
The Matlab code for the iteration loop of the algorithm is listed below.
[d,N]=size(X);              % dataset X contains N samples of dimension d
b=zeros(1,K);               % bias terms for the K output nodes
freq=zeros(1,K);            % winning frequencies of the K output nodes
c=10;                       % gain of the bias term in equation (84)
eta=0.9;                    % initial learning rate
decay=0.99;                 % decay rate of the learning rate
it=0;
while eta>0.1               % main iterations, stop when learning rate becomes small
    it=it+1;
    x=X(:,randi(N,1));      % randomly select an input sample
    dmin=inf;
    for k=1:K               % find winner among all K output nodes
        dk=norm(x-W(:,k))-b(k);   % distance to node k, discounted by its bias
        if dk<dmin
            dmin=dk; m=k;   % mth node is the winner
        end
    end
    w=W(:,m)+eta*(x-W(:,m));    % modify winner's weights, equation (81)
    W(:,m)=w/norm(w);           % renormalize its weights (optional), equation (82)
    freq(m)=freq(m)+1;          % update winning count of the winner
    b=c*(1/K-freq/it);          % modify biases for all nodes, equation (84)
    eta=eta*decay;              % reduce learning rate
end
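For completeness, the loop above assumes that the dataset X, the number of clusters K, and the weight matrix W have already been set up, for example as in this hypothetical setup (not part of the original listing):

K=4;                              % assumed number of output nodes / clusters
X=randn(2,400)+4*[kron([0 0 1 1],ones(1,100)); kron([0 1 0 1],ones(1,100))];  % four simulated 2-D clusters
W=X(:,randi(size(X,2),1,K));      % initialize the K weight vectors to randomly chosen samples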
Examples
The competitive learning method is applied to a set of simulated 2-D
data of four clusters. The network has $d=2$ input nodes and $K=4$ output
nodes. The first 16 iterations are shown in the figure below, where open
circles represent the data points, while the solid black squares represent
the weight vectors. Also, the blue and red squares represent the weight
vector of the winner before and after its modification, visualized by the
straight line connecting the two squares. We see that the randomly
initialized weight vectors are iteratively modified one at a time, and
after these 16 iterations, they each move to the center of one of the
four clusters. The separability of the clustering result
is 1.0. When the number of output nodes is
reduced to 3 and 2, the separability is also reduced to 0.76 and 0.46,
respectively. When more than four output nodes are used, one of the four
clusters is represented by two output nodes.
The same method is applied to a set of 3-D data of eight clusters. The results are shown in the figure below, where the top and bottom rows show the data and weight vectors before and after clustering, respectively. The left column shows the 2-D space spanned by the first two principal components, while the right column shows the original 3-D space. Again we see that the weight vectors move from their random initial locations to the centers of the eight clusters as the result of the clustering.
The competitive learning algorithm is also applied to the dataset of
handwritten digits from 0 to 9. The clustering result is shown in the
figure below, as the average of all samples in each cluster displayed
in image form.