
The Competitive Learning Law

For mathematical convenience, we assume that all input vectors and all weight vectors are normalized, in the sense that their components sum to one:

\begin{displaymath}\sum_{i=1}^n x_i=1, \;\;\;\;\sum_{i=1}^n w_{ij}=1\;\;\;(j=1,\cdots,m) \end{displaymath}

During training, all input patterns are presented to the input layer one at a time in a random order. Every time a pattern $X$ is presented to the input layer of the network, the weights are modified by the following learning law:

\begin{displaymath}W_i^{new}=W_i^{old}+\eta(X-W_i^{old})u_i=W_i^{old}+\triangle W_i u_i \end{displaymath}

where

\begin{displaymath}u_i=\left\{ \begin{array}{ll} 1 & \mbox{if $y_i$\ is the winner} \\
0 & \mbox{otherwise} \end{array}
\right. \end{displaymath}


\begin{displaymath}\triangle W_i \stackrel{\triangle}{=}\eta(X-W_i^{old}) \end{displaymath}

and $0<\eta<1$ is the learning rate.

This learning law can now be written as

\begin{displaymath}\left\{ \begin{array}{ll}
\mbox{for the winner:} & W_j^{new}=(1-\eta)W_j^{old}+\eta X \\
\mbox{for all losers:} & W_i^{new}=W_i^{old}\;\;(i \neq j)
\end{array} \right. \end{displaymath}

We note that the new weight vector is still normalized:

\begin{displaymath}\sum_{i=1}^n w_{ij}^{new}=(1-\eta)\sum_{i=1}^n w_{ij}^{old}
+\eta \sum_{i=1}^n x_i=(1-\eta)+\eta=1
\end{displaymath}
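
As a quick numerical check (an illustrative example with made-up numbers, not from the text): take $n=2$, $\eta=0.5$, $W_j^{old}=(0.2,\;0.8)^T$ and $X=(0.6,\;0.4)^T$, both normalized. Then

\begin{displaymath}W_j^{new}=(1-0.5)\left(\begin{array}{c}0.2\\0.8\end{array}\right)
+0.5\left(\begin{array}{c}0.6\\0.4\end{array}\right)
=\left(\begin{array}{c}0.4\\0.6\end{array}\right) \end{displaymath}

whose components again sum to one.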

We see that the winner's weight vector is modified so that it moves closer to the current input vector $X$, while all other weight vectors remain unchanged. Since for an output node $y_j$ to win, its weight vector $W_j$ has to satisfy

\begin{displaymath}W_j^TX=\vert W_j\vert\,\vert X\vert\cos\phi > W_i^TX\;\;\;\mbox{for any $i\neq j$} \end{displaymath}

where $\phi$ is the angle between the two vectors $X$ and $W_j$; in other words, the distance between $X$ and $W_j$

\begin{displaymath}\vert W_j-X\vert^2=\vert W_j\vert^2+\vert X\vert^2-2\vert W_j\vert\,\vert X\vert\cos\phi \end{displaymath}

must be smaller than that between $X$ and any other $W_i$, we realize that the learning law always pulls the weight vector closest to the current input vector even closer to it, so that the corresponding winning node becomes more likely to win whenever a pattern similar to the current $X$ is presented in the future. The overall effect of such a learning process is to pull the weight vector of each output node toward the center of a cluster of similar input patterns, and that node will then win the competition whenever a pattern in the cluster is presented. If there exist $c$ clusters in the feature space, each will be represented by an output node. The remaining $m-c$ output nodes may never win and therefore do not represent any cluster.
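
The whole training procedure can likewise be sketched in Python (a minimal sketch under the stated assumptions; the data, parameter values, and names are illustrative only):

import numpy as np

rng = np.random.default_rng(0)

def normalize(v):
    # Scale so that components sum to one, matching the normalization assumption.
    return v / v.sum(axis=-1, keepdims=True)

def train(X, m, eta=0.1, epochs=50):
    # X: (num_patterns, n) normalized input patterns; m: number of output nodes.
    n = X.shape[1]
    W = normalize(rng.random((m, n)))          # random normalized initial weights
    for _ in range(epochs):
        for x in rng.permutation(X):           # present patterns in random order
            j = np.argmax(W @ x)               # winner: largest inner product W_j^T X
            W[j] = (1 - eta) * W[j] + eta * x  # pull winner toward X; losers unchanged
    return W

# Two made-up clusters of normalized 3-D patterns.
A = normalize(rng.random((20, 3)) + np.array([5.0, 0.0, 0.0]))
B = normalize(rng.random((20, 3)) + np.array([0.0, 0.0, 5.0]))
W = train(np.vstack([A, B]), m=4)   # with c=2 clusters, roughly m-c=2 nodes never win

After training, each winning node's weight vector sits near the center of one cluster, as described above.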


Ruye Wang 2002-12-09