Hebb's Learning

Donald Hebb (Canadian psychologist) speculated in 1949 that

“When neuron A repeatedly and persistently takes part in exciting neuron B, the synaptic connection from A to B will be strengthened.”
Simultaneous activation of neurons leads to pronounced increases in the synaptic strength between them. In other words, "Neurons that fire together, wire together; neurons that fire out of sync, fail to link." A Hebbian network can therefore be used as an associator that establishes an association between two sets of patterns $\{{\bf x}_k,\;\;k=1,\cdots,K \}$ and $\{{\bf y}_k,\;\;k=1,\cdots,K\}$.

Classical conditioning (Pavlov, 1927) can be explained by Hebbian learning:

$\displaystyle (F \rightarrow S) \;\Rightarrow\; (F \cap B \rightarrow S) \;\Rightarrow\; (B \rightarrow S)$

Here $F$ denotes the food (the unconditioned stimulus), $B$ the bell (the conditioned stimulus), and $S$ the salivation (the response). The synaptic connections between pattern $B$ and pattern $S$ are strengthened as the two are repeatedly excited simultaneously, until eventually $B$ alone can trigger $S$.

The structure

[Figure twolayernet.gif: the two-layer structure of the Hebbian network]

The Hebbian network is a supervised method with an input layer of $n$ nodes that take an input ${\bf x}=[x_1,\cdots,x_n]^T$ and an output layer of $m$ nodes that generate output ${\bf y}=[y_1,\cdots,y_m]^T$. Each output node $y_i$ is connected to each of the $n$ input nodes $x_j$ by a weight $w_{ij}$:

$\displaystyle y_i=\sum_{j=1}^n w_{ij} x_j\;\;\;\;\;(i=1,\cdots,m)$
In matrix form, we have

$\displaystyle {\bf y}_{m\times 1}={\bf W}_{m\times n}{\bf x}_{n\times 1}$
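As a minimal sketch of this forward computation (assuming NumPy; the dimensions and the input pattern below are arbitrary illustrative values):

```python
import numpy as np

# Illustrative sizes: n = 4 input nodes, m = 3 output nodes
n, m = 4, 3
W = np.zeros((m, n))                       # weight matrix W (m x n)
x = np.array([1.0, -1.0, 1.0, -1.0])       # input pattern x (length n)

y = W @ x                                  # output y = W x (length m)
print(y)                                   # all zeros here, since W has not been trained yet
```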

The learning law

$\displaystyle w_{ij}^{new}=w_{ij}^{old}+\eta x_j y_i\;\;\;\;(i=1,\cdots,m,\;j=1,\cdots,n)$
or in matrix form:
$\displaystyle {\bf W}^{new}={\bf W}^{old}+\eta {\bf y} {\bf x}^T$
Here $\eta$ is the learning rate, a parameter controlling how fast the weights are modified. The reasoning for this learning law is that when both $x_j$ and $y_i$ are high (activated), the weight $w_{ij}$ (synaptic connectivity) between them is enhanced according to Hebbian learning.
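A single update step, written as a short sketch (assuming NumPy; the value of $\eta$ and the two patterns are made up for illustration):

```python
import numpy as np

eta = 1.0                                   # learning rate
x = np.array([1.0, -1.0, 1.0, -1.0])        # input pattern x (n = 4)
y = np.array([1.0, 1.0, -1.0])              # output pattern y (m = 3)

W = np.zeros((3, 4))                        # initial weights
W = W + eta * np.outer(y, x)                # W_new = W_old + eta * y x^T
print(W)
```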

Training

If we assume the weights are initialized to $w_{ij}=0$, the learning rate is $\eta=1$, and each of the $K$ pairs of patterns $\{ ({\bf x}_k,{\bf y}_k),\;\;k=1,\cdots,K \}$ is presented once during training, we have

$\displaystyle w_{ij}=\sum_{k=1}^K x_j^{(k)}y_i^{(k)}\;\;\;\;(i=1,\cdots,m,\;\;j=1,\cdots,n)$
or in matrix form, the weight matrix is the sum of the outer-products of all $K$ pairs of patterns:
$\displaystyle {\bf W}_{m\times n}=\sum_{k=1}^K {\bf y}_k {\bf x}_k^T
=\sum_{k=1}^K \left[ \begin{array}{c} y_1^{(k)} \\ \vdots \\ y_m^{(k)} \end{array} \right]
[\, x_1^{(k)}, \cdots, x_n^{(k)} \,]$
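As a sketch of this training rule (assuming NumPy; the $K$ bipolar pattern pairs below are randomly generated purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
K, n, m = 5, 16, 8                           # illustrative numbers of pairs and nodes
X = rng.choice([-1.0, 1.0], size=(K, n))     # input patterns x_k, one per row
Y = rng.choice([-1.0, 1.0], size=(K, m))     # output patterns y_k, one per row

# W = sum_k y_k x_k^T, accumulated as a sum of outer products
W = np.zeros((m, n))
for xk, yk in zip(X, Y):
    W += np.outer(yk, xk)

# The same sum written as a single matrix product: W = Y^T X
assert np.allclose(W, Y.T @ X)
```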

Classification

When presented with one of the patterns ${\bf x}_l$, the network will produce the output:

$\displaystyle {\bf y}={\bf W}{\bf x}_l=\left(\sum_{k=1}^K {\bf y}_k {\bf x}_k^T\right){\bf x}_l
=\sum_{k=1}^K {\bf y}_k ({\bf x}_k^T{\bf x}_l)
={\bf y}_l({\bf x}_l^T{\bf x}_l)+\sum_{k\neq l}{\bf y}_k({\bf x}_k^T{\bf x}_l)$
To interpret the output pattern ${\bf y}$, we first consider the ideal case where the following two conditions are satisfied:

$\displaystyle {\bf x}_l^T{\bf x}_l=1$, i.e., every input pattern is normalized;

$\displaystyle {\bf x}_k^T{\bf x}_l=0\;\;(k\neq l)$, i.e., the input patterns are orthogonal to each other.

If these conditions are true, then the response of the network to ${\bf x}_l$ will be

$\displaystyle {\bf y}={\bf y}_l({\bf x}_l^T{\bf x}_l)+\sum_{k\neq l}{\bf y}_k({\bf x}_k^T{\bf x}_l)={\bf y}_l$
because ${\bf x}_l^T{\bf x}_l=1$ and all cross terms vanish: ${\bf x}_k^T {\bf x}_l=0\;\;(k \ne l)$. In other words, a one-to-one correspondence between ${\bf x}_k$ and ${\bf y}_k$ has been established for all $k=1,\cdots,K$. In non-ideal cases, the summation term (called cross talk) is non-zero and we have an error $\vert\vert{\bf y}-{\bf y}_l\vert\vert > 0$.
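The effect of these two conditions can be checked numerically. The sketch below (assuming NumPy, with made-up patterns) recalls ${\bf y}_l$ exactly when the inputs are orthonormal, and shows a non-zero cross-talk error when they are merely normalized but not orthogonal:

```python
import numpy as np

rng = np.random.default_rng(1)
K, n, m = 3, 8, 4

# Ideal case: orthonormal inputs, so x_k^T x_l = 0 (k != l) and x_l^T x_l = 1
Q, _ = np.linalg.qr(rng.standard_normal((n, K)))
X = Q.T                                      # rows are orthonormal input patterns
Y = rng.choice([-1.0, 1.0], size=(K, m))     # arbitrary output patterns
W = Y.T @ X                                  # W = sum_k y_k x_k^T

l = 1
print(np.linalg.norm(W @ X[l] - Y[l]))       # ~0: the stored y_l is recovered exactly

# Non-ideal case: normalized but non-orthogonal inputs give cross talk
X2 = rng.choice([-1.0, 1.0], size=(K, n)) / np.sqrt(n)
W2 = Y.T @ X2
print(np.linalg.norm(W2 @ X2[l] - Y[l]))     # > 0 in general: cross-talk error
```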

Although the two patterns ${\bf x}=B$ and ${\bf y}=S$ have no causal relationship ($S$ is actually caused by another pattern $F$), an association between them can still be established in the Hebbian network if they always appear simultaneously.