
Hopfield Network

The Hopfield network (Hopfield 1982) is a recurrent network composed of a set of $n$ nodes that behaves as an auto-associator (content-addressable memory), with a set of $K$ patterns $\{ {\bf y}_k,\;\;\;\;k=1,\cdots,K\}$ stored in it. The network is first trained in a fashion similar to Hebbian learning, and then used as an auto-associator. When an incomplete or noisy version of one of the stored patterns is presented as the input, the complete pattern is generated as the output after some iterative computation.

[Figure: a recurrent network (recurrentnet1.png)]

Presented with a new input pattern (e.g., a noisy or incomplete version of one of the pre-stored patterns) ${\bf x}_k$:

\begin{displaymath}
{\bf x}_k=[x_1^{(k)},\cdots, x_n^{(k)}]^T\;\;\;\;\;\;\;\;x_i \in\{-1,\;1\}
\end{displaymath}

the network responds by iteratively updating its output

\begin{displaymath}
{\bf y}=[y_1,\cdots, y_n]^T\;\;\;\;\;\;\;\;\;\;y_i \in\{-1,\;1\}
\end{displaymath}

until convergence is reached, when the stored pattern that most closely resembles ${\bf x}_k$ is produced as the output.

The training process

The training process is essentially the same as Hebbian learning, except that here the two associated patterns in each pair are identical (self-association), i.e., ${\bf x}_k={\bf y}_k$, and the input and output patterns all have the same dimension $m=n$.

After the network is trained by Hebbian learning, its weight matrix is obtained as the sum of the outer products of the $K$ patterns to be stored:

\begin{displaymath}
{\bf W}_{n\times n}=\frac{1}{n}\sum_{k=1}^K {\bf y}_k {\bf y}_k^T
=\frac{1}{n}\sum_{k=1}^K
\left[ \begin{array}{c} y_1^{(k)} \\ \vdots \\ y_n^{(k)} \end{array} \right]
[ y_1^{(k)}, \cdots, y_n^{(k)} ]
\end{displaymath}

The weight connecting node $i$ and node $j$ is

\begin{displaymath}w_{ij}=\frac{1}{n}\sum_{k=1}^K y_i^{(k)} y_j^{(k)}=w_{ji} \end{displaymath}

We further assume that no self-connections exist:

\begin{displaymath}w_{ii}=0 \;\;\;\;\;(i=1,\cdots,n) \end{displaymath}
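
As an illustration, here is a minimal NumPy sketch of this training step. The function name train_hopfield and the representation of patterns as rows of a bipolar ($\pm 1$) array are assumptions made for this example, not part of the original formulation.

\begin{verbatim}
import numpy as np

def train_hopfield(patterns):
    """Hebbian training: W = (1/n) * sum_k y_k y_k^T, with zero diagonal.

    patterns: array of shape (K, n) with entries in {-1, +1}.
    """
    patterns = np.asarray(patterns, dtype=float)
    K, n = patterns.shape
    W = patterns.T @ patterns / n      # sum of outer products, scaled by 1/n
    np.fill_diagonal(W, 0.0)           # no self-connections: w_ii = 0
    return W
\end{verbatim}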

After the weight matrix ${\bf W}$ is obtained by the training process, the network can be used to recognize any input pattern representing a noisy or incomplete version of one of the pre-stored patterns. When an input pattern ${\bf x}$ is presented to the $n$ input nodes, the output nodes are updated iteratively and asynchronously; only one of the $n$ nodes, selected at random, is updated at a time:

\begin{displaymath}
x'_i=sgn \left(\sum_{j=1}^n w_{ij}x_j\right)=\left\{ \begin{array}{rl}
1 & \mbox{if}\;\;\;\sum_{j=1}^n w_{ij}x_j \ge 0 \\
-1 & \mbox{if}\;\;\;\sum_{j=1}^n w_{ij}x_j < 0 \end{array} \right.
\end{displaymath}

where $x'_i=x_i(t+1)$ represents the new output of the $i$th node and $x_i=x_i(t)$ the old one during the iteration. We will now show that the iteration always converges, typically producing one of the stored patterns as the output.
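
A minimal sketch of this asynchronous update, under the same assumptions as the training sketch above. Here each sweep updates every node once in a random order (a common variant of picking a single random node at a time), and $sgn(0)$ is taken as $+1$, matching the rule above.

\begin{verbatim}
import numpy as np

def recall(W, x, max_sweeps=100, rng=None):
    """Iterate x_i' = sgn(sum_j w_ij x_j), one node at a time,
    until no node changes state within a full sweep."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float).copy()
    for _ in range(max_sweeps):
        changed = False
        for i in rng.permutation(len(x)):          # random update order
            new_xi = 1.0 if W[i] @ x >= 0 else -1.0
            if new_xi != x[i]:
                x[i] = new_xi
                changed = True
        if not changed:                            # fixed point reached
            break
    return x
\end{verbatim}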

Energy function

We define the energy associated with a pair of nodes $i$ and $j$ as

\begin{displaymath}e_{ij}\stackrel{\triangle}{=}-w_{ij}x_ix_j \end{displaymath}

and the interaction between these two nodes can be summarized by this table


\begin{displaymath}\begin{tabular}{ r \vert r \vert\vert c \vert c} \hline
$x_i$ & $x_j$ & $w_{ij}>0$ & $w_{ij}<0$ \\ \hline
-1 & -1 & $e_{ij}<0$ & $e_{ij}>0$ \\
-1 & 1 & $e_{ij}>0$ & $e_{ij}<0$ \\
1 & -1 & $e_{ij}>0$ & $e_{ij}<0$ \\
1 & 1 & $e_{ij}<0$ & $e_{ij}>0$ \\ \hline
\end{tabular} \end{displaymath}

Two observations can be made from the table: if $w_{ij}>0$, the energy $e_{ij}$ is low when $x_i$ and $x_j$ have the same sign; if $w_{ij}<0$, the energy is low when they have opposite signs. In other words, a low energy level of $e_{ij}$ corresponds to a stable interaction between $x_i$ and $x_j$, and a high energy level corresponds to an unstable interaction. We next define the total energy function $E({\bf x})$ of all $n$ nodes in the network as the sum of all the pair-wise energies:

\begin{displaymath}
E({\bf x}) \stackrel{\triangle}{=} \sum_{i>j} e_{ij}=-\sum_{i>j} w_{ij} x_i x_j
\stackrel{w_{ij}=w_{ji}}{=} -\frac{1}{2}\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} x_i x_j
\end{displaymath}

Note that the last equality is due to $w_{ij}=w_{ji}$ (together with $w_{ii}=0$). Again, a lower value of the energy function corresponds to a more stable state of the network, and vice versa.
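
In code, the total energy is a single quadratic form; a small sketch reusing the conventions of the earlier examples:

\begin{verbatim}
import numpy as np

def energy(W, x):
    """E(x) = -1/2 * sum_i sum_j w_ij x_i x_j (w_ii = 0, so i = j terms add nothing)."""
    x = np.asarray(x, dtype=float)
    return -0.5 * x @ W @ x
\end{verbatim}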

The iteration and convergence

Now we show that the total energy $E({\bf x})$ never increases whenever the state of any node changes. Assume $x_k$ has just been updated, i.e., $x'_k \neq x_k$ ($x_k=\pm 1$ but $x'_k=\mp 1$), while all other nodes remain the same, $x'_l = x_l$ for $l \neq k$. The energy before $x_k$ changes state is

\begin{displaymath}
E({\bf x})= -\frac{1}{2}\left[ \sum_{i\neq k}\sum_{j\neq k}w_{ij}x_ix_j
+\sum_{i} w_{ik}x_ix_k +\sum_{j} w_{kj}x_kx_j \right]
= -\frac{1}{2}\sum_{i\neq k}\sum_{j\neq k}w_{ij}x_ix_j
- \sum_i w_{ik}x_ix_k
\end{displaymath}

and the energy after $x_k$ changes state is

\begin{displaymath}E'({\bf x})= -\frac{1}{2} \sum_{i\neq k}\sum_{j\neq k}w_{ij}x_ix_j
- \sum_i w_{ik}x_ix'_k
\end{displaymath}

The energy difference is

\begin{displaymath}\bigtriangleup E\stackrel{\triangle}{=}E'({\bf x})-E({\bf x})
=-(x'_k-x_k) \sum_i w_{ik}x_i=(x_k-x'_k) \sum_i w_{ik}x_i
\end{displaymath}

Consider two cases. If $\sum_i w_{ik}x_i \ge 0$, then $x'_k=1$; since the node has changed state, $x_k=-1$, so $x_k-x'_k=-2$ and $\bigtriangleup E \leq 0$. If $\sum_i w_{ik}x_i < 0$, then $x'_k=-1$ and $x_k=1$, so $x_k-x'_k=2$ and $\bigtriangleup E < 0$. As $\bigtriangleup E({\bf x}) \leq 0$ holds throughout the iteration, and $E({\bf x})$ is bounded, we conclude that the energy function $E({\bf x})$ will eventually reach one of the minima of the ``energy landscape'' and, therefore, the iteration will always converge.
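
The monotone behavior can be checked numerically by recording $E({\bf x})$ after every accepted state change; a sketch reusing the assumed recall and energy helpers defined above:

\begin{verbatim}
import numpy as np

def recall_with_energy_trace(W, x, max_sweeps=100, rng=None):
    """Like recall(), but records E(x) after every accepted state change."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float).copy()
    trace = [energy(W, x)]
    for _ in range(max_sweeps):
        changed = False
        for i in rng.permutation(len(x)):
            new_xi = 1.0 if W[i] @ x >= 0 else -1.0
            if new_xi != x[i]:
                x[i] = new_xi
                trace.append(energy(W, x))   # never larger than the previous entry
                changed = True
        if not changed:
            break
    return x, trace
\end{verbatim}

In any such run the recorded energies form a non-increasing sequence, in agreement with $\bigtriangleup E \leq 0$ above.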

[Figures: energy landscape (energylandscape.gif); attractors (attractors.gif)]

Retrieval of stored patterns

Here we show that the $K$ pre-stored patterns correspond to minima of the energy function. First recall that the weights of the network were obtained by Hebbian learning:

\begin{displaymath}w_{ij}=\frac{1}{n}\sum_{k=1}^K y_i^{(k)} y_j^{(k)} \end{displaymath}

The energy function $E({\bf x})$ can now be written as
\begin{eqnarray*}
E({\bf x}) &=& -\frac{1}{2}\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} x_i x_j
= -\frac{1}{2n}\sum_{i=1}^{n} \sum_{j=1}^{n} \left[\sum_{k=1}^K y_i^{(k)} y_j^{(k)}\right] x_i x_j
= -\frac{1}{2n}\sum_{k=1}^K \left[\sum_i \sum_j y_i^{(k)} x_i\, y_j^{(k)} x_j\right] \\
&=& -\frac{1}{2n}\sum_{k=1}^K \left[ \left(\sum_i y_i^{(k)} x_i\right)\left(\sum_j y_j^{(k)} x_j\right)\right]
= -\frac{1}{2n}\sum_{k=1}^K \left(\sum_i y_i^{(k)} x_i\right)^2
= -\frac{1}{2n}\sum_{k=1}^K \left({\bf y}_k^T {\bf x}\right)^2
\end{eqnarray*}

If ${\bf x}$ is different from all of the stored patterns, each of the $K$ terms $\left({\bf y}_k^T {\bf x}\right)^2$ contributes to the sum only moderately. However, if ${\bf x}$ is the same as one of the stored patterns, the corresponding inner product reaches its maximum value $n$, causing the total energy to reach one of its minima. In other words, the patterns stored in the net correspond to local minima of the energy function, i.e., these patterns become attractors.

Note that it is possible to have other local minima, called spurious states, that do not represent any of the stored patterns; i.e., the associative memory is not perfect.
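
A toy end-to-end run under the same assumptions (the number of patterns, their length, and the number of flipped components are chosen arbitrarily for illustration, reusing the train_hopfield, recall, and energy sketches above):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

# Store K = 2 arbitrary bipolar patterns of length n = 16
patterns = rng.choice([-1.0, 1.0], size=(2, 16))
W = train_hopfield(patterns)

# Present a corrupted version of the first pattern (3 components flipped)
noisy = patterns[0].copy()
noisy[:3] *= -1

restored = recall(W, noisy, rng=rng)
print(np.array_equal(restored, patterns[0]))        # usually True when K << n
print(energy(W, patterns[0]) <= energy(W, noisy))   # stored pattern has lower energy
\end{verbatim}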

[Figure: Hopfield network example (hopfieldexample.gif)]


Ruye Wang 2015-08-13