The Hopfield network has only a single layer of $n$ nodes. It behaves as an auto-associator (content-addressable memory) with a set of $K$ $n$-dimensional bipolar vector patterns $\{\mathbf{x}^{(k)}=[x^{(k)}_1,\cdots,x^{(k)}_n]^T,\;k=1,\cdots,K\}$ stored in it. This is a supervised method which is first trained and then used as an auto-associator. The network is recurrent, in the sense that the outputs of the nodes are fed back as inputs in an iterative process. When an incomplete or noisy version of one of the stored patterns is presented as the input, the complete pattern is generated as the output after some iterations.
During the classification stage, after the network has been trained by Hebbian learning, the network can take an input pattern $\mathbf{y}$, a noisy or incomplete version of one of the pre-stored patterns $\mathbf{x}^{(k)}$, and respond to it by iteratively updating its output until convergence is reached, when the stored pattern that most closely resembles the input is produced as the output.
Training:
The training process is essentially the same as Hebbian learning, except that here the two associated patterns in each pair are the same (self-association), i.e., $\mathbf{x}^{(k)}\rightarrow\mathbf{x}^{(k)}$, and the input and output patterns all have the same dimension $n$.

After the network is trained by Hebbian learning, its weight matrix is obtained as the sum of the outer-products of the $K$ patterns to be stored:

$$\mathbf{W}=\sum_{k=1}^{K}\mathbf{x}^{(k)}\mathbf{x}^{(k)T}$$

i.e., the weight connecting node $i$ and node $j$ is

$$w_{ij}=\sum_{k=1}^{K}x^{(k)}_i x^{(k)}_j=w_{ji}$$

We assume no self-connection exists, i.e., $w_{ii}=0$ for all $i=1,\cdots,n$.
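As a concrete illustration of this training step, here is a minimal NumPy sketch; the function name hopfield_train and the example patterns are illustrative choices, not part of the notes above.

```python
import numpy as np

def hopfield_train(patterns):
    """Hebbian training: sum of the outer products of the stored patterns,
    with the diagonal zeroed so that no node connects to itself."""
    patterns = np.asarray(patterns)      # shape (K, n), entries in {-1, +1}
    W = patterns.T @ patterns            # W[i, j] = sum_k x_i^(k) * x_j^(k)
    np.fill_diagonal(W, 0)               # enforce w_ii = 0 (no self-connection)
    return W

# Example: store two 4-dimensional bipolar patterns.
X = np.array([[ 1, -1,  1, -1],
              [ 1,  1, -1, -1]])
W = hopfield_train(X)
print(W)                                 # symmetric matrix with zero diagonal
```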
Classification:
When the weight matrix $\mathbf{W}$ has been obtained by the training process, the network can be used to recognize an input pattern representing some noisy or incomplete version of one of those pre-stored patterns. When a pattern $\mathbf{y}$ is presented to the input nodes, the output nodes are updated iteratively and asynchronously. Only one of the $n$ nodes, randomly selected, is updated at a time:

$$y_i^{new}=\mathrm{sgn}\left(\sum_{j=1}^{n}w_{ij}\,y_j^{old}\right)$$

where $y_i^{new}$ represents the new output of the $i$th node while $y_j^{old}$ represents the old output of the $j$th node during the iteration. We now show that the iteration will always converge to produce one of the stored patterns as the output.
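A sketch of this asynchronous recall procedure, under the same assumptions as the training sketch above; the helper name hopfield_recall, the tie-breaking of the sign function toward +1, and the stopping criterion (a full random-order sweep with no state change) are illustrative choices.

```python
import numpy as np

def hopfield_recall(W, y0, max_sweeps=100, seed=None):
    """Asynchronous recall: repeatedly pick one node and set
    y_i = sgn(sum_j w_ij * y_j) until no node changes state."""
    rng = np.random.default_rng(seed)
    y = np.array(y0, dtype=int).copy()
    n = len(y)
    for _ in range(max_sweeps):
        changed = False
        for i in rng.permutation(n):     # visit all nodes once, in random order
            h = W[i] @ y                  # net input to node i
            new = 1 if h >= 0 else -1     # sgn, ties broken toward +1
            if new != y[i]:
                y[i] = new
                changed = True
        if not changed:                   # a full sweep with no flips: converged
            break
    return y
```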
Energy function
We define the energy associated with a pair of nodes $i$ and $j$ as

$$e_{ij}=-w_{ij}\,y_i\,y_j$$

and the interaction between these two nodes can be summarized by this table:

  y_i   y_j   w_{ij} > 0              w_{ij} < 0
  +1    +1    e_{ij} < 0  (stable)    e_{ij} > 0  (unstable)
  +1    -1    e_{ij} > 0  (unstable)  e_{ij} < 0  (stable)
  -1    +1    e_{ij} > 0  (unstable)  e_{ij} < 0  (stable)
  -1    -1    e_{ij} < 0  (stable)    e_{ij} > 0  (unstable)
Two observations:
- Whenever the two nodes $i$ and $j$ are reinforcing each other's state, the energy $e_{ij}$ is negative. For example, in the first and last rows, where $y_i=y_j$, a positive weight $w_{ij}>0$ will help keep $y_i$ at the same state $y_i=y_j$ in the iteration above, and the energy $e_{ij}=-w_{ij}y_iy_j<0$ is negative. Also, in the two middle rows, where $y_i=-y_j$, a negative weight $w_{ij}<0$ will help keep $y_i$ at the same state $y_i=-y_j$, and the energy $e_{ij}$ is again negative.
- Whenever the two nodes $i$ and $j$ are trying to change each other's state, the energy is positive. For example, in the first and last rows, where $y_i=y_j$, a negative weight $w_{ij}<0$ will tend to reverse $y_i$ from its previous state $y_i=y_j$ to $y_i=-y_j$, and the energy $e_{ij}=-w_{ij}y_iy_j>0$ is positive. Also, in the two middle rows, where $y_i=-y_j$, a positive weight $w_{ij}>0$ will tend to reverse $y_i$ from its previous state $y_i=-y_j$ to $y_i=y_j$, and the energy $e_{ij}$ is again positive.
In summary, a low energy level of $e_{ij}$ corresponds to a stable interaction between $y_i$ and $y_j$, i.e., they tend to remain unchanged, while a high energy level corresponds to an unstable interaction, i.e., they tend to change their states.
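The sign pattern in the table can be verified mechanically by enumerating the four state combinations for a weight of either sign; this small loop is only a sanity check of the observations above.

```python
# Enumerate e_ij = -w_ij * y_i * y_j for the four state combinations and
# both weight signs, reproducing the sign pattern of the table above.
for w in (+1.0, -1.0):                   # a representative positive / negative weight
    for yi in (+1, -1):
        for yj in (+1, -1):
            e = -w * yi * yj
            label = "stable" if e < 0 else "unstable"
            print(f"w_ij={w:+.0f}  y_i={yi:+d}  y_j={yj:+d}  e_ij={e:+.0f}  ({label})")
```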
We next define the total energy function $E$ of all $n$ nodes in the network as the sum of all pair-wise energies:

$$E=\sum_{i}\sum_{j<i}e_{ij}=-\sum_{i}\sum_{j<i}w_{ij}\,y_i\,y_j=-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}w_{ij}\,y_i\,y_j$$

Note that the last equality is due to $w_{ij}=w_{ji}$ and $w_{ii}=0$. Again, a lower energy level corresponds to a more stable condition of the network, and vice versa.
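A direct transcription of this energy function as a sketch (the name hopfield_energy is illustrative); the factor 1/2 accounts for each unordered pair being counted twice in the double sum.

```python
import numpy as np

def hopfield_energy(W, y):
    """Total energy E = -1/2 * sum_i sum_j w_ij * y_i * y_j
    (each unordered pair counted once, thanks to the 1/2 and w_ii = 0)."""
    y = np.asarray(y, dtype=float)
    return -0.5 * y @ W @ y
```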
The iteration and convergence
Now we show that the total energy $E$ always decreases whenever the state of any node changes. Assume node $i$ has just changed its state, i.e., $y_i^{new}\ne y_i^{old}$ (in fact $y_i^{new}=-y_i^{old}$), while all other nodes remain the same ($y_j^{new}=y_j^{old}$ for all $j\ne i$). The energy before node $i$ changes state is

$$E^{old}=-y_i^{old}\sum_{j\ne i}w_{ij}\,y_j+E_0$$

where $E_0$ collects all terms not involving $y_i$, and the energy after node $i$ changes state is

$$E^{new}=-y_i^{new}\sum_{j\ne i}w_{ij}\,y_j+E_0$$

The energy difference is

$$\Delta E=E^{new}-E^{old}=-\left(y_i^{new}-y_i^{old}\right)\sum_{j\ne i}w_{ij}\,y_j$$
Consider two cases:
- Case 1: if $y_i^{old}=-1$ but $y_i^{new}=\mathrm{sgn}\left(\sum_{j\ne i}w_{ij}y_j\right)=1$, then $\sum_{j\ne i}w_{ij}y_j>0$, and we have $y_i^{new}-y_i^{old}=2>0$ and $\Delta E<0$.
- Case 2: if $y_i^{old}=1$ but $y_i^{new}=\mathrm{sgn}\left(\sum_{j\ne i}w_{ij}y_j\right)=-1$, then $\sum_{j\ne i}w_{ij}y_j<0$, and we have $y_i^{new}-y_i^{old}=-2<0$ and $\Delta E<0$.
Since $\Delta E<0$ holds in either case throughout the iteration, and the magnitude of $E$ is finite (it is bounded from below), we conclude that the energy function $E$ will eventually reach one of the minima of the “energy landscape” and, therefore, the iteration will always converge.
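The monotone decrease of the energy can also be checked numerically. The following self-contained sketch stores a few random bipolar patterns, runs asynchronous updates from a random initial state, and records the energy after every update; the problem sizes, the seed, and the tie-breaking toward +1 are illustrative assumptions.

```python
import numpy as np

def energy(W, y):
    return -0.5 * y @ W @ y

rng = np.random.default_rng(0)
n, K = 30, 3
X = rng.choice([-1, 1], size=(K, n))     # random bipolar patterns to store
W = X.T @ X
np.fill_diagonal(W, 0)

y = rng.choice([-1, 1], size=n)          # random initial state
energies = [energy(W, y)]
for _ in range(10 * n):                  # asynchronous updates, one node at a time
    i = rng.integers(n)
    y[i] = 1 if W[i] @ y >= 0 else -1
    energies.append(energy(W, y))

# The energy never increases along the iteration (Delta E <= 0 at every step).
assert all(e2 <= e1 + 1e-9 for e1, e2 in zip(energies, energies[1:]))
print(energies[0], "->", energies[-1])
```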
Retrieval of stored patterns
Now we show that the pre-stored patterns correspond to the minima of the energy function. First recall that the weights of the network are obtained by Hebbian learning:

$$w_{ij}=\sum_{k=1}^{K}x^{(k)}_i x^{(k)}_j$$

The energy function $E$ can then be written as

$$E=-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}w_{ij}\,y_i\,y_j
=-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{K}x^{(k)}_i x^{(k)}_j\,y_i\,y_j
=-\frac{1}{2}\sum_{k=1}^{K}\left(\sum_{i=1}^{n}x^{(k)}_i y_i\right)^2
=-\frac{1}{2}\sum_{k=1}^{K}\left(\mathbf{x}^{(k)T}\mathbf{y}\right)^2$$

(here the diagonal terms are included; setting $w_{ii}=0$ only shifts $E$ by a constant and does not move its minima). If $\mathbf{y}$ is different from (ideally orthogonal to) all of the stored patterns, each of the $K$ terms of the summation contributes to the sum only moderately (ideally, if $\mathbf{y}$ is orthogonal to all patterns, all terms are zero). However, if $\mathbf{y}$ is the same as one of the stored patterns, their inner product reaches its maximum value $n$, causing the total energy to drop to one of its minima. In other words, the patterns stored in the net correspond to the local minima of the energy function, i.e., these patterns become attractors.
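This rewriting of the energy as a sum of squared inner products can be checked numerically; in the sketch below (illustrative sizes and seed), the constant $Kn/2$ added to the inner-product form compensates for the zeroed diagonal, which shifts the energy without moving its minima.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 20, 4
X = rng.choice([-1, 1], size=(K, n))               # stored bipolar patterns
W = X.T @ X
np.fill_diagonal(W, 0)

y = rng.choice([-1, 1], size=n)                    # arbitrary network state
E_direct = -0.5 * y @ W @ y                        # -1/2 sum_ij w_ij y_i y_j
E_inner = -0.5 * np.sum((X @ y) ** 2) + 0.5 * K * n  # -1/2 sum_k (x^(k)T y)^2, shifted
print(E_direct, E_inner)                           # the two values agree

# With y equal to a stored pattern, the inner product reaches its maximum n,
# so that pattern sits at a (local) minimum of the energy.
print(-0.5 * np.sum((X @ X[0]) ** 2) + 0.5 * K * n)  # energy at stored pattern X[0]
```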
Note that it is possible to have other local minima, called spurious
states, which do not represent any of the stored patterns, i.e., the
associative memory is not perfect.
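Putting the pieces together, the following end-to-end sketch (with illustrative sizes and seed) corrupts one stored pattern by a few bit flips and runs asynchronous recall; it usually restores the original pattern, though with too many or strongly correlated stored patterns the iteration may instead settle into a spurious state.

```python
import numpy as np

rng = np.random.default_rng(2)
n, K = 64, 3
X = rng.choice([-1, 1], size=(K, n))     # patterns to store
W = X.T @ X
np.fill_diagonal(W, 0)

# Corrupt one stored pattern by flipping a handful of components.
y = X[0].copy()
flip = rng.choice(n, size=6, replace=False)
y[flip] *= -1

# Asynchronous recall until a full sweep produces no change.
for _ in range(100):
    changed = False
    for i in rng.permutation(n):
        new = 1 if W[i] @ y >= 0 else -1
        if new != y[i]:
            y[i], changed = new, True
    if not changed:
        break

print("recovered stored pattern:", np.array_equal(y, X[0]))
```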