
The Learning Law

Each time a training pattern pair $(X_p, Y_p)$ is presented ($X_p$ at the input layer, $Y_p$ as the desired output), all of the weights are modified in the following manner.

Summary of Back Propagation Training

The following steps apply to a single training pattern pair $(X_p, Y_p)$; a code sketch of the complete procedure follows the list.

  1. Apply $X_p=(x_{p1}, x_{p2}, ... , x_{pn})^t$ to the input nodes

  2. Compute net input to hidden nodes

    \begin{displaymath}
net_{pj}=\sum_{k=1}^{n} w_{jk}^{h} x_{pk}+T_j
\end{displaymath}

  3. Compute output from hidden nodes

    \begin{displaymath}
z_{pj}=f(net_{pj})
\end{displaymath}

  4. Compute net input to output nodes

    \begin{displaymath}
net_{pi}=\sum_{j=1}^{l} w_{ij}^{o} z_{pj} + T_i
\end{displaymath}

  5. Compute output from output nodes

    \begin{displaymath}
y'_{pi}=f(net_{pi})
\end{displaymath}

  6. Find the error terms for all output nodes (defined slightly differently from the error terms given previously)

    \begin{displaymath}
\delta_{pi}^{o}=f'(net_{pi}) (y_{pi}-y'_{pi}) \;\;\;(i=1,...,m)
\end{displaymath}

    where $Y_p=(y_{p1},y_{p2},...,y_{pm})^t$ is the desired output for $X_p$.
  7. Find the error terms for all hidden nodes (again defined slightly differently from the error terms given previously)

    \begin{displaymath}
\delta_{pj}^{h}=f'(net_{pj}) \sum_{i=1}^{m} \delta_{pi}^{o} w_{ij}^o
\;\;\;(j=1,...,l)
\end{displaymath}

  8. Update weights to output nodes

    \begin{displaymath}
w_{ij}^o \leftarrow w_{ij}^o+\eta \delta_{pi}^o z_{pj}
\end{displaymath}

  9. Update weights to hidden nodes

    \begin{displaymath}
w_{jk}^h \leftarrow w_{jk}^h+\eta \delta_{pj}^h x_{pk}
\end{displaymath}

  10. Compute the error for this pattern pair

    \begin{displaymath}
E_p=\frac{1}{2}\sum_{i=1}^{m} (y_{pi}-y'_{pi})^2
\end{displaymath}

When this error is acceptably small for all of the training pattern pairs, training can be discontinued.
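To make the ten steps concrete, here is a minimal NumPy sketch of one such training update. It assumes $f$ is the logistic sigmoid, so that $f'(net)=f(net)\,(1-f(net))$; the function name train_one_pattern, the learning rate $\eta=0.5$, and the toy dimensions in the example are illustrative choices, not part of the original text. As in the steps above, only the weights are updated; the thresholds $T_j$ and $T_i$ are left fixed.

\begin{verbatim}
import numpy as np

def sigmoid(net):
    # Logistic activation; its derivative is f'(net) = f(net) * (1 - f(net)).
    return 1.0 / (1.0 + np.exp(-net))

def train_one_pattern(x_p, y_p, W_h, T_h, W_o, T_o, eta=0.5):
    """One back propagation update for a single pattern pair (X_p, Y_p).

    x_p : input vector, shape (n,)
    y_p : desired output vector, shape (m,)
    W_h : hidden weights w_jk^h, shape (l, n);  T_h : thresholds T_j, shape (l,)
    W_o : output weights w_ij^o, shape (m, l);  T_o : thresholds T_i, shape (m,)
    Returns the updated weight matrices and the pattern error E_p.
    """
    # Steps 2-3: net input and output of the hidden nodes
    net_h = W_h @ x_p + T_h              # net_pj = sum_k w_jk^h x_pk + T_j
    z = sigmoid(net_h)                   # z_pj = f(net_pj)

    # Steps 4-5: net input and output of the output nodes
    net_o = W_o @ z + T_o                # net_pi = sum_j w_ij^o z_pj + T_i
    y_hat = sigmoid(net_o)               # y'_pi = f(net_pi)

    # Step 6: output-node error terms, using f'(net) = f(net)(1 - f(net))
    delta_o = y_hat * (1.0 - y_hat) * (y_p - y_hat)

    # Step 7: hidden-node error terms, back-propagated through W_o
    delta_h = z * (1.0 - z) * (W_o.T @ delta_o)

    # Steps 8-9: weight updates (thresholds left fixed, as in the text)
    W_o = W_o + eta * np.outer(delta_o, z)      # w_ij^o += eta delta_pi^o z_pj
    W_h = W_h + eta * np.outer(delta_h, x_p)    # w_jk^h += eta delta_pj^h x_pk

    # Step 10: squared error for this pattern
    E_p = 0.5 * np.sum((y_p - y_hat) ** 2)
    return W_h, W_o, E_p

# Illustrative usage on random data: repeat the update until E_p is small.
rng = np.random.default_rng(0)
n, l, m = 3, 4, 2                        # input, hidden, output layer sizes
W_h, T_h = rng.normal(size=(l, n)), np.zeros(l)
W_o, T_o = rng.normal(size=(m, l)), np.zeros(m)
x_p, y_p = rng.random(n), rng.random(m)
for _ in range(1000):
    W_h, W_o, E_p = train_one_pattern(x_p, y_p, W_h, T_h, W_o, T_o)
print(E_p)
\end{verbatim}

Vectorizing with NumPy replaces the explicit sums over $k$, $j$, and $i$ in steps 2, 4, and 7 with matrix-vector products, and the outer products in steps 8-9 update all weights of a layer at once.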

Ruye Wang 2002-12-09