
The Gradient Descent Method

Given a 2-D function $f(x,y)$, the gradient descent method finds a point $(x_0, y_0)$ so that $f(x_0,y_0) \rightarrow \;\min$.

First consider the 1-D case. The update always moves $x^{new}$ in the direction opposite to that in which $f(x)$ increases:

\begin{displaymath}x^{new}=x^{old}-\eta\;\frac{df}{dx} \end{displaymath}

where $\eta$ is the step size, or the learning rate in network training.
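As an illustration (not part of the original notes), here is a minimal Python sketch of this 1-D update rule. The sample function $f(x)=(x-3)^2$, its derivative, the learning rate, and the number of steps are all assumptions chosen for the example:

\begin{verbatim}
# 1-D gradient descent on the sample function f(x) = (x - 3)^2,
# whose derivative is df/dx = 2*(x - 3); the minimum is at x = 3.
def gradient_descent_1d(df, x0, eta=0.1, steps=100):
    x = x0
    for _ in range(steps):
        x = x - eta * df(x)   # x_new = x_old - eta * df/dx
    return x

df = lambda x: 2.0 * (x - 3.0)
print(gradient_descent_1d(df, x0=0.0))   # converges toward 3.0
\end{verbatim}

If $\eta$ is chosen too large, the iterates can overshoot and diverge; if too small, convergence is slow.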

Then consider the 2-D case, where the derivative used in the 1-D case becomes the gradient

\begin{displaymath}{\bf G}=\bigtriangledown f(x,y)\stackrel{\triangle}{=}
\frac{\partial f}{\partial x} {\bf i}+\frac{\partial f}{\partial y} {\bf j}
=f_x {\bf i}+f_y {\bf j}
\end{displaymath}

Or, in vector form,

\begin{displaymath}G=[f_x, f_y]^T \end{displaymath}

Now the gradient descent update becomes

\begin{displaymath}X^{new}=X^{old}-\eta\;G \end{displaymath}

i.e.,

\begin{displaymath}\left\{ \begin{array}{l} x^{new}=x^{old}-\eta\,f_x \\
y^{new}=y^{old}-\eta\,f_y \end{array} \right.
\end{displaymath}
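The following Python sketch applies these component-wise updates; the sample function $f(x,y)=(x-1)^2+2(y+2)^2$, its partial derivatives, and all numerical settings are assumptions made only for illustration:

\begin{verbatim}
# 2-D gradient descent using the component-wise updates above.
# Sample function: f(x, y) = (x - 1)^2 + 2*(y + 2)^2, minimized at (1, -2).
def fx(x, y): return 2.0 * (x - 1.0)      # partial derivative f_x
def fy(x, y): return 4.0 * (y + 2.0)      # partial derivative f_y

x, y, eta = 0.0, 0.0, 0.1
for _ in range(200):
    # update x and y simultaneously using the old values of both
    x, y = x - eta * fx(x, y), y - eta * fy(x, y)
print(x, y)   # approaches (1.0, -2.0)
\end{verbatim}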

Obviously, the gradient descent method can be generalized to minimize high-dimensional functions $f(x_1,\cdots,x_n)$.
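A sketch of this general $n$-dimensional case is given below, assuming NumPy is available and the gradient is supplied as a function; the quadratic test function is only an example:

\begin{verbatim}
import numpy as np

# n-dimensional gradient descent: X_new = X_old - eta * G(X_old)
def gradient_descent(grad, x0, eta=0.1, steps=500):
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - eta * grad(x)
    return x

# Example: f(x) = sum((x - c)^2) with gradient 2*(x - c), minimized at c.
c = np.array([1.0, -2.0, 3.0])
grad = lambda x: 2.0 * (x - c)
print(gradient_descent(grad, x0=np.zeros(3)))   # approaches c
\end{verbatim}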



Ruye Wang 2002-12-09