Newton's method

We first consider Newton's method applied to minimizing a univariate function $f(x)$. The Taylor series expansion (see the section on the Taylor series) of the function $f(x)$ around a point $x_n$ is:

\[
f(x) = f(x_n) + f'(x_n)(x - x_n) + \frac{1}{2} f''(x_n)(x - x_n)^2 + \cdots
\qquad (8)
\]

where the first and second order derivatives $f'(x_n)$ and $f''(x_n)$ represent respectively the local rate of change and the curvature of the function.

Figure 2.1: Newton's Method (figures/NewtonEx1.eps)

If $f(x)$ is itself a quadratic function, its Taylor series contains only the first three terms (constant, linear, and quadratic). In this case, we can find the vertex point at which $f(x)$ reaches its extremum by first setting its derivative to zero:

\[
f'(x) = f'(x_n) + f''(x_n)(x - x_n) = 0
\qquad (9)
\]

and then solving the resulting equation to get:

\[
x = x_n - \frac{f'(x_n)}{f''(x_n)} = x_n + \Delta x
\qquad (10)
\]

where $\Delta x = -f'(x_n)/f''(x_n)$ is the step we need to take to move from any initial point $x_n$ to the solution $x$ in a single step. Note that the extremum at $x$ is a minimum if $f''(x_n) > 0$ but a maximum if $f''(x_n) < 0$.

If $f(x)$ is not quadratic, the result above can still be considered as an approximation of the solution, which can be improved iteratively from an initial guess $x_0$ to eventually approach the solution:

\[
x_{n+1} = x_n + \Delta x_n = x_n - \frac{f'(x_n)}{f''(x_n)},
\qquad n = 0, 1, 2, \ldots
\qquad (11)
\]

where $\Delta x_n = -f'(x_n)/f''(x_n)$. We see that in each step $n$ of the iteration, the function $f(x)$ is fitted by a quadratic function whose vertex at $x_{n+1}$ is used as the updated approximate solution, at which the function is again fitted by another quadratic function for the next iteration. Through this process the solution can be approached as $n \rightarrow \infty$.

We note that the iteration $x_{n+1} = x_n - f'(x_n)/f''(x_n)$ is just the Newton-Raphson method (see the chapter on solving equations) applied to solving the equation $f'(x) = 0$, the necessary condition for an extremum of $f(x)$.
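To make the one-dimensional iteration in Eq. (11) concrete, the following is a minimal sketch in Python; the function names, the sample function, and the stopping tolerance are illustrative assumptions rather than part of the text.

\begin{verbatim}
def newton_minimize_1d(df, d2f, x0, tol=1e-10, max_iter=100):
    """Minimize a univariate function from its first and second
    derivatives df and d2f, starting at the initial guess x0."""
    x = x0
    for _ in range(max_iter):
        step = -df(x) / d2f(x)       # Newton step: -f'(x_n)/f''(x_n)
        x += step
        if abs(step) < tol:          # stop when the update is negligible
            break
    return x

# Illustrative use on f(x) = x^4 - 3x^2 + 2, whose local minimum near
# x = 1 is at x = sqrt(3/2):
x_min = newton_minimize_1d(lambda x: 4*x**3 - 6*x,
                           lambda x: 12*x**2 - 6, x0=1.0)
\end{verbatim}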

Newton's method for the minimization of a single-variable function $f(x)$ can be generalized to the minimization of a multivariable function $f({\bf x})$ of ${\bf x} = [x_1, \ldots, x_N]^T$. Again, we approximate $f({\bf x})$ by a quadratic function containing only the first three terms of its Taylor series at some initial point ${\bf x}_n$:

\[
f({\bf x}) \approx f({\bf x}_n) + {\bf g}_n^T ({\bf x} - {\bf x}_n)
 + \frac{1}{2} ({\bf x} - {\bf x}_n)^T {\bf H}_n ({\bf x} - {\bf x}_n)
\qquad (12)
\]

where ${\bf g}_n$ and ${\bf H}_n$ are respectively the gradient vector and Hessian matrix of the function $f({\bf x})$ at ${\bf x}_n$:

\[
{\bf g}_n = \nabla f({\bf x}) \Big|_{{\bf x}={\bf x}_n}
 = \left[ \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_N} \right]^T \Bigg|_{{\bf x}={\bf x}_n},
\qquad
{\bf H}_n = \left[ \frac{\partial^2 f}{\partial x_i \, \partial x_j} \right] \Bigg|_{{\bf x}={\bf x}_n}
\qquad (13)
\]
The stationary point of this quadratic approximation can be found from any initial guess ${\bf x}_n$ by setting its gradient to zero:

\[
\nabla f({\bf x}) \approx {\bf g}_n + {\bf H}_n ({\bf x} - {\bf x}_n) = {\bf 0}
\qquad (14)
\]

and solving the resulting equation to get

\[
{\bf x} = {\bf x}_n - {\bf H}_n^{-1} {\bf g}_n
\qquad (15)
\]
Similar to the single-variable case, in which the stationary point is either a minimum or a maximum depending on whether the second order derivative $f''(x_n)$ is greater or smaller than zero, here in the multivariable case the function value $f({\bf x})$ at the stationary point is a maximum, minimum, or saddle point, depending on the curvature of the function represented by the Hessian matrix ${\bf H}$ evaluated at that point:
  • If ${\bf H}$ is positive definite (all eigenvalues are positive), then $f({\bf x})$ is convex and the stationary point is the global minimum;
  • If ${\bf H}$ is negative definite (all eigenvalues are negative), then $f({\bf x})$ is concave and the stationary point is the global maximum;
  • If ${\bf H}$ is indefinite (with both positive and negative eigenvalues), then $f({\bf x})$ is neither convex nor concave, and the stationary point is a saddle point (a maximum in some directions, but a minimum in others).
If $f({\bf x})$ is quadratic and convex, the optimization problem is called convex programming, and its minimum can be easily obtained. (Equivalently, if $f({\bf x})$ is quadratic and concave, its maximum can be easily obtained.)
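As a small numerical illustration of this classification (a sketch with assumed example Hessians, not matrices taken from the text), the eigenvalues of ${\bf H}$ can be inspected directly:

\begin{verbatim}
import numpy as np

def classify_stationary_point(H):
    """Classify a stationary point from the eigenvalues of the Hessian H."""
    eigvals = np.linalg.eigvalsh(H)         # H is symmetric: use eigvalsh
    if np.all(eigvals > 0):
        return "minimum (H positive definite)"
    if np.all(eigvals < 0):
        return "maximum (H negative definite)"
    return "saddle point or degenerate (H not definite)"

# Assumed example Hessians:
print(classify_stationary_point(np.array([[2.0, 0.0], [0.0, 3.0]])))   # minimum
print(classify_stationary_point(np.array([[2.0, 0.0], [0.0, -3.0]])))  # saddle point
\end{verbatim}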

But if $f({\bf x})$ is not quadratic, the result above is not a minimum or maximum, but it can still be used as an approximation of the solution, which can be improved iteratively to approach the solution:

\[
{\bf x}_{n+1} = {\bf x}_n + \Delta {\bf x}_n = {\bf x}_n - {\bf H}_n^{-1} {\bf g}_n
\qquad (16)
\]

where ${\bf g}_n = {\bf g}({\bf x}_n)$, ${\bf H}_n = {\bf H}({\bf x}_n)$, and the increment $\Delta {\bf x}_n = -{\bf H}_n^{-1} {\bf g}_n$ is called the Newton search direction. We note that this iteration is just a generalization of $x_{n+1} = x_n - f'(x_n)/f''(x_n)$ in the 1-D case, where ${\bf g}_n$ plays the role of $f'(x_n)$ and ${\bf H}_n$ that of $f''(x_n)$; it is also the Newton-Raphson method (see the chapter on solving equations) applied to solving the equation ${\bf g}({\bf x}) = \nabla f({\bf x}) = {\bf 0}$, the necessary condition for an extremum of $f({\bf x})$. The computational complexity of each iteration is $O(N^3)$ due to the inverse operation ${\bf H}_n^{-1}$.
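A minimal sketch of the multivariate iteration in Eq. (16) is given below, assuming the gradient and Hessian are available as callables; the names and the stopping tolerance are illustrative, and the linear system ${\bf H}_n \Delta{\bf x}_n = -{\bf g}_n$ is solved instead of forming the inverse explicitly.

\begin{verbatim}
import numpy as np

def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=100):
    """Newton's method: x_{n+1} = x_n - H_n^{-1} g_n, where the step is
    obtained by solving H_n dx = -g_n rather than inverting H_n."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)                   # gradient vector at x_n
        H = hess(x)                   # Hessian matrix at x_n
        dx = np.linalg.solve(H, -g)   # Newton search direction
        x = x + dx
        if np.linalg.norm(dx) < tol:  # converged: negligible step
            break
    return x
\end{verbatim}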

Example 1:

An over-constrained nonlinear equation system ${\bf f}({\bf x}) = {\bf 0}$ of $M$ equations and $N < M$ unknowns can be solved by minimizing the following sum of squared errors (SSE):

\[
\varepsilon({\bf x}) = \frac{1}{2} \| {\bf f}({\bf x}) \|^2
 = \frac{1}{2} \sum_{m=1}^M f_m^2({\bf x})
\qquad (17)
\]

To use the Newton-Raphson method, we first find the gradient of the error function:

\[
{\bf g}({\bf x}) = \nabla \varepsilon({\bf x}) = {\bf J}^T({\bf x}) \, {\bf f}({\bf x})
\qquad (18)
\]

of which the $i$th component is:

\[
g_i = \frac{\partial \varepsilon({\bf x})}{\partial x_i}
 = \sum_{m=1}^M f_m({\bf x}) \, \frac{\partial f_m({\bf x})}{\partial x_i}
 = \sum_{m=1}^M f_m({\bf x}) \, J_{mi}
\qquad (19)
\]

where $J_{mi} = \partial f_m({\bf x})/\partial x_i$ is the component of the function's Jacobian ${\bf J}({\bf x})$ in the $m$th row and $i$th column.

We further find the component of the Hessian ${\bf H}({\bf x})$ of the error function in the $i$th row and $j$th column:

\[
H_{ij} = \frac{\partial^2 \varepsilon({\bf x})}{\partial x_i \, \partial x_j}
 = \sum_{m=1}^M \left[ \frac{\partial f_m({\bf x})}{\partial x_j} \,
   \frac{\partial f_m({\bf x})}{\partial x_i}
 + f_m({\bf x}) \, \frac{\partial^2 f_m({\bf x})}{\partial x_i \, \partial x_j} \right]
 \approx \sum_{m=1}^M J_{mj} \, J_{mi}
\qquad (20)
\]

Here $J_{mj} = \partial f_m({\bf x})/\partial x_j$ is the component of the Jacobian ${\bf J}({\bf x})$ in the $m$th row and $j$th column, and we have dropped the second order terms $f_m({\bf x}) \, \partial^2 f_m({\bf x})/\partial x_i \partial x_j$. Now the Hessian matrix can be written as

\[
{\bf H}({\bf x}) \approx {\bf J}^T({\bf x}) \, {\bf J}({\bf x})
\qquad (21)
\]
Based on the Newton-Raphson method, the optimal solution ${\bf x}$ that minimizes $\varepsilon({\bf x})$ can be found iteratively:

\[
{\bf x}_{n+1} = {\bf x}_n - {\bf H}_n^{-1} {\bf g}_n
 = {\bf x}_n - \left[ {\bf J}^T({\bf x}_n) \, {\bf J}({\bf x}_n) \right]^{-1}
   {\bf J}^T({\bf x}_n) \, {\bf f}({\bf x}_n)
\qquad (22)
\]

This is the same as the pseudo-inverse method (see the chapter on solving equations).
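The iteration in Eq. (22) can be sketched as follows, assuming the residual function ${\bf f}$ and its Jacobian are supplied as callables; the names and the solution of the normal equations in place of an explicit pseudo-inverse are choices made for this sketch, not prescribed by the text.

\begin{verbatim}
import numpy as np

def gauss_newton(f, jac, x0, tol=1e-10, max_iter=100):
    """Minimize the SSE 0.5*||f(x)||^2 by the iteration
    x_{n+1} = x_n - (J^T J)^{-1} J^T f(x_n)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        r = f(x)                       # residual vector f(x_n), length M
        J = jac(x)                     # M x N Jacobian at x_n
        # Solve the normal equations (J^T J) dx = -J^T r for the step
        dx = np.linalg.solve(J.T @ J, -J.T @ r)
        x = x + dx
        if np.linalg.norm(dx) < tol:
            break
    return x
\end{verbatim}

In practice the step can also be obtained more stably from a least-squares solve of ${\bf J}\,\Delta{\bf x} = -{\bf f}({\bf x}_n)$ (for instance with NumPy's lstsq), which avoids forming ${\bf J}^T{\bf J}$ explicitly.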

Example 2:

Consider the following quadratic function:

\[
f({\bf x}) = \frac{1}{2} {\bf x}^T {\bf A} {\bf x} + {\bf b}^T {\bf x} + c
\]

We have

\[
{\bf g}({\bf x}) = \nabla f({\bf x}) = {\bf A}{\bf x} + {\bf b},
\qquad
{\bf H}({\bf x}) = {\bf A}
\]

We fix ${\bf b}$ and $c$, and consider four different choices of the matrix ${\bf A}$:
  • For the first three choices of ${\bf A}$, all eigenvalues of ${\bf A}$ are positive, so ${\bf A}$ is positive definite and the stationary point ${\bf x} = -{\bf A}^{-1}{\bf b}$ is the minimum.
  • For the fourth choice of ${\bf A}$, the eigenvalues have mixed signs, so ${\bf A}$ is indefinite and the stationary point is a saddle point (a minimum in one direction but a maximum in another).
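A quick numerical check of one such case is sketched below; the matrix ${\bf A}$ and vector ${\bf b}$ here are assumed values for illustration, not the ones used in this example.

\begin{verbatim}
import numpy as np

# Assumed example data (not the values from the text):
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])           # symmetric Hessian of the quadratic
b = np.array([1.0, -1.0])

x_star = np.linalg.solve(A, -b)      # stationary point: solve A x + b = 0
eigvals = np.linalg.eigvalsh(A)      # eigenvalues decide min/max/saddle
print("stationary point:", x_star)
print("eigenvalues of A:", eigvals)  # all positive here, so x_star is the minimum
\end{verbatim}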

Figure 2.2: Quadratic Surfaces (figures/QuadraticSurf.eps)

We can speed up the convergence by using a larger step size $\delta$ to scale the Newton step, i.e., ${\bf x}_{n+1} = {\bf x}_n + \delta \, \Delta{\bf x}_n$. However, if $\delta$ is too big, the solution may be skipped over and the iteration may fail to converge, oscillating around the solution or, even worse, diverging. For such reasons, a smaller step size $\delta$ may sometimes be preferred, as sketched below.
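The damped update can be sketched as follows; the symbol $\delta$ and its default value are assumptions for illustration, with $\delta = 1$ recovering the pure Newton step.

\begin{verbatim}
import numpy as np

def damped_newton_step(grad, hess, x, delta=0.5):
    """One damped Newton update x + delta*dx, where H dx = -g.
    delta = 1 gives the full Newton step; smaller values damp it."""
    dx = np.linalg.solve(hess(x), -grad(x))
    return x + delta * dx
\end{verbatim}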

In summary, Newton's method approximates the function $f({\bf x})$ at an estimated solution ${\bf x}_n$ by a quadratic function (the first three terms of the Taylor series) based on the gradient ${\bf g}_n$ and Hessian ${\bf H}_n$ of the function at ${\bf x}_n$, and treats the vertex of this quadratic function as the updated estimate ${\bf x}_{n+1}$.

Newton's method requires the Hessian matrix as well as the gradient to be available. Moreover, it is necessary to calculate the inverse of the Hessian matrix in each iteration, which may be computationally expensive.
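In the sketches above the explicit inverse is avoided by solving the linear system ${\bf H}_n \Delta{\bf x}_n = -{\bf g}_n$ directly; the comparison below is only a rough illustration of that choice, with an assumed random positive definite matrix standing in for the Hessian.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
N = 500                               # assumed problem size for illustration
H = rng.standard_normal((N, N))
H = H @ H.T + N * np.eye(N)           # symmetric positive definite stand-in Hessian
g = rng.standard_normal(N)

dx_inv = -np.linalg.inv(H) @ g        # explicit inverse: more work, less accurate
dx_slv = np.linalg.solve(H, -g)       # direct solve of H dx = -g: preferred
print(np.allclose(dx_inv, dx_slv))    # both give (numerically) the same step
\end{verbatim}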

Example 3:

Newton's method is applied to solving the following nonlinear equation system of $N$ variables:

#math421#   #tex2html_wrap_indisplay18864#  
with a known exact solution. These equations can be expressed in vector form as ${\bf f}({\bf x}) = {\bf 0}$ and solved as an optimization problem with the objective function $\varepsilon({\bf x}) = \|{\bf f}({\bf x})\|^2/2$. The iteration from an initial guess ${\bf x}_0$ is shown below.
#math426#   #tex2html_wrap_indisplay18870# (23)
We see that after 13 iterations the algorithm converges, to within the specified accuracy, to the following approximate solution:
#math428#   #tex2html_wrap_indisplay18873# (24)
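As a rough usage illustration of the gauss_newton sketch from Example 1 (the small system and starting point below are assumptions, not the equations of this example), one could write:

\begin{verbatim}
import numpy as np

# Assumed 2-variable system for illustration: x^2 + y^2 = 1 and y = x^3.
def f(x):
    return np.array([x[0]**2 + x[1]**2 - 1.0,
                     x[1] - x[0]**3])

def jac(x):
    return np.array([[2.0*x[0],     2.0*x[1]],
                     [-3.0*x[0]**2, 1.0]])

x_hat = gauss_newton(f, jac, x0=np.array([1.0, 1.0]))
print(x_hat, f(x_hat))                # residuals should be close to zero
\end{verbatim}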