Newton's method is based on the Taylor series expansion of the function $f(x)$ to be minimized near some point $x_0$:

$$f(x) \approx f(x_0) + f'(x_0)(x - x_0) + \frac{1}{2} f''(x_0)(x - x_0)^2 \tag{28}$$

If $f(x)$ is a quadratic function, then its Taylor series contains only the first three terms (constant, linear, and quadratic terms). In this case, we can find the vertex point at which $f(x)$ reaches its extremum by setting its derivative to zero and solving:

$$\frac{d}{dx} f(x) = f'(x_0) + f''(x_0)(x - x_0) = 0 \qquad \Longrightarrow \qquad x = x_0 - \frac{f'(x_0)}{f''(x_0)} \tag{29}$$
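For instance, for the quadratic $f(x) = ax^2 + bx + c$ (with $a \neq 0$), we have $f'(x_0) = 2ax_0 + b$ and $f''(x_0) = 2a$, so Eq. (29) gives

$$x = x_0 - \frac{2ax_0 + b}{2a} = -\frac{b}{2a}$$

which is exactly the vertex of the parabola, regardless of the expansion point $x_0$: for a quadratic function, a single Newton step is exact.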
If $f(x)$ is not quadratic, the result above can still be considered an approximation of the solution, which can be improved iteratively from the initial guess $x_0$ to eventually approach the solution:

$$x_{n+1} = x_n - \frac{f'(x_n)}{f''(x_n)} \tag{31}$$
We note that the iteration in Eq. (31) is just the Newton-Raphson method for solving the equation $f'(x) = 0$, the necessary condition for an extremum of $f(x)$.
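As a minimal sketch of iteration (31) in Python (the function name and the test function below are illustrative, not from the text above):

```python
def newton_minimize_1d(f1, f2, x0, tol=1e-10, max_iter=100):
    """Minimize a single-variable function given its first and second
    derivatives f1 and f2, via x_{n+1} = x_n - f'(x_n)/f''(x_n)."""
    x = x0
    for _ in range(max_iter):
        step = f1(x) / f2(x)   # Newton step f'(x_n) / f''(x_n)
        x = x - step
        if abs(step) < tol:    # stop once the update is negligible
            break
    return x

# Example: f(x) = x**4 - 3*x**2, so f'(x) = 4*x**3 - 6*x and
# f''(x) = 12*x**2 - 6; starting from x0 = 2.0 the iteration
# converges to the minimum at sqrt(3/2).
x_min = newton_minimize_1d(lambda x: 4*x**3 - 6*x,
                           lambda x: 12*x**2 - 6,
                           x0=2.0)
```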
Newton's method for the minimization of a single-variable function $f(x)$ can be generalized to the minimization of a multi-variable function $f(\mathbf{x}) = f(x_1, \dots, x_N)$. Again, we approximate $f(\mathbf{x})$ by a quadratic function containing only the first three terms of its Taylor series at some initial point $\mathbf{x}_0$:

$$f(\mathbf{x}) \approx f(\mathbf{x}_0) + \mathbf{g}_0^T (\mathbf{x} - \mathbf{x}_0) + \frac{1}{2} (\mathbf{x} - \mathbf{x}_0)^T \mathbf{H}_0 (\mathbf{x} - \mathbf{x}_0) \tag{32}$$

where $\mathbf{g}_0$ and $\mathbf{H}_0$ are respectively the gradient vector and Hessian matrix of $f(\mathbf{x})$ at $\mathbf{x}_0$:

$$\mathbf{g}_0 = \nabla f(\mathbf{x}) \Big|_{\mathbf{x} = \mathbf{x}_0}, \qquad \mathbf{H}_0 = \nabla^2 f(\mathbf{x}) \Big|_{\mathbf{x} = \mathbf{x}_0} \tag{33}$$

Setting the gradient of this quadratic approximation to zero,

$$\nabla f(\mathbf{x}) = \mathbf{g}_0 + \mathbf{H}_0 (\mathbf{x} - \mathbf{x}_0) = \mathbf{0} \tag{34}$$

we can solve for the vertex point:

$$\mathbf{x} = \mathbf{x}_0 - \mathbf{H}_0^{-1} \mathbf{g}_0 \tag{35}$$
If $f(\mathbf{x})$ is not quadratic, the result above is not a minimum or maximum, but it can still be used as an approximation of the solution, which can be improved iteratively to approach the true solution:

$$\mathbf{x}_{n+1} = \mathbf{x}_n - \mathbf{H}_n^{-1} \mathbf{g}_n \tag{36}$$
We note that the iteration above is just a generalization of Eq. (31) in the 1-D case, where $\mathbf{g}_n = f'(x_n)$ and $\mathbf{H}_n = f''(x_n)$. Also, the iteration above is just the Newton-Raphson method for solving the equation $\nabla f(\mathbf{x}) = \mathbf{0}$, the necessary condition for an extremum of $f(\mathbf{x})$.
The speed of convergence of the iteration can be controlled by a parameter $\delta$ that controls the step size:

$$\mathbf{x}_{n+1} = \mathbf{x}_n - \delta\, \mathbf{H}_n^{-1} \mathbf{g}_n \tag{37}$$
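A minimal sketch of the damped iteration (37), assuming user-supplied gradient and Hessian functions (all names here are illustrative, not from the text above):

```python
import numpy as np

def newton_minimize(grad, hess, x0, delta=1.0, tol=1e-10, max_iter=100):
    """Damped Newton iteration (37): x_{n+1} = x_n - delta * H_n^{-1} g_n."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:        # gradient ~ 0: extremum reached
            break
        d = np.linalg.solve(hess(x), g)    # solve H_n d = g_n (no explicit inverse)
        x = x - delta * d
    return x

# Usage on f(x1, x2) = x1**2 + 2*x2**2, whose minimum is at the origin;
# with delta = 1 this quadratic function is minimized in a single step.
x_min = newton_minimize(lambda x: np.array([2*x[0], 4*x[1]]),
                        lambda x: np.diag([2.0, 4.0]),
                        x0=[3.0, -1.0])
```

With $\delta = 1$ this reduces to the undamped iteration (36).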
Example: An over-constrained nonlinear equation system $f_m(\mathbf{x}) = 0$ ($m = 1, \dots, M$) of $M$ equations and $N$ unknowns ($M > N$) can be solved by minimizing the following sum-of-squares error:

$$\varepsilon(\mathbf{x}) = \frac{1}{2} \sum_{m=1}^{M} f_m^2(\mathbf{x}) \tag{38}$$
The $i$-th component of the gradient of the error function is:

$$g_i = \frac{\partial \varepsilon(\mathbf{x})}{\partial x_i} = \sum_{m=1}^{M} f_m(\mathbf{x}) \frac{\partial f_m(\mathbf{x})}{\partial x_i} \tag{39}$$

or, in matrix form, with $\mathbf{J}$ the $M \times N$ Jacobian of $\mathbf{f}(\mathbf{x}) = (f_1(\mathbf{x}), \dots, f_M(\mathbf{x}))^T$:

$$\mathbf{g} = \mathbf{J}^T \mathbf{f}(\mathbf{x}) \tag{40}$$
We further find the component of the Hessian of the error function in the $i$-th row and $j$-th column:

$$h_{ij} = \frac{\partial^2 \varepsilon(\mathbf{x})}{\partial x_i\, \partial x_j} = \frac{\partial}{\partial x_j} \left( \sum_{m=1}^{M} f_m(\mathbf{x}) \frac{\partial f_m(\mathbf{x})}{\partial x_i} \right) \tag{41}$$

$$= \sum_{m=1}^{M} \left( \frac{\partial f_m(\mathbf{x})}{\partial x_i} \frac{\partial f_m(\mathbf{x})}{\partial x_j} + f_m(\mathbf{x}) \frac{\partial^2 f_m(\mathbf{x})}{\partial x_i\, \partial x_j} \right) \tag{42}$$

or, in matrix form:

$$\mathbf{H} = \mathbf{J}^T \mathbf{J} + \sum_{m=1}^{M} f_m(\mathbf{x}) \nabla^2 f_m(\mathbf{x}) \tag{43}$$
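The following sketch implements this iteration for the sum-of-squares error. Note one simplification that is my addition, not part of the derivation above: the second term of Eq. (43) is dropped (the well-known Gauss-Newton approximation, reasonable when the residuals $f_m$ are small), so no second derivatives of the $f_m$ are needed. The function names are illustrative:

```python
import numpy as np

def least_squares_newton(f, jac, x0, tol=1e-10, max_iter=100):
    """Minimize the sum-of-squares error (38), given the residual vector
    f(x) = (f_1(x), ..., f_M(x)) and its M x N Jacobian jac(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        r = f(x)                          # residuals f_m(x)
        J = jac(x)                        # J[m, i] = d f_m / d x_i
        g = J.T @ r                       # gradient of the error, Eq. (40)
        if np.linalg.norm(g) < tol:
            break
        # H ~ J^T J: the second term of Eq. (43) is dropped here
        # (Gauss-Newton approximation), so only the Jacobian is required.
        x = x - np.linalg.solve(J.T @ J, g)
    return x
```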
Example:
Consider the following quadratic function:
We can speed up the convergence with a bigger step size $\delta$. However, if $\delta$ is too big, the solution may be overshot, and the iteration may fail to converge if it gets into an oscillation around the solution. Even worse, the iteration may become divergent. For such reasons, a smaller step size may sometimes be preferred.
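As a small illustration of this trade-off (a hypothetical example, not from the text above): for $f(x) = x^2$ the iteration (37) reduces to $x_{n+1} = (1 - \delta)\, x_n$, which converges for $0 < \delta < 2$, oscillates indefinitely at $\delta = 2$, and diverges for $\delta > 2$:

```python
# Damped iteration (37) on f(x) = x**2, where f'(x) = 2*x and f''(x) = 2,
# so each update is x_{n+1} = x_n - delta * (2*x_n)/2 = (1 - delta) * x_n.
for delta in (0.5, 1.0, 2.0, 2.5):
    x = 1.0
    for _ in range(5):
        x = x - delta * (2*x) / 2          # delta * f'(x)/f''(x)
    print(delta, x)
# 0.5 -> 0.03125 (converging), 1.0 -> 0.0 (exact in one step),
# 2.0 -> -1.0 (oscillates forever), 2.5 -> -7.59375 (diverging)
```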
In summary, Newton's method approximates the function $f(\mathbf{x})$ at an estimated solution $\mathbf{x}_n$ by a quadratic function (the first three terms of its Taylor series), based on the gradient $\mathbf{g}_n$ and Hessian $\mathbf{H}_n$ of the function at $\mathbf{x}_n$, and treats the vertex of this quadratic function as the updated estimate $\mathbf{x}_{n+1}$.
Newton's method requires the Hessian matrix as well as the gradient to be available. Moreover, it is necessary to compute the inverse of the Hessian matrix in each iteration, which may be computationally expensive.
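In practice, the explicit inverse $\mathbf{H}_n^{-1}$ is rarely formed; instead one solves the linear system $\mathbf{H}_n \mathbf{d}_n = \mathbf{g}_n$ for the step $\mathbf{d}_n$, e.g. by Cholesky factorization when $\mathbf{H}_n$ is positive definite, which is cheaper and numerically more stable than inversion.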
Example: Newton's method is applied to solving the following nonlinear equation system:

(44)

(45)