Newton's method is based on the Taylor series expansion of the function $f(x)$ to be minimized near some point $x = x_0$:
$$f(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{1}{2} f''(x_0)(x - x_0)^2 + \cdots \tag{28}$$
If $f(x)$ is a quadratic function, then its Taylor series contains only the first three terms (the constant, linear, and quadratic terms). In this case, we can find the vertex point at which $f(x)$ reaches its extremum by setting its derivative to zero:
$$\frac{d}{dx} f(x) = f'(x_0) + f''(x_0)(x - x_0) = 0 \tag{29}$$

Solving for $x$, we get the vertex point:

$$x = x_0 - \frac{f'(x_0)}{f''(x_0)} \tag{30}$$
If $f(x)$ is not quadratic, the result above can still be considered as an approximation of the solution, which can be improved iteratively from the initial guess $x_0$ to eventually approach the solution:
$$x_{n+1} = x_n - \frac{f'(x_n)}{f''(x_n)}, \qquad n = 0, 1, 2, \ldots \tag{31}$$
We note that this iteration is just the Newton-Raphson method for solving the equation $f'(x) = 0$, the necessary condition for an extremum of $f(x)$.
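As a quick illustration, here is a minimal Python sketch of iteration (31); the test function $f(x) = x^4 - 3x^2 + x$, its derivatives, the starting point, and the tolerance are assumptions made for this example, not part of the text.

```python
# Minimal sketch of the 1-D Newton iteration (31): x_{n+1} = x_n - f'(x_n) / f''(x_n).
# The example function f(x) = x**4 - 3*x**2 + x and all numeric settings are
# hypothetical choices for illustration only.

def newton_1d(fp, fpp, x0, tol=1e-10, max_iter=50):
    """Locate a stationary point of f given its first and second derivatives."""
    x = x0
    for _ in range(max_iter):
        step = fp(x) / fpp(x)    # Newton step f'(x_n) / f''(x_n)
        x -= step
        if abs(step) < tol:      # stop once the update is negligible
            break
    return x

fp  = lambda x: 4*x**3 - 6*x + 1    # f'(x) for f(x) = x^4 - 3x^2 + x
fpp = lambda x: 12*x**2 - 6         # f''(x)

print(newton_1d(fp, fpp, x0=1.0))   # converges to the local minimum near x = 1.13
```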
Newton's method for the minimization of a single-variable function $f(x)$ can be generalized to the minimization of a multi-variable function $f(\mathbf{x})$, where $\mathbf{x} = [x_1, \ldots, x_N]^T$. Again, we approximate $f(\mathbf{x})$ by a quadratic function containing only the first three terms of its Taylor series at some initial point $\mathbf{x}_0$:
$$f(\mathbf{x}) \approx f(\mathbf{x}_0) + \mathbf{g}_0^T (\mathbf{x} - \mathbf{x}_0) + \frac{1}{2} (\mathbf{x} - \mathbf{x}_0)^T \mathbf{H}_0 (\mathbf{x} - \mathbf{x}_0) \tag{32}$$

where $\mathbf{g}_0$ and $\mathbf{H}_0$ are respectively the gradient vector and Hessian matrix of $f(\mathbf{x})$ at $\mathbf{x} = \mathbf{x}_0$:

$$\mathbf{g}_0 = \nabla f(\mathbf{x})\Big|_{\mathbf{x}_0} = \left[\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_N}\right]^T_{\mathbf{x}_0} \tag{33}$$

$$\mathbf{H}_0 = \nabla^2 f(\mathbf{x})\Big|_{\mathbf{x}_0}, \qquad h_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}\bigg|_{\mathbf{x}_0} \tag{34}$$

Setting the gradient of this quadratic approximation to zero, we get the vertex point at which it reaches its extremum:

$$\nabla f(\mathbf{x}) \approx \mathbf{g}_0 + \mathbf{H}_0 (\mathbf{x} - \mathbf{x}_0) = \mathbf{0} \quad\Longrightarrow\quad \mathbf{x} = \mathbf{x}_0 - \mathbf{H}_0^{-1} \mathbf{g}_0 \tag{35}$$
If $f(\mathbf{x})$ is not quadratic, the result above is not a minimum or maximum, but it can still be used as an approximation of the solution, which can be improved iteratively to approach the true solution:
$$\mathbf{x}_{n+1} = \mathbf{x}_n - \mathbf{H}_n^{-1} \mathbf{g}_n = \mathbf{x}_n + \mathbf{d}_n, \qquad n = 0, 1, 2, \ldots \tag{36}$$

where $\mathbf{g}_n = \nabla f(\mathbf{x}_n)$, $\mathbf{H}_n = \nabla^2 f(\mathbf{x}_n)$, and the increment $\mathbf{d}_n = -\mathbf{H}_n^{-1}\mathbf{g}_n$ is called the Newton search direction. We note that the computational complexity of each iteration is $O(N^3)$ due to the inverse operation $\mathbf{H}_n^{-1}$. We also note that this iteration is just a generalization of Eq. (31) in the 1-D case, where $\mathbf{g}_n$ plays the role of $f'(x_n)$ and $\mathbf{H}_n$ that of $f''(x_n)$, and that it is just the Newton-Raphson method for solving the equation $\nabla f(\mathbf{x}) = \mathbf{0}$, the necessary condition for an extremum of $f(\mathbf{x})$.
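A minimal Python sketch of iteration (36), assuming the gradient and Hessian are available as functions; it obtains the Newton search direction by solving $\mathbf{H}_n \mathbf{d}_n = -\mathbf{g}_n$ rather than forming $\mathbf{H}_n^{-1}$ explicitly (the cost per iteration is still $O(N^3)$). The Rosenbrock test function below is a hypothetical choice, not from the text.

```python
import numpy as np

# Sketch of the multivariate Newton iteration (36): x_{n+1} = x_n + d_n,
# with the Newton search direction d_n solving H_n d_n = -g_n.

def newton_min(grad, hess, x0, tol=1e-10, max_iter=100):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:        # stationary point reached
            break
        d = np.linalg.solve(hess(x), -g)   # Newton search direction d_n
        x = x + d
    return x

# Hypothetical example: Rosenbrock function f(x, y) = 100*(y - x^2)^2 + (1 - x)^2
grad = lambda v: np.array([-400*v[0]*(v[1] - v[0]**2) - 2*(1 - v[0]),
                            200*(v[1] - v[0]**2)])
hess = lambda v: np.array([[1200*v[0]**2 - 400*v[1] + 2, -400*v[0]],
                           [-400*v[0],                    200.0]])

print(newton_min(grad, hess, [-1.2, 1.0]))   # approaches the minimum at (1, 1)
```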
The speed of convergence of the iteration can be controlled by a step-size parameter $\delta$:
$$\mathbf{x}_{n+1} = \mathbf{x}_n - \delta\,\mathbf{H}_n^{-1} \mathbf{g}_n = \mathbf{x}_n + \delta\,\mathbf{d}_n \tag{37}$$
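In the sketch above, this damped update amounts to replacing the line `x = x + d` with `x = x + delta * d`, where `delta` is an assumed step-size argument.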
Example:
An over-constrained nonlinear equation system $f_m(\mathbf{x}) = 0$ ($m = 1, \ldots, M$) of $M$ equations and $N < M$ unknowns can be solved by minimizing the following sum-of-squares error:
$$\varepsilon(\mathbf{x}) = \frac{1}{2} \sum_{m=1}^M f_m^2(\mathbf{x}) \tag{38}$$
The $j$th component of the gradient of the error function is:

$$g_j = \frac{\partial \varepsilon(\mathbf{x})}{\partial x_j} = \sum_{m=1}^M f_m(\mathbf{x})\,\frac{\partial f_m(\mathbf{x})}{\partial x_j}, \qquad j = 1, \ldots, N \tag{39}$$

or, in matrix form,

$$\mathbf{g} = \nabla \varepsilon(\mathbf{x}) = \mathbf{J}^T \mathbf{f}(\mathbf{x}) \tag{40}$$

where $\mathbf{f}(\mathbf{x}) = [f_1(\mathbf{x}), \ldots, f_M(\mathbf{x})]^T$ and $\mathbf{J}$ is the $M \times N$ Jacobian matrix with elements $J_{mj} = \partial f_m / \partial x_j$.
We further find the component of the Hessian $\mathbf{H}$ of the error function in the $i$th row and $j$th column:

$$h_{ij} = \frac{\partial^2 \varepsilon(\mathbf{x})}{\partial x_i \partial x_j} = \sum_{m=1}^M \left[\frac{\partial f_m(\mathbf{x})}{\partial x_i}\,\frac{\partial f_m(\mathbf{x})}{\partial x_j} + f_m(\mathbf{x})\,\frac{\partial^2 f_m(\mathbf{x})}{\partial x_i \partial x_j}\right] \tag{41}$$

or, in matrix form,

$$\mathbf{H} = \mathbf{J}^T \mathbf{J} + \sum_{m=1}^M f_m(\mathbf{x})\,\nabla^2 f_m(\mathbf{x}) \tag{42}$$

With this gradient and Hessian, the Newton iteration (36) becomes:

$$\mathbf{x}_{n+1} = \mathbf{x}_n - \mathbf{H}_n^{-1} \mathbf{g}_n = \mathbf{x}_n - \left[\mathbf{J}_n^T \mathbf{J}_n + \sum_{m=1}^M f_m(\mathbf{x}_n)\,\nabla^2 f_m(\mathbf{x}_n)\right]^{-1} \mathbf{J}_n^T \mathbf{f}(\mathbf{x}_n) \tag{43}$$
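The sketch below assembles the gradient (40) and Hessian (42) from the residuals, the Jacobian, and the per-equation Hessians, and applies the Newton update (43). The small system of $M = 3$ equations in $N = 2$ unknowns is hypothetical, chosen only to make the example runnable.

```python
import numpy as np

# Sketch of the Newton iteration (43) for the sum-of-squares error (38),
# using g = J^T f from Eq. (40) and H = J^T J + sum_m f_m * (Hessian of f_m) from Eq. (42).
# The over-constrained system below is a hypothetical example for illustration only.

def sos_newton(fs, jac, hessians, x0, n_iter=20):
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        f = fs(x)                                     # residual vector, length M
        J = jac(x)                                    # M x N Jacobian matrix
        g = J.T @ f                                   # gradient of the error, Eq. (40)
        H = J.T @ J + sum(fm * Hm for fm, Hm in zip(f, hessians(x)))   # Eq. (42)
        x = x - np.linalg.solve(H, g)                 # Newton update, Eq. (43)
    return x

# Hypothetical system: x + y = 2, x^2 = y, x*y = 1 (consistent, with solution x = y = 1)
fs  = lambda v: np.array([v[0] + v[1] - 2, v[0]**2 - v[1], v[0]*v[1] - 1])
jac = lambda v: np.array([[1.0, 1.0], [2*v[0], -1.0], [v[1], v[0]]])
hessians = lambda v: [np.zeros((2, 2)),               # second derivatives of each f_m
                      np.array([[2.0, 0.0], [0.0, 0.0]]),
                      np.array([[0.0, 1.0], [1.0, 0.0]])]

print(sos_newton(fs, jac, hessians, [1.5, 0.5]))      # converges toward (1, 1)
```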
Example:
Consider the following quadratic function:
We can speed up the convergence with a bigger step size $\delta$. However, if $\delta$ is too big, the solution may be skipped and the iteration may fail to converge, oscillating around the solution; even worse, the iteration may diverge. For these reasons, a smaller step size is sometimes preferred.
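As a hypothetical illustration of this behavior (the function and step sizes below are assumptions, not the document's example): for $f(x) = x^2/2$ we have $f'(x) = x$ and $f''(x) = 1$, so the damped iteration (37) reduces to $x_{n+1} = (1 - \delta)\,x_n$. Thus $\delta = 1$ reaches the minimum in one step, $\delta = 2$ oscillates between $\pm x_0$ indefinitely, and $\delta > 2$ diverges.

```python
# Hypothetical demonstration of the step-size effect on f(x) = x**2 / 2,
# where f'(x) = x and f''(x) = 1, so update (37) becomes x <- (1 - delta) * x.
for delta in (0.5, 1.0, 2.0, 2.5):
    x = 1.0
    for _ in range(5):
        x -= delta * x        # delta * f'(x) / f''(x)
    print(delta, x)           # 0.5: shrinking, 1.0: exact, 2.0: oscillating, 2.5: growing
```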
In summary, Newton's method approximates the function at an estimated solution $\mathbf{x}_n$ by a quadratic function (the first three terms of its Taylor series), based on the gradient and Hessian of the function at $\mathbf{x}_n$, and treats the vertex of this quadratic as the updated estimate $\mathbf{x}_{n+1}$.
Newton's method requires both the gradient and the Hessian matrix to be available. Moreover, it is necessary to calculate the inverse of the Hessian matrix in each iteration, which may be computationally expensive.
Example:
Newton's method is applied to solve the following nonlinear equation system of $N$ variables:
(44) |
(45) |
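As a hedged sketch, assuming a square system of two equations in two unknowns (the equations below are hypothetical stand-ins, not Eqs. (44)-(45)), the Newton-Raphson iteration $\mathbf{x}_{n+1} = \mathbf{x}_n - \mathbf{J}^{-1}(\mathbf{x}_n)\,\mathbf{f}(\mathbf{x}_n)$ mentioned earlier can be implemented as follows.

```python
import numpy as np

# Sketch of the Newton-Raphson iteration for a square nonlinear system f(x) = 0:
# x_{n+1} = x_n - J(x_n)^{-1} f(x_n).
# The two-variable system below is hypothetical, standing in for Eqs. (44)-(45).

def newton_raphson(f, jac, x0, tol=1e-12, max_iter=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        fx = f(x)
        if np.linalg.norm(fx) < tol:          # residual small enough: done
            break
        x = x - np.linalg.solve(jac(x), fx)   # Newton-Raphson update
    return x

# Hypothetical system: x^2 + y^2 = 4 and e^x + y = 1
f   = lambda v: np.array([v[0]**2 + v[1]**2 - 4, np.exp(v[0]) + v[1] - 1])
jac = lambda v: np.array([[2*v[0], 2*v[1]], [np.exp(v[0]), 1.0]])

print(newton_raphson(f, jac, [1.0, -1.0]))    # converges to a root near (1.00, -1.73)
```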