Newton-Raphson Method (Multivariate)

The Newton-Raphson method discussed above for solving a single-variable equation $f(x) = 0$ can be generalized to the case of multivariate equation systems containing $M$ equations of $N$ variables in $\mathbf{x} = [x_1, \ldots, x_N]^T \in \mathbb{R}^N$:

$$f_m(x_1, \ldots, x_N) = 0, \qquad m = 1, \ldots, M \tag{65}$$
Again, we first consider the Taylor series expansion of each of the $M$ functions in the neighborhood of an initial guess $\mathbf{x}_0 = [x_{01}, \ldots, x_{0N}]^T$:
$$f_m(\mathbf{x}) = f_m(\mathbf{x}_0) + \sum_{n=1}^{N} \frac{\partial f_m(\mathbf{x}_0)}{\partial x_n}\,(x_n - x_{0n}) + O\!\left(\|\mathbf{x}-\mathbf{x}_0\|^2\right), \qquad m = 1, \ldots, M \tag{66}$$
where $O\!\left(\|\mathbf{x}-\mathbf{x}_0\|^2\right)$ represents the second and higher order terms in the series beyond the linear term, which can be neglected if $\|\mathbf{x}-\mathbf{x}_0\|$ is small. These $M$ equations can be expressed in matrix form
$$\begin{bmatrix} f_1(\mathbf{x}) \\ \vdots \\ f_M(\mathbf{x}) \end{bmatrix} \approx \begin{bmatrix} f_1(\mathbf{x}_0) \\ \vdots \\ f_M(\mathbf{x}_0) \end{bmatrix} + \begin{bmatrix} \partial f_1(\mathbf{x}_0)/\partial x_1 & \cdots & \partial f_1(\mathbf{x}_0)/\partial x_N \\ \vdots & \ddots & \vdots \\ \partial f_M(\mathbf{x}_0)/\partial x_1 & \cdots & \partial f_M(\mathbf{x}_0)/\partial x_N \end{bmatrix} \begin{bmatrix} x_1 - x_{01} \\ \vdots \\ x_N - x_{0N} \end{bmatrix} \tag{67}$$
or more concisely
$$\mathbf{f}(\mathbf{x}) \approx \mathbf{f}(\mathbf{x}_0) + \mathbf{J}(\mathbf{x}_0)\,\Delta\mathbf{x} = \mathbf{f}_0 + \mathbf{J}_0\,\Delta\mathbf{x} \tag{68}$$
where $\Delta\mathbf{x} = \mathbf{x} - \mathbf{x}_0$, and $\mathbf{f}_0 = \mathbf{f}(\mathbf{x}_0)$ and $\mathbf{J}_0 = \mathbf{J}(\mathbf{x}_0)$ are respectively the function $\mathbf{f}(\mathbf{x})$ and its Jacobian matrix $\mathbf{J}(\mathbf{x}) = d\mathbf{f}(\mathbf{x})/d\mathbf{x}$, both evaluated at $\mathbf{x}_0$. We further consider solving the equation system $\mathbf{f}(\mathbf{x}) = \mathbf{0}$ in the following two cases:

  • $M = N$: The number of equations is the same as the number of unknowns, so the Jacobian $\mathbf{J}(\mathbf{x})$ is a square matrix and its inverse $\mathbf{J}^{-1}(\mathbf{x})$ exists in general. In the special case where $\mathbf{f}(\mathbf{x})$ is linear, the Taylor series contains only the first two terms while all higher order terms are zero, and the approximation in Eq. (68) becomes exact, with $\mathbf{f}(\mathbf{x}) = \mathbf{f}(\mathbf{x}_0) + \mathbf{J}\,\Delta\mathbf{x}$. To find the root of the equation, we set $\mathbf{f}(\mathbf{x})$ to zero and solve the resulting equation to get
    $$\mathbf{x} = \mathbf{x}_0 - \mathbf{J}^{-1}(\mathbf{x}_0)\,\mathbf{f}(\mathbf{x}_0) \tag{69}$$
    For example, if $\mathbf{f}(\mathbf{x}) = \mathbf{A}\mathbf{x} - \mathbf{b}$ and $\mathbf{J}(\mathbf{x}) = \mathbf{A}$, then $\mathbf{x} = \mathbf{x}_0 - \mathbf{A}^{-1}(\mathbf{A}\mathbf{x}_0 - \mathbf{b}) = \mathbf{A}^{-1}\mathbf{b}$, i.e., the exact root is found in a single step.

    If $\mathbf{f}(\mathbf{x})$ is nonlinear, the sum of the first two terms of the Taylor series is only an approximation, and the result in Eq. (69) is only an approximate root, which can be further improved iteratively to move from the initial guess $\mathbf{x}_0$ towards the root $\mathbf{x}^*$:

    $$\mathbf{x}_{n+1} = \mathbf{x}_n - \mathbf{J}^{-1}(\mathbf{x}_n)\,\mathbf{f}(\mathbf{x}_n) = \mathbf{x}_n + \Delta\mathbf{x}_n \tag{70}$$
    where $\Delta\mathbf{x}_n = -\mathbf{J}^{-1}(\mathbf{x}_n)\,\mathbf{f}(\mathbf{x}_n)$ is the increment, representing the search direction at the $n$th step. The iteration moves $\mathbf{x}_n$ in the $N$-D space spanned by $x_1, \ldots, x_N$ from some initial guess $\mathbf{x}_0$ along such a path that all function values $f_m(\mathbf{x}_n)$ $(m = 1, \ldots, N)$ are reduced. As in the univariate case, a scaling factor $0 < \delta < 1$ can be used to control the step size of the iteration
    $$\mathbf{x}_{n+1} = \mathbf{x}_n - \delta\,\mathbf{J}^{-1}(\mathbf{x}_n)\,\mathbf{f}(\mathbf{x}_n) \tag{71}$$
    The step size can be adjusted based on the specific function shape. When $\delta < 1$, the step size is smaller and the convergence of the iteration is slower, but there is a better chance of not skipping over a solution, which may happen if $\mathbf{f}(\mathbf{x})$ is not smooth and the step size is too big.

    The algorithm is listed below, where the tolerance $tol$ is preset to a small value (a code sketch implementing the iteration follows the two cases):

    • Select $\mathbf{x}_0$
    • Obtain $\mathbf{f}(\mathbf{x}_0)$ and $\mathbf{J}(\mathbf{x}_0)$
    • Obtain $\Delta\mathbf{x}_0 = -\mathbf{J}^{-1}(\mathbf{x}_0)\,\mathbf{f}(\mathbf{x}_0)$
    • $n = 0$
    • While $\|\mathbf{f}(\mathbf{x}_n)\| > tol$ do
      • $\mathbf{x}_{n+1} = \mathbf{x}_n + \delta\,\Delta\mathbf{x}_n$
      • Find $\mathbf{f}(\mathbf{x}_{n+1})$ and $\Delta\mathbf{x}_{n+1} = -\mathbf{J}^{-1}(\mathbf{x}_{n+1})\,\mathbf{f}(\mathbf{x}_{n+1})$
      • $n = n + 1$

  • $M > N$: There are more equations than unknowns, i.e., the equation $\mathbf{f}(\mathbf{x}) = \mathbf{0}$ is an over-constrained system, and the Jacobian $\mathbf{J}(\mathbf{x})$ is an $M \times N$ non-square matrix with no inverse, i.e., no solution exists for the equation $\mathbf{f}(\mathbf{x}) = \mathbf{0}$ in general. But we can still seek to find an optimal solution $\mathbf{x}^*$ that minimizes the following sum-of-squares error:
    $$\varepsilon(\mathbf{x}) = \frac{1}{2}\,\|\mathbf{f}(\mathbf{x})\|^2 = \frac{1}{2}\,\mathbf{f}^T(\mathbf{x})\,\mathbf{f}(\mathbf{x}) = \frac{1}{2}\sum_{m=1}^{M} f_m^2(\mathbf{x}) \tag{72}$$
    The gradient vector of $\varepsilon(\mathbf{x})$ is:
    $$\mathbf{g}(\mathbf{x}) = \nabla\varepsilon(\mathbf{x}) = \frac{d}{d\mathbf{x}}\left[\frac{1}{2}\,\mathbf{f}^T(\mathbf{x})\,\mathbf{f}(\mathbf{x})\right] \tag{73}$$
    The $n$th component of $\mathbf{g}(\mathbf{x})$ is
    $$g_n(\mathbf{x}) = \frac{\partial \varepsilon(\mathbf{x})}{\partial x_n} = \sum_{m=1}^{M} f_m(\mathbf{x})\,\frac{\partial f_m(\mathbf{x})}{\partial x_n} = \sum_{m=1}^{M} f_m(\mathbf{x})\,J_{mn}(\mathbf{x}), \qquad n = 1, \ldots, N \tag{74}$$
    where $J_{mn}(\mathbf{x}) = \partial f_m(\mathbf{x})/\partial x_n$ is the component in the $m$th row and $n$th column of the Jacobian $\mathbf{J}(\mathbf{x})$. Now the gradient can be written as
    $$\mathbf{g}(\mathbf{x}) = \mathbf{J}^T(\mathbf{x})\,\mathbf{f}(\mathbf{x}) \tag{75}$$

    In the special case where $\mathbf{f}(\mathbf{x})$ is linear with a constant Jacobian $\mathbf{J}(\mathbf{x}) = \mathbf{J}$, it is represented exactly by the sum of the first two terms of its Taylor series in Eq. (68), and the gradient is:

    $$\mathbf{g}(\mathbf{x}) = \mathbf{J}^T\left[\mathbf{f}(\mathbf{x}_0) + \mathbf{J}\,\Delta\mathbf{x}\right] = \mathbf{J}^T\mathbf{f}(\mathbf{x}_0) + \mathbf{J}^T\mathbf{J}\,\Delta\mathbf{x} \tag{76}$$
    where $\mathbf{x}_0$ is any chosen initial guess. If we assume $\mathbf{x} = \mathbf{x}_0 + \Delta\mathbf{x}$ is the optimal solution at which $\varepsilon(\mathbf{x})$ is minimized and $\mathbf{g}(\mathbf{x})$ is zero:
    $$\mathbf{g}(\mathbf{x}) = \mathbf{J}^T\mathbf{f}(\mathbf{x}_0) + \mathbf{J}^T\mathbf{J}\,\Delta\mathbf{x} = \mathbf{0} \tag{77}$$
    then by solving this equation we get the optimal increment
    $$\Delta\mathbf{x} = -\left(\mathbf{J}^T\mathbf{J}\right)^{-1}\mathbf{J}^T\,\mathbf{f}(\mathbf{x}_0) = -\mathbf{J}^{-}\,\mathbf{f}(\mathbf{x}_0) \tag{78}$$
    Here $\mathbf{J}^{-} = \left(\mathbf{J}^T\mathbf{J}\right)^{-1}\mathbf{J}^T$ is the pseudo-inverse (see the section on the pseudo-inverse) of the non-square matrix $\mathbf{J}$. Now the optimal solution can be found as:
    $$\mathbf{x} = \mathbf{x}_0 + \Delta\mathbf{x} = \mathbf{x}_0 - \mathbf{J}^{-}\,\mathbf{f}(\mathbf{x}_0) \tag{79}$$
    For example, if $\mathbf{f}(\mathbf{x}) = \mathbf{A}\mathbf{x} - \mathbf{b}$ and $\mathbf{J}(\mathbf{x}) = \mathbf{A}$, we have $\mathbf{x} = \mathbf{x}_0 - \mathbf{A}^{-}(\mathbf{A}\mathbf{x}_0 - \mathbf{b}) = \mathbf{A}^{-}\mathbf{b}$, the same as the least-squares solution of the over-constrained linear system $\mathbf{A}\mathbf{x} = \mathbf{b}$ by the pseudo-inverse.

    When $\mathbf{f}(\mathbf{x})$ is nonlinear in general, the sum of the first two terms of its Taylor series is only an approximation, and the result above is only an approximation of the root, which can be further improved iteratively to move from the initial guess towards the root:

    $$\mathbf{x}_{n+1} = \mathbf{x}_n - \mathbf{J}^{-}(\mathbf{x}_n)\,\mathbf{f}(\mathbf{x}_n) = \mathbf{x}_n - \left(\mathbf{J}^T(\mathbf{x}_n)\,\mathbf{J}(\mathbf{x}_n)\right)^{-1}\mathbf{J}^T(\mathbf{x}_n)\,\mathbf{f}(\mathbf{x}_n) \tag{80}$$
    This iteration will converge to $\mathbf{x}^*$ at which $\mathbf{g}(\mathbf{x}^*) = \mathbf{J}^T(\mathbf{x}^*)\,\mathbf{f}(\mathbf{x}^*) = \mathbf{0}$, and the squared error $\varepsilon(\mathbf{x}^*)$ is minimized.

Comparing Eqs. (70) and (80), we see that they are essentially the same, with the only difference that the regular inverse $\mathbf{J}^{-1}$ is used when $M = N$, while the pseudo-inverse $\mathbf{J}^{-}$ is used when $M > N$ and $\mathbf{J}^{-1}$ does not exist.
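Both iterations can be implemented in a few lines. Below is a minimal Python sketch (the function name newton_system, the stopping rule, and the default parameters are our own choices, not from the text): the square case applies $\mathbf{J}^{-1}$ via np.linalg.solve as in Eq. (70), the over-constrained case uses np.linalg.pinv as in Eq. (80), and delta is the step-size factor of Eq. (71).

    import numpy as np

    def newton_system(f, jac, x0, delta=1.0, tol=1e-10, max_iter=100):
        """Newton-Raphson iteration for f(x) = 0, f: R^N -> R^M (a sketch).

        f and jac are assumed to return NumPy arrays of shapes (M,) and (M, N).
        """
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            fx, J = f(x), jac(x)
            if J.shape[0] == J.shape[1]:
                dx = -np.linalg.solve(J, fx)    # M == N: dx = -J^{-1} f, Eq. (70)
            else:
                dx = -np.linalg.pinv(J) @ fx    # M > N:  dx = -J^{-} f, Eq. (80)
            if np.linalg.norm(dx) < tol:        # increment (and gradient) vanishes
                break
            x = x + delta * dx
        return x

Note that the loop stops when the increment vanishes rather than when the residual does: in the over-constrained case the iteration ends where the gradient $\mathbf{J}^T\mathbf{f}$ is zero, while the residual $\mathbf{f}(\mathbf{x}^*)$ itself is generally nonzero there.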

The Newton-Raphson method assumes the availability of the analytical expressions of all partial derivatives $\partial f_m(\mathbf{x})/\partial x_n$ in the Jacobian matrix $\mathbf{J}(\mathbf{x})$. When this is not the case, however, they need to be approximated by the forward or central difference (secant) method:

$$\frac{\partial f_m(\mathbf{x})}{\partial x_n} \approx \frac{f_m(\mathbf{x} + \delta\,\mathbf{e}_n) - f_m(\mathbf{x})}{\delta} \approx \frac{f_m(\mathbf{x} + \delta\,\mathbf{e}_n) - f_m(\mathbf{x} - \delta\,\mathbf{e}_n)}{2\,\delta} \tag{81}$$
where $\delta$ is a small increment and $\mathbf{e}_n$ is the $n$th standard basis vector, whose components are all zero except the $n$th, which is 1.
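When only $\mathbf{f}$ itself is available, the Jacobian can thus be filled in column by column. A minimal sketch by central differences, Eq. (81) (the function name and the default increment are our own choices):

    import numpy as np

    def numerical_jacobian(f, x, delta=1e-6):
        """Approximate the M x N Jacobian of f at x by central differences."""
        x = np.asarray(x, dtype=float)
        fx = np.asarray(f(x))
        J = np.empty((fx.size, x.size))
        for n in range(x.size):
            e = np.zeros_like(x)
            e[n] = delta                  # perturb only the n-th variable
            J[:, n] = (np.asarray(f(x + e)) - np.asarray(f(x - e))) / (2 * delta)
        return J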

Example

When the system $\mathbf{f}(\mathbf{x}) = \mathbf{0}$ has multiple roots, the root the iteration converges to depends on where it starts: with one initial guess $\mathbf{x}_0$ we get one root, and with a different initial guess $\mathbf{x}_0$ we get another root of the same system.
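As a concrete illustration, consider the system $f_1(x_1, x_2) = x_1^2 + x_2^2 - 4 = 0$ and $f_2(x_1, x_2) = x_1 x_2 - 1 = 0$ (a system of our own choosing, used here purely for illustration). Its roots satisfy $x_1^2 = 2 \pm \sqrt{3}$ with $x_2 = 1/x_1$. Running the newton_system sketch above from two different initial guesses finds two different roots:

    import numpy as np

    # Illustrative system: x1^2 + x2^2 = 4 and x1*x2 = 1 (our own choice)
    f = lambda x: np.array([x[0]**2 + x[1]**2 - 4.0, x[0] * x[1] - 1.0])
    jac = lambda x: np.array([[2 * x[0], 2 * x[1]], [x[1], x[0]]])

    print(newton_system(f, jac, [2.0, 0.3]))   # one root, approx [1.9319, 0.5176]
    print(newton_system(f, jac, [0.3, 2.0]))   # another root, approx [0.5176, 1.9319]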

Broyden's Method

In the Newton-Raphson method, two main operations are carried out in each iteration: (a) evaluating the Jacobian matrix $\mathbf{J}(\mathbf{x}_n)$, and (b) obtaining its inverse $\mathbf{J}^{-1}(\mathbf{x}_n)$. To avoid these expensive computations, we can consider using Broyden's method, one of the so-called quasi-Newton methods, which approximates the inverse of the Jacobian $\mathbf{J}_{n+1}^{-1}$ from the $\mathbf{J}_n^{-1}$ in the previous iteration step, so that it can be updated iteratively from the initial $\mathbf{J}_0^{-1}$.

We first consider, in the single-variable case, how to estimate the next derivative $f'(x_{n+1})$ from the current $f'(x_n)$ by the secant method:

$$f'(x_{n+1}) \approx \frac{f(x_{n+1}) - f(x_n)}{x_{n+1} - x_n} = \frac{f'(x_n)\,\Delta x_n + \left[\Delta f_n - f'(x_n)\,\Delta x_n\right]}{\Delta x_n} = f'(x_n) + \Delta f'_n \tag{82}$$
where
  • $\Delta f_n = f(x_{n+1}) - f(x_n)$ is the true increment of the function over the interval $\Delta x_n = x_{n+1} - x_n$;
  • $f'(x_n)\,\Delta x_n$ is the estimated increment of the function based on the previous derivative $f'(x_n)$;
  • $\Delta f'_n$ is the estimated increment of the derivative:
    $$\Delta f'_n = \frac{\Delta f_n - f'(x_n)\,\Delta x_n}{\Delta x_n} \tag{83}$$
The equation above indicates that the derivative $f'(x_{n+1})$ at the $(n+1)$st step can be estimated by adding the estimated increment $\Delta f'_n$ to the derivative $f'(x_n)$ at the current $n$th step.

Figure 1.16: Broyden's Method

Having obtained $f'(x_{n+1})$, we can use the same iteration as in the Newton-Raphson method to find $x_{n+2}$:

$$x_{n+2} = x_{n+1} - \frac{f(x_{n+1})}{f'(x_{n+1})} \tag{84}$$
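In code, this single-variable scheme amounts to the secant method, since the updated estimate in Eq. (82) reduces to the difference quotient $\Delta f_n/\Delta x_n$. A minimal sketch (the function name and defaults are our own choices):

    def broyden_1d(f, x0, x1, tol=1e-12, max_iter=100):
        """Single-variable quasi-Newton iteration: the derivative estimate is
        updated by Eqs. (82)-(83) instead of being recomputed at each step."""
        d = (f(x1) - f(x0)) / (x1 - x0)                      # initial derivative estimate
        for _ in range(max_iter):
            if abs(f(x1)) < tol:
                break
            x0, x1 = x1, x1 - f(x1) / d                      # Newton step, Eq. (84)
            d += (f(x1) - f(x0) - d * (x1 - x0)) / (x1 - x0) # update, Eqs. (82)-(83)
        return x1

    print(broyden_1d(lambda x: x * x - 2.0, 1.0, 2.0))   # -> 1.414213..., root of x^2 = 2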

This method for the single-variable case can be generalized to the multivariate case for solving $\mathbf{f}(\mathbf{x}) = \mathbf{0}$. Following the way we estimated the increment of the derivative of a single-variable function in Eq. (83), here we can estimate the increment of the Jacobian of a multivariate function:

$$\Delta\mathbf{J}_n = \frac{\left(\Delta\mathbf{f}_n - \mathbf{J}_n\,\Delta\mathbf{x}_n\right)\Delta\mathbf{x}_n^T}{\Delta\mathbf{x}_n^T\,\Delta\mathbf{x}_n} \tag{85}$$
where $\Delta\mathbf{f}_n = \mathbf{f}(\mathbf{x}_{n+1}) - \mathbf{f}(\mathbf{x}_n)$ and $\Delta\mathbf{x}_n = \mathbf{x}_{n+1} - \mathbf{x}_n$. Now in each iteration, we can update the estimated Jacobian as well as the estimated root:
$$\mathbf{J}_{n+1} = \mathbf{J}_n + \Delta\mathbf{J}_n = \mathbf{J}_n + \frac{\left(\Delta\mathbf{f}_n - \mathbf{J}_n\,\Delta\mathbf{x}_n\right)\Delta\mathbf{x}_n^T}{\Delta\mathbf{x}_n^T\,\Delta\mathbf{x}_n}, \qquad \mathbf{x}_{n+2} = \mathbf{x}_{n+1} - \mathbf{J}_{n+1}^{-1}\,\mathbf{f}(\mathbf{x}_{n+1}) \tag{86}$$

The algorithm can be further improved so that the inversion of the Jacobian $\mathbf{J}_{n+1}$ is avoided. Specifically, consider the inverse Jacobian:

$$\mathbf{J}_{n+1}^{-1} = \left[\mathbf{J}_n + \frac{\left(\Delta\mathbf{f}_n - \mathbf{J}_n\,\Delta\mathbf{x}_n\right)\Delta\mathbf{x}_n^T}{\Delta\mathbf{x}_n^T\,\Delta\mathbf{x}_n}\right]^{-1} \tag{87}$$
We can apply the Sherman-Morrison formula (see the section on the Sherman-Morrison-Woodbury identity):
$$\left(\mathbf{A} + \mathbf{u}\,\mathbf{v}^T\right)^{-1} = \mathbf{A}^{-1} - \frac{\mathbf{A}^{-1}\,\mathbf{u}\,\mathbf{v}^T\,\mathbf{A}^{-1}}{1 + \mathbf{v}^T\,\mathbf{A}^{-1}\,\mathbf{u}} \tag{88}$$
to the right-hand side of the equation above by defining
$$\mathbf{A} = \mathbf{J}_n, \qquad \mathbf{u} = \frac{\Delta\mathbf{f}_n - \mathbf{J}_n\,\Delta\mathbf{x}_n}{\Delta\mathbf{x}_n^T\,\Delta\mathbf{x}_n}, \qquad \mathbf{v} = \Delta\mathbf{x}_n \tag{89}$$
and rewrite it as:
$$\mathbf{J}_{n+1}^{-1} = \mathbf{J}_n^{-1} + \frac{\left(\Delta\mathbf{x}_n - \mathbf{J}_n^{-1}\,\Delta\mathbf{f}_n\right)\Delta\mathbf{x}_n^T\,\mathbf{J}_n^{-1}}{\Delta\mathbf{x}_n^T\,\mathbf{J}_n^{-1}\,\Delta\mathbf{f}_n} \tag{90}$$
We see that the next $\mathbf{J}_{n+1}^{-1}$ can be iteratively estimated directly from the previous $\mathbf{J}_n^{-1}$, thereby avoiding computing the inverse of $\mathbf{J}_{n+1}$ altogether. The algorithm is listed below, followed by a code sketch:

  • Select $\mathbf{x}_0$
  • Find $\mathbf{f}_0 = \mathbf{f}(\mathbf{x}_0)$, $\mathbf{J}_0 = \mathbf{J}(\mathbf{x}_0)$, and $\mathbf{J}_0^{-1}$
  • $\mathbf{x}_1 = \mathbf{x}_0 - \mathbf{J}_0^{-1}\,\mathbf{f}_0$
  • $n = 0$
  • While $\|\mathbf{f}(\mathbf{x}_{n+1})\| > tol$ do
    • $\Delta\mathbf{x}_n = \mathbf{x}_{n+1} - \mathbf{x}_n$
    • $\Delta\mathbf{f}_n = \mathbf{f}(\mathbf{x}_{n+1}) - \mathbf{f}(\mathbf{x}_n)$
    • $\mathbf{J}_{n+1}^{-1} = \mathbf{J}_n^{-1} + \dfrac{\left(\Delta\mathbf{x}_n - \mathbf{J}_n^{-1}\,\Delta\mathbf{f}_n\right)\Delta\mathbf{x}_n^T\,\mathbf{J}_n^{-1}}{\Delta\mathbf{x}_n^T\,\mathbf{J}_n^{-1}\,\Delta\mathbf{f}_n}$ (Eq. (90))
    • $\Delta\mathbf{x}_{n+1} = -\mathbf{J}_{n+1}^{-1}\,\mathbf{f}(\mathbf{x}_{n+1})$
    • $\mathbf{x}_{n+2} = \mathbf{x}_{n+1} + \Delta\mathbf{x}_{n+1}$, and find $\mathbf{f}(\mathbf{x}_{n+2})$
    • $n = n + 1$
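The following is a minimal Python sketch of this algorithm (the function name and defaults are our own choices; the initial inverse is computed once from a supplied initial Jacobian, which is the only matrix inversion performed):

    import numpy as np

    def broyden(f, x0, J0, tol=1e-10, max_iter=100):
        """Broyden's method for f(x) = 0: the inverse Jacobian is maintained by
        the Sherman-Morrison-based rank-1 update of Eq. (90)."""
        x = np.asarray(x0, dtype=float)
        Jinv = np.linalg.inv(np.asarray(J0, dtype=float))  # the only inversion
        fx = f(x)
        for _ in range(max_iter):
            if np.linalg.norm(fx) < tol:
                break
            x_new = x - Jinv @ fx              # step with the estimated inverse
            fx_new = f(x_new)
            dx, df = x_new - x, fx_new - fx
            # Eq. (90): rank-1 update of the inverse Jacobian
            Jinv += np.outer(dx - Jinv @ df, dx @ Jinv) / (dx @ Jinv @ df)
            x, fx = x_new, fx_new
        return x

    # Usage on the illustrative system from above, starting from its Jacobian at x0:
    f = lambda x: np.array([x[0]**2 + x[1]**2 - 4.0, x[0] * x[1] - 1.0])
    print(broyden(f, [2.0, 0.3], np.array([[4.0, 0.6], [0.3, 2.0]])))  # approx [1.9319, 0.5176]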