Quasi-Newton Methods

As we have seen above, Newton's method can be used both to solve a nonlinear system, i.e., to find the roots of a set of simultaneous equations $f(x) = 0$, and to solve an optimization problem, i.e., to minimize a scalar-valued objective function $f(x)$, based on iterations of the same form:

$$x_{n+1} = x_n - J^{-1}(x_n)\, f(x_n) \quad\text{(root finding)}, \qquad x_{n+1} = x_n - H^{-1}(x_n)\, g(x_n) \quad\text{(minimization)}$$

If the Jacobian $J(x)$ or the Hessian matrix $H(x)$ is not available, or if it is too computationally costly to calculate its inverse (with complexity $O(N^3)$), a quasi-Newton method can be used instead to approximate the Hessian matrix or its inverse based only on the first-order derivative, the gradient $g(x)$ of $f(x)$ (with complexity $O(N^2)$ per iteration), similar to Broyden's method considered previously (see the section on the Newton-Raphson method).

In the following, we consider the minimization of a function $f(x)$. Its Taylor expansion around point $x_{n+1}$ is

$$f(x) \approx f(x_{n+1}) + g_{n+1}^T (x - x_{n+1}) + \frac{1}{2} (x - x_{n+1})^T H_{n+1} (x - x_{n+1}) \qquad(59)$$

Taking the derivative with respect to $x$, we get

$$g(x) \approx g_{n+1} + H_{n+1} (x - x_{n+1}) \qquad(60)$$

Evaluating at $x = x_n$, we have $g(x_n) = g_n$, and the above can be written as

$$g_n \approx g_{n+1} + H_{n+1} (x_n - x_{n+1}), \qquad\text{i.e.,}\qquad B_{n+1} (x_{n+1} - x_n) = g_{n+1} - g_n \qquad(61)$$

where matrix $B_{n+1}$ is the secant approximation of the Hessian matrix $H_{n+1}$, and the last equality is called the secant equation. For convenience, we further define

$$s_n = x_{n+1} - x_n, \qquad y_n = g_{n+1} - g_n \qquad(62)$$

so that the equation above can be written as

$$B_{n+1} s_n = y_n, \qquad\text{or equivalently}\qquad s_n = B_{n+1}^{-1} y_n \qquad(63)$$

This is the quasi-Newton equation, which is the secant condition that must be satisfied by matrix $B_{n+1}$, or its inverse $B_{n+1}^{-1}$, in any of the quasi-Newton algorithms, all of which take the following general steps:
  1. Initialize $x_0$ and $B_0 = I$ (or $B_0^{-1} = I$), set $n = 0$;
  2. Compute the gradient $g_n = g(x_n)$ and the search direction $d_n = -B_n^{-1} g_n$;
  3. Get $x_{n+1} = x_n + \eta_n d_n$ with step size $\eta_n$ satisfying the Wolfe conditions;
  4. Update $B_n \to B_{n+1}$ or $B_n^{-1} \to B_{n+1}^{-1}$ so that the quasi-Newton equation is satisfied;
  5. If the termination condition is not satisfied, set $n = n+1$ and go back to the second step.
For this iteration to converge to a local minimum of $f(x)$, $B_n$ must be a positive definite matrix, the same as the Hessian matrix $H$ it approximates. In particular, if $B_n = I$, then $d_n = -g_n$ and the algorithm becomes the gradient descent method; also, if $B_n = H_n$ is the Hessian matrix, then the algorithm becomes Newton's method.
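The five steps above can be sketched as a short driver (a sketch added here, not code from the original text). The `update` callable is where the quasi-Newton variants below differ; a simple Armijo backtracking search stands in for a full Wolfe line search, which is a simplification. With `update` returning its input unchanged and $B_0^{-1} = I$, the loop reduces to the gradient descent special case noted above.

```python
import numpy as np

def quasi_newton(f, grad, x0, update, tol=1e-8, max_iter=500):
    """Generic quasi-Newton iteration following steps 1-5 above.

    update(Binv, s, y) returns the next inverse-Hessian approximation;
    this is where SR1, DFP, and BFGS differ."""
    x = np.asarray(x0, dtype=float)
    Binv = np.eye(x.size)                    # step 1: B_0^{-1} = I
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:          # step 5: termination test
            break
        d = -Binv @ g                        # step 2: search direction
        eta = _backtrack(f, x, d, f(x), g @ d)  # step 3: line search
        x_new = x + eta * d
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        Binv = update(Binv, s, y)            # step 4: secant update
        x, g = x_new, g_new
    return x

def _backtrack(f, x, d, fx, gd, beta=0.5, c1=1e-4):
    """Armijo backtracking: halve eta until sufficient decrease holds."""
    eta = 1.0
    for _ in range(60):
        if f(x + eta * d) <= fx + c1 * eta * gd:
            break
        eta *= beta
    return eta
```

With the trivial update `lambda Binv, s, y: Binv` the driver is plain gradient descent; plugging in one of the update formulas derived below turns it into SR1, DFP, or BFGS.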

In a quasi-Newton method, we can choose to update either $B_n$ or its inverse $B_n^{-1}$, based on one of the two forms of the quasi-Newton equation in Eq. (63). We note that there is a dual relationship between the two ways to update: by swapping $s_n$ and $y_n$, an update formula for $B_n$ can be directly applied to its inverse $B_n^{-1}$, and vice versa.

  • The Symmetric Rank 1 (SR1) Algorithm

    Here matrix $B_n$ is updated by an additional term:

$$B_{n+1} = B_n + u u^T \qquad(64)$$

    where $u$ is a vector and $u u^T$ is a matrix of rank 1. Imposing the secant condition $B_{n+1} s_n = y_n$, we get

$$(u^T s_n)\, u = y_n - B_n s_n \qquad(65)$$

    This equation indicates that $u$ is in the same direction as $y_n - B_n s_n$, and can therefore be written as $u = \alpha (y_n - B_n s_n)$. Substituting this back we get

$$\alpha^2 \left[(y_n - B_n s_n)^T s_n\right] (y_n - B_n s_n) = y_n - B_n s_n \qquad(66)$$

    Solving this we get $\alpha^2 = 1/\left[(y_n - B_n s_n)^T s_n\right]$,

$$u u^T = \frac{(y_n - B_n s_n)(y_n - B_n s_n)^T}{(y_n - B_n s_n)^T s_n} \qquad(67)$$

    and the iteration of $B_n$ becomes

$$B_{n+1} = B_n + \frac{(y_n - B_n s_n)(y_n - B_n s_n)^T}{(y_n - B_n s_n)^T s_n} \qquad(68)$$

    Applying the Sherman-Morrison formula (see the section on the Woodbury matrix identity) we further get the inverse $B_{n+1}^{-1}$:

$$B_{n+1}^{-1} = B_n^{-1} + \frac{(s_n - B_n^{-1} y_n)(s_n - B_n^{-1} y_n)^T}{(s_n - B_n^{-1} y_n)^T y_n} \qquad(69)$$

    We note that this equation is dual to the previous one, and it can also be obtained based on the duality between the secant conditions for $B_{n+1}$ and $B_{n+1}^{-1}$. Specifically, same as above, we impose the secant condition for the inverse $B_{n+1}^{-1}$ of the following form

$$B_{n+1}^{-1} = B_n^{-1} + v v^T, \qquad B_{n+1}^{-1} y_n = s_n \qquad(70)$$

    and get

$$(v^T y_n)\, v = s_n - B_n^{-1} y_n \qquad(71)$$

    Then by repeating the same process above, we get the same update formula for $B_{n+1}^{-1}$ as in Eq. (69).

    This is the formula for directly updating $B_{n+1}^{-1}$. For the iteration to converge to a local minimum, matrix $B_{n+1}^{-1}$ needs to be positive definite, as does $B_{n+1}$; we therefore require

$$(y_n - B_n s_n)^T s_n > 0 \qquad(72)$$

    It may be difficult to maintain this requirement throughout the iteration. This problem is avoided in the DFP and BFGS methods below.
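A minimal implementation of the SR1 update of Eq. (68) might look as follows (a sketch, not from the original text); the skip when the denominator is near zero is a standard safeguard added here, not part of the derivation above.

```python
import numpy as np

def sr1_update(B, s, y, eps=1e-8):
    """SR1 update of the Hessian approximation B, Eq. (68).

    The update is skipped when the denominator (y - B s)^T s is nearly
    zero (a common safeguard, assumed here)."""
    r = y - B @ s                     # residual of the secant equation
    denom = r @ s
    if abs(denom) < eps * np.linalg.norm(r) * np.linalg.norm(s):
        return B                      # skip: denominator too small
    return B + np.outer(r, r) / denom
```

By construction the updated matrix satisfies the secant condition exactly: `sr1_update(B, s, y) @ s` equals `y`.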

  • The BFGS (Broyden-Fletcher-Goldfarb-Shanno) Algorithm

    This is a rank 2 algorithm in which matrix $B_n$ is updated by two rank-1 terms:

$$B_{n+1} = B_n + \alpha\, u u^T + \beta\, v v^T \qquad(73)$$

    Imposing the secant condition $B_{n+1} s_n = y_n$, we get

$$B_n s_n + \alpha (u^T s_n)\, u + \beta (v^T s_n)\, v = y_n \qquad(74)$$

    or

$$\alpha (u^T s_n)\, u + \beta (v^T s_n)\, v = y_n - B_n s_n \qquad(75)$$

    which can be satisfied if we let

$$u = y_n, \quad \alpha = \frac{1}{y_n^T s_n}, \qquad v = B_n s_n, \quad \beta = -\frac{1}{s_n^T B_n s_n} \qquad(76)$$

    Substituting these into Eq. (73) we get

$$B_{n+1} = B_n + \frac{y_n y_n^T}{y_n^T s_n} - \frac{B_n s_n s_n^T B_n}{s_n^T B_n s_n} \qquad(77)$$

    Given this update formula for $B_{n+1}$, we can further find its inverse $B_{n+1}^{-1}$ and then the search direction $d_{n+1} = -B_{n+1}^{-1} g_{n+1}$.

    Alternatively, given Eq. (77), we can derive an update formula directly for the inverse matrix $B_{n+1}^{-1}$, so that the search direction $d_{n+1} = -B_{n+1}^{-1} g_{n+1}$ can be obtained without carrying out the inversion computation. Specifically, we first define

$$A = B_n + \frac{y_n y_n^T}{y_n^T s_n} \qquad(78)$$

    so that Eq. (77) can be written as

$$B_{n+1} = A - \frac{(B_n s_n)(B_n s_n)^T}{s_n^T B_n s_n} \qquad(79)$$

    Applying the Sherman-Morrison formula to Eq. (78), we get

$$A^{-1} = B_n^{-1} - \frac{B_n^{-1} y_n y_n^T B_n^{-1}}{y_n^T s_n + y_n^T B_n^{-1} y_n} \qquad(80)$$

    A second application of the Sherman-Morrison formula to Eq. (79), with $A^{-1}$ given above, followed by lengthy but elementary simplification, yields

$$B_{n+1}^{-1} = B_n^{-1} + \left(1 + \frac{y_n^T B_n^{-1} y_n}{y_n^T s_n}\right)\frac{s_n s_n^T}{y_n^T s_n} - \frac{s_n y_n^T B_n^{-1} + B_n^{-1} y_n s_n^T}{y_n^T s_n} \qquad(90)$$

    which can be written more compactly as the update formula for $B_{n+1}^{-1}$:

$$B_{n+1}^{-1} = \left(I - \frac{s_n y_n^T}{y_n^T s_n}\right) B_n^{-1} \left(I - \frac{y_n s_n^T}{y_n^T s_n}\right) + \frac{s_n s_n^T}{y_n^T s_n} \qquad(91)$$
    In summary, $B_{n+1}^{-1}$ can be found either by Eq. (77) for $B_{n+1}$ followed by an additional inversion operation, or by Eq. (91) directly for $B_{n+1}^{-1}$. The results obtained by these two methods are equivalent.
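The inverse update of Eq. (91) takes only a few lines (a sketch added here, not from the original text); the vectors `s` and `y` in the usage below are an arbitrary illustrative pair with $y^T s > 0$.

```python
import numpy as np

def bfgs_inverse_update(Binv, s, y):
    """BFGS update of the inverse Hessian approximation, Eq. (91).

    Assumes the curvature condition y^T s > 0 holds."""
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(y, s)
    return V.T @ Binv @ V + rho * np.outer(s, s)
```

One can check numerically that the result satisfies the inverse secant condition $B_{n+1}^{-1} y_n = s_n$, and that it agrees with inverting the direct update of Eq. (77), confirming the equivalence stated above.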

  • The DFP (Davidon-Fletcher-Powell) Algorithm

    Same as the BFGS algorithm, the DFP algorithm is also a rank 2 algorithm, but instead of $B_n$, the inverse matrix $B_n^{-1}$ is updated by two rank-1 terms:

$$B_{n+1}^{-1} = B_n^{-1} + \alpha\, u u^T + \beta\, v v^T \qquad(92)$$

    where $\alpha$ and $\beta$ are real scalars, and $u$ and $v$ are vectors. Imposing the secant condition for the inverse, $B_{n+1}^{-1} y_n = s_n$, we get

$$B_n^{-1} y_n + \alpha (u^T y_n)\, u + \beta (v^T y_n)\, v = s_n \qquad(93)$$

    or

$$\alpha (u^T y_n)\, u + \beta (v^T y_n)\, v = s_n - B_n^{-1} y_n \qquad(94)$$

    which can be satisfied if we let

$$u = s_n, \quad \alpha = \frac{1}{s_n^T y_n}, \qquad v = B_n^{-1} y_n, \quad \beta = -\frac{1}{y_n^T B_n^{-1} y_n} \qquad(95)$$

    Substituting these into Eq. (92) we get

$$B_{n+1}^{-1} = B_n^{-1} + \frac{s_n s_n^T}{s_n^T y_n} - \frac{B_n^{-1} y_n y_n^T B_n^{-1}}{y_n^T B_n^{-1} y_n} \qquad(96)$$

    Following the same procedure used for the BFGS method, we can also obtain the update formula of matrix $B_{n+1}$ for the DFP method:

$$B_{n+1} = \left(I - \frac{y_n s_n^T}{y_n^T s_n}\right) B_n \left(I - \frac{s_n y_n^T}{y_n^T s_n}\right) + \frac{y_n y_n^T}{y_n^T s_n} \qquad(97)$$

    We note that Eqs. (96) and (77) form a duality pair, and Eqs. (97) and (91) form another duality pair, with $s_n$ and $y_n$ (and $B$ and $B^{-1}$) swapped. In other words, the BFGS and DFP methods are dual of each other.
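A minimal sketch of the DFP update of Eq. (96) (added here, not from the original text) also makes the duality concrete: calling the same function with $s_n$ and $y_n$ swapped, and reading its argument as $B_n$ rather than $B_n^{-1}$, reproduces the BFGS update of Eq. (77).

```python
import numpy as np

def dfp_inverse_update(Binv, s, y):
    """DFP update of the inverse Hessian approximation, Eq. (96).

    Assumes the curvature condition s^T y > 0 holds."""
    By = Binv @ y
    return Binv + np.outer(s, s) / (s @ y) - np.outer(By, By) / (y @ By)
```

As with BFGS, the updated matrix satisfies the inverse secant condition $B_{n+1}^{-1} y_n = s_n$ exactly.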

In both the DFP and BFGS methods, matrix $B_n^{-1}$, as well as $B_n$, must be positive definite, i.e., $x^T B_n x > 0$ must hold for any $x \ne 0$. We now prove that the necessary condition $s_n^T y_n > 0$ is satisfied if the curvature condition of the Wolfe conditions is satisfied:

$$g(x_n + \eta_n d_n)^T d_n \ge c_2\, g_n^T d_n, \qquad 0 < c_2 < 1 \qquad(98)$$

Replacing the search direction $d_n$ by $s_n/\eta_n$ (as $s_n = x_{n+1} - x_n = \eta_n d_n$ with $\eta_n > 0$), we get

$$g_{n+1}^T s_n \ge c_2\, g_n^T s_n \qquad(99)$$

We also have

$$y_n^T s_n = (g_{n+1} - g_n)^T s_n = g_{n+1}^T s_n - g_n^T s_n \qquad(100)$$

Substituting the first equation into the second, we get

$$y_n^T s_n \ge (c_2 - 1)\, g_n^T s_n > 0 \qquad(101)$$

The last inequality is due to the fact that $c_2 < 1$ and $g_n^T s_n = -\eta_n\, g_n^T B_n^{-1} g_n < 0$, as $B_n^{-1}$ is positive definite. We therefore have

$$s_n^T y_n > 0 \qquad(102)$$
Given $s_n^T y_n > 0$, we can further prove by induction that both $B_{n+1}$ and $B_{n+1}^{-1}$, based respectively on the update formulae in Eqs. (77) and (96), are positive definite. We first assume $B_n$ and $B_n^{-1}$ are both positive definite, and express $B_n$ in terms of its Cholesky decomposition, $B_n = L L^T$. For any $x \ne 0$ we further define $a = L^T x$, $b = L^T s_n$, and get

$$x^T B_n x = a^T a, \qquad s_n^T B_n s_n = b^T b, \qquad x^T B_n s_n = a^T b \qquad(103)$$

Now we can show that the sum of the first and third terms of Eq. (77) is positive semidefinite:

$$x^T B_n x - \frac{(x^T B_n s_n)^2}{s_n^T B_n s_n} = a^T a - \frac{(a^T b)^2}{b^T b} = \frac{(a^T a)(b^T b) - (a^T b)^2}{b^T b} \ge 0 \qquad(104)$$

The last step is due to the Cauchy-Schwarz inequality, with equality only when $a$ is parallel to $b$, i.e., when $x$ is parallel to $s_n$. Also, as $s_n^T y_n > 0$, the second term of Eq. (77) is positive semidefinite:

$$\frac{x^T y_n y_n^T x}{y_n^T s_n} = \frac{(y_n^T x)^2}{y_n^T s_n} \ge 0 \qquad(105)$$

Combining these two results (they cannot both vanish: when $x$ is parallel to $s_n$, the second term is a positive multiple of $s_n^T y_n > 0$), we get

$$x^T B_{n+1} x > 0 \qquad(106)$$

i.e., $B_{n+1}$ based on the update formula Eq. (77) is positive definite. Following the same steps, we can also show that $B_{n+1}^{-1}$ based on Eq. (96) is positive definite.
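The induction argument can be checked numerically (a sketch added here, not from the original text): starting from a positive definite $B_0 = I$ and feeding the update of Eq. (77) pairs $(s, y)$ with $y = H s$ for a fixed symmetric positive definite $H$ (an illustrative choice), the curvature condition $s^T y = s^T H s > 0$ always holds and the smallest eigenvalue of $B$ stays positive.

```python
import numpy as np

def bfgs_update(B, s, y):
    """Direct BFGS update of B, Eq. (77)."""
    Bs = B @ s
    return B + np.outer(y, y) / (y @ s) - np.outer(Bs, Bs) / (s @ Bs)

rng = np.random.default_rng(0)
H = np.diag([1.0, 2.0, 5.0])   # fixed SPD "Hessian" (illustrative)
B = np.eye(3)
min_eigs = []
for _ in range(30):
    s = rng.standard_normal(3)
    y = H @ s                  # guarantees s^T y = s^T H s > 0
    B = bfgs_update(B, s, y)
    min_eigs.append(np.min(np.linalg.eigvalsh(B)))
```

Every recorded minimum eigenvalue is positive, as the proof predicts.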

Examples:

Figs. 2.11 through 2.13 show the search paths of the SR1, DFP, and BFGS methods when applied to find the minimum of the Rosenbrock function.
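The Rosenbrock experiment can be reproduced with a compact BFGS routine (a sketch added here, not the code used to generate the figures). It uses the inverse update of Eq. (91) with a simple Armijo backtracking line search instead of a full Wolfe search; since backtracking alone does not guarantee the curvature condition, updates with $s^T y \le 0$ are skipped to preserve positive definiteness.

```python
import numpy as np

def rosenbrock(x):
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

def rosenbrock_grad(x):
    return np.array([
        -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
        200 * (x[1] - x[0]**2),
    ])

def bfgs(f, grad, x0, tol=1e-6, max_iter=500):
    """BFGS via the inverse update of Eq. (91), Armijo backtracking."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    Binv = np.eye(n)
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        d = -Binv @ g
        eta, fx, gd = 1.0, f(x), g @ d
        for _ in range(60):              # backtracking line search
            if f(x + eta * d) <= fx + 1e-4 * eta * gd:
                break
            eta *= 0.5
        x_new = x + eta * d
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        if s @ y > 1e-12:                # curvature safeguard
            rho = 1.0 / (y @ s)
            V = np.eye(n) - rho * np.outer(y, s)
            Binv = V.T @ Binv @ V + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x
```

From the standard starting point $(-1.2, 1)$, the iteration converges to the minimum at $(1, 1)$.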

Figure 2.11: SR1 (figures/RosenbrockSR1.eps)

Figure 2.12: DFP (figures/RosenbrockDFP.eps)

Figure 2.13: BFGS (figures/RosenbrockBFGS.eps)