Quasi-Newton Methods
As we have seen above, Newton's method can be used, with iterations of the same form, both to solve nonlinear systems, i.e., to find the roots of a set of simultaneous equations, and to solve optimization problems, i.e., to minimize a scalar-valued objective function. If the Jacobian $\mathbf{J}$ or the Hessian matrix $\mathbf{H}$ is not available, or if it is too computationally costly to calculate its inverse (with complexity $O(N^3)$), a quasi-Newton method can be used to approximate the Hessian matrix or its inverse based only on the first-order derivative, the gradient $\mathbf{g}$ of $f(\mathbf{x})$ (with complexity $O(N^2)$), similar to Broyden's method considered previously (in the section on the Newton-Raphson method for multivariate problems).
In the following, we consider the minimization of a function $f(\mathbf{x})$. Its Taylor expansion around a point $\mathbf{x}_n$ is

$$ f(\mathbf{x}) \approx f(\mathbf{x}_n) + \mathbf{g}_n^T(\mathbf{x}-\mathbf{x}_n) + \frac{1}{2}(\mathbf{x}-\mathbf{x}_n)^T\mathbf{H}_n(\mathbf{x}-\mathbf{x}_n) \qquad (59) $$

where $\mathbf{g}_n$ and $\mathbf{H}_n$ are respectively the gradient and Hessian of $f$ at $\mathbf{x}_n$. Taking the derivative with respect to $\mathbf{x}$, we get

$$ \mathbf{g}(\mathbf{x}) \approx \mathbf{g}_n + \mathbf{H}_n(\mathbf{x}-\mathbf{x}_n) \qquad (60) $$

Evaluating this at $\mathbf{x} = \mathbf{x}_{n+1}$, we have $\mathbf{g}(\mathbf{x}_{n+1}) = \mathbf{g}_{n+1}$, and the above can be written as

$$ \mathbf{g}_{n+1} - \mathbf{g}_n \approx \mathbf{H}_n(\mathbf{x}_{n+1}-\mathbf{x}_n) = \mathbf{B}_{n+1}(\mathbf{x}_{n+1}-\mathbf{x}_n) \qquad (61) $$

where matrix $\mathbf{B}_{n+1}$ is the secant approximation of the Hessian matrix $\mathbf{H}_n$, and the last equality is called the secant equation.
For convenience, we further define:

$$ \mathbf{s}_n = \mathbf{x}_{n+1} - \mathbf{x}_n, \qquad \mathbf{y}_n = \mathbf{g}_{n+1} - \mathbf{g}_n \qquad (62) $$

so that the equation above can be written as

$$ \mathbf{B}_{n+1}\,\mathbf{s}_n = \mathbf{y}_n, \qquad \text{or equivalently} \qquad \mathbf{B}_{n+1}^{-1}\,\mathbf{y}_n = \mathbf{s}_n \qquad (63) $$
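The quasi-Newton equation has an exact interpretation for quadratic objectives, which the following sketch verifies numerically (NumPy assumed; the matrix $\mathbf{A}$, vector $\mathbf{b}$, and the two points are arbitrary test values, not from the text):

```python
import numpy as np

# For a quadratic f(x) = 0.5 x^T A x + b^T x the gradient is g(x) = A x + b,
# so y_n = g_{n+1} - g_n = A (x_{n+1} - x_n) = A s_n exactly: the true Hessian
# A satisfies the secant condition B s_n = y_n.
rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))
A = M @ M.T + 3 * np.eye(3)          # symmetric positive definite Hessian
b = rng.standard_normal(3)

def grad(x):
    return A @ x + b

x_n  = rng.standard_normal(3)        # two arbitrary points
x_n1 = rng.standard_normal(3)
s = x_n1 - x_n                       # step s_n
y = grad(x_n1) - grad(x_n)           # gradient change y_n

print(np.allclose(A @ s, y))         # secant condition holds exactly
```

For a general nonlinear $f$, the secant condition holds only approximately for the true Hessian, which is why the approximation $\mathbf{B}_{n+1}$ is instead constructed to satisfy it exactly.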
This is the quasi-Newton equation, the secant condition that must be satisfied by the matrix $\mathbf{B}_{n+1}$, or its inverse $\mathbf{B}_{n+1}^{-1}$, in any of the quasi-Newton algorithms, all of which take the following general steps:

- Initialize $\mathbf{x}_0$ and $\mathbf{B}_0$, and set $n = 0$;
- Compute the gradient $\mathbf{g}_n$ and the search direction $\mathbf{d}_n = -\mathbf{B}_n^{-1}\mathbf{g}_n$;
- Get $\mathbf{x}_{n+1} = \mathbf{x}_n + \alpha_n\mathbf{d}_n$ with a step size $\alpha_n$ satisfying the Wolfe conditions;
- Update $\mathbf{B}_n$ to $\mathbf{B}_{n+1}$ (or $\mathbf{B}_n^{-1}$ to $\mathbf{B}_{n+1}^{-1}$) so that the quasi-Newton equation is satisfied;
- If the termination condition is not satisfied, set $n = n+1$ and go back to the second step.
For this iteration to converge to a local minimum of $f(\mathbf{x})$, $\mathbf{B}_n$ must be a positive definite matrix, the same as the Hessian matrix $\mathbf{H}_n$ it approximates. In particular, if $\mathbf{B}_n = \mathbf{I}$, then $\mathbf{d}_n = -\mathbf{g}_n$ and the algorithm becomes the gradient descent method; also, if $\mathbf{B}_n = \mathbf{H}_n$ is the Hessian matrix, then the algorithm becomes Newton's method.
In a quasi-Newton method, we can choose to update either $\mathbf{B}_n$ or its inverse $\mathbf{B}_n^{-1}$, based on one of the two forms of the quasi-Newton equation. We note that there is a dual relationship between the two ways to update: by swapping $\mathbf{s}_n$ and $\mathbf{y}_n$, an update formula for $\mathbf{B}_n$ can be directly applied to its inverse $\mathbf{B}_n^{-1}$, and vice versa.
- The Symmetric Rank 1 (SR1) Algorithm

Here matrix $\mathbf{B}_n$ is updated by an additional term:

$$ \mathbf{B}_{n+1} = \mathbf{B}_n + \alpha\,\mathbf{u}\mathbf{u}^T \qquad (64) $$

where $\mathbf{u}$ is a vector and $\mathbf{u}\mathbf{u}^T$ is a matrix of rank 1. Imposing the secant condition, we get

$$ \mathbf{B}_{n+1}\mathbf{s}_n = \mathbf{B}_n\mathbf{s}_n + \alpha\,\mathbf{u}\,(\mathbf{u}^T\mathbf{s}_n) = \mathbf{y}_n \qquad (65) $$

This equation indicates that $\mathbf{u}$ is in the same direction as $\mathbf{y}_n - \mathbf{B}_n\mathbf{s}_n$, and can therefore be written as $\mathbf{u} = \mathbf{y}_n - \mathbf{B}_n\mathbf{s}_n$ (any scaling can be absorbed into $\alpha$). Substituting this back we get

$$ \alpha\,(\mathbf{y}_n - \mathbf{B}_n\mathbf{s}_n)\left[(\mathbf{y}_n - \mathbf{B}_n\mathbf{s}_n)^T\mathbf{s}_n\right] = \mathbf{y}_n - \mathbf{B}_n\mathbf{s}_n \qquad (66) $$

Solving this we get

$$ \alpha = \frac{1}{(\mathbf{y}_n - \mathbf{B}_n\mathbf{s}_n)^T\mathbf{s}_n} \qquad (67) $$

and the iteration of $\mathbf{B}_n$ becomes

$$ \mathbf{B}_{n+1} = \mathbf{B}_n + \frac{(\mathbf{y}_n - \mathbf{B}_n\mathbf{s}_n)(\mathbf{y}_n - \mathbf{B}_n\mathbf{s}_n)^T}{(\mathbf{y}_n - \mathbf{B}_n\mathbf{s}_n)^T\mathbf{s}_n} \qquad (68) $$
Applying the Sherman-Morrison formula (see the section on the Sherman-Morrison-Woodbury identity), we further get the inverse $\mathbf{B}_{n+1}^{-1}$:

$$ \mathbf{B}_{n+1}^{-1} = \mathbf{B}_n^{-1} + \frac{(\mathbf{s}_n - \mathbf{B}_n^{-1}\mathbf{y}_n)(\mathbf{s}_n - \mathbf{B}_n^{-1}\mathbf{y}_n)^T}{(\mathbf{s}_n - \mathbf{B}_n^{-1}\mathbf{y}_n)^T\mathbf{y}_n} \qquad (69) $$

We note that this equation is dual to the previous one, and it can also be obtained from the duality between the secant conditions for $\mathbf{B}_{n+1}$ and $\mathbf{B}_{n+1}^{-1}$. Specifically, same as above, we impose the secant condition for the inverse in the form

$$ \mathbf{B}_{n+1}^{-1}\mathbf{y}_n = \mathbf{s}_n \qquad (70) $$

and, assuming a symmetric rank-1 correction $\mathbf{B}_{n+1}^{-1} = \mathbf{B}_n^{-1} + \beta\,\mathbf{v}\mathbf{v}^T$, get

$$ \mathbf{B}_n^{-1}\mathbf{y}_n + \beta\,\mathbf{v}\,(\mathbf{v}^T\mathbf{y}_n) = \mathbf{s}_n \qquad (71) $$

Then by repeating the same process as above, with the roles of $\mathbf{s}_n$ and $\mathbf{y}_n$ (and of $\mathbf{B}_n$ and $\mathbf{B}_n^{-1}$) swapped, we get the same update formula for $\mathbf{B}_{n+1}^{-1}$ as in Eq. (69). This is the formula for directly updating $\mathbf{B}_{n+1}^{-1}$.

For the iteration to converge to a local minimum, matrix $\mathbf{B}_{n+1}$, as well as $\mathbf{B}_{n+1}^{-1}$, needs to be positive definite; we therefore require

$$ (\mathbf{y}_n - \mathbf{B}_n\mathbf{s}_n)^T\mathbf{s}_n > 0 \qquad (72) $$

It may be difficult to maintain this requirement throughout the iteration. This problem can be avoided in the DFP and BFGS methods considered next.
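The SR1 update of Eq. (68) takes only a few lines. The following sketch (NumPy assumed; $\mathbf{B}_n$, $\mathbf{s}_n$, $\mathbf{y}_n$ are arbitrary test values, not from the text) confirms that the updated matrix satisfies the secant condition:

```python
import numpy as np

# One SR1 step: B_{n+1} = B_n + u u^T / (u^T s_n) with u = y_n - B_n s_n.
rng = np.random.default_rng(1)
B = np.eye(4)                        # current Hessian approximation B_n
s = rng.standard_normal(4)           # step s_n = x_{n+1} - x_n
y = rng.standard_normal(4)           # gradient change y_n = g_{n+1} - g_n

u = y - B @ s
denom = u @ s                        # denominator (y_n - B_n s_n)^T s_n
B_new = B + np.outer(u, u) / denom

print(np.allclose(B_new @ s, y))     # secant condition B_{n+1} s_n = y_n
```

By construction the correction is the unique symmetric rank-1 matrix achieving the secant condition; a practical implementation also skips the update when the denominator is near zero.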
- The BFGS (Broyden-Fletcher-Goldfarb-Shanno) Algorithm

This is a rank 2 algorithm in which matrix $\mathbf{B}_n$ is updated by two rank-1 terms:

$$ \mathbf{B}_{n+1} = \mathbf{B}_n + \alpha\,\mathbf{u}\mathbf{u}^T + \beta\,\mathbf{v}\mathbf{v}^T \qquad (73) $$

Imposing the secant condition for $\mathbf{B}_{n+1}$, we get

$$ \mathbf{B}_{n+1}\mathbf{s}_n = \mathbf{B}_n\mathbf{s}_n + \alpha\,\mathbf{u}\,(\mathbf{u}^T\mathbf{s}_n) + \beta\,\mathbf{v}\,(\mathbf{v}^T\mathbf{s}_n) = \mathbf{y}_n \qquad (74) $$

or

$$ \alpha\,\mathbf{u}\,(\mathbf{u}^T\mathbf{s}_n) + \beta\,\mathbf{v}\,(\mathbf{v}^T\mathbf{s}_n) = \mathbf{y}_n - \mathbf{B}_n\mathbf{s}_n \qquad (75) $$

which can be satisfied if we let

$$ \mathbf{u} = \mathbf{y}_n,\quad \alpha = \frac{1}{\mathbf{y}_n^T\mathbf{s}_n},\qquad \mathbf{v} = \mathbf{B}_n\mathbf{s}_n,\quad \beta = -\frac{1}{\mathbf{s}_n^T\mathbf{B}_n\mathbf{s}_n} \qquad (76) $$

Substituting these into $\mathbf{B}_{n+1} = \mathbf{B}_n + \alpha\,\mathbf{u}\mathbf{u}^T + \beta\,\mathbf{v}\mathbf{v}^T$, we get

$$ \mathbf{B}_{n+1} = \mathbf{B}_n + \frac{\mathbf{y}_n\mathbf{y}_n^T}{\mathbf{y}_n^T\mathbf{s}_n} - \frac{\mathbf{B}_n\mathbf{s}_n\mathbf{s}_n^T\mathbf{B}_n}{\mathbf{s}_n^T\mathbf{B}_n\mathbf{s}_n} \qquad (77) $$

Given this update formula for $\mathbf{B}_{n+1}$, we can further find its inverse $\mathbf{B}_{n+1}^{-1}$ and then the search direction $\mathbf{d}_{n+1} = -\mathbf{B}_{n+1}^{-1}\mathbf{g}_{n+1}$.
Alternatively, given $\mathbf{B}_{n+1}$, we can derive an update formula directly for the inverse matrix $\mathbf{B}_{n+1}^{-1}$, so that the search direction $\mathbf{d}_{n+1}$ can be obtained without carrying out the inversion computation. Specifically, we first define

$$ \mathbf{U} = [\,\mathbf{y}_n,\;\mathbf{B}_n\mathbf{s}_n\,], \qquad \mathbf{C} = \begin{bmatrix} 1/(\mathbf{y}_n^T\mathbf{s}_n) & 0 \\ 0 & -1/(\mathbf{s}_n^T\mathbf{B}_n\mathbf{s}_n) \end{bmatrix} \qquad (78) $$

so that the expression above can be written as:

$$ \mathbf{B}_{n+1} = \mathbf{B}_n + \mathbf{U}\mathbf{C}\mathbf{U}^T \qquad (79) $$

and then apply the Sherman-Morrison-Woodbury formula to get:

$$ \mathbf{B}_{n+1}^{-1} = \mathbf{B}_n^{-1} - \mathbf{B}_n^{-1}\mathbf{U}\left(\mathbf{C}^{-1} + \mathbf{U}^T\mathbf{B}_n^{-1}\mathbf{U}\right)^{-1}\mathbf{U}^T\mathbf{B}_n^{-1} \qquad (80) $$

Expanding the $2\times 2$ inverse in Eq. (80) in terms of the scalars $\mathbf{y}_n^T\mathbf{s}_n$, $\mathbf{s}_n^T\mathbf{B}_n\mathbf{s}_n$, and $\mathbf{y}_n^T\mathbf{B}_n^{-1}\mathbf{y}_n$, and collecting terms, we get the update formula for $\mathbf{B}_{n+1}^{-1}$:

$$ \mathbf{B}_{n+1}^{-1} = \left(\mathbf{I} - \frac{\mathbf{s}_n\mathbf{y}_n^T}{\mathbf{y}_n^T\mathbf{s}_n}\right)\mathbf{B}_n^{-1}\left(\mathbf{I} - \frac{\mathbf{y}_n\mathbf{s}_n^T}{\mathbf{y}_n^T\mathbf{s}_n}\right) + \frac{\mathbf{s}_n\mathbf{s}_n^T}{\mathbf{y}_n^T\mathbf{s}_n} \qquad (91) $$

One can verify directly that this expression satisfies the secant condition $\mathbf{B}_{n+1}^{-1}\mathbf{y}_n = \mathbf{s}_n$: the right-hand factor annihilates $\mathbf{y}_n$, since $(\mathbf{I} - \mathbf{y}_n\mathbf{s}_n^T/(\mathbf{y}_n^T\mathbf{s}_n))\,\mathbf{y}_n = \mathbf{y}_n - \mathbf{y}_n = \mathbf{0}$, leaving only the last term, which maps $\mathbf{y}_n$ to $\mathbf{s}_n$.
In summary, $\mathbf{B}_{n+1}^{-1}$ can be found either by Eq. (77) for $\mathbf{B}_{n+1}$ followed by an additional inversion operation, or by Eq. (91) directly for $\mathbf{B}_{n+1}^{-1}$. The results obtained by these two methods are equivalent.
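The equivalence of the two routes can be checked numerically. The following sketch (NumPy assumed; $\mathbf{B}_n$, $\mathbf{s}_n$, $\mathbf{y}_n$ are arbitrary test values with $\mathbf{s}_n^T\mathbf{y}_n > 0$) performs one BFGS step in both forms:

```python
import numpy as np

# One BFGS step in both forms: the update of B_n (Eq. 77) and the direct
# update of its inverse H_n = B_n^{-1} (Eq. 91).  Inverting the updated B
# must give the updated H, and H_{n+1} must satisfy H_{n+1} y_n = s_n.
rng = np.random.default_rng(2)
B = np.eye(3)                        # current approximation B_n
H = np.linalg.inv(B)                 # its inverse H_n
s = rng.standard_normal(3)
y = rng.standard_normal(3)
if s @ y < 0:                        # enforce the curvature condition
    y = -y

rho = 1.0 / (y @ s)
Bs = B @ s
B_new = B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) * rho    # Eq. (77)
I = np.eye(3)
H_new = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
        + rho * np.outer(s, s)                                    # Eq. (91)

print(np.allclose(H_new, np.linalg.inv(B_new)))  # the two forms agree
print(np.allclose(H_new @ y, s))                 # secant: H_{n+1} y_n = s_n
```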
- The DFP (Davidon-Fletcher-Powell) Algorithm

Same as the BFGS algorithm, the DFP algorithm is also a rank 2 algorithm, where instead of $\mathbf{B}_n$, the inverse matrix $\mathbf{B}_n^{-1}$ is updated by two rank-1 terms:

$$ \mathbf{B}_{n+1}^{-1} = \mathbf{B}_n^{-1} + \alpha\,\mathbf{u}\mathbf{u}^T + \beta\,\mathbf{v}\mathbf{v}^T \qquad (92) $$

where $\alpha$ and $\beta$ are real scalars, and $\mathbf{u}$ and $\mathbf{v}$ are vectors. Imposing the secant condition for the inverse of $\mathbf{B}_{n+1}$, we get

$$ \mathbf{B}_{n+1}^{-1}\mathbf{y}_n = \mathbf{B}_n^{-1}\mathbf{y}_n + \alpha\,\mathbf{u}\,(\mathbf{u}^T\mathbf{y}_n) + \beta\,\mathbf{v}\,(\mathbf{v}^T\mathbf{y}_n) = \mathbf{s}_n \qquad (93) $$

or

$$ \alpha\,\mathbf{u}\,(\mathbf{u}^T\mathbf{y}_n) + \beta\,\mathbf{v}\,(\mathbf{v}^T\mathbf{y}_n) = \mathbf{s}_n - \mathbf{B}_n^{-1}\mathbf{y}_n \qquad (94) $$

which can be satisfied if we let

$$ \mathbf{u} = \mathbf{s}_n,\quad \alpha = \frac{1}{\mathbf{s}_n^T\mathbf{y}_n},\qquad \mathbf{v} = \mathbf{B}_n^{-1}\mathbf{y}_n,\quad \beta = -\frac{1}{\mathbf{y}_n^T\mathbf{B}_n^{-1}\mathbf{y}_n} \qquad (95) $$

Substituting these into $\mathbf{B}_{n+1}^{-1} = \mathbf{B}_n^{-1} + \alpha\,\mathbf{u}\mathbf{u}^T + \beta\,\mathbf{v}\mathbf{v}^T$, we get

$$ \mathbf{B}_{n+1}^{-1} = \mathbf{B}_n^{-1} + \frac{\mathbf{s}_n\mathbf{s}_n^T}{\mathbf{s}_n^T\mathbf{y}_n} - \frac{\mathbf{B}_n^{-1}\mathbf{y}_n\mathbf{y}_n^T\mathbf{B}_n^{-1}}{\mathbf{y}_n^T\mathbf{B}_n^{-1}\mathbf{y}_n} \qquad (96) $$

Following the same procedure used for the BFGS method, we can also obtain the update formula of matrix $\mathbf{B}_n$ for the DFP method:

$$ \mathbf{B}_{n+1} = \left(\mathbf{I} - \frac{\mathbf{y}_n\mathbf{s}_n^T}{\mathbf{y}_n^T\mathbf{s}_n}\right)\mathbf{B}_n\left(\mathbf{I} - \frac{\mathbf{s}_n\mathbf{y}_n^T}{\mathbf{y}_n^T\mathbf{s}_n}\right) + \frac{\mathbf{y}_n\mathbf{y}_n^T}{\mathbf{y}_n^T\mathbf{s}_n} \qquad (97) $$
We note that Eqs. (96) and (77) form a duality pair, and Eqs. (97) and (91) form another duality pair. In other words, the BFGS and DFP methods are duals of each other.
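A sketch of one DFP step of Eq. (96) makes this duality concrete (NumPy assumed; $\mathbf{H}_n = \mathbf{B}_n^{-1}$, $\mathbf{s}_n$, $\mathbf{y}_n$ are arbitrary illustrative values): the code is literally the BFGS update of $\mathbf{B}_n$ in Eq. (77) with the roles of $\mathbf{s}_n$ and $\mathbf{y}_n$ (and of $\mathbf{B}_n$ and $\mathbf{B}_n^{-1}$) swapped.

```python
import numpy as np

# One DFP update of the inverse Hessian approximation H_n = B_n^{-1}:
# H_{n+1} = H_n - H_n y y^T H_n / (y^T H_n y) + s s^T / (y^T s)   (Eq. 96)
rng = np.random.default_rng(3)
H = np.eye(3)                        # current inverse approximation H_n
s = rng.standard_normal(3)
y = rng.standard_normal(3)
if s @ y < 0:
    y = -y                           # keep the curvature condition s^T y > 0

Hy = H @ y
H_new = H - np.outer(Hy, Hy) / (y @ Hy) + np.outer(s, s) / (y @ s)

print(np.allclose(H_new @ y, s))     # inverse secant condition H_{n+1} y = s
```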
In both the DFP and BFGS methods, matrix $\mathbf{B}_n^{-1}$, as well as $\mathbf{B}_n$, must be positive definite, i.e., $\mathbf{x}^T\mathbf{B}_n\mathbf{x} > 0$ must hold for any $\mathbf{x} \ne \mathbf{0}$. We now prove that this requirement is satisfied if the curvature condition of the Wolfe conditions is satisfied:

$$ \mathbf{g}_{n+1}^T\mathbf{d}_n \ge c_2\,\mathbf{g}_n^T\mathbf{d}_n, \qquad 0 < c_2 < 1 \qquad (98) $$

Replacing the search direction $\mathbf{d}_n$ by $\mathbf{s}_n = \alpha_n\mathbf{d}_n$ (with step size $\alpha_n > 0$), we get:

$$ \mathbf{g}_{n+1}^T\mathbf{s}_n \ge c_2\,\mathbf{g}_n^T\mathbf{s}_n \qquad (99) $$

We also have

$$ \mathbf{g}_n^T\mathbf{s}_n = \alpha_n\,\mathbf{g}_n^T\mathbf{d}_n < 0 \qquad (100) $$

since $\mathbf{d}_n$ is a descent direction. Subtracting $\mathbf{g}_n^T\mathbf{s}_n$ from both sides of Eq. (99), we get

$$ \mathbf{y}_n^T\mathbf{s}_n = (\mathbf{g}_{n+1} - \mathbf{g}_n)^T\mathbf{s}_n \ge (c_2 - 1)\,\mathbf{g}_n^T\mathbf{s}_n > 0 \qquad (101) $$

The last inequality is due to the fact that $c_2 - 1 < 0$ and $\mathbf{g}_n^T\mathbf{s}_n < 0$. We therefore have

$$ \mathbf{s}_n^T\mathbf{y}_n > 0 \qquad (102) $$
Given $\mathbf{s}_n^T\mathbf{y}_n > 0$, we can further prove by induction that both $\mathbf{B}_{n+1}$ and $\mathbf{B}_{n+1}^{-1}$, based respectively on the update formulae in Eqs. (77) and (96), are positive definite. We first assume $\mathbf{B}_n$ and $\mathbf{B}_n^{-1}$ are both positive definite, and express $\mathbf{B}_n$ in terms of its Cholesky decomposition, $\mathbf{B}_n = \mathbf{L}\mathbf{L}^T$. We further define $\mathbf{a} = \mathbf{L}^T\mathbf{x}$ and $\mathbf{b} = \mathbf{L}^T\mathbf{s}_n$, and get, for any $\mathbf{x} \ne \mathbf{0}$,

$$ \mathbf{x}^T\mathbf{B}_n\mathbf{x} = \mathbf{a}^T\mathbf{a}, \qquad \mathbf{x}^T\mathbf{B}_n\mathbf{s}_n = \mathbf{a}^T\mathbf{b}, \qquad \mathbf{s}_n^T\mathbf{B}_n\mathbf{s}_n = \mathbf{b}^T\mathbf{b} \qquad (103) $$

Now we can show that the sum of the first and third terms of Eq. (77) is positive semi-definite:

$$ \mathbf{x}^T\left(\mathbf{B}_n - \frac{\mathbf{B}_n\mathbf{s}_n\mathbf{s}_n^T\mathbf{B}_n}{\mathbf{s}_n^T\mathbf{B}_n\mathbf{s}_n}\right)\mathbf{x} = \mathbf{a}^T\mathbf{a} - \frac{(\mathbf{a}^T\mathbf{b})^2}{\mathbf{b}^T\mathbf{b}} \ge 0 \qquad (104) $$

The last step is due to the Cauchy-Schwarz inequality, with equality only when $\mathbf{a}$ is parallel to $\mathbf{b}$, i.e., when $\mathbf{x}$ is parallel to $\mathbf{s}_n$. Also, as $\mathbf{y}_n^T\mathbf{s}_n > 0$, the second term of Eq. (77) is positive semi-definite:

$$ \mathbf{x}^T\,\frac{\mathbf{y}_n\mathbf{y}_n^T}{\mathbf{y}_n^T\mathbf{s}_n}\,\mathbf{x} = \frac{(\mathbf{y}_n^T\mathbf{x})^2}{\mathbf{y}_n^T\mathbf{s}_n} \ge 0 \qquad (105) $$

and it is strictly positive precisely when $\mathbf{x}$ is parallel to $\mathbf{s}_n$, since then $(\mathbf{y}_n^T\mathbf{x})^2$ is a positive multiple of $(\mathbf{y}_n^T\mathbf{s}_n)^2 > 0$. As the two terms cannot vanish simultaneously, combining these two results we get

$$ \mathbf{x}^T\mathbf{B}_{n+1}\mathbf{x} > 0 \qquad (106) $$

i.e., $\mathbf{B}_{n+1}$ based on the update formula in Eq. (77) is positive definite. Following the same steps, we can also show that $\mathbf{B}_{n+1}^{-1}$ based on Eq. (96) is positive definite.
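The inductive step just proved can be illustrated numerically: starting from an arbitrary positive definite $\mathbf{B}_n$ and data satisfying $\mathbf{s}_n^T\mathbf{y}_n > 0$ (random test values below, NumPy assumed), the BFGS update of Eq. (77) again yields a positive definite matrix.

```python
import numpy as np

# If B_n is positive definite and s^T y > 0, the BFGS-updated B_{n+1} is
# again positive definite (checked here via its eigenvalues).
rng = np.random.default_rng(4)
M = rng.standard_normal((4, 4))
B = M @ M.T + np.eye(4)              # a positive definite B_n
s = rng.standard_normal(4)
y = rng.standard_normal(4)
if s @ y < 0:
    y = -y                           # enforce the curvature condition

Bs = B @ s
B_new = B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

eigvals = np.linalg.eigvalsh(B_new)  # B_{n+1} is symmetric
print(np.all(eigvals > 0))           # all eigenvalues positive, as proved
```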
Examples:

Figs. 2.11 through 2.13 show the search paths of the SR1, DFP, and BFGS methods when applied to find the minimum of the Rosenbrock function.
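A complete run like those shown in the figures can be sketched as follows: a minimal BFGS implementation in NumPy, updating the inverse approximation directly via Eq. (91). The starting point, constants, and the backtracking Armijo line search (standing in for a full Wolfe search) are illustrative choices, not taken from the text; updates with $\mathbf{s}_n^T\mathbf{y}_n \le 0$ are skipped to keep the approximation positive definite.

```python
import numpy as np

def f(x):                            # Rosenbrock function
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

def grad(x):                         # its gradient
    return np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                     200 * (x[1] - x[0]**2)])

x = np.array([-1.2, 1.0])            # conventional starting point
H = np.eye(2)                        # initial inverse Hessian approximation
I = np.eye(2)
for _ in range(500):
    g = grad(x)
    if np.linalg.norm(g) < 1e-8:     # termination condition
        break
    d = -H @ g                       # search direction d_n = -H_n g_n
    a = 1.0                          # backtracking Armijo line search
    while f(x + a * d) > f(x) + 1e-4 * a * (g @ d):
        a *= 0.5
    s = a * d                        # step s_n
    x_new = x + s
    y = grad(x_new) - g              # gradient change y_n
    if y @ s > 1e-12:                # curvature check before updating
        rho = 1.0 / (y @ s)
        H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
            + rho * np.outer(s, s)   # inverse update, Eq. (91)
    x = x_new

print(np.allclose(x, [1.0, 1.0], atol=1e-4))  # minimum of Rosenbrock at (1,1)
```

With an exact Wolfe line search the curvature condition $\mathbf{s}_n^T\mathbf{y}_n > 0$ is guaranteed, so the explicit check above is a common safeguard when a simpler line search is used instead.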
Figure 2.11: SR1 search path on the Rosenbrock function [figures/RosenbrockSR1.eps]
Figure 2.12: DFP search path on the Rosenbrock function [figures/RosenbrockDFP.eps]
Figure 2.13: BFGS search path on the Rosenbrock function [figures/RosenbrockBFGS.eps]