Interior Point Methods

Consider the a general constrained optimization problem:

#math1025#   #tex2html_wrap_indisplay20561# (228)
By introducing slack variables #math1026##tex2html_wrap_inline9432#, all inequality constraints can be converted into equality constraints:
#math1027#   #tex2html_wrap_indisplay20564# (229)
which can then be combined with the equality constraints and still denoted by #math1028##tex2html_wrap_inline9434#, while the slack variables #tex2html_wrap_inline9436# are also combined with the original variables and still denoted by #tex2html_wrap_inline9438#. We assume there are in total #tex2html_wrap_inline9440# equality constraints and #tex2html_wrap_inline9442# variables. Now the optimization problem can be reformulated as:
#math1029#   #tex2html_wrap_indisplay20571# (230)
The Lagrangian is
#math1030#   #tex2html_wrap_indisplay20573# (231)
The KKT conditions are
#math1031#   #tex2html_wrap_indisplay20575# (232)
where
#math1032#   #tex2html_wrap_indisplay20577# (233)
and #math1033##tex2html_wrap_inline9444# is the Jacobian matrix of #math1034##tex2html_wrap_inline9446#:
#math1035#   #tex2html_wrap_indisplay20581# (234)
To simplify the problem, the non-negativity constraints can be removed by introducing an indicator function:
#math1036#   #tex2html_wrap_indisplay20583# (235)
which, when used as a cost function, penalizes any violation of the non-negativity constraint #tex2html_wrap_inline9448#. Now the optimization problem can be reformulated as shown below without the non-negative constants #math1037##tex2html_wrap_inline9450#:
#math1038#   #tex2html_wrap_indisplay20587# (236)
However, as the indicator function is not smooth and therefore not differentiable, it is approximated by the <#5643#>logarithmic barrier function<#5643#> which approaches #tex2html_wrap_inline9452# when the parameter #tex2html_wrap_inline9454# approaches infinity:
#math1039#   #tex2html_wrap_indisplay20591# (237)

<#20592#>Figure<#20592#> 2.26: <#20593#>Logarithmic Barrier Function<#20593#>
[width=90mm]<#5651#>figures/LogBarrier.eps<#5651#>

Now the optimization problem can be written as:

#math1040#   #tex2html_wrap_indisplay20596# (238)
The Lagrangian is:
#math1041#   #tex2html_wrap_indisplay20598# (239)
and its gradient is:
#math1042#   #tex2html_wrap_indisplay20600# (240)
We define
#math1043#   #tex2html_wrap_indisplay20602# (241)
and combine this equation with the KKT conditions of the new optimization problem to get
#math1044#   #tex2html_wrap_indisplay20604# (242)
The third condition is actually from the definition of #math1045##tex2html_wrap_inline9456# given above, but here it is labeled as complementarity, so that these conditions can be compared with the KKT conditions of the original problem. We see that the two sets of KKT conditions are similar to each other, but with the following differences:
  • The non-negativity #math1046##tex2html_wrap_inline9458# in the primal feasibility of the original KKT conditions is dropped;
  • Consequently the inequality #math1047##tex2html_wrap_inline9460# in the dual feasibility of the original KKT conditions is also dropped;
  • There is an extra term #tex2html_wrap_inline9462# in complementarity which will vanish when #math1048##tex2html_wrap_inline9464#.

The optimal solution that minimizes the objective function #tex2html_wrap_inline9466# can be found by solving the simultaneous equations in the modified KKT conditions (with no inequalities) given above. To do so, we first express the equations in the KKT conditions above as a nonlinear equation system #math1049##tex2html_wrap_inline9468#, and find its Jacobian matrix #math1050##tex2html_wrap_inline9470# composed of the partial derivatives of the function #math1051##tex2html_wrap_inline9472# with respect to each of the three variables #tex2html_wrap_inline9474#, #math1052##tex2html_wrap_inline9476#, and #math1053##tex2html_wrap_inline9478#:

#math1054#   #tex2html_wrap_indisplay20618# (243)
where
#math1055#
#tex2html_wrap_indisplay20620# #tex2html_wrap_indisplay20621# #tex2html_wrap_indisplay20622#  
  #tex2html_wrap_indisplay20623# #tex2html_wrap_indisplay20624# (244)
is symmetric as the Hessian matrices #tex2html_wrap_inline9480# and #math1056##tex2html_wrap_inline9482# are symmetric.

We can now use the Newton-Raphson method to find the solution of the nonlinear equation #math1057##tex2html_wrap_inline9484# iteratively:

#math1058#   #tex2html_wrap_indisplay20629# (245)
where #tex2html_wrap_inline9486# is a parameter that controls the step size and #math1059##tex2html_wrap_inline9488# is the search direction (<#5858#>Newton direction<#5858#>), which can be found by solving the following equation:
#math1060#
    #tex2html_wrap_indisplay20633#  
  #tex2html_wrap_indisplay20634# #tex2html_wrap_indisplay20635# (246)
To get the initial values for the iteration, we first find any point inside the feasible region #tex2html_wrap_inline9490# and then #math1061##tex2html_wrap_inline9492#, based on which we further find #math1062##tex2html_wrap_inline9494# by solving the first equation in the KKT conditions in Eq. (#KKTconditionsIP#5916>).

Actually this equation system above can be separated into two subsystems, which are easier to solve. Pre-multiplying #tex2html_wrap_inline9496# to the third equation:

#math1063#   #tex2html_wrap_indisplay20641# (247)
we get
#math1064#   #tex2html_wrap_indisplay20643# (248)
Adding this to the first equation:
#math1065#   #tex2html_wrap_indisplay20645# (249)
we get
#math1066#   #tex2html_wrap_indisplay20647# (250)
Combining this equation with the second one, we get an equation system of two vector variables #tex2html_wrap_inline9498# and #math1067##tex2html_wrap_inline9500# with symmetric coefficient matrix:
#math1068#   #tex2html_wrap_indisplay20651# (251)
Solving this equation system we get #tex2html_wrap_inline9502# and #math1069##tex2html_wrap_inline9504#, and we can further find #math1070##tex2html_wrap_inline9506# by solving the third equation
#math1071#   #tex2html_wrap_indisplay20656# (252)
to get
#math1072#   #tex2html_wrap_indisplay20658# (253)

We now consider the two special cases of linear and quadratic programming:

  • <#6036#>Linear programming<#6036#>:

    #math1073#   #tex2html_wrap_indisplay20660# (254)
    We have
    • #math1074##tex2html_wrap_inline9508#
    • #math1075##tex2html_wrap_inline9510#.
    • #math1076##tex2html_wrap_inline9512#.
    The Lagrangian is
    #math1077#   #tex2html_wrap_indisplay20665# (255)
    The KKT conditions are:
    #math1078#   #tex2html_wrap_indisplay20667#  
    The search direction of the iteration can be found by solving the this equation system:
    #math1079#   #tex2html_wrap_indisplay20669# (256)
    By solving the equation #math1080##tex2html_wrap_inline9514# we get the initial value #math1081##tex2html_wrap_inline9516#. The Matlab code for the interior point method for the LP problem is listed below:

    verbatim340#

    <#6155#>Example:<#6155#>

    A given linear programming problem shown below is first converted into the standard form:

    #math1082#   #tex2html_wrap_indisplay20673#  
    or in matrix form:
    #math1083#   #tex2html_wrap_indisplay20675#  
    with
    #math1084#   #tex2html_wrap_indisplay20677#  
    where #math1085##tex2html_wrap_inline9518# are the slack variables.

    <#20679#>Figure<#20679#> 2.27: <#20680#>Interior Point Method<#20680#>
    [width=70mm]<#6211#>figures/InteriorPointLPexample.eps<#6211#>

    We choose an initial value #math1086##tex2html_wrap_inline9520#, and an initial parameter #tex2html_wrap_inline9522#, which is scaled up by a factor of #tex2html_wrap_inline9524# in each iteration. We also used full Newton step size of #tex2html_wrap_inline9526#. After 8 iterations, the algorithm converged to the optimal solution #math1087##tex2html_wrap_inline9528# corresponding to the maximum function value #math1088##tex2html_wrap_inline9530#:

    #math1089#   #tex2html_wrap_indisplay20689#  
    At the end of the iteration, we also get the values of the slack variables: #math1090##tex2html_wrap_inline9532#.

  • <#6225#>Quadratic programming:<#6225#>

    #math1091#   #tex2html_wrap_indisplay20692# (257)
    We have
    • #math1092##tex2html_wrap_inline9534#
    • #math1093##tex2html_wrap_inline9536#.
    • #math1094##tex2html_wrap_inline9538#.
    The Lagrangian is
    #math1095#   #tex2html_wrap_indisplay20697# (258)
    The KKT conditions are:
    #math1096#   #tex2html_wrap_indisplay20699#  
    The equation system for the Newton-Raphson method is:
    #math1097#   #tex2html_wrap_indisplay20701# (259)

The Matlab code for the interior point method for the QP problem is listed below:

verbatim341#

<#6352#>Example:<#6352#>

A 2-D quadratic programming problem shown below is to minimize four different quadratic functions under the same linear constraints as the linear programming problem in the previous example. In general, the quadratic function is given in the matrix form:

#math1098#   #tex2html_wrap_indisplay20703#  
with following four different sets of function parameters:
#math1099#   #tex2html_wrap_indisplay20705#  

#math1100#   #tex2html_wrap_indisplay20707#  
As shown in the four panels in the figure below, the iteration of the interior point algorithm brings the solution from the same initial position at #math1101##tex2html_wrap_inline9540# to the final position for each of the four objective functions #tex2html_wrap_inline9542#, the optimal solution, which is on the boundary of the feasible region in all cases except the third one, where the optimal solution is inside the feasible region, i.e., the optimization problem is not constrained. Also note that the trajectories of the solution during the iteration go straightly from the initial guess to the final optimal solution by following the negative gradient direction of the quadratic function in all cases except in the last one, where it makes a turn while approaching the vertical boundary, due obviously to the significantly greater value of the log barrier function close to the boundary.

<#20710#>Figure<#20710#> 2.28: <#20711#>Interior Point Example<#20711#>
[width=100mm]<#6412#>figures/InteriorPointQPexample.eps<#6412#>

We also note that the dual problem in the SVM algorithm in Eq. (#SVMdual3#6416>) is in the same form as in Eq. (#QPproblem#6417>), i.e., it is a quadratic programming problem that can be solved by the interior point algorithm.

In fact, the equation system of three variable #math1102##tex2html_wrap_inline9544# above can be separated into two independent systems, the first one for #math1103##tex2html_wrap_inline9546#, while the second one for #math1104##tex2html_wrap_inline9548#. To see this, we first left multiply #tex2html_wrap_inline9550# on both sides of the third equation:

#math1105#
    #tex2html_wrap_indisplay20718#  
  #tex2html_wrap_indisplay20719# #tex2html_wrap_indisplay20720#  
and then add this to the first equation:
#math1106#   #tex2html_wrap_indisplay20722#  

#math1107#   #tex2html_wrap_indisplay20724#  

where #math1108##tex2html_wrap_inline9552#, called the <#6499#>Newton direction<#6499#>, can be found by solving the linear equation system #math1109##tex2html_wrap_inline9554#. Here we will first express all equations in the KKT conditions as #math1110##tex2html_wrap_inline9556#, and find its Jacobian #math1111##tex2html_wrap_inline9558#. Then the Newton direction #math1112##tex2html_wrap_inline9560# can be found by solving the following linear system:

we solve these nonlinear equations in the KKT conditions with #math1113##tex2html_wrap_inline9562#, #math1114##tex2html_wrap_inline9564# as well as #tex2html_wrap_inline9566# treated as variables by ///

We first consider using the <#6518#>central path<#6518#> method to solve the following inequality constrained optimization problem:

#math1115#   #tex2html_wrap_indisplay20734#  
Here the inequality constraints are expressed in vector form in terms of the vector function #math1116##tex2html_wrap_inline9568#. The Lagrangian of the problem is
#math1117#   #tex2html_wrap_indisplay20737# (260)
where #math1118##tex2html_wrap_inline9570# is for the #tex2html_wrap_inline9572# multipliers, and #math1119##tex2html_wrap_inline9574# is a diagonal matrix of the #tex2html_wrap_inline9576# constraint functions. The KKT conditions for #tex2html_wrap_inline9578# and #math1120##tex2html_wrap_inline9580# to be the solution of this problem are:
#math1121#   #tex2html_wrap_indisplay20745#  

#math1122#   #tex2html_wrap_indisplay20747#  
where #tex2html_wrap_inline9582#, called the <#6597#>logarithmic barrier function<#6597#>, is defined as:
#math1123#   #tex2html_wrap_indisplay20750#  
which is smooth and differentiable. To minimize this new objective function we set its gradient to zero:
#math1124#   #tex2html_wrap_indisplay20752#  
We further define:
#math1125#   #tex2html_wrap_indisplay20754#  
Here #tex2html_wrap_inline9584# is non-negative because #tex2html_wrap_inline9586# and #math1126##tex2html_wrap_inline9588#. These #tex2html_wrap_inline9590# equations can be expressed in matrix form:
#math1127#   #tex2html_wrap_indisplay20760#  
Then the above becomes
#math1128#   #tex2html_wrap_indisplay20762# (261)
The corresponding dual Lagrangian turns out to take the same form as that of the original given above:
#math1129#   #tex2html_wrap_indisplay20764# (262)
and the solution #math1130##tex2html_wrap_inline9592# that minimizes #math1131##tex2html_wrap_inline9594# has to satisfy the KKT conditions, the above two equations of both variables #tex2html_wrap_inline9596# and #math1132##tex2html_wrap_inline9598#:
#math1133#   #tex2html_wrap_indisplay20770# (263)
Comparing these with the KKT conditions of the original problem, we see they are very similar, except the following differences:
  • The inequality #math1134##tex2html_wrap_inline9600# in the primal feasibility of the original KKT is dropped;
  • The inequality #math1135##tex2html_wrap_inline9602# in the dual feasibility of the original KKT conditions is dropped as #tex2html_wrap_inline9604# and #math1136##tex2html_wrap_inline9606#;
  • There is an extra term #tex2html_wrap_inline9608# in complementarity. However, it will vanish if we let #math1137##tex2html_wrap_inline9610#.

The given inequality-constrained optimization problem can now be be solved by the central path method based on the new KKT conditions without any inequality constraints. It starts from an initial point #tex2html_wrap_inline9612# inside the feasible region with an initial #tex2html_wrap_inline9614#, and approaches the solution on the boundary iteratively. In each iteration (outer), the value of the parameter #tex2html_wrap_inline9616# is increased from the previous value, and the best solution #tex2html_wrap_inline9618# corresponding to this #tex2html_wrap_inline9620# is obtained by solving the two equations given above by the iteration (inner) of Newton-Raphson's method. Eventually, when #math1138##tex2html_wrap_inline9622#, #tex2html_wrap_inline9624# approaches some point on the boundary, at which #tex2html_wrap_inline9626# is minimized subject to the inequality constraints.

Specifically, in each step of the outer iteration, we find the increments #tex2html_wrap_inline9628# and #math1139##tex2html_wrap_inline9630# by solving the following linear equation system:

#math1140#   #tex2html_wrap_indisplay20788# (264)
where the 2 by 2 block matrix on the left is the Jacobian of the function combining the two functions above, with respect to both variables #tex2html_wrap_inline9632# and #math1141##tex2html_wrap_inline9634#.

We then update the two variables to get #math1142##tex2html_wrap_inline9636# and #math1143##tex2html_wrap_inline9638# based on their previous values #tex2html_wrap_inline9640# and #math1144##tex2html_wrap_inline9642#. The iteration will approach the solutions #tex2html_wrap_inline9644# and #math1145##tex2html_wrap_inline9646# (both are functions of parameter #tex2html_wrap_inline9648#) that minimize the dual Lagrangian #math1146##tex2html_wrap_inline9650# corresponding to the specific #tex2html_wrap_inline9652#:

#math1147#   #tex2html_wrap_indisplay20801# (265)
The difference #math1148##tex2html_wrap_inline9654# is the duality gap which approaches zero when #math1149##tex2html_wrap_inline9656#.

The duality relationship is useful as sometime solving the dual problem may be easier than solving the primal one. Even if the duality gap is not zero, by solving the dual problem, we can still get the tightest bound for the true solution, either the maximal lower bound of a minimization problem or the minimal upper bound, as a good estimate of the true solution, when it is unavailable.