Duality and KKT Conditions

Previously we considered the constrained minimization problem:

$\displaystyle \left\{\begin{array}{ll}
\min: & f_p({\bf x})\\
\mbox{s.t.:} & {\bf h}({\bf x})={\bf 0},\;\;{\bf g}({\bf x})\le{\bf 0}\;\mbox{or}\;\ge{\bf 0}
\end{array}\right.$ (198)

with its Lagrangian:

$\displaystyle L({\bf x},{\bf\lambda},{\bf\mu})
=f_p({\bf x})-{\bf\lambda}^T{\bf h}({\bf x})-{\bf\mu}^T{\bf g}({\bf x})$ (199)

As this is a minimization problem, we require ${\bf\mu}^T{\bf g}({\bf x})\ge 0$, as indicated in Table 188.
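For concreteness, consider a toy one-dimensional instance (a hypothetical example, not part of the original text): minimize $f_p(x)=x^2$ subject to the single inequality constraint $g(x)=x-1\ge 0$, with no equality constraints. Its Lagrangian is

$\displaystyle L(x,\mu)=x^2-\mu(x-1)$

and the multiplier must satisfy $\mu\ge 0$, so that $\mu\,g(x)\ge 0$ over the feasible set $x\ge 1$. We return to this example below when constructing the dual problem.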

This problem is called the primal problem, as there also exists an equivalent dual problem, which may be easier to solve. We first define the dual function as the infimum (greatest lower bound) of the Lagrangian over ${\bf x}$:

$\displaystyle f_d({\bf\lambda},{\bf\mu})
=\inf_{\bf x} L({\bf x},{\bf\lambda},{\bf\mu})
=\inf_{\bf x}\left[f_p({\bf x})-{\bf\lambda}^T{\bf h}({\bf x})
-{\bf\mu}^T{\bf g}({\bf x})\right]$ (200)

The optimal ${\bf x}$ that minimizes the Lagrangian can be found by solving the equation $\bigtriangledown_{\bf x}L({\bf x},{\bf\lambda},{\bf\mu})={\bf0}$. Moreover, to find the tightest lower bound, i.e., the supremum (least upper bound) of the dual function $f_d({\bf\lambda},{\bf\mu})$ over its variables ${\bf\lambda}$ and ${\bf\mu}$, we solve the following dual problem of the original primal problem:

$\displaystyle \left\{\begin{array}{ll}
\max: & f_d({\bf\lambda},{\bf\mu})=\inf_{\bf x}L({\bf x},{\bf\lambda},{\bf\mu})\\
\mbox{s.t.:} & {\bf\mu}\le{\bf 0}\;\mbox{or}\;\ge{\bf 0}
\end{array}\right.$ (201)
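Continuing the toy example above (an illustration not found in the original text), these steps can be carried out in closed form. Setting $\partial L(x,\mu)/\partial x=2x-\mu=0$ gives $x=\mu/2$, so the dual function is

$\displaystyle f_d(\mu)=\inf_x\left[x^2-\mu(x-1)\right]
=\frac{\mu^2}{4}-\mu\left(\frac{\mu}{2}-1\right)
=-\frac{\mu^2}{4}+\mu$

Maximizing $f_d(\mu)$ subject to $\mu\ge 0$ gives $\mu^*=2$ and $d^*=f_d(2)=1$, which agrees with the primal solution $x^*=1$, $p^*=f_p(1)=1$.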

Here ${\bf\lambda}$ for the equality constraints of the primal problem can be either positive or negative, i.e., it is not constrained in the dual problem. However, the sign of ${\bf\mu}$ for the inequality constraints of the primal minimization problem needs to be such that ${\bf\mu}^T{\bf g}({\bf x})\ge 0$ according to Eq. (189), i.e., ${\bf\mu}$ is subject to the inequality constraint of the dual problem.

As a typical example, the support vector machine (SVM) algorithm in machine learning solves a minimization problem based on a set of $N$ given data points $\{({\bf x}_n,y_n),\;n=1,\cdots,N\}$:

$\displaystyle \left\{\begin{array}{ll}
\min: & \frac{1}{2}\vert\vert{\bf w}\vert\vert^2\\
\mbox{s.t.:} & y_n({\bf x}_n^T{\bf w}+b)\ge 1\;\;\;(n=1,\cdots,N)
\end{array}\right.$ (202)

the solution of which is actually obtained by solving its dual problem:

$\displaystyle \left\{\begin{array}{ll}
\max: & \sum_{n=1}^N\alpha_n-\frac{1}{2}\sum_{m=1}^N\sum_{n=1}^N
\alpha_m\alpha_n y_m y_n{\bf x}_m^T{\bf x}_n\\
\mbox{s.t.:} & {\bf y}^T{\bf\alpha}=0,\;\;\;\;\alpha_n\ge 0\;\;\;(n=1,\cdots,N)
\end{array}\right.$ (203)
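As a minimal numerical sketch (not from the original text), the dual problem in Eq. (203) can be solved for a hypothetical two-point toy dataset with a general-purpose solver such as scipy.optimize.minimize; the weight vector is then recovered as ${\bf w}=\sum_n\alpha_n y_n{\bf x}_n$:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy dataset: two linearly separable points
X = np.array([[2.0, 2.0],     # class +1
              [0.0, 0.0]])    # class -1
y = np.array([1.0, -1.0])
N = len(y)

# Q[m, n] = y_m y_n x_m^T x_n, so the dual objective of Eq. (203)
# is sum(alpha) - 0.5 * alpha^T Q alpha
Q = (y[:, None] * X) @ (y[:, None] * X).T

def neg_dual(alpha):          # negated, since scipy minimizes
    return -(alpha.sum() - 0.5 * alpha @ Q @ alpha)

res = minimize(neg_dual, np.full(N, 0.5),
               bounds=[(0.0, None)] * N,                           # alpha_n >= 0
               constraints={'type': 'eq', 'fun': lambda a: y @ a})  # y^T alpha = 0

alpha = res.x
w = (alpha * y) @ X                 # w = sum_n alpha_n y_n x_n
sv = alpha > 1e-6                   # support vectors have alpha_n > 0
b = np.mean(y[sv] - X[sv] @ w)      # from y_n (x_n^T w + b) = 1 at support vectors
print("alpha =", alpha, " w =", w, " b =", b)
print("complementarity:", alpha * (y * (X @ w + b) - 1.0))
```

For this dataset the solver should return approximately $\alpha_1=\alpha_2=0.25$, giving ${\bf w}=(0.5,0.5)^T$ and $b=-1$; the printed complementarity products are numerically zero, anticipating the KKT conditions discussed below.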

We further note that the following holds for any ${\bf x}$ in the feasible region defined by ${\bf h}({\bf x})={\bf0}$ and ${\bf g}({\bf x})\ge{\bf0}$, and for any ${\bf\lambda}$ and ${\bf\mu}$ in their feasible region:

$\displaystyle f_d({\bf\lambda},{\bf\mu})
=\inf_{\bf x}L({\bf x},{\bf\lambda},{\bf\mu})
\le L({\bf x},{\bf\lambda},{\bf\mu})
=f_p({\bf x})-{\bf\lambda}^T{\bf h}({\bf x})-{\bf\mu}^T{\bf g}({\bf x})
=f_p({\bf x})-{\bf\mu}^T{\bf g}({\bf x})\le f_p({\bf x})$ (204)

This inequality is satisfied by any feasible ${\bf x}$ and ${\bf\lambda},\;{\bf\mu}$, including the solutions ${\bf x}^*$ and ${\bf\lambda}^*,\;{\bf\mu}^*$. If we denote the minimum of the objective function of the primal problem by $p^*=f_p({\bf x}^*)$ and the maximum of the objective function of the dual problem by $d^*=f_d({\bf\lambda}^*,{\bf\mu}^*)$, then we have:

$\displaystyle d^*=f_d({\bf\lambda}^*,{\bf\mu}^*)
\le L({\bf x}^*,{\bf\lambda}^*,{\bf\mu}^*)
\le f_p({\bf x}^*)=p^*$ (205)
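This bound can be verified numerically on the toy problem above (again an illustration, not from the original text): evaluate the dual function on a grid of feasible $\mu\ge 0$ and the primal objective on a grid of feasible $x\ge 1$, and confirm that the best dual value never exceeds the best primal value.

```python
import numpy as np

# Toy problem from above: f_p(x) = x^2 subject to g(x) = x - 1 >= 0
mu = np.linspace(0.0, 5.0, 1001)       # dual feasible region: mu >= 0
f_d = -mu**2 / 4 + mu                  # dual function derived earlier
x = np.linspace(1.0, 5.0, 1001)        # primal feasible region: x >= 1
f_p = x**2

d_star, p_star = f_d.max(), f_p.min()  # d* = 1 at mu* = 2, p* = 1 at x* = 1
print(d_star, p_star)                  # d* <= p*; here the gap p* - d* is 0
```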

Similarly, we can also show that a primal maximization problem with solution ${\bf x}^*$ satisfying $p^*=f_p({\bf x}^*)\ge f_p({\bf x})$ has a dual minimization problem with solution ${\bf\lambda}^*,\;{\bf\mu}^*$ satisfying:

$\displaystyle d^*=f_d({\bf\lambda}^*,{\bf\mu}^*)\ge f_p({\bf x}^*)=p^*$ (206)

We therefore conclude that the solution of the dual problem always provides the tightest (lower or upper) bound for the solution of the primal (minimization or maximization) problem. The difference $p^*-d^*\ge 0$ for a minimization primal problem, or $d^*-p^*\ge 0$ for a maximization primal problem, is called the duality gap. If the duality gap is non-zero, i.e., $d^*\ne p^*$, the dual relationship is a weak duality; otherwise it is a strong duality, for which the following holds:
$\displaystyle d^*=f_d({\bf\lambda}^*,{\bf\mu}^*)
=L({\bf x}^*,{\bf\lambda}^*,{\bf\mu}^*)
=f_p({\bf x}^*)-{\bf\lambda}^{*T}{\bf h}({\bf x}^*)-{\bf\mu}^{*T}{\bf g}({\bf x}^*)
=f_p({\bf x}^*)-{\bf\mu}^{*T}{\bf g}({\bf x}^*)
=f_p({\bf x}^*)=p^*$ (207)

which further implies that all of the following equations must hold (stated here for the minimization problem with ${\bf g}({\bf x})\ge{\bf0}$ and ${\bf\mu}\ge{\bf0}$):

$\displaystyle \left\{\begin{array}{ll}
\bigtriangledown_{\bf x}L({\bf x}^*,{\bf\lambda}^*,{\bf\mu}^*)={\bf 0} &
\mbox{(stationarity)}\\
{\bf h}({\bf x}^*)={\bf 0},\;\;{\bf g}({\bf x}^*)\ge{\bf 0} &
\mbox{(primal feasibility)}\\
\mu_j^*\ge 0,\;\;\;\;\;(j=1,\cdots,n) &
\mbox{(dual feasibility)}\\
\mu_j^*\,g_j({\bf x}^*)=0,\;\;\;\;\;(j=1,\cdots,n) &
\mbox{(complementarity)}
\end{array}\right.$ (208)

These are called the Karush-Kuhn-Tucker (KKT) conditions, which are the necessary conditions for the solution of a constrained optimization problem. In other words, for any ${\bf x}$ to be a solution, it must satisfy all of the following: