The goal of mathematical optimization (or mathematical programming) is to solve an optimization problem of a given real multivariate function $f(\mathbf{x})$ of $\mathbf{x} = [x_1, \dots, x_N]^T$ by searching its domain, an $N$-dimensional space, to find the optimal point $\mathbf{x}^*$ at which the corresponding function value $f(\mathbf{x}^*)$ is either maximized or minimized. As maximizing $f(\mathbf{x})$ is equivalent to minimizing $-f(\mathbf{x})$, we need only consider minimization problems. This function $f(\mathbf{x})$ is called an objective or cost function.

$$\mathbf{x}^* = \arg\min_{\mathbf{x}} f(\mathbf{x}) \tag{1}$$
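As a minimal sketch of this equivalence (the one-dimensional function below is an assumption chosen only for illustration), maximizing $f$ over a grid of its domain is the same as minimizing $-f$:

```python
import numpy as np

# Illustrative 1-D objective (not from the text): f(x) = -(x - 2)^2 + 3,
# which attains its maximum value 3 at x = 2.
f = lambda x: -(x - 2.0) ** 2 + 3.0

# Maximizing f is equivalent to minimizing -f: search a dense grid of the domain.
xs = np.linspace(-10.0, 10.0, 100001)
x_star = xs[np.argmin(-f(xs))]     # minimizer of -f, i.e. maximizer of f
print(x_star, f(x_star))           # approximately 2.0 and 3.0
```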
For example, in machine learning we need to carry out regression analysis and classification, and we may need to develop a model $y = f(x)$ to fit a set of observed data points $\{(x_n, y_n),\; n = 1, \dots, N\}$ by estimating the parameters of the function. Typically the form of the function is assumed to be known, such as the linear model $f(x) = a x + b$ with parameter $a$ for the slope and $b$ for the intercept, both to be estimated based on the observed data. Different methods can be used to solve this parameter estimation problem. If the least squares method is used, the objective function can be the squared error, the difference between the model and the given data, which is to be minimized:

$$r_n = y_n - f(x_n) = y_n - (a x_n + b) \tag{2}$$

$$\varepsilon(a, b) = \sum_{n=1}^{N} r_n^2 = \sum_{n=1}^{N} (y_n - a x_n - b)^2 \tag{3}$$
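As a sketch of how the least-squares estimates of $a$ and $b$ can be obtained in practice (the synthetic data below are made up purely for illustration), the squared error of Eq. (3) can be minimized in closed form, e.g. with NumPy's `polyfit`:

```python
import numpy as np

# Synthetic observations (made up for illustration): y is roughly 1.5*x + 0.5 plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 1.5 * x + 0.5 + rng.normal(scale=0.3, size=x.size)

# Least-squares estimates of slope a and intercept b:
# np.polyfit with deg=1 minimizes the squared error sum_n (y_n - a*x_n - b)^2 of Eq. (3).
a_hat, b_hat = np.polyfit(x, y, deg=1)
print(f"estimated slope a = {a_hat:.3f}, intercept b = {b_hat:.3f}")
```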
If an optimization problem is unconstrained, and the analytical expression of the objective function is explicitly given, then the solution $\mathbf{x}^*$ at which the objective function $f(\mathbf{x})$ is minimized may be found by solving the following equation system, obtained by setting the gradient of $f(\mathbf{x})$ to zero:

$$\nabla f(\mathbf{x}) = \left[\frac{\partial f(\mathbf{x})}{\partial x_1}, \dots, \frac{\partial f(\mathbf{x})}{\partial x_N}\right]^T = \mathbf{0} \tag{4}$$
Whether the function value $f(\mathbf{x}^*)$ at such a stationary point $\mathbf{x}^*$ is a maximum, a minimum, or a saddle point depends on the Hessian matrix of its second-order derivatives,

$$\mathbf{H}(\mathbf{x}) = \left[\frac{\partial^2 f(\mathbf{x})}{\partial x_i \partial x_j}\right]_{N \times N},$$

which is positive definite at a local minimum, negative definite at a local maximum, and indefinite at a saddle point.
For example, the gradient vector and Hessian matrix of a given function can be computed explicitly and the definiteness of the Hessian checked at each stationary point to classify it into one of these three cases.
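A minimal sketch of this classification (the two-variable function below is an assumption chosen only for illustration, not one of the original examples) evaluates the Hessian at a stationary point and inspects the signs of its eigenvalues:

```python
import numpy as np

# Illustrative function (not from the text): f(x, y) = x^2 - y^2,
# whose only stationary point is the origin, a saddle point.
def gradient(v):
    x, y = v
    return np.array([2.0 * x, -2.0 * y])        # [df/dx, df/dy]

def hessian(v):
    return np.array([[2.0,  0.0],               # [d2f/dx2,  d2f/dxdy]
                     [0.0, -2.0]])              # [d2f/dydx, d2f/dy2 ]

x_star = np.array([0.0, 0.0])                   # gradient(x_star) == 0
eigvals = np.linalg.eigvalsh(hessian(x_star))   # eigenvalues of the symmetric Hessian
if np.all(eigvals > 0):
    kind = "local minimum"                      # positive definite
elif np.all(eigvals < 0):
    kind = "local maximum"                      # negative definite
else:
    kind = "saddle point"                       # indefinite
print(kind)                                     # -> saddle point
```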
The problems of minimization and equation solving are essentially the same, as they can be converted into each other: a minimum of a differentiable $f(\mathbf{x})$ satisfies $\nabla f(\mathbf{x}) = \mathbf{0}$, and a root of an equation system $\mathbf{g}(\mathbf{x}) = \mathbf{0}$ is a minimizer of the squared norm of its residual:

$$\min_{\mathbf{x}} f(\mathbf{x}) \;\Longrightarrow\; \nabla f(\mathbf{x}) = \mathbf{0}, \qquad \mathbf{g}(\mathbf{x}) = \mathbf{0} \;\Longleftrightarrow\; \min_{\mathbf{x}} \|\mathbf{g}(\mathbf{x})\|^2 = 0 \tag{5}$$
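The sketch below illustrates this equivalence numerically (the quadratic objective and the use of SciPy's general-purpose `minimize` and `root` solvers are assumptions for demonstration):

```python
import numpy as np
from scipy.optimize import minimize, root

# Illustrative objective (not from the text): f(x, y) = (x - 1)^2 + (y + 2)^2,
# minimized at (1, -2).
f = lambda v: (v[0] - 1.0) ** 2 + (v[1] + 2.0) ** 2
grad = lambda v: np.array([2.0 * (v[0] - 1.0), 2.0 * (v[1] + 2.0)])

x0 = np.zeros(2)
x_min = minimize(f, x0).x      # minimize f directly
x_root = root(grad, x0).x      # solve the equation system grad f(x) = 0
print(np.allclose(x_min, x_root, atol=1e-5))   # True: both give (1, -2)
```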
This equation system can be solved by any of the methods discussed previously, such as the Newton-Raphson method, which finds the root of a general equation $\mathbf{g}(\mathbf{x}) = \mathbf{0}$ by the iteration

$$\mathbf{x}_{n+1} = \mathbf{x}_n - \mathbf{J}_{\mathbf{g}}^{-1}(\mathbf{x}_n)\, \mathbf{g}(\mathbf{x}_n) \tag{6}$$

Here, specifically, to solve the equation

$$\mathbf{g}(\mathbf{x}) = \nabla f(\mathbf{x}) = \mathbf{0}, \tag{7}$$

we first get the Jacobian $\mathbf{J}_{\nabla f}(\mathbf{x})$ of the gradient $\nabla f(\mathbf{x})$ of $f(\mathbf{x})$, which is the Hessian of $f(\mathbf{x})$,

$$\mathbf{J}_{\nabla f}(\mathbf{x}) = \left[\frac{\partial^2 f(\mathbf{x})}{\partial x_i \partial x_j}\right]_{N \times N} = \mathbf{H}(\mathbf{x}), \tag{8}$$

and then carry out the iteration below to eventually find the $\mathbf{x}^*$ that minimizes $f(\mathbf{x})$:

$$\mathbf{x}_{n+1} = \mathbf{x}_n - \mathbf{H}^{-1}(\mathbf{x}_n)\, \nabla f(\mathbf{x}_n), \qquad n = 0, 1, 2, \dots \tag{9}$$
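A minimal sketch of the Newton iteration of Eq. (9) (the convex quadratic test function and the stopping tolerance are assumptions chosen only for illustration):

```python
import numpy as np

def newton_minimize(grad, hess, x0, tol=1e-8, max_iter=50):
    """Iterate x_{n+1} = x_n - H(x_n)^{-1} grad f(x_n) as in Eq. (9)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:             # gradient ~ 0: stationary point reached
            break
        # Solve H dx = g rather than explicitly inverting the Hessian.
        x = x - np.linalg.solve(hess(x), g)
    return x

# Illustrative objective (not from the text): f(x, y) = x^2 + 2*y^2 + x*y - 4*x,
# a convex quadratic with gradient [2x + y - 4, x + 4y] and constant Hessian.
grad = lambda v: np.array([2.0 * v[0] + v[1] - 4.0, v[0] + 4.0 * v[1]])
hess = lambda v: np.array([[2.0, 1.0],
                           [1.0, 4.0]])

x_star = newton_minimize(grad, hess, x0=[0.0, 0.0])
print(x_star)   # a quadratic converges in one step: approximately [16/7, -4/7]
```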