The goal of mathematical optimization (or mathematical programming) is to solve an optimization problem for a given real-valued multivariate function $f(\mathbf{x})$ by searching its domain, an N-dimensional space, to find the optimal point $\mathbf{x}^*$ at which the function value is either maximized or minimized. As maximizing $f(\mathbf{x})$ is equivalent to minimizing $-f(\mathbf{x})$, we need only consider minimization problems. This function is called an objective or cost function.
i.e.,

$$\mathbf{x}^* = \arg\min_{\mathbf{x} \in \mathbb{R}^N} f(\mathbf{x}) \tag{1}$$
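As a minimal sketch of this setup (assuming SciPy is available; the concave objective below is a made-up example, not from the text), a maximization problem can be handled by minimizing the negated function with a general-purpose minimizer:

```python
# Minimal sketch: maximizing f is handled by minimizing -f with one minimizer.
import numpy as np
from scipy.optimize import minimize

def f(x):
    # Illustrative concave objective with its maximum at x = (1, 2)
    return -((x[0] - 1.0)**2 + (x[1] - 2.0)**2)

# To maximize f, minimize its negation
res = minimize(lambda x: -f(x), x0=np.zeros(2))
print(res.x)   # approximately [1, 2], the maximizer of f
```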
For example, in machine learning we often need to carry out regression analysis or classification, where a model is developed to fit a set of $K$ observed data points $(x_n, y_n)$ by estimating the parameters $\mathbf{a}$ of a function $y = f(x, \mathbf{a})$. Typically the form of the function is assumed to be known, such as the linear model $y = f(x, \mathbf{a}) = a_1 x + a_0$ with parameter $a_1$ for the slope and $a_0$ for the intercept, to be estimated based on the observed data.
Different methods can be used to solve this parameter estimation problem. If the least squares method is used, the objective function can be the squared error, i.e., the total squared difference between the model output and the given data, which is to be minimized:
$$\varepsilon(\mathbf{a}) = \sum_{n=1}^{K} \left[ y_n - f(x_n, \mathbf{a}) \right]^2 \tag{2}$$

$$\mathbf{a}^* = \arg\min_{\mathbf{a}} \varepsilon(\mathbf{a}) \tag{3}$$
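The following is a hedged sketch of this least-squares estimation for the linear model (the noisy data are generated only for illustration); it evaluates the squared error of Eq. (2) and finds the minimizing parameters with a standard linear least-squares routine:

```python
# Least-squares fit of the linear model y = a1*x + a0 by minimizing the squared error.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.5 * x + 1.0 + rng.normal(scale=0.5, size=x.size)   # noisy samples of a line

def squared_error(a1, a0):
    # Squared-error objective as a function of the parameters (a1, a0), as in Eq. (2)
    return np.sum((y - (a1 * x + a0))**2)

# For the linear model the minimizer has a closed form (normal equations),
# solved here with a linear least-squares routine.
A = np.column_stack([x, np.ones_like(x)])
(a1_hat, a0_hat), *_ = np.linalg.lstsq(A, y, rcond=None)
print(a1_hat, a0_hat)                    # close to the true slope 2.5 and intercept 1.0
print(squared_error(a1_hat, a0_hat))     # minimized squared error
```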
If an optimization problem is unconstrained, and the analytical expression of the objective function is explicitly given, then the solution $\mathbf{x}^*$ at which the objective function is minimized may be found by solving the equation system obtained by setting the gradient of $f(\mathbf{x})$ to zero:
$$\nabla f(\mathbf{x}) = \left[ \frac{\partial f(\mathbf{x})}{\partial x_1}, \cdots, \frac{\partial f(\mathbf{x})}{\partial x_N} \right]^T = \mathbf{0} \tag{4}$$
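As a small worked illustration (the quadratic below is chosen for this purpose and is not taken from the original example), consider $f(\mathbf{x}) = x_1^2 + 2x_2^2 - 2x_1$. Setting its gradient to zero gives the unique stationary point:

$$\nabla f(\mathbf{x}) = \begin{bmatrix} 2x_1 - 2 \\ 4x_2 \end{bmatrix} = \mathbf{0} \quad\Longrightarrow\quad \mathbf{x}^* = (1, 0)^T.$$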
Whether the function value at such a stationary point is a maximum, a minimum, or a saddle point depends on the Hessian matrix of second-order derivatives,

$$\mathbf{H}(\mathbf{x}) = \left[ \frac{\partial^2 f(\mathbf{x})}{\partial x_i\, \partial x_j} \right]_{N \times N},$$

evaluated at that point: if $\mathbf{H}$ is positive definite the stationary point is a minimum, if it is negative definite the point is a maximum, and if it is indefinite the point is a saddle point.
For example, the gradient vectors and Hessian matrices of simple test functions can be computed and used to classify their stationary points in this way, as in the sketch below.
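The following sketch (the three quadratic test functions are illustrative choices, since the original examples are not preserved here) computes the gradient and Hessian symbolically and classifies the stationary point at the origin by the signs of the Hessian's eigenvalues:

```python
# Classify stationary points of three illustrative functions via their Hessians.
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
examples = {
    'minimum': x1**2 + x2**2,      # positive definite Hessian
    'maximum': -x1**2 - x2**2,     # negative definite Hessian
    'saddle':  x1**2 - x2**2,      # indefinite Hessian
}

for label, f in examples.items():
    grad = sp.Matrix([sp.diff(f, x1), sp.diff(f, x2)])   # gradient vector
    H = sp.hessian(f, (x1, x2))                          # Hessian matrix
    eigs = list(H.eigenvals().keys())                    # eigenvalue signs give the type
    print(label, grad.T, eigs)
```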
The problems of minimization and equation solving are essentially the same, as each can be converted into the other:
$$\min_{\mathbf{x}} f(\mathbf{x}) \;\Longrightarrow\; \nabla f(\mathbf{x}) = \mathbf{0}, \qquad \mathbf{g}(\mathbf{x}) = \mathbf{0} \;\Longrightarrow\; \min_{\mathbf{x}} \|\mathbf{g}(\mathbf{x})\|^2 \tag{5}$$
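As a sketch of the second conversion (assuming SciPy is available; the system $\mathbf{g}$ below is made up for illustration), a root of $\mathbf{g}(\mathbf{x}) = \mathbf{0}$ can be found by minimizing the squared norm $\|\mathbf{g}(\mathbf{x})\|^2$:

```python
# Solve g(x) = 0 by minimizing the squared norm ||g(x)||^2.
import numpy as np
from scipy.optimize import minimize

def g(x):
    # A simple nonlinear system with roots at (1, 1) and (-1, 1):
    # g1 = x1^2 - x2, g2 = x2 - 1
    return np.array([x[0]**2 - x[1], x[1] - 1.0])

res = minimize(lambda x: np.sum(g(x)**2), x0=np.array([2.0, 2.0]))
print(res.x)   # approximately [1, 1], a root of the system
```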
This equation system can be solved by any of the methods discussed previously, such as the Newton-Raphson method, which finds the root of a general equation $\mathbf{g}(\mathbf{x}) = \mathbf{0}$ by the iteration $\mathbf{x}_{n+1} = \mathbf{x}_n - \mathbf{J}_{\mathbf{g}}^{-1}(\mathbf{x}_n)\,\mathbf{g}(\mathbf{x}_n)$. Here, specifically, to solve the equation $\nabla f(\mathbf{x}) = \mathbf{0}$ in Eq. (4), we first get the Jacobian of the gradient $\nabla f(\mathbf{x})$, which is the Hessian $\mathbf{H}(\mathbf{x})$ of $f(\mathbf{x})$, and then carry out the iteration below to eventually find the point $\mathbf{x}^*$ that minimizes $f(\mathbf{x})$:
$$\mathbf{J}_{\nabla f}(\mathbf{x}) = \left[ \frac{\partial}{\partial x_j} \frac{\partial f(\mathbf{x})}{\partial x_i} \right]_{N \times N} = \left[ \frac{\partial^2 f(\mathbf{x})}{\partial x_i\, \partial x_j} \right]_{N \times N} = \mathbf{H}(\mathbf{x}) \tag{6}$$

$$\mathbf{x}_{n+1} = \mathbf{x}_n - \mathbf{H}^{-1}(\mathbf{x}_n)\, \nabla f(\mathbf{x}_n), \qquad n = 0, 1, 2, \ldots \tag{7}$$

In practice the Hessian is not inverted explicitly; each step solves the linear system

$$\mathbf{H}(\mathbf{x}_n)\, \Delta\mathbf{x}_n = -\nabla f(\mathbf{x}_n), \qquad \mathbf{x}_{n+1} = \mathbf{x}_n + \Delta\mathbf{x}_n \tag{8}$$

and the iteration terminates when the gradient (approximately) vanishes at the converged point:

$$\nabla f(\mathbf{x}^*) = \mathbf{0} \tag{9}$$
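The following is a minimal sketch of the Newton iteration in Eq. (7) applied to an illustrative quadratic objective (the function, its gradient, and its Hessian below are assumptions for this example, not taken from the text):

```python
# Newton's method for minimization: x_{n+1} = x_n - H^{-1}(x_n) grad_f(x_n).
import numpy as np

def grad_f(x):
    # Gradient of the example objective f(x) = x1^2 + 2*x2^2 - 2*x1*x2 - 2*x2
    return np.array([2*x[0] - 2*x[1], 4*x[1] - 2*x[0] - 2.0])

def hess_f(x):
    # Constant Hessian of the quadratic example above (positive definite)
    return np.array([[2.0, -2.0], [-2.0, 4.0]])

def newton_minimize(x0, tol=1e-8, max_iter=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:        # stop when the gradient vanishes
            break
        # Solve H dx = -g rather than forming H^{-1} explicitly
        dx = np.linalg.solve(hess_f(x), -g)
        x = x + dx
    return x

print(newton_minimize([5.0, 5.0]))   # minimizer of the quadratic example, (1, 1)
```

For a quadratic objective the local quadratic model used by Newton's method is exact, so the iteration converges in a single step; for general nonlinear objectives it is repeated until the gradient is sufficiently small.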