\chapter{Optimization}
Many machine learning problems can be formulated as optimization problems. The goal of many
algorithms, in both regression and classification, is to
maximize or minimize an \emph{objective} function of a
set of model parameters, so that the model fits the
observed dataset optimally in some well-defined sense.
Consider, for example, the construction of a model
$\hat{y} = f(\mathbf{x}, \boldsymbol{\theta})$ parameterized by
$\boldsymbol{\theta}$ to fit a
set of observed data points, called the \emph{training dataset}
or simply \emph{training set}, denoted by
$\mathcal{D} = \{ (\mathbf{x}_n, y_n) \mid n = 1, \dots, N \}$.
While the form of the model function is typically assumed
to be known, such as a linear or quadratic function, or
a certain probability density function (pdf) of the data,
the parameter $\boldsymbol{\theta}$ is unknown and is to be
estimated from the training set $\mathcal{D}$ by methods
such as the following two, both widely used in machine
learning:
- \emph{Least squares estimation (LSE)}:
The sum of squared errors (SSE) between the model output
and the desired output over the training set is
minimized:
\begin{equation}
\hat{\boldsymbol{\theta}} = \arg\min_{\boldsymbol{\theta}}
\sum_{n=1}^{N} \left( y_n - f(\mathbf{x}_n, \boldsymbol{\theta}) \right)^2
\tag{1}
\end{equation}
- \emph{Maximum likelihood estimation (MLE)}:
The likelihood of the parameter $\boldsymbol{\theta}$,
proportional to the conditional probability of observing
the data in $\mathcal{D}$ given a specific parameter value
$\boldsymbol{\theta}$, is maximized:
\begin{equation}
\hat{\boldsymbol{\theta}} = \arg\max_{\boldsymbol{\theta}}
L(\boldsymbol{\theta} \mid \mathcal{D})
= \arg\max_{\boldsymbol{\theta}} p(\mathcal{D} \mid \boldsymbol{\theta})
\tag{2}
\end{equation}
Solving these optimization problems, we obtain the optimal model
parameter $\hat{\boldsymbol{\theta}}$ for the model that best fits
the observed data; a minimal numerical sketch of both estimators follows.
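As a concrete illustration, the following Python sketch fits a one-parameter
linear model $f(x, \theta) = \theta x$ to synthetic data, once by minimizing
the SSE of Eq.~(1) and once by maximizing a Gaussian likelihood as in
Eq.~(2). The model form, the noise level, and the use of
\texttt{scipy.optimize.minimize\_scalar} are assumptions made here for the
example, not prescriptions from the text.
\begin{verbatim}
# A minimal sketch (assumed example): LSE and MLE for a one-parameter
# linear model f(x, theta) = theta * x with Gaussian observation noise.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
theta_true, sigma = 2.5, 0.5
x = rng.uniform(0.0, 1.0, size=50)
y = theta_true * x + rng.normal(0.0, sigma, size=50)  # training set D

def sse(theta):
    # Eq. (1): sum of squared errors between model output and data
    return np.sum((y - theta * x) ** 2)

def neg_log_likelihood(theta):
    # Eq. (2): maximizing the Gaussian likelihood p(D | theta) is the
    # same as minimizing the negative log-likelihood
    return (np.sum((y - theta * x) ** 2) / (2.0 * sigma**2)
            + 0.5 * x.size * np.log(2.0 * np.pi * sigma**2))

theta_lse = minimize_scalar(sse).x
theta_mle = minimize_scalar(neg_log_likelihood).x
print(theta_lse, theta_mle)  # the two estimates coincide here
\end{verbatim}
Under Gaussian noise with known variance, the negative log-likelihood equals
the SSE up to an affine transformation, so the two estimates coincide; this
previews why least squares is so common in practice.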
To prepare for the discussion of these algorithms in future
chapters, in this chapter we consider a set of optimization
algorithms for optimizing a generic real-valued multivariate
function $f(\mathbf{x})$, by searching its domain,
the N-D space $\mathbb{R}^N$ spanned by
$\mathbf{x} = [x_1, \dots, x_N]^T$, for
the optimal solution $\mathbf{x}^*$ that either
maximizes or minimizes $f(\mathbf{x})$. As maximizing $f(\mathbf{x})$ is
equivalent to minimizing $-f(\mathbf{x})$, we typically consider only
minimization problems:
\begin{equation}
\mathbf{x}^* = \arg\min_{\mathbf{x}} f(\mathbf{x})
\tag{3}
\end{equation}
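To make Eq.~(3) concrete, here is a minimal gradient-descent sketch, one of
many possible minimization algorithms, applied to an assumed quadratic test
function $f(\mathbf{x}) = \|\mathbf{x} - \mathbf{c}\|^2$; the step size and
stopping rule are illustrative choices.
\begin{verbatim}
# A minimal sketch, assuming a simple quadratic objective; the step
# size and stopping rule are illustrative choices, not prescriptions.
import numpy as np

c = np.array([1.0, -2.0, 3.0])    # minimizer of the test function

def f(x):
    return np.sum((x - c) ** 2)   # f(x) = ||x - c||^2

def grad_f(x):
    return 2.0 * (x - c)          # analytic gradient of f

x = np.zeros(3)                   # initial guess in R^3
eta = 0.1                         # fixed step size
for _ in range(200):
    g = grad_f(x)
    if np.linalg.norm(g) < 1e-8:  # stop when the gradient vanishes
        break
    x = x - eta * g               # move along the negative gradient

print(x)                          # converges to c, the arg min of f
\end{verbatim}
A fixed step size suffices for this convex quadratic; more general objectives
typically require line searches or adaptive step sizes.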
If the independent variable $\mathbf{x}$ is allowed to take any
value in the entire function domain $\mathbb{R}^N$, then the
optimization problem is \emph{unconstrained}; otherwise, if
$\mathbf{x}$ must lie inside a subset of $\mathbb{R}^N$, called
the \emph{feasible region}, the problem is \emph{constrained}.
Both types of optimization problems occur in machine learning.
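The distinction can be illustrated with a short sketch using
\texttt{scipy.optimize.minimize}, a tool chosen here for convenience rather
than prescribed by the text: the same assumed quadratic objective is
minimized once over all of $\mathbb{R}^2$ and once over a box-shaped
feasible region.
\begin{verbatim}
# A minimal sketch contrasting unconstrained and constrained
# minimization; the objective and the box region are assumptions.
import numpy as np
from scipy.optimize import minimize

def f(x):
    return (x[0] - 2.0) ** 2 + (x[1] + 1.0) ** 2  # minimum at (2, -1)

x0 = np.zeros(2)

# Unconstrained: x may range over all of R^2
unc = minimize(f, x0)
print(unc.x)   # approx. (2, -1), the unconstrained minimum

# Constrained: feasible region is the box 0 <= x_i <= 1
con = minimize(f, x0, bounds=[(0.0, 1.0), (0.0, 1.0)])
print(con.x)   # approx. (1, 0), on the boundary of the box
\end{verbatim}
Because the unconstrained minimizer lies outside the box, the constrained
solution lands on the boundary of the feasible region, which is why
constrained methods must pay special attention to boundary behavior.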