\chapter{Optimization}
Many machine learning problems can be formulated as optimization problems. The goal of many
algorithms, in both regression and classification, is to
maximize or minimize an \emph{objective} function of a
set of model parameters, so that the model fits the
observed dataset optimally in some well-defined sense.
Consider, for example, the construction of a model
$\hat{y} = f(\mathbf{x}, \boldsymbol{\theta})$ parameterized by
$\boldsymbol{\theta}$ to fit a
set of observed data points, called the \emph{training dataset}
or simply \emph{training set}, denoted by
$\mathcal{D} = \{ (\mathbf{x}_n, y_n) \mid n = 1, \dots, N \}$.
While the form of the model function is typically assumed
to be known, such as a linear or quadratic function, or
a certain probability density function (pdf) of the data,
the parameter $\boldsymbol{\theta}$ is unknown and is to be
estimated from the training set $\mathcal{D}$ by methods
such as the following two, both widely used in machine
learning:
- \emph{Least squares estimation (LSE)}:
The sum of squared errors (SSE) between the model output
and the desired output over the training set is
minimized:
\begin{equation}
\hat{\boldsymbol{\theta}} = \arg\min_{\boldsymbol{\theta}}
\sum_{n=1}^{N} \left( y_n - f(\mathbf{x}_n, \boldsymbol{\theta}) \right)^2
\tag{1}
\end{equation}
- \emph{Maximum likelihood estimation (MLE)}:
The likelihood of the parameter $\boldsymbol{\theta}$,
proportional to the conditional probability of observing
the data in $\mathcal{D}$ given a specific parameter value
$\boldsymbol{\theta}$, is maximized:
\begin{equation}
\hat{\boldsymbol{\theta}} = \arg\max_{\boldsymbol{\theta}}
L(\boldsymbol{\theta} \mid \mathcal{D})
= \arg\max_{\boldsymbol{\theta}} p(\mathcal{D} \mid \boldsymbol{\theta})
\tag{2}
\end{equation}
Solving these optimization problems, we obtain the optimal model
parameter $\hat{\boldsymbol{\theta}}$ for the model that best fits
the observed data; a minimal numerical sketch of both estimators follows.
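As a concrete illustration, the following Python sketch fits a one-parameter
linear model $f(x, \theta) = \theta x$ to synthetic data, once by minimizing
the SSE of Eq.~(1) and once by maximizing a Gaussian likelihood as in
Eq.~(2). The model form, the noise level, and the use of
\texttt{scipy.optimize.minimize\_scalar} are assumptions made here for the
example, not prescriptions from the text.
\begin{verbatim}
# A minimal sketch (assumed example): LSE and MLE for a one-parameter
# linear model f(x, theta) = theta * x with Gaussian observation noise.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
theta_true, sigma = 2.5, 0.5
x = rng.uniform(0.0, 1.0, size=50)
y = theta_true * x + rng.normal(0.0, sigma, size=50)  # training set D

def sse(theta):
    # Eq. (1): sum of squared errors between model output and data
    return np.sum((y - theta * x) ** 2)

def neg_log_likelihood(theta):
    # Eq. (2): maximizing the Gaussian likelihood p(D | theta) is the
    # same as minimizing the negative log-likelihood
    return (np.sum((y - theta * x) ** 2) / (2.0 * sigma**2)
            + 0.5 * x.size * np.log(2.0 * np.pi * sigma**2))

theta_lse = minimize_scalar(sse).x
theta_mle = minimize_scalar(neg_log_likelihood).x
print(theta_lse, theta_mle)  # the two estimates coincide here
\end{verbatim}
Under Gaussian noise with known variance, the negative log-likelihood equals
the SSE up to an affine transformation, so the two estimates coincide; this
previews why least squares is so common in practice.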
To prepare for the discussion of these algorithms in future
chapters, in this chapter we consider a set of optimization
algorithms for optimizing a generic real-valued multivariate
function $f(\mathbf{x})$, by searching its domain,
the N-D space $\mathbb{R}^N$ spanned by
$\mathbf{x} = [x_1, \dots, x_N]^T$, for
the optimal solution $\mathbf{x}^*$ that either
maximizes or minimizes $f(\mathbf{x})$. As maximizing $f(\mathbf{x})$ is
equivalent to minimizing $-f(\mathbf{x})$, we typically consider only
minimization problems:
\begin{equation}
\mathbf{x}^* = \arg\min_{\mathbf{x}} f(\mathbf{x})
\tag{3}
\end{equation}
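To make Eq.~(3) concrete, here is a minimal gradient-descent sketch, one of
many possible minimization algorithms, applied to an assumed quadratic test
function $f(\mathbf{x}) = \|\mathbf{x} - \mathbf{c}\|^2$; the step size and
stopping rule are illustrative choices.
\begin{verbatim}
# A minimal sketch, assuming a simple quadratic objective; the step
# size and stopping rule are illustrative choices, not prescriptions.
import numpy as np

c = np.array([1.0, -2.0, 3.0])    # minimizer of the test function

def f(x):
    return np.sum((x - c) ** 2)   # f(x) = ||x - c||^2

def grad_f(x):
    return 2.0 * (x - c)          # analytic gradient of f

x = np.zeros(3)                   # initial guess in R^3
eta = 0.1                         # fixed step size
for _ in range(200):
    g = grad_f(x)
    if np.linalg.norm(g) < 1e-8:  # stop when the gradient vanishes
        break
    x = x - eta * g               # move along the negative gradient

print(x)                          # converges to c, the arg min of f
\end{verbatim}
A fixed step size suffices for this convex quadratic; more general objectives
typically require line searches or adaptive step sizes.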
If the independent variable $\mathbf{x}$ is allowed to take any
value in the entire function domain $\mathbb{R}^N$, then the
optimization problem is \emph{unconstrained}; otherwise, if
$\mathbf{x}$ must lie inside a subset of $\mathbb{R}^N$, called
the \emph{feasible region}, the problem is \emph{constrained}.
Both types of optimization problems occur in machine learning.
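The distinction can be illustrated with a short sketch using
\texttt{scipy.optimize.minimize}, a tool chosen here for convenience rather
than prescribed by the text: the same assumed quadratic objective is
minimized once over all of $\mathbb{R}^2$ and once over a box-shaped
feasible region.
\begin{verbatim}
# A minimal sketch contrasting unconstrained and constrained
# minimization; the objective and the box region are assumptions.
import numpy as np
from scipy.optimize import minimize

def f(x):
    return (x[0] - 2.0) ** 2 + (x[1] + 1.0) ** 2  # minimum at (2, -1)

x0 = np.zeros(2)

# Unconstrained: x may range over all of R^2
unc = minimize(f, x0)
print(unc.x)   # approx. (2, -1), the unconstrained minimum

# Constrained: feasible region is the box 0 <= x_i <= 1
con = minimize(f, x0, bounds=[(0.0, 1.0), (0.0, 1.0)])
print(con.x)   # approx. (1, 0), on the boundary of the box
\end{verbatim}
Because the unconstrained minimizer lies outside the box, the constrained
solution lands on the boundary of the feasible region, which is why
constrained methods must pay special attention to boundary behavior.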