Solving Equations

In this chapter we consider methods for solving equation systems, which are closely related to the methods for optimization to be discussed in Chapter #optimization#3>, which is of essential importance in machine learning as many of the algorithms are basically to optimize some <#4#>objective function<#4#>, e.g., either minimize the least squared error or maximize the likelihood of some estimated parameters of a certain learning model.

In general, we consider various methods for solving an equation system of #tex2html_wrap_inline2260# simultaneous equations of #tex2html_wrap_inline2262# variables:

#math136# #tex2html_wrap_indisplay6020# (1)

where #math137##tex2html_wrap_inline2264# is an N-D column vector, a point in the N-D vector space #tex2html_wrap_inline2266# spanned by #math138##tex2html_wrap_inline2268#, and #math139##tex2html_wrap_inline2270# is a scalar valued multivariable function of all #tex2html_wrap_inline2272# variables in #tex2html_wrap_inline2274#. We further define a vector valued function #math140##tex2html_wrap_inline2276# as an M-D column vector, and represent the equation system more concisely as #math141##tex2html_wrap_inline2278#. The purpose is to find a solution #math142##tex2html_wrap_inline2280# that satisfies all #tex2html_wrap_inline2282# equations. If such a solution does not exisit, we would still like to find an approximate solution by which certain error (e.g., #math143##tex2html_wrap_inline2284#) is minimized.

Geometrically, a function #tex2html_wrap_inline2286# defined over this space #tex2html_wrap_inline2288# is a curve if #tex2html_wrap_inline2290#, a surface if #tex2html_wrap_inline2292#, or a <#27#>hypersurface<#27#> if #tex2html_wrap_inline2294#, in an N+1 dimensional space, of which the N+1st dimension represents the function value #tex2html_wrap_inline2296# corresponding to a point #tex2html_wrap_inline2298# in the N-D space. The <#30#>roots<#30#> or <#31#>zeros<#31#> of function #tex2html_wrap_inline2300#, i.e., the solutions of equation #tex2html_wrap_inline2302#, are all the points on a hypercurve in the N-D space, as the intersection of the hypersurface #tex2html_wrap_inline2304# and the hyperplane #tex2html_wrap_inline2306#, if they do intersect.

For example, in the special case of #tex2html_wrap_inline2308#, functions #tex2html_wrap_inline2310# and #tex2html_wrap_inline2312# are two surfaces defined over the 2-D space spanned by #tex2html_wrap_inline2314# and #tex2html_wrap_inline2316#, and the roots of each of the two functions are on a curve as the intersection of the corresponding surface and the 2-D space. The solutions of the system composed of the two equations #tex2html_wrap_inline2318# and #tex2html_wrap_inline2320# are the intersection of these two curves, if they do intersect, otherwise no solution exists.

<#35#>Example:<#35#> Consider a simultaneous equation system with #tex2html_wrap_inline2322#:

#math144# #tex2html_wrap_indisplay6052#

The first function #tex2html_wrap_inline2324# is a parabolic cone in the #tex2html_wrap_inline2326# dimensional space centrally symmetric to the vertical axis, and its roots form a circle #tex2html_wrap_inline2328# on the 2-D plane spanned by #tex2html_wrap_inline2330# and #tex2html_wrap_inline2332# centered at the origin #tex2html_wrap_inline2334# with radius #tex2html_wrap_inline2336#; the second function #tex2html_wrap_inline2338# is a plane in the 3-D space through the origin, and its roots form a straight line #tex2html_wrap_inline2340# on the 2-D plane. The solutions of the equation system are where the two curves intersect:

If #tex2html_wrap_inline2342#, there are two solutions #tex2html_wrap_inline2344# and #tex2html_wrap_inline2346#;
If #tex2html_wrap_inline2348#, there is only one solution #tex2html_wrap_inline2350#;
If #tex2html_wrap_inline2352#, the two curves do not intersect, i.e., no solution exists.