General linear least squares

The problem considered previously can be generalized to modeling $y$ as a linear combination of $M$ given functions $f_j(x)$ ($j=1,\cdots,M$) of $x$:

$\displaystyle y=f(x)=\sum_{j=1}^M a_j f_j(x)$ (50)

We want to find the $M$ unknown parameters $a_j$. Examples include:

$\displaystyle y=f(x)=a_1+a_2x+a_3x^2+a_4x^3+\cdots+a_Mx^{M-1}$ (51)

with $f_j(x)=x^{j-1}$, and

$\displaystyle y=f(x)=a_1+a_2\cos(\omega x)+a_3\cos(2\omega x)+a_4\cos(3\omega x)+a_5\cos(4\omega x)$ (52)

with $f_j(x)=\cos((j-1)\omega x)$.
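As a concrete illustration (not part of the derivation), the two families of basis functions above could be written in Python/NumPy as follows; the names `poly_basis` and `cos_basis` and the value of $\omega$ are arbitrary choices for the sketch:

```python
import numpy as np

M = 5            # number of basis functions (illustrative)
omega = 2.0      # assumed frequency for the cosine basis

# f_j(x) = x^(j-1), j = 1,...,M, as in Eq. (51)
poly_basis = [lambda x, p=j: x**p for j in range(M)]

# f_j(x) = cos((j-1)*omega*x), j = 1,...,M, as in Eq. (52)
cos_basis = [lambda x, k=j: np.cos(k * omega * x) for j in range(M)]

# Model of Eq. (50): y = sum_j a_j f_j(x)
def model(x, a, basis):
    return sum(aj * fj(x) for aj, fj in zip(a, basis))
```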

Given a set of $N$ data points $\{(x_i, y_i),\; i=1,\cdots,N\}$ we get $N$ equations:

$\displaystyle y_i=\sum_{j=1}^M a_j f_j(x_i)+e_i,\;\;\;\;\;(i=1,\cdots,N)$ (53)

which can be written in vector form as

$\displaystyle {\bf y}={\bf B}{\bf a}+{\bf e}$ (54)

where

$\displaystyle {\bf y}=\left[\begin{array}{c}y_1\\ \vdots\\ y_N\end{array}\right]_{N\times 1},\;\;\;\;\;
{\bf a}=\left[\begin{array}{c}a_1\\ \vdots\\ a_M\end{array}\right]_{M\times 1},\;\;\;\;\;
{\bf e}=\left[\begin{array}{c}e_1\\ \vdots\\ e_N\end{array}\right]_{N\times 1}$ (55)

and

$\displaystyle {\bf B}=\left[\begin{array}{ccc}b_{11}&\cdots&b_{1M}\\ \vdots&\ddots&\vdots\\ b_{N1}&\cdots&b_{NM}\end{array}\right]_{N\times M}
=\left[\begin{array}{ccc}f_1(x_1)&\cdots&f_M(x_1)\\ \vdots&\ddots&\vdots\\ f_1(x_N)&\cdots&f_M(x_N)\end{array}\right]_{N\times M}$ (56)

where

$\displaystyle b_{ij}=f_j(x_i),\;\;\;\;\;(i=1,\cdots,N,\;j=1,\cdots,M)$ (57)
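As a sketch of how Eq. (57) is used in practice, the matrix ${\bf B}$ can be assembled column by column from any list of basis functions; the function name `design_matrix` and the sample data below are illustrative, not from the text:

```python
import numpy as np

def design_matrix(x, basis):
    """Build the N-by-M matrix B with b_ij = f_j(x_i), Eq. (57)."""
    return np.column_stack([f(x) for f in basis])

# Example: polynomial basis f_j(x) = x^(j-1) with M = 4 (cubic model, Eq. (51))
M = 4
basis = [lambda x, p=j: x**p for j in range(M)]

x = np.linspace(0.0, 1.0, 20)    # N = 20 sample points (illustrative)
B = design_matrix(x, basis)      # shape (20, 4), i.e. N x M
```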

Typically $N\gg M$, i.e., the problem is over-constrained with $N$ equations but only $M$ unknowns. The LS-error is

$\displaystyle \varepsilon({\bf a})=\sum_{i=1}^N e_i^2 ={\bf e}^T{\bf e}=({\bf y}-{\bf B}{\bf a})^T({\bf y}-{\bf B}{\bf a})$ (58)

To find the $M$ optimal parameters ${\bf a}=[a_1,\cdots,a_M]^T$ that minimize $\varepsilon({\bf a})$, we set the derivative of the LS-error with respect to ${\bf a}$ to zero and get

$\displaystyle \frac{d}{d{\bf a}}\varepsilon({\bf a})
=\frac{d}{d{\bf a}}\left[({\bf y}-{\bf B}{\bf a})^T({\bf y}-{\bf B}{\bf a})\right]
=\frac{d}{d{\bf a}}\left({\bf y}^T{\bf y}-{\bf y}^T{\bf B}{\bf a}-{\bf a}^T{\bf B}^T{\bf y}+{\bf a}^T{\bf B}^T{\bf B}{\bf a}\right)
=-2{\bf B}^T{\bf y}+2{\bf B}^T{\bf B}{\bf a}={\bf0}$

From this we can get the normal equation

$\displaystyle {\bf B}^T{\bf B}{\bf a} = {\bf B}^T{\bf y}$ (59)

This system of linear equations, whose coefficient matrix ${\bf B}^T{\bf B}$ is symmetric and positive definite (provided ${\bf B}$ has full column rank $M$), can be solved by the conjugate gradient (CG) method in at most $M$ iterations.
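A minimal CG sketch for the normal equation (59) is given below; the solver is hand-rolled only to show the iteration, and the data ${\bf B}$, ${\bf y}$ are synthetic (in practice a library CG routine would be used):

```python
import numpy as np

def cg_solve(A, b, tol=1e-12):
    """Conjugate gradient for A x = b, with A symmetric positive definite."""
    x = np.zeros_like(b)
    r = b - A @ x                  # residual
    p = r.copy()                   # search direction
    rs = r @ r
    for _ in range(len(b)):        # at most M steps in exact arithmetic
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Normal equation B^T B a = B^T y, Eq. (59), on synthetic data (illustrative)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
B = np.column_stack([x**j for j in range(4)])   # N x M design matrix
y = 1.0 + 2.0 * x - 3.0 * x**2 + 0.1 * rng.standard_normal(x.size)
a = cg_solve(B.T @ B, B.T @ y)
```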

Alternatively, we can solve the normal equation for ${\bf a}$ to get

$\displaystyle {\bf a}=({\bf B}^T{\bf B})^{-1}{\bf B}^T {\bf y}={\bf B}^-{\bf y}$ (60)

where ${\bf B}^-$ is the pseudo-inverse of a non-square matrix ${\bf B}_{N\times M}$ defined as

$\displaystyle {\bf B}^-=({\bf B}^T{\bf B})^{-1}{\bf B}^T$ (61)
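Eqs. (60)-(61) translate directly into code. The sketch below forms $({\bf B}^T{\bf B})^{-1}{\bf B}^T$ explicitly only to mirror the formula and checks it against a library least-squares routine; the synthetic data are illustrative, and a dedicated solver is numerically preferable to the explicit inverse:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
B = np.column_stack([x**j for j in range(4)])     # N x M design matrix
y = 1.0 + 2.0 * x - 3.0 * x**2 + 0.1 * rng.standard_normal(x.size)

B_pinv = np.linalg.inv(B.T @ B) @ B.T             # Eq. (61): (B^T B)^{-1} B^T
a_direct = B_pinv @ y                             # Eq. (60)

a_lstsq = np.linalg.lstsq(B, y, rcond=None)[0]    # library solution, same result
assert np.allclose(a_direct, a_lstsq)
```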

The pseudo-inverse of an $N\times M$ matrix ${\bf A}$ with $N>M$ (such as ${\bf B}$ above, assumed to have full rank $M$) can also be found based on the singular value decomposition (SVD) of ${\bf A}$

$\displaystyle {\bf A}={\bf U}{\bf\Lambda}^{1/2}{\bf V}^T$ (62)

where ${\bf U}_{N\times N}$ and ${\bf V}_{M\times M}$ are both orthogonal $({\bf U}^T={\bf U}^{-1}$ and ${\bf V}^T={\bf V}^{-1})$, composed of the orthogonal eigenvectors of the symmetric matrices ${\bf A}{\bf A}^T$ and ${\bf A}^T{\bf A}$, respectively, and ${\bf\Lambda}^{1/2}_{N\times M}$ is a diagonal matrix whose $M$ nonzero diagonal entries are the square roots of the common nonzero eigenvalues of ${\bf A}{\bf A}^T$ and ${\bf A}^T{\bf A}$, i.e.,

$\displaystyle {\bf U}^T({\bf A}{\bf A}^T){\bf U}={\bf\Lambda}_{N\times N},\;\;\;\;\;\; {\bf V}^T({\bf A}^T{\bf A}){\bf V}={\bf\Lambda}_{M\times M}$ (63)

The pseudo-inverse of ${\bf A}$ is

$\displaystyle {\bf A}^-={\bf V}{\bf\Lambda}^{-1/2}{\bf U}^T$ (64)

which is a left-inverse

$\displaystyle {\bf A}^-{\bf A}={\bf V}{\bf\Lambda}^{-1/2}{\bf U}^T {\bf U}{\bf\Lambda}^{1/2}{\bf V}^T
={\bf V}{\bf\Lambda}^{-1/2}{\bf\Lambda}^{1/2}{\bf V}^T={\bf I}_{M\times M}$ (65)

But note that it is not a right inverse

$\displaystyle {\bf A}{\bf A}^-={\bf U}{\bf\Lambda}^{1/2}{\bf V}^T{\bf V}{\bf\Lambda}^{-1/2}{\bf U}^T
={\bf U}{\bf\Lambda}^{1/2}{\bf\Lambda}^{-1/2}{\bf U}^T \ne {\bf I}_{N\times N}$ (66)

as the rank of the $N\times N$ matrix ${\bf\Lambda}^{1/2}{\bf\Lambda}^{-1/2}$ is only $M<N$: along its diagonal there are $M$ 1's but $N-M$ 0's, i.e., ${\bf\Lambda}^{1/2}{\bf\Lambda}^{-1/2}\ne {\bf I}_{N\times N}$.
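These properties are easy to check numerically. The sketch below uses `np.linalg.pinv`, which computes the SVD-based pseudo-inverse, on a random tall $N\times M$ matrix (illustrative data; a random Gaussian matrix has full column rank with probability one):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 8, 3
A = rng.standard_normal((N, M))     # tall N x M matrix

A_pinv = np.linalg.pinv(A)          # SVD-based pseudo-inverse, Eq. (64)

# Left inverse, Eq. (65): A^- A = I (M x M)
assert np.allclose(A_pinv @ A, np.eye(M))

# Not a right inverse, Eq. (66): A A^- is N x N but has rank only M
AAp = A @ A_pinv
print(np.linalg.matrix_rank(AAp))   # prints M, not N
print(np.allclose(AAp, np.eye(N)))  # prints False
```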