General linear least squares

The problem considered previously can be generalized to modeling $y$ as a linear combination of $M$ given functions $f_j(x)$ ($j=1,\cdots,M$) of $x$:

$\displaystyle y=f(x)=\sum_{j=1}^M a_j f_j(x)$ (50)

We want to find the $M$ unknown parameters $a_j$. Examples include:

$\displaystyle y=f(x)=a_1+a_2x+a_3x^2+a_4x^3+\cdots+a_Mx^{M-1}$ (51)

with $f_j(x)=x^{j-1}$, and

$\displaystyle y=f(x)=a_1+a_2\cos(\omega x)+a_3\cos(2\omega x)+a_4\cos(3\omega x)+a_5\cos(4\omega x)$ (52)

with $f_j(x)=\cos((j-1)\omega x)$.
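As a concrete illustration (not part of the derivation), the two families of basis functions above could be written in Python/NumPy as follows; the names `poly_basis` and `cos_basis` and the value of $\omega$ are arbitrary choices for the sketch:

```python
import numpy as np

M = 5            # number of basis functions (illustrative)
omega = 2.0      # assumed frequency for the cosine basis

# f_j(x) = x^(j-1), j = 1,...,M, as in Eq. (51)
poly_basis = [lambda x, p=j: x**p for j in range(M)]

# f_j(x) = cos((j-1)*omega*x), j = 1,...,M, as in Eq. (52)
cos_basis = [lambda x, k=j: np.cos(k * omega * x) for j in range(M)]

# Model of Eq. (50): y = sum_j a_j f_j(x)
def model(x, a, basis):
    return sum(aj * fj(x) for aj, fj in zip(a, basis))
```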

Given a set of $N$ data points $\{(x_i, y_i),\; i=1,\cdots,N\}$ we get $N$ equations:

$\displaystyle y_i=\sum_{j=1}^M a_j f_j(x_i)+e_i,\;\;\;\;\;(i=1,\cdots,N)$ (53)

which can be written in vector form as

$\displaystyle {\bf y}={\bf B}{\bf a}+{\bf e}$ (54)

where

$\displaystyle {\bf y}=\left[\begin{array}{c}y_1\\ \vdots\\ y_N\end{array}\right]_{N\times 1},\;\;\;\;\;
{\bf a}=\left[\begin{array}{c}a_1\\ \vdots\\ a_M\end{array}\right]_{M\times 1},\;\;\;\;\;
{\bf e}=\left[\begin{array}{c}e_1\\ \vdots\\ e_N\end{array}\right]_{N\times 1}$ (55)

and

$\displaystyle {\bf B}=\left[\begin{array}{ccc}b_{11}&\cdots&b_{1M}\\ \vdots&\ddots&\vdots\\ b_{N1}&\cdots&b_{NM}\end{array}\right]_{N\times M}
=\left[\begin{array}{ccc}f_1(x_1)&\cdots&f_M(x_1)\\ \vdots&\ddots&\vdots\\ f_1(x_N)&\cdots&f_M(x_N)\end{array}\right]_{N\times M}$ (56)

where

$\displaystyle b_{ij}=f_j(x_i),\;\;\;\;\;(i=1,\cdots,N,\;j=1,\cdots,M)$ (57)
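As a sketch of how Eq. (57) is used in practice, the matrix ${\bf B}$ can be assembled column by column from any list of basis functions; the function name `design_matrix` and the sample data below are illustrative, not from the text:

```python
import numpy as np

def design_matrix(x, basis):
    """Build the N-by-M matrix B with b_ij = f_j(x_i), Eq. (57)."""
    return np.column_stack([f(x) for f in basis])

# Example: polynomial basis f_j(x) = x^(j-1) with M = 4 (cubic model, Eq. (51))
M = 4
basis = [lambda x, p=j: x**p for j in range(M)]

x = np.linspace(0.0, 1.0, 20)    # N = 20 sample points (illustrative)
B = design_matrix(x, basis)      # shape (20, 4), i.e. N x M
```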

Typically $N\gg M$, i.e., the problem is over-constrained with $N$ equations but only $M$ unknowns. The LS-error is

$\displaystyle \varepsilon({\bf a})=\sum_{i=1}^N e_i^2 ={\bf e}^T{\bf e}=({\bf y}-{\bf B}{\bf a})^T({\bf y}-{\bf B}{\bf a})$ (58)

To find the $M$ optimal parameters ${\bf a}=[a_1,\cdots,a_M]^T$ that minimize $\varepsilon({\bf a})$, we set the derivative of the LS-error with respect to ${\bf a}$ to zero and get

$\displaystyle \frac{d}{d{\bf a}}\varepsilon({\bf a})
=\frac{d}{d{\bf a}}\left[({\bf y}-{\bf B}{\bf a})^T({\bf y}-{\bf B}{\bf a})\right]
=\frac{d}{d{\bf a}}\left({\bf y}^T{\bf y}-{\bf y}^T{\bf B}{\bf a}-{\bf a}^T{\bf B}^T{\bf y}+{\bf a}^T{\bf B}^T{\bf B}{\bf a}\right)
=-2{\bf B}^T{\bf y}+2{\bf B}^T{\bf B}{\bf a}={\bf0}$

From this we can get the normal equation

$\displaystyle {\bf B}^T{\bf B}{\bf a} = {\bf B}^T{\bf y}$ (59)

This system of linear equations, whose coefficient matrix ${\bf B}^T{\bf B}$ is symmetric and positive definite (provided ${\bf B}$ has full column rank $M$), can be solved by the conjugate gradient (CG) method in at most $M$ iterations.
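A minimal CG sketch for the normal equation (59) is given below; the solver is hand-rolled only to show the iteration, and the data ${\bf B}$, ${\bf y}$ are synthetic (in practice a library CG routine would be used):

```python
import numpy as np

def cg_solve(A, b, tol=1e-12):
    """Conjugate gradient for A x = b, with A symmetric positive definite."""
    x = np.zeros_like(b)
    r = b - A @ x                  # residual
    p = r.copy()                   # search direction
    rs = r @ r
    for _ in range(len(b)):        # at most M steps in exact arithmetic
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Normal equation B^T B a = B^T y, Eq. (59), on synthetic data (illustrative)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
B = np.column_stack([x**j for j in range(4)])   # N x M design matrix
y = 1.0 + 2.0 * x - 3.0 * x**2 + 0.1 * rng.standard_normal(x.size)
a = cg_solve(B.T @ B, B.T @ y)
```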

Alternatively, we can solve the normal equation for ${\bf a}$ to get

$\displaystyle {\bf a}=({\bf B}^T{\bf B})^{-1}{\bf B}^T {\bf y}={\bf B}^-{\bf y}$ (60)

where ${\bf B}^-$ is the pseudo-inverse of a non-square matrix ${\bf B}_{N\times M}$ defined as

$\displaystyle {\bf B}^-=({\bf B}^T{\bf B})^{-1}{\bf B}^T$ (61)
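Eqs. (60)-(61) translate directly into code. The sketch below forms $({\bf B}^T{\bf B})^{-1}{\bf B}^T$ explicitly only to mirror the formula and checks it against a library least-squares routine; the synthetic data are illustrative, and a dedicated solver is numerically preferable to the explicit inverse:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
B = np.column_stack([x**j for j in range(4)])     # N x M design matrix
y = 1.0 + 2.0 * x - 3.0 * x**2 + 0.1 * rng.standard_normal(x.size)

B_pinv = np.linalg.inv(B.T @ B) @ B.T             # Eq. (61): (B^T B)^{-1} B^T
a_direct = B_pinv @ y                             # Eq. (60)

a_lstsq = np.linalg.lstsq(B, y, rcond=None)[0]    # library solution, same result
assert np.allclose(a_direct, a_lstsq)
```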

The pseudo-inverse of an $N\times M$ matrix ${\bf A}$ with $N>M$ (such as ${\bf B}$ above, assumed to have full rank $M$) can also be found based on the singular value decomposition (SVD) of ${\bf A}$

$\displaystyle {\bf A}={\bf U}{\bf\Lambda}^{1/2}{\bf V}^T$ (62)

where ${\bf U}_{N\times N}$ and ${\bf V}_{M\times M}$ are both orthogonal $({\bf U}^T={\bf U}^{-1}$ and ${\bf V}^T={\bf V}^{-1})$, composed of the orthogonal eigenvectors of the symmetric matrices ${\bf A}{\bf A}^T$ and ${\bf A}^T{\bf A}$, respectively, and ${\bf\Lambda}^{1/2}_{N\times M}$ is a diagonal matrix whose $M$ nonzero diagonal entries are the square roots of the common nonzero eigenvalues of ${\bf A}{\bf A}^T$ and ${\bf A}^T{\bf A}$, i.e.,

$\displaystyle {\bf U}^T({\bf A}{\bf A}^T){\bf U}={\bf\Lambda}_{N\times N},\;\;\;\;\;\; {\bf V}^T({\bf A}^T{\bf A}){\bf V}={\bf\Lambda}_{M\times M}$ (63)

The pseudo-inverse of ${\bf A}$ is

$\displaystyle {\bf A}^-={\bf V}{\bf\Lambda}^{-1/2}{\bf U}^T$ (64)

which is a left-inverse

$\displaystyle {\bf A}^-{\bf A}={\bf V}{\bf\Lambda}^{-1/2}{\bf U}^T {\bf U}{\bf\Lambda}^{1/2}{\bf V}^T
={\bf V}{\bf\Lambda}^{-1/2}{\bf\Lambda}^{1/2}{\bf V}^T={\bf I}_{M\times M}$ (65)

But note that it is not a right inverse

$\displaystyle {\bf A}{\bf A}^-={\bf U}{\bf\Lambda}^{1/2}{\bf V}^T{\bf V}{\bf\Lambda}^{-1/2}{\bf U}^T
={\bf U}{\bf\Lambda}^{1/2}{\bf\Lambda}^{-1/2}{\bf U}^T \ne {\bf I}_{N\times N}$ (66)

as the rank of the $N\times N$ matrix ${\bf\Lambda}^{1/2}{\bf\Lambda}^{-1/2}$ is only $M<N$: along its diagonal there are $M$ 1's but $N-M$ 0's, i.e., ${\bf\Lambda}^{1/2}{\bf\Lambda}^{-1/2}\ne {\bf I}_{N\times N}$.
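These properties are easy to check numerically. The sketch below uses `np.linalg.pinv`, which computes the SVD-based pseudo-inverse, on a random tall $N\times M$ matrix (illustrative data; a random Gaussian matrix has full column rank with probability one):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 8, 3
A = rng.standard_normal((N, M))     # tall N x M matrix

A_pinv = np.linalg.pinv(A)          # SVD-based pseudo-inverse, Eq. (64)

# Left inverse, Eq. (65): A^- A = I (M x M)
assert np.allclose(A_pinv @ A, np.eye(M))

# Not a right inverse, Eq. (66): A A^- is N x N but has rank only M
AAp = A @ A_pinv
print(np.linalg.matrix_rank(AAp))   # prints M, not N
print(np.allclose(AAp, np.eye(N)))  # prints False
```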