
Solving over-determined linear equations

In general, an over-determined linear equation system of $n$ unknowns but $m>n$ equations has no exact solution. It is still possible, however, to find the optimal approximation in the least-squares sense, i.e., the solution that minimizes the total squared error. Specifically, consider an over-determined linear equation system

\begin{displaymath}\sum_{j=1}^n a_{ij} x_j=b_i,\;\;\;\;\;\;\;(i=1,\cdots,m>n) \end{displaymath}

which can also be represented in matrix form as

\begin{displaymath}{\bf A}{\bf x}={\bf b} \end{displaymath}

where

\begin{displaymath}{\bf A}=\left[\begin{array}{ccc} a_{11} & \cdots & a_{1n}\\
\vdots & \ddots & \vdots\\
a_{m1} & \cdots & a_{mn}\end{array}\right]_{m\times n},\;\;\;\;
{\bf x}=\left[\begin{array}{c}x_1\\ \vdots\\ x_n\end{array}\right]_{n\times 1},\;\;\;\;
{\bf b}=\left[\begin{array}{c}b_1\\ \vdots\\ b_m\end{array}\right]_{m\times 1}
\end{displaymath}

As in general no ${\bf x}$ can satisfy all $m$ equations exactly, each equation has some residual:

\begin{displaymath}r_i=b_i-\sum_{j=1}^n a_{ij} x_j,\;\;\;\;\;\;\;(i=1,\cdots,m) \end{displaymath}

or in matrix form

\begin{displaymath}{\bf r}={\bf b}-{\bf A}{\bf x} \end{displaymath}

where ${\bf r}=[r_1,\cdots,r_m]^T$. The total error can be defined as
\begin{eqnarray*}
e({\bf x}) &=& \sum_{i=1}^m r_i^2=\sum_{i=1}^m\left(b_i-\sum_{j=1}^na_{ij}x_j\right)^2 \\
&=& \vert\vert{\bf r}\vert\vert^2={\bf r}^T{\bf r}=({\bf b}-{\bf A}{\bf x})^T({\bf b}-{\bf A}{\bf x})
={\bf b}^T{\bf b}-{\bf x}^T{\bf A}^T{\bf b}-{\bf b}^T{\bf A}{\bf x}+{\bf x}^T{\bf A}^T{\bf A}{\bf x}
\end{eqnarray*}
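As a quick numerical check of this identity, the following sketch (using NumPy, with arbitrary example values; the variable names and data are purely illustrative, not taken from the text) evaluates $e({\bf x})$ both as the sum of squared residuals and as the expanded quadratic form, and confirms the two agree:

\begin{verbatim}
import numpy as np

# Arbitrary over-determined example: m = 4 equations, n = 2 unknowns
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0],
              [7.0, 9.0]])
b = np.array([1.0, 2.0, 3.0, 5.0])
x = np.array([0.5, -0.2])            # an arbitrary trial value of x

r = b - A @ x                        # residual vector r = b - A x

e_sum  = np.sum(r**2)                # e(x) as the sum of squared residuals
e_quad = (b @ b - x @ A.T @ b        # e(x) as the expanded quadratic form
          - b @ A @ x + x @ A.T @ A @ x)

print(e_sum, e_quad)                 # the two values agree up to rounding
\end{verbatim}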

To find the optimal ${\bf x}$ that minimizes $e({\bf x})$, we set each of its partial derivatives to zero:

\begin{displaymath}\frac{\partial e({\bf x})}{\partial x_k}
=\frac{\partial}{\partial x_k}\sum_{i=1}^m\left(b_i-\sum_{j=1}^na_{ij}x_j\right)^2
=2\sum_{i=1}^m\left(b_i-\sum_{j=1}^na_{ij}x_j\right)(-a_{ik})=0,\;\;\;\;(k=1,\cdots,n) \end{displaymath}

which yields

\begin{displaymath}\sum_{i=1}^m\sum_{j=1}^n a_{ij} a_{ik} x_j
=\sum_{j=1}^n \left[\sum_{i=1}^m a_{ij} a_{ik}\right] x_j
=\sum_{i=1}^m b_i a_{ik},\;\;\;\;(k=1,\cdots,n) \end{displaymath}

This can be expressed in matrix form as the normal equation

\begin{displaymath}{\bf A}^T{\bf A}{\bf x}={\bf A}^T{\bf b} \end{displaymath}

The same result can be obtained by differentiating directly in matrix form:

\begin{displaymath}\frac{\partial}{\partial {\bf x}}\vert\vert{\bf r}\vert\vert^2
=\frac{\partial}{\partial {\bf x}}\left({\bf b}^T{\bf b}-{\bf x}^T{\bf A}^T{\bf b}-{\bf b}^T{\bf A}{\bf x}+{\bf x}^T{\bf A}^T{\bf A}{\bf x}\right)
=2(-{\bf A}^T{\bf b}+{\bf A}^T{\bf A}{\bf x})=0 \end{displaymath}

which is the same normal equation obtained above. This matrix equation can be solved for ${\bf x}$ by multiplying both sides on the left by the inverse of ${\bf A}^T{\bf A}$, if it exists:

\begin{displaymath}{\bf x}=\left({\bf A}^T{\bf A}\right)^{-1}{\bf A}^T{\bf b}={\bf A}^-{\bf b} \end{displaymath}

where

\begin{displaymath}{\bf A}^-=\left({\bf A}^T{\bf A}\right)^{-1}{\bf A}^T \end{displaymath}

is the pseudo-inverse of the non-square matrix ${\bf A}$.
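The closed-form solution can be evaluated directly. The following NumPy sketch (the data are an arbitrary illustrative example) forms $({\bf A}^T{\bf A})^{-1}{\bf A}^T{\bf b}$ explicitly and compares it with NumPy's built-in pseudo-inverse and least-squares routines, which compute the same solution in a numerically more robust way:

\begin{verbatim}
import numpy as np

# Arbitrary over-determined system: m = 5 equations, n = 2 unknowns
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0],
              [1.0, 5.0]])
b = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

# Normal-equation solution x = (A^T A)^{-1} A^T b
x_normal = np.linalg.inv(A.T @ A) @ (A.T @ b)

# Equivalent results from library routines (numerically preferable)
x_pinv  = np.linalg.pinv(A) @ b                  # Moore-Penrose pseudo-inverse
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]   # least-squares solver

print(x_normal, x_pinv, x_lstsq)                 # all three agree
\end{verbatim}

In practice the normal equation is rarely inverted explicitly, since ${\bf A}^T{\bf A}$ can be ill-conditioned; SVD- or QR-based solvers such as np.linalg.lstsq are usually preferred.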

Sometimes it is also desirable for the unknowns $x_j$ to be as small as possible. In that case, a cost function can be constructed as

\begin{displaymath}c({\bf x})=e({\bf x})+\sum_{j=1}^n \lambda_jx^2_j
=\sum_{i=1}^m\left(b_i-\sum_{j=1}^na_{ij}x_j\right)^2+\sum_{j=1}^n \lambda_jx^2_j
\end{displaymath}

where a greater $\lambda_j$ means the magnitude of the corresponding $x_j$ is more tightly controlled. Repeating the process above, we get

\begin{displaymath}\frac{\partial c({\bf x})}{\partial x_k}
=2\sum_{i=1}^m \left(b_i-\sum_{j=1}^na_{ij}x_j\right)(-a_{ik})+2\lambda_kx_k=0,
\;\;\;\;(k=1,\cdots,n) \end{displaymath}

which yields

\begin{displaymath}\sum_{j=1}^n \left(\sum_{i=1}^ma_{ij} a_{ik}\right) x_j+\lambda_kx_k
=\sum_{i=1}^m b_i a_{ik},\;\;\;\;(k=1,\cdots,n) \end{displaymath}

or in matrix form

\begin{displaymath}({\bf A}^T{\bf A}+{\bf\Lambda}){\bf x}={\bf A}^T{\bf b} \end{displaymath}

where ${\bf\Lambda}={\rm diag}(\lambda_1,\cdots,\lambda_n)$. Solving this for ${\bf x}$, we get:

\begin{displaymath}{\bf x}=({\bf A}^T{\bf A}+{\bf\Lambda})^{-1}{\bf A}^T{\bf b} \end{displaymath}
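A minimal sketch of this regularized solution (a ridge/Tikhonov-style estimate), again with arbitrary illustrative data and with all $\lambda_j$ set to the same assumed value for simplicity:

\begin{verbatim}
import numpy as np

# Same kind of over-determined system as before (illustrative data)
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0],
              [1.0, 5.0]])
b = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

lam    = np.array([0.5, 0.5])       # lambda_1, ..., lambda_n (assumed values)
Lambda = np.diag(lam)               # the diagonal matrix Lambda

# Regularized solution x = (A^T A + Lambda)^{-1} A^T b
x_reg = np.linalg.solve(A.T @ A + Lambda, A.T @ b)

# Unregularized least-squares solution for comparison
x_ls  = np.linalg.solve(A.T @ A, A.T @ b)

print(x_ls, x_reg)                  # the regularized solution is pulled toward zero
\end{verbatim}

Increasing $\lambda_j$ penalizes the corresponding $x_j$ more heavily, trading a larger residual $\vert\vert{\bf r}\vert\vert^2$ for a smaller solution.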


Ruye Wang 2015-04-27