
Solving over-determined linear equations

In general, an over-determined linear equation system of $n$ unknowns but $m>n$ equations has no exact solution. It is still possible, however, to find the optimal approximation in the least-squares sense, i.e., the solution that minimizes the total squared error. Specifically, consider an over-determined linear equation system

\begin{displaymath}\sum_{j=1}^n a_{ij} x_j=b_i,\;\;\;\;\;\;\;(i=1,\cdots,m>n) \end{displaymath}

which can also be represented in matrix form as

\begin{displaymath}{\bf A}{\bf x}={\bf b} \end{displaymath}

where

\begin{displaymath}{\bf A}=\left[\begin{array}{ccc} a_{11} & \cdots & a_{1n}\\
\vdots & \ddots & \vdots\\
a_{m1} & \cdots & a_{mn}\end{array}\right]_{m\times n},\;\;\;\;
{\bf x}=\left[\begin{array}{c}x_1\\ \vdots\\ x_n\end{array}\right]_{n\times 1},\;\;\;\;
{\bf b}=\left[\begin{array}{c}b_1\\ \vdots\\ b_m\end{array}\right]_{m\times 1}
\end{displaymath}

As in general no ${\bf x}$ can satisfy all $m$ equations exactly, each equation has some residual:

\begin{displaymath}r_i=b_i-\sum_{j=1}^n a_{ij} x_j,\;\;\;\;\;\;\;(i=1,\cdots,m) \end{displaymath}

or in matrix form

\begin{displaymath}{\bf r}={\bf b}-{\bf A}{\bf x} \end{displaymath}

where ${\bf r}=[r_1,\cdots,r_m]^T$. The total error can be defined as
\begin{eqnarray*}
e({\bf x}) &=& \sum_{i=1}^m r_i^2=\sum_{i=1}^m\left(b_i-\sum_{j=1}^na_{ij}x_j\right)^2 \\
&=& \vert\vert{\bf r}\vert\vert^2={\bf r}^T{\bf r}=({\bf b}-{\bf A}{\bf x})^T({\bf b}-{\bf A}{\bf x})
={\bf b}^T{\bf b}-{\bf x}^T{\bf A}^T{\bf b}-{\bf b}^T{\bf A}{\bf x}+{\bf x}^T{\bf A}^T{\bf A}{\bf x}
\end{eqnarray*}
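As a quick numerical check of this identity, the following sketch (using NumPy, with arbitrary example values; the variable names and data are purely illustrative, not taken from the text) evaluates $e({\bf x})$ both as the sum of squared residuals and as the expanded quadratic form, and confirms the two agree:

\begin{verbatim}
import numpy as np

# Arbitrary over-determined example: m = 4 equations, n = 2 unknowns
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0],
              [7.0, 9.0]])
b = np.array([1.0, 2.0, 3.0, 5.0])
x = np.array([0.5, -0.2])            # an arbitrary trial value of x

r = b - A @ x                        # residual vector r = b - A x

e_sum  = np.sum(r**2)                # e(x) as the sum of squared residuals
e_quad = (b @ b - x @ A.T @ b        # e(x) as the expanded quadratic form
          - b @ A @ x + x @ A.T @ A @ x)

print(e_sum, e_quad)                 # the two values agree up to rounding
\end{verbatim}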

To find the optimal ${\bf x}$ that minimizes $e({\bf x})$, we set each of its partial derivatives to zero:

\begin{displaymath}\frac{\partial e({\bf x})}{\partial x_k}
=\frac{\partial}{\partial x_k}\sum_{i=1}^m\left(b_i-\sum_{j=1}^na_{ij}x_j\right)^2
=2\sum_{i=1}^m\left(b_i-\sum_{j=1}^na_{ij}x_j\right)(-a_{ik})=0,\;\;\;\;(k=1,\cdots,n) \end{displaymath}

which yields

\begin{displaymath}\sum_{i=1}^m\sum_{j=1}^n a_{ij} a_{ik} x_j
=\sum_{j=1}^n \left[\sum_{i=1}^m a_{ij} a_{ik}\right] x_j
=\sum_{i=1}^m b_i a_{ik},\;\;\;\;(k=1,\cdots,n) \end{displaymath}

This can be expressed in matrix form as the normal equation

\begin{displaymath}{\bf A}^T{\bf A}{\bf x}={\bf A}^T{\bf b} \end{displaymath}

The same result can be obtained by differentiating directly in matrix form:

\begin{displaymath}\frac{\partial}{\partial {\bf x}}\vert\vert{\bf r}\vert\vert^2
=\frac{\partial}{\partial {\bf x}}\left({\bf b}^T{\bf b}-{\bf x}^T{\bf A}^T{\bf b}-{\bf b}^T{\bf A}{\bf x}+{\bf x}^T{\bf A}^T{\bf A}{\bf x}\right)
=2(-{\bf A}^T{\bf b}+{\bf A}^T{\bf A}{\bf x})=0 \end{displaymath}

which is the same normal equation obtained above. This matrix equation can be solved for ${\bf x}$ by multiplying both sides on the left by the inverse of ${\bf A}^T{\bf A}$, if it exists:

\begin{displaymath}{\bf x}=\left({\bf A}^T{\bf A}\right)^{-1}{\bf A}^T{\bf b}={\bf A}^-{\bf b} \end{displaymath}

where

\begin{displaymath}{\bf A}^-=\left({\bf A}^T{\bf A}\right)^{-1}{\bf A}^T \end{displaymath}

is the pseudo-inverse of the non-square matrix ${\bf A}$.
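The closed-form solution can be evaluated directly. The following NumPy sketch (the data are an arbitrary illustrative example) forms $({\bf A}^T{\bf A})^{-1}{\bf A}^T{\bf b}$ explicitly and compares it with NumPy's built-in pseudo-inverse and least-squares routines, which compute the same solution in a numerically more robust way:

\begin{verbatim}
import numpy as np

# Arbitrary over-determined system: m = 5 equations, n = 2 unknowns
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0],
              [1.0, 5.0]])
b = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

# Normal-equation solution x = (A^T A)^{-1} A^T b
x_normal = np.linalg.inv(A.T @ A) @ (A.T @ b)

# Equivalent results from library routines (numerically preferable)
x_pinv  = np.linalg.pinv(A) @ b                  # Moore-Penrose pseudo-inverse
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]   # least-squares solver

print(x_normal, x_pinv, x_lstsq)                 # all three agree
\end{verbatim}

In practice the normal equation is rarely inverted explicitly, since ${\bf A}^T{\bf A}$ can be ill-conditioned; SVD- or QR-based solvers such as np.linalg.lstsq are usually preferred.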

Sometimes it is also desirable for the unknowns $x_j$ to be as small as possible. In that case, a cost function can be constructed as

\begin{displaymath}c({\bf x})=e({\bf x})+\sum_{j=1}^n \lambda_jx^2_j
=\sum_{i=1}^m\left(b_i-\sum_{j=1}^na_{ij}x_j\right)^2+\sum_{j=1}^n \lambda_jx^2_j
\end{displaymath}

where a greater $\lambda_j$ means the magnitude of the corresponding $x_j$ is more tightly controlled. Repeating the process above, we get

\begin{displaymath}\frac{\partial c({\bf x})}{\partial x_k}
=2\sum_{i=1}^m \left(b_i-\sum_{j=1}^na_{ij}x_j\right)(-a_{ik})+2\lambda_kx_k=0,
\;\;\;\;(k=1,\cdots,n) \end{displaymath}

which yields

\begin{displaymath}\sum_{j=1}^n \left(\sum_{i=1}^ma_{ij} a_{ik}\right) x_j+\lambda_kx_k
=\sum_{i=1}^m b_i a_{ik},\;\;\;\;(k=1,\cdots,n) \end{displaymath}

or in matrix form

\begin{displaymath}({\bf A}^T{\bf A}+{\bf\Lambda}){\bf x}={\bf A}^T{\bf b} \end{displaymath}

where ${\bf\Lambda}={\rm diag}(\lambda_1,\cdots,\lambda_n)$. Solving this for ${\bf x}$, we get:

\begin{displaymath}{\bf x}=({\bf A}^T{\bf A}+{\bf\Lambda})^{-1}{\bf A}^T{\bf b} \end{displaymath}
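A minimal sketch of this regularized solution (a ridge/Tikhonov-style estimate), again with arbitrary illustrative data and with all $\lambda_j$ set to the same assumed value for simplicity:

\begin{verbatim}
import numpy as np

# Same kind of over-determined system as before (illustrative data)
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0],
              [1.0, 5.0]])
b = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

lam    = np.array([0.5, 0.5])       # lambda_1, ..., lambda_n (assumed values)
Lambda = np.diag(lam)               # the diagonal matrix Lambda

# Regularized solution x = (A^T A + Lambda)^{-1} A^T b
x_reg = np.linalg.solve(A.T @ A + Lambda, A.T @ b)

# Unregularized least-squares solution for comparison
x_ls  = np.linalg.solve(A.T @ A, A.T @ b)

print(x_ls, x_reg)                  # the regularized solution is pulled toward zero
\end{verbatim}

Increasing $\lambda_j$ penalizes the corresponding $x_j$ more heavily, trading a larger residual $\vert\vert{\bf r}\vert\vert^2$ for a smaller solution.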


Ruye Wang 2015-04-27