Pseudo-Inverse Solutions Based on SVD

In the previous section we obtained the solution of the equation ${\bf Ax}={\bf b}$, together with bases for the four fundamental subspaces of ${\bf A}$, based on its reduced row echelon form (rref). Here we consider an alternative and better way to solve the same equation and to find a set of orthogonal bases that also span the four subspaces, based on the pseudo-inverse and the singular value decomposition (SVD) of ${\bf A}$. The solution obtained this way is optimal in a certain sense, as shown below.

Consider the SVD of an $M\times N$ matrix ${\bf A}$ of rank $R\le\min(M,N)$:

$\displaystyle {\bf A}={\bf U\Sigma V}^T$ (160)

where

$\displaystyle {\bf\Sigma}=\left[\begin{array}{cccccc}\sigma_1&&&&&\\ &\ddots&&&&\\ &&\sigma_R&&&\\ &&&0&&\\ &&&&\ddots&\\ &&&&&0\end{array}\right]_{M\times N}$ (161)

is an $M\times N$ matrix with the $R$ non-zero singular values $\sigma_i=\sqrt{\lambda_i}\;\;\;(i=1,\ldots,R)$ of ${\bf A}$ along the diagonal (starting from the top-left corner), where the $\lambda_i$ are the non-zero eigenvalues of ${\bf A}^T{\bf A}$ (equivalently of ${\bf A}{\bf A}^T$), while all other entries are zero, and ${\bf U}=[{\bf u}_1,\cdots,{\bf u}_M]$ and ${\bf V}=[{\bf v}_1,\cdots,{\bf v}_N]$ are orthogonal matrices of dimensions $M\times M$ and $N\times N$, respectively. The column vectors ${\bf u}_i\;(i=1,\cdots,M)$ and ${\bf v}_j\;(j=1,\cdots,N)$ are called the left and right singular vectors of ${\bf A}$, respectively, and they can be used as orthonormal bases spanning $\mathbb{R}^M=C({\bf A})\oplus N({\bf A}^T)$ and its subspaces $C({\bf A})$ and $N({\bf A}^T)$, and $\mathbb{R}^N=R({\bf A})\oplus N({\bf A})$ and its subspaces $R({\bf A})$ and $N({\bf A})$.

To see this, we rewrite the SVD of ${\bf A}$ as:

$\displaystyle {\bf A}=[{\bf c}_1,\cdots,{\bf c}_N]={\bf U\Sigma V}^T
=[{\bf u}_1,\cdots,{\bf u}_M]\;{\bf\Sigma}\;
\left[\begin{array}{c}{\bf v}_1^T\\ \vdots\\ {\bf v}_N^T\end{array}\right]
=\sum_{k=1}^R \sigma_k\;\left({\bf u}_k{\bf v}_k^T\right)
=\left[\sum_{k=1}^R (\sigma_k v_{1k})\;{\bf u}_k,\;\cdots,\;
\sum_{k=1}^R (\sigma_k v_{Nk})\;{\bf u}_k\right]$ (162)

Each column vector of ${\bf A}$ can be expressed as a linear combination of the first $R$ columns of ${\bf U}$ corresponding to the non-zero singular values:

$\displaystyle {\bf c}_j=\sum_{k=1}^R (\sigma_k v_{jk})\;{\bf u}_k\;\;\;\;\;\;(j=1,\cdots,N)$ (163)

where $v_{jk}$ denotes the $j$-th component of ${\bf v}_k$.

Taking the transpose of both sides of the SVD, we also get:
$\displaystyle {\bf A}^T=[{\bf r}_1,\cdots,{\bf r}_M]={\bf V\Sigma}^T{\bf U}^T
=\sum_{k=1}^R \sigma_k\;\left({\bf v}_k{\bf u}_k^T\right)
=\left[\sum_{k=1}^R (\sigma_k u_{1k})\;{\bf v}_k,\;\cdots,\;
\sum_{k=1}^R (\sigma_k u_{Mk})\;{\bf v}_k\right]$ (164)

i.e., each row vector of ${\bf A}$ can be expressed as a linear combination of the first $R$ columns of ${\bf V}$ corresponding to the non-zero singular values:

$\displaystyle {\bf r}_i=\sum_{k=1}^R (\sigma_k u_{ik})\;{\bf v}_k\;\;\;\;\;\;(i=1,\cdots,M)$ (165)
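
Both expansions are just the outer-product form of the SVD written out column by column. As a quick numerical illustration, the following minimal Matlab/Octave sketch reconstructs one column of the $3\times 4$ example matrix used later in this section (variable names are illustrative):

    % Column j of A as a combination of u_1,...,u_R weighted by sigma_k*v_{jk}
    A = [1 2 3 4; 4 3 2 1; -2 1 4 7];                 % rank-2 example matrix
    [U, S, V] = svd(A);                               % A = U*S*V'
    R = rank(A);                                      % R = 2
    j = 3;                                            % pick any column index
    cj = U(:,1:R) * (diag(S(1:R,1:R)) .* V(j,1:R)');  % sum_k sigma_k*v_{jk}*u_k
    norm(cj - A(:,j))                                 % should be (numerically) zero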

We therefore see that the $R$ columns of ${\bf U}$ and ${\bf V}$ corresponding to the non-zero singular values span respectively the column space $C({\bf A})$ and row space $R({\bf A})$:

$\displaystyle C({\bf A})=span({\bf u}_1,\cdots,{\bf u}_R),\;\;\;\;\;\;\;
R({\bf A})=span({\bf v}_1,\cdots,{\bf v}_R)$ (166)

and the remaining $M-R$ columns of ${\bf U}$ and $N-R$ columns of ${\bf V}$ corresponding to the zero singular values span respectively $N({\bf A}^T)$ orthogonal to $C({\bf A})$, and $N({\bf A})$ orthogonal to $R({\bf A})$:

$\displaystyle N({\bf A}^T)=span({\bf u}_{R+1},\cdots,{\bf u}_M),\;\;\;\;\;\;
N({\bf A})=span({\bf v}_{R+1},\cdots,{\bf v}_N)$ (167)

Subspace | Definition | Dimension | Basis
$C({\bf A})\subseteq \mathbb{R}^M$ | column space (image) of ${\bf A}$ | $R$ | the $R$ columns of ${\bf U}$ corresponding to the non-zero singular values
$N({\bf A}^T)\subseteq \mathbb{R}^M$ | null space of ${\bf A}^T$ (left null space of ${\bf A}$) | $M-R$ | the $M-R$ columns of ${\bf U}$ corresponding to the zero singular values
$R({\bf A})\subseteq \mathbb{R}^N$ | row space of ${\bf A}$ (column space of ${\bf A}^T$) | $R$ | the $R$ columns of ${\bf V}$ corresponding to the non-zero singular values
$N({\bf A})\subseteq \mathbb{R}^N$ | null space of ${\bf A}$ | $N-R$ | the $N-R$ columns of ${\bf V}$ corresponding to the zero singular values
$\mathbb{R}^N=R({\bf A})\oplus N({\bf A})$ | domain of ${\bf Ax}={\bf b}$ | $N$ | all $N$ columns of ${\bf V}$
$\mathbb{R}^M=C({\bf A})\oplus N({\bf A}^T)$ | codomain of ${\bf Ax}={\bf b}$ | $M$ | all $M$ columns of ${\bf U}$
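
The correspondence in this table can be checked directly in Matlab/Octave by picking out the relevant columns of ${\bf U}$ and ${\bf V}$; a minimal sketch, again with the example matrix solved later in this section:

    % Orthonormal bases of the four subspaces, read off from the SVD
    A = [1 2 3 4; 4 3 2 1; -2 1 4 7];   % M = 3, N = 4, rank R = 2
    [U, S, V] = svd(A);                 % A = U*S*V'
    R = rank(A);
    basis_CA  = U(:, 1:R);              % C(A):   first R columns of U
    basis_NAt = U(:, R+1:end);          % N(A'):  last M-R columns of U
    basis_RA  = V(:, 1:R);              % R(A):   first R columns of V
    basis_NA  = V(:, R+1:end);          % N(A):   last N-R columns of V
    norm(A * basis_NA)                  % ~0, confirming A*v = 0 for v in N(A)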

The SVD method can be used to find the pseudo-inverse of an $M\times N$ matrix ${\bf A}$ of rank $R\le\min(M,N)$:

$\displaystyle {\bf A}^-={\bf V\Sigma}^-{\bf U}^T$ (168)

where both ${\bf A}^-$ and

$\displaystyle {\bf\Sigma}^-=\left[\begin{array}{cccccc}1/\sigma_1&&&&&\\ &\ddots&&&&\\ &&1/\sigma_R&&&\\ &&&0&&\\ &&&&\ddots&\\ &&&&&0\end{array}\right]_{N\times M}$ (169)

are $N\times M$ matrices.
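
In other words, ${\bf\Sigma}^-$ is obtained by transposing ${\bf\Sigma}$ and replacing each non-zero singular value by its reciprocal. As a sanity check, the result can be compared against Matlab's built-in pinv; a minimal sketch (variable names are illustrative):

    % Pseudo-inverse via the SVD, compared with Matlab's pinv
    A = [1 2 3 4; 4 3 2 1; -2 1 4 7];
    [U, S, V] = svd(A);
    R = rank(A);
    Sinv = zeros(size(A'));                        % N x M matrix of zeros
    Sinv(1:R,1:R) = diag(1 ./ diag(S(1:R,1:R)));   % reciprocals of the R singular values
    Apinv = V * Sinv * U';                         % A^- = V * Sigma^- * U'
    norm(Apinv - pinv(A))                          % should be (numerically) zero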

Consider the following four cases:

  1. If $R=M=N$, then ${\bf A}$ is invertible and ${\bf A}^-={\bf V}{\bf\Sigma}^{-1}{\bf U}^T={\bf A}^{-1}$, i.e., ${\bf A}^-{\bf A}={\bf A}{\bf A}^-={\bf I}$.
  2. If $R=N<M$ (an over-determined system of full column rank), then ${\bf A}^-{\bf A}={\bf I}_{N\times N}$, i.e., ${\bf A}^-=({\bf A}^T{\bf A})^{-1}{\bf A}^T$ is a left inverse of ${\bf A}$.
  3. If $R=M<N$ (an under-determined system of full row rank), then ${\bf A}{\bf A}^-={\bf I}_{M\times M}$, i.e., ${\bf A}^-={\bf A}^T({\bf A}{\bf A}^T)^{-1}$ is a right inverse of ${\bf A}$.
  4. If $R<\min(M,N)$, then ${\bf A}^-$ is neither a left nor a right inverse of ${\bf A}$, but it still satisfies ${\bf A}{\bf A}^-{\bf A}={\bf A}$ and ${\bf A}^-{\bf A}{\bf A}^-={\bf A}^-$.

We further note that matrices ${\bf U}$ and ${\bf V}$ are related to each other by:

$\displaystyle \left\{\begin{array}{l}{\bf A}={\bf U}{\bf\Sigma}{\bf V}^T\\
{\bf A}^-={\bf V}{\bf\Sigma}^-{\bf U}^T\end{array}\right.
\;\;\;\;\;\;\;\mbox{or}\;\;\;\;\;\;\;
\left\{\begin{array}{l}{\bf A}{\bf V}={\bf U}{\bf\Sigma}\\
{\bf A}^-{\bf U}={\bf V}{\bf\Sigma}^-\end{array}\right.$ (175)

or in vector form:

$\displaystyle \left\{\begin{array}{ll}
{\bf A}{\bf v}_i=\sigma_i{\bf u}_i & (i=1,\cdots,R)\\
{\bf A}{\bf v}_i={\bf 0} & (i=R+1,\cdots,N)\\
{\bf A}^-{\bf u}_i=\sigma_i^{-1}{\bf v}_i & (i=1,\cdots,R)\\
{\bf A}^-{\bf u}_i={\bf 0} & (i=R+1,\cdots,M)
\end{array}\right.$ (176)

indicating how the individual columns of ${\bf U}$ and ${\bf V}$ are related through ${\bf A}$. In particular, the last $N-R$ columns ${\bf v}_{R+1},\cdots,{\bf v}_N$ of ${\bf V}$ form an orthonormal basis of $N({\bf A})$, and the last $M-R$ columns ${\bf u}_{R+1},\cdots,{\bf u}_M$ of ${\bf U}$ form an orthonormal basis of $N({\bf A}^T)$.
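
These column-by-column relations are easy to verify numerically; for example, with the example matrix used below (a minimal Matlab/Octave sketch):

    % Verify A*v_i = sigma_i*u_i for i <= R, and the null-space relations
    A = [1 2 3 4; 4 3 2 1; -2 1 4 7];
    [U, S, V] = svd(A);
    norm(A*V(:,1) - S(1,1)*U(:,1))          % ~0:  A*v_1 = sigma_1*u_1
    norm(pinv(A)*U(:,2) - V(:,2)/S(2,2))    % ~0:  A^-*u_2 = v_2/sigma_2
    norm(A*V(:,4))                          % ~0:  v_4 lies in N(A)
    norm(pinv(A)*U(:,3))                    % ~0:  u_3 lies in N(A')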

[Figure: FourSpacesSVD1.png]

We now show that the optimal solution of the linear system ${\bf Ax}={\bf b}$ can be obtained based on the pseudo-inverse of ${\bf A}$:

$\displaystyle {\bf x}_{svd}={\bf A}^-{\bf b}={\bf V\Sigma}^-{\bf U}^T{\bf b}$ (177)

In particular, if $M=N=R$, then ${\bf A}^-={\bf A}^{-1}$, and ${\bf x}_{svd}={\bf A}^{-1}{\bf b}$ is the unique and exact solution. In general, when $M\ne N$ or $R<\min(M,N)$, an exact solution may not exist, or it may not be unique, but ${\bf x}_{svd}$ is still optimal in two ways, from the perspective of both the domain and the codomain of the linear mapping ${\bf Ax}$:

In the domain $\mathbb{R}^N$, ${\bf x}_{svd}={\bf V\Sigma}^-{\bf U}^T{\bf b}$ is a linear combination of ${\bf v}_1,\cdots,{\bf v}_R$ only, so it lies in the row space $R({\bf A})$ and is perpendicular to the null space $N({\bf A})$. Any other vector producing the same image, ${\bf x}={\bf x}_{svd}+{\bf x}_n$ with ${\bf x}_n\in N({\bf A})$, satisfies $\vert\vert{\bf x}\vert\vert^2=\vert\vert{\bf x}_{svd}\vert\vert^2+\vert\vert{\bf x}_n\vert\vert^2\ge\vert\vert{\bf x}_{svd}\vert\vert^2$, so ${\bf x}_{svd}$ has the minimum norm.

In the codomain $\mathbb{R}^M$, ${\bf b}'={\bf A}{\bf x}_{svd}={\bf A}{\bf A}^-{\bf b}=\sum_{k=1}^R({\bf u}_k^T{\bf b})\,{\bf u}_k$ is the orthogonal projection of ${\bf b}$ onto the column space $C({\bf A})$, so the error $\vert\vert{\bf A}{\bf x}_{svd}-{\bf b}\vert\vert$ is the smallest achievable by any ${\bf x}$; it is zero if and only if ${\bf b}\in C({\bf A})$.

Summarizing the two aspects above, we see that the pseudo-inverse solution ${\bf x}_{svd}$ is optimal in the sense that both its norm $\vert\vert{\bf x}_{svd}\vert\vert$ and its error $\vert\vert{\bf Ax}_{svd}-{\bf b}\vert\vert$ are minimized.
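
Numerically, the two properties can be checked by comparing ${\bf x}_{svd}$ with any other least-squares solution of the same system; the minimal sketch below uses Matlab's backslash operator, which for this rank-deficient example returns a different least-squares solution (and issues a rank-deficiency warning):

    % Minimum residual and minimum norm of the pseudo-inverse solution
    A = [1 2 3 4; 4 3 2 1; -2 1 4 7];
    b = [1; 3; 5];                           % not in C(A): no exact solution exists
    x_svd = pinv(A) * b;                     % minimum-norm least-squares solution
    x_ls  = A \ b;                           % another least-squares (basic) solution
    [norm(A*x_svd - b), norm(A*x_ls - b)]    % equal: both achieve the minimum residual
    [norm(x_svd), norm(x_ls)]                % x_svd has the smaller norm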

[Figure: SVDoptimal.png]

[Figure: NoSolution.png]

[Figure: FourSpaces2.png]

Example: Given the same system considered in previous examples

$\displaystyle {\bf A}{\bf x}=\left[\begin{array}{rrrr}1&2&3&4\\ 4&3&2&1\\ -2&1&4&7\end{array}\right]
\left[\begin{array}{c}x_1\\ x_2\\ x_3\\ x_4\end{array}\right]
=\left[\begin{array}{r}3\\ 2\\ 4\end{array}\right]$

we will now solve it using the pseudo-inverse method. We first find the SVD of ${\bf A}$ in terms of the following matrices:

$\displaystyle {\bf U}=[{\bf u}_1,\;{\bf u}_2,\;{\bf u}_3]
=\left[\begin{array}{rrr}
-0.535 & -0.218 & 0.817 \\
-0.267 & -0.873 & -0.408 \\
-0.802 & 0.436 & -0.408 \end{array}\right]$

$\displaystyle {\bf V}=[{\bf v}_1,\;{\bf v}_2,\;{\bf v}_3,\;{\bf v}_4]
=\left[\begin{array}{rrrr}
0.000 & -0.837 & 0.374 & -0.400 \\
-0.267 & -0.478 & -0.798 & 0.255 \\
-0.535 & -0.120 & 0.472 & 0.691 \\
-0.802 & 0.239 & -0.049 & -0.546 \end{array}\right]$

$\displaystyle {\bf\Sigma}=\left[\begin{array}{cccc}
10.0 & 0 & 0 & 0 \\
0 & 5.477 & 0 & 0 \\
0 & 0 & 0 & 0 \end{array}\right],
\;\;\;\;\;\;
{\bf\Sigma}^-=\left[\begin{array}{ccc}
0.1 & 0 & 0 \\
0 & 0.183 & 0 \\
0 & 0 & 0 \\
0 & 0 & 0 \end{array}\right]$


$\displaystyle {\bf A}^-={\bf V}{\bf\Sigma}^-{\bf U}^T
=\left[\begin{array}{rrrr}
0.000 & -0.837 & 0.374 & -0.400 \\
-0.267 & -0.478 & -0.798 & 0.255 \\
-0.535 & -0.120 & 0.472 & 0.691 \\
-0.802 & 0.239 & -0.049 & -0.546 \end{array}\right]
\left[\begin{array}{ccc}
0.1 & 0 & 0 \\
0 & 0.183 & 0 \\
0 & 0 & 0 \\
0 & 0 & 0 \end{array}\right]
\left[\begin{array}{rrr}
-0.535 & -0.267 & -0.802 \\
-0.218 & -0.873 & 0.436 \\
0.817 & -0.408 & -0.408 \end{array}\right]
=\left[\begin{array}{rrr}
0.033 & 0.133 & -0.067 \\
0.033 & 0.083 & -0.017 \\
0.033 & 0.033 & 0.033 \\
0.033 & -0.017 & 0.083 \end{array}\right]$

The particular solution of the system ${\bf A}{\bf x}={\bf b}$ with ${\bf b}=[3,\;2,\;4]^T\in C({\bf A})$ is
$\displaystyle {\bf x}_{svd}={\bf A}^-{\bf b}
=\left[\begin{array}{rrr}
0.033 & 0.133 & -0.067 \\
0.033 & 0.083 & -0.017 \\
0.033 & 0.033 & 0.033 \\
0.033 & -0.017 & 0.083 \end{array}\right]
\left[\begin{array}{r}3\\ 2\\ 4\end{array}\right]
=\left[\begin{array}{r}0.1\\ 0.2\\ 0.3\\ 0.4\end{array}\right]
=-0.535\left[\begin{array}{r}0.000\\ -0.267\\ -0.535\\ -0.802\end{array}\right]
-0.120\left[\begin{array}{r}-0.837\\ -0.478\\ -0.120\\ 0.239\end{array}\right]
=c_1{\bf v}_1+c_2{\bf v}_2$

which is in $R({\bf A})$, spanned by the first $R=2$ columns ${\bf v}_1$ and ${\bf v}_2$, and perpendicular to $N({\bf A})$, spanned by the last $N-R=2$ columns ${\bf v}_3$ and ${\bf v}_4$. Note that this solution ${\bf x}_{svd}=[0.1,\;0.2,\;0.3,\;0.4]^T$ is actually the component ${\bf x}'_p\in R({\bf A})$ of the particular solution ${\bf x}_p=[-1,\;2,\;0,\;0]^T$ found in the previous section by Gauss-Jordan elimination, which itself is not in $R({\bf A})$. Adding the null space to the particular solution, we get the complete solution:

$\displaystyle {\bf x}_c={\bf x}_{svd}+N({\bf A})={\bf x}_{svd}+c_1{\bf v}_3+c_2{\bf v}_4
=\left[\begin{array}{r}0.1\\ 0.2\\ 0.3\\ 0.4\end{array}\right]
+c_1 \left[\begin{array}{r}0.374\\ -0.798\\ 0.472\\ -0.049\end{array}\right]
+c_2 \left[\begin{array}{r}-0.400\\ 0.255\\ 0.691\\ -0.546\end{array}\right]$
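
These claims are easy to confirm numerically; a short Matlab/Octave sketch, with x_p denoting the Gauss-Jordan particular solution from the previous section:

    % x_svd lies in R(A) and is the projection of x_p = [-1;2;0;0] onto R(A)
    A  = [1 2 3 4; 4 3 2 1; -2 1 4 7];
    [U, S, V] = svd(A);
    x_svd = pinv(A) * [3; 2; 4];         % = [0.1; 0.2; 0.3; 0.4]
    x_p   = [-1; 2; 0; 0];               % particular solution from Gauss-Jordan
    V(:,1:2) * (V(:,1:2)' * x_p)         % projection of x_p onto R(A): equals x_svd
    x_svd' * V(:,3:4)                    % ~[0 0]: x_svd is perpendicular to N(A)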

If the right-hand side is replaced by ${\bf b}=[1,\;3,\;5]^T\notin C({\bf A})$, no solution exists. However, we can still find the pseudo-inverse solution as the optimal approximate solution:

$\displaystyle {\bf x}_{svd}={\bf A}^-{\bf b}
=\left[\begin{array}{rrr}
0.033 & 0.133 & -0.067 \\
0.033 & 0.083 & -0.017 \\
0.033 & 0.033 & 0.033 \\
0.033 & -0.017 & 0.083 \end{array}\right]
\left[\begin{array}{r}1\\ 3\\ 5\end{array}\right]
=\left[\begin{array}{r}0.1\\ 0.2\\ 0.3\\ 0.4\end{array}\right]$

which is the same as the solution for ${\bf b}=[3,\;2,\;4]^T$, indicating that $[3,\;2,\;4]^T$ happens to be the projection of $[1,\;3,\;5]^T$ onto $C({\bf A})$, spanned by ${\bf u}_1$ and ${\bf u}_2$. The result ${\bf b}'={\bf Ax}_{svd}\in C({\bf A})$ is the projection of ${\bf b}$ onto $C({\bf A})$, with the minimum error distance $\vert\vert{\bf e}\vert\vert=\vert\vert{\bf b}-{\bf b}'\vert\vert$, confirming that ${\bf x}_{svd}$ is the optimal approximate solution.
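
This, too, can be verified numerically; a minimal Matlab/Octave sketch:

    % For b = [1;3;5] not in C(A), A*x_svd is the projection of b onto C(A)
    A = [1 2 3 4; 4 3 2 1; -2 1 4 7];
    [U, S, V] = svd(A);
    b  = [1; 3; 5];
    x_svd = pinv(A) * b;                 % = [0.1; 0.2; 0.3; 0.4], as before
    bp = A * x_svd                       % = [3; 2; 4], projection of b onto C(A)
    U(:,1:2) * (U(:,1:2)' * b)           % the same projection, from u_1 and u_2
    (b - bp)' * U(:,1:2)                 % ~[0 0]: the error e = b - b' is in N(A')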

Homework 3:

  1. Use the Matlab function svd(A) to carry out the SVD of the coefficient matrix ${\bf A}$ of the linear system in Homework 2, problem 3,

    $\displaystyle {\bf A}=\left[\begin{array}{rrrr}-1 & 3 & 4 & 1\\
2 & -4 & 3 & 2\\ 1 & -1 & 7 & 3\end{array}\right]$    

    Find ${\bf U}$, ${\bf V}$, ${\bf\Sigma}$. Verify that ${\bf U}$ and ${\bf V}$ are orthogonal, i.e., ${\bf U}^T{\bf U}={\bf I}$ and ${\bf V}^T{\bf V}={\bf I}$.
  2. Obtain ${\bf\Sigma}^-$, i.e., take the transpose of ${\bf\Sigma}$ and replace each non-zero singular value by its reciprocal. Verify that ${\bf\Sigma}{\bf\Sigma}^-$ and ${\bf\Sigma}^-{\bf\Sigma}$ are diagonal matrices with ones in the first $R$ diagonal positions and zeros elsewhere (they equal the identity only if ${\bf A}$ has full rank). Then find the pseudo-inverse ${\bf A}^-={\bf V}{\bf\Sigma}^-{\bf U}^T$.
  3. Use the Matlab function pinv to find the pseudo-inverses ${\bf\Sigma}^-$ and ${\bf A}^-$. Compare them to what you obtained in the previous part.
  4. Identify the bases of the four subspaces $R({\bf A})$, $N({\bf A})$, $C({\bf A})=R({\bf A}^T)$, and $N({\bf A}^T)$ based on ${\bf U}$ and ${\bf V}$. Verify that these bases and those you found previously (problem 3 of Homework 2) span the same spaces, i.e., the basis vectors of one basis can be written as a linear combination of those of the other, for each of the four subspaces.
  5. Use the pseudo-inverse ${\bf A}^-$ found above to solve the system ${\bf A}{\bf x}={\bf b}$. Find the two particular solutions ${\bf x}_{p1}$ and ${\bf x}_{p2}$ corresponding to the two different right-hand sides ${\bf b}_1=[1,\;2,\;3]^T$ and ${\bf b}_2=[2,\;3,\;2]^T$.
  6. How are the two results ${\bf x}_{p1}$ and ${\bf x}_{p2}$ related to each other? Give your explanation. Why is there no exact solution when ${\bf b}_2=[2,\;3,\;2]^T$, and in what sense does the SVD (pseudo-inverse) method still produce one?
  7. Verify that ${\bf x}_{p1}\perp N({\bf A})$ and ${\bf x}_{p2}\perp N({\bf A})$. Also find the error (or residual) ${\bf e}={\bf b}-{\bf A}{\bf x}_p$ for each of the two results above. Verify that ${\bf e}\perp C({\bf A})$ if ${\bf e}\ne{\bf 0}$.
  8. In Homework 2 you used the row reduction method to solve the system ${\bf A}{\bf x}={\bf b}_1$, and you should have found a particular solution ${\bf x}_1=[5,\;2,\;0,\;0]^T$. It is also easy to see that another solution is ${\bf x}_2=[0,\;0,\;0,\;1]^T$. Show that the projection of each of these solutions onto $R({\bf A})$, spanned by the first two columns of ${\bf V}$, is the same as the particular solution ${\bf x}_p$ found by the SVD method.

    Answer

    $\displaystyle c_1={\bf x}_1^T{\bf v}_1=0.0717,\;\;\;\;\;\;c_2={\bf x}_1^T{\bf v}_2=0.3913$    

    $\displaystyle c_1{\bf v}_1+c_2{\bf v}_2={\bf x}_p$    

    $\displaystyle d_1={\bf x}_2^T{\bf v}_1=0.0717,\;\;\;\;\;\;d_2={\bf x}_2^T{\bf v}_2=0.3913$    

    $\displaystyle d_1{\bf v}_1+d_2{\bf v}_2={\bf x}_p$    

  9. Show that ${\bf b}_1=[1,\;2,\;3]^T$ is the projection of ${\bf b}_2=[2,\;3,\;2]^T$ onto $C({\bf A})$, the 2-D subspace spanned by the first two columns of ${\bf U}$. Can you also show that it is the projection of ${\bf b}_2$ onto $C({\bf A})$ spanned by the basis you obtained in Homework 2 (not necessarily orthogonal)?

    Answer

    $\displaystyle c_1={\bf b}_2^T{\bf u}_1=3.7213,\;\;\;\;c_2={\bf b}_2^T{\bf u}_2=0.3898$    

    $\displaystyle c_1{\bf u}_1+c_2{\bf u}_2=[1,\;2,\;3]^T={\bf b}_1$    

  10. Give an expression for the null space $N({\bf A})$, and then write the complete solution in the form ${\bf x}_c={\bf x}_p+N({\bf A})$. Verify that any complete solution so generated satisfies ${\bf A}{\bf x}_c={\bf b}_1$.