The Schur Decomposition and QR Algorithm

Here we will present the QR algorithm, an important iterative method for solving the eigenvalue problem of a general square matrix (real or complex, symmetric or non-symmetric). To do so, we first need to consider the Schur decomposition, the theoretical foundation of the QR algorithm, which comes in two versions, one for complex and one for real matrices.

Theorem of Schur decomposition (complex):

A complex matrix ${\bf A}$ can be decomposed into a product

$\displaystyle {\bf A}={\bf Q}{\bf U}{\bf Q}^{-1}={\bf Q}{\bf U}{\bf Q}^*$ (77)

where ${\bf Q}^{-1}={\bf Q}^*$ is a unitary matrix, and ${\bf U}$ is an upper triangular matrix containing all eigenvalues of ${\bf A}$ along its diagonal.

Proof: The proof is by induction. When $N=1$, the statement is trivially true. We assume it is true for matrices of size $N-1$, and show that it is then also true for size $N$. Let ${\bf u}$ be a normalized eigenvector of ${\bf A}$ corresponding to an eigenvalue $\lambda$, i.e., ${\bf Au}=\lambda{\bf u}$ and $\vert\vert{\bf u}\vert\vert^2=1$. We construct a unitary matrix

$\displaystyle {\bf Q}_0=[{\bf u},{\bf w}_1,\cdots,{\bf w}_{N-1}]=[{\bf u}\;{\bf W}]$ (78)

satisfying ${\bf Q}^{-1}_0={\bf Q}^*_0$. Here ${\bf W}=[{\bf w}_1,\cdots,{\bf w}_{N-1}]$ is an $N\times(N-1)$ matrix composed of $N-1$ mutually orthonormal vectors all orthogonal to ${\bf u}$ (they can be obtained by applying the Gram-Schmidt process to any $N-1$ N-D vectors that are independent of ${\bf u}$). Now we have

$\displaystyle {\bf Q}_0^*{\bf A}{\bf Q}_0
=\left[\begin{array}{c}{\bf u}^*\\ {\bf W}^*\end{array}\right]{\bf A}\,[{\bf u}\;{\bf W}]
=\left[\begin{array}{cc}{\bf u}^*{\bf A}{\bf u} & {\bf u}^*{\bf A}{\bf W}\\
{\bf W}^*{\bf A}{\bf u} & {\bf W}^*{\bf A}{\bf W}\end{array}\right]
=\left[\begin{array}{cc}\lambda & ***\\ {\bf0}&{\bf A}_1
\end{array}\right]$ (79)

The lower-left block is ${\bf W}^*{\bf A}{\bf u}=\lambda{\bf W}^*{\bf u}={\bf0}$, as ${\bf w}_i^*{\bf u}=0\;(i=1,\cdots,N-1)$. By the induction hypothesis, the $(N-1)$-dimensional matrix ${\bf A}_1={\bf W}^*{\bf A}{\bf W}$ in the lower-right block can be converted into an upper triangular matrix by a unitary matrix,

$\displaystyle {\bf A}_1={\bf Q}_1{\bf U}_1{\bf Q}_1^*,\;\;\;\;\;$i.e.,$\displaystyle \;\;\;\;\;\; {\bf U}_1={\bf Q}_1^*{\bf A}_1{\bf Q}_1$ (80)

We construct an $N\times N$ unitary matrix

$\displaystyle {\bf Q}_1'=\left[\begin{array}{cc}1 & {\bf0}^T\\ {\bf0} & {\bf Q}_1\end{array}\right]$ (81)

and define ${\bf Q}={\bf Q}_0{\bf Q}_1'$ to get
$\displaystyle {\bf Q}^*{\bf A}{\bf Q}
=({\bf Q}_0{\bf Q}_1')^*{\bf A}({\bf Q}_0{\bf Q}_1')
={\bf Q}_1'^*\,({\bf Q}_0^*{\bf A}{\bf Q}_0)\,{\bf Q}_1'
=\left[\begin{array}{cc}1 & {\bf0}^T\\ {\bf0} & {\bf Q}_1^*\end{array}\right]
\left[\begin{array}{cc}\lambda & ***\\ {\bf0} & {\bf A}_1\end{array}\right]
\left[\begin{array}{cc}1 & {\bf0}^T\\ {\bf0} & {\bf Q}_1\end{array}\right]
=\left[\begin{array}{cc}\lambda & ***\\ {\bf0} & {\bf Q}_1^*{\bf A}_1{\bf Q}_1\end{array}\right]
=\left[\begin{array}{cc}\lambda & ***\\ {\bf0}&{\bf U}_1
\end{array}\right]={\bf U}$ (82)

which is Eq.(77) of the theorem.

QED
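As a concrete illustration (not part of the proof above), the complex Schur decomposition can be checked numerically. The sketch below assumes NumPy and SciPy are available and uses scipy.linalg.schur with output='complex' on an arbitrary random matrix:

```python
# A minimal numerical check of the complex Schur decomposition,
# assuming NumPy/SciPy (scipy.linalg.schur computes it directly,
# rather than via the inductive construction of the proof).
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))

U, Q = schur(A, output='complex')             # A = Q U Q*, U upper triangular
print(np.allclose(A, Q @ U @ Q.conj().T))     # True: A = Q U Q*
print(np.allclose(np.tril(U, -1), 0))         # True: U is upper triangular
print(np.sort_complex(np.diag(U)))            # the eigenvalues of A ...
print(np.sort_complex(np.linalg.eigvals(A)))  # ... match eigvals(A)
```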

Theorem of Schur decomposition (real):

A real matrix ${\bf A}$ can be decomposed into a product

$\displaystyle {\bf A}={\bf Q}{\bf U}{\bf Q}^{-1}={\bf Q}{\bf U}{\bf Q}^*$ (83)

where ${\bf Q}^{-1}={\bf Q}^T$ is an orthogonal matrix, and ${\bf U}$ is a quasi upper triangular matrix that takes the following form

$\displaystyle {\bf U}=\left[\begin{array}{cccc}
{\bf B}_1 & * & \cdots & *\\
0 & {\bf B}_2 & \cdots & *\\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & {\bf B}_k\end{array}\right]$ (84)

where each ${\bf B}_i$ is either a scalar equal to a real eigenvalue of ${\bf A}$, or a $2\times 2$ block whose pair of complex conjugate eigenvalues also belong to ${\bf A}$.

Note that here ${\bf U}$ is not strictly triangular, as there may exist some $2\times 2$ blocks on the diagonal, i.e., some subdiagonal entries may be non-zero.

Proof:

We first review the following simple facts. Let $\lambda$ and ${\bf u}$ be an eigenvalue and the corresponding eigenvector of a real matrix ${\bf A}$, i.e., ${\bf Au}=\lambda{\bf u}$. Taking the complex conjugate on both sides we get $\overline{\bf A}\overline{\bf u}={\bf A}\overline{\bf u}
=\overline\lambda\overline{\bf u}$. Consider the following two cases:

If $\lambda$ is real, then $\overline{\bf u}$ is also an eigenvector corresponding to the same $\lambda$, and the eigenvector can be chosen to be real.

If $\lambda=\alpha+j\beta$ is complex, then $\overline\lambda=\alpha-j\beta$ is also an eigenvalue of ${\bf A}$, with the corresponding eigenvector $\overline{\bf u}$, i.e., the complex eigenvalues and eigenvectors of a real matrix appear in conjugate pairs.

Consider specifically a $2\times 2$ real matrix $\left[\begin{array}{rr}\alpha & \beta\\ -\beta & \alpha\end{array}\right]$. Solving its characteristic equation

$\displaystyle \det\left[\begin{array}{rr}
\lambda-\alpha & -\beta\\ \beta & \lambda-\alpha\end{array}\right]
=(\lambda-\alpha)^2+\beta^2=\lambda^2-2\alpha\lambda+\alpha^2+\beta^2=0$ (85)

we get a pair of complex conjugate eigenvalues $\lambda_{1,2}=\alpha\pm j\beta$.
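This fact is easy to confirm numerically; a one-line sketch assuming NumPy, with arbitrarily chosen values $\alpha=1.5$, $\beta=0.7$:

```python
# Verify Eq. (85): [[a, b], [-b, a]] has eigenvalues a +/- jb.
import numpy as np

a, b = 1.5, 0.7
B = np.array([[a, b], [-b, a]])
print(np.linalg.eigvals(B))   # [1.5+0.7j  1.5-0.7j]
```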

Assume a real matrix ${\bf A}$ has a pair of complex conjugate eigenvalues $\alpha\pm j\beta$ with the corresponding complex conjugate eigenvectors ${\bf u}\pm j{\bf v}$:

$\displaystyle {\bf A}({\bf u}+j{\bf v})=(\alpha+j\beta)({\bf u}+j{\bf v})$ (86)

where ${\bf u}$, ${\bf v}$, $\alpha$ and $\beta$ are real. Equating the real and imaginary parts we have

$\displaystyle {\bf Au}=\alpha{\bf u}-\beta{\bf v},\;\;\;\;\;
{\bf Av}=\beta{\bf u}+\alpha{\bf v}$ (87)

Combining these two equations we get:

$\displaystyle {\bf A}[{\bf u}\;{\bf v}]=[{\bf u}\;{\bf v}]
\left[\begin{array}{rr}\alpha & \beta\\ -\beta & \alpha\end{array}\right]$ (88)

Let ${\bf w}_1$ and ${\bf w}_2$ be an orthonormal basis of the 2-D space $span({\bf u}, {\bf v})$, then ${\bf u}$ and ${\bf v}$ can be expressed as their linear combinations:

$\displaystyle {\bf u}=c_{11}{\bf w}_1+c_{21}{\bf w}_2,\;\;\;\;\;\;\;\;
{\bf v}=c_{12}{\bf w}_1+c_{22}{\bf w}_2$ (89)

or in matrix form:

$\displaystyle [{\bf u}\;{\bf v}]=[{\bf w}_1\;{\bf w}_2]
\left[\begin{array}{cc}c_{11}&c_{12}\\ c_{21}&c_{22}\end{array}\right]
=[{\bf w}_1\;{\bf w}_2]{\bf C},\;\;\;\;\;\;$or$\displaystyle \;\;\;\;\;
[{\bf w}_1\;{\bf w}_2]=[{\bf u}\;{\bf v}]{\bf C}^{-1}$ (90)

Now we have:
$\displaystyle {\bf A}[{\bf w}_1\;{\bf w}_2]
={\bf A}[{\bf u}\;{\bf v}]{\bf C}^{-1}
=[{\bf u}\;{\bf v}]\left[\begin{array}{rr}\alpha & \beta\\ -\beta & \alpha\end{array}\right]{\bf C}^{-1}
=[{\bf w}_1\;{\bf w}_2]{\bf C}\left[\begin{array}{rr}\alpha & \beta\\ -\beta & \alpha\end{array}\right]{\bf C}^{-1}
=[{\bf w}_1\;{\bf w}_2]{\bf B}$ (91)

where

$\displaystyle {\bf B}={\bf C}\left[\begin{array}{rr}\alpha & \beta\\ -\beta & \alpha\end{array}\right]{\bf C}^{-1}
$

is a $2\times 2$ matrix that is similar to $\left[\begin{array}{rr}\alpha & \beta\\ -\beta & \alpha\end{array}\right]$, sharing the same complex conjugate eigenvalues $\alpha\pm j\beta$. Pre-multiplying $[{\bf w}_1\;{\bf w}_2]^T$ on both sides of the equation above, we get

$\displaystyle [{\bf w}_1\;{\bf w}_2]^T{\bf A}[{\bf w}_1\;{\bf w}_2]
=[{\bf w}_1\;{\bf w}_2]^T[{\bf w}_1\;{\bf w}_2]{\bf B}
=\left[\begin{array}{cc}{\bf w}_1^T{\bf w}_1 & {\bf w}_1^T{\bf w}_2\\
{\bf w}_2^T{\bf w}_1 & {\bf w}_2^T{\bf w}_2 \end{array}\right]{\bf B}
={\bf IB}={\bf B}
$

Now we are ready to prove the theorem, following the same induction steps as in the proof of the complex Schur decomposition. Let $\lambda$ be an eigenvalue of ${\bf A}$. If $\lambda$ is real, it has a real eigenvector ${\bf u}$ and we proceed as in the proof of the previous theorem. If $\lambda$ is complex, it must be one of a pair of complex conjugate eigenvalues $\alpha\pm j\beta$ with the corresponding eigenvectors ${\bf u}\pm j{\bf v}$. We can now construct an $N\times N$ orthogonal matrix ${\bf Q}_0=[{\bf w}_1\,{\bf w}_2\,{\bf W}]$, where ${\bf w}_1$ and ${\bf w}_2$ form the orthonormal basis of $span({\bf u}, {\bf v})$ defined above, and ${\bf W}=[{\bf w}_3,\cdots,{\bf w}_N]$ is an $N\times(N-2)$ matrix composed of $N-2$ orthonormal vectors all orthogonal to ${\bf w}_1$ and ${\bf w}_2$. Now we have

$\displaystyle {\bf Q}_0^T{\bf A}{\bf Q}_0
=\left[\begin{array}{c}[{\bf w}_1\;{\bf w}_2]^T\\ {\bf W}^T\end{array}\right]
{\bf A}\,[\,[{\bf w}_1\;{\bf w}_2]\;{\bf W}\,]
=\left[\begin{array}{cc}
[{\bf w}_1\;{\bf w}_2]^T{\bf A}[{\bf w}_1\;{\bf w}_2] & [{\bf w}_1\;{\bf w}_2]^T{\bf A}{\bf W}\\
{\bf W}^T{\bf A}[{\bf w}_1\;{\bf w}_2] & {\bf W}^T{\bf A}{\bf W}\end{array}\right]
=\left[\begin{array}{cc}{\bf B} & ***\\ {\bf0} & {\bf A}_1\end{array}\right]$ (92)

Note that the bottom-left block is an $(N-2)\times 2$ zero matrix

$\displaystyle {\bf W}^T{\bf A}[{\bf w}_1\;{\bf w}_2]
={\bf W}^T[{\bf w}_1\;{\bf w}_2]{\bf B}={\bf0}$ (93)

as ${\bf W}^T{\bf w}_1={\bf W}^T{\bf w}_2={\bf0}$, and the lower-right block is an $(N-2)\times(N-2)$ matrix ${\bf A}_1={\bf W}^T{\bf A}{\bf W}$, which by the induction hypothesis can be converted into a quasi upper triangular matrix by an orthogonal matrix. The theorem then follows by the same argument as in the inductive proof of the previous theorem.

QED

In particular, if ${\bf A}={\bf A}^T$ is real and symmetric, all of its eigenvalues are real, and all entries above the diagonal of ${\bf U}$ are eliminated as well as those below, i.e., ${\bf U}$ becomes the diagonal eigenvalue matrix containing all real eigenvalues, and the corresponding ${\bf Q}$ matrix becomes the orthogonal eigenvector matrix.
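Both the quasi-triangular form and the symmetric special case can be illustrated numerically; below is a minimal sketch assuming NumPy/SciPy, using scipy.linalg.schur with its default real output on an arbitrary random matrix:

```python
# Real Schur form: U is quasi upper triangular (zeros below the
# subdiagonal), and for a symmetric A it reduces to the diagonal
# eigenvalue matrix.
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))

U, Q = schur(A, output='real')              # A = Q U Q^T
print(np.allclose(A, Q @ U @ Q.T))          # True
print(np.allclose(Q.T @ Q, np.eye(5)))      # True: Q is orthogonal
print(np.allclose(np.tril(U, -2), 0))       # True: quasi upper triangular

S = A + A.T                                 # a symmetric matrix
D, V = schur(S)
print(np.allclose(D, np.diag(np.diag(D))))  # True: U becomes diagonal
```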

Example 1: The QR algorithm applied to the following complex matrix

$\displaystyle {\bf A}=\left[\begin{array}{lllll}
3+4i & 5+5i & 1+1i & 3+3i & 2...
... & 2+2i & 2+2i & 1+2i\\
5+5i & 3+2i & 4+2i & 3+3i & 3+3i
\end{array}\right]
$

produces an upper triangular matrix after $k=100$ iterations:

$\displaystyle {\bf A}_k=\left[\begin{array}{ccccc}
13.7481 +15.383i &-1.181 + ...
...1.391i &-0.976 + 0.425i\\
0 & 0 & 0 & 0 &-1.715 - 0.668i
\end{array}\right]
$

Example 2:

$\displaystyle {\bf A}=\left[\begin{array}{ccccc}
1 & 2 & 2 & 6 & 4 \\
5 & 4 ...
... & 9 & 7 & 8 \\
2 & 1 & 1 & 5 & 5 \\
2 & 4 & 7 & 7 & 9
\end{array}\right]
$

$\displaystyle {\bf U}=\left[\begin{array}{ccccc}
24.4227 & 3.9999 & -11.3831 &...
...0 & -2.5214 & 1.8609 & 0.2790 \\
0 & 0 & 0 & 0 & 1.2877
\end{array}\right]
$

Along the diagonal of ${\bf U}$, there are three $1\times 1$ blocks: ${\bf B}_1=\lambda_1=24.4227$, ${\bf B}_2=\lambda_2=-1.4323$, and ${\bf B}_4=\lambda_5=1.2877$, and one $2\times 2$ block:

$\displaystyle {\bf B}_3=\left[\begin{array}{rr}1.8609 & 1.0125\\ -2.5214 & 1.8609\end{array}\right]
$

with two complex conjugate eigenvalues: $\lambda_{3,4}=1.8609 \pm j\,1.5978$, which also belong to ${\bf A}$.
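Reading the eigenvalues off such a quasi upper triangular ${\bf U}$ can be mechanized. Below is a minimal sketch assuming NumPy (the helper name eig_from_quasi_triangular is ours, not a standard function); applied to the ${\bf U}$ above it would return $\lambda_1,\cdots,\lambda_5$:

```python
# Walk the diagonal of a quasi upper triangular U: a nonzero
# subdiagonal entry marks a 2x2 block B_i carrying a complex conjugate
# eigenvalue pair; otherwise the diagonal entry is a real eigenvalue.
import numpy as np

def eig_from_quasi_triangular(U, tol=1e-12):
    lam, i, N = [], 0, U.shape[0]
    while i < N:
        if i + 1 < N and abs(U[i + 1, i]) > tol:   # 2x2 block
            lam.extend(np.linalg.eigvals(U[i:i + 2, i:i + 2]))
            i += 2
        else:                                      # 1x1 block: real eigenvalue
            lam.append(U[i, i])
            i += 1
    return np.array(lam)
```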

Example 3:

$\displaystyle {\bf A}=\left[\begin{array}{ccccc}
0.8929 & 0.5951 & 0.4381 & 0....
...3 & 0.7364 \\
0.1077 & 0.6225 & 0.6454 & 0.8238 & 0.7091
\end{array}\right]
$

$\displaystyle {\bf U}=\left[\begin{array}{rrrrr}
2.8767 & -0.5304 & -0.0322 & ...
... & 0 & -0.1222 & -0.3573 \\
0 & 0 & 0 & 0.0940 & -0.1222
\end{array}\right]
$

Along the diagonal there exist three blocks: ${\bf B}_1=\lambda_1=2.8767$, and

$\displaystyle {\bf B}_2=\left[\begin{array}{rr}
0.4469 & -0.6092 \\ 0.0194 & 0.4469
\end{array}\right],\;\;\;\;\;\lambda_{2,3}=0.4469 \pm j\,0.1087
$

$\displaystyle {\bf B}_3=\left[\begin{array}{rr}
-0.1222 & -0.3573 \\ 0.0940 & -0.1222
\end{array}\right],\;\;\;\;\;\lambda_{4,5}=-0.1222 \pm j\,0.1833
$

Along with $\lambda_1$, the complex conjugate eigenvalues $\lambda_{2,3}$ of ${\bf B}_2$ and $\lambda_{4,5}$ of ${\bf B}_3$ also belong to ${\bf A}$.

As the proof of the Schur decomposition theorems is not constructive (it relies on the unknown eigenvalues of ${\bf A}$), it does not lead to any specific algorithm for actually obtaining ${\bf Q}$. However, the QR algorithm below can be used to actually implement the Schur decomposition, thereby solving the eigenvalue problem of a real square matrix ${\bf A}$.

The QR algorithm

The first step is to perform the QR decomposition of the given matrix: ${\bf A}={\bf A}_0={\bf Q}_0{\bf R}_0$, i.e., ${\bf Q}^{-1}_0{\bf A}_0={\bf Q}^T_0{\bf A}_0={\bf R}_0$. The second step is to construct a new matrix ${\bf A}_1={\bf R}_0{\bf Q}_0={\bf Q}^T_0{\bf A}_0{\bf Q}_0$. These two steps are then carried out iteratively until ${\bf A}_k$ becomes a quasi upper triangular matrix:

$\displaystyle {\bf A}_{k+1}={\bf R}_k{\bf Q}_k={\bf Q}_k^T{\bf A}_k{\bf Q}_k
={\bf Q}_k^T\cdots{\bf Q}_0^T{\bf A}_0{\bf Q}_0\cdots{\bf Q}_k
=({\bf Q}^{(k)})^T{\bf A}\,{\bf Q}^{(k)}$ (94)

Here

$\displaystyle {\bf Q}^{(k)}={\bf Q}_0\cdots{\bf Q}_k,\;\;\;\;\;\;$and$\displaystyle \;\;\;\;\;
({\bf Q}^{(k)})^T={\bf Q}_k^T\cdots{\bf Q}_0^T$ (95)

which is orthogonal as all ${\bf Q}_i$ are orthogonal. Pre-multiplying ${\bf Q}^{(k)}$ and post-multiplying $({\bf Q}^{(k)})^{-1}=({\bf Q}^{(k)})^T$ on both sides of the equation above, we get:

$\displaystyle {\bf A}={\bf Q}^{(k)}{\bf A}_{k+1}({\bf Q}^{(k)})^T$ (96)

i.e., ${\bf A}_{k+1}$ is orthogonally similar to ${\bf A}$. The iteration above will converge to an upper triangular matrix $\lim\limits_{k\rightarrow\infty}{\bf A}_k={\bf U}$, as stated in the following theorem.

Theorem: Let ${\bf A}$ have distinct eigenvalues satisfying $\vert\lambda_1\vert>\vert\lambda_2\vert>\cdots>\vert\lambda_N\vert$. The iteration on ${\bf A}_k$ of the QR algorithm given above converges to an upper triangular matrix ${\bf U}$, i.e.,

$\displaystyle ({\bf Q}^{(k)})^T{\bf A}_0{\bf Q}^{(k)}={\bf A}_k
\stackrel{k\rightarrow\infty}{\Longrightarrow}{\bf U}$ (97)

which leads to the Schur decomposition of ${\bf A}$:

$\displaystyle {\bf A}={\bf Q}{\bf U}{\bf Q}^{-1}={\bf Q}{\bf U}{\bf Q}^T$ (98)

where ${\bf Q}={\bf Q}^{(k)}$ is an orthogonal matrix.

The proof of this theorem can be found in $\S$7.3 of Matrix Computations, 4th ed., by G. H. Golub and C. F. Van Loan, The Johns Hopkins University Press, 2013.
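A bare-bones implementation of the iteration is straightforward. The sketch below assumes NumPy and uses no shifts or deflation, so it is illustrative only; the test matrix is chosen to have three real eigenvalues of distinct magnitude, so ${\bf A}_k$ converges to an upper triangular matrix:

```python
# Plain QR iteration: factor A_k = Q_k R_k, then form A_{k+1} = R_k Q_k,
# accumulating Q^{(k)} = Q_0 ... Q_k. Practical codes add Hessenberg
# reduction, shifts, and deflation for speed.
import numpy as np

def qr_algorithm(A, iters=200):
    Ak = np.array(A, dtype=float)
    Q_acc = np.eye(Ak.shape[0])
    for _ in range(iters):
        Q, R = np.linalg.qr(Ak)   # step 1: A_k = Q_k R_k
        Ak = R @ Q                # step 2: A_{k+1} = R_k Q_k = Q_k^T A_k Q_k
        Q_acc = Q_acc @ Q         # Q^{(k)} = Q_0 ... Q_k
    return Ak, Q_acc              # Ak -> U, with A = Q_acc Ak Q_acc^T

A = np.array([[1., 2., 2.], [5., 4., 3.], [2., 1., 5.]])
U, Q = qr_algorithm(A)
print(np.allclose(A, Q @ U @ Q.T))   # True: similarity is preserved
print(np.diag(U))                    # approaches the eigenvalues of A
print(np.linalg.eigvals(A))          # compare: same values (reordered)
```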

In this QR algorithm, a QR decomposition with complexity $O(N^3)$ is carried out in every iteration. To reduce the complexity, we can first convert ${\bf A}$ into a Hessenberg matrix ${\bf H}={\bf QAQ}^T$ with all components below the subdiagonal being zero. As the complexity of each column transformation then becomes constant $O(1)$ (instead of $O(N)$), the QR decomposition of ${\bf H}$ is $O(N^2)$ (instead of $O(N^3)$), and the overall complexity of the iterative algorithm is much reduced.
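A sketch of this preprocessing step, assuming SciPy's scipy.linalg.hessenberg for the reduction (same test matrix as above):

```python
# Reduce A to upper Hessenberg form once, then run the QR iteration on
# H: since RQ of a Hessenberg matrix is again Hessenberg, each sweep
# costs O(N^2) instead of O(N^3).
import numpy as np
from scipy.linalg import hessenberg

A = np.array([[1., 2., 2.], [5., 4., 3.], [2., 1., 5.]])
H, Q = hessenberg(A, calc_q=True)        # A = Q H Q^T
print(np.allclose(A, Q @ H @ Q.T))       # True
print(np.allclose(np.tril(H, -2), 0))    # True: zeros below subdiagonal

Hk = H.copy()
for _ in range(200):
    Qk, Rk = np.linalg.qr(Hk)
    Hk = Rk @ Qk                         # stays Hessenberg
print(np.diag(Hk))                       # the eigenvalues of A
```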

As ${\bf A}$ and ${\bf U}={\bf Q}^T{\bf AQ}$ are similar, they share the same eigenvalues $\lambda_1,\cdots,\lambda_N$, which are on the diagonal elements of the upper triangular matrix ${\bf U}$. We can further find the corresponding eigenvectors of ${\bf A}$ by either of the two methods below: