
Eigenvalues and eigenvectors

For any $n\times n$ square matrix ${\bf A}$, if there exist a nonzero vector ${\bf\phi}$ and a scalar $\lambda$ such that

\begin{displaymath}{\bf A}{\bf\phi}=\lambda {\bf\phi} \end{displaymath}

then $\lambda$ and ${\bf\phi}$ are called an eigenvalue and an eigenvector of matrix ${\bf A}$, respectively. Note that an eigenvector ${\bf\phi}$ is only unique up to a scaling factor, i.e., if ${\bf\phi}$ is an eigenvector of ${\bf A}$, so is $c{\bf\phi}$ for any nonzero constant $c$. Typically, for uniqueness, we keep ${\bf\phi}$ normalized so that $\vert\vert{\bf\phi}\vert\vert=1$.

To obtain $\lambda$, rewrite the above equation as

\begin{displaymath}(\lambda {\bf I}-{\bf A}) {\bf\phi} =0 \end{displaymath}

which is a homogeneous equation system. For this system to have a non-zero solution for ${\bf\phi}$, the determinant of its coefficient matrix must be zero:

\begin{displaymath}
det(\lambda {\bf I}-{\bf A})=\vert \lambda {\bf I}-{\bf A} \vert = 0
\end{displaymath}

Solving this $n$th-order polynomial equation in $\lambda$ (the characteristic equation), we get $n$ eigenvalues $\{ \lambda_1,\cdots,\lambda_n \}$. Substituting each $\lambda_i$ back into the equation system, we get the corresponding eigenvector ${\bf\phi}_i$. We can put all $n$ equations ${\bf A}{\bf\phi}_i=\lambda_i{\bf\phi}_i$ together and have

\begin{displaymath}
{\bf A}[{\bf\phi}_1,\cdots,{\bf\phi}_n]
=[{\bf\phi}_1,\cdots,{\bf\phi}_n]
\left[ \begin{array}{cccc}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{array} \right]
\end{displaymath}

or in a more compact matrix form,

\begin{displaymath}
{\bf A}{\bf\Phi}={\bf\Phi}{\bf\Lambda},
\;\;\;\;\;\;\mbox{or}\;\;\;\;\;\;\;
{\bf\Phi}^{-1}{\bf A}{\bf\Phi}={\bf\Lambda}
\end{displaymath}

where

\begin{displaymath}
{\bf\Lambda}=diag[\lambda_1,\cdots,\lambda_n]
\;\;\;\;\;\;\mbox{and}\;\;\;\;\;\;\;
{\bf\Phi}=[{\bf\phi}_1,\cdots,{\bf\phi}_n]
\end{displaymath}

are the diagonal eigenvalue matrix and eigenvector matrix, respectively.
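The diagonalization above can be checked numerically. As a minimal sketch (using NumPy, which is not part of the original text, and an arbitrarily chosen example matrix), `np.linalg.eig` returns the eigenvalues and the eigenvector matrix ${\bf\Phi}$ whose columns are the normalized eigenvectors:

```python
import numpy as np

# An arbitrarily chosen example matrix
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# eig returns the eigenvalues and the matrix Phi whose
# columns are the corresponding (normalized) eigenvectors
lam, Phi = np.linalg.eig(A)
Lam = np.diag(lam)          # the diagonal eigenvalue matrix Lambda

# Verify the compact form A Phi = Phi Lambda
print(np.allclose(A @ Phi, Phi @ Lam))                    # True
# and the diagonalization Phi^{-1} A Phi = Lambda
print(np.allclose(np.linalg.inv(Phi) @ A @ Phi, Lam))     # True
```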

The spectrum of an $N\times N$ square matrix ${\bf A}$ is the set of its eigenvalues $\{\lambda_1,\cdots,\lambda_N\}$. The spectral radius of ${\bf A}$, denoted by $\rho({\bf A})$, is the maximum of the absolute values of its eigenvalues:

\begin{displaymath}
\rho({\bf A})=\max(\vert\lambda_1\vert,\cdots,\vert\lambda_N\vert)
\end{displaymath}

The absolute value of a complex number $z=x+jy$ is $\vert z\vert=\sqrt{x^2+y^2}$, the Euclidean distance between $z$ and the origin of the complex plane. If all eigenvalues are sorted such that $\vert\lambda_1\vert\ge\cdots \ge\vert\lambda_N\vert$ then $\rho({\bf A})=\vert\lambda_1\vert$.
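As a small illustration (a NumPy sketch with an example matrix chosen so the eigenvalues are easy to check by hand), the spectral radius is just the largest eigenvalue magnitude:

```python
import numpy as np

# Example matrix with characteristic equation
# lambda^2 + 3 lambda + 2 = 0, i.e. eigenvalues -1 and -2
A = np.array([[ 0.0,  1.0],
              [-2.0, -3.0]])

lam = np.linalg.eigvals(A)
rho = np.max(np.abs(lam))   # spectral radius rho(A)
print(np.isclose(rho, 2.0)) # True: rho(A) = |-2| = 2
```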

The trace and determinant of ${\bf A}$ can be obtained from its eigenvalues

\begin{displaymath}
tr({\bf A}) =\sum_{k=1}^n \lambda_k,
\;\;\;\;\;\mbox{and}\;\;\;\;\;\;\;
det({\bf A}) =\prod_{k=1}^n \lambda_k
\end{displaymath}

It can also be shown that the eigenvalues of a matrix are invariant under any unitary transform ${\bf B}={\bf R}^*{\bf A}{\bf R}$, where ${\bf R}$ is unitary (${\bf R}{\bf R}^*={\bf I}$), while the eigenvectors are simply transformed by ${\bf R}$. To prove this, we first assume ${\bf\Lambda}=diag(\lambda_1,\cdots,\lambda_N)$ and ${\bf\Phi}$ are the eigenvalue and eigenvector matrices of a square matrix ${\bf A}$:

\begin{displaymath}
{\bf A}{\bf\Phi}={\bf\Phi}{\bf\Lambda}
\end{displaymath}

and ${\bf\Sigma}=diag(\sigma_1,\cdots,\sigma_N)$ and ${\bf\Psi}$ are the eigenvalue and eigenvector matrices of ${\bf B}={\bf R}^*{\bf A}{\bf R}$, a unitary transform of ${\bf A}$:

\begin{displaymath}
{\bf B}{\bf\Psi}=\left({\bf R}^*{\bf A}{\bf R}\right){\bf\Psi}={\bf\Psi}{\bf\Sigma}
\end{displaymath}

Left-multiplying both sides by ${\bf R}$ and using ${\bf R}{\bf R}^*={\bf I}$, we recover the eigen-equation of ${\bf A}$

\begin{displaymath}
{\bf R}{\bf R}^*{\bf A}{\bf R}{\bf\Psi}
={\bf A}{\bf R}{\bf\Psi}={\bf R}{\bf\Psi}{\bf\Sigma}
\end{displaymath}

with the same eigenvalue matrix, and with eigenvector matrices related by

\begin{displaymath}
{\bf R}{\bf\Psi}={\bf\Phi},\;\;\;\;\;\;{\bf\Sigma}={\bf\Lambda}
\end{displaymath}
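The invariance of the spectrum can be checked directly. A minimal sketch (NumPy, with a rotation matrix as an example of a real unitary, i.e. orthogonal, transform):

```python
import numpy as np

# Upper-triangular A, so its eigenvalues 2 and 3 are visible by inspection
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

# A real unitary (orthogonal) matrix R: rotation by 30 degrees
t = np.pi / 6
R = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])

B = R.conj().T @ A @ R          # the unitary transform B = R* A R

# The spectra of A and B are identical (sorted for comparison)
print(np.allclose(np.sort(np.linalg.eigvals(A)),
                  np.sort(np.linalg.eigvals(B))))   # True
```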

If ${\bf A}$ is Hermitian (symmetric if real), such as a covariance matrix, then all of its eigenvalues $\lambda_i$ are real and all of its eigenvectors ${\bf\phi}_i$ are orthogonal and can be normalized (orthonormal):

\begin{displaymath}
<{\bf\phi}_i,{\bf\phi}_j>=\delta_{ij}=\left\{\begin{array}{ll}
1&i=j\\ 0&i\ne j\end{array}\right.
\end{displaymath}

and the eigenvector matrix ${\bf\Phi}$ is unitary (orthogonal if ${\bf A}$ is real):

\begin{displaymath}
{\bf\Phi}^{-1}={\bf\Phi}^*
\end{displaymath}

and we have

\begin{displaymath}
{\bf\Phi}^{-1}{\bf A}{\bf\Phi}={\bf\Phi}^*{\bf A}{\bf\Phi}={\bf\Lambda},
\end{displaymath}

Left and right multiplying the two sides by ${\bf\Phi}$ and ${\bf\Phi}^*={\bf\Phi}^{-1}$, respectively, we get

\begin{displaymath}
{\bf A}={\bf\Phi} {\bf\Lambda}{\bf\Phi}^*
=[{\bf\phi}_1,\cdots,{\bf\phi}_n]
\left[ \begin{array}{ccc}
\lambda_1 & & \\
 & \ddots & \\
 & & \lambda_n
\end{array} \right]
\left[ \begin{array}{c}
{\bf\phi}_1^* \\ \vdots \\ {\bf\phi}_n^*
\end{array} \right]
=\sum_{i=1}^n \lambda_i {\bf\phi}_i {\bf\phi}_i^*
\end{displaymath}
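For Hermitian/symmetric matrices, NumPy provides the specialized routine `np.linalg.eigh`, which guarantees real eigenvalues and an orthonormal eigenvector matrix. A minimal sketch verifying both the orthogonality of ${\bf\Phi}$ and the spectral decomposition (example matrix chosen arbitrarily):

```python
import numpy as np

# A real symmetric matrix (the real case of Hermitian)
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is specialized for Hermitian/symmetric input:
# real eigenvalues, orthonormal eigenvector columns
lam, Phi = np.linalg.eigh(A)

# Phi is orthogonal: Phi^T Phi = I, i.e. Phi^{-1} = Phi^T
print(np.allclose(Phi.T @ Phi, np.eye(2)))          # True

# Spectral decomposition A = sum_i lambda_i phi_i phi_i^T
A_rebuilt = sum(l * np.outer(p, p) for l, p in zip(lam, Phi.T))
print(np.allclose(A, A_rebuilt))                    # True
```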

The generalized eigenvalue problem of two symmetric matrices ${\bf A}={\bf A}^T$ and ${\bf B}={\bf B}^T$ is to find a scalar $\lambda$ and the corresponding vector ${\bf\phi}$ for the following equation to hold:

\begin{displaymath}
{\bf A}{\bf\phi}_i=\lambda_i{\bf B}{\bf\phi}_i,
\;\;\;\;\;\;\;(i=1,\cdots,n)
\end{displaymath}

or in matrix form

\begin{displaymath}
{\bf A}{\bf\Phi}={\bf B}{\bf\Phi}{\bf\Lambda}
\end{displaymath}

The eigenvalue and eigenvector matrices ${\bf\Lambda}$ and ${\bf\Phi}$ can be found by converting this into a standard eigenvalue problem: when ${\bf B}$ is invertible, the equation is equivalent to ${\bf B}^{-1}{\bf A}{\bf\Phi}={\bf\Phi}{\bf\Lambda}$, the eigen-equation of ${\bf B}^{-1}{\bf A}$.
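A minimal NumPy sketch of this reduction (the matrices ${\bf A}$ and ${\bf B}$ are arbitrary illustrative choices, with ${\bf B}$ invertible; `scipy.linalg.eigh` can also solve the generalized problem directly, but plain NumPy suffices here):

```python
import numpy as np

# Symmetric A and symmetric invertible B (illustrative choices)
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.array([[2.0, 0.0],
              [0.0, 1.0]])

# With B invertible, A phi = lambda B phi is equivalent to the
# standard eigenproblem (B^{-1} A) phi = lambda phi
lam, Phi = np.linalg.eig(np.linalg.solve(B, A))

# Verify the matrix form A Phi = B Phi Lambda
print(np.allclose(A @ Phi, B @ Phi @ np.diag(lam)))   # True
```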

The Rayleigh quotient of two symmetric matrices ${\bf A}$ and ${\bf B}$ is a function of a vector ${\bf w}$ defined as:

\begin{displaymath}
R({\bf w})=\frac{{\bf w}^T{\bf A}{\bf w}}{{\bf w}^T{\bf B}{\bf w}}
\end{displaymath}

To find the optimal ${\bf w}$ corresponding to the extremum (maximum or minimum) of $R({\bf w})$, we find its derivative with respect to ${\bf w}$:

\begin{displaymath}
\frac{d}{d{\bf w}}R({\bf w})
=\frac{2{\bf A}{\bf w}({\bf w}^T{\bf B}{\bf w})
-2{\bf B}{\bf w}({\bf w}^T{\bf A}{\bf w})}
{({\bf w}^T{\bf B}{\bf w})^2}
\end{displaymath}

Setting it to zero we get

\begin{displaymath}
{\bf A}{\bf w}({\bf w}^T{\bf B}{\bf w})={\bf B}{\bf w}({\bf w}^T{\bf A}{\bf w}),
\;\;\;\;\;\;\mbox{i.e.,}\;\;\;\;\;\;
{\bf A}{\bf w}=\frac{{\bf w}^T{\bf A}{\bf w}}{{\bf w}^T{\bf B}{\bf w}}{\bf B}{\bf w}
=R({\bf w}){\bf B}{\bf w}=\lambda {\bf B}{\bf w}
\end{displaymath}

The second equation can be recognized as a generalized eigenvalue problem with $\lambda=R({\bf w})$ being the eigenvalue and ${\bf w}$ the corresponding eigenvector. Solving it, the vector ${\bf w}={\bf\phi}$ corresponding to the maximum/minimum eigenvalue $\lambda=R({\bf w})$ maximizes/minimizes the Rayleigh quotient.
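The fact that $R({\bf w})$ evaluated at a generalized eigenvector equals the corresponding eigenvalue can be checked numerically. A minimal NumPy sketch (example matrices chosen arbitrarily, with ${\bf B}$ invertible; the helper `rayleigh` is introduced here for illustration only):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.array([[2.0, 0.0],
              [0.0, 1.0]])

def rayleigh(w, A, B):
    """Rayleigh quotient R(w) = (w^T A w) / (w^T B w)."""
    return (w @ A @ w) / (w @ B @ w)

# Generalized eigenpairs via the equivalent standard problem B^{-1} A
lam, Phi = np.linalg.eig(np.linalg.solve(B, A))

# At each eigenvector the quotient equals the eigenvalue, so the
# largest/smallest eigenvalue gives the max/min of R(w)
for l, phi in zip(lam, Phi.T):
    print(np.isclose(rayleigh(phi, A, B), l))   # True for every pair
```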


Ruye Wang 2014-06-05