KLT Optimally Compacts the Energy

Consider a general orthogonal transform pair defined as

\begin{displaymath}\left\{ \begin{array}{l} Y=A^{T}X \\ X=AY \end{array} \right. \end{displaymath}

where $X$ and $Y$ are N by 1 vectors and $A$ is an arbitrary N by N orthogonal matrix, i.e., $A^{-1}=A^{T}$.

We represent $A$ by its column vectors $A_i,\;\;\;\;(i=0, \cdots, N-1)$ as

\begin{displaymath}A=[A_0, \cdots, A_{N-1}] \end{displaymath}

or

\begin{displaymath}A^{T}=\left[ \begin{array}{c} A_0^{T} \\ \vdots \\ A_{N-1}^{T} \end{array} \right]
\end{displaymath}

Now the ith component of $Y$ can be written as

\begin{displaymath}y_i=A_i^{T}\;X \end{displaymath}
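
As a quick numerical illustration of the transform pair and of $y_i=A_i^{T}X$, here is a minimal sketch, assuming numpy is available (the matrix $A$ and vector $X$ below are arbitrary examples, not quantities from the text):

import numpy as np

N = 4
rng = np.random.default_rng(0)

# Build an arbitrary N-by-N orthogonal matrix A via QR decomposition.
A, _ = np.linalg.qr(rng.standard_normal((N, N)))
assert np.allclose(A.T @ A, np.eye(N))      # A^{-1} = A^T

X = rng.standard_normal(N)                  # a signal vector of length N
Y = A.T @ X                                 # forward transform  Y = A^T X
assert np.allclose(A @ Y, X)                # inverse transform  X = A Y

# The i-th component of Y is the projection of X onto the i-th column of A.
i = 2
assert np.isclose(Y[i], A[:, i] @ X)        # y_i = A_i^T X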

As we assume the mean vector of $X$ is zero, $M_X=0$ (and therefore $M_Y=A^{T}M_X=0$ as well), we have $\Sigma_X=R_X$, and the variances of the ith elements of $X$ and $Y$ are

\begin{displaymath}\sigma_{x_i}^2=E (x_i^2) \;\stackrel{\triangle}{=}E(e_{x_i})
\end{displaymath}

and

\begin{displaymath}\sigma_{y_i}^2=E (y_i^2) \;\stackrel{\triangle}{=}E(e_{y_i})
\end{displaymath}

where $e_{x_i}\stackrel{\triangle}{=}x_i^2$ and $e_{y_i}\stackrel{\triangle}{=}y_i^2$ represent the energy contained in the ith component of $X$ and $Y$, respectively. In other words, the trace of $\Sigma_X$ (the sum of all the diagonal elements of the matrix) represents the expectation of the total amount of energy contained in the signal $X$:

\begin{displaymath}\mbox{Total energy contained in $X$}= tr\,\Sigma_X=\sum_{i=0}^{N-1}\sigma_{x_i}^2
=\sum_{i=0}^{N-1} E (x_i^2)=E(\sum_{i=0}^{N-1}e_{x_i})
\end{displaymath}

Since an orthogonal transform $A$ does not change the length of a vector $X$, i.e., $\parallel Y \parallel = \parallel A^{T}X \parallel = \parallel X \parallel$, where

\begin{displaymath}\parallel X \parallel \stackrel{\triangle}{=}\sqrt{ \sum_{i=0}^{N-1} x_i^2 }=
\sqrt{ \sum_{i=0}^{N-1} e_{x_i} }
\end{displaymath}

the total energy contained in the signal vector $X$ is conserved after the orthogonal transform. (This conclusion can also be obtained from the fact that an orthogonal similarity transform does not change the trace of a matrix: $tr\,\Sigma_Y=tr(A^{T}\Sigma_X A)=tr\,\Sigma_X$.)
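
Both statements are easy to verify numerically. Below is a minimal sketch, assuming numpy is available (the orthogonal matrix and covariance used here are arbitrary examples, not quantities from the text):

import numpy as np

rng = np.random.default_rng(1)
N = 4

# An arbitrary orthogonal transform A and an arbitrary signal vector X.
A, _ = np.linalg.qr(rng.standard_normal((N, N)))
X = rng.standard_normal(N)
Y = A.T @ X

# The vector length, and hence the total energy, is unchanged by the transform.
assert np.isclose(np.linalg.norm(Y), np.linalg.norm(X))

# Equivalently, the trace of the covariance matrix is invariant:
# tr(Sigma_Y) = tr(A^T Sigma_X A) = tr(Sigma_X).
B = rng.standard_normal((N, N))
Sigma_X = B @ B.T                       # an example covariance (symmetric PSD)
Sigma_Y = A.T @ Sigma_X @ A
assert np.isclose(np.trace(Sigma_Y), np.trace(Sigma_X))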

We next define

\begin{displaymath}S_m(A)\stackrel{\triangle}{=}\sum_{i=0}^{m-1} E(y_i^2)=
\sum_{i=0}^{m-1}\sigma_{y_i}^2=
\sum_{i=0}^{m-1}E(e_{y_i})
\end{displaymath}

where $m \leq N$. $S_m(A)$ is a function of the transform matrix $A$ and represents the amount of energy contained in the first $m$ components of $Y=A^{T}X$. Since the total energy is conserved, $S_m(A)$ also determines the fraction of the total energy contained in the first $m$ components. In the following we will show that $S_m(A)$ is maximized when the transform $A$ is the KLT:

\begin{displaymath}S_m(\Phi) \geq S_m(A) \end{displaymath}

i.e., KLT optimally compacts energy into a few components of the signal.
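
Before the proof, the claim can be checked numerically. The sketch below assumes numpy; the AR(1)-style covariance used for $R_X$ and the random competing orthogonal basis are illustrative assumptions, not part of the derivation.

import numpy as np

rng = np.random.default_rng(2)
N, m = 8, 2

# A correlated signal model: R_X with entries rho^|i-j| (AR(1)-like), zero mean.
rho = 0.95
R_X = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))

def S_m(A, R, m):
    """Energy captured by the first m components of Y = A^T X."""
    return sum(A[:, i] @ R @ A[:, i] for i in range(m))

# KLT basis: eigenvectors of R_X, ordered by decreasing eigenvalue.
lam, Phi = np.linalg.eigh(R_X)
order = np.argsort(lam)[::-1]
Phi = Phi[:, order]

# An arbitrary competing orthogonal basis.
A, _ = np.linalg.qr(rng.standard_normal((N, N)))

print(S_m(Phi, R_X, m), ">=", S_m(A, R_X, m))   # KLT compacts more energy
assert S_m(Phi, R_X, m) >= S_m(A, R_X, m)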

Consider

\begin{eqnarray*}
S_m(A) & \stackrel{\triangle}{=} & \sum_{i=0}^{m-1} E(y_i^2)=
\sum_{i=0}^{m-1} E[A_i^{T}X (A_i^{T}X)^{T}] \\
& = & \sum_{i=0}^{m-1} E[A_i^{T}X X^{T}A_i]
=\sum_{i=0}^{m-1} A_i^{T}E(X X^{T})A_i \\
& = & \sum_{i=0}^{m-1} A_i^{T} R_X A_i
\end{eqnarray*}

Now we need to find a transform matrix $A$ so that

\begin{displaymath}\left\{ \begin{array}{l} S_m(A) \rightarrow \max \\
\mbox{subject to $A_j^{T}A_j=1 \;\;\;\;(j=0, \cdots, m-1)$}
\end{array} \right.
\end{displaymath}

The constraint $A_j^{T}A_j=1$ guarantees that the column vectors of $A$ are normalized. This constrained optimization problem can be solved by the method of Lagrange multipliers, as shown below.
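
For reference, the Lagrangian combining the objective and the constraints above is

\begin{displaymath}
L(A_0,\cdots,A_{m-1};\lambda_0,\cdots,\lambda_{m-1})
=S_m(A)-\sum_{j=0}^{m-1}\lambda_j(A_j^{T}A_j-1)
=\sum_{j=0}^{m-1}(A_j^{T}R_XA_j-\lambda_j A_j^{T}A_j+\lambda_j)
\end{displaymath}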

We set the derivative of this Lagrangian with respect to each $A_i$ to zero:

\begin{eqnarray*}
\frac{\partial}{\partial A_i}\left[S_m(A)-\sum_{j=0}^{m-1}
\lambda_j(A_j^{T}A_j-1) \right]
& = & \frac{\partial}{\partial A_i}\left[\sum_{j=0}^{m-1}
(A_j^{T}R_XA_j-\lambda_j A_j^{T}A_j+\lambda_j) \right] \\
& = & \frac{\partial}{\partial A_i}
\left[ A_i^{T}R_XA_i-\lambda_i A_i^{T}A_i \right] \\
& \stackrel{*}{=} & 2R_XA_i-2\lambda_iA_i \;=\;0
\end{eqnarray*}

(* The last step uses the vector-derivative identities $\frac{\partial}{\partial A_i}(A_i^{T}R_XA_i)=2R_XA_i$ and $\frac{\partial}{\partial A_i}(A_i^{T}A_i)=2A_i$, which hold because $R_X$ is symmetric; see the linear algebra review handout.) We see that the column vectors of $A$ must be the eigenvectors of $R_X$:

\begin{displaymath}R_XA_i=\lambda_iA_i \;\;\;\;\;\;(i=0, \cdots, m-1) \end{displaymath}

i.e., the transform matrix must be

\begin{displaymath}A=[A_0, \cdots,A_{N-1}]=\Phi=[\phi_0, \cdots, \phi_{N-1}] \end{displaymath}

Thus we have proved that the optimal transform is indeed the KLT, and

\begin{displaymath}
S_m(\Phi)=\sum_{i=0}^{m-1} \phi_i^{T}R_X\phi_i = \sum_{i=0}^{m-1} \lambda_i
\end{displaymath}

where the ith eigenvalue $\lambda_i$ of $R_X$ is also the average (expected) energy contained in the ith component of the transformed signal. If we choose the $\phi_i$'s that correspond to the $m$ largest eigenvalues of $R_X$, i.e., order the eigenvalues so that $\lambda_0 \geq \lambda_1 \geq \cdots \geq \lambda_{m-1} \geq \cdots \geq \lambda_{N-1}$, then $S_m(\Phi)$ achieves its maximum.
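
This last identity is also easy to confirm numerically; the sketch below assumes numpy and uses an arbitrary example covariance for $R_X$.

import numpy as np

rng = np.random.default_rng(3)
N, m = 8, 3

# Example covariance matrix R_X (any symmetric positive semi-definite matrix works).
B = rng.standard_normal((N, N))
R_X = B @ B.T

# KLT basis: eigenvectors of R_X sorted by decreasing eigenvalue.
lam, Phi = np.linalg.eigh(R_X)
order = np.argsort(lam)[::-1]
lam, Phi = lam[order], Phi[:, order]

# S_m(Phi) = sum_{i<m} phi_i^T R_X phi_i equals the sum of the m largest eigenvalues.
S_m_Phi = sum(Phi[:, i] @ R_X @ Phi[:, i] for i in range(m))
assert np.isclose(S_m_Phi, lam[:m].sum())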


Ruye Wang 2004-09-29