KLT Optimally Compacts the Energy

Consider a general orthogonal transform pair defined as

\begin{displaymath}\left\{ \begin{array}{l} Y=A^{T}X \\ X=AY \end{array} \right. \end{displaymath}

where $X$ and $Y$ are N by 1 vectors and $A$ is an arbitrary N by N orthogonal matrix, i.e., $A^{-1}=A^{T}$.

We represent $A$ by its column vectors $A_i,\;\;\;\;(i=0, \cdots, N-1)$ as

\begin{displaymath}A=[A_0, \cdots, A_{N-1}] \end{displaymath}

or

\begin{displaymath}A^{T}=\left[ \begin{array}{c} A_0^{T} \\ \vdots \\ A_{N-1}^{T} \end{array} \right]
\end{displaymath}

Now the ith component of $Y$ can be written as

\begin{displaymath}y_i=A_i^{T}\;X \end{displaymath}
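
As a quick numerical illustration of the transform pair and of $y_i=A_i^{T}X$, here is a minimal sketch, assuming numpy is available (the matrix $A$ and vector $X$ below are arbitrary examples, not quantities from the text):

import numpy as np

N = 4
rng = np.random.default_rng(0)

# Build an arbitrary N-by-N orthogonal matrix A via QR decomposition.
A, _ = np.linalg.qr(rng.standard_normal((N, N)))
assert np.allclose(A.T @ A, np.eye(N))      # A^{-1} = A^T

X = rng.standard_normal(N)                  # a signal vector of length N
Y = A.T @ X                                 # forward transform  Y = A^T X
assert np.allclose(A @ Y, X)                # inverse transform  X = A Y

# The i-th component of Y is the projection of X onto the i-th column of A.
i = 2
assert np.isclose(Y[i], A[:, i] @ X)        # y_i = A_i^T X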

As we assume the mean vector of $X$ is zero, $M_X=0$ (and therefore $M_Y=A^{T}M_X=0$ as well), we have $\Sigma_X=R_X$, and the variances of the ith elements of $X$ and $Y$ are

\begin{displaymath}\sigma_{x_i}^2=E (x_i^2) \;\stackrel{\triangle}{=}E(e_{x_i})
\end{displaymath}

and

\begin{displaymath}\sigma_{y_i}^2=E (y_i^2) \;\stackrel{\triangle}{=}E(e_{y_i})
\end{displaymath}

where $e_{x_i}\stackrel{\triangle}{=}x_i^2$ and $e_{y_i}\stackrel{\triangle}{=}y_i^2$ represent the energy contained in the ith component of $X$ and $Y$, respectively. In other words, the trace of $\Sigma_X$ (the sum of all the diagonal elements of the matrix) represents the expectation of the total amount of energy contained in the signal $X$:

\begin{displaymath}\mbox{Total energy contained in $X$}= tr\,\Sigma_X=\sum_{i=0}^{N-1}\sigma_{x_i}^2
=\sum_{i=0}^{N-1} E (x_i^2)=E(\sum_{i=0}^{N-1}e_{x_i})
\end{displaymath}

Since an orthogonal transform $A$ does not change the length of a vector $X$, i.e., $\parallel Y \parallel = \parallel A^{T}X \parallel = \parallel X \parallel$, where

\begin{displaymath}\parallel X \parallel \stackrel{\triangle}{=}\sqrt{ \sum_{i=0}^{N-1} x_i^2 }=
\sqrt{ \sum_{i=0}^{N-1} e_{x_i} }
\end{displaymath}

the total energy contained in the signal vector $X$ is conserved after the orthogonal transform. (This conclusion can also be obtained from the fact that an orthogonal similarity transform does not change the trace of a matrix: $tr\,\Sigma_Y=tr(A^{T}\Sigma_X A)=tr\,\Sigma_X$.)
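
Both statements are easy to verify numerically. Below is a minimal sketch, assuming numpy is available (the orthogonal matrix and covariance used here are arbitrary examples, not quantities from the text):

import numpy as np

rng = np.random.default_rng(1)
N = 4

# An arbitrary orthogonal transform A and an arbitrary signal vector X.
A, _ = np.linalg.qr(rng.standard_normal((N, N)))
X = rng.standard_normal(N)
Y = A.T @ X

# The vector length, and hence the total energy, is unchanged by the transform.
assert np.isclose(np.linalg.norm(Y), np.linalg.norm(X))

# Equivalently, the trace of the covariance matrix is invariant:
# tr(Sigma_Y) = tr(A^T Sigma_X A) = tr(Sigma_X).
B = rng.standard_normal((N, N))
Sigma_X = B @ B.T                       # an example covariance (symmetric PSD)
Sigma_Y = A.T @ Sigma_X @ A
assert np.isclose(np.trace(Sigma_Y), np.trace(Sigma_X))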

We next define

\begin{displaymath}S_m(A)\stackrel{\triangle}{=}\sum_{i=0}^{m-1} E(y_i^2)=
\sum_{i=0}^{m-1}\sigma_{y_i}^2=
\sum_{i=0}^{m-1}E(e_{y_i})
\end{displaymath}

where $m \leq N$. $S_m(A)$ is a function of the transform matrix $A$ and represents the amount of energy contained in the first $m$ components of $Y=A^{T}X$. Since the total energy is conserved, $S_m(A)$ also determines the fraction of the total energy contained in the first $m$ components. In the following we will show that $S_m(A)$ is maximized when the transform $A$ is the KLT:

\begin{displaymath}S_m(\Phi) \geq S_m(A) \end{displaymath}

i.e., KLT optimally compacts energy into a few components of the signal.
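
Before the proof, the claim can be checked numerically. The sketch below assumes numpy; the AR(1)-style covariance used for $R_X$ and the random competing orthogonal basis are illustrative assumptions, not part of the derivation.

import numpy as np

rng = np.random.default_rng(2)
N, m = 8, 2

# A correlated signal model: R_X with entries rho^|i-j| (AR(1)-like), zero mean.
rho = 0.95
R_X = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))

def S_m(A, R, m):
    """Energy captured by the first m components of Y = A^T X."""
    return sum(A[:, i] @ R @ A[:, i] for i in range(m))

# KLT basis: eigenvectors of R_X, ordered by decreasing eigenvalue.
lam, Phi = np.linalg.eigh(R_X)
order = np.argsort(lam)[::-1]
Phi = Phi[:, order]

# An arbitrary competing orthogonal basis.
A, _ = np.linalg.qr(rng.standard_normal((N, N)))

print(S_m(Phi, R_X, m), ">=", S_m(A, R_X, m))   # KLT compacts more energy
assert S_m(Phi, R_X, m) >= S_m(A, R_X, m)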

Consider

\begin{eqnarray*}
S_m(A) & \stackrel{\triangle}{=} & \sum_{i=0}^{m-1} E(y_i^2)=
\sum_{i=0}^{m-1} E[A_i^{T}X (A_i^{T}X)^{T}] \\
& = & \sum_{i=0}^{m-1} E[A_i^{T}X X^{T}A_i]
=\sum_{i=0}^{m-1} A_i^{T}E(X X^{T})A_i \\
& = & \sum_{i=0}^{m-1} A_i^{T} R_X A_i
\end{eqnarray*}

Now we need to find a transform matrix $A$ so that

\begin{displaymath}\left\{ \begin{array}{l} S_m(A) \rightarrow \max \\
\mbox{subject to $A_j^{T}A_j=1 \;\;\;\;(j=0, \cdots, m-1)$}
\end{array} \right.
\end{displaymath}

The constraint $A_j^{T}A_j=1$ guarantees that the column vectors of $A$ are normalized. This constrained optimization problem can be solved by the method of Lagrange multipliers, as shown below.
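
For reference, the Lagrangian combining the objective and the constraints above is

\begin{displaymath}
L(A_0,\cdots,A_{m-1};\lambda_0,\cdots,\lambda_{m-1})
=S_m(A)-\sum_{j=0}^{m-1}\lambda_j(A_j^{T}A_j-1)
=\sum_{j=0}^{m-1}(A_j^{T}R_XA_j-\lambda_j A_j^{T}A_j+\lambda_j)
\end{displaymath}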

We set the derivative of this Lagrangian with respect to each $A_i$ to zero:

\begin{eqnarray*}
\frac{\partial}{\partial A_i}\left[S_m(A)-\sum_{j=0}^{m-1}
\lambda_j(A_j^{T}A_j-1) \right]
& = & \frac{\partial}{\partial A_i}\left[\sum_{j=0}^{m-1}
(A_j^{T}R_XA_j-\lambda_j A_j^{T}A_j+\lambda_j) \right] \\
& = & \frac{\partial}{\partial A_i}
\left[ A_i^{T}R_XA_i-\lambda_i A_i^{T}A_i \right] \\
& \stackrel{*}{=} & 2R_XA_i-2\lambda_iA_i \;=\;0
\end{eqnarray*}

(* The last step uses the vector-derivative identities $\frac{\partial}{\partial A_i}(A_i^{T}R_XA_i)=2R_XA_i$ and $\frac{\partial}{\partial A_i}(A_i^{T}A_i)=2A_i$, which hold because $R_X$ is symmetric; see the linear algebra review handout.) We see that the column vectors of $A$ must be the eigenvectors of $R_X$:

\begin{displaymath}R_XA_i=\lambda_iA_i \;\;\;\;\;\;(i=0, \cdots, m-1) \end{displaymath}

i.e., the transform matrix must be

\begin{displaymath}A=[A_0, \cdots,A_{N-1}]=\Phi=[\phi_0, \cdots, \phi_{N-1}] \end{displaymath}

Thus we have proved that the optimal transform is indeed the KLT, and

\begin{displaymath}
S_m(\Phi)=\sum_{i=0}^{m-1} \phi_i^{T}R_X\phi_i = \sum_{i=0}^{m-1} \lambda_i
\end{displaymath}

where the ith eigenvalue $\lambda_i$ of $R_X$ is also the average (expected) energy contained in the ith component of the transformed signal. If we choose the $\phi_i$'s that correspond to the $m$ largest eigenvalues of $R_X$, i.e., order the eigenvalues so that $\lambda_0 \geq \lambda_1 \geq \cdots \geq \lambda_{m-1} \geq \cdots \geq \lambda_{N-1}$, then $S_m(\Phi)$ achieves its maximum.
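
This last identity is also easy to confirm numerically; the sketch below assumes numpy and uses an arbitrary example covariance for $R_X$.

import numpy as np

rng = np.random.default_rng(3)
N, m = 8, 3

# Example covariance matrix R_X (any symmetric positive semi-definite matrix works).
B = rng.standard_normal((N, N))
R_X = B @ B.T

# KLT basis: eigenvectors of R_X sorted by decreasing eigenvalue.
lam, Phi = np.linalg.eigh(R_X)
order = np.argsort(lam)[::-1]
lam, Phi = lam[order], Phi[:, order]

# S_m(Phi) = sum_{i<m} phi_i^T R_X phi_i equals the sum of the m largest eigenvalues.
S_m_Phi = sum(Phi[:, i] @ R_X @ Phi[:, i] for i in range(m))
assert np.isclose(S_m_Phi, lam[:m].sum())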


Ruye Wang 2004-09-29