Optimality of KLT

The KLT is the optimal transform among all orthogonal transforms ${\bf y}={\bf A}^T{\bf x}$ based on an orthogonal matrix ${\bf A}$ satisfying ${\bf A}^T{\bf A}={\bf I}$, in the following two senses:

KLT Completely Decorrelates the Signal

The mean vector and covariance matrix of the random vector ${\bf y}={\bf V}^T{\bf x}$ after the KLT transform can be found as

$\displaystyle {\bf m}_y = E[{\bf y}]=E[{\bf V}^T {\bf x}]
={\bf V}^T E[{\bf x}] ={\bf V}^T {\bf m}_x$ (60)

$\displaystyle {\bf\Sigma}_y = E[({\bf y}-{\bf m}_y)({\bf y}-{\bf m}_y)^T]
=E\left[{\bf V}^T({\bf x}-{\bf m}_x)\,({\bf x}-{\bf m}_x)^T{\bf V}\right]
={\bf V}^T E\left[({\bf x}-{\bf m}_x)({\bf x}-{\bf m}_x)^T\right]{\bf V}
= {\bf V}^T{\bf\Sigma}_x{\bf V}={\bf\Lambda}
=\left[ \begin{array}{ccc}
\lambda_1 & \cdots & 0 \\
\vdots & \ddots & \vdots \\
0 & \cdots & \lambda_d \end{array} \right]
=\left[ \begin{array}{ccc}
\sigma^2_{y_1} & \cdots & 0 \\
\vdots & \ddots & \vdots \\
0 & \cdots & \sigma^2_{y_d} \end{array} \right]$ (61)

Based on the fact that the covariance matrix ${\bf\Sigma}_y$ is a diagonal matrix, we have the following two observations: first, all off-diagonal elements of ${\bf\Sigma}_y$ are zero, i.e., the components of ${\bf y}={\bf V}^T{\bf x}$ are completely decorrelated by the KLT; second, the variance of each component $y_i$ is the corresponding eigenvalue of ${\bf\Sigma}_x$, i.e., $\sigma^2_{y_i}=\lambda_i\;(i=1,\cdots,d)$.
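These two observations can be checked numerically. The NumPy sketch below is a minimal illustration, not taken from the text: it draws samples from an assumed linear data model, estimates ${\bf\Sigma}_x$, and verifies that the sample covariance of ${\bf y}={\bf V}^T{\bf x}$ is diagonal with the eigenvalues $\lambda_i$ on its diagonal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative correlated data: d = 4 dimensions, n = 10000 samples (columns).
d, n = 4, 10_000
M = rng.normal(size=(d, d))
x = M @ rng.normal(size=(d, n))           # covariance of x is close to M M^T

Sigma_x = np.cov(x)                       # sample covariance matrix Sigma_x

# KLT basis: eigenvectors of Sigma_x (eigh returns them in ascending order).
lam, V = np.linalg.eigh(Sigma_x)

# Transform and re-estimate the covariance of y = V^T x.
y = V.T @ x
Sigma_y = np.cov(y)

# Off-diagonal entries of Sigma_y vanish (up to rounding error), and its
# diagonal reproduces the eigenvalues lambda_i = sigma^2_{y_i}.
print(np.round(Sigma_y, 3))
print(np.allclose(np.diag(Sigma_y), lam))
```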

KLT Optimally Compacts Signal Energy

While the total energy contained in the signal is conserved by any orthogonal transform, the distribution of this energy among the $d$ components before and after the transform may be very different. We can show that the KLT is optimal in the sense that it redistributes the total energy in such a way that it is maximally compacted into a subset of the components of ${\bf y}={\bf V}^T{\bf x}$.

Consider a generic orthogonal transform ${\bf y}={\bf A}^T{\bf x}$ based on an arbitrary orthogonal matrix ${\bf A}=[{\bf a}_1,\cdots,{\bf a}_d]$ satisfying ${\bf A}^T={\bf A}^{-1}$:

$\displaystyle {\bf y}=\left[ \begin{array}{c} y_1 \\ \vdots \\ y_d \end{array} \right]
={\bf A}^T{\bf x}
=\left[ \begin{array}{c} {\bf a}_1^T \\ \vdots \\ {\bf a}_d^T \end{array} \right]{\bf x}$ (63)

We also have

$\displaystyle y_i={\bf a}_i^T\;{\bf x},\;\;\;\;\;\;\; \mu_{y_i}=E(y_i)={\bf a}_i^T\;E({\bf x})
={\bf a}_i^T{\bf m}_x, \;\;\;\;\;\;\;\; (i=1,\cdots,d)$ (64)
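As a quick sanity check of (64), the short sketch below confirms that the component means of ${\bf y}={\bf A}^T{\bf x}$ equal ${\bf a}_i^T{\bf m}_x$; the random data model and the orthogonal ${\bf A}$ built via a QR factorization are illustrative assumptions, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 5, 20_000

# Illustrative data with a nonzero mean, and a random orthogonal A (via QR).
x = rng.normal(size=(d, n)) + rng.normal(size=(d, 1))
A, _ = np.linalg.qr(rng.normal(size=(d, d)))   # columns a_i, with A^T A = I

y = A.T @ x                                    # row i of y holds y_i = a_i^T x
m_x = x.mean(axis=1)                           # sample mean vector m_x
mu_y = y.mean(axis=1)                          # sample means mu_{y_i}

# The component means of y agree with a_i^T m_x, as stated in (64).
print(np.allclose(mu_y, A.T @ m_x))
```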

We then consider the energy contained in $d'<d$ out of the $d$ components of ${\bf y}={\bf A}^T{\bf x}$ expressed as a function of the transform matrix ${\bf A}=[{\bf a}_1,\cdots,{\bf a}_d]$:

$\displaystyle {\cal E}'({\bf A}) = \sum_{i=1}^{d'}\sigma_{y_i}^2
=\sum_{i=1}^{d'} E\left[(y_i-\mu_{y_i})^2\right]
=\sum_{i=1}^{d'} E\left[{\bf a}_i^T({\bf x}-{\bf m}_x)\;
({\bf x}-{\bf m}_x)^T{\bf a}_i \right]
=\sum_{i=1}^{d'} {\bf a}_i^{T}
E\left[({\bf x}-{\bf m}_x)({\bf x}-{\bf m}_x)^T\right] {\bf a}_i
=\sum_{i=1}^{d'} {\bf a}_i^{T} {\bf\Sigma}_x {\bf a}_i$ (65)
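The two ways of writing the partial energy in (65), as a sum of component variances and as a sum of quadratic forms ${\bf a}_i^T{\bf\Sigma}_x{\bf a}_i$, can be compared numerically. The sketch below uses an arbitrary random orthogonal ${\bf A}$ and illustrative dimensions $d=6$, $d'=3$; none of these choices come from the text.

```python
import numpy as np

rng = np.random.default_rng(2)
d, d_prime, n = 6, 3, 50_000

# Illustrative data with covariance Sigma_x, and an arbitrary orthogonal A.
M = rng.normal(size=(d, d))
x = M @ rng.normal(size=(d, n))
Sigma_x = np.cov(x)
A, _ = np.linalg.qr(rng.normal(size=(d, d)))

# Partial energy of the first d' components, computed two equivalent ways:
# (i)  directly as the sum of the variances of y_1, ..., y_{d'};
# (ii) via the quadratic forms a_i^T Sigma_x a_i of equation (65).
y = A.T @ x
energy_from_variances = np.var(y[:d_prime], axis=1, ddof=1).sum()
energy_from_quadratic_forms = sum(A[:, i] @ Sigma_x @ A[:, i]
                                  for i in range(d_prime))

print(np.isclose(energy_from_variances, energy_from_quadratic_forms))
```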

We can find the optimal orthogonal transform matrix ${\bf A}$ that maximizes this partial energy ${\cal E}'$ by solving the following optimization problem with the constraint that the column vectors of ${\bf A}$ are normalized with ${\bf a}_j^{T}{\bf a}_j=1$:

$\displaystyle \left\{ \begin{array}{l} \mbox{maximize: } {\cal E}'({\bf A}) \\
\mbox{subject to: }\;\; {\bf a}_j^{T}{\bf a}_j=1 \;\;\;\;(j=1, \cdots, d')
\end{array} \right.$ (66)

The Lagrange function of this constrained optimization problem is

$\displaystyle {\cal L}({\bf A})={\cal E}'({\bf A})
-\sum_{j=1}^{d'}\lambda_j({\bf a}_j^T{\bf a}_j-1)$ (67)

To find the optimal ${\bf A}$, we set the partial derivative with respect to each ${\bf a}_i\;(i=1,\cdots,d')$ to zero:
$\displaystyle \frac{\partial}{\partial {\bf a}_i}{\cal L}({\bf A})
= \frac{\partial}{\partial {\bf a}_i}\left[\sum_{j=1}^{d'}
({\bf a}_j^T{\bf\Sigma}_x{\bf a}_j-\lambda_j {\bf a}_j^T{\bf a}_j+\lambda_j) \right]
= \frac{\partial}{\partial {\bf a}_i}\left({\bf a}_i^T{\bf\Sigma}_x{\bf a}_i
-\lambda_i{\bf a}_i^T{\bf a}_i \right)
= 2{\bf\Sigma}_x{\bf a}_i-2\lambda_i{\bf a}_i ={\bf0}$ (68)
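The closed-form gradient $2{\bf\Sigma}_x{\bf a}_i-2\lambda_i{\bf a}_i$ appearing in (68) can be sanity-checked against a finite-difference approximation; in the sketch below the covariance matrix, the test vector, and the multiplier value are arbitrary placeholders chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 5

# An illustrative symmetric positive semi-definite "covariance" and a test point a_i.
M = rng.normal(size=(d, d))
Sigma_x = M @ M.T
a = rng.normal(size=d)
lam = 0.7                                  # arbitrary Lagrange multiplier value

def term(a):
    """One summand of the Lagrangian: a^T Sigma_x a - lam * (a^T a - 1)."""
    return a @ Sigma_x @ a - lam * (a @ a - 1.0)

# Central finite differences vs. the closed form 2 Sigma_x a - 2 lam a of (68).
eps = 1e-6
fd_grad = np.array([(term(a + eps * e) - term(a - eps * e)) / (2 * eps)
                    for e in np.eye(d)])
closed_form = 2 * Sigma_x @ a - 2 * lam * a

print(np.allclose(fd_grad, closed_form, atol=1e-4))
```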

The resulting equation ${\bf\Sigma}_x{\bf a}_i=\lambda_i{\bf a}_i$ happens to be the eigenequation of the covariance matrix ${\bf\Sigma}_x$, i.e., the column vectors of the optimal transform matrix ${\bf A}$ must be the eigenvectors of ${\bf\Sigma}_x$ satisfying:

$\displaystyle {\bf\Sigma}_x{\bf a}_i=\lambda_i{\bf a}_i,\;\;\;\;\;$i.e.$\displaystyle \;\;\;\;\;
{\bf a}_i^T{\bf\Sigma}_x{\bf a}_i=\lambda_i,\;\;\;\;\;\;(i=1,\cdots,d')$ (69)

We therefore see that the optimal transform is indeed the KLT transform

$\displaystyle {\bf A}=[{\bf a}_1, \cdots,{\bf a}_d]={\bf V}=[{\bf v}_1, \cdots, {\bf v}_d]$ (70)

and

$\displaystyle {\cal E}'({\bf V})=\sum_{i=1}^{d'} {\bf v}_i^{T}{\bf\Sigma}_x{\bf v}_i
= \sum_{i=1}^{d'} \lambda_i \leq \sum_{i=1}^d \lambda_i={\cal E}_y={\cal E}_x$ (71)

This partial energy ${\cal E}'({\bf V})$ is maximized if we choose the eigenvectors ${\bf v}_i$ corresponding to the $d'$ greatest eigenvalues of ${\bf\Sigma}_x$, i.e., $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_{d'} \geq \cdots \geq \lambda_d$.
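The energy-compaction result can be illustrated numerically: for the same ${\bf\Sigma}_x$, the energy kept in the first $d'$ components under the KLT basis is never smaller than under a randomly drawn orthogonal basis. The sketch below is an assumed test setup (the dimensions, data model, and number of random comparisons are illustrative only), not part of the derivation above.

```python
import numpy as np

rng = np.random.default_rng(4)
d, d_prime, n = 8, 3, 100_000

# Illustrative data with strongly unequal spread, so compaction is visible.
M = rng.normal(size=(d, d)) * np.geomspace(10.0, 0.1, d)
x = M @ rng.normal(size=(d, n))
Sigma_x = np.cov(x)

# KLT basis: eigenvectors sorted by decreasing eigenvalue.
lam, V = np.linalg.eigh(Sigma_x)
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]

def partial_energy(B):
    """Energy kept in the first d' components of y = B^T x, as in (65)."""
    return sum(B[:, i] @ Sigma_x @ B[:, i] for i in range(d_prime))

# The KLT keeps at least as much energy in d' components as any other
# orthogonal transform; compare against a few random orthogonal bases.
klt_energy = partial_energy(V)              # equals lam[:d_prime].sum()
for _ in range(3):
    A, _ = np.linalg.qr(rng.normal(size=(d, d)))
    print(partial_energy(A) <= klt_energy + 1e-9)
```

In this sketch ${\cal E}'({\bf V})$ equals the sum of the $d'$ largest eigenvalues, consistent with (71), while every random orthogonal basis retains no more than that amount in its first $d'$ components.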