Optimality of KLT

The KLT is the optimal transform among all orthogonal transforms ${\bf y}={\bf A}^T{\bf x}$ based on an orthogonal matrix ${\bf A}$ satisfying ${\bf A}^T{\bf A}={\bf I}$, in the following two senses:

KLT Completely Decorrelates the Signal

The mean vector and covariance matrix of the random vector ${\bf y}={\bf V}^T{\bf x}$ after the KLT transform can be found as

$\displaystyle {\bf m}_y = E[{\bf y}]=E[{\bf V}^T {\bf x}]
={\bf V}^T E[{\bf x}] ={\bf V}^T {\bf m}_x$ (60)

$\displaystyle {\bf\Sigma}_y = E[({\bf y}-{\bf m}_y)({\bf y}-{\bf m}_y)^T]
=E\left[{\bf V}^T({\bf x}-{\bf m}_x)\,({\bf x}-{\bf m}_x)^T{\bf V}\right]
={\bf V}^T E\left[({\bf x}-{\bf m}_x)({\bf x}-{\bf m}_x)^T\right]{\bf V}
= {\bf V}^T{\bf\Sigma}_x{\bf V}={\bf\Lambda}
=\left[ \begin{array}{ccc}
\lambda_1 & \cdots & 0 \\
\vdots & \ddots & \vdots \\
0 & \cdots & \lambda_d \end{array} \right]
=\left[ \begin{array}{ccc}
\sigma^2_{y_1} & \cdots & 0 \\
\vdots & \ddots & \vdots \\
0 & \cdots & \sigma^2_{y_d} \end{array} \right]$ (61)

Based on the fact that the covariance matrix ${\bf\Sigma}_y$ is a diagonal matrix, we have the following two observations: first, all off-diagonal elements of ${\bf\Sigma}_y$ are zero, i.e., the components of ${\bf y}={\bf V}^T{\bf x}$ are completely decorrelated by the KLT; second, the variance of each component $y_i$ is the corresponding eigenvalue of ${\bf\Sigma}_x$, i.e., $\sigma^2_{y_i}=\lambda_i\;(i=1,\cdots,d)$.
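These two observations can be checked numerically. The NumPy sketch below is a minimal illustration, not taken from the text: it draws samples from an assumed linear data model, estimates ${\bf\Sigma}_x$, and verifies that the sample covariance of ${\bf y}={\bf V}^T{\bf x}$ is diagonal with the eigenvalues $\lambda_i$ on its diagonal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative correlated data: d = 4 dimensions, n = 10000 samples (columns).
d, n = 4, 10_000
M = rng.normal(size=(d, d))
x = M @ rng.normal(size=(d, n))           # covariance of x is close to M M^T

Sigma_x = np.cov(x)                       # sample covariance matrix Sigma_x

# KLT basis: eigenvectors of Sigma_x (eigh returns them in ascending order).
lam, V = np.linalg.eigh(Sigma_x)

# Transform and re-estimate the covariance of y = V^T x.
y = V.T @ x
Sigma_y = np.cov(y)

# Off-diagonal entries of Sigma_y vanish (up to rounding error), and its
# diagonal reproduces the eigenvalues lambda_i = sigma^2_{y_i}.
print(np.round(Sigma_y, 3))
print(np.allclose(np.diag(Sigma_y), lam))
```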

KLT Optimally Compacts Signal Energy

While the total energy contained in the signal is conserved by any orthogonal transform, the distribution of this energy among the $d$ components before and after the transform may be very different. We can show that the KLT is optimal in the sense that it redistributes the total energy in such a way that it is maximally compacted into a subset of the components of ${\bf y}={\bf V}^T{\bf x}$.

Consider a generic orthogonal transform ${\bf y}={\bf A}^T{\bf x}$ based on an arbitrary orthogonal matrix ${\bf A}=[{\bf a}_1,\cdots,{\bf a}_d]$ satisfying ${\bf A}^T={\bf A}^{-1}$:

$\displaystyle {\bf y}=\left[ \begin{array}{c} y_1 \\ \vdots \\ y_d \end{array} \right]
={\bf A}^T{\bf x}
=\left[ \begin{array}{c} {\bf a}_1^T \\ \vdots \\ {\bf a}_d^T \end{array} \right]{\bf x}$ (63)

We also have

$\displaystyle y_i={\bf a}_i^T\;{\bf x},\;\;\;\;\;\;\; \mu_{y_i}=E(y_i)={\bf a}_i^T\;E({\bf x})
={\bf a}_i^T{\bf m}_x, \;\;\;\;\;\;\;\; (i=1,\cdots,d)$ (64)
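As a quick sanity check of (64), the short sketch below confirms that the component means of ${\bf y}={\bf A}^T{\bf x}$ equal ${\bf a}_i^T{\bf m}_x$; the random data model and the orthogonal ${\bf A}$ built via a QR factorization are illustrative assumptions, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 5, 20_000

# Illustrative data with a nonzero mean, and a random orthogonal A (via QR).
x = rng.normal(size=(d, n)) + rng.normal(size=(d, 1))
A, _ = np.linalg.qr(rng.normal(size=(d, d)))   # columns a_i, with A^T A = I

y = A.T @ x                                    # row i of y holds y_i = a_i^T x
m_x = x.mean(axis=1)                           # sample mean vector m_x
mu_y = y.mean(axis=1)                          # sample means mu_{y_i}

# The component means of y agree with a_i^T m_x, as stated in (64).
print(np.allclose(mu_y, A.T @ m_x))
```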

We then consider the energy contained in $d'<d$ out of the $d$ components of ${\bf y}={\bf A}^T{\bf x}$ expressed as a function of the transform matrix ${\bf A}=[{\bf a}_1,\cdots,{\bf a}_d]$:

$\displaystyle {\cal E}'({\bf A}) = \sum_{i=1}^{d'}\sigma_{y_i}^2
=\sum_{i=1}^{d'} E\left[(y_i-\mu_{y_i})^2\right]
=\sum_{i=1}^{d'} E\left[{\bf a}_i^T({\bf x}-{\bf m}_x)\;
({\bf x}-{\bf m}_x)^T{\bf a}_i \right]
=\sum_{i=1}^{d'} {\bf a}_i^{T}
E\left[({\bf x}-{\bf m}_x)({\bf x}-{\bf m}_x)^T\right] {\bf a}_i
=\sum_{i=1}^{d'} {\bf a}_i^{T} {\bf\Sigma}_x {\bf a}_i$ (65)
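The two ways of writing the partial energy in (65), as a sum of component variances and as a sum of quadratic forms ${\bf a}_i^T{\bf\Sigma}_x{\bf a}_i$, can be compared numerically. The sketch below uses an arbitrary random orthogonal ${\bf A}$ and illustrative dimensions $d=6$, $d'=3$; none of these choices come from the text.

```python
import numpy as np

rng = np.random.default_rng(2)
d, d_prime, n = 6, 3, 50_000

# Illustrative data with covariance Sigma_x, and an arbitrary orthogonal A.
M = rng.normal(size=(d, d))
x = M @ rng.normal(size=(d, n))
Sigma_x = np.cov(x)
A, _ = np.linalg.qr(rng.normal(size=(d, d)))

# Partial energy of the first d' components, computed two equivalent ways:
# (i)  directly as the sum of the variances of y_1, ..., y_{d'};
# (ii) via the quadratic forms a_i^T Sigma_x a_i of equation (65).
y = A.T @ x
energy_from_variances = np.var(y[:d_prime], axis=1, ddof=1).sum()
energy_from_quadratic_forms = sum(A[:, i] @ Sigma_x @ A[:, i]
                                  for i in range(d_prime))

print(np.isclose(energy_from_variances, energy_from_quadratic_forms))
```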

We can find the optimal orthogonal transform matrix ${\bf A}$ that maximizes this partial energy ${\cal E}'$ by solving the following optimization problem with the constraint that the column vectors of ${\bf A}$ are normalized with ${\bf a}_j^{T}{\bf a}_j=1$:

$\displaystyle \left\{ \begin{array}{l} \mbox{maximize: } {\cal E}'({\bf A}) \\
\mbox{subject to: }\;\; {\bf a}_j^{T}{\bf a}_j=1 \;\;\;\;(j=1, \cdots, d')
\end{array} \right.$ (66)

The Lagrange function of this constrained optimization problem is

$\displaystyle {\cal L}({\bf A})={\cal E}'({\bf A})
-\sum_{j=1}^{d'}\lambda_j({\bf a}_j^T{\bf a}_j-1)$ (67)

To find the optimal ${\bf A}$, we set the partial derivative with respect to each ${\bf a}_i\;(i=1,\cdots,d')$ to zero:
$\displaystyle \frac{\partial}{\partial {\bf a}_i}{\cal L}({\bf A})
= \frac{\partial}{\partial {\bf a}_i}\left[\sum_{j=1}^{d'}
({\bf a}_j^T{\bf\Sigma}_x{\bf a}_j-\lambda_j {\bf a}_j^T{\bf a}_j+\lambda_j) \right]
= \frac{\partial}{\partial {\bf a}_i}\left({\bf a}_i^T{\bf\Sigma}_x{\bf a}_i
-\lambda_i{\bf a}_i^T{\bf a}_i \right)
= 2{\bf\Sigma}_x{\bf a}_i-2\lambda_i{\bf a}_i ={\bf0}$ (68)
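The closed-form gradient $2{\bf\Sigma}_x{\bf a}_i-2\lambda_i{\bf a}_i$ appearing in (68) can be sanity-checked against a finite-difference approximation; in the sketch below the covariance matrix, the test vector, and the multiplier value are arbitrary placeholders chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 5

# An illustrative symmetric positive semi-definite "covariance" and a test point a_i.
M = rng.normal(size=(d, d))
Sigma_x = M @ M.T
a = rng.normal(size=d)
lam = 0.7                                  # arbitrary Lagrange multiplier value

def term(a):
    """One summand of the Lagrangian: a^T Sigma_x a - lam * (a^T a - 1)."""
    return a @ Sigma_x @ a - lam * (a @ a - 1.0)

# Central finite differences vs. the closed form 2 Sigma_x a - 2 lam a of (68).
eps = 1e-6
fd_grad = np.array([(term(a + eps * e) - term(a - eps * e)) / (2 * eps)
                    for e in np.eye(d)])
closed_form = 2 * Sigma_x @ a - 2 * lam * a

print(np.allclose(fd_grad, closed_form, atol=1e-4))
```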

The resulting equation ${\bf\Sigma}_x{\bf a}_i=\lambda_i{\bf a}_i$ happens to be the eigenequation of the covariance matrix ${\bf\Sigma}_x$, i.e., the column vectors of the optimal transform matrix ${\bf A}$ must be the eigenvectors of ${\bf\Sigma}_x$ satisfying:

$\displaystyle {\bf\Sigma}_x{\bf a}_i=\lambda_i{\bf a}_i,\;\;\;\;\;$i.e.$\displaystyle \;\;\;\;\;
{\bf a}_i^T{\bf\Sigma}_x{\bf a}_i=\lambda_i,\;\;\;\;\;\;(i=1,\cdots,d')$ (69)

We therefore see that the optimal transform is indeed the KLT transform

$\displaystyle {\bf A}=[{\bf a}_1, \cdots,{\bf a}_d]={\bf V}=[{\bf v}_1, \cdots, {\bf v}_d]$ (70)

and

$\displaystyle {\cal E}'({\bf V})=\sum_{i=1}^{d'} {\bf v}_i^{T}{\bf\Sigma}_x{\bf v}_i
= \sum_{i=1}^{d'} \lambda_i \leq \sum_{i=1}^d \lambda_i={\cal E}_y={\cal E}_x$ (71)

This partial energy ${\cal E}'({\bf V})$ is maximized if we choose the eigenvectors ${\bf v}_i$ corresponding to the $d'$ greatest eigenvalues of ${\bf\Sigma}_x$, i.e., $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_{d'} \geq \cdots \geq \lambda_d$.
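The energy-compaction result can be illustrated numerically: for the same ${\bf\Sigma}_x$, the energy kept in the first $d'$ components under the KLT basis is never smaller than under a randomly drawn orthogonal basis. The sketch below is an assumed test setup (the dimensions, data model, and number of random comparisons are illustrative only), not part of the derivation above.

```python
import numpy as np

rng = np.random.default_rng(4)
d, d_prime, n = 8, 3, 100_000

# Illustrative data with strongly unequal spread, so compaction is visible.
M = rng.normal(size=(d, d)) * np.geomspace(10.0, 0.1, d)
x = M @ rng.normal(size=(d, n))
Sigma_x = np.cov(x)

# KLT basis: eigenvectors sorted by decreasing eigenvalue.
lam, V = np.linalg.eigh(Sigma_x)
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]

def partial_energy(B):
    """Energy kept in the first d' components of y = B^T x, as in (65)."""
    return sum(B[:, i] @ Sigma_x @ B[:, i] for i in range(d_prime))

# The KLT keeps at least as much energy in d' components as any other
# orthogonal transform; compare against a few random orthogonal bases.
klt_energy = partial_energy(V)              # equals lam[:d_prime].sum()
for _ in range(3):
    A, _ = np.linalg.qr(rng.normal(size=(d, d)))
    print(partial_energy(A) <= klt_energy + 1e-9)
```

In this sketch ${\cal E}'({\bf V})$ equals the sum of the $d'$ largest eigenvalues, consistent with (71), while every random orthogonal basis retains no more than that amount in its first $d'$ components.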