
KLT Optimally Compacts Signal Energy

Now we show that the KLT redistributes the energy contained in the signal so that it is maximally compacted into a small number of components after the transform. Let ${\bf A}$ be an arbitrary orthogonal matrix satisfying ${\bf A}^{-1}={\bf A}^{T}$, and represent ${\bf A}$ in terms of its column vectors ${\bf a}_i\;(i=1,\cdots,N)$ as

\begin{displaymath}
{\bf A}=[{\bf a}_1, \cdots, {\bf a}_{N}],\;\;\;\;\;\;\mbox{or}\;\;\;\;\;\;
{\bf A}^{T}=\left[ \begin{array}{c} {\bf a}_1^{T} \\ \vdots \\ {\bf a}_{N}^{T} \end{array} \right]
\end{displaymath}

Based on this matrix ${\bf A}$, an orthogonal transform of a given signal vector ${\bf x}$ can be defined as

\begin{displaymath}
{\bf y}=\left[ \begin{array}{c} y_1 \\ \vdots \\ y_{N} \end{array} \right]
={\bf A}^{T}{\bf x},\;\;\;\;\;\mbox{i.e.,}\;\;\;\;\;
y_i={\bf a}_i^{T}\;{\bf x},\;\;\;\;\;(i=1,\cdots,N)
\end{displaymath}

The inverse transform is:

\begin{displaymath}
{\bf x}={\bf A}{\bf y}=[{\bf a}_1,\cdots,{\bf a}_{N}]
\left[ \begin{array}{c} y_1 \\ \vdots \\ y_{N} \end{array} \right]
=\sum_{i=1}^{N} y_i {\bf a}_i
\end{displaymath}
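As a quick sanity check of this transform pair, here is a minimal sketch (assuming NumPy; the variable names are ours, not from the text) that builds a random orthogonal ${\bf A}$ via QR decomposition and verifies perfect reconstruction:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
N = 4

# Random orthogonal matrix: QR of a random matrix gives A with A^{-1} = A^T.
A, _ = np.linalg.qr(rng.standard_normal((N, N)))

x = rng.standard_normal(N)     # signal vector
y = A.T @ x                    # forward transform: y_i = a_i^T x
x_rec = A @ y                  # inverse transform: x = sum_i y_i a_i

print(np.allclose(x, x_rec))   # True: x is recovered exactly
\end{verbatim}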

Now we consider the variances of the signal components before and after the KLT transform:

\begin{displaymath}
\sigma_{x_i}^2=E[(x_i-\mu_{x_i})^2] \;\stackrel{\triangle}{=}\;E(e_{x_i}),
\;\;\;\;\;\;
\sigma_{y_i}^2=E[(y_i-\mu_{y_i})^2] \;\stackrel{\triangle}{=}\;E(e_{y_i})
\end{displaymath}

where $e_{x_i}\stackrel{\triangle}{=}(x_i-\mu_{x_i})^2$ (and similarly $e_{y_i}$) can be considered as the dynamic energy or information contained in the ith component of the signal, and the trace of the covariance matrix, $tr\,{\bf\Sigma}_x$, represents the total energy ${\cal E}_x$ or information contained in the signal ${\bf x}$:

\begin{displaymath}
{\cal E}_x= tr {\bf\Sigma}_x
=\sum_{i=1}^{N} \sigma_{x_i}^2=\sum_{i=1}^{N} E[({x_i}-\mu_{x_i})^2]
=\sum_{i=1}^{N}E(e_{x_i})
\end{displaymath}

Due to the commutativity of trace: $tr({\bf AB})=tr({\bf BA})$, we have:

\begin{displaymath}
{\cal E}_y=tr\,{\bf\Sigma}_y=tr({\bf\Phi}^T{\bf\Sigma}_x{\bf\Phi})
=tr({\bf\Phi}{\bf\Phi}^T{\bf\Sigma}_x)=tr\,{\bf\Sigma}_x={\cal E}_x
\end{displaymath}

reflecting the fact that the total energy or information of the signal is conserved under the KLT, or indeed any orthogonal (unitary) transform (Parseval's theorem). However, the distribution of this energy among the $N$ components can be very different before and after the transform, as shown below.
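This conservation can also be checked numerically; the following sketch (NumPy assumed, sample data of our own choosing) confirms that $tr\,{\bf\Sigma}_y=tr\,{\bf\Sigma}_x$ for an arbitrary orthogonal ${\bf A}$:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
N, K = 4, 10000

# K sample vectors with unequal component variances.
X = rng.standard_normal((N, K)) * np.arange(1, N + 1)[:, None]
Sigma_x = np.cov(X)                               # N x N covariance of x

A, _ = np.linalg.qr(rng.standard_normal((N, N)))  # arbitrary orthogonal A
Sigma_y = A.T @ Sigma_x @ A                       # covariance of y = A^T x

print(np.isclose(np.trace(Sigma_x), np.trace(Sigma_y)))  # True
\end{verbatim}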

We define the energy contained in the first $M<N$ components after the transform ${\bf y}={\bf A}^T{\bf x}$ as

\begin{displaymath}
S_M({\bf A})\stackrel{\triangle}{=}\sum_{i=1}^{M} E\left[(y_i-\mu_{y_i})^2\right]
=\sum_{i=1}^{M}\sigma_{y_i}^2=\sum_{i=1}^{M}E(e_{y_i})
\end{displaymath}

$S_M({\bf A})$ is a function of the transform matrix ${\bf A}$. Since the total energy is conserved, $S_M({\bf A})$ is also proportional to the percentage of energy contained in the first $M$ components. In the following we show that $S_M({\bf A})$ is maximized when ${\bf A}$ is the KLT transform matrix ${\bf\Phi}$, i.e., for any orthogonal ${\bf A}$:

\begin{displaymath}S_M({\bf\Phi}) \geq S_M({\bf A}) \end{displaymath}
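As a quick numerical illustration of this inequality, the sketch below (NumPy assumed; the test covariance is our own construction) compares $S_M$ under the KLT basis with $S_M$ under a random orthogonal basis:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
N, M = 6, 2

B = rng.standard_normal((N, N))
Sigma_x = B @ B.T                      # a generic covariance matrix

# KLT basis: eigenvectors of Sigma_x sorted by descending eigenvalue.
lam, Phi = np.linalg.eigh(Sigma_x)
Phi = Phi[:, np.argsort(lam)[::-1]]

A, _ = np.linalg.qr(rng.standard_normal((N, N)))   # arbitrary orthogonal A

# S_M(A) = sum_{i<=M} a_i^T Sigma_x a_i = trace of the leading M x M block.
S_klt = np.trace(Phi[:, :M].T @ Sigma_x @ Phi[:, :M])
S_arb = np.trace(A[:, :M].T @ Sigma_x @ A[:, :M])

print(S_klt >= S_arb)                  # True: S_M(Phi) >= S_M(A)
\end{verbatim}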

In other words, the KLT optimally compacts energy into a few components of the signal. Consider
\begin{eqnarray*}
S_M({\bf A}) &\stackrel{\triangle}{=}& \sum_{i=1}^{M} E\left[(y_i-\mu_{y_i})^2\right]
=\sum_{i=1}^{M} E\left[ {\bf a}_i^{T}({\bf x}-{\bf m}_x)\,({\bf x}-{\bf m}_x)^T{\bf a}_i \right] \\
&=& \sum_{i=1}^{M} {\bf a}_i^{T}\,E\left[({\bf x}-{\bf m}_x)({\bf x}-{\bf m}_x)^T\right]{\bf a}_i
=\sum_{i=1}^{M} {\bf a}_i^{T} {\bf\Sigma}_x {\bf a}_i
\end{eqnarray*}

Now we need to find a transform matrix ${\bf A}=[{\bf a}_1,\cdots,{\bf a}_N]$ such that

\begin{displaymath}
\left\{ \begin{array}{l}
S_M({\bf A}) \rightarrow \max \\
\mbox{subject to:}\;\;\;{\bf a}_j^{T}{\bf a}_j=1 \;\;\;\;(j=1, \cdots, M)
\end{array} \right.
\end{displaymath}

The constraint ${\bf a}_j^{T}{\bf a}_j=1$ guarantees that the column vectors of ${\bf A}$ are normalized. This constrained optimization problem can be solved by the method of Lagrange multipliers, setting the following partial derivatives to zero:
\begin{eqnarray*}
&& \frac{\partial}{\partial {\bf a}_i}\left[S_M({\bf A})-\sum_{j=1}^{M}\lambda_j\left({\bf a}_j^{T}{\bf a}_j-1\right)\right]
=\frac{\partial}{\partial {\bf a}_i}\left[\sum_{j=1}^{M}\left({\bf a}_j^{T}{\bf\Sigma}_x{\bf a}_j-\lambda_j {\bf a}_j^{T}{\bf a}_j+\lambda_j\right)\right] \\
&=& \frac{\partial}{\partial {\bf a}_i}\left({\bf a}_i^{T}{\bf\Sigma}_x{\bf a}_i-\lambda_i{\bf a}_i^{T}{\bf a}_i\right)
\stackrel{*}{=} 2{\bf\Sigma}_x{\bf a}_i-2\lambda_i{\bf a}_i =0, \;\;\;\;\;\;(i=1, \cdots, M)
\end{eqnarray*}
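Specifically, the equality marked $*$ uses the following two standard vector-derivative identities, the first of which relies on the symmetry of ${\bf\Sigma}_x$:

\begin{displaymath}
\frac{\partial}{\partial {\bf a}}\left({\bf a}^T{\bf\Sigma}_x{\bf a}\right)
=({\bf\Sigma}_x+{\bf\Sigma}_x^T)\,{\bf a}=2{\bf\Sigma}_x{\bf a},
\;\;\;\;\;\;
\frac{\partial}{\partial {\bf a}}\left({\bf a}^T{\bf a}\right)=2{\bf a}
\end{displaymath}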

(* These identities are derived in the review of linear algebra.) We see that the column vectors of ${\bf A}$ must be eigenvectors of ${\bf\Sigma}_x$:

\begin{displaymath}{\bf\Sigma}_x{\bf a}_i=\lambda_i{\bf a}_i \;\;\;\;\;\;(i=1, \cdots, M) \end{displaymath}

i.e., ${\bf a}_i={\bf\phi}_i$ has to be an eigenvector of ${\bf\Sigma}_x$, and the transform matrix must be

\begin{displaymath}{\bf A}=[{\bf a}_1, \cdots,{\bf a}_{N}]={\bf\Phi}=[{\bf\phi}_1, \cdots, {\bf\phi}_{N}] \end{displaymath}

where the ${\bf\phi}_i$'s are the orthonormal eigenvectors of ${\bf\Sigma}_x$ corresponding to the eigenvalues $\lambda_i$ ($i=1,\cdots,N$):

\begin{displaymath}
{\bf\Sigma}_x {\bf\phi}_i=\lambda_i {\bf\phi}_i,\;\;\;\;\;\mbox{i.e.,}\;\;\;\;\;
{\bf\phi}_i^T{\bf\Sigma}_x {\bf\phi}_i=\lambda_i {\bf\phi}_i^T{\bf\phi}_i=\lambda_i
\end{displaymath}

Thus we have proved that the optimal transform is indeed the KLT, and

\begin{displaymath}
S_M({\bf\Phi})=\sum_{i=1}^{M} {\bf\phi}_i^{T}{\bf\Sigma}_x{\bf\phi}_i =
\sum_{i=1}^{M} \lambda_i=\sum_{i=1}^{M} \sigma_{y_i}^2
\end{displaymath}

where the ith eigenvalue $\lambda_i$ of ${\bf\Sigma}_x$ is also the average (expected) energy contained in the ith component $y_i$ of the transformed signal. If we choose those ${\bf\phi}_i$'s that correspond to the $M$ largest eigenvalues of ${\bf\Sigma}_x$, $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_M \geq \cdots \geq \lambda_{N}$, then $S_M({\bf\Phi})$ is maximized.
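A short numerical check of this fact (NumPy assumed; the correlated test data are our own) confirms that the variance of each KLT component equals the corresponding eigenvalue:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(3)
N, K = 4, 100000

B = rng.standard_normal((N, N))
X = B @ rng.standard_normal((N, K))    # correlated, zero-mean samples

Sigma_x = np.cov(X)
lam, Phi = np.linalg.eigh(Sigma_x)
order = np.argsort(lam)[::-1]          # descending eigenvalues
lam, Phi = lam[order], Phi[:, order]

Y = Phi.T @ X                          # KLT of every sample vector
print(np.allclose(np.var(Y, axis=1, ddof=1), lam))  # True: var(y_i) = lambda_i
\end{verbatim}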

Due to the KLT's properties of signal decorrelation and energy compaction, it can be used for data compression by reducing the dimensionality of the data set. Specifically, we carry out the following steps (a complete sketch of the pipeline follows the list):

  1. Find the mean vector ${\bf m}_x$ and the covariance matrix ${\bf\Sigma}_x$ of the signal vectors ${\bf x}$.
  2. Find the eigenvalues $\lambda_i$ $(i=1,\cdots,N)$ of ${\bf\Sigma}_x$, sorted in descending order, and their corresponding eigenvectors ${\bf\phi}_i$ $(i=1,\cdots,N)$.
  3. Choose a reduced dimensionality $M<N$ so that the percentage of energy retained is no less than a given threshold (e.g., 95%):

    \begin{displaymath}
\frac{\sum_{i=1}^{M} \lambda_i}{\sum_{i=1}^{N} \lambda_i}
=\frac{\sum_{i=1}^{M} \sigma^2_{y_i}}{\sum_{i=1}^{N} \sigma^2_{y_i}} \ge 0.95
\end{displaymath}

  4. Construct an $N\times M$ transform matrix composed of the eigenvectors corresponding to the $M$ largest eigenvalues of ${\bf\Sigma}_x$:

    \begin{displaymath}
{\bf\Phi}_M=[ {\bf\phi}_1,\cdots,{\bf\phi}_M ]_{N\times M}
\end{displaymath}

    and carry out KLT based on ${\bf\Phi}_M$:

    \begin{displaymath}
{\bf y}=\left[ \begin{array}{c} y_1 \\ \vdots \\ y_{M} \end{array} \right]_{M \times 1}
={\bf\Phi}_M^T\,{\bf x}
={\bf\Phi}_M^T \left[ \begin{array}{c} x_1 \\ \vdots \\ x_{N} \end{array} \right]_{N \times 1}
\end{displaymath}

    As the dimension $M$ of ${\bf y}$ is less than the dimension $N$ of ${\bf x}$, data compression is achieved for storage or transmission. This is a lossy compression, with the fraction of information lost given by $\sum_{i=M+1}^{N} \lambda_i / \sum_{i=1}^{N} \lambda_i$. But as these $\lambda_i$'s for $i=M+1,\cdots,N$ are the smallest eigenvalues, this loss is minimized (e.g., no more than 5%).
  5. Reconstruct an approximation $\hat{\bf x}$ of ${\bf x}$ by the inverse KLT:

    \begin{displaymath}
\hat{\bf x}=\left[ \begin{array}{c} \hat{x}_1 \\ \vdots \\ \hat{x}_{N} \end{array} \right]_{N \times 1}
={\bf\Phi}_M\,{\bf y}
={\bf\Phi}_M \left[ \begin{array}{c} y_1 \\ \vdots \\ y_{M} \end{array} \right]_{M \times 1}
\end{displaymath}
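Putting the five steps together, here is a minimal end-to-end sketch (NumPy assumed; the function names, the test covariance, and the zero-mean data are our own choices for illustration):

\begin{verbatim}
import numpy as np

def klt_compress(X, energy=0.95):
    """X: N-by-K array, one signal vector per column."""
    Sigma_x = np.cov(X)                          # step 1: covariance (mean handled internally)
    lam, Phi = np.linalg.eigh(Sigma_x)           # step 2: eigenvalues and eigenvectors
    order = np.argsort(lam)[::-1]                # sort in descending order
    lam, Phi = lam[order], Phi[:, order]
    frac = np.cumsum(lam) / lam.sum()            # step 3: cumulative energy fraction
    M = int(np.searchsorted(frac, energy)) + 1   # smallest M reaching the threshold
    Phi_M = Phi[:, :M]                           # step 4: N x M transform matrix
    return Phi_M.T @ X, Phi_M                    # y = Phi_M^T x for every column

def klt_reconstruct(Y, Phi_M):
    return Phi_M @ Y                             # step 5: inverse KLT

# Usage on strongly correlated zero-mean test data:
rng = np.random.default_rng(4)
N, K = 8, 5000
C = 0.95 * np.ones((N, N)) + 0.05 * np.eye(N)    # test covariance
X = np.linalg.cholesky(C) @ rng.standard_normal((N, K))

Y, Phi_M = klt_compress(X)
X_hat = klt_reconstruct(Y, Phi_M)
print(f"kept {Y.shape[0]} of {N} components, relative error "
      f"{np.linalg.norm(X - X_hat) / np.linalg.norm(X):.3f}")
\end{verbatim}

With this strongly correlated test covariance, the largest eigenvalue carries about 95% of the total energy, so the sketch typically keeps only one or two of the eight components.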


Ruye Wang 2016-04-06