Computation of the KLT

Because the KLT decorrelates the signal components and compacts their energy, it can be used for data compression by reducing the dimensionality of the dataset. Specifically, we carry out the following steps (a numerical sketch of these steps is given after the list):

  1. Find the mean vector ${\bf m}_x$ and the covariance matrix ${\bf\Sigma}_x$ of the random vector ${\bf x}$ based on the given dataset ${\bf X}=[{\bf x}_1,\cdots,{\bf x}_N]$ containing $N$ samples of ${\bf x}$.
  2. Find the eigenvalues $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_d$ of ${\bf\Sigma}_x$, sorted in descending order, and their corresponding eigenvectors $\{{\bf v}_1,{\bf v}_2,\cdots,{\bf v}_d\}$.
  3. Determine the reduced dimensionality $d'<d$ so that the percentage of energy contained after the transform is no less than a given threshold (e.g., 95%):

    $\displaystyle \frac{\sum_{i=1}^{d'} \lambda_i}{\sum_{i=1}^d \lambda_i}
=\frac{\sum_{i=1}^{d'} \sigma^2_{y_i}}{\sum_{i=1}^d \sigma^2_{y_i}}\ge 0.95$ (77)

  4. Construct a $d\times d'$ transform matrix composed of the $d'$ eigenvectors corresponding to the $d'$ greatest eigenvalues of ${\bf\Sigma}_x$:

    $\displaystyle {\bf V}'=[ {\bf v}_1,\cdots,{\bf v}_{d'} ]_{d\times d'}$ (78)

    and carry out the KLT based on ${\bf V}'$:

    $\displaystyle {\bf y}=\left[ \begin{array}{c} y_1\\ \vdots\\ y_{d'}\end{array}\right]_{d'\times 1}
={\bf V}'^T{\bf x}
=\left[ \begin{array}{c} {\bf v}_1^T\\ \vdots\\ {\bf v}_{d'}^T\end{array}\right]_{d'\times d}
\left[ \begin{array}{c} x_1\\ \vdots \\ \vdots \\ x_d \end{array} \right]_{d \times 1}$ (79)

    This is a lossy compression by which the dimensionality is reduced from $d$ to $d'<d$, with the following error representing the percentage of information lost:

    $\displaystyle \sum_{i=d'+1}^d \lambda_i / \sum_{i=1}^d \lambda_i$ (80)

    As $\lambda_{d'+1},\cdots,\lambda_d$ are the smallest $d-d'$ eigenvalues, this error is minimized (e.g., no greater than 5% for the threshold above).
  5. Reconstruct an approximation $\hat{\bf x}$ of ${\bf x}$ by the inverse KLT:

    $\displaystyle \hat{\bf x}
=\left[ \begin{array}{c} \hat{x}_1\\ \vdots \\ \vdots \\ \hat{x}_d \end{array} \right]_{d \times 1}
={\bf V}'{\bf y}
=[ {\bf v}_1,\cdots,{\bf v}_{d'} ]_{d\times d'}
\left[ \begin{array}{c} y_1\\ \vdots \\ y_{d'} \end{array} \right]_{d' \times 1}
=\sum_{i=1}^{d'} y_i{\bf v}_i$ (81)

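The steps above map directly onto a few lines of linear algebra. Below is a minimal NumPy sketch of this KLT-based compression and reconstruction; the function names (klt_compress, klt_reconstruct), the example data, and the 95% threshold are illustrative assumptions rather than part of the original formulation, and the mean is subtracted before the transform (and added back afterwards), consistent with the zero-mean assumption made further below.

```python
import numpy as np

def klt_compress(X, energy=0.95):
    """KLT compression of a dataset X (d x N, one sample per column).
    Returns the d x d' transform matrix V', the d' x N transformed data Y,
    and the mean vector, following steps 1-4 above."""
    d, N = X.shape
    m = X.mean(axis=1, keepdims=True)                   # step 1: mean vector m_x
    Sigma = (X - m) @ (X - m).T / N                     # step 1: covariance estimate
    lam, V = np.linalg.eigh(Sigma)                      # eigenvalues/eigenvectors (ascending)
    lam, V = lam[::-1], V[:, ::-1]                      # step 2: sort in descending order
    ratio = np.cumsum(lam) / np.sum(lam)                # step 3: retained energy, Eq. (77)
    d_prime = int(np.searchsorted(ratio, energy)) + 1   # smallest d' meeting the threshold
    Vp = V[:, :d_prime]                                 # step 4: transform matrix V', Eq. (78)
    Y = Vp.T @ (X - m)                                  # step 4: forward KLT, Eq. (79)
    return Vp, Y, m

def klt_reconstruct(Vp, Y, m):
    """Step 5: inverse KLT, Eq. (81)."""
    return Vp @ Y + m

# usage: compress 100 samples of 10-dimensional correlated data
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 100)).cumsum(axis=0)           # correlated rows
Vp, Y, m = klt_compress(X)
X_hat = klt_reconstruct(Vp, Y, m)
print(Vp.shape[1], np.linalg.norm(X - X_hat) / np.linalg.norm(X))
```

The printed relative reconstruction error reflects the information lost by keeping only $d'$ of the $d$ components.
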
The rank of the $d \times d$ estimated covariance matrix $\hat{\bf\Sigma}_x$ in Eq. (40), based on the dataset ${\bf X}=[{\bf x}_1,\cdots,{\bf x}_N]$, is no greater than $N-1$. When $N\le d$, the rank of $\hat{\bf\Sigma}_x$ is therefore at most $N-1<d$, i.e., there are no more than $N-1$ non-zero eigenvalues while the remaining $d-(N-1)=d-N+1$ eigenvalues are all zero. Also, when the dimensionality is high ($d \gg N$), the computational complexity $O(d^3)$ of solving the eigenvalue problem of the $d\times d$ matrix $\hat{\bf\Sigma}_x$ may be prohibitive. In this case, we can find the $N-1$ non-zero eigenvalues and their corresponding eigenvectors of $\hat{\bf\Sigma}_x$ by solving the eigenvalue problem of an $N\times N$ matrix instead, with the much lower complexity of $O(N^3)$.
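
As a quick check of this rank bound, note that the $N$ centered samples satisfy one linear constraint,

$\displaystyle \sum_{n=1}^N({\bf x}_n-{\bf m}_x)=\sum_{n=1}^N{\bf x}_n-N{\bf m}_x={\bf0}$    

so they span a subspace of dimension at most $N-1$, and the rank of $\hat{\bf\Sigma}_x$ cannot exceed $N-1$.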

Specifically, for convenience and without loss of generality, we assume that the data have zero mean, ${\bf m}_x={\bf0}$, so that the covariance matrix of the given dataset ${\bf X}$ can be estimated as

$\displaystyle \hat{\bf\Sigma}_x
=\frac{1}{N}\sum_{n=1}^N({\bf x}_n-{\bf m}_x)({\bf x}_n-{\bf m}_x)^T
=\frac{1}{N}\sum_{n=1}^N{\bf x}_n{\bf x}_n^T
=\frac{1}{N}[{\bf x}_1,\cdots,{\bf x}_N]
\left[\begin{array}{c}{\bf x}_1^T\\ \vdots\\ {\bf x}_N^T\end{array}\right]
=\frac{1}{N}{\bf XX}^T$    

and the eigenequation of this $d \times d$ matrix can be written as

$\displaystyle \hat{\bf\Sigma}_x{\bf v}_i=\frac{1}{N}{\bf XX}^T{\bf v}_i=\lambda_i{\bf v}_i$ (82)

Pre-multiplying both sides by ${\bf X}^T$, we get the eigenequation of the $N\times N$ matrix ${\bf X}^T{\bf X}/N$:

$\displaystyle \frac{1}{N}{\bf X}^T{\bf X}\left({\bf X}^T{\bf v}_i\right)
=\lambda_i\left( {\bf X}^T{\bf v}_i\right),
\;\;\;\;\;\;$ i.e., $\displaystyle \;\;\;\;\;\;
\frac{1}{N}{\bf X}^T{\bf X}\,{\bf u}_i=\lambda_i{\bf u}_i$ (83)

where ${\bf u}_i={\bf X}^T{\bf v}_i$ is the corresponding eigenvector of ${\bf X}^T{\bf X}/N$. We see that ${\bf XX}^T/N$ and ${\bf X}^T{\bf X}/N$ share the same non-zero eigenvalues. When $N<d$, we can solve the eigenvalue problem in Eq. (83) with complexity $O(N^3)$, instead of the original problem in Eq. (82) with complexity $O(d^3)$, to get the $N-1$ non-zero eigenvalues $\lambda_i,\;(i=1,\cdots,N-1)$, and the corresponding eigenvectors ${\bf u}_i={\bf X}^T{\bf v}_i$. To find the eigenvectors ${\bf v}_i$ of the original matrix $\hat{\bf\Sigma}_x={\bf XX}^T/N$, we pre-multiply both sides of the eigenequation above by ${\bf X}$ and get

$\displaystyle \frac{1}{N}{\bf X}{\bf X}^T \left({\bf X}{\bf u}_i\right)
=\lambda_i\left({\bf X}{\bf u}_i\right)$ (84)

Comparing this with Eq. (82), we see that ${\bf Xu}_i={\bf XX}^T{\bf v}_i=N\lambda_i{\bf v}_i$ is proportional to the original eigenvector ${\bf v}_i$ corresponding to $\lambda_i$, and the two become the same when normalized; in particular, as $\Vert{\bf Xu}_i\Vert^2={\bf u}_i^T{\bf X}^T{\bf X}{\bf u}_i=N\lambda_i\Vert{\bf u}_i\Vert^2$, a unit-norm ${\bf u}_i$ yields the unit-norm eigenvector ${\bf v}_i={\bf Xu}_i/\sqrt{N\lambda_i}$.
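
This relationship is straightforward to use in practice. Below is a minimal NumPy sketch, assuming zero-mean data with $N<d$; the function name covariance_eig_small and the example sizes are illustrative choices, not part of the original text. It solves the $N\times N$ eigenvalue problem of ${\bf X}^T{\bf X}/N$ and recovers the unit-norm eigenvectors of $\hat{\bf\Sigma}_x={\bf XX}^T/N$ as ${\bf v}_i={\bf Xu}_i/\sqrt{N\lambda_i}$.

```python
import numpy as np

def covariance_eig_small(X):
    """Eigen-decomposition of Sigma_hat = X X^T / N via the N x N matrix X^T X / N.
    X is d x N with zero-mean rows and N < d.  Returns the N - 1 non-zero
    eigenvalues (descending) and the corresponding unit-norm eigenvectors
    of Sigma_hat as the columns of V."""
    d, N = X.shape
    lam, U = np.linalg.eigh(X.T @ X / N)       # N x N problem, O(N^3), Eq. (83)
    lam, U = lam[::-1], U[:, ::-1]             # sort in descending order
    lam, U = lam[:N - 1], U[:, :N - 1]         # keep the N - 1 non-zero eigenvalues
    V = X @ U / np.sqrt(N * lam)               # v_i = X u_i / sqrt(N lambda_i), Eq. (84)
    return lam, V

# usage: d = 1000 dimensions but only N = 20 samples
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 20))
X -= X.mean(axis=1, keepdims=True)             # enforce the zero-mean assumption
lam, V = covariance_eig_small(X)

# verify the original d x d eigenequation (82): Sigma_hat v_i = lambda_i v_i
Sigma_hat = X @ X.T / X.shape[1]
print(np.allclose(Sigma_hat @ V, V * lam))
```

Only the small $20\times 20$ matrix is ever decomposed; the final check against the full $1000\times 1000$ matrix is included just to confirm that the recovered $\lambda_i$ and ${\bf v}_i$ indeed satisfy Eq. (82).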