
Transform Coding (lossy) and JPEG Image Compression

The Joint Photographic Experts Group (JPEG) is a working group of ISO, the International Organization for Standardization, that defined the popular JPEG imaging standard for the compression of still images. Its counterpart for moving pictures is the "Moving Picture Experts Group" (MPEG).

JPEG compression is based on a transform, either the DCT (in the original JPEG standard) or a wavelet transform (in JPEG 2000), because of two essential properties of orthogonal transforms in general:

Decorrelation of the signal;
Compaction of its energy.

See this ACM page for a review of the DCT vs. the wavelet transform as used for image compression.

Here are the steps of JPEG image compression based on DCT:

Divide the image into a set of $8 \times 8$ blocks and carry out the 2D DCT of each block. The computational complexity of the 2D DCT of an $N \times N$ image is $O(N^2 \log_2 N)$, while the complexity of the 2D DCT of all $8$ by $8$ blocks of the image is

$\begin{displaymath}\frac{N^2}{8^2}\;O(8^2 \log_2 8)=O(N^2) \end{displaymath}$

The larger the image size $N$, the greater the saving from the sub-block transform: the cost is reduced by a factor of about $\log_2 N/\log_2 8$, e.g. 3 for $N=512$. As adjacent pixels are highly correlated, most of the energy in an 8 by 8 block is concentrated in the low-frequency region of the spectrum (upper-left corner), and the remaining transform coefficients are very close to zero.
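The energy compaction described above can be checked numerically. The following is a minimal sketch in plain Python, assuming the orthonormal DCT-II written as a matrix product $X = C\,x\,C^T$ (a direct $O(n^3)$ product, not a fast algorithm). The helper names `dct_matrix`, `dct2` and `idct2` and the ramp-shaped test block are illustrative assumptions, not part of the JPEG standard.

```python
import math

N = 8  # block size used by DCT-based JPEG

def dct_matrix(n=N):
    """Orthonormal DCT-II matrix C (rows are basis vectors, C C^T = I)."""
    c = []
    for u in range(n):
        scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        c.append([scale * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                  for x in range(n)])
    return c

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def dct2(block):
    """2D DCT of a square block: X = C x C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))

def idct2(coeffs):
    """Inverse 2D DCT: x = C^T X C."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)

# Hypothetical smooth test block: a gentle ramp mimicking correlated pixels.
block = [[100 + 2 * i + 3 * j for j in range(N)] for i in range(N)]
coeffs = dct2(block)

# Share of the total energy that falls in the 2x2 low-frequency corner.
total_energy = sum(v * v for row in coeffs for v in row)
low_energy = sum(coeffs[u][v] ** 2 for u in range(2) for v in range(2))
energy_fraction = low_energy / total_energy
print(f"energy in the 2x2 low-frequency corner: {energy_fraction:.4f}")
```

For this smooth block almost all of the energy lands in the upper-left corner; a block with sharp edges or noise would spread more energy into the high-frequency coefficients.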
Set all DCT coefficients whose magnitude is smaller than a threshold T to zero, or alternatively, low-pass filter (with either an ideal or a smooth filter) the 2D DCT spectrum of each sub-image;
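Both variants of this step can be sketched in a few lines of Python. The coefficient values below are hypothetical (chosen only so that they decay with frequency), the function names are illustrative, and the triangular mask $u+v<\text{cutoff}$ is just one possible choice of "ideal" low-pass region.

```python
N = 8

def threshold_block(coeffs, t):
    """Set every coefficient whose magnitude is below t to zero."""
    return [[v if abs(v) >= t else 0.0 for v in row] for row in coeffs]

def ideal_lowpass(coeffs, cutoff):
    """Keep only coefficients with u + v < cutoff (an assumed mask shape)."""
    return [[v if u + w < cutoff else 0.0 for w, v in enumerate(row)]
            for u, row in enumerate(coeffs)]

def count_nonzero(block):
    return sum(1 for row in block for v in row if v != 0)

# Hypothetical DCT coefficients that decay quickly with frequency.
coeffs = [[940.0 * (-1) ** (u + v) / (1 + u + v) ** 3 for v in range(N)]
          for u in range(N)]

kept_by_threshold = threshold_block(coeffs, t=5.0)
kept_by_lowpass = ideal_lowpass(coeffs, cutoff=5)

print(count_nonzero(coeffs),
      count_nonzero(kept_by_threshold),
      count_nonzero(kept_by_lowpass))
```

With these particular values the two methods happen to keep the same 15 low-frequency coefficients; in general a threshold adapts to the block content while a fixed mask does not.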
Quantize the remaining coefficients (convert floating-point values to integers). First, the elements of each block are divided element-wise by the corresponding elements of a quantization matrix Q:

$\begin{displaymath}y(i,j)= \left[ \frac{x(i,j)}{Q(i,j)} \right] \end{displaymath}$

where

$\begin{displaymath}Q=\left[ \begin{array}{ccc} ... & ... & ... \\ ... & q_{ij} & ... \\ ... & ... & ... \end{array} \right]_{8\times 8} \end{displaymath}$

and each of the resulting 8 by 8 elements is rounded to the nearest integer ($[\cdot]$ represents rounding to the closest integer). At the receiving end, the coefficients are recovered by:

$\begin{displaymath}\hat{x}(i,j)=Q(i,j) \cdot y(i,j) \end{displaymath}$

Two observations can be made:
- Larger $Q(i,j)$ causes larger error. Let $C=Q(i,j)$, and let the integer $K$ be the result of rounding $x(i,j)/C$; then the possible values of the coefficient $x(i,j)$ lie in the range:
  
  $\begin{displaymath}KC-\frac{C}{2} \le x(i,j) < KC+\frac{C}{2} \end{displaymath}$
  
  i.e., the range of the rounding error is proportional to $C=Q(i,j)$.
- Larger $Q(i,j)$ tends to suppress more coefficients to zero, and they cannot be recovered at the receiving end.
In general, smaller values are assigned to $Q(i,j)$ around the top-left corner (low-frequency components) and larger ones around the lower-right corner (high-frequency components). The values are also determined heuristically according to perceptual and psycho-visual tests.
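The quantization step and both observations can be illustrated with a short Python sketch. The matrix `q` below is a hypothetical one that simply grows from the top-left to the lower-right corner; it is not the table from the JPEG standard, and the coefficient values are made up. Rounding is written as `floor(v + 0.5)` so that it matches the interval $KC-C/2 \le x(i,j) < KC+C/2$ exactly (Python's built-in `round` rounds halves to the nearest even integer instead).

```python
import math

N = 8

def quantize(x, q):
    """y(i,j) = [x(i,j) / Q(i,j)], with halves rounded upward."""
    return [[math.floor(x[i][j] / q[i][j] + 0.5) for j in range(N)]
            for i in range(N)]

def dequantize(y, q):
    """x_hat(i,j) = Q(i,j) * y(i,j), as done at the receiving end."""
    return [[q[i][j] * y[i][j] for j in range(N)] for i in range(N)]

# Hypothetical quantization matrix: small top-left, large lower-right.
q = [[8 + 4 * (i + j) for j in range(N)] for i in range(N)]

# Hypothetical DCT coefficients that decay quickly with frequency.
x = [[940.0 * (-1) ** (i + j) / (1 + i + j) ** 3 for j in range(N)]
     for i in range(N)]

y = quantize(x, q)
x_hat = dequantize(y, q)

# Largest reconstruction error, measured in units of Q(i,j).
max_ratio = max(abs(x[i][j] - x_hat[i][j]) / q[i][j]
                for i in range(N) for j in range(N))
nonzero = sum(1 for row in y for v in row if v != 0)
print(f"max |x - x_hat| / Q = {max_ratio:.3f}, nonzero coefficients = {nonzero}")
```

The error never exceeds half of the corresponding $Q(i,j)$, and every coefficient that was quantized to zero stays zero after reconstruction.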
Predictively code the DC components of all blocks (the DC components of neighboring blocks are highly correlated);
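One simple form of predictive coding is to transmit only the difference between each DC value and that of the previous block. The sketch below assumes this previous-value predictor with an initial prediction of zero; the DC values are hypothetical.

```python
def dc_encode(dc_values):
    """Replace each DC value by its difference from the previous one."""
    prev = 0  # assumed initial prediction
    diffs = []
    for dc in dc_values:
        diffs.append(dc - prev)
        prev = dc
    return diffs

def dc_decode(diffs):
    """Rebuild the DC values by accumulating the differences."""
    prev = 0
    values = []
    for d in diffs:
        prev += d
        values.append(prev)
    return values

# Hypothetical quantized DC values of six consecutive blocks.
dc_values = [118, 120, 119, 121, 125, 124]
diffs = dc_encode(dc_values)
print(diffs)
```

Because the DC values change slowly from block to block, the differences after the first one are small numbers that can be coded with few bits.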
Scan the remaining (AC) coefficients in each block in zigzag order (to increase the probability of long runs of consecutive zeros) and code them by run-length encoding;
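The zigzag scan and a simplified run-length code can be sketched as follows. The scan visits the anti-diagonals $i+j=0,1,2,\ldots$ in alternating directions. The (run of zeros, value) pairs and the end-of-block marker `EOB` are a simplification assumed here for illustration, not the exact symbol format of the JPEG standard, and the quantized block is hypothetical.

```python
N = 8
EOB = (0, 0)  # assumed end-of-block marker: only zeros remain

def zigzag_indices(n=N):
    """Return the (row, column) pairs of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:
            diag.reverse()  # even diagonals run from bottom-left to top-right
        order.extend(diag)
    return order

def run_length_encode(ac):
    """Code a sequence as (number of preceding zeros, nonzero value) pairs."""
    pairs = []
    run = 0
    for v in ac:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append(EOB)  # trailing zeros are dropped
    return pairs

# Hypothetical quantized block: a few nonzero low-frequency coefficients.
y = [[0] * N for _ in range(N)]
y[0][0], y[0][1], y[1][0], y[1][1], y[0][3] = 118, -10, -9, 2, -1

scanned = [y[i][j] for i, j in zigzag_indices()]
ac = scanned[1:]  # the DC coefficient scanned[0] is coded separately
pairs = run_length_encode(ac)
print(pairs)
```

The 63 AC coefficients of this block collapse into five pairs, because the zigzag order places the zero-valued high-frequency coefficients together at the end of the sequence.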
Huffman code the data stream;
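A Huffman code assigns shorter bit strings to more frequent symbols. The sketch below builds a code table from the symbol frequencies of a stream using Python's `heapq` and `collections.Counter`. The symbol stream is hypothetical, and practical JPEG coders typically use code tables defined separately rather than one built from a single block as here.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Return a dict mapping each distinct symbol to a prefix-free bit string."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, partial code table).
    heap = [(w, n, {sym: ""}) for n, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)  # the two least frequent subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

def huffman_decode(bits, code):
    """Decode a bit string produced with the given code table."""
    inverse = {c: s for s, c in code.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return out

# Hypothetical stream of run-length pairs with skewed frequencies.
stream = [(0, 0)] * 8 + [(0, 1)] * 4 + [(1, 1)] * 2 + [(2, -1), (0, -10)]
code = huffman_code(stream)
bits = "".join(code[s] for s in stream)
print(len(bits), "bits, versus", 3 * len(stream), "bits with a fixed 3-bit code")
```

The decoder needs the same code table as the encoder, so the table must either be agreed on in advance or be sent along with the data.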
Store and/or transmit the encoded image as well as the quantization matrix.


Ruye Wang 2021-03-28