Assume a set of symbols (26 English letters and some additional symbols such
as space, period, etc.) is to be transmitted through the communication channel.
These symbols can be treated as independent samples of a random variable $X$
with probability $p(x)$ and entropy $H(p) = -\sum_x p(x) \log_2 p(x)$. For
example, $p(\text{s})$ is much higher than $p(\text{z})$. To minimize the code
length (the number of bits) for these symbols, it is natural to assign shorter
codes to symbols of higher probability, for example a shorter code for ``s''
than for ``z''.
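The length of the code for a symbol $x$ with probability $p(x)$ can be its
surprise $-\log_2 p(x)$, rounded up to a whole number of bits. Let $L$ be the
average number of bits per symbol used to encode the symbols.
Shannon proved that the minimum $L$ is bounded by the entropy $H(p)$:
\[ H(p) \le L < H(p) + 1. \]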
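The bound can be checked numerically. The following is a minimal sketch, assuming a hypothetical four-symbol alphabet with made-up probabilities (not taken from English letter statistics) and code lengths equal to the rounded-up surprise; the function names are illustrative only.

```python
import math

def entropy(p):
    """Entropy in bits of a distribution given as {symbol: probability}."""
    return -sum(px * math.log2(px) for px in p.values() if px > 0)

def surprise_code_lengths(p):
    """Code length per symbol: the surprise -log2 p(x), rounded up to whole bits."""
    return {x: math.ceil(-math.log2(px)) for x, px in p.items() if px > 0}

def average_length(p, lengths):
    """Average number of bits per symbol under the distribution p."""
    return sum(p[x] * lengths[x] for x in lengths)

# Hypothetical toy distribution (an assumption for illustration only).
p = {"e": 0.5, "s": 0.25, "t": 0.15, "z": 0.10}

H = entropy(p)
lengths = surprise_code_lengths(p)
L = average_length(p, lengths)

# Kraft sum <= 1 means a prefix code with these lengths exists.
kraft_sum = sum(2.0 ** -n for n in lengths.values())

print("code lengths:", lengths)
print("entropy H(p):", H)
print("average length L:", L)
print("bound H <= L < H + 1 holds:", H <= L < H + 1)
```

In this toy case the frequent symbol ``e'' gets a 1-bit code and the rare ``z'' a 4-bit code, and the average length lands between $H(p)$ and $H(p) + 1$.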
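In reality, the true probability $p(x)$ is not available and can only be
estimated by $q(x)$, which is then used for source coding. In this case, the
minimum $L$ satisfies:
\[ H(p) + D_{KL}(p \| q) \le L < H(p) + D_{KL}(p \| q) + 1, \]
where $H(p)$ is the entropy of the true distribution and
$D_{KL}(p \| q) = \sum_x p(x) \log_2 \frac{p(x)}{q(x)}$ is the extra cost, in
bits per symbol, of coding with the estimate $q$ instead of $p$.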
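The cost of coding with an estimated distribution can be illustrated the same way. This is a sketch under stated assumptions: both the "true" distribution `p_true` and the estimate `q_est` are hypothetical values chosen for illustration, and the code lengths are the rounded-up surprise computed from the estimate.

```python
import math

def entropy(p):
    """Entropy in bits of the distribution p."""
    return -sum(px * math.log2(px) for px in p.values() if px > 0)

def cross_entropy(p, q):
    """Expected surprise under q when symbols are actually drawn from p."""
    return -sum(px * math.log2(q[x]) for x, px in p.items() if px > 0)

def kl_divergence(p, q):
    """Extra bits per symbol paid for coding with q instead of p."""
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)

# Hypothetical true and estimated distributions (assumptions for illustration).
p_true = {"e": 0.5, "s": 0.25, "t": 0.15, "z": 0.10}
q_est = {"e": 0.4, "s": 0.10, "t": 0.20, "z": 0.30}

# Code lengths are designed from the estimate q, not from the true p.
lengths_q = {x: math.ceil(-math.log2(qx)) for x, qx in q_est.items()}

# The average length is still measured under the true distribution p.
L_q = sum(p_true[x] * lengths_q[x] for x in p_true)

H_p = entropy(p_true)
D_pq = kl_divergence(p_true, q_est)
H_pq = cross_entropy(p_true, q_est)

print("H(p):", H_p)
print("D_KL(p||q):", D_pq)
print("H(p) + D_KL(p||q):", H_p + D_pq)
print("average length with q-based code:", L_q)
print("bound holds:", H_pq <= L_q < H_pq + 1)
```

Here the estimate badly overrates ``z'' and underrates ``s'', so the code built from it averages more bits per symbol than one built from the true probabilities would need.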