What is the lower bound on the average code length?
Let
denote the smallest integer
.
Then suppose
the length of symbol i from a source S is given by
where pi is the probability of i.
Then,
giving
.
Therefore
The Kraft inequality holds and so the code is practical.
Consider now the average length
keeping in
mind that
.
where H(S) is the entropy of the source. So the chosen scheme is not more than 1 greater than the source entropy!
How closely can
in fact approach the source entropy? The
answer is ``as closely as we please'' if we code the symbols in
blocks of k, i.e. instead of coding each symbol on the fly, we wait
until we accumulate k symbols and then encode the k-length block
as a single unit.
For example, with a binary alphabet and a block size of 2, instead of assigning codes to 1 and 0, we would assign codes to the 4 possible strings 00, 01, 10 and 11.
Let S be a source whose alphabet consists of M symbols
.
If
.
Now consider a source S' whose alphabet consists of all possible
k-length sequences
drawn independently from
the alphabet of S.
If we assume that S' is memoryless and that each xi was
drawn independently using the same alphabet and distribution as the
original source, then
We will show that by coding in this manner, and making the above memoryless assumption, we can obtain an entropy for the block-coding source that is arbitrarily close to the entropy of the original source.
If
is the average length of a block code from S', then
since
,
we have
.
Thus,
and so
.
Since
,
the average per-symbol length of
the block-coding source is equal to the entropy of the original source
in the limit.