Bounds on Code Length next up previous
Next: Introduction to MML/MDL Up: Information Theory Previous: The Kraft-Szilard Inequality (1949)

Bounds on Code Length

What is the lower bound on the average code length?

Let $\lceil x \rceil$ denote the smallest integer $j \ge x$. Then suppose the length of symbol i from a source S is given by $l_i =
\lceil -\log p_i \rceil$ where pi is the probability of i.

Then, $l_i \ge -\log p_i$ giving $-l_i \le \log p_i$. Therefore


\begin{eqnarray*}\sum_i 2^{-l_i} &\le& \sum_i 2^{-\log p_i}\\
&\le& \sum_i p_i\\
&\le& 1\\
\end{eqnarray*}


The Kraft inequality holds and so the code is practical.

Consider now the average length $\bar l = \sum_i p_i l_i$ keeping in mind that $x \le \lceil x \rceil < 1+x$.


\begin{eqnarray*}&&\bar l = \sum_i p_i \lceil -\log p_i \rceil\\
&&\Rightarrow ...
...sum_i p_i\log p_i\\
&&\Rightarrow H(S) \le \bar l < 1 + H(S)\\
\end{eqnarray*}


where H(S) is the entropy of the source. So the chosen scheme is not more than 1 greater than the source entropy!

How closely can $\bar l$ in fact approach the source entropy? The answer is ``as closely as we please'' if we code the symbols in blocks of k, i.e. instead of coding each symbol on the fly, we wait until we accumulate k symbols and then encode the k-length block as a single unit.

For example, with a binary alphabet and a block size of 2, instead of assigning codes to 1 and 0, we would assign codes to the 4 possible strings 00, 01, 10 and 11.

Let S be a source whose alphabet consists of M symbols $v_1, \cdots,
v_m$. If $P(v_i) = p_i, H(S) = - \sum_{i=1}^{M} p_i \log p_i$.

Now consider a source S' whose alphabet consists of all possible k-length sequences $x_1, x_2, \cdots, x_k$ drawn independently from the alphabet of S.

\begin{displaymath}H(S') = -\sum_{x_1, \cdots, x_k} P(x_1, \cdots, x_k)
\log P(x_1, \cdots, x_k)
\end{displaymath}

If we assume that S' is memoryless and that each xi was drawn independently using the same alphabet and distribution as the original source, then

\begin{displaymath}P(x_1, \cdots, x_k) = \prod_{i=1}^k P(x_i)
\end{displaymath}

We will show that by coding in this manner, and making the above memoryless assumption, we can obtain an entropy for the block-coding source that is arbitrarily close to the entropy of the original source.


\begin{eqnarray*}H(S') &=& -\sum_{x_1, \cdots, x_k} P(x_1, \cdots, x_k)
\log \...
..., \cdots, x_k} P(x_1, \cdots, x_k)
\sum_{i=1}^k \log P(x_i)\\
\end{eqnarray*}


The above quantity can be written as a sum over the alphabet $\{v_1,
v_2, \cdots, v_M\}$.

\begin{eqnarray*}H(S') &=& -\sum_{j=1}^M \left[\log P(v_j)
\sum_{\underset{{\r...
...right]\\
&=& -\sum_{j=1}^M k P(v_j)\log P(v_j) \\
&=& k H(S)
\end{eqnarray*}


If $\bar l'$ is the average length of a block code from S', then since $H(S') \le \bar l' < 1+H(S')$, we have $kH(S) \le \bar l' <
1+kH(S)$.

Thus, $H(S) \le \frac{\bar l'}{k} < H(S) + \frac{1}{k}$ and so $\lim\limits_{k \rightarrow \infty} \frac{\bar l'}{k} = H(S)$.

Since $\bar l = \frac{\bar l'}{k}$, the average per-symbol length of the block-coding source is equal to the entropy of the original source in the limit.


next up previous
Next: Introduction to MML/MDL Up: Information Theory Previous: The Kraft-Szilard Inequality (1949)
Anand Venkataraman
1999-09-16