Some Useful Theorems next up previous
Next: Coding Theory Up: Information Theory Previous: Derivation of Shannon's Result

Some Useful Theorems

Theorem 1: $\underset{P}{\rm argmax}\quad H(P) = (1/n, 1/n, \cdots, 1/n)$

Interpretation.

We have least information when we have no reason to prefer any one outcome over others.

Proof.

Since $\log_e x = k\log_n x$, a proof using the former notion of the $\log$ function as below is equally valid when the base of the logarithm is changed.

Let $\Delta = H(p_1, p_2, \cdots, p_n) - H(1/n, 1/n, \cdots, 1/n)$

Then

\begin{eqnarray*}\Delta &=& - \sum_{i=1}^n p_i \log p_i + \log 1/n\\
&=& \sum_...
...^n p_i \log p_i\\
&=& \sum_{i=1}^n p_i \log \frac{1}{np_i} \\
\end{eqnarray*}


Since $\log x \le x-1$, we have

\begin{eqnarray*}\Delta &\le& \sum_{i=1}^n p_i \left(\frac{1}{np_i} - 1\right)\\...
...c{1}{n} - \sum_{i=1}^n p_i\\
&\le& 0 \qquad \mbox {\bf Q.E.D.}
\end{eqnarray*}


Theorem 2: $H(P) \ge 0$ with equality iff some $p_i \in P = 1$

Interpretation.

Information/uncertainty cannot be a negative quantity.

Proof.

Note that we conventionally define $0\log 0$ to be $\lim\limits_{x\downarrow 0} x\log x = 0$

Now the proof is simple. $\forall i: 0 \le p_i \le 1$ whence $p_i
\log p_i <= 0$. Consequently, $\forall i: - p_i \log p_i \ge 0$ and so sum of many such quantities is also non-negative. Equality results iff pi = 1 for some i and $\forall j \ne i: p_j = 0$.

Theorem 3: Jensen's inequality: $H(P) \le -\sum_{i=1}^n p_i \log q_i$ with equality iff $\forall i: p_i = q_i$

Interpretation.

Misestimation of the distribution governing the symbols in a source results in increased uncertainty/information.

Let $\Delta = -\sum_{i=1}^m p_i \log p_i + \sum_{i=1}^m p_i \log q_i$

Again, we use the property that $\log x \le x-1$. Then,

\begin{eqnarray*}\Delta &=& \sum_{i=1}^m p_i \log \frac{q_i}{p_i}\\
&\le& \sum_{i=1}^m p_i \left(\frac{q_i}{p_i} - 1\right)\\
&\le& 0\\
\end{eqnarray*}


Q.E.D.

The very useful quantity $\sum_i p_i\log\frac{p_i}{q_i}$ is often denoted by I(p:q) or D(p||q) and is called the Kullback-Liebler distance, I-directed divergence, relative entropy, or discrimination between the densities p and q. It is also defined for the continuous case by:

\begin{displaymath}I(p:q) = \int_{\Omega}p(\omega)\log{\frac{p(\omega)}
{q(\omega)}d\omega}
\end{displaymath}


next up previous
Next: Coding Theory Up: Information Theory Previous: Derivation of Shannon's Result
Anand Venkataraman
1999-09-16