Our derivation follows directly from our earlier
definition of information. Let
$$A(n) = H\left(\frac{1}{n}, \frac{1}{n}, \ldots, \frac{1}{n}\right)$$
denote the information content of a source consisting of $n$
equiprobable symbols.
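Now we know that for any integers $s, t \ge 2$ and any $n$, however large, we can find an integer $m$ such that
$$s^m \le t^n < s^{m+1}.$$
Then, taking logarithms and dividing by $n \log s$,
$$\frac{m}{n} \le \frac{\log t}{\log s} < \frac{m}{n} + \frac{1}{n}.$$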
Now consider three sources with $s^m$, $t^n$ and $s^{m+1}$ equiprobable symbols respectively. Using property (3), $A(s^m) = mA(s)$, $A(t^n) = nA(t)$ and $A(s^{m+1}) = (m+1)A(s)$. (Why? Discuss this.)
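Since $A$ must increase monotonically with the
number of equiprobable choices (property 2),
$$A(s^m) \le A(t^n) \le A(s^{m+1}),$$
or equivalently, using property (3) and dividing by $nA(s)$,
$$\frac{m}{n} \le \frac{A(t)}{A(s)} \le \frac{m}{n} + \frac{1}{n}.$$
Taking logarithms of $s^m \le t^n < s^{m+1}$ shows that $\log t / \log s$ lies in the same interval, so the two ratios differ by at most $1/n$.
Since $n$ is arbitrarily large,
$$\frac{A(t)}{A(s)} = \frac{\log t}{\log s}, \quad\text{and hence}\quad A(t) = K \log t, \tag{10}$$
where $K$ is a positive constant.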
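
The squeeze argument above can be checked numerically. The following is a minimal sketch, not part of the original derivation: the helper name `sandwich_exponent` and the sample values $s = 2$, $t = 3$ are assumptions chosen for illustration. For each $n$ it finds the integer $m$ with $s^m \le t^n < s^{m+1}$ using exact integer arithmetic, then prints how far $m/n$ is from $\log t / \log s$.

```python
import math


def sandwich_exponent(s: int, t: int, n: int) -> int:
    """Return the integer m with s**m <= t**n < s**(m + 1).

    Hypothetical helper for illustration; uses exact integers, so no
    rounding error enters the comparison.
    """
    target = t ** n
    m = 0
    power = s  # invariant: power == s ** (m + 1)
    while power <= target:
        power *= s
        m += 1
    return m


s, t = 2, 3  # assumed sample values
for n in (1, 10, 100, 1000):
    m = sandwich_exponent(s, t, n)
    gap = abs(m / n - math.log(t) / math.log(s))
    # The gap is bounded by 1/n, so it shrinks as n grows.
    print(f"n={n:5d}  m={m:5d}  m/n={m / n:.5f}  gap={gap:.6f}")
```

The printed gap stays below $1/n$ in every row, which is the bound the derivation relies on when it lets $n$ grow without limit.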
Let
$p_i = n_i/N$, where the $n_i$ are integers and
$$\sum_{i=1}^{m} n_i = N.$$
Now consider a choice over $N$ equally likely symbols. We can break this down into a
choice over $m$ symbols, each with probability $p_i$ as defined above,
and then, within each choice $i$, a second choice over $n_i$
equally likely symbols. Let
$$H = H(p_1, p_2, \ldots, p_m).$$
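Using property (3), the information in the choice over $N$ symbols is the information in the first choice plus the weighted information in the second choices, $A(N) = H(p_1, \ldots, p_m) + \sum_i p_i A(n_i)$. Using
(10), we can now write:
$$K \log N = H(p_1, \ldots, p_m) + K \sum_{i=1}^{m} p_i \log n_i.$$
Hence,
$$H(p_1, \ldots, p_m) = K \left[ \log N - \sum_{i=1}^{m} p_i \log n_i \right].$$
Since
$\sum_{i=1}^{m} p_i = 1$,
$$H(p_1, \ldots, p_m) = -K \sum_{i=1}^{m} p_i \log \frac{n_i}{N} = -K \sum_{i=1}^{m} p_i \log p_i.$$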
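
As a consistency check on the two-stage decomposition, the sketch below compares $A(N)$ with $H(p_1, \ldots, p_m) + \sum_i p_i A(n_i)$ for one example. It is illustrative only: the counts `[1, 2, 5]`, the choice $K = 1$, and base-2 logarithms are assumptions, and the function names are hypothetical. Because the entropy formula was derived from this identity, the check confirms the algebra rather than proving anything new.

```python
import math


def entropy(probs, K=1.0):
    """H = -K * sum(p * log2(p)); zero-probability terms contribute nothing."""
    return -K * sum(p * math.log2(p) for p in probs if p > 0)


def A(n, K=1.0):
    """Information of n equiprobable symbols, A(n) = K * log2(n)."""
    return K * math.log2(n)


counts = [1, 2, 5]                   # assumed example values of n_i
N = sum(counts)                      # total number of equally likely symbols
probs = [n_i / N for n_i in counts]  # p_i = n_i / N

lhs = A(N)
rhs = entropy(probs) + sum(p * A(n_i) for p, n_i in zip(probs, counts))
print(lhs, rhs)  # the two values agree up to floating-point rounding
```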
If the $p_i$ are real numbers rather than rationals as
assumed above, they can be approximated by rationals as closely as we
please. Since $H$ is a continuous function by property (1), the
expression is valid for
any real $p_i$ with $0 \le p_i \le 1$ and $\sum_i p_i = 1$.
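
The continuity argument can be illustrated with a short sketch. The distribution $(1/\sqrt{2},\ 1 - 1/\sqrt{2})$, the helper names, and the use of `Fraction.limit_denominator` to produce rational approximations are all assumptions made for this example. The float `1 / math.sqrt(2)` stands in for the irrational probability.

```python
import math
from fractions import Fraction


def entropy(probs):
    """H = -sum(p * log2(p)) with K = 1; zero terms are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def rational_entropy_error(max_den: int) -> float:
    """Entropy error when p is replaced by a rational n_1/N with N <= max_den."""
    p = 1 / math.sqrt(2)                             # stand-in for an irrational p_1
    p_rat = Fraction(p).limit_denominator(max_den)   # closest n_1/N with N <= max_den
    exact = entropy([p, 1 - p])
    approx = entropy([float(p_rat), float(1 - p_rat)])
    return abs(exact - approx)


for max_den in (10, 1000, 100000):
    print(max_den, rational_entropy_error(max_den))
```

As the allowed denominator grows, the rational distribution approaches the real one and the entropy values converge, which is what continuity of $H$ guarantees.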
By analogy with thermodynamics, we call H the Entropy of the source. Note that we choose to interpret the entropy of the source as the amount of information contained in it.