Re: Perplexity

From: Andreas Stolcke <stolcke at ADDRESS HIDDEN>
Date: Mon, 12 Feb 2007 09:41:23 -0800

Martha Yifiru wrote:
> Hi,
>
> I want to compare morph-based language model with
> word-based one. To do this I have to do some
> manipulation on the calculation of perplexity for
> morph-based language model so as to have fair
> comparison. I was thinking that the source code for
> perplexity calculation is in ngram.cc but it does not
> seem that the actual perplexity calculation is in
> ngram.cc.
>
> Can anyone help me?
>
>  
The source code for perplexity computation is in lm/src/TextStats.cc .
However, there is no need to modify the code.
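When you have different token counts (words versus morphs), the
perplexities are no longer comparable, because perplexity normalizes the
log probability by the number of tokens. The total log probabilities of
the same text are still comparable.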
You can get the log probability from the perplexity output, e.g.:

file ../ngram-count-gt/eval97.text: 5290 sentences, 38238 words, 681 OOVs
0 zeroprobs, logprob= -86334.6 ppl= 103.502 ppl1= 198.958
                      ^^^^^^^^
Assume the "words" in this example are actually morphs, and the actual
number of words (including sentence boundaries) is smaller, say 25000.
Then the word-level perplexity is

    10^ -(-86334.6 / 25000 ) = 2840.43
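
The renormalization above can be sketched in a few lines of Python. This is
only an illustration, not part of SRILM: the helper names parse_ppl_summary
and perplexity are hypothetical, the regular expression assumes the summary
layout shown in the example output, and 25000 is the assumed word count from
the text. The token-level check uses the denominator words - OOVs - zeroprobs
+ sentences, which reproduces the ppl value in the example.

```python
import re


def parse_ppl_summary(text):
    """Pull counts and logprob out of an 'ngram -ppl' style summary.

    Hypothetical helper; assumes the two-line layout shown in the example.
    """
    pattern = re.compile(
        r"(\d+) sentences, (\d+) words, (\d+) OOVs\s+"
        r"(\d+) zeroprobs, logprob= (-?[\d.]+)"
    )
    match = pattern.search(text)
    if match is None:
        raise ValueError("no perplexity summary found")
    sentences, words, oovs, zeroprobs = (int(g) for g in match.groups()[:4])
    return {
        "sentences": sentences,
        "words": words,
        "oovs": oovs,
        "zeroprobs": zeroprobs,
        "logprob": float(match.group(5)),  # base-10 log probability
    }


def perplexity(logprob, n_tokens):
    """Perplexity of a base-10 logprob normalized over n_tokens tokens."""
    return 10 ** (-logprob / n_tokens)


summary = """file ../ngram-count-gt/eval97.text: 5290 sentences, 38238 words, 681 OOVs
0 zeroprobs, logprob= -86334.6 ppl= 103.502 ppl1= 198.958"""

stats = parse_ppl_summary(summary)

# Token-level (here: morph-level) perplexity, as a sanity check on the parse.
scored_tokens = (stats["words"] - stats["oovs"] - stats["zeroprobs"]
                 + stats["sentences"])
morph_ppl = perplexity(stats["logprob"], scored_tokens)

# Word-level perplexity: same logprob, divided by the assumed word count
# (25000, including sentence boundaries, as in the text above).
word_ppl = perplexity(stats["logprob"], 25000)

print(f"morph-level ppl: {morph_ppl:.3f}")
print(f"word-level ppl:  {word_ppl:.2f}")
```

The logprob is the same in both calls; only the normalizing token count
changes, which is why the perplexity differs while the log probability
stays comparable.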

--Andreas
