Re: Perplexity calculation: Strange behavior

From: Andreas Stolcke <stolcke at ADDRESS HIDDEN>
Date: Wed, 31 Aug 2005 13:06:53 PDT

In message <200508312031.45859.hahn at ADDRESS HIDDEN> you wrote:
> Hi!
>
> During some language modeling using the SRI Toolkit (V.1.4.3 and V.1.4.5) on
> i686 Intel GNU/Linux I encountered some strange behavior concerning
> perplexity calculation:
> For any order greater than 3, the perplexity calculated with ngram seems to
> be fixed and wrong.
> For example, I used Defoe's "Robinson Crusoe" to create modified Kneser-Ney
> discounted Language Models for orders 1 up to 6 and calculated the perplexity
> for the same text using "ngram" and our own software:
>
>         +------------------------+
>         |      perplexity        |
> +-------+-------------+----------+
> | order | SRI-Toolkit | our Tool |
> +-------+-------------+----------+
> |   1   |   394.79    | 394.794  |
> +-------+-------------+----------+
> |   2   |   68.0706   | 68.071   |
> +-------+-------------+----------+
> |   3   |   54.29     | 54.2903  |
> +-------+-------------+----------+
> |   4   |   57.1554   | 52.6306  |
> +-------+-------------+----------+
> |   5   |   57.1554   | 52.6502  |
> +-------+-------------+----------+
> |   6   |   57.1554   | 52.7033  |
> +-------+-------------+----------+

I haven't looked at your script, but my guess is that you didn't specify
the -order option when evaluating the LM.  The default is to only use
up to trigram probabilities regardless of what is in the LM file.
(That's for historical reasons.)  So of course you get the same result for
any LM order >= 4.  Also, because of KN, you are getting a degradation
relative to the trigram, as the lower-order probabilities are optimized
to complement the higher-order estimates, not to be used on their own.
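
To make the point about -order concrete, here is a rough sketch of what
the training and evaluation calls could look like for orders 4 to 6.  The
file names (train.txt, test.txt, lm4.gz, ...) and the helper functions
train_cmd and ppl_cmd are placeholders made up for illustration, and the
-interpolate flag is an assumption about how the models were built.  The
script only prints the commands, so it runs without SRILM installed.

```shell
#!/bin/sh
# Sketch only: file names and helper names are hypothetical placeholders.

# Print a training command for an LM of order $1, trained on text $2 and
# written to $3. -kndiscount selects modified Kneser-Ney discounting;
# -interpolate is an assumption about the original setup.
train_cmd() {
    printf 'ngram-count -order %s -kndiscount -interpolate -text %s -lm %s\n' "$1" "$2" "$3"
}

# Print an evaluation command for the LM in $2 at order $1 on text $3.
# -order has to be given again here; without it ngram uses its default
# order of 3, whatever the LM file contains.
ppl_cmd() {
    printf 'ngram -order %s -lm %s -ppl %s\n' "$1" "$2" "$3"
}

for n in 4 5 6; do
    train_cmd "$n" train.txt "lm$n.gz"
    ppl_cmd "$n" "lm$n.gz" test.txt
done
```

Once SRILM is on your PATH and the placeholder file names are replaced,
the printed lines can be piped to sh to actually run them.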

If this is not the case then we may have a bug, but I can assure you that
we use order >= 4 all the time.

--Andreas

>
> The script I used to download "Robinson Crusoe", create the LMs and
> SRI-results:
>
> wget "http://www-i6.informatik.rwth-aachen.de/~gollan/make-lm-01.sh"
> chmod a+x make-lm-01.sh
> ./make-lm-01.sh
>
> Is there any error in my script?
> Thanks,
>  Stefan
