
Perplexity calculation: Strange behavior

From: Stefan Hahn <hahn at ADDRESS HIDDEN>
Date: Wed, 31 Aug 2005 20:31:45 +0200

Hi!

While building language models with the SRI Toolkit (V.1.4.3 and V.1.4.5) on
i686 Intel GNU/Linux, I encountered some strange behavior in the perplexity
calculation:
For any order greater than 3, the perplexity reported by ngram stays at the
same value (57.1554), which is higher than the order-3 result and does not
match our own tool.
For example, I used Defoe's "Robinson Crusoe" to create modified Kneser-Ney
discounted language models for orders 1 through 6 and calculated the perplexity
on the same text using "ngram" and our own software:

        +------------------------+
        |      perplexity        |
+-------+-------------+----------+
| order | SRI-Toolkit | our Tool |
+-------+-------------+----------+
|   1   |   394.79    | 394.794  |
+-------+-------------+----------+
|   2   |   68.0706   | 68.071   |
+-------+-------------+----------+
|   3   |   54.29     | 54.2903  |
+-------+-------------+----------+
|   4   |   57.1554   | 52.6306  |
+-------+-------------+----------+
|   5   |   57.1554   | 52.6502  |
+-------+-------------+----------+
|   6   |   57.1554   | 52.7033  |
+-------+-------------+----------+
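
For reference when comparing the two columns, here is a minimal sketch of how
a perplexity figure can be recomputed from the totals that "ngram -ppl" prints
(logprob, words, OOVs, sentences). It assumes the normalization described in
the SRILM documentation, ppl = 10^(-logprob / (words - OOVs + sentences)); the
function name "ppl" and the sample numbers are made up for illustration and
are not taken from the runs above.

```shell
#!/bin/sh
# ppl LOGPROB WORDS OOVS SENTENCES
# Recompute perplexity from a total base-10 log probability.
# Assumption: each sentence contributes one end-of-sentence token and
# OOV words are excluded from the count.
ppl() {
    awk -v lp="$1" -v w="$2" -v oov="$3" -v s="$4" 'BEGIN {
        n = w - oov + s                          # number of scored tokens
        printf "%.4f\n", exp(-lp / n * log(10))  # 10^(-lp / n)
    }'
}

# Illustrative values only: logprob -6 over 2 words + 1 sentence end
# gives 10^(6/3).
ppl -6 2 0 1
```

If another tool counts tokens differently (for example, without the sentence
ends), its perplexity will differ even for identical probabilities, so the
agreement at orders 1 to 3 suggests both tools use the same counts here.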

The script I used to download "Robinson Crusoe", create the LMs, and produce
the SRI results can be fetched and run as follows:

wget "http://www-i6.informatik.rwth-aachen.de/~gollan/make-lm-01.sh"
chmod a+x make-lm-01.sh
./make-lm-01.sh
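
The contents of make-lm-01.sh are not reproduced in this message. As a rough
sketch of the kind of commands such a script would issue per order, the
following prints one training and one evaluation command for each order; it
is NOT the actual script. The file names (crusoe.txt, crusoe.N.lm), the
function name "lm_commands", and the use of -interpolate are assumptions. One
point worth checking in any such script: "ngram" has its own -order option,
which defaults to 3, independent of the order given to "ngram-count".

```shell
#!/bin/sh
# Hypothetical sketch, not the contents of make-lm-01.sh.
# Prints the SRILM commands instead of running them; pipe to sh to execute.
TEXT=crusoe.txt   # assumed name of the downloaded text

lm_commands() {
    order=$1
    lm="crusoe.$order.lm"   # assumed naming scheme for the LM files
    # Train a modified Kneser-Ney discounted model of the given order.
    echo "ngram-count -order $order -kndiscount -interpolate -text $TEXT -lm $lm"
    # Evaluate on the same text; -order is passed explicitly because
    # ngram otherwise uses its default order of 3.
    echo "ngram -order $order -lm $lm -ppl $TEXT"
}

for n in 1 2 3 4 5 6; do
    lm_commands "$n"
done
```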

Is there any error in my script?
Thanks,
Stefan
