[SRILM User List] different perplexity

Md. Akmal Haidar akmalcuet00 at yahoo.com
Fri Aug 28 12:17:10 PDT 2009


 
Hi,
Thanks for your reply.
I need to compare two LM files by perplexity evaluation.
 
1. i)  ngram -lm general.lm -lambda .5 -mix-lm l1.lm -ppl test1.txt
   ii) ngram -lm general.lm -lambda .5 -mix-lm l1.lm -ppl test1.txt -bayes 0

   Both commands give the same perplexity. But with a different mixture LM:

2. i)  ngram -lm general.lm -lambda .5 -mix-lm l2.lm -ppl test1.txt
       ppl = 460
   ii) ngram -lm general.lm -lambda .5 -mix-lm l2.lm -ppl test1.txt -bayes 0
       ppl = 148

   Here the 2(ii) command gives a much lower perplexity.
 
Could you please tell me why the second command, with -bayes 0, gives a lower perplexity?
 
thanks
akmal
    
Md. Akmal Haidar wrote:
> Hi,
> I faced a problem in perplexity calculation.
> When I used the commands:
>
>   1) ngram -lm l1.lm -ppl t.txt
>   2) ngram -lm l2.lm -lambda 0 -mix-lm l1.lm -ppl t.txt
>
> the first gives a lower perplexity than the second one.
> Should the above commands give different perplexities?
They may, though not by much.

Realize that ngram -mix-lm WITHOUT the -bayes option performs an "ngram merging" that APPROXIMATES the result of interpolating the two LMs according to the classical formula. This is described in the SRILM paper:
> The ability to approximate class-based and interpolated Ngram
> LMs by a single word N-gram model deserves some discussion.
> Both of these operations are useful in situations where
> other software (e.g., a speech recognizer) supports only standard
> N-grams. Class N-grams are approximated by expanding class labels
> into their members (which can contain multiword strings) and
> then computing the marginal probabilities of word N-gram strings.
> This operation increases the number of N-grams combinatorially,
> and is therefore feasible only for relatively small models.
> An interpolated backoff model is obtained by taking the union
> of N-grams of the input models, assigning each N-gram the
> weighted average of the probabilities from those models (in some
> of the models this probability might be computed by backoff), and
> then renormalizing the new model. We found that such interpolated
> backoff models consistently give slightly lower perplexities
> than the corresponding standard word-level interpolated models.
> The reason could be that the backoff distributions are themselves
> obtained by interpolation, unlike in standard interpolation, where
> each component model backs off individually.
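In other words, the classical word-level interpolation that -lambda controls is, per word (my notation, not from the paper; lambda is the weight of the main -lm model),

  P_interp(w | h) = lambda * P_lm(w | h) + (1 - lambda) * P_mixlm(w | h)

and -bayes 0 applies exactly this formula at query time.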
So the results may differ because the merging process introduces new backoff nodes into the LM, and that may change some probabilities arrived at through backing off. However, if you use

  ngram -lm l2.lm -lambda 0 -mix-lm l1.lm -ppl  t.txt -bayes 0

you get exact interpolation and then the perplexities should be identical.
But you cannot save such an interpolated model back into a single ngram LM.
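If it helps to see the distinction, here is a minimal sketch in plain Python (not SRILM code; the function and variable names are just for illustration) of what exact word-level interpolation does, assuming you already have the per-word probabilities from the two component models:

  import math

  def interpolated_ppl(probs_lm, probs_mix, lam=0.5):
      """Perplexity under exact word-level interpolation of two LMs.

      probs_lm:  per-word probabilities from the main (-lm) model
      probs_mix: per-word probabilities from the -mix-lm model
      lam:       the -lambda weight of the main model
      """
      logprob = 0.0
      for p1, p2 in zip(probs_lm, probs_mix):
          # Each word's probability is the weighted sum of the two
          # models' probabilities, computed per word at query time.
          logprob += math.log10(lam * p1 + (1.0 - lam) * p2)
      return 10.0 ** (-logprob / len(probs_lm))

  # Made-up example probabilities for a three-word test set:
  print(interpolated_ppl([0.10, 0.02, 0.05], [0.20, 0.01, 0.05]))

The merged model written by -mix-lm without -bayes only approximates these weighted sums within a single backoff structure, which is where the differences you see can come from.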

In practice the difference should not matter (at least in my experience).

Andreas


>  thanks
>  Akmal
> 



________________________________

From: Andreas Stolcke <stolcke at speech.sri.com>
To: Md. Akmal Haidar <akmalcuet00 at yahoo.com>
Cc: srilm-user <srilm-user at speech.sri.com>
Sent: Friday, August 28, 2009 1:39:45 PM
Subject: Re: [SRILM User List] different perplexity



      