[SRILM User List] different perplexity

Andreas Stolcke stolcke at speech.sri.com
Fri Aug 28 10:39:45 PDT 2009


Md. Akmal Haidar wrote:
> Hi,
>  
> I faced a problem in perplexity calculation.
> When I used the commands:
>
>    1) ngram -lm l1.lm -ppl t.txt
>    2) ngram -lm l2.lm -lambda 0 -mix-lm l1.lm -ppl t.txt
>
> the first gives a lower perplexity than the second one.
> Should the above commands give different perplexities?
They may, though not by much.

Realize that ngram -mix-lm WITHOUT the -bayes option performs an "ngram
merging" that APPROXIMATES the result of interpolating the two LMs
according to the classical formula (sketched below the excerpt).  This is
described in the SRILM paper:
> The ability to approximate class-based and interpolated Ngram
> LMs by a single word N-gram model deserves some discussion.
> Both of these operations are useful in situations where
> other software (e.g., a speech recognizer) supports only standard
> N-grams. Class N-grams are approximated by expanding class labels
> into their members (which can contain multiword strings) and
> then computing the marginal probabilities of word N-gram strings.
> This operation increases the number of N-grams combinatorially,
> and is therefore feasible only for relatively small models.
> An interpolated backoff model is obtained by taking the union
> of N-grams of the input models, assigning each N-gram the
> weighted average of the probabilities from those models (in some
> of the models this probability might be computed by backoff), and
> then renormalizing the new model. We found that such interpolated
> backoff models consistently give slightly lower perplexities
> than the corresponding standard word-level interpolated models.
> The reason could be that the backoff distributions are themselves
> obtained by interpolation, unlike in standard interpolation, where
> each component model backs off individually.
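In other words, standard interpolation evaluates, for each test word w in
history h, roughly the following (writing P_l1 and P_l2 for the two
component models):

    P(w | h) = lambda * P_l2(w | h) + (1 - lambda) * P_l1(w | h)

With your -lambda 0 all the weight is on l1.lm.  By contrast, -mix-lm
without -bayes bakes this weighted average into a single backoff model
ahead of time, which is where the approximation comes in.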
So the results may differ because the merging process introduces new
backoff nodes into the LM, and those may change some probabilities
arrived at through backing off.  However, if you use

    ngram -lm l2.lm -lambda 0 -mix-lm l1.lm -ppl t.txt -bayes 0

you get exact interpolation, and since -lambda 0 puts all the weight on
l1.lm, the perplexities should then be identical.
But you cannot save such an interpolated model back into a single ngram LM.
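If you need a single model file, you can write out the approximate merged
LM instead.  A sketch of both variants (file names as in your example):

    # static merge (approximate); can be saved as a single backoff LM
    ngram -lm l2.lm -lambda 0 -mix-lm l1.lm -write-lm merged.lm
    ngram -lm merged.lm -ppl t.txt

    # dynamic interpolation (exact); cannot be written out as one ngram LM
    ngram -lm l2.lm -lambda 0 -mix-lm l1.lm -bayes 0 -ppl t.txt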

In practice the difference should not matter (at least in my experience).

Andreas


>  
> thanks
>  
> Akmal


