[SRILM User List] different perplexity

Andreas Stolcke stolcke at speech.sri.com
Fri Aug 28 18:04:55 PDT 2009


Md. Akmal Haidar wrote:
> the perplexity for 1(i) = 450 and 1(ii) = 450; both are the same
>
> by the way, some back-off weights for l2.lm are greater than 1.
My guess would be that l2.lm is not properly normalized.
Try running it with ngram -debug 3 -ppl on some test data.

When you interpolate with -bayes 0 no normalization is applied to the 
resulting model (it should be automatically normalized assuming the 
component models are normalized), so if a component model is unnormalized 
the resulting model will also be unnormalized and give bogus low perplexity.
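
For what it's worth, here is a rough stand-alone sanity check along the same 
lines (a throwaway sketch of my own, not part of SRILM; ngram -debug 3 -ppl 
remains the authoritative test). It sums the explicit 1-gram probabilities and 
counts back-off weights greater than 1 (log10 bow > 0) in an ARPA-format file, 
the two symptoms discussed in this thread:

    import re
    import sys

    def check(path):
        order = None                  # order of the current \N-grams: section
        unigram_mass = 0.0
        bows_over_one = 0
        with open(path, encoding="utf-8", errors="replace") as f:
            for raw in f:
                line = raw.strip()
                m = re.match(r"\\(\d+)-grams:", line)
                if m:                            # entering a \N-grams: section
                    order = int(m.group(1))
                    continue
                if not line:
                    continue
                if line.startswith("\\"):        # \data\ or \end\
                    order = None
                    continue
                if order is None:                # header lines like "ngram 1=..."
                    continue
                fields = line.split()
                logprob = float(fields[0])
                if order == 1 and logprob > -98:   # skip conventional -99 entries
                    unigram_mass += 10.0 ** logprob
                if len(fields) == order + 2:       # trailing back-off weight present
                    if float(fields[-1]) > 0.0:    # log10 bow > 0 means bow > 1
                        bows_over_one += 1
        print("sum of explicit 1-gram probs: %.4f (should be close to 1)" % unigram_mass)
        print("back-off weights > 1: %d" % bows_over_one)

    if __name__ == "__main__":
        check(sys.argv[1])

(Usage would be something like: python check_arpa.py l2.lm, assuming l2.lm is 
in ARPA text format.)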

Andreas

>
> thanks
> Akmal
>
>
> ------------------------------------------------------------------------
> *From:* Andreas Stolcke <stolcke at speech.sri.com>
> *To:* Md. Akmal Haidar <akmalcuet00 at yahoo.com>
> *Sent:* Friday, August 28, 2009 4:55:16 PM
> *Subject:* Re: [SRILM User List] different perplexity
>
> Md. Akmal Haidar wrote:
> >
> >  Hi,
> > Thanks for your reply.
> > I need to compare two lm file by perplexity evaluation.
> >  1. i)  ngram -lm general.lm -lambda .5 -mix-lm l1.lm -ppl test1.txt
> >     ii) ngram -lm general.lm -lambda .5 -mix-lm l1.lm -ppl test1.txt -bayes 0
> >         Both commands give the same perplexity, but when
> >  2. i)  ngram -lm general.lm -lambda .5 -mix-lm l2.lm -ppl test1.txt
> >         ppl=460
> >     ii) ngram -lm general.lm -lambda .5 -mix-lm l2.lm -ppl test1.txt -bayes 0
> >         ppl=148
> >     the 2(ii) command gives lower perplexity.
> That is quite odd.  What is the perplexity for 1(i) and 1(ii)?
>
> andreas
>
> >  could you please tell me why the second one gives lower perplexity?
> >  thanks
> > akmal
> >    
> ------------------------------------------------------------------------
> > *From:* Andreas Stolcke <stolcke at speech.sri.com>
> > *To:* Md. Akmal Haidar <akmalcuet00 at yahoo.com>
> > *Cc:* srilm-user <srilm-user at speech.sri.com>
> > *Sent:* Friday, August 28, 2009 1:39:45 PM
> > *Subject:* Re: [SRILM User List] different perplexity
> >
> > Md. Akmal Haidar wrote:
> > > Hi,
> > > I faced a problem in perplexity calculation.
> > > When I used the commands:
> > >   1) ngram -lm l1.lm -ppl t.txt
> > >   2) ngram -lm l2.lm -lambda 0 -mix-lm l1.lm -ppl t.txt
> > > the first gives lower perplexity than the second one.
> > > Should the above commands give different perplexity?
> > They may, though not by much.
> >
> > Realize that ngram -mix-lm WITHOUT the -bayes option performs an 
> > "ngram merging" that APPROXIMATES the result of interpolating the two 
> > LMs according to the classical formula.  This is described in the 
> > SRILM paper:
> > > The ability to approximate class-based and interpolated Ngram
> > > LMs by a single word N-gram model deserves some discussion.
> > > Both of these operations are useful in situations where
> > > other software (e.g., a speech recognizer) supports only standard
> > > N-grams. Class N-grams are approximated by expanding class labels
> > > into their members (which can contain multiword strings) and
> > > then computing the marginal probabilities of word N-gram strings.
> > > This operation increases the number of N-grams combinatorially,
> > > and is therefore feasible only for relatively small models.
> > > An interpolated backoff model is obtained by taking the union
> > > of N-grams of the input models, assigning each N-gram the
> > > weighted average of the probabilities from those models (in some
> > > of the models this probability might be computed by backoff), and
> > > then renormalizing the new model. We found that such interpolated
> > > backoff models consistently give slightly lower perplexities
> > > than the corresponding standard word-level interpolated models.
> > > The reason could be that the backoff distributions are themselves
> > > obtained by interpolation, unlike in standard interpolation, where
> > > each component model backs off individually.
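> >
> > In code terms the merging amounts to roughly the following (a toy sketch 
> > using my own data structures, not SRILM's implementation; the back-off 
> > renormalization is only indicated in a comment):
> >
> >     import math
> >
> >     def backoff_logprob(lm, ngram):
> >         # lm = {'p': {ngram tuple: log10 prob}, 'bow': {context tuple: log10 bow}}
> >         if ngram in lm['p']:
> >             return lm['p'][ngram]
> >         if len(ngram) == 1:
> >             return float('-inf')       # OOV; a real LM would use <unk>
> >         return lm['bow'].get(ngram[:-1], 0.0) + backoff_logprob(lm, ngram[1:])
> >
> >     def merge(lm1, lm2, lam):
> >         # union of n-grams, each given the weighted average of the
> >         # (possibly backed-off) probabilities from the two models
> >         merged = {'p': {}, 'bow': {}}
> >         for ngram in set(lm1['p']) | set(lm2['p']):
> >             p = lam * 10.0 ** backoff_logprob(lm1, ngram) \
> >                 + (1.0 - lam) * 10.0 ** backoff_logprob(lm2, ngram)
> >             merged['p'][ngram] = math.log10(p)
> >         # the real tool then recomputes back-off weights so that every
> >         # context renormalizes; that step is omitted here, and it is
> >         # where the merged model can differ slightly from true
> >         # word-level interpolation
> >         return merged
> >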
> > So the result may differ because the merging process introduces new 
> > backoff nodes into the LM, and that may change some probabilities 
> > arrived at through backing off. However, if you use
> >
> >  ngram -lm l2.lm -lambda 0 -mix-lm l1.lm -ppl  t.txt -bayes 0
> >
> > you get exact interpolation and then the perplexities should be 
> > identical.
> > But you cannot save such an interpolated model back into a single 
> > ngram LM.
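> >
> > By contrast, -bayes 0 applies the same weighted average at query time, 
> > without ever building a merged model (again a toy sketch, reusing 
> > backoff_logprob from the sketch above):
> >
> >     def bayes_logprob(lm1, lm2, lam, ngram):
> >         # exact word-level interpolation, as used when scoring with -bayes 0
> >         p = lam * 10.0 ** backoff_logprob(lm1, ngram) \
> >             + (1.0 - lam) * 10.0 ** backoff_logprob(lm2, ngram)
> >         return math.log10(p)
> >
> > Since -lambda weights the main -lm model, -lambda 0 puts all the weight on 
> > the -mix-lm component, so the -bayes 0 command above should reproduce 
> > ngram -lm l1.lm -ppl t.txt exactly.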
> >
> > In practice the difference should not matter (at least in my 
> > experience).
> >
> > Andreas
> >
> >
> > >  thanks
> > >  Akmal
> > >
> > > > 
> ------------------------------------------------------------------------
> > >
> > > _______________________________________________
> > > SRILM-User site list
> > > SRILM-User at speech.sri.com
> > > http://www.speech.sri.com/mailman/listinfo/srilm-user
> >
> >
> > ------------------------------------------------------------------------
> >
> > _______________________________________________
> > SRILM-User site list
> > SRILM-User at speech.sri.com
> > http://www.speech.sri.com/mailman/listinfo/srilm-user
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> SRILM-User site list
> SRILM-User at speech.sri.com
> http://www.speech.sri.com/mailman/listinfo/srilm-user


