[SRILM User List] language models

Andreas Stolcke stolcke at speech.sri.com
Wed Aug 19 17:05:40 PDT 2009


Md. Akmal Haidar wrote:
> Hi,
> I have three LM files.
> The first one I got by ngram-count.
> The second one I got by applying some Matlab processing to the first.
> The third one I got by renormalizing the second one using the ngram -renorm
> option.
>  
> In creating the third one, I got messages like: BOW denominator
> for context "been has" is -0.382151<=0, numerator is 0.846874
That's expected if you changed the probabilities such that they sum to > 
1 for a given context.
ngram -renorm cannot deal with this.  It simply recomputes the backoff 
weights to normalize the model; it won't change the existing ngram 
probabilities.  Obviously, if the explicit ngram probabilities alone sum 
to > 1, there is no way to assign backoff weights such that the model is 
normalized, hence the above message.
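To illustrate the arithmetic behind that message (a hedged sketch of how a backoff weight is computed in an ARPA-style model, not SRILM's actual code; all names and numbers here are hypothetical):

```python
# Sketch of computing a backoff weight (BOW) for one context.
# BOW = leftover higher-order mass / leftover lower-order mass.

def backoff_weight(explicit_probs, lower_order_probs):
    """explicit_probs: explicit p(w | context) for every word w seen in
    this context; lower_order_probs: the lower-order estimates for the
    same words."""
    numerator = 1.0 - sum(explicit_probs)
    denominator = 1.0 - sum(lower_order_probs)
    if numerator <= 0 or denominator <= 0:
        # This is the situation behind "BOW ... <= 0": one level's
        # probabilities already sum to 1 or more, so no backoff weight
        # can make the context normalize.
        raise ValueError(f"BOW numerator {numerator:.6f}, "
                         f"denominator {denominator:.6f}: not normalizable")
    return numerator / denominator

# Healthy context: explicit mass 0.7 leaves 0.3 for backed-off words.
print(round(backoff_weight([0.4, 0.3], [0.2, 0.2]), 4))  # 0.5

# Broken context: explicit probabilities sum to 1.3, as can happen
# after an unnormalized manipulation -- no valid BOW exists.
try:
    backoff_weight([0.8, 0.5], [0.2, 0.2])
except ValueError as err:
    print("error:", err)
```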
>  
> The second and third ones give very low perplexities (7.53 and 5.70).
> The first one gives 73.73.
That's right, if your probabilities don't sum to 1 (over the entire 
vocabulary, for all contexts) perplexities are meaningless.

You can run ngram -debug 3 -ppl to check that probabilities are 
normalized for all contexts occurring in your test set.
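For intuition, here is a hedged sketch of what that sanity check amounts to for a backoff bigram model (the model, vocabulary, and numbers are made up for illustration; this is not SRILM code):

```python
# In a backoff bigram model, p(w | h) is the explicit bigram probability
# if (h, w) is in the model, else bow(h) * p(w).  Summed over the whole
# vocabulary for any history h, the result must be 1.

def bigram_prob(w, h, bigrams, unigrams, bows):
    if (h, w) in bigrams:
        return bigrams[(h, w)]
    return bows.get(h, 1.0) * unigrams[w]

vocab = ["a", "b", "c", "d"]
unigrams = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
bigrams = {("a", "a"): 0.5, ("a", "b"): 0.2}
# bow("a") = (1 - (0.5 + 0.2)) / (1 - (0.4 + 0.3))
bows = {"a": (1 - 0.7) / (1 - 0.7)}

total = sum(bigram_prob(w, "a", bigrams, unigrams, bows) for w in vocab)
print(round(total, 10))  # 1.0 -- this context is properly normalized
```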

I don't have a simple solution for your problem.  Since you manipulated 
the probabilities, you have to figure out a way to get them normalized!  
I suggest you use the srilm-user mailing list if you want to seek 
further advice on this.  But you would first have to explain in more detail 
how you assign your probabilities.
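One possible direction (a sketch under the assumption that you can post-process your Matlab output before writing the ARPA file; the 0.95 target is an arbitrary illustrative choice, not an SRILM setting): scale each context's explicit probabilities down so they sum to less than 1, leaving mass for backed-off words, then let ngram -renorm recompute the backoff weights.

```python
def rescale_context(explicit_probs, target_mass=0.95):
    """If a context's explicit probabilities sum to 1 or more, scale them
    so they sum to target_mass, leaving 1 - target_mass for backoff."""
    total = sum(explicit_probs.values())
    if total < 1.0:
        return dict(explicit_probs)  # already leaves mass for backoff
    factor = target_mass / total
    return {w: p * factor for w, p in explicit_probs.items()}

broken = {"a": 0.8, "b": 0.5}          # sums to 1.3: not normalizable
fixed = rescale_context(broken)
print(round(sum(fixed.values()), 6))   # now sums to the target mass
```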

Andreas

>  
> Could you please tell me what these messages mean?
>  
> Thanks & Regards
> Haidar
>
>  
> ------------------------------------------------------------------------
> *From:* Andreas Stolcke <stolcke at speech.sri.com>
> *To:* Md. Akmal Haidar <akmalcuet00 at yahoo.com>
> *Sent:* Thursday, August 13, 2009 1:24:41 PM
> *Subject:* Re: language models
>
>
> In message <92580.94445.qm at web38002.mail.mud.yahoo.com> you wrote:
> >
> > Dear Andreas,
> > I attached 2 LM files.
> > Here, train3.lm is the original LM file, which I got by applying 
> > ngram-count.
>
> So does that file have probabilities summing to 1?
> I would think not.
>
> > ntrain3.lm is the modified LM, which I got by some Matlab programming. 
> > But here the sum of the seen 2-gram probabilities sharing a common 
> > 1-gram is greater than 1.
>
> I cannot help you debug your Matlab script, if that's what's giving
> you unnormalized probabilities.
>
> >
> > If I change the 1-gram backoff weights to make the sum of the 2-gram 
> > (seen and unseen) probabilities sharing a common 1-gram equal to 1, 
> > will that method be correct?
>
> Yes.
>
> ngram -renorm will also do this for you.
>
> Andreas
>
>


