[SRILM User List] language models

Andreas Stolcke stolcke at speech.sri.com
Thu Aug 27 13:38:35 PDT 2009


Md. Akmal Haidar wrote:
>
>  Hi,
> Thanks for your reply.
> I need to mix 20 topic models, but ngram accepts at most 10 LM files in one run.
> I used the following commands (t: topic, w: topic weight):
> ngram -lm t1.lm w1 -mix-lm t2.lm w2 -mix-lm2 t3.lm w3 
> .............-mix-lm9 t10.lm w10 -write-lm t1to10.lm
> ngram -lm t11.lm w11 -mix-lm t12.lm w12 -mix-lm2 t13.lm w13 
> .............-mix-lm9 t20.lm w20 -write-lm t11to20.lm
> ngram -lm t1to10.lm .5 -mix-lm t11to20.lm .5 -write-lm t1to20.lm
You can mix the models recursively.  To mix three models L1, L2, L3 with
weights w1, w2, w3 (w1 + w2 + w3 = 1),
you first build

       L12 = w1/(w1+w2) L1 + w2/(w1+w2) L2

and then

       L = (w1 + w2) L12 + w3 L3.

I'll leave it to you to generalize this to a larger number of models.
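
For concreteness, a minimal sketch of that recursion for three models,
with example weights w1=0.5, w2=0.3, w3=0.2 (-lambda sets the weight of
the -lm model; the -mix-lm model gets the remainder):

       # L12 = (w1/(w1+w2)) L1 + (w2/(w1+w2)) L2, and 0.5/0.8 = 0.625
       ngram -lm t1.lm -lambda 0.625 -mix-lm t2.lm -write-lm t12.lm

       # L = (w1+w2) L12 + w3 L3, with w1+w2 = 0.8
       ngram -lm t12.lm -lambda 0.8 -mix-lm t3.lm -write-lm t123.lm

Repeating the second step with renormalized weights folds in one more
model per run.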

Please direct future questions of this nature to the srilm-user mailing 
list.

Andreas

>
> Could you please tell me if the commands are correct for mixing LM files?
>
> Thanks
> Akmal
>
> ------------------------------------------------------------------------
> *From:* Andreas Stolcke <stolcke at speech.sri.com>
> *To:* Md. Akmal Haidar <akmalcuet00 at yahoo.com>
> *Cc:* srilm-user <srilm-user at speech.sri.com>
> *Sent:* Wednesday, August 19, 2009 8:05:40 PM
> *Subject:* Re: language models
>
> Md. Akmal Haidar wrote:
> > Hi,
> > I have 3 LM files.
> > The first one I got by ngram-count.
> > The second one is by applying some Matlab processing to the first.
> > The third one is by renormalizing the second one using the ngram
> > -renorm option.
> > In creating the third one, I got messages like: BOW denominator for
> > context "been has" is -0.382151<=0, numerator is 0.846874
> That's expected if you changed the probabilities such that they sum to
> more than 1 for a given context.
> ngram -renorm cannot deal with this.  It simply recomputes the backoff
> weights to normalize the model, but it won't change the existing ngram
> probabilities.  Obviously, if the explicit ngram probabilities alone
> sum to more than 1, there is no way to assign backoff weights such that
> the model is normalized, hence the above message.
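>
> (Concretely, assuming the standard Katz backoff computation: the
> backoff weight of a context h is
>
>        bow(h) = (1 - sum of p(w|h) over seen w)
>                 / (1 - sum of p(w|h') over the same w),
>
> where h' is the truncated context.  If either sum exceeds 1, the
> corresponding term becomes <= 0, and no valid backoff weight exists;
> that is what the BOW message reports.)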
> > The second and third ones give very low perplexities (7.53 and
> > 5.70).  The first one gives 73.73.
> That's right, if your probabilities don't sum to 1 (over the entire 
> vocabulary, for all contexts) perplexities are meaningless.
>
> You can run ngram -debug 3 -ppl to check that probabilities are 
> normalized for all contexts occurring in your test set.
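>
> A minimal sketch of that check (test.txt is a placeholder for your
> test set; ntrain3.lm is the modified model mentioned below):
>
>        ngram -lm ntrain3.lm -ppl test.txt -debug 3
>
> At this debug level, ngram also prints the probability sum over the
> vocabulary for each context, which should be 1 for a normalized model.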
>
> I don't have a simple solution for your problem.  Since you
> manipulated the probabilities, you have to figure out a way to get them
> normalized!  I suggest you use the srilm-user mailing list if you
> want to seek further advice on this.  But you would first have to explain
> in more detail how you assign your probabilities.
>
> Andreas
>
> > Could you please tell me what these messages mean?
> >  Thanks & Regards
> > Haidar
> >
> >  
> ------------------------------------------------------------------------
> > *From:* Andreas Stolcke <stolcke at speech.sri.com>
> > *To:* Md. Akmal Haidar <akmalcuet00 at yahoo.com>
> > *Sent:* Thursday, August 13, 2009 1:24:41 PM
> > *Subject:* Re: language models
> >
> >
> > In message <92580.94445.qm at web38002.mail.mud.yahoo.com> you wrote:
> > >
> > > Dear Andreas,
> > > I attached 2 LM files.
> > > Here, train3.lm is the original LM file, which I got by applying
> > > ngram-count.
> >
> > So does that file have probabilities summing to 1?
> > I would think not.
> >
> > > ntrain3.lm is the modified LM, which I got by some Matlab
> > > programming.  But here the sum of the seen 2-gram probabilities
> > > sharing a common 1-gram is greater than 1.
> >
> > I cannot help you debug your Matlab script if that's what's giving
> > you unnormalized probabilities.
> >
> > >
> > > If I changed the 1-gram backoff weights to make the sum of the
> > > 2-gram (seen & unseen) probabilities sharing a common 1-gram equal
> > > to 1, would the method be correct?
> >
> > Yes.
> >
> > ngram -renorm will also do this for you.
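> >
> > For instance (the output name is a placeholder):
> >
> >        ngram -lm ntrain3.lm -renorm -write-lm ntrain3-renorm.lm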
> >
> > Andreas
> >
> >
>
>


