LM missing back-off probabilities

Andreas Stolcke stolcke at speech.sri.com
Wed May 25 15:49:35 PDT 2005


In message <4294E6CE.3020104 at lium.univ-lemans.fr> you wrote:
> I hope this message can help you.
> 
> To use CMU Sphinx with an LM estimated with SRILM, you have to use two 
> tools provided with the SRILM toolkit:
> 
> - add-dummy-bows: this program adds the 'missing' back-off weights (in 
> fact, when these weights are equal to 0, ngram-count doesn't print them)
> - sort-lm: this program sorts n-grams in lexical order (lm3gdmp works 
> only if the n-grams are sorted. In fact, 2-3-...-k-grams have to be 
> sorted in the same order).
> 
> These two tools are written in awk (awk or gawk has to be installed 
> on your computer).
> 
> -- Yannick
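
To make the first step concrete, here is a minimal Python sketch of what
adding dummy back-off weights to an ARPA-format LM amounts to. It is not
the awk script shipped with SRILM: the function name add_dummy_bows and
the tiny LM below (including all its probabilities) are made up for
illustration, and real files may need more careful parsing.

```python
def add_dummy_bows(arpa_lines):
    """Append an explicit back-off weight of 0 to n-gram lines that lack one.

    Illustrative sketch only, not SRILM's own implementation.
    """
    # First pass: find the highest n-gram order declared in the header.
    max_order = 0
    for line in arpa_lines:
        s = line.strip()
        if s.startswith("ngram ") and "=" in s:
            max_order = max(max_order, int(s[len("ngram "):].split("=")[0]))

    out = []
    order = 0  # 0 means "not inside an n-gram section"
    for line in arpa_lines:
        s = line.strip()
        if s.startswith("\\") and s.endswith("-grams:"):
            order = int(s[1:s.index("-")])
            out.append(line)
            continue
        if s == "\\end\\":
            order = 0
        fields = s.split()
        # An n-gram line is: logprob, then `order` words, then an optional
        # back-off weight. Highest-order n-grams never carry one.
        if 0 < order < max_order and len(fields) == order + 1:
            out.append(s + "\t0")
        else:
            out.append(line)
    return out


# Hypothetical toy LM; the numbers are invented for the example.
lm = [
    "\\data\\",
    "ngram 1=3",
    "ngram 2=2",
    "",
    "\\1-grams:",
    "-0.5\t</s>",
    "-99\t<s>\t-0.3",
    "-0.5\ttricks",
    "",
    "\\2-grams:",
    "-0.2\t<s> tricks",
    "-0.2\ttricks </s>",
    "",
    "\\end\\",
]
fixed = add_dummy_bows(lm)
print("\n".join(fixed))
```

In this toy input only the unigrams "</s>" and "tricks" gain a trailing 0;
the bigrams are the highest order here and are left untouched.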

I agree with the above.
But I think there is something else going on in the case described.
The default minimum ngram count for trigrams is 2, so trigrams
occurring only once in your data will not show up in the LM.

Use
	 ngram-count -gt3min 1 ....

and you will (hopefully) find that the trigram "accounting tricks </s>"
shows up in the LM, along with all its prefixes. 
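
The effect of the count cutoff can be illustrated with a small Python
sketch. This is only a rough analogue of what -gt3min does, not SRILM's
actual code; the helper names trigram_counts and apply_min_count are
hypothetical, and the one-sentence corpus is the example from the
original question.

```python
from collections import Counter


def trigram_counts(sentences):
    """Count trigrams over whitespace-tokenized sentences.

    Sentences are assumed to already contain <s> and </s>.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(len(tokens) - 2):
            counts[tuple(tokens[i:i + 3])] += 1
    return counts


def apply_min_count(counts, min_count):
    """Keep only n-grams seen at least min_count times."""
    return {ngram: c for ngram, c in counts.items() if c >= min_count}


corpus = [
    "<s> we cannot afford to fight the war against poverty "
    "with accounting tricks </s>"
]
counts = trigram_counts(corpus)
target = ("accounting", "tricks", "</s>")

kept_default = apply_min_count(counts, 2)  # analogue of the default cutoff
kept_all = apply_min_count(counts, 1)      # analogue of -gt3min 1

print(target in kept_default)  # False: the trigram occurs only once
print(target in kept_all)      # True
```

With a minimum count of 2 every trigram of this single sentence is
dropped, so nothing extends the bigram "accounting tricks"; with a
minimum count of 1 the trigram "accounting tricks </s>" is kept.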

--Andreas 

> 
> 
> Goldee Udani a écrit :
> 
> > Hi there,
> >
> > I am sorry if this problem has already been addressed before on this 
> > forum.
> >
> > I am trying to generate a small LM for use in the Sphinx speech 
> > recognition system, but the back-off probabilities for every ngram 
> > occurring at the end of a sentence are missing.
> > For example -
> >
> > <s> we cannot afford to fight the war against poverty with accounting 
> > tricks </s>
> >
> > For a trigram LM, it doesn't generate back-off probabilities for 
> > "tricks" (unigram) and "accounting tricks" (bigram). This tends to 
> > happen for all the sentences in the test set taken from the corpus.
> >
> > I am trying to use the "ngram-count" script with Witten-Bell 
> > discounting applied to all n-grams in a trigram model.
> >
> > If any of you have faced a similar problem before, I would appreciate 
> > it if you could help me out here.
> >
> > Thanks,
> > Goldee
> >
> >
> 
> 
> 



