FLM

Andreas Stolcke stolcke at speech.sri.com
Mon Apr 9 22:47:45 PDT 2007


Antoine Ghaoui wrote:
> Hello,
>
> I'm using FLM to test some models.
>
> I'm using the same data and the same vocabulary in both tools, 
> ngram-count and fngram-count.
> I'm not able to generate the same trigram model.
> The numbers of bigrams and trigrams in the generated LM files are different.
>
> using ngram-count, I'm getting: 
> \data\
> ngram 1=315
> ngram 2=23800
> ngram 3=120408
>
> using fngram-count, I'm getting:
> \data\
> ngram 0x0=315
> ngram 0x1=23523
> ngram 0x2=0
> ngram 0x3=86366
>
> ngram-count is run with the default parameters, and the 
> factor file for fngram-count is:
>
> ##rule trigram
> 1
> U : 2 U(-1) U(-2) ntextfile.flm.cnt ntextfile.flm.lm 3
> U1U2 U2 wbdiscount gtmin 3 interpolate
> U1 U1 wbdiscount gtmin 1 interpolate
> 0 0
>
> Which parameters should I use in the factor file to get the 
> same LM output?
For one thing, the default gtmin values in ngram-count are

unigrams  1
bigrams   1
trigrams  2
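To illustrate why the trigram counts differ, here is a minimal Python sketch (not SRILM code) that counts n-grams in a toy corpus and keeps only those seen at least gtmin times. It compares the ngram-count defaults listed above with the cutoffs in the quoted factor file (gtmin 1 on the bigram node, gtmin 3 on the trigram node). It assumes that an n-gram whose count is below gtmin for its order is left out of the model, and that the "0 0" unigram node, which sets no gtmin, behaves as if gtmin were 1. The function names and the corpus are hypothetical, and sentence-boundary tokens and SRILM's other special cases are ignored.

```python
from collections import Counter

# Default minimum counts in ngram-count, per n-gram order (from the reply above).
NGRAM_COUNT_GTMIN = {1: 1, 2: 1, 3: 2}

# Cutoffs from the quoted factor file; the "0 0" unigram node sets none,
# so 1 is ASSUMED here.
FLM_GTMIN = {1: 1, 2: 1, 3: 3}


def ngrams(tokens, n):
    """Yield all n-grams of order n (as tuples) from a token list."""
    return zip(*(tokens[i:] for i in range(n)))


def surviving_counts(sentences, gtmin):
    """Count n-gram types per order, keeping those seen at least gtmin[n] times.

    Simplified illustration: no <s>/</s> padding, no SRILM special cases.
    """
    result = {}
    for n, cutoff in gtmin.items():
        counts = Counter()
        for sent in sentences:
            counts.update(ngrams(sent.split(), n))
        result[n] = sum(1 for c in counts.values() if c >= cutoff)
    return result


# Hypothetical toy corpus: trigram counts are abc=4, bcd=3, bce=2, cde=1.
corpus = [
    "a b c d",
    "a b c e",
    "a b c d",
    "a b c e",
    "b c d e",
]

print(surviving_counts(corpus, NGRAM_COUNT_GTMIN))  # {1: 5, 2: 5, 3: 3}
print(surviving_counts(corpus, FLM_GTMIN))          # {1: 5, 2: 5, 3: 2}
```

With these toy counts, raising the trigram cutoff from 2 to 3 drops the trigram seen exactly twice, which matches the direction of the gap between 120408 and 86366 trigrams in the question. Setting the trigram node's gtmin to 2 in the factor file would align that cutoff with the ngram-count default. The sketch does not explain the bigram difference, since both configurations use a bigram cutoff of 1, and it does not model other differences between the tools, such as wbdiscount versus ngram-count's default Good-Turing discounting.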

Andreas