Where have all the 3-grams gone?

Andreas Stolcke stolcke at speech.sri.com
Tue Mar 18 14:43:42 PST 2003


In message <Pine.LNX.4.44.0303182241510.28027-100000 at linux14.phonetik.uni-muenchen.de>
you wrote:
> Hi Andreas,
> 
> experimenting a little with SRILM, I found that ngram-count does not enter
> trigrams that occur only once into the language model, while it does so
> with bigrams. The command
> 
> echo "the man hit the ball" | ngram-count -order 3 -text - -cdiscount3 0.5
> -cdiscount2 0.5 -cdiscount1 0.5 -unk -lm test_C3gram.lm

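An N-gram is included in the LM only if its count reaches a minimum
threshold. In your example every trigram occurs just once, so none of
them reaches the trigram threshold. The default minimum counts are as
follows: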

1grams	1
2grams	1
3grams	2
4grams	2

You can use the -gt1min, -gt2min, etc. options to change these thresholds
at will. (Maybe counter-intuitively, given the "gt" for Good-Turing in
their names, these options apply to all smoothing schemes, including the
-cdiscount options you are using.)

--Andreas 
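
As an illustration of the advice above, here is a minimal sketch of the
command from the question with the trigram threshold lowered to 1. It
assumes ngram-count is on the PATH (the guard simply skips the run if it
is not) and reuses the output file name from the question. The -gt3min
flag follows the -gtNmin naming pattern given in the reply.

```shell
# Toy corpus and discounting options taken from the original question.
# -gt3min 1 lowers the trigram minimum count from its default of 2,
# so trigrams seen only once are kept.
lm=test_C3gram.lm
if command -v ngram-count >/dev/null 2>&1; then
    echo "the man hit the ball" |
        ngram-count -order 3 -text - \
            -cdiscount1 0.5 -cdiscount2 0.5 -cdiscount3 0.5 \
            -gt3min 1 -unk -lm "$lm"
    # Show the trigram count line from the ARPA header of the new model.
    grep '^ngram 3=' "$lm"
fi
```

If the threshold took effect, the "ngram 3=" line in the header should
report a non-zero number of trigrams.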



