once occurring trigram discarded

Andreas Stolcke
Mon Jan 31 10:01:16 PST 2005


In message <41FE6F03.5040103 at irisa.fr> you wrote:
> Hi,
> I built a trigram model using modified Kneser-Ney smoothing with
> interpolation, and I don't understand why there are only 5828 trigrams in
> the model when there are 102520 trigrams in the corpus. I think the
> discarded trigrams are the ones that occur just once, because there are
> 96692 trigrams occurring once, which is exactly the difference between the
> trigrams in the corpus and the trigrams in the model. I tried other
> smoothing methods, and even no smoothing, but the trigrams are discarded
> every time.
> I don't understand why, since the bigrams occurring once (there are 58764
> of them) are not discarded in the bigram model I built using modified
> Kneser-Ney smoothing with interpolation.

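The default cutoff (minimum count) for trigrams and higher-order N-grams
is 2, so trigrams that occur only once are left out of the model.
The default cutoff for unigrams and bigrams is 1, which is why your
once-occurring bigrams were kept.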

Use ngram-count -gt3min 1 to include all trigrams.

ngram-count -help displays the default values for all the options.
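
To make the effect of the cutoff concrete, here is a minimal Python sketch
on a toy corpus. It is not SRILM code: the helper names count_ngrams and
apply_cutoff are made up for illustration, and it only models which N-grams
survive a minimum-count threshold, not how SRILM uses the counts during
smoothing.

```python
from collections import Counter


def count_ngrams(tokens, n):
    """Count every n-gram of order n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def apply_cutoff(counts, min_count):
    """Keep only the n-grams that occur at least min_count times.

    Hypothetical helper that mimics the idea behind a minimum-count
    cutoff such as -gt3min; it is not part of SRILM.
    """
    return {ngram: c for ngram, c in counts.items() if c >= min_count}


# Toy corpus: three trigrams occur twice, one occurs once.
tokens = "a b c a b c a b d".split()
trigrams = count_ngrams(tokens, 3)

kept_default = apply_cutoff(trigrams, 2)  # analogous to the default cutoff of 2
kept_all = apply_cutoff(trigrams, 1)      # analogous to -gt3min 1

singletons = [ngram for ngram, c in trigrams.items() if c == 1]
print(len(trigrams), len(kept_default), len(kept_all), len(singletons))
```

With a cutoff of 2 the once-occurring trigram is dropped, and with a cutoff
of 1 every distinct trigram is kept. The numbers in the question follow the
same pattern: 102520 - 96692 = 5828.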

--Andreas 



