Smoothing Error

Andreas Stolcke stolcke at speech.sri.com
Sun Oct 17 09:06:49 PDT 2004


In message <8903a7f304101621147622e36d at mail.gmail.com> you wrote:
> Hi 
> 
> I am new to SRILM. While trying to build a language model, I am
> getting the following error:
> 
> > one of required modified KneserNey count-of-counts is zero
> > error in discount estimator for order 1
> 
> I used the following command to build the model:
> 
> ../../tools/SRILM/bin/i686/ngram-count -order 4 -text temp.txt
> -kndiscount -kndiscount1 -kndiscount2 -kndiscount3 -kndiscount4
> -interpolate -interpolate1 -interpolate2 -interpolate3 -interpolate4
> -lm temp.lm  -gt1min 0 -gt2min 0 -gt3min 0 -gt4min 0 -debug 1
> 
> Output was:
> 
> temp.txt: line 50000: 50000 sentences, 6348678 words, 0 OOVs
> 0 zeroprobs, logprob= 0 ppl= 1 ppl1= 1
> modifying 1-gram counts for Kneser-Ney smoothing
> Kneser-Ney smoothing 1-grams
> n1 = 3
> n2 = 0
> n3 = 1
> n4 = 0
> one of required modified KneserNey count-of-counts is zero
> error in discount estimator for order 1

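The count-of-count statistics of your data are not suitable for
KN smoothing: modified Kneser-Ney estimates its discounts from n1
through n4 (the number of N-grams occurring exactly 1, 2, 3, and 4
times), and all of them must be nonzero. Your statistics are also very
odd: you have 6348678 words, yet only 3 words occurring once, 0 words
occurring twice, etc. I suspect your data was artificially generated
or manipulated in some way.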
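To make the failure concrete, here is a minimal Python sketch of the modified Kneser-Ney discount estimates in the form given by Chen and Goodman. It is an assumption, not something confirmed here, that SRILM's estimator uses equivalent formulas; the helper names `count_of_counts` and `modified_kn_discounts` are hypothetical. Also note that SRILM computes the lower-order count-of-counts on the KN-modified counts (hence the "modifying 1-gram counts" line in the output), whereas `count_of_counts` below uses raw counts for simplicity.

```python
from collections import Counter


def count_of_counts(tokens, max_r=4):
    """Return [n1, ..., n_max_r]: how many types occur exactly r times.

    Hypothetical helper; uses raw counts, not KN-modified counts.
    """
    freq = Counter(tokens)            # type -> count
    coc = Counter(freq.values())      # count -> number of types with that count
    return [coc.get(r, 0) for r in range(1, max_r + 1)]


def modified_kn_discounts(n1, n2, n3, n4):
    """Return (D1, D2, D3+) from count-of-counts n1..n4.

    Chen & Goodman style estimates for modified Kneser-Ney. Every n_r
    must be nonzero, otherwise a ratio below divides by zero (or the
    estimate is meaningless), so we refuse up front.
    """
    if 0 in (n1, n2, n3, n4):
        raise ValueError("one of required modified KneserNey count-of-counts is zero")
    y = n1 / (n1 + 2 * n2)
    d1 = 1 - 2 * y * n2 / n1
    d2 = 2 - 3 * y * n3 / n2
    d3plus = 3 - 4 * y * n4 / n3
    return d1, d2, d3plus
```

With the statistics from the output above (n1=3, n2=0, n3=1, n4=0), `modified_kn_discounts(3, 0, 1, 0)` raises, since n2 = 0 would make the D2 estimate divide by zero.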

In any case, please try another smoothing method that is not 
based on counts-of-counts, such as Witten-Bell.
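For comparison, a minimal sketch of interpolated Witten-Bell smoothing for bigrams, assuming an unsmoothed maximum-likelihood unigram as the lower-order distribution. It is not SRILM's implementation, and `witten_bell_bigram` is a hypothetical name. The point it illustrates: the interpolation weight depends only on c(h) and on the number of distinct word types seen after h, so no count-of-counts statistics are needed.

```python
from collections import Counter, defaultdict


def witten_bell_bigram(tokens):
    """Build P(w | h) with interpolated Witten-Bell smoothing.

    P(w|h) = (c(h,w) + T(h) * P_uni(w)) / (c(h) + T(h)),
    where T(h) is the number of distinct types following h.
    """
    unigram = Counter(tokens)
    total = sum(unigram.values())
    bigram = Counter(zip(tokens, tokens[1:]))
    hist_count = Counter()          # c(h): bigram tokens starting with h
    followers = defaultdict(set)    # distinct types seen after h
    for (h, w), c in bigram.items():
        hist_count[h] += c
        followers[h].add(w)

    def prob(w, h):
        p_lower = unigram[w] / total        # ML unigram, unsmoothed for brevity
        c_h = hist_count[h]
        if c_h == 0:                        # unseen history: back off entirely
            return p_lower
        t = len(followers[h])
        return (bigram[(h, w)] + t * p_lower) / (c_h + t)

    return prob
```

In ngram-count, Witten-Bell discounting is selected with the `-wbdiscount` options (see the SRILM manual page) in place of the `-kndiscount` ones.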

--Andreas 