Re: error in discount estimator for order 3

From: Andreas Stolcke <stolcke at ADDRESS HIDDEN>
Date: Thu, 03 Aug 2006 23:37:19 -0700

Rebecca Madsen wrote:
> Is there a reason why duplicating my data would give me the following
> error:
>
> using ModKneserNey for 3-grams
> Kneser-Ney smoothing 3-grams
> n1 = 0
> n2 = 94762
> n3 = 0
> n4 = 37773
> one of required modified KneserNey count-of-counts is zero
> error in discount estimator for order 3
If you look at the formulae for KN discounting, you will see that they lead to
undefined values when n1 = 0. The same is true of GT discounting.
These discounting methods assume that the n-gram count distribution is
"natural", not manipulated as in your case.
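
To make the point concrete, here is a minimal sketch of the modified
Kneser-Ney discount estimates as given by Chen & Goodman. It is not SRILM's
own code: the function name and the error message are made up for
illustration, and the guard simply rejects any count-of-counts that would
appear as a denominator.

```python
def modified_kn_discounts(n1, n2, n3, n4):
    """Return (D1, D2, D3plus) from the count-of-counts n1..n4.

    Illustrative helper only; the name and error text are hypothetical.
    """
    # n1, n2 and n3 each appear as a denominator below, so a zero in any
    # of them leaves at least one discount undefined.
    if n1 == 0 or n2 == 0 or n3 == 0:
        raise ValueError("count-of-counts n1, n2, n3 must all be nonzero")

    y = n1 / (n1 + 2 * n2)
    d1 = 1 - 2 * y * n2 / n1
    d2 = 2 - 3 * y * n3 / n2
    d3plus = 3 - 4 * y * n4 / n3
    return d1, d2, d3plus
```

Called with the values from the log above (n1 = 0, n2 = 94762, n3 = 0,
n4 = 37773), this raises the ValueError, because D1 would need a division
by n1 and D3+ a division by n3.
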
>
> I can build a language model using the following command line with the
> normal data, but concatenating two copies of the data together gives
> me the discount estimator error.
That's completely expected (see above).  What are you trying to
accomplish by duplicating your data?  Obviously you are not adding
any information by doing so.
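
The effect of duplication on the count-of-counts can be seen in a small
sketch. The toy corpus and the helper name below are made up for
illustration, and the counting is deliberately simplified (plain trigrams
within each sentence, no sentence-boundary tags), so it is not a model of
what ngram-count does internally.

```python
from collections import Counter


def count_of_counts(sentences, order=3):
    """Map each count c to the number of distinct n-grams seen c times."""
    ngram_counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        # Count n-grams within each sentence only, never across sentences.
        for i in range(len(words) - order + 1):
            ngram_counts[tuple(words[i:i + order])] += 1
    return Counter(ngram_counts.values())


# Hypothetical toy corpus.
corpus = ["the cat sat down", "the cat sat", "a dog ran"]

original = count_of_counts(corpus)
doubled = count_of_counts(corpus + corpus)  # two concatenated copies
```

Every n-gram count in the doubled corpus is even, so no n-gram occurs
exactly once or exactly three times: the old n1 becomes the new n2, the old
n2 becomes the new n4, and n1 and n3 are both zero. That is the same pattern
as in the log above.
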

--Andreas