
Re: KN discounting and zeroton words

From: Tanel Alumäe <tanel.alumae at ADDRESS HIDDEN>
Date: Mon, 13 Jun 2005 16:50:00 +0300

> The unigram probabilities for zeroton words are obtained by distributing
> the backoff mass left by the non-zeroton words evenly over all the zerotons
> (this corresponds to backing off to a uniform distribution).
> Now, if the number of zerotons is small they might actually get more
> probability than the low-count observed unigrams that way.
>
> The -interpolate1 option should prevent this since it distributes the
> backoff mass over ALL unigrams (adding to the probability of those words
> that were observed).
> Please check if this is the case, and if not, send me a test case so
> I can look into why it doesn't work as intended.
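The effect described in the quoted reply can be reproduced with a toy model. This is a minimal sketch, not SRILM's code. It assumes a plain absolute discount as a stand-in for KN, which is a simplification because real KN unigram estimates use continuation counts. The counts, the `discount` value, and the function names `backoff_unigrams` / `interpolated_unigrams` are all hypothetical. In the backoff variant the leftover mass goes only to the zerotons. In the interpolated variant, which is what the quoted description of -interpolate1 says happens, the leftover mass is spread uniformly over every word in the vocabulary.

```python
from typing import Dict, List


def backoff_unigrams(counts: Dict[str, int], vocab: List[str],
                     discount: float) -> Dict[str, float]:
    """Backoff style (hypothetical helper): discounted mass from observed
    words is shared uniformly among the zerotons only."""
    total = sum(counts.values())
    observed = [w for w in vocab if counts.get(w, 0) > 0]
    zerotons = [w for w in vocab if counts.get(w, 0) == 0]
    # Assumes 0 < discount < 1, so every observed word keeps positive mass.
    probs = {w: (counts[w] - discount) / total for w in observed}
    leftover = discount * len(observed) / total
    for w in zerotons:
        # Few zerotons => each one receives a large share of the leftover.
        probs[w] = leftover / len(zerotons)
    return probs


def interpolated_unigrams(counts: Dict[str, int], vocab: List[str],
                          discount: float) -> Dict[str, float]:
    """Interpolated style (hypothetical helper): leftover mass is spread over
    ALL vocabulary words, so observed words get a share too."""
    total = sum(counts.values())
    observed = [w for w in vocab if counts.get(w, 0) > 0]
    leftover = discount * len(observed) / total
    uniform = 1.0 / len(vocab)
    return {
        w: max(counts.get(w, 0) - discount, 0.0) / total + leftover * uniform
        for w in vocab
    }


if __name__ == "__main__":
    # Made-up counts: two singletons ("mat", "hat") and one zeroton ("zebra").
    counts = {"the": 50, "cat": 20, "sat": 10, "mat": 1, "hat": 1}
    vocab = list(counts) + ["zebra"]
    bo = backoff_unigrams(counts, vocab, discount=0.75)
    ip = interpolated_unigrams(counts, vocab, discount=0.75)
    print("backoff:      p(mat) =", bo["mat"], " p(zebra) =", bo["zebra"])
    print("interpolated: p(mat) =", ip["mat"], " p(zebra) =", ip["zebra"])
```

With a single zeroton, the backoff variant gives "zebra" more probability than the observed singleton "mat". The interpolated variant ranks "mat" above "zebra" again, because the uniform share is added to the observed words as well.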

Yes, the -interpolate1 option prevents this from happening.

Thanks for the help.

Tanel
