
KN discounting and zeroton words

From: Tanel Alumäe <tanel.alumae at ADDRESS HIDDEN>
Date: Mon, 06 Jun 2005 19:03:31 +0300

Hello,

I've noticed that when using -kndiscount, the zeroton words (words that
are in the vocabulary but not in the training corpus) get a higher
unigram LM probability than words that actually occur (rarely) in the
training corpus. Shouldn't the zeroton words get the same unigram
probability as the words that are discounted to 0 using the -gt1min
option?

With Good-Turing (GT), Witten-Bell (WB) and natural discounting, everything
works as expected: zeroton words get the same unigram probability as the
words discounted to 0.
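To illustrate one way this ordering can arise, here is a toy absolute-discount unigram estimator. It is NOT SRILM's Kneser-Ney implementation: the function name `discounted_unigrams`, the single fixed discount, and the rule that freed mass is spread uniformly over zeroton words are all assumptions made for this sketch. When only a few zerotons share the mass freed by discounting, each of them can end up with more probability than a word that was actually seen once.

```python
from collections import Counter


def discounted_unigrams(counts, vocab, discount):
    """Toy absolute-discount unigram model (illustrative assumption, not SRILM).

    Each observed word keeps max(c - discount, 0) / N. The mass removed by
    discounting is spread uniformly over vocabulary words with zero count
    (zerotons). If there are no zerotons, this toy simply drops that mass.
    """
    total = sum(counts.values())
    probs = {}
    freed = 0.0
    for w in vocab:
        c = counts.get(w, 0)
        if c > 0:
            kept = max(c - discount, 0.0)
            probs[w] = kept / total
            freed += (c - kept) / total  # mass taken away from this word
    zerotons = [w for w in vocab if counts.get(w, 0) == 0]
    for w in zerotons:
        probs[w] = freed / len(zerotons)  # uniform share of the freed mass
    return probs


if __name__ == "__main__":
    corpus = "a a a a b b c d".split()
    vocab = ["a", "b", "c", "d", "e"]  # "e" is a zeroton
    probs = discounted_unigrams(Counter(corpus), vocab, discount=0.75)
    for w in vocab:
        print(w, round(probs[w], 5))
```

Here the singletons "c" and "d" each get 0.25/8 = 0.03125, while the single zeroton "e" absorbs the whole freed mass of 3/8 = 0.375. With more zerotons in the vocabulary, that share would be split more thinly.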

Regards,
Tanel A.
