[SRILM User List] Probability of Unknown Words - Kneser Ney?
stolcke at icsi.berkeley.edu
Mon May 21 11:35:15 PDT 2012
On 5/20/2012 8:28 PM, Burkay Gur wrote:
> I was wondering how we calculate the probability of unk words while
> using unmodified Kneser Ney. I know that Kneser Ney never assigns zero
> probs. How is that possible with words that are never seen? Or words
> that are in the dictionary but not in the training corpus?
There is nothing special that KN smoothing does with unknown words.
As with all smoothing methods, unknown words are either ignored (assigned 0
probability) or modeled by a designated <unk> token, depending on how
your data is prepared and whether you use the ngram-count -unk option.
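
To make the two cases concrete, here is a toy Python sketch (not SRILM code, and not Kneser-Ney). It uses plain maximum-likelihood unigram estimates, and the corpus, the vocabulary, and the helper names map_oov, unigram_probs, and prob are all made up for illustration. Dropping out-of-vocabulary tokens in the closed case is a simplifying assumption of this sketch. It only shows where the probability of an unknown word comes from: mass is reserved for <unk> only if out-of-vocabulary training tokens are mapped to it.

```python
from collections import Counter

UNK = "<unk>"

def map_oov(tokens, vocab):
    """Replace every token that is not in vocab with the <unk> token."""
    return [t if t in vocab else UNK for t in tokens]

def unigram_probs(tokens):
    """Maximum-likelihood unigram estimates (no smoothing; illustration only)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def prob(model, word, vocab, open_vocab):
    """Look up a word; an out-of-vocabulary word scores as <unk> or as 0."""
    if word in vocab:
        return model.get(word, 0.0)
    return model.get(UNK, 0.0) if open_vocab else 0.0

# Hypothetical training text and fixed vocabulary.
train = "the cat sat on the mat near the dog".split()
vocab = {"the", "cat", "sat", "on", "mat"}

# Closed vocabulary (assumption of this sketch): OOV tokens are dropped.
closed = unigram_probs([t for t in train if t in vocab])

# Open vocabulary: OOV tokens ("near", "dog") are counted as <unk>.
open_ = unigram_probs(map_oov(train, vocab))

print(prob(closed, "zebra", vocab, open_vocab=False))  # unknown word ignored
print(prob(open_, "zebra", vocab, open_vocab=True))    # unknown word scored as <unk>
```

In the open-vocabulary case the unseen word "zebra" receives whatever probability the model assigns to <unk>, which is estimated like any other token; in the closed case it receives 0.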
For more information, see the FAQ page and look for "unknown".