[SRILM User List] A confusion of the interpolated language model

Thu Aug 27 00:21:25 PDT 2009



I am a new student user of SRILM from Asia. I used the command below to build an interpolated modified Kneser-Ney discounted language model:
~ ngram-count -read merge_counts_1994-2003.gz -gt1min 0 -gt2min 0 -gt3min 2 -kndiscount -interpolate -order 3 -vocab ChWord.lexno -lm 1994-2003_lm_all_pruned.lm


However, in my model the back-off weight (bow) of several N-grams appears to be greater than 1. That is, in the text LM file I have a line such as:
-6.457229    <s> 1635    0.1270406
(Here we use an index to represent each Chinese word.)
in which log10(bow) is greater than 0 (bow = 10^0.1270406, about 1.34). We do not think a normal interpolated discounting method can produce an N-gram bow greater than 1. Besides, this only occurred for a few (fewer than 5) different N-grams. So I am confused, and would like to ask whether anyone has encountered this or happens to know what is wrong.
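
For reference, here is a minimal sketch of how such entries can be located in the text LM file. It is plain Python, not part of SRILM, and it assumes the usual ARPA text layout: a "\N-grams:" header per section, then one entry per line with the log10 probability, the N words, and an optional log10 back-off weight. The function name find_positive_bows and the two unigram lines in the sample are made up for illustration; only the bigram line is taken from my model.

```python
import re


def find_positive_bows(lines):
    """Return (order, ngram, log10_bow, bow) for entries whose log10(bow) > 0."""
    header = re.compile(r"^\\(\d+)-grams:\s*$")
    order = None  # current N-gram order, set when a "\N-grams:" header is seen
    hits = []
    for raw in lines:
        line = raw.strip()
        m = header.match(line)
        if m:
            order = int(m.group(1))
            continue
        if line == "\\end\\":
            break
        if order is None or not line:
            continue  # still in the "\data\" header, or a blank line
        fields = line.split()
        # An entry with a back-off weight has: logprob, `order` words, bow.
        if len(fields) != order + 2:
            continue
        log_bow = float(fields[-1])
        if log_bow > 0.0:
            ngram = " ".join(fields[1:-1])
            hits.append((order, ngram, log_bow, 10.0 ** log_bow))
    return hits


# Tiny hand-made sample; the unigram entries are invented placeholders.
sample = [
    "\\data\\",
    "ngram 1=2",
    "ngram 2=1",
    "",
    "\\1-grams:",
    "-1.5\t<s>\t-0.30103",
    "-2.0\t1635\t-0.2",
    "",
    "\\2-grams:",
    "-6.457229\t<s> 1635\t0.1270406",
    "",
    "\\end\\",
]

for order, ngram, log_bow, bow in find_positive_bows(sample):
    print(order, ngram, log_bow, round(bow, 4))
```

To run this on the real model, the lines of 1994-2003_lm_all_pruned.lm can be passed in place of the sample, e.g. by opening the file in text mode and handing the file object to the function.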
Thank you very much!

史海龙
Hailoon Shi
w63,EE Dpt.Thu Univ.PRC
