<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 10/31/2013 6:21 PM, Sergey
Zablotskiy wrote:<br>
</div>
<blockquote cite="mid:52722F30.7050205@uni-ulm.de" type="cite">Hi
Everybody,
<br>
<br>
is there any workaround to combine modified Kneser-Ney smoothing
for lower-order n-grams with Witten-Bell smoothing for
higher-order n-grams using the MAKE-BIG-LM training script?
<br>
<br>
I am getting the following error/message:
<br>
make-big-lm: must use one of GT, KN, or WB discounting for all
orders
<br>
<br>
while executing:
<br>
>> make-big-lm -read ${count_file} -vocab ${vocab} -unk
-order 4 \
<br>
-kndiscount1 -kndiscount2 -kndiscount3 -wbdiscount4 \
<br>
-interpolate -lm name.lm
<br>
<br>
I cannot use kndiscount for the 4-gram order because some
counts-of-counts are zero in my case.
<br>
<br>
</blockquote>
<br>
1) It does not make sense to combine KN discounting for the
lower-order n-grams with some other method for the highest order:
the KN method modifies the lower-order n-gram distributions
precisely to complement the discounting of the highest-order
n-grams.<br>
<br>
2) make-big-lm invokes a helper script, make-kn-discounts, to
compute the discounting factors from the counts-of-counts. It
fills in missing (zero) counts-of-counts by exploiting an
empirical regularity in their distribution (the details are in
Section 4 of <a
href="http://www.speech.sri.com/cgi-bin/run-distill?papers/asru2007-mt-lm.ps.gz">this
paper</a>).<br>
If that mechanism doesn't work for some reason, we should try to
fix it. <br>
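For intuition, the extrapolation idea can be sketched roughly as follows. This is only an illustration, not SRILM's actual implementation: the log-linear fit is an assumed form of the empirical regularity, and the discount formulas are Chen and Goodman's modified Kneser-Ney discounts computed from the counts-of-counts n1..n4.

```python
import math

def fill_counts_of_counts(coc):
    """Fill in zero counts-of-counts by a log-linear fit:
    assume log(n_r) is roughly linear in log(r), an empirical
    regularity of n-gram count distributions.  (The exact scheme
    used by make-kn-discounts may differ; this is a sketch.)"""
    pts = [(math.log(r), math.log(n)) for r, n in coc.items() if n > 0]
    k = len(pts)
    sx = sum(x for x, _ in pts)
    sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts)
    sxy = sum(x * y for x, y in pts)
    # least-squares line: log(n_r) = a + b * log(r)
    b = (k * sxy - sx * sy) / (k * sxx - sx * sx)
    a = (sy - b * sx) / k
    return {r: n if n > 0 else math.exp(a + b * math.log(r))
            for r, n in coc.items()}

def modified_kn_discounts(coc):
    """Chen & Goodman's modified Kneser-Ney discounts D1, D2, D3+
    computed from the counts-of-counts n1..n4."""
    n1, n2, n3, n4 = (coc[r] for r in (1, 2, 3, 4))
    y = n1 / (n1 + 2 * n2)
    return (1 - 2 * y * n2 / n1,    # D1
            2 - 3 * y * n3 / n2,    # D2
            3 - 4 * y * n4 / n3)    # D3+

# A 4-gram order where n4 happens to be zero (the poster's situation):
coc = {1: 1000, 2: 400, 3: 200, 4: 0}
filled = fill_counts_of_counts(coc)
d1, d2, d3 = modified_kn_discounts(filled)
```

With the extrapolated n4 in place, all three discount constants come out positive and usable, which is what the built-in fill-in mechanism is meant to achieve.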
<br>
Andreas<br>
<br>
</body>
</html>