<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">On 5/27/2013 12:19 AM, 贺天行 wrote:<br>
</div>
<blockquote
cite="mid:CANyb1j2irAizJC+oUBcO_uS_OsfhLVvzH1ahtTMrxBSK4X4+FA@mail.gmail.com"
type="cite">
<div dir="ltr">
<div style="">The manual says:</div>
<div style=""><dt style="color:rgb(0,0,0);font-family:'Times New
Roman';font-size:medium"><b>-addsmooth</b><i> D</i></dt>
<dd style="color:rgb(0,0,0);font-family:'Times New
Roman';font-size:medium">
Smooth by adding <i>D </i>to each N-gram count. This is
usually a poor smoothing method, included mainly for
instructional purposes.
<pre> <i>p</i>(<i>a</i>_<i>z</i>) = (<i>c</i>(<i>a</i>_<i>z</i>) + <i>D</i>) / (<i>c</i>(<i>a</i>_) + <i>D</i> <i>n</i>(*))</pre>
</dd>
</div>
<div style="">My command is:</div>
ngram-count -write allcnt -order 3 -debug 2 -text test_htx.dat
-addsmooth 0 -lm lmtest<br>
<div style="">The debug output was:</div>
<div style="">
<div>test_htx.dat: line 3: 2 sentences, 6 words, 0 OOVs</div>
<div>0 zeroprobs, logprob= 0 ppl= 1 ppl1= 1</div>
<div>using AddSmooth for 1-grams</div>
<div>using AddSmooth for 2-grams</div>
<div>using AddSmooth for 3-grams</div>
<div>discarded 1 2-gram contexts containing pseudo-events</div>
<div>discarded 2 3-gram contexts containing pseudo-events</div>
<div>discarded 6 3-gram probs discounted to zero</div>
<div>writing 6 1-grams</div>
<div>writing 8 2-grams</div>
<div>writing 0 3-grams</div>
</div>
</div>
</blockquote>
<blockquote
cite="mid:CANyb1j2irAizJC+oUBcO_uS_OsfhLVvzH1ahtTMrxBSK4X4+FA@mail.gmail.com"
type="cite">
<div dir="ltr">
<div style="">
<div style="">So discounting is still being applied. Why does
-addsmooth still discount?</div>
</div>
</div>
</blockquote>
<br>
You also have to change the minimum-count parameter so that all
trigrams are included, even those that occur only once.<br>
<br>
ngram-count -write allcnt -order 3 -debug 2 -text test_htx.dat
-addsmooth 0 <b>-gt3min 1</b> -lm lmtest<br>
<br>
The default is -gt3min 2, so trigrams that occur only once are
discounted to zero and dropped.<br>
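<br>
As a rough illustration of why the trigrams vanish, here is a minimal
Python sketch of the add-D formula from the manual combined with a
minimum-count cutoff. It is not SRILM's code: the function name
add_smooth_probs, its parameters, and the toy trigrams are made up for
the example. It assumes an N-gram below the cutoff is simply dropped
while still counting toward its context total, and it takes n(*) to be
the number of distinct final words unless a vocabulary size is given.<br>

```python
from collections import Counter


def add_smooth_probs(ngrams, D=0.0, min_count=1, vocab_size=None):
    """Hypothetical helper: p(a_z) = (c(a_z) + D) / (c(a_) + D * n(*)).

    N-grams whose count is below min_count are left out of the result
    (an assumed simplification of the cutoff behaviour).
    """
    counts = Counter(ngrams)

    # c(a_): total count of every N-gram sharing the same context.
    context_totals = Counter()
    vocab = set()
    for ngram, c in counts.items():
        context_totals[ngram[:-1]] += c
        vocab.add(ngram[-1])

    # n(*): vocabulary size (assumed: distinct final words if not given).
    n = vocab_size if vocab_size is not None else len(vocab)

    probs = {}
    for ngram, c in counts.items():
        if min_count > c:
            continue  # below the cutoff: no probability is written
        probs[ngram] = (c + D) / (context_totals[ngram[:-1]] + D * n)
    return probs


# Toy data: every trigram occurs exactly once.
trigrams = [("a", "b", "c"), ("b", "c", "d"), ("a", "b", "e")]

with_default_cutoff = add_smooth_probs(trigrams, D=0, min_count=2)
with_cutoff_one = add_smooth_probs(trigrams, D=0, min_count=1)
```

With every toy trigram occurring once, a cutoff of 2 leaves nothing to
write, while a cutoff of 1 with D = 0 gives the plain relative
frequencies.<br>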
<br>
Andreas<br>
<br>
</body>
</html>