<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">On 6/13/2013 8:23 AM, Meng CHEN wrote:<br>
</div>
<blockquote
cite="mid:lqi37x5f0utvpqjqx8gpy8kv.1371135103438@email.android.com"
type="cite">Hi, the make-big-lm command specifies
-read-with-mincounts and -meta-tag by default. The help page
says "if -meta-tag is defined, these low-count N-grams will be
converted to count-of-count N-grams, so that smoothing methods
that need this information still work correctly". However, for
wbdiscount we don't need the count-of-count information to compute
the discounting parameters. So why does make-big-lm specify the
-meta-tag option for wbdiscount by default? Is that necessary? Can
I remove it? (I tried that and found that the n-grams in the model
are the same, but the probabilities are different.)<br>
Thanks!<br>
</blockquote>
<br>
WB discounting requires, for each context, the count of the distinct
word types that follow it. That information can also be recovered from
the meta-counts, which is why you're getting different results without
-meta-tag.<br>
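<br>
As a rough illustration of why the type count matters, here is a toy
Python sketch of Witten-Bell discounting for bigrams. It is not SRILM's
actual code: the function name, the example counts, and the pruning step
are all made up for illustration. It only shows that dropping low-count
N-grams without keeping a record of them changes the number of distinct
word types per context, and with it the probabilities.<br>
<pre>
```python
from collections import Counter, defaultdict


def witten_bell_probs(bigram_counts):
    """Toy Witten-Bell estimate (hypothetical helper, not an SRILM API).

    Returns the discounted P(w | h) for every seen bigram and the
    probability mass each context h leaves for backoff.
    """
    total = Counter()          # c(h): tokens observed after context h
    types = defaultdict(set)   # distinct word types observed after h
    for (h, w), c in bigram_counts.items():
        total[h] += c
        types[h].add(w)

    probs = {}
    for (h, w), c in bigram_counts.items():
        # Seen events are discounted by adding the type count T(h)
        # to the denominator.
        probs[(h, w)] = c / (total[h] + len(types[h]))

    # The mass T(h) / (c(h) + T(h)) is reserved for unseen words.
    backoff_mass = {h: len(types[h]) / (total[h] + len(types[h]))
                    for h in total}
    return probs, backoff_mass


# Made-up counts for a single context.
counts = {("the", "cat"): 3, ("the", "dog"): 2, ("the", "emu"): 1}
full_probs, full_mass = witten_bell_probs(counts)

# Assumed pruning step: drop bigrams seen only once and keep no record.
pruned = {k: c for k, c in counts.items() if c >= 2}
pruned_probs, pruned_mass = witten_bell_probs(pruned)

print(full_probs[("the", "cat")], pruned_probs[("the", "cat")])
```
</pre>
The surviving N-grams are identical in both runs, but their
probabilities differ because both c(h) and T(h) changed.<br>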
<br>
BTW, I should update the man page to say that WB discounting is also
supported in make-big-lm.<br>
<br>
Andreas<br>
<br>
<blockquote
cite="mid:lqi37x5f0utvpqjqx8gpy8kv.1371135103438@email.android.com"
type="cite"><br>
<br>
Meng CHEN<br>
<br>
<br>
<br>
Sent from Meizu MX<br>
<br>
<pre wrap="">_______________________________________________
SRILM-User site list
<a class="moz-txt-link-abbreviated" href="mailto:SRILM-User@speech.sri.com">SRILM-User@speech.sri.com</a>
<a class="moz-txt-link-freetext" href="http://www.speech.sri.com/mailman/listinfo/srilm-user">http://www.speech.sri.com/mailman/listinfo/srilm-user</a></pre>
</blockquote>
<br>
</body>
</html>