<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
On 3/22/2010 11:33 PM, tuzhaopeng wrote:
<blockquote cite="mid:201003231433290350650@ict.ac.cn" type="cite">
<div>Hi People,</div>
<div> </div>
<div>I ran into a problem when training a language model with the option
"-text-has-weights".</div>
<div> </div>
</blockquote>
<br>
<blockquote cite="mid:201003231433290350650@ict.ac.cn" type="cite"><br>
<div> </div>
<div>Then I looked for more information on the Internet and found
that, for the option "-float-counts", only certain discounting </div>
</blockquote>
Correct.<br>
<blockquote cite="mid:201003231433290350650@ict.ac.cn" type="cite">
<div>methods support non-integer counts (wbdiscount
and cdiscount). So I used -wbdiscount with the command:</div>
<div><span class="Apple-style-span"
style="word-spacing: 0px; font-family: Monaco; font-style: normal; font-variant: normal; font-weight: normal; font-size: medium; line-height: normal; font-size-adjust: none; font-stretch: normal; -x-system-font: none; text-transform: none; color: rgb(0, 0, 0); text-indent: 0px; white-space: normal; letter-spacing: normal; border-collapse: separate; orphans: 2; widows: 2;"><strong><font
face="Verdana" size="2">./ngram-count -text-has-weights test -order 3 -lm test.o3.lm.gz -float-counts -unk -wbdiscount -debug 3</font></strong></span></div>
</blockquote>
<br>
There are two problems here:<br>
<br>
1) You forgot the -text option before your filename.
-text-has-weights is a switch that does not itself take an argument.<br>
2) With fractional counts, the default minimum counts for retaining
ngrams in the LM still apply. So you might want to add these options
to ensure that all your ngrams end up in the model:<br>
<br>
-gt1min 0 -gt2min 0 -gt3min 0<br>
<br>
FYI, the default values are:<br>
<br>
-gt1min 1 -gt2min 1 -gt3min 2<br>
<br>
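Putting both fixes together, the corrected invocation might look like the sketch below. The two-line weighted input is a made-up sample (not your data), and the check for ngram-count is only there so the snippet can be pasted on a machine without SRILM installed.<br>
<br>

```shell
# Hypothetical sample input: with -text-has-weights, the first field
# on each line is taken as the weight of that line.
printf '%s\n' '0.5 a b c' '1.5 a b d' > test

# Corrected invocation: -text names the input file, and -text-has-weights
# is a bare switch. The -gtNmin 0 options lower the minimum counts so that
# ngrams with small fractional counts are not dropped from the model.
cmd="ngram-count -text test -text-has-weights -order 3 -float-counts -unk -wbdiscount -gt1min 0 -gt2min 0 -gt3min 0 -lm test.o3.lm.gz -debug 3"

# Run the command only if SRILM's ngram-count is on PATH; otherwise print it.
if command -v ngram-count >/dev/null 2>&1; then
    $cmd
else
    echo "$cmd"
fi
```

<br>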
Andreas <br>
<br>
<blockquote cite="mid:201003231433290350650@ict.ac.cn" type="cite">
<div> </div>
<div>and the output information is:</div>
<div> </div>
<div>
<div>using WittenBell for 1-grams</div>
<div>using WittenBell for 2-grams</div>
<div>using WittenBell for 3-grams</div>
<div>warning: distributing 1 left-over probability mass over 2 zeroton words</div>
<div>writing 3 1-grams</div>
<div>writing 0 2-grams</div>
<div>writing 0 3-grams</div>
</div>
<br>
</blockquote>
<br>
</body>
</html>