<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">On 8/15/2013 12:15 AM, HU Rile wrote:<br>
</div>
<blockquote
cite="mid:58519bc7.8a77.14080d44c86.Coremail.londis@163.com"
type="cite">
<div
style="line-height:1.7;color:#000000;font-size:14px;font-family:arial">
<div
style="line-height:1.7;color:#000000;font-size:14px;font-family:arial">Hi,<br>
<div>I would like to build an LM using the Google Web 1T
corpus, and I followed the steps at <a
moz-do-not-send="true"
href="http://www-speech.sri.com/projects/srilm/manpages/srilm-faq.7.html"
style="line-height: 1.7;">http://www-speech.sri.com/projects/srilm/manpages/srilm-faq.7.html</a>.
But when I used ngram-count to <span style="font-family:
Simsun; font-size: medium; line-height: normal;">estimate
the mixture weights, the program failed with the
message "</span><span style="line-height: 1.7;">google.countlm.0:
line 22: reached EOF before \end\</span></div>
<div><span style="line-height: 1.7;">format error in init-lm
file</span><span style="font-family: Simsun; font-size:
medium; line-height: normal;">".</span></div>
<div><font size="3" face="Simsun"><span style="line-height:
normal;">I tried to add \end\ to the end of
google.countlm.0, but it did not work.</span></font></div>
<div><font size="3" face="Simsun"><span style="line-height:
normal;">Here is the content of my </span></font><span
style="font-family: Simsun; font-size: medium;
line-height: normal;">google.countlm.0: </span></div>
<div><span style="font-family: Simsun; font-size: medium;
line-height: normal;">
<div>order 3</div>
<div>vocabsize 13588391</div>
<div>totalcount 1024908267229</div>
<div>countmodulus 40</div>
<div>mixweights 15</div>
<div> 0.5 0.5 0.5</div>
<div> 0.5 0.5 0.5</div>
<div> 0.5 0.5 0.5</div>
<div> 0.5 0.5 0.5</div>
<div> 0.5 0.5 0.5</div>
<div> 0.5 0.5 0.5</div>
<div> 0.5 0.5 0.5</div>
<div> 0.5 0.5 0.5</div>
<div> 0.5 0.5 0.5</div>
<div> 0.5 0.5 0.5</div>
<div> 0.5 0.5 0.5</div>
<div> 0.5 0.5 0.5</div>
<div> 0.5 0.5 0.5</div>
<div> 0.5 0.5 0.5</div>
<div> 0.5 0.5 0.5</div>
<div> 0.5 0.5 0.5</div>
<div>google-counts /home/hurile/googleweb1T/googleLM/</div>
<div><br>
</div>
<div>Could someone please tell me how I can solve
the problem? Thanks a lot!</div>
<div><br>
</div>
<div>Rile Hu</div>
</span></div>
<div><span style="font-family: Simsun; font-size: medium;
line-height: normal;"><br>
</span></div>
</div>
</div>
</blockquote>
You probably forgot the -count-lm option. Without it, ngram-count
tries to read the -init-lm file as a standard backoff ngram LM in
ARPA format, which must end with an \end\ line.<br>
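<br>
A minimal sketch of the weight-estimation call with -count-lm added,
wrapped in a hypothetical shell function. The -order 3 matches your
google.countlm.0; tuning.text, tuning.vocab, tuning.gvocab.aliases and
the output name google.countlm are placeholder names, so substitute
your own files:<br>
<pre>
```shell
# Hypothetical wrapper name. Every file name except google.countlm.0
# is a placeholder; replace them with your own tuning data and vocabulary.
estimate_mix_weights() {
    # -count-lm tells ngram-count to treat -init-lm as a count-LM
    # parameter file instead of an ARPA backoff LM ending in \end\.
    ngram-count -debug 1 -order 3 -count-lm \
        -text tuning.text \
        -vocab tuning.vocab \
        -vocab-aliases tuning.gvocab.aliases \
        -limit-vocab \
        -init-lm google.countlm.0 \
        -em-iters 100 \
        -lm google.countlm
}
```
</pre>
Calling estimate_mix_weights runs the estimation and writes the
re-estimated weights to the -lm file. The \end\ line you added by hand
is probably best removed again, since it should only be needed for a
standard ngram LM.<br>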
<br>
Andreas<br>
<br>
</body>
</html>