<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=EUC-KR" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
<br>
You are running out of memory, and the cause may lie in the way your
operating system is set up. SRILM itself has no inherent limit on the
amount of memory it can use, other than what is imposed by the width of
your pointer type (32 or 64 bits).<br>
<br>
Write a small test program to see how much memory you can malloc() on
your system, and if it doesn't work as expected, look for an expert in
Windows/Cygwin.<br>
<br>
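A minimal sketch of such a test, in C, might look like the following. It
is only an illustration: the function names probe_malloc_limit and
report_malloc_limit, the chunk size, and the cap are assumptions made
here, not part of SRILM. It allocates fixed-size chunks until malloc()
fails or a caller-supplied cap is reached, writes to each chunk so the
memory is really committed, and then frees everything.<br>
<br>
```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Each chunk starts with a link to the previously allocated chunk, so
   everything can be freed again without a separate bookkeeping array. */
struct chunk {
    struct chunk *prev;
};

/*
 * Allocate chunk_bytes at a time until malloc() fails or cap_bytes is
 * reached, then free everything. Returns the number of bytes obtained.
 * Every chunk is written to, because some systems only commit memory
 * when it is first touched.
 */
size_t probe_malloc_limit(size_t chunk_bytes, size_t cap_bytes)
{
    struct chunk *head = NULL;
    size_t total = 0;

    assert(chunk_bytes >= sizeof(struct chunk));

    while (cap_bytes - total >= chunk_bytes) {
        struct chunk *c = malloc(chunk_bytes);
        if (c == NULL)
            break;                      /* malloc() refused: limit reached */
        memset(c, 0xA5, chunk_bytes);   /* touch every byte of the chunk */
        c->prev = head;
        head = c;
        total += chunk_bytes;
    }

    while (head != NULL) {              /* release in reverse order */
        struct chunk *prev = head->prev;
        free(head);
        head = prev;
    }
    return total;
}

/* Run the probe and print the result in megabytes. */
void report_malloc_limit(size_t chunk_bytes, size_t cap_bytes)
{
    size_t got = probe_malloc_limit(chunk_bytes, cap_bytes);
    printf("malloc() gave %lu MB in %lu MB chunks\n",
           (unsigned long)(got >> 20), (unsigned long)(chunk_bytes >> 20));
}
```
<br>
Called from main() as, say, report_malloc_limit((size_t)64 &lt;&lt; 20,
(size_t)2048 &lt;&lt; 20), this probes up to 2 GB in 64 MB steps. If the
reported figure is far below the physical memory installed, the limit is
most likely imposed by the operating system or runtime environment
rather than by the toolkit. Keep the cap below the machine's physical
memory, since some systems let malloc() succeed and fail only when the
memory is touched.<br>
<br>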
Regardless of all this, you will not be able to convert the Google
N-gram collection into a backoff model, as I've explained before (see
<a class="moz-txt-link-freetext" href="http://www.speech.sri.com/pipermail/srilm-user/2009q2/000751.html">http://www.speech.sri.com/pipermail/srilm-user/2009q2/000751.html</a> ).<br>
<br>
Andreas<br>
<br>
On 2/9/2010 1:15 AM, 이일빈 wrote:
<blockquote cite="mid:D3610A9C768A436D84FBAE7A04589660@etri.info"
type="cite">
<div id="msgbody" style="font-size: 10pt; font-family: 굴림;">
<div>Thank you for your prompt response.</div>
<div> </div>
<div>In fact, I was trying to interpolate two count files from Google
N-gram and a training corpus.</div>
<div>However, I found out there is a FAQ section about Google N-gram
so I'm trying it now.</div>
<div> </div>
<div>I finished all the steps given in the FAQ, and I want to
convert the result into ARPA format.</div>
<div>(As you know, the result of the process is just a count-LM
parameter file.)</div>
<div>So I tried the following command.</div>
<div> </div>
<div>----------------------------</div>
<div>ngram -debug 2 -order 3 -count-lm -lm google.countlm -vocab
vocab.txt -vocab-aliases google.aliases -limit-vocab -write-lm google.lm<br>
----------------------------</div>
<div> </div>
<div style="font-family: 굴림;">
<div>
<div style="font-family: 굴림;">
<div>But an error message came out. </div>
<div> </div>
<div>----------------------------</div>
<div>assertion "body != 0" failed: file "../../include/LHash.cc",
line 138<br>
3 [sig] ngram 21852 winpids::enumNT: error 0xC0000005 reading
system process information<br>
Aborted (core dumped)</div>
<div>----------------------------</div>
<div> </div>
<div>However, I monitored the committed memory size and it reached
only 900MB.</div>
<div>So I'm wondering whether there is a memory usage limit in the
toolkit.</div>
<div> </div>
<div>I would appreciate any help with this problem.</div>
<div>Alternatively, could you suggest a good way to convert the count-LM
parameter file and count files into ARPA format?</div>
<div> </div>
<div>Thank you.</div>
<div> </div>
<div> </div>
<div>Best regards,</div>
<div>ILBIN</div>
<div> </div>
</div>
</div>
</div>
<div><br>
-----Original Message-----<br>
<b>From:</b> "Andreas Stolcke" <a class="moz-txt-link-rfc2396E" href="mailto:stolcke@speech.sri.com">&lt;stolcke@speech.sri.com&gt;</a><br>
<b>From Date:</b> 2010-02-09 2:59:45 AM<br>
<b>To:</b> 이일빈 <a class="moz-txt-link-rfc2396E" href="mailto:illee@etri.re.kr">&lt;illee@etri.re.kr&gt;</a><br>
<b>Cc:</b> "srilm-user" <a class="moz-txt-link-rfc2396E" href="mailto:srilm-user@speech.sri.com">&lt;srilm-user@speech.sri.com&gt;</a><br>
<b>Subject:</b> Re: SRI LM toolkit: ngram-count<br>
<br>
</div>
<div bgcolor="#ffffff" text="#000000">On 2/7/2010 9:35 PM, 이일빈 wrote:
<blockquote cite="mid:11B84580663B49428A09AEEC9666F6ED@etri.info"
type="cite">
<div id="msgbody" style="font-size: 10pt; font-family: 굴림;">
<div>Dear Andreas Stolcke </div>
<div> </div>
<div>Hello. I'm Ilbin Lee, and I develop a speech recognizer at ETRI,
Korea.</div>
<div>While using the ngram-count command of the SRI LM toolkit, I
encountered the following error message.</div>
<div> </div>
<div>$ ngram-count.exe -order 3 -sort -float-counts -gt2min 1
-gt3min 1 -vocab vocab.txt -read count.txt -lm lm.txt</div>
<div>error in discount estimator for order 1</div>
<div style="font-family: 굴림;">
<div>
<div style="font-family: 굴림;">
<div> </div>
<div>The count file is an interpolation of two different count
files. So it has lots of fractional counts.</div>
<div>If you could suggest some possible causes, it would help me
a lot.</div>
</div>
</div>
</div>
</div>
</blockquote>
You cannot use Good-Turing discounting with fractional counts. Try
-wbdiscount or -cdiscount or -addsmooth.<br>
<br>
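The reason can be seen from the textbook form of Good-Turing, which
needs the count-of-counts n[r], the number of N-grams that occur exactly
r times. With fractional counts there is no such tabulation. The
following C sketch illustrates this; it is not SRILM's implementation,
and the function names count_of_counts and gt_adjusted_count are made up
for this example.<br>
<br>
```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill n[1..max_r] with count-of-counts: n[r] is the number of entries
 * in counts[] that equal r exactly. n must have room for max_r + 1
 * elements. Returns the number of entries that are not whole numbers
 * and therefore belong to no n[r].
 */
size_t count_of_counts(const double *counts, size_t len,
                       size_t *n, size_t max_r)
{
    size_t i, r, fractional = 0;

    assert(counts != NULL && n != NULL);

    for (r = 0; r <= max_r; r++)
        n[r] = 0;

    for (i = 0; i < len; i++) {
        double c = counts[i];
        if (c < 0.0 || c > (double)max_r)
            continue;                 /* outside the tabulated range */
        r = (size_t)c;                /* truncates toward zero */
        if ((double)r != c)
            fractional++;             /* e.g. 1.3 is neither 1 nor 2 */
        else if (r >= 1)
            n[r]++;
    }
    return fractional;
}

/*
 * Textbook Good-Turing adjusted count r* = (r + 1) * n[r+1] / n[r].
 * Returns -1.0 when the estimate is undefined because n[r] or n[r+1]
 * is empty or r is outside the table.
 */
double gt_adjusted_count(const size_t *n, size_t max_r, size_t r)
{
    if (r < 1 || r + 1 > max_r || n[r] == 0 || n[r + 1] == 0)
        return -1.0;
    return (double)(r + 1) * (double)n[r + 1] / (double)n[r];
}
```
<br>
With whole-number counts the table fills in and an adjusted count can be
computed. With counts such as 0.7, 1.3 and 2.6 every n[r] stays at zero
and the estimate is undefined, which is why a discounting method that
does not rely on count-of-counts is needed for interpolated count files.<br>
<br>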
The fact that you didn't get an error message also indicates that you
weren't using -float-counts, which you must when processing fractional
counts.<br>
<br>
Please also read the FAQ section on smoothing issues before proceeding
further.<br>
<br>
Andreas<br>
<br>
<blockquote cite="mid:11B84580663B49428A09AEEC9666F6ED@etri.info"
type="cite">
<div id="msgbody" style="font-size: 10pt; font-family: 굴림;">
<div style="font-family: 굴림;">
<div>
<div style="font-family: 굴림;">
<div><dt>Thank you.</dt>
</div>
<div> </div>
<div> </div>
<div>Best regards,</div>
<div>ILBIN</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</div>
</div>
</blockquote>
<br>
</body>
</html>