<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html; charset=EUC-KR" http-equiv="Content-Type">

</head>

<body bgcolor="#ffffff" text="#000000">

<br>

You are running out of memory, and the reason could have to do with

the way your operating system is set up.&nbsp; SRILM itself has no inherent

limit on the amount of memory it can use (other than the one implied by the

width of your pointer type, 32 or 64 bits).<br>

<br>

Write a small test program to see how much memory you can malloc() on

your system, and if the result is not what you expect, ask an expert in

Windows/Cygwin.<br>

<br>

Regardless of all this, you will not be able to convert the Google

N-gram collection into a backoff model, as I've explained before (see

<a class="moz-txt-link-freetext" href="http://www.speech.sri.com/pipermail/srilm-user/2009q2/000751.html">http://www.speech.sri.com/pipermail/srilm-user/2009q2/000751.html</a> ).<br>

<br>

Andreas<br>

<br>

On 2/9/2010 1:15 AM, 이일빈 wrote:

<blockquote cite="mid:D3610A9C768A436D84FBAE7A04589660@etri.info"

 type="cite">

  <div id="msgbody" style="font-size: 10pt; font-family: 굴림;">

  <div>Thank you for your prompt response.</div>

  <div>&nbsp;</div>

  <div>In fact, I was trying to interpolate two count files, one from Google

N-gram and one from a training corpus.</div>

  <div>However, I found out that there is a FAQ section about Google N-gram,

so I'm trying that approach now.</div>

  <div>&nbsp;</div>

  <div>I finished all the steps given in the FAQ, and now I want to

convert the result into ARPA format.</div>

  <div>(As you know, the result of the process is just a count-LM

parameter file.)</div>

  <div>So I tried the following command.</div>

  <div>&nbsp;</div>

  <div>----------------------------</div>

  <div>ngram -debug 2 -order 3 -count-lm -lm google.countlm -vocab

vocab.txt&nbsp;-vocab-aliases google.aliases -limit-vocab -write-lm google.lm<br>

----------------------------</div>

  <div>&nbsp;</div>

  <div style="font-family: 굴림;">

  <div>

  <div style="font-family: 굴림;">

  <div>But I got the following error message.</div>

  <div>&nbsp;</div>

  <div>----------------------------</div>

  <div>assertion "body != 0" failed: file "../../include/LHash.cc", line 138<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3 [sig] ngram 21852 winpids::enumNT: error 0xC0000005 reading system process information<br>

Aborted (core dumped)</div>

  <div>----------------------------</div>

  <div>&nbsp;</div>

  <div>However, I monitored the committed memory size, and it reached

only 900 MB.</div>

  <div>So I'm wondering whether there is a memory usage limit in the

toolkit.</div>

  <div>&nbsp;</div>

  <div>I would appreciate any help with this problem.</div>

  <div>Alternatively, could you suggest a good way to convert the count-LM

parameter file and count files into ARPA format?</div>

  <div>&nbsp;</div>

  <div>Thank you.</div>

  <div>&nbsp;</div>

  <div>&nbsp;</div>

  <div>Best regards,</div>

  <div>ILBIN</div>

  <div>&nbsp;</div>

  </div>

  </div>

  </div>

  <div><br>

-----Original Message-----<br>

  <b>From:</b> "Andreas Stolcke" <a class="moz-txt-link-rfc2396E" href="mailto:stolcke@speech.sri.com">&lt;stolcke@speech.sri.com&gt;</a><br>

  <b>Date:</b> 2010-02-09 2:59:45 AM<br>

  <b>To:</b> 이일빈 <a class="moz-txt-link-rfc2396E" href="mailto:illee@etri.re.kr">&lt;illee@etri.re.kr&gt;</a><br>

  <b>Cc:</b> "srilm-user" <a class="moz-txt-link-rfc2396E" href="mailto:srilm-user@speech.sri.com">&lt;srilm-user@speech.sri.com&gt;</a><br>

  <b>Subject:</b> Re: SRI LM toolkit: ngram-count<br>

  <br>

  </div>

  <div bgcolor="#ffffff" text="#000000">On 2/7/2010 9:35 PM, 이일빈 wrote:

  <blockquote cite="mid:11B84580663B49428A09AEEC9666F6ED@etri.info"

 type="cite">

    <div id="msgbody" style="font-size: 10pt; font-family: 굴림;">

    <div>Dear Andreas Stolcke </div>

    <div>&nbsp;</div>

    <div>Hello. I'm ILBIN LEE, and I develop a speech recognizer at ETRI,

Korea.</div>

    <div>While using the ngram-count command of the SRI LM toolkit, I

encountered the following error message.</div>

    <div>&nbsp;</div>

    <div>$ ngram-count.exe -order 3 -sort -float-counts -gt2min 1

-gt3min 1 -vocab vocab.txt&nbsp;-read count.txt -lm lm.txt</div>

    <div>error in discount estimator for order 1</div>

    <div style="font-family: 굴림;">

    <div>

    <div style="font-family: 굴림;">

    <div>&nbsp;</div>

    <div>The count file is an interpolation of two different count

files, so it contains many fractional counts.</div>

    <div>If you could suggest some possible causes, it would help me

a lot.</div>

    </div>

    </div>

    </div>

    </div>

  </blockquote>

You cannot use Good-Turing discounting with fractional counts.&nbsp; Try

-wbdiscount, -cdiscount, or -addsmooth instead.&nbsp;&nbsp; <br>

  <br>

The fact that you didn't get an error message also indicates that you

weren't using -float-counts, which you must when processing fractional

counts.<br>

  <br>

Please also read the FAQ section on Smoothing issues before proceeding

further.<br>

  <br>

Andreas<br>

  <br>

  <blockquote cite="mid:11B84580663B49428A09AEEC9666F6ED@etri.info"

 type="cite">

    <div id="msgbody" style="font-size: 10pt; font-family: 굴림;">

    <div style="font-family: 굴림;">

    <div>

    <div style="font-family: 굴림;">

    <div>Thank you.

    </div>

    <div>&nbsp;</div>

    <div>&nbsp;</div>

    <div>Best regards,</div>

    <div>ILBIN</div>

    </div>

    </div>

    </div>

    </div>

  </blockquote>

  <br>

  </div>

  </div>

</blockquote>

<br>

</body>

</html>