<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 10/9/2013 7:31 AM, E wrote:<br>
</div>
<blockquote
cite="mid:8D0931368982F22-1114-246A@webmail-m231.sysops.aol.com"
type="cite"><font color="black" face="arial" size="2">
<div id="AOLMsgPart_1_f47a2a1c-8748-452e-8559-e6492e497ca9">
<div id="AOLMsgPart_1_e39538f8-0bd2-42f9-8273-291e924d5738"
style="font-family: arial;"><font face="arial"><font
face="arial, helvetica">Perhaps the ngramCount file I
used crosses some limit on count of a particular ngram.
Because some very large count words have positive log
probability in the "ug.lm" file. BTW I used
bin/i686/ngram-count </font>executable<font face="arial,
helvetica">.</font>
</font></div>
<div id="AOLMsgPart_1_e39538f8-0bd2-42f9-8273-291e924d5738"
style="font-family: arial;"><font face="arial">I used Web1T
to obtain these counts. Is there a workaround, like
assigning artificial counts (= upperlimit) to the
troublesome ngrams?</font></div>
</div>
</font></blockquote>
<font size="2"><font face="arial"><br>
My suspicion is that you're exceeding memory limits with this
      data. Possibly you are also exceeding the range of 32-bit
integers with some large unigram counts.<br>
<br>
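      One way to test the integer-range suspicion is to scan the counts file for values beyond the 32-bit range. The following is only a rough sketch: it assumes one entry per line (the n-gram, a tab, then an integer count), and the helper name <tt>find_large_counts</tt> and the sample data are made up for illustration. Whether the relevant limit in a given build is the signed or the unsigned 32-bit maximum is also an assumption, so both constants are defined.<br>
<pre>
```python
import io

INT32_MAX = 2**31 - 1   # largest signed 32-bit value
UINT32_MAX = 2**32 - 1  # largest unsigned 32-bit value


def find_large_counts(lines, limit=INT32_MAX):
    """Yield (ngram, count) for entries whose count exceeds limit.

    Assumes one entry per line: the n-gram, a tab, then an integer count.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        ngram, _, count_text = line.rpartition("\t")
        count = int(count_text)
        if count > limit:
            yield ngram, count


# Made-up sample data, for illustration only.
sample = io.StringIO("the\t5000000000\nof\t3000000000\nzebra\t12345\n")
for ngram, count in find_large_counts(sample):
    print(ngram, count)
```
</pre>
      If nothing is reported even with <tt>limit=INT32_MAX</tt>, integer overflow is probably not the cause and memory is the more likely suspect.<br>
      <br>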
1) Make sure you're building 64-bit executables. If "file </font></font><font
color="black" face="arial" size="2"><font face="arial"><font
face="arial, helvetica">bin/i686/ngram-count" says that it's
          a 32-bit executable, do a "make clean" and rebuild with "make
          MACHINE_TYPE=i686-m64 ...".<br>
<br>
2) To find out what the memory demand of your job is, try
scaling back the data size (say take 1/100 or 1/10 of it), and
          monitor the memory usage with "top" or a similar utility. Then
extrapolate (linearly) to the full data size.<br>
<br>
          3) If you find your computer doesn't have enough memory, try
the memory saving techniques discussed at
<a class="moz-txt-link-freetext" href="http://www.speech.sri.com/projects/srilm/manpages/srilm-faq.7.html">http://www.speech.sri.com/projects/srilm/manpages/srilm-faq.7.html</a>
under "</font></font></font>Large data and memory issues".<br>
<br>
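    The linear extrapolation in step 2 amounts to dividing the peak memory observed on the sample by the fraction of the data used. A minimal sketch, where the function name <tt>extrapolate_memory</tt> and the numbers are hypothetical and the peak value would be read off "top" by hand:<br>
<pre>
```python
def extrapolate_memory(sample_fraction, sample_peak_gb):
    """Linearly scale the peak memory seen on a data sample to the full data.

    sample_fraction: portion of the full data that was used (e.g. 0.01 or 0.1)
    sample_peak_gb:  peak memory observed for that run, in GB
    """
    if sample_fraction > 1 or 0 >= sample_fraction:
        raise ValueError("sample_fraction must be in (0, 1]")
    return sample_peak_gb / sample_fraction


# Hypothetical measurement: 2.0 GB peak when using 1/10 of the data.
estimate = extrapolate_memory(0.1, 2.0)
print(f"estimated full-data peak: {estimate:.1f} GB")
```
</pre>
    Comparing the estimate with the machine's physical memory indicates whether the memory-saving techniques in step 3 are needed. Running two sample sizes (1/100 and 1/10) and checking that they give similar estimates is a cheap test of the linearity assumption.<br>
    <br>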
Good luck!<br>
<br>
Andreas<br>
<br>
</body>
</html>