Hello,<br><br>First, to save you from having to read everything below: suppose I have used make-google-ngrams to store the N-gram counts of a small text corpus on disk in Google's format. How do I then convert those counts to ARPA format with SRILM?<br>
<br>I have read the Google Web N-gram section in the FAQ, read all the mailing-list emails containing the search term "google", and read all the relevant man pages, as well as looking at the relevant run-tests, without success. <br><br>
My goal is to build an ARPA-format language model from the N-gram counts in the Google Web N-gram corpus. I realize it is too large to load into memory, as discussed in the documentation, so, as one of the emails on the list suggested, I pruned out most of the junk and non-dictionary words, merged the different cases, and fixed the config files. I have now reduced the data quite significantly, but I am unable to figure out how to convert it to ARPA format. Below is what I tried:<br>
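For reference, my google.countlm description file follows the template I pieced together from the ngram(1) man page; roughly the following (the vocabulary size, count values, and weights below are placeholders, I have elided most of the mixture-weight lines, and the directory path is shortened):<br>

```
order 5
vocabsize 13000000
totalcount 1024908267229
countmodulus 40
mixweights 3
 0.5 0.5 0.5 0.5 0.5
 ...
google-counts /path/to/web1t/data
```

where the google-counts directory holds the counts in the Google layout produced by make-google-ngrams.<br>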
<br>1. ngram -order 5 -count-lm -lm google.countlm -write-lm arpaLM<br><br>This did not work: it simply wrote out a duplicate of google.countlm.<br><br>2. I noticed in the man pages that the -expand-classes option forces the output to be a single N-gram model in ARPA format, so I tried:<br>
ngram -order 5 -count-lm -lm google.countlm -expand-classes 5 -write-lm arpaLM<br>This produced nothing but the message:<br>HMM, NgramCountLM, AdaptiveMix, Decipher, tagged, factored, DF, hidden N-gram, hidden-S, class N-gram, skip N-gram and stop-word N-gram models are mutually exclusive<br>
<br>3. I thought that using -mix-lm might yield an ARPA model, since the man pages say this also happens with -mix-lm. I realized this was unlikely to work, since I would be interpolating the LM with itself, but tried it regardless:<br>
ngram -order 5 -count-lm -lm google.countlm -expand-classes 5 -mix-lm google.countlm -write-lm arpaLM<br>The output was again the same as google.countlm.<br><br>I tried other things, such as using ngram-count and running the lm-scripts, but no luck. One relevant post from the archives:<br>
<br><a href="http://www.speech.sri.com/projects/srilm/mail-archive/srilm-user/2007-April/8.html" target="_blank">http://www.speech.sri.com/projects/srilm/mail-archive/srilm-user/2007-April/8.html</a><br>The URL above mentions:<br>
<br>
<i>>> Could you give me an *example* about bulilding google 3-gram LM file<br>
>> ,please?<br>
>> <br>>Again, this will require using the option with some tricks<br>
>that are not documents<br>
>as yet. Please be patient (or read all the manual pages carefully to<br>
>figure it our yourself.)</i><br><br>Has any documentation been written on this since? Was the trick referred to there the use of -mix-lm or -expand-classes to force ARPA format? <br><br>I figure that, worst case, I can do the conversion manually, but I am sure there is something in SRILM that I am missing.<br>
<br>Thanks,<br>Elias<br>