[SRILM] FLM model training on large data

From: ilya oparin <ioparin at ADDRESS HIDDEN>
Date: Sun, 22 Oct 2006 18:50:56 +0100 (BST)

Hi, everybody!

Does anyone have experience building a Factored Language Model (FLM) on large data? Processing a file in FLM format containing 5 million entries is still no problem, but as soon as I try to feed in a 50 million entry FLM corpus, training needs an infeasible amount of memory (since it loads everything into memory).

Does anyone know of any tricks for training an FLM in this case? Something like building partial LMs and then merging them with the standard ngram-count... What would you suggest as a solution?

Best regards,
Ilya
