[SRILM]: FLM

From: ilya oparin <ioparin at ADDRESS HIDDEN>
Date: Sun, 21 May 2006 21:34:22 +0100 (BST)

Hello,

I've recently been experimenting with factored language models (FLMs) for Czech. The FLM module works perfectly on small subcorpora. However, when I try to train a model even on my held-out data (60 million tokens), training takes a huge amount of time (it has been running for two days now). Memory problems can be expected as well. So there is little point in trying to train an LM on my full training data (550 million tokens).
The question is: does anybody have experience training FLMs on huge corpora, for example by parallelizing the task? There is no direct way as there is with normal models (the ngram-merge and make-big-lm features), but are there indirect ones?

thanks in advance,
ilya

