Hello,<br><br>I have recently been experimenting with factored language models (FLMs) for Czech. The FLM module works perfectly on small subcorpora. However, when I try to train a model even on my held-out data (60 million tokens), training takes a huge amount of time (it has been running for two days so far). Memory problems can be expected as well. So there is little point in trying to train an LM on my full training data (550 million tokens).<br>The question is: does anybody have experience training FLMs on huge corpora, for example by parallelizing the work? There is no direct way to do this as there is for standard n-gram models (ngram-merge and make-big-lm), but are there any indirect ones?<br><br>Thanks in advance,<br>ilya