[SRILM User List] a question about srilm
stolcke at speech.sri.com
Thu Feb 11 09:58:10 PST 2010
On 2/10/2010 9:32 PM, Sun, Xie (MU-Student) wrote:
> Dear Dr. Andreas Stolcke,
> Sorry to bother you again. I know this kind of email makes your life
> really hard.
I would appreciate it if you could join the srilm-user mailing list and
ask your questions there.
> Right now, I want to train a class-based model and then come back with
> the word model. So I use (1), (2) and (3) to create the class model.
> In (4), I want to replace classes with words and generate the trigram
> word model. I am wondering if the commands below are correct:
> (1) ngram-class -vocab dict -tolower -text textfile -numclasses 5
> -classes classfile
> (2) replace-words-with-classes classes=classfile textfile >
> (3) ngram-count -tolower -text output_text_with_classes -lm
> (4) ngram -lm class_based_model -classes classfile -expand-classes
> 3 -write-lm output_trigram_word_model
> Every time I try this, I get an error like the one below:
> assertion "body !=0" failed: file "../../include/LHash.cc", line 138
> Before this error happened, I observed that the memory occupancy could
> be around 2G. But I have 4G memory for my computer. So I don't know
> what's wrong. Could you give me some idea? I will really appreciate
> your help.
You are running out of memory. That's because -expand-classes can be
very memory intensive.
Even if you have 4GB, your operating system might only support up to 2GB
per process if you are using 32-bit pointers. You might have to compile
SRILM for 64-bit pointers. But even that is no guarantee that you will
have enough memory.
You could try pruning the class LM first and then expand. By choosing
different pruning thresholds you can get a feel for the growth of the
expanded model as a function of the class LM.
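
As a rough illustration of the prune-then-expand sweep described above, the following Python sketch only builds the two ngram command lines for each pruning threshold and prints them; it does not run SRILM. The threshold values, the output file names, and the helper name prune_then_expand_commands are hypothetical. Leaving -classes off the pruning step (so that the class LM is pruned as an ordinary n-gram over class labels) is an assumption, not something confirmed in this thread.

```python
import shlex


def prune_then_expand_commands(class_lm, classfile, thresholds, order=3):
    """Build (threshold, prune_cmd, expand_cmd) tuples; nothing is executed.

    File-naming scheme and thresholds are illustrative assumptions.
    """
    plans = []
    for t in thresholds:
        pruned = f"{class_lm}.pruned-{t:g}"      # hypothetical output name
        expanded = f"word_model.pruned-{t:g}"    # hypothetical output name
        # Step 1: prune the class LM (assumed to behave like a plain n-gram LM here).
        prune_cmd = ["ngram", "-lm", class_lm,
                     "-prune", f"{t:g}", "-write-lm", pruned]
        # Step 2: expand the pruned class LM into a word LM, as in command (4).
        expand_cmd = ["ngram", "-lm", pruned, "-classes", classfile,
                      "-expand-classes", str(order), "-write-lm", expanded]
        plans.append((t, prune_cmd, expand_cmd))
    return plans


if __name__ == "__main__":
    # Larger thresholds prune more aggressively, so start there and work down.
    for t, prune_cmd, expand_cmd in prune_then_expand_commands(
            "class_based_model", "classfile", [1e-6, 1e-7, 1e-8]):
        print(shlex.join(prune_cmd))
        print(shlex.join(expand_cmd))
```

Comparing the sizes of the expanded models across thresholds (for example, by counting the n-grams listed in each output file's header) should give the feel for growth mentioned above before committing to an unpruned expansion.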
> *From:* Andreas Stolcke [stolcke at speech.sri.com]
> *Sent:* Thursday, February 04, 2010 5:15 PM
> *To:* Sun, Xie (MU-Student)
> *Cc:* srilm-user
> *Subject:* Re: a question about srilm
> On 2/3/2010 9:18 PM, Sun, Xie (MU-Student) wrote:
>> Dear Dr. Andreas Stolcke,
>> I am a PhD student from the University of Missouri. My name is Xie Sun.
>> Right now I am using the SRILM toolkit to train a language model. I
>> want to use the model adaptation function. I am using the command as
>> ngram -lm main_model -adapt-marginals model_adapted -base-marginals
>> unigram_model -ppl test_file
>> where the unigram_model is coming from the main_model and
>> model_adapted is the model I want to adapt.
> I'm assuming model_adapted refers to the marginals of the adaptation
> data. Then this command is correct.
> Note that -base-marginals needs to be specified only if the unadapted
> unigrams are different from the unigrams in main_model.
>> I am not sure if what I did is correct. Besides, one more important
>> question is how I can output the adapted model. I used the option
>> -write-lm. But it does not work. Could you give me some hints? I will
>> really appreciate your help.
> AdaptMarginals does not support -write-lm at this point. It
> would be very slow, as you'd have to compute the normalization for
> every context appearing in the LM.
> You can simulate the effect of -write-lm by employing the
> -rescore-ngram option:
> ngram -lm main_model -adapt-marginals ... -rescore-ngram main_model
> -write-lm new_model
> Again, this will take a very long time for regular-sized LMs.
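
To make the normalization cost mentioned in the quoted reply concrete, here is a toy Python sketch of marginal adaptation for a single context. It assumes the usual scheme in which each conditional probability is scaled by (adapted unigram / base unigram) raised to a weight beta and then renormalized over the vocabulary; the function name, the toy distributions, and the default beta of 0.5 are illustrative assumptions, and this is not SRILM's actual code.

```python
def adapt_context(cond_probs, base_unigram, adapt_unigram, beta=0.5):
    """Adapt P(w | h) for ONE context h and renormalize it.

    cond_probs:    dict word -> P(w | h) under the unadapted model
    base_unigram:  dict word -> unadapted unigram marginal
    adapt_unigram: dict word -> unigram marginal of the adaptation data
    beta:          scaling exponent (assumed value)
    """
    scaled = {}
    for word, p in cond_probs.items():
        alpha = (adapt_unigram[word] / base_unigram[word]) ** beta
        scaled[word] = alpha * p
    # The normalizer sums over the whole vocabulary and differs per context,
    # which is why writing out a full adapted LM is expensive.
    z = sum(scaled.values())
    return {word: s / z for word, s in scaled.items()}


if __name__ == "__main__":
    base = {"a": 0.5, "b": 0.3, "c": 0.2}
    adapt = {"a": 0.2, "b": 0.3, "c": 0.5}
    context = {"a": 0.6, "b": 0.3, "c": 0.1}
    print(adapt_context(context, base, adapt))
```

A full adapted model would need this sum once for every context in the LM, each time over the whole vocabulary, which matches the warning that the -rescore-ngram route takes a very long time for regular-sized LMs.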