Fw: From logproba on sentences to logproba on words

Amin Mantrach amantrac at ulb.ac.be
Tue Jan 29 10:35:25 PST 2008


Apparently my question did not get any answer, so I'll reformulate it
to make it clearer.

I want to create an LM with the command:

# ngram-count -text textfile -lm lmfile


In my case, what I have is the log-probability of each sentence, the
same values that you can obtain from
(# ngram -lm lm_file -debug 1 -ppl testfile).

What do I want to do? Create a new LM file built from the sentence
probabilities I have.

Current ideas:

1 / Produce a text file with the sentences. Each sentence can appear
in the file multiple times; it will appear exactly n times, where
n = exp(log-proba of the sentence) * 1000, rounded to an integer.

And then simply: ngram-count -text newtextsentences -lm new_lm
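Idea 1 could be sketched roughly as follows. This is only an illustration under assumptions: the function names, the `scale` and `log_base` parameters, and the example sentences are hypothetical and not part of SRILM. The formula above uses exp (natural log); SRILM's ngram -ppl output is, as far as I know, in base 10, so log_base=10 may be the appropriate setting for values taken from there.

```python
import math


def replication_counts(sentence_logprobs, scale=1000, log_base=math.e):
    """Map each sentence to n = round(scale * log_base ** logprob)."""
    counts = {}
    for sentence, logprob in sentence_logprobs.items():
        counts[sentence] = round(scale * log_base ** logprob)
    return counts


def replicated_lines(counts):
    """Return every sentence repeated n times, one entry per output line."""
    lines = []
    for sentence, n in counts.items():
        lines.extend([sentence] * n)
    return lines


if __name__ == "__main__":
    # Hypothetical input: sentence -> log-probability (natural log here).
    logprobs = {"the cat sat": math.log(0.25), "a dog ran": math.log(0.002)}
    counts = replication_counts(logprobs)
    # File name taken from the command above.
    with open("newtextsentences", "w", encoding="utf-8") as out:
        for line in replicated_lines(counts):
            out.write(line + "\n")
```

Note that any sentence whose scaled probability rounds to 0 disappears from the corpus entirely, so the choice of the factor 1000 determines which low-probability sentences still contribute.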

2 / Produce a count file (with only the counts needed, i.e. those of the
highest order, etc.) and, for each n-gram, multiply its number of
occurrences by the sum of the probabilities of the sentences it
belongs to.
This method is clearly not fair.
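One literal reading of idea 2 can be sketched as below. Everything here is an assumption made for illustration: the function names, the `order` and `log_base` parameters, and the padding of each sentence with <s> and </s> markers. The result is a dictionary of fractional weights, not a file in any particular count format.

```python
import math
from collections import defaultdict


def ngrams(tokens, order):
    """Return all n-grams of the given order as tuples."""
    return [tuple(tokens[i:i + order]) for i in range(len(tokens) - order + 1)]


def weighted_ngram_counts(sentence_logprobs, order=3, log_base=math.e):
    """For each n-gram: (total occurrences) * (sum of the probabilities
    of the sentences that contain it)."""
    occurrences = defaultdict(int)
    prob_sum = defaultdict(float)
    for sentence, logprob in sentence_logprobs.items():
        prob = log_base ** logprob
        # Assumption: sentences are padded with boundary markers.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        grams = ngrams(tokens, order)
        for gram in grams:
            occurrences[gram] += 1
        # Each sentence contributes its probability once per distinct n-gram.
        for gram in set(grams):
            prob_sum[gram] += prob
    return {gram: occurrences[gram] * prob_sum[gram] for gram in occurrences}
```

With this reading, an n-gram that occurs in several sentences is boosted twice, once through its occurrence count and once through the summed probabilities, which may be one reason the weighting looks unfair.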


Can you tell me whether either of these ideas is correct? If not, how
should I proceed?


I hope the question is now clear enough.

Thanks a lot for your help.
Amin.




