ngram-class is too time-consuming

Andreas Stolcke stolcke at speech.sri.com
Tue Oct 28 22:51:04 PDT 2008


In message <00163646b9f02245f0045a5d8106 at google.com> you wrote:
> 
> I ran "make TEST", and it output:
> *** Running test class-ngram ***
> 0.18user 0.05system 0:00.24elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+616outputs (0major+2336minor)pagefaults 0swaps
> class-ngram: stdout output IDENTICAL.
> class-ngram: stderr output IDENTICAL.
> 
> So is it right?

Looks ok, yes.

> 
> Also, yesterday I started a run with a 100K vocabulary (the 100K highest-count
> words from a 282K vocabulary), using the count file produced by "ngram-count"
> rather than the text file, again with 2K word classes. It has now been running
> for about 20 hours; it has completed 7685 iterations and has roughly 90K more
> iterations to go, so the running time is very long.

Well, you have to be patient when dealing with large data problems.
Note that each iteration takes less time, so the remaining iterations
will go ever faster.
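
To make the arithmetic concrete: assuming the greedy agglomerative merging that
the iteration counts above suggest (one class merge per iteration, starting from
one class per word), reducing a 100K-word vocabulary to 2K classes takes about
100K - 2K = 98K merges, and the pool of candidate class pairs shrinks with every
merge, so each iteration has less work to do. The following is a small
illustrative Python sketch, not SRILM code; the numbers are the ones from the
run described above.

    # Illustrative sketch (not SRILM code): why later iterations are cheaper
    # under greedy agglomerative class merging.

    vocab_size = 100_000   # words kept in the run described above
    num_classes = 2_000    # target number of classes

    # One merge per iteration, from one-class-per-word down to the target:
    total_merges = vocab_size - num_classes   # about 98K iterations

    def candidate_pairs(c):
        """Number of class pairs that could be merged when c classes remain."""
        return c * (c - 1) // 2

    done = 7_685
    print("total iterations:   ", total_merges)
    print("remaining:          ", total_merges - done)
    print("pairs at start:     ", candidate_pairs(vocab_size))
    print("pairs after", done, "merges:", candidate_pairs(vocab_size - done))
    print("pairs near the end: ", candidate_pairs(num_classes + 1))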

> Is this normal? And is there anything I can do to make it a little quicker?

You can design (and implement and publish) a new and improved algorithm that
runs fast enough for your purposes!  I highly recommend this solution.

> 
> Many thanks!

You are welcome!

Andreas



