ngram-count -read performance difference for different tokens

Ergun Bicici ebicici at ku.edu.tr
Sat Dec 20 16:00:18 PST 2008


Dear SRILM List Members,

I was experimenting with the "-use-server" option of ngram. It appears to
work for "-ppl" calculations from text, but when working with count files I
was getting different numbers than without a server. After some debugging, I
realized that this was due to the server receiving <unk> tokens from the client.

I made the following modification:

line 352, LM.cc, version 1.5.7:
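    // original: words not in the vocabulary are mapped to <unk>
    //vocab.getIndices(words, wids, order + 1, vocab.unkIndex());
    // modified: unseen words are added to the vocabulary instead
    vocab.addWords(words, wids, order + 1);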

and I am able to get the same results with or without using a server.

I have not checked whether this will affect the "-cache-served-ngrams" policy
or whether it may have other effects on the results.

Regards,
Ergun

Ergun Bicici
Koc University


More information about the SRILM-User mailing list