questions about sri toolkit

Andreas Stolcke stolcke at speech.sri.com
Thu Dec 6 17:13:56 PST 2001


In message <3C1013BC.7090302 at stl.research.panasonic.com> you wrote:
> Dear Dr. Stolcke,
> 
>       This is Yan from Panasonic Speech Tech Lab in Santa Barbara.  I am 
> trying to use the SRI toolkit on BN data, but I always get the complaint
> "can't allocate trie". I have 1.5GB of memory on my machine, which I
> suppose should be enough for this task. Could you please give me some
> hints or suggestions for fixing this problem?
> 
>       Thank you very much!
> 
>         Yan

Yan,

I have no idea whether 1.5GB is enough, it depends entirely on the data.
Please tell me:

1 - exactly what operation you are trying to perform (counting, LM estimation,
	LM usage),

2 - the exact command line you are running,

3 - some idea of how big the input data is,

4 - what type of machine you are using (OS, etc.).

First of all, the amount of RAM is not all that matters.  The size of
a program's address space is limited by the configured swap space (plus the
amount of real memory, at least on some OSs). Also, on 32-bit processors the
limit is usually either 2GB or 4GB, regardless of how much swap you have.
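
The 4GB figure follows from pointer width: a 32-bit pointer can address at
most 2^32 bytes, i.e. 4 GiB, and the OS usually reserves part of that range
for itself, so the usable limit is often lower. The small Python sketch below
is only an illustration (not part of SRILM); pointer_bits() and
max_addressable_gib() are hypothetical helpers, and pointer_bits() reports the
pointer width of the Python interpreter running it, which is not necessarily
that of the SRILM binaries you built.

```python
import struct


def pointer_bits() -> int:
    # struct.calcsize("P") is the size in bytes of a native pointer (void *)
    # in the interpreter running this code; multiply by 8 to get bits.
    return struct.calcsize("P") * 8


def max_addressable_gib(bits: int) -> float:
    # Pointer width alone caps the address space at 2**bits bytes;
    # divide by 2**30 to express that ceiling in GiB.
    return 2 ** bits / 2 ** 30


if __name__ == "__main__":
    print(f"32-bit ceiling: {max_addressable_gib(32):.0f} GiB")  # 4 GiB
    bits = pointer_bits()
    print(f"this interpreter uses {bits}-bit pointers")
```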

I should say that producing trigrams for the entire BN corpus (> 100M words)
will usually not work even with 2GB if you use ngram-count alone and keep
everything in memory.  That is what the "merge-batch-counts" and "make-big-lm"
scripts were written for.  Please consult the "training-scripts" manual page
for details.
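
The idea behind the batch approach is that each batch of text is small enough
to count in memory on its own, and the resulting sorted count files can then be
combined by streaming through them, which needs only one line per file in
memory at a time. What follows is a minimal Python sketch of that general
sort-and-merge technique under assumed conditions (whitespace-tokenized
sentences, a made-up "n-gram TAB count" file layout, hypothetical helper names
write_batch_counts, read_counts, merge_counts). It is not how
merge-batch-counts or make-big-lm are implemented; use those scripts and their
manual page for real work.

```python
import heapq
import os
import tempfile
from collections import Counter
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple


def write_batch_counts(sentences: Iterable[List[str]], n: int, path: str) -> None:
    """Count the n-grams of one batch in memory, then write them sorted."""
    counts: Counter = Counter()
    for tokens in sentences:
        # Each n-gram is stored as its words joined by single spaces.
        counts.update(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    with open(path, "w", encoding="utf-8") as f:
        # Writing in sorted order is what makes the streaming merge possible.
        for gram in sorted(counts):
            f.write(f"{gram}\t{counts[gram]}\n")


def read_counts(path: str) -> Iterator[Tuple[str, int]]:
    """Lazily read (n-gram, count) pairs from one sorted count file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            gram, count = line.rstrip("\n").split("\t")
            yield gram, int(count)


def merge_counts(paths: List[str]) -> Iterator[Tuple[str, int]]:
    """Merge sorted count files, summing counts of identical n-grams."""
    merged = heapq.merge(*(read_counts(p) for p in paths), key=lambda item: item[0])
    # Identical n-grams are adjacent after the merge, so groupby can sum them.
    for gram, group in groupby(merged, key=lambda item: item[0]):
        yield gram, sum(count for _, count in group)


if __name__ == "__main__":
    batches = [
        [["a", "b", "c"], ["a", "b"]],
        [["b", "c", "a", "b"]],
    ]
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i, batch in enumerate(batches):
            path = os.path.join(tmp, f"batch{i}.counts")
            write_batch_counts(batch, 2, path)
            paths.append(path)
        for gram, count in merge_counts(paths):
            print(f"{gram}\t{count}")
```

With the two toy batches above, the merged bigram counts come out as
"a b" 3, "b c" 2 and "c a" 1: each bigram appears once, with its counts summed
across batches.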

Also, consider subscribing to the srilm-user mailing list.  I will not 
always be able to help (at least not right away), and other users might be
able to. See http://www.speech.sri.com/projects/srilm/welcome.html#srilm-user
for instructions on how to join.

--Andreas 
