Search SRILM-USER Archives


</s> Backoff missing

From: "Tolos, Marta" <tolos at ADDRESS HIDDEN>
Date: Wed, 21 Aug 2002 11:06:22 +0200

Hi all,

I have a problem with the toolkit. I created a language model using only the
ngram-count command:

ngram-count -text my.text -lm my.arpa -wbdiscount1 -wbdiscount2 -wbdiscount3

My text file includes the sentence markers <s> and </s>.

In the resulting ARPA file, the unigram </s> has no backoff weight, and
neither does any bigram that has </s> as its second word.
Does anyone know how to get these backoff weights written? The problem is
that my recognizer complains about the format of the language model: it
skips every bigram that has no backoff weight, and because there are so
many of them, it eventually stops.
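One possible workaround, offered as an assumption rather than an SRILM feature, is to write the missing weights explicitly. In the ARPA format an absent backoff weight is equivalent to a log10 weight of 0 (a weight of 1), so padding with `0` should leave the model unchanged. The sketch below uses a hypothetical helper, `pad_backoffs`, which is not part of SRILM. It assumes a standard ARPA layout and pads only lower-order n-grams, since entries of the highest order never carry a backoff weight.

```python
import re
import sys


def pad_backoffs(lines, zero="0"):
    """Hypothetical helper: return ARPA lines in which every lower-order
    n-gram without a backoff weight gets an explicit one.

    Assumes the standard ARPA layout: a \\data\\ block with "ngram N=count"
    lines, then \\N-grams: sections whose entries are
    "logprob w1 ... wN [backoff]". An absent backoff weight means
    log10(bow) = 0, so writing "0" leaves the model unchanged.
    """
    max_order = 0  # highest order listed in the \data\ header
    order = 0      # order of the section being read (0 = not in a section)
    out = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        section = re.match(r"\\(\d+)-grams:", line)
        if section:
            order = int(section.group(1))
        elif line.startswith("\\"):
            order = 0  # \data\ or \end\
        elif order == 0:
            header = re.match(r"ngram (\d+)=", line)
            if header:
                max_order = max(max_order, int(header.group(1)))
        else:
            fields = line.split()
            # Entries of the highest order never carry a backoff weight.
            if order < max_order and len(fields) == order + 1:
                sep = "\t" if "\t" in line else " "
                line = line + sep + zero
        out.append(line)
    return out


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python pad_backoffs.py in.arpa out.arpa
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.write("\n".join(pad_backoffs(src)) + "\n")
```

Because </s> never appears as the context of a longer n-gram, its backoff weight is never consulted, so an explicit `0` should only satisfy the recognizer's parser. It should not change any probability.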

I also have a question about the format of the generated ARPA file. The
probabilities and the words are not separated by a single space, and this
also causes problems with the recognizer I am using. For now I work around
it with a Perl script that rewrites the file so that fields are separated by
a single space, and then I use the converted file. Is there an option to get
single-space separation directly?
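The Perl workaround described above amounts to collapsing each run of whitespace into a single space. A Python equivalent might look like this. `single_space` is a hypothetical name, and the sketch assumes the extra separators are tabs or repeated spaces. Leading and trailing whitespace on each line is dropped as well.

```python
import sys


def single_space(lines):
    """Hypothetical helper: collapse every run of whitespace (tabs or
    repeated spaces) in each ARPA line into one space. Trailing newlines
    are dropped and blank lines stay blank, so section breaks survive."""
    return [" ".join(line.split()) for line in lines]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python single_space.py in.arpa out.arpa
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.write("\n".join(single_space(src)) + "\n")
```

ARPA tokens never contain whitespace, so splitting on any whitespace and rejoining with one space should not merge or split words.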

Thanks a lot.

Best,

Marta
