AW: SRILM to Sphinx lm.DMP

Andreas Stolcke stolcke at speech.sri.com
Wed Apr 16 10:22:53 PDT 2008


> 
> Dear Mr. Stolcke,
>
> Thanks for your fast reply. I already tried the "sort-lm" script you
> suggested. Unfortunately, using this sorted n-gram LM with the tool
> "lm3g2dmp" results in errors. I found out that the LM output of the
> SRILM n-gram tool has no value where the backoff weight is 0. This
> causes the errors in "lm3g2dmp". By adding the value 0.0 to every 1-gram
> and 2-gram with no backoff weight, I managed to have the model dumped in
> Sphinx 3 format.
> I wonder why there are so many backoff weights of 0. Does this depend on
> the warnings I get?
> 
> warning: no singleton counts
> GT discounting disabled
> warning: no singleton counts
> GT discounting disabled
> warning: no singleton counts
> GT discounting disabled
>
> I call the program with:
>
> ngram-count -order 3 -vocab in.vocab -read-with-mincounts -read in.count -lm out.lm -gt1min 1 -gt2min 3 -gt3min 3 -gt1max 7 -gt2max 7 -gt3max 7
>
> What do I have to change to avoid these warnings? How can I get
> backoff weights that are not 0 for 1-grams? Example output:
> 
> -4.648435 b_aI_n_a:_@ -2.68299
> -6.30688 b_aI_n_a:_m_@_n
> -6.186905 b_aI_n_b_r_U_x -1.056842

You are getting the warnings because -read-with-mincounts discards 
counts below your minimum counts, yet those counts are needed to 
compute the discounting factors of the Good-Turing method.
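
To illustrate why the discarded low counts matter, here is a minimal Python sketch of Katz-style Good-Turing discount factors. It is not SRILM's code; the function name, its arguments, and the error messages are made up for illustration, and the formula is the textbook one, assumed to match what ngram-count does in spirit. The key point is that the number of n-grams seen exactly once (n_1) appears in a denominator, so once singletons have been filtered out no discount can be estimated.

```python
def good_turing_discounts(count_of_counts, gtmax):
    """Katz-style Good-Turing discount factors d_r for r = 1..gtmax.

    count_of_counts[r] is n_r, the number of distinct n-grams seen
    exactly r times. Counts above gtmax are left undiscounted.
    """
    n1 = count_of_counts.get(1, 0)
    if n1 == 0:
        # n_1 is a denominator of the common term below, so without
        # singletons the discounts cannot be estimated.
        raise ValueError("no singleton counts")

    # Term shared by all d_r: (k + 1) * n_{k+1} / n_1, with k = gtmax.
    common = (gtmax + 1) * count_of_counts.get(gtmax + 1, 0) / n1
    if common == 1.0:
        raise ValueError("degenerate count-of-counts statistics")

    discounts = {}
    for r in range(1, gtmax + 1):
        n_r = count_of_counts.get(r, 0)
        if n_r == 0:
            raise ValueError(f"no n-grams with count {r}")
        # Raw Good-Turing ratio: (r + 1) * n_{r+1} / (r * n_r).
        ratio = (r + 1) * count_of_counts.get(r + 1, 0) / (r * n_r)
        discounts[r] = (ratio - common) / (1.0 - common)
    return discounts
```

With complete statistics such as {1: 100, 2: 40, 3: 20, 4: 10} and gtmax=3 the function returns one factor per count; with the low counts removed, e.g. {3: 20, 4: 10}, it raises the "no singleton counts" error instead.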

If memory is not an issue, simply don't use -read-with-mincounts.
If memory is a problem, use the "make-big-lm" script instead of ngram-count
(and read the FAQ on memory issues).

The "missing" backoff weights are not there because they are redundant.
Only ngrams that are a prefix to longer ngrams need a backoff weight.
Due to count cutoffs, you typically have many lower-order ngrams that don't 
need backoff weights.

As someone already pointed out, you can use the "add-dummy-bows" command
to insert 0 backoff weights for software that requires them.
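
For readers who want to see what inserting dummy backoff weights amounts to, here is a rough Python sketch. It is not the "add-dummy-bows" script itself, only an illustration under the assumption that the model is a standard ARPA-format text file with whitespace-separated fields; the function name is hypothetical. An N-gram line in an order-N section has N+1 fields without a backoff weight and N+2 with one, and the highest order never carries backoff weights.

```python
import re


def add_dummy_bows(lines):
    """Return ARPA LM lines with an explicit 0 backoff weight added
    to every lower-order N-gram that lacks one."""
    lines = [line.rstrip("\n") for line in lines]

    # Highest order announced in the \data\ header, e.g. "ngram 3=1234".
    orders = []
    for line in lines:
        header = re.match(r"ngram (\d+)=", line)
        if header:
            orders.append(int(header.group(1)))
    max_order = max(orders, default=0)

    out = []
    order = 0  # 0 means we are outside any N-gram section
    for line in lines:
        section = re.match(r"\\(\d+)-grams:", line)
        if section:
            order = int(section.group(1))
        elif line.startswith("\\end\\"):
            order = 0
        elif order and order < max_order and line.strip():
            fields = line.split()
            # log-probability plus `order` words, but no backoff weight
            if len(fields) == order + 1:
                line = line + "\t0"
        out.append(line)
    return out
```

It could be applied to the lines of an LM file read with open(...).readlines() and the result written back out joined by newlines; for real use, the SRILM script remains the safer choice.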

Andreas 



