
Re: 0-grams

From: Andreas Stolcke <stolcke at ADDRESS HIDDEN>
Date: Mon, 08 Jul 2002 11:40:41 PDT

There are no 0-gram models, mostly because the DARPA format does not
support them.  Because of that, SRILM handles the backoff probability mass
at the unigram level in a special way:  it is distributed over all unobserved
words.  This is equivalent to backing off to a 0th-order distribution.
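To illustrate the idea, here is a minimal Python sketch of spreading leftover unigram mass over unobserved words. It assumes you already have discounted probabilities for the observed words (how those are computed is not modeled here), and it spreads the leftover mass uniformly, which matches the uniform behavior described further down. The function name `fill_unigram_backoff` is hypothetical and is not part of SRILM; this is not SRILM's actual code.

```python
from typing import Dict, Iterable


def fill_unigram_backoff(discounted: Dict[str, float],
                         vocab: Iterable[str]) -> Dict[str, float]:
    """Hypothetical helper: give vocabulary words without an estimate an
    equal share of the probability mass left over after discounting.
    This acts like backing off to a 0th-order (uniform) distribution."""
    vocab = list(vocab)
    if not vocab:
        raise ValueError("vocabulary must not be empty")
    # Keep the discounted estimates of observed words that are in the vocab.
    probs = {w: discounted[w] for w in vocab if w in discounted}
    seen_mass = sum(probs.values())
    unseen = [w for w in vocab if w not in discounted]
    if unseen:
        # Leftover (backoff) mass, split evenly among unobserved words.
        share = (1.0 - seen_mass) / len(unseen)
        for w in unseen:
            probs[w] = share
    # If every word was observed, any leftover mass is simply not assigned here.
    return probs
```

For example, with `{"a": 0.5, "b": 0.3}` and vocabulary `["a", "b", "c", "d"]`, the unobserved words `c` and `d` each receive about 0.1. With no observed words at all, every word gets `1/|V|`.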

In practical terms, you use

ngram-count -vocab VOCAB -order 1 -lm LM

Since no ngram counts or text data are supplied, the mechanism that
distributes backoff probability mass for unigrams will spread all
probability uniformly over the entire vocabulary (which you have to
supply of course).
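As a rough picture of what such a model amounts to, the sketch below writes ARPA ("DARPA") format text for an order-1 model where every vocabulary word gets log10(1/|V|). It is an illustration under stated assumptions, not the file `ngram-count` actually emits. In particular, the special handling of tokens such as `<s>` is not modeled, and the function name `uniform_unigram_arpa` is hypothetical.

```python
import math
from typing import List


def uniform_unigram_arpa(vocab: List[str]) -> str:
    """Hypothetical helper: ARPA-format text for a unigram LM that is
    uniform over the given vocabulary (duplicates removed)."""
    words = sorted(set(vocab))
    if not words:
        raise ValueError("vocabulary must not be empty")
    # Each word gets probability 1/|V|; ARPA files store log10 probabilities.
    logp = math.log10(1.0 / len(words))
    lines = ["\\data\\", f"ngram 1={len(words)}", "", "\\1-grams:"]
    for w in words:
        # Highest order is 1, so no backoff weight column is written.
        lines.append(f"{logp:.6f}\t{w}")
    lines += ["", "\\end\\", ""]
    return "\n".join(lines)


print(uniform_unigram_arpa(["the", "cat", "sat", "mat"]))
```

With a four-word vocabulary, each entry carries a log10 probability of about -0.602060.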

Of course, -order 0 should not make the program dump core; I'll fix that.

--Andreas

In message <3D2947A2.7040304 at ADDRESS HIDDEN> you wrote:
> Hello,
>
> I'd like to create 0-grams as well as higher-order n-grams, but when I
> call ngram-count with option -order 0, I get a segmentation fault (SRILM
> 1.3.1).
>
> Regards
> Matthias
>
