Re: class LM

From: Andreas Stolcke <stolcke at ADDRESS HIDDEN>
Date: Tue, 08 Oct 2002 08:52:48 PDT

In message <3DA2D0DD.AE6387DB at ADDRESS HIDDEN> you wrote:
> Andreas!
>
> Thank you for your answers.
>
> Few more questions:
>
> 1.)
> I understand the transitions like:
>
> [2gram]POSITION = 2 FROM: <504,NULL> TO: <756 504,NULL> WORD = primeri
> PROB = -1.76748 EXPANDPROB = 0.0106105
>
> (504, 756 are classes),
>
> but not the transitions like:
>
> [OOV]POSITION = 2 FROM: <504,NULL> TO: <,NULL> WORD = primeri PROB =
> -inf
>
> What does [OOV] mean? These transitions are not present in  the test
> example of the toolkit.

[OOV] means a word was not found even in the unigrams of your model.
The ClassNgram code handles LMs that contain both word and class N-grams.
It therefore always also tries to find an N-gram probability for each
word (without class lookup), and if you don't include all class member words
in your vocabulary when building the LM you will get this "OOV" condition.
But it is harmless, since presumably all your words get some probability
by virtue of being members of some class.
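
To see why the -inf transition is harmless, here is a minimal Python sketch
(not SRILM code). It assumes that PROB is a log10 class N-gram probability,
that EXPANDPROB is a linear class-membership probability, and that
alternative paths for the same word are summed in the linear domain. The
helper names are hypothetical.

```python
import math

NEG_INF = float("-inf")

def log10_to_prob(logp):
    """Convert a log10 probability to a linear one; -inf maps to 0."""
    return 0.0 if logp == NEG_INF else 10.0 ** logp

def combine_paths(log_probs):
    """Sum alternative paths in the linear domain and return log10."""
    total = sum(log10_to_prob(lp) for lp in log_probs)
    return math.log10(total) if total > 0.0 else NEG_INF

# Values taken from the [2gram] transition quoted above.
class_path = -1.76748 + math.log10(0.0106105)  # class N-gram * expansion
oov_path = NEG_INF                             # the [OOV] word-level path

# The [OOV] path adds zero probability mass, so the class path decides.
combined = combine_paths([class_path, oov_path])
```

Under these assumptions `combined` equals `class_path` up to rounding. The
word only ends up with zero probability if every path for it is -inf.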

> 2.) In which case  is the history string cleaned (FROM: <504,NULL> TO:
> <,NULL>) ?

When there are no histories in the LM that start with the given class
(504).  The history is kept only as long as it needs to be to compute
subsequent N-gram probabilities (so as to minimize the state space).
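
The truncation idea can be sketched roughly as follows. This is an
illustration only, not the ClassNgram implementation: it assumes a history
is a tuple with the most recent class first, and that the LM's contexts are
available as a set of such tuples. `truncate_history`, `contexts`, and the
class IDs 12 and 999 are hypothetical.

```python
def truncate_history(history, contexts):
    """Keep the longest leading part of `history` that is a known context.

    history  -- tuple of class IDs, most recent first (assumed layout)
    contexts -- set of tuples: the histories present in the LM (assumed)
    """
    for n in range(len(history), 0, -1):
        if history[:n] in contexts:
            return history[:n]
    # No context starts with the most recent class: the state is emptied.
    return ()

# Illustrative contexts only; a real model would supply its own.
contexts = {(504,), (756,), (756, 504)}

kept = truncate_history((756, 504, 12), contexts)   # (756, 504)
emptied = truncate_history((999, 504), contexts)    # ()
```

Dropping the part of the history that no N-gram can use lets otherwise
different paths share a state, which is what keeps the state space small.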

>
> 3.) Is the vocabulary size in SRI-LM limited?

To the range of unsigned integers (2^32).

--Andreas