
RE: perplexity evaluation

From: "Valsan, Zica" <valsan at ADDRESS HIDDEN>
Date: Wed, 4 Dec 2002 09:21:31 +0100

Thank you for your prompt answer.
I understand that </s> is taken into account, but my question is why only it
and not the other one (<s>) too? I have read papers where people use this
strategy (scoring only one of the two), but the reason for doing so is not
clear to me.

Regarding the CMU toolkit, I did not say it doesn't output any probabilities
for these context cues; it outputs the same small value for each of them
(-98.999, very close to the value output by the SRILM toolkit). I think this
is somehow "equivalent" to saying they are not taken into account in the
perplexity computation.
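
Zica's point can be checked numerically. ARPA files store log10 probabilities, so -98.999 means a probability of roughly 10^-99; if such a term were actually scored, the perplexity would explode. A minimal sketch, assuming uniform 1/3 probabilities for a, b, c (the CMU result quoted in this thread) and treating -98.999 as one extra scored token:

```python
import math

# Assumed log10 probabilities: uniform 1/3 over the real words a, b, c,
# plus the -98.999 value reported for a context cue.
word_logprobs = [math.log10(1 / 3)] * 3
cue_logprob = -98.999


def perplexity(logprobs):
    """PP = 10 ** (-(1/N) * sum of log10 probabilities) over N scored tokens."""
    return 10 ** (-sum(logprobs) / len(logprobs))


# Cue not scored: only a, b, c contribute, so PP is 3.
pp_excluded = perplexity(word_logprobs)

# Cue scored: the single ~1e-99 term dominates and PP becomes astronomical.
pp_included = perplexity(word_logprobs + [cue_logprob])

print(pp_excluded, pp_included)
```

Since a PP of 3 is what the CMU toolkit reports, the -98.999 entries cannot be entering its perplexity sum.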

Regards,
Zica

-----Original Message-----
From: Andreas Stolcke [mailto:stolcke at ADDRESS HIDDEN]
Sent: Tuesday, 3 December 2002 17:48
To: Valsan, Zica
Cc: 'srilm-user at ADDRESS HIDDEN'
Subject: Re: perplexity evaluation

In message <B0793DB946E52942A49C1E8152A1358C8E3781 at ADDRESS HIDDEN> you wrote:
> Hi all,
>
> I'm a new user of the toolkit and I need a little support in order to
> understand how the perplexity is computed and why it differs from the
> expected value.
>
> For instance, my training data is in the file train.text, which contains
> only one line:
> <s> a b c </s>
> and the vocabulary (train.vocab) contains all these words. I want to
> generate a unigram-only LM and evaluate it on the same training data.
> I don't want any discounting strategy to be applied.
> Here are the commands I used:
>
> ngram-count -order 1 -vocab train.vocab -text train.text -lm lm.arpa -gt1max 0
> ngram -lm lm.arpa -debug 2 -vocab train.vocab -ppl train.text > out.ppl
>
>
> So, according to the theory, the expected perplexity is PP=3 if the
> context cues are not taken into account. This is also what one gets
> with the CMU toolkit.
> Using this toolkit and the above commands, what I actually got is PP=4.
> Looking inside the created ARPA model, I could see that </s> has the
> same probability as each of the real words (a, b, c).
> Could anybody explain why this is so? Did I make a mistake, or is there
> something I am missing?
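
The arithmetic behind PP=4 versus PP=3 can be reproduced directly. A minimal sketch, assuming one </s> event is scored per sentence while <s> is never predicted (consistent with </s> getting the same probability as a, b, c in the ARPA file), and using maximum-likelihood estimates with no discounting; the helper name `perplexity` is illustrative only:

```python
import math
from collections import Counter

# Training text from the question: one sentence with explicit context cues.
tokens = "<s> a b c </s>".split()

# Assumption: <s> is only used as context, so it is not a predicted event;
# </s> is predicted like an ordinary word.
events = [t for t in tokens if t != "<s>"]
counts = Counter(events)
total = sum(counts.values())

# Maximum-likelihood unigram estimate (no discounting): 1/4 each.
prob = {w: c / total for w, c in counts.items()}


def perplexity(words, p):
    """PP = 10 ** (-(1/N) * sum of log10 p(w)) over the N scored tokens."""
    logprob = sum(math.log10(p[w]) for w in words)
    return 10 ** (-logprob / len(words))


# Scoring a, b, c and </s>: four events at 1/4 each.
pp_with_eos = perplexity(events, prob)

# Dropping </s> and re-estimating over the three real words: 1/3 each.
words = [w for w in events if w != "</s>"]
word_counts = Counter(words)
renorm = {w: c / len(words) for w, c in word_counts.items()}
pp_without_eos = perplexity(words, renorm)

print(prob)
print(pp_with_eos, pp_without_eos)
```

Scoring </s> yields PP=4, matching the SRILM result above; excluding it and renormalizing yields PP=3, the figure quoted for the CMU toolkit.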

You didn't make a mistake and this is the right answer as far as I can tell.
</s> needs to get a probability in order to be able to compute
a probability for the whole "sentence".
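
Why </s> must be scored while <s> need not be can be checked with a small calculation: <s> is always given as the starting context and is never predicted, whereas </s> is a predicted event, and that is what makes the probabilities of sentences of all lengths sum to one. A minimal sketch, assuming the ML unigram from this thread (each of a, b, c and </s> at 1/4); the helper names are hypothetical:

```python
from fractions import Fraction

VOCAB_SIZE = 3           # real words a, b, c
P_WORD = Fraction(1, 4)  # ML unigram estimate for each real word
P_EOS = Fraction(1, 4)   # ML unigram estimate for </s>


def mass_of_length(n, p_word, p_eos):
    """Total probability of all sentences with exactly n real words.

    There are VOCAB_SIZE**n such sentences; each scores n word predictions
    followed by one p_eos factor. <s> is only conditioned on, so it
    contributes no factor.
    """
    return VOCAB_SIZE ** n * p_word ** n * p_eos


def total_mass(max_len, p_word, p_eos):
    """Probability mass of all sentences with up to max_len real words."""
    return sum(mass_of_length(n, p_word, p_eos) for n in range(max_len + 1))


# Scoring </s>: the masses form a geometric series that approaches 1.
for max_len in (5, 20, 80):
    print(max_len, float(total_mass(max_len, P_WORD, P_EOS)))

# Not scoring </s> (p_eos = 1, words renormalized to 1/3): every length
# already receives total mass 1, so the sum grows without bound.
for max_len in (5, 20, 80):
    print(max_len, total_mass(max_len, Fraction(1, 3), Fraction(1)))
```

With </s> scored, the mass up to length L is exactly 1 - (3/4)^(L+1); without it, the total is L+1, so the word probabilities alone do not define a distribution over whole sentences.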

Are you saying that the CMU software doesn't give any probability to </s>?
That would be quite odd.

Maybe someone on this list who is more familiar with the CMU toolkit can
contribute an explanation.

--Andreas
