Re: About two PPLs

From: Andreas Stolcke <stolcke at ADDRESS HIDDEN>
Date: Fri, 28 Feb 2003 21:38:08 PST

In message <002101c2dfb4$8801ec90$6314ce80@speechwork> you wrote:
> Hi,
>     I have installed srilm successfully, thanks a lot! Now I have a
> small question about the PPL output:
>     When I run "ngram" to compute the PPL of a test text, two
> perplexities are output: ppl and ppl1. What is the difference between them?
> (I can't find this in the documentation.)

ppl is the perplexity normalized over all input tokens, including
end-of-sentence tokens; ppl1 omits end-of-sentence tokens from the denominator.

ppl1 is more meaningful for comparing texts that differ in their
sentence segmentations.
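
As a rough illustration of the difference, here is a minimal Python sketch.
It assumes the total log probability is in base 10 and that out-of-vocabulary
words are excluded from both denominators; the function and parameter names
are made up for this example and are not part of SRILM.

```python
def perplexities(logprob10, num_words, num_sentences, num_oovs=0):
    """Return (ppl, ppl1) from a total base-10 log probability.

    Hypothetical helper, not an SRILM API. Assumes OOV words contribute
    nothing to logprob10 and are therefore left out of the token counts.
    """
    scored_words = num_words - num_oovs
    # ppl: one end-of-sentence token per sentence is counted in the denominator.
    ppl = 10.0 ** (-logprob10 / (scored_words + num_sentences))
    # ppl1: end-of-sentence tokens are omitted from the denominator.
    ppl1 = 10.0 ** (-logprob10 / scored_words)
    return ppl, ppl1


# Made-up numbers: 8 words in 2 sentences, total log10 probability of -20.
ppl, ppl1 = perplexities(logprob10=-20.0, num_words=8, num_sentences=2)
print(ppl, ppl1)
```

With these made-up numbers ppl comes out as 100 and ppl1 as roughly 316.
Only the ppl denominator depends on the number of sentences, so
resegmenting the same words changes the normalization of ppl but not of ppl1.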

BTW, this will be documented in the man page for the next release.

--Andreas
