About two PPLs

Andreas Stolcke stolcke at speech.sri.com
Fri Feb 28 21:38:08 PST 2003


In message <002101c2dfb4$8801ec90$6314ce80 at speechwork> you wrote:
> Hi,
>     I have installed srilm successfully, thanks a lot! Now I have a
> small question about PPL output:
>     when I run "ngram" to count PPL of a testing text, there are two
> ppls output: ppl and ppl1, what's the difference of them?
> (I can't find this from the documents).

ppl is the perplexity normalized over all input tokens, including
end-of-sentence tokens; ppl1 omits end-of-sentence tokens from the
denominator.

ppl1 is more meaningful for comparing texts that differ in their 
sentence segmentations.
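
As a rough illustration of the two normalizations, here is a minimal
Python sketch. It is not SRILM code: the function name and argument
names are hypothetical, and it assumes the total log probability is in
base 10 and that out-of-vocabulary words are excluded from the token
counts, as in the summary line that ngram -ppl prints.

```python
def perplexities(logprob10, words, sentences, oovs=0):
    """Return (ppl, ppl1) from corpus-level totals.

    logprob10 -- total base-10 log probability of the test text (assumed)
    words     -- number of word tokens, not counting end-of-sentence tokens
    sentences -- number of sentences, i.e. end-of-sentence tokens
    oovs      -- out-of-vocabulary words, left out of the denominator (assumed)
    """
    scored_words = words - oovs
    # ppl: denominator counts every scored token, end-of-sentence included.
    ppl = 10 ** (-logprob10 / (scored_words + sentences))
    # ppl1: same total log probability, but end-of-sentence tokens are
    # dropped from the denominator.
    ppl1 = 10 ** (-logprob10 / scored_words)
    return ppl, ppl1


if __name__ == "__main__":
    # Made-up totals, chosen only to make the arithmetic easy to follow.
    ppl, ppl1 = perplexities(logprob10=-60.0, words=20, sentences=10)
    print(f"ppl = {ppl:.1f}  ppl1 = {ppl1:.1f}")
```

Since both values share the same total log probability and ppl1 has the
smaller denominator, ppl1 is never lower than ppl for the same text.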

BTW, this will be documented in the man page for the next release.

--Andreas 



