Re: Disambig n-best scores

From: Andreas Stolcke <stolcke at ADDRESS HIDDEN>
Date: Tue, 30 Mar 2004 15:58:02 PST

In message <009501c4166e$a0b50cd0$34284484 at ADDRESS HIDDEN> you wrote:
> Hi,
>
> How is path score in disambig with n-best option calculated?
>
> For example, suppose that I have the sentence:
>
> W1 W2
> Which is tagged with T1 T2
>
> Then I calculated the path probability as follows:
>
> Log10 [ P(T1|<s>)*P(T2|T1)*P(</s>|T2)*P(W1|T1)*P(W2|T2) ]
>
> I got it "almost right". I checked for two paths:
> For one I got -20.549 (while disambig returned -120.549)
> For the other I got -20.837 (while disambig returned -120.837)
>
> What is the reason for this difference? Should I always ignore the "1"
> after the "-"?

The -100 comes from an OOV word. When the LM returns a probability of 0
AND the word is not in the LM, it is considered an OOV. To allow the
probability computation to continue, a large negative but finite log
probability of -100 is substituted (cf. the constant LogP_PseudoZero in
disambig.cc).

--Andreas
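
To make the arithmetic in the reply concrete, here is a minimal Python sketch
of summing log10 probabilities along a path while substituting a finite
stand-in for any zero probability. It is not the disambig code: the function
names and the example probabilities are hypothetical, and only the value -100
and the name LogP_PseudoZero come from the message above. The sketch also
assumes that every zero-probability factor contributes one such -100 term.

```python
import math

# Value quoted in the message for LogP_PseudoZero (disambig.cc).
LOGP_PSEUDO_ZERO = -100.0


def log10_or_pseudo_zero(prob):
    """Return log10(prob), or the finite stand-in when prob is 0.

    Hypothetical helper: log10(0) is undefined, so a zero probability is
    mapped to a large negative but finite log probability instead.
    """
    return math.log10(prob) if prob > 0.0 else LOGP_PSEUDO_ZERO


def path_score(probs):
    """Sum the log10 probabilities of all factors along one path."""
    return sum(log10_or_pseudo_zero(p) for p in probs)


# Made-up factors for illustration; the last one stands for an OOV word
# whose probability is 0.
example_score = path_score([0.1, 0.01, 0.0])

# The two scores from the question, each shifted by one pseudo-zero term.
shifted_first = -20.549 + LOGP_PSEUDO_ZERO
shifted_second = -20.837 + LOGP_PSEUDO_ZERO
```

Under these assumptions, the hand-computed -20.549 and -20.837 plus a single
-100 term give -120.549 and -120.837, which matches what disambig reported.
The leading "1" is therefore part of the score and should not be ignored.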