Error in lattice rescoring?

From: Teemu Hirsimaki <teemu.hirsimaki at ADDRESS HIDDEN>
Date: Thu, 13 Oct 2005 12:03:10 +0300

While working on lattices, I noticed that lattice-tool sometimes seems
to give strange backoff probabilities when rescoring lattices.  I have
a simple 2-gram model test.arpa:

\data\
ngram 1=4
ngram 2=1

\1-grams:
-99 <s>
-1.00000 </s>
-0.69897 a
-0.15490 b -0.69897

\2-grams:
-0.09691 b </s>

\end\

and a simple HTK lattice file test.htk that contains just the words "b a </s>":

VERSION=1.1
base=10
dir=f
start=0 end=3
N=4 L=3
I=0
I=1
I=2
I=3
J=0 S=0 E=1 W=b
J=1 S=1 E=2 W=a
J=2 S=2 E=3 W=!NULL

Rescoring gives funny probabilities for "a" and "b":

$ lattice-tool -in-lattice test.htk -read-htk -lm test.arpa \
                -out-lattice - -write-htk
...
J=0     S=0     E=2     W=b     l=-0.85387 (*)
J=1     S=2     E=3     W=a     l=-0.69897
J=2     S=3     E=1     W=!NULL l=-1

The correct probabilities are given by the ngram tool:

$ echo "b a </s>" | ngram -debug 2 -lm test.arpa -ppl -
...
p( b | <s> )    = [1gram] 0.700003 [ -0.1549 ]
p( a | b ...)   = [1gram] 0.04 [ -1.39794 ]
p( </s> | a ...)= [1gram] 0.1 [ -1 ]
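
For reference, the ngram figures above can be reproduced by hand with the usual ARPA backoff rule: when the bigram is missing, add the backoff weight of the history word to the unigram log probability of the current word. The following is only a minimal sketch of that rule for this toy model, not SRILM code; the dictionary and function names are made up for illustration, and a missing backoff weight is assumed to be 0 in the log10 domain.

```python
# Toy model copied from test.arpa above: log10 probabilities and backoff weights.
UNIGRAMS = {"<s>": -99.0, "</s>": -1.0, "a": -0.69897, "b": -0.15490}
BACKOFFS = {"b": -0.69897}            # only "b" carries an explicit backoff weight
BIGRAMS = {("b", "</s>"): -0.09691}


def bigram_logprob(prev, word):
    """Return log10 p(word | prev) using the standard ARPA backoff rule."""
    if (prev, word) in BIGRAMS:
        return BIGRAMS[(prev, word)]
    # No explicit bigram: back off to the unigram, weighted by BO(prev).
    # A missing backoff weight is treated as 0.0 (assumption stated above).
    return BACKOFFS.get(prev, 0.0) + UNIGRAMS[word]


def score(words):
    """Score a word sequence left to right, starting from the <s> history."""
    history = "<s>"
    result = []
    for w in words:
        result.append((w, round(bigram_logprob(history, w), 5)))
        history = w
    return result


scores = score(["b", "a", "</s>"])
for word, logprob in scores:
    print(word, logprob)
```

With this rule the backoff weight BO(b) lands on "a", the word that follows "b", which is where the ngram output puts it.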

Did I miss something, or is there a bug in lattice-tool?  It looks like
lattice-tool adds the backoff weight BO(b) to the first word (*)
instead of to the next one.  The problem seems to appear in toolkit
versions 1.4.4 and 1.4.5 (OS is SuSE Linux 9.3 i686).

--
Teemu Hirsimäki
