Error in lattice rescoring?

Teemu Hirsimaki teemu.hirsimaki at hut.fi
Thu Oct 13 02:03:10 PDT 2005


While working with lattices, I noticed that lattice-tool sometimes 
seems to give strange backoff probabilities when rescoring.  I have 
a simple 2-gram model, test.arpa:

\data\
ngram 1=4
ngram 2=1

\1-grams:
-99 <s>
-1.00000 </s>
-0.69897 a
-0.15490 b -0.69897

\2-grams:
-0.09691 b </s>

\end\

and a simple HTK lattice file, test.htk, containing just the words "b a </s>":

VERSION=1.1
base=10
dir=f
start=0 end=3
N=4 L=3
I=0
I=1
I=2
I=3
J=0	S=0	E=1	W=b
J=1	S=1	E=2	W=a
J=2	S=2	E=3	W=!NULL

Rescoring gives funny probabilities for "a" and "b":

$ lattice-tool -in-lattice test.htk -read-htk -lm test.arpa \
                -out-lattice - -write-htk
...
J=0     S=0     E=2     W=b     l=-0.85387 (*)
J=1     S=2     E=3     W=a     l=-0.69897
J=2     S=3     E=1     W=!NULL l=-1

The correct probabilities are given by the ngram tool:

$ echo "b a </s>" | ngram -debug 2 -lm test.arpa -ppl -
...
p( b | <s> )    = [1gram] 0.700003 [ -0.1549 ]
p( a | b ...)   = [1gram] 0.04 [ -1.39794 ]
p( </s> | a ...)= [1gram] 0.1 [ -1 ]

Did I miss something, or is there a bug in lattice-tool?  It looks like 
lattice-tool adds the backoff weight BO(b) to the score of the first 
word (*) instead of the next one.  The bug seems to appear in toolkit 
versions 1.4.4 and 1.4.5 (OS is SuSE Linux 9.3 i686).
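
The arithmetic behind that guess can be checked with a minimal sketch 
of plain bigram backoff scoring.  This is only an illustration, not 
SRILM code: the names (bigram_score and so on) are made up here, the 
model values are copied from test.arpa above, and the "observed" 
numbers are copied from the lattice-tool output.  A word with no 
backoff weight listed is assumed to back off with weight 0, as usual 
for ARPA files.

```python
# Log10 values copied from test.arpa above (illustrative names, not SRILM API).
unigram_logprob = {"<s>": -99.0, "</s>": -1.0, "a": -0.69897, "b": -0.15490}
backoff_weight = {"b": -0.69897}  # missing entries are treated as 0
bigram_logprob = {("b", "</s>"): -0.09691}


def bigram_score(prev, word):
    """Return log10 p(word | prev) using standard backoff."""
    if (prev, word) in bigram_logprob:
        return bigram_logprob[(prev, word)]
    # No explicit bigram: backoff weight of the history plus the unigram.
    return backoff_weight.get(prev, 0.0) + unigram_logprob[word]


sentence = ["<s>", "b", "a", "</s>"]
expected = [bigram_score(p, w) for p, w in zip(sentence, sentence[1:])]

# Link scores copied from the lattice-tool output above.
observed = [-0.85387, -0.69897, -1.0]

# Per-word difference between lattice-tool and the backoff computation.
shift = [o - e for o, e in zip(observed, expected)]

for word, e, o, s in zip(sentence[1:], expected, observed, shift):
    print(f"{word:5s} expected={e:.5f} observed={o:.5f} shift={s:+.5f}")
print(f"path totals: expected={sum(expected):.5f} observed={sum(observed):.5f}")
```

With these numbers the per-word scores differ by exactly BO(b) on "b" 
and "a", while the total over the whole path comes out the same, so 
only the attribution of the backoff weight to a link appears to differ.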

-- 
Teemu Hirsimäki



