
Re: Error in lattice rescoring?

From: Andreas Stolcke <stolcke at ADDRESS HIDDEN>
Date: Thu, 13 Oct 2005 10:51:34 PDT

In message <434E22CE.3090005 at ADDRESS HIDDEN> you wrote:
> While working on lattices, I noticed that lattice-tool seems to give
> sometimes strange backoff probabilities when rescoring lattices.  I have
> a simple 2-gram model test.arpa:
>
> \data\
> ngram 1=4
> ngram 2=1
>
> \1-grams:
> -99 <s>
> -1.00000 </s>
> -0.69897 a
> -0.15490 b -0.69897
>
> \2-grams:
> -0.09691 b </s>
>
> \end\
>
> and a simple HTK lattice file test.htk that has just words "b a </s>":
>
> VERSION=1.1
> base=10
> dir=f
> start=0 end=3
> N=4 L=3
> I=0
> I=1
> I=2
> I=3
> J=0 S=0 E=1 W=b
> J=1 S=1 E=2 W=a
> J=2 S=2 E=3 W=!NULL
>
> Rescoring gives funny probabilities for "a" and "b":
>
> $ lattice-tool -in-lattice test.htk -read-htk -lm test.arpa \
>                 -out-lattice - -write-htk
> ...
> J=0     S=0     E=2     W=b     l=-0.85387 (*)
> J=1     S=2     E=3     W=a     l=-0.69897
> J=2     S=3     E=1     W=!NULL l=-1
>
> The correct probabilities are given by the ngram tool:
>
> $ echo "b a </s>" | ngram -debug 2 -lm test.arpa -ppl -
> ...
> p( b | <s> )    = [1gram] 0.700003 [ -0.1549 ]
> p( a | b ...)   = [1gram] 0.04 [ -1.39794 ]
> p( </s> | a ...)= [1gram] 0.1 [ -1 ]
>
> Did I miss something, or is there a bug in lattice-tool?  It looks like
> the lattice-tool adds the backoff probability BO(b) for the first word
> (*) instead of the next.  The bug seems to appear in toolkit versions
> 1.4.4 and 1.4.5 (OS is SuSE Linux 9.3 i686).

It's not a bug.  If you add all the scores along the path for
<s> b a </s>, you get -2.55284, which is the right score: it is the same
total as the three log probabilities that ngram reports above.

You can verify this with

echo "<s> b a </s>" | \
lattice-tool -in-lattice test-rescored.htk -read-htk -ppl - -debug 2

which traces the path and aggregate probabilities of the path through
the lattice.
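
The total can also be checked by hand from the numbers quoted in the original message. The following is only a small sketch of that arithmetic; the list names are made up for illustration, and the values are copied from the lattice-tool and ngram output above.

```python
import math

# Per-transition scores written by lattice-tool (the l= fields quoted above).
lattice_scores = [-0.85387, -0.69897, -1.0]

# Per-word log10 probabilities reported by ngram -debug 2 for "b a </s>".
ngram_scores = [-0.1549, -1.39794, -1.0]

lattice_total = sum(lattice_scores)
ngram_total = sum(ngram_scores)

# Both decompositions should give the same path score.
print(f"lattice total: {lattice_total:.5f}")
print(f"ngram total:   {ngram_total:.5f}")
print("match:", math.isclose(lattice_total, ngram_total, abs_tol=1e-5))
```

The individual scores differ between the two tools, but only the sum along a complete path is meaningful for the rescored lattice.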

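Since the nodes for a and b correspond to backoff contexts, the weights
(log probabilities, so adding them multiplies the probabilities) are
assigned as follows:

transition      weight
<s> -> b        p(b) + bow(b)
b -> a          p(a) + bow(a)
a -> </s>       p(</s>)

Here bow(a) is 0 because test.arpa lists no backoff weight for a.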

It is more compact to assign the backoff weight to the transitions coming
INto the corresponding node, in case that node has multiple successors.
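
To make the two conventions concrete, here is a rough sketch that recomputes both sets of scores from the unigram entries of test.arpa. It is not how lattice-tool is implemented; the function names are hypothetical, and it assumes that every word on the path "b a </s>" backs off to its unigram (true for this model, since the only bigram is "b </s>") and that a missing backoff weight counts as 0.

```python
import math

# Unigram log10 probabilities and backoff weights copied from test.arpa.
logp = {"</s>": -1.0, "a": -0.69897, "b": -0.15490}
bow = {"b": -0.69897}  # words without a listed backoff weight get 0

path = ["b", "a", "</s>"]


def ngram_style(words):
    """Charge bow(previous word) on the word that follows it (ngram -ppl view)."""
    scores = []
    prev = "<s>"
    for w in words:
        scores.append(bow.get(prev, 0.0) + logp[w])
        prev = w
    return scores


def lattice_style(words):
    """Charge bow(w) on the transition coming into w's node (assumed view
    of the default lattice-tool expansion, as described in the reply)."""
    return [logp[w] + bow.get(w, 0.0) for w in words]


per_word = ngram_style(path)
per_transition = lattice_style(path)

for w, a, b in zip(path, per_word, per_transition):
    print(f"{w:5s} ngram={a:.5f} lattice={b:.5f}")
print(f"totals: {sum(per_word):.5f} {sum(per_transition):.5f}")
```

In this sketch bow(b) moves from the score of a to the score of b, which is why the first transition looks too small and the second too large while the path total is unchanged. If node b had several successors, the backoff weight would still appear only once, on the transition into b.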

If you want to see the weight assignment you expect you can use the
lattice-tool -old-expansion option, but it can only handle up to 3-gram LMs.
The default algorithm is both more general and yields more compact lattices.

--Andreas


Last modified Nov 21, 2008