Re: Factored LMs and interpolated models

From: Andreas Stolcke <stolcke at ADDRESS HIDDEN>
Date: Fri, 07 May 2004 07:02:40 PDT

There are a few known bugs in the FLM code as last released.
They will be fixed in the next release (1.4.1), which I expect to
be out in a couple of days.

--Andreas

In message <1083912755.8267.7.camel@NOOL2> you wrote:
>
>
> > Let me know if this helps or if I have misunderstood your question...
> >
>
> Hello,
>
> First, thanks to everybody for help.
>
> My goal was, as Katrin correctly assumed, "to interpolate a
> traditional class-based model and a standard n-gram model but you want
> to express this within a single FLM file". This is currently not
> possible, but it's not very important because I learned that I can
> use:
>
> ngram -factored -lm <FLM1> -mix-lm <FLM2>
>
> The above really works.
>
> Still, I noticed a strange thing with perplexity calculation. Namely,
> the perplexity figures calculated by fngram and ngram are slightly
> different.  I used the following options and got the following results:
>
> fngram -ppl <testtext> -factor-file tmp/fngram_m.conf
>
> Result:
> 61 sentences, 1009 words, 26 OOVs
> 0 zeroprobs, logprob= -2760.87 ppl= 441.076 ppl1= 643.604
>
> ngram -factored -ppl <testtext> -lm tmp/fngram_m.conf
>
> Result:
> 61 sentences, 1009 words, 26 OOVs
> 0 zeroprobs, logprob= -2761.16 ppl= 441.359 ppl1= 644.042
>
>
> --
>
> The above is for an FLM that is in fact a standard word trigram. The
> difference is very small.
>
> However, when I test an FLM that is a word-given-two-previous-classes
> trigram, the difference is much larger:
>
> fngram -ppl <testtext> -factor-file tmp/fngram_c.conf
>
> 61 sentences, 1009 words, 26 OOVs
> 0 zeroprobs, logprob= -2826.73 ppl= 510.034 ppl1= 750.963
>
> And the same with ngram:
>
> ngram -factored -lm tmp/fngram_c.conf -ppl <testtext>
>
> 61 sentences, 1009 words, 26 OOVs
> 0 zeroprobs, logprob= -2863.71 ppl= 553.378 ppl1= 818.917
>
>
> As you can see, the difference here (ppl1= 750 vs 818) is significant. Could
> this be a configuration issue or a bug, or have I misunderstood something?
>
> Regards,
>
> Tanel Alumäe
>
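
For readers comparing the figures quoted above: the ppl and ppl1 values follow from the reported logprob (base 10) and the sentence, word, and OOV counts, as described in the SRILM documentation for -ppl. The sketch below is only a cross-check of that arithmetic, not code from SRILM; the function name srilm_perplexities and the run labels are hypothetical. It shows that both tools apply the same normalization to their own logprob, so the discrepancy lies in the logprob totals themselves.

```python
def srilm_perplexities(logprob, sentences, words, oovs, zeroprobs=0):
    """Recompute (ppl, ppl1) from the counts printed by -ppl.

    Hypothetical helper: ppl counts one end-of-sentence token per
    sentence in the denominator, ppl1 does not. OOVs and zeroprobs
    are excluded from both.
    """
    scored_words = words - oovs - zeroprobs
    ppl = 10.0 ** (-logprob / (scored_words + sentences))
    ppl1 = 10.0 ** (-logprob / scored_words)
    return ppl, ppl1


# logprob values copied from the quoted message; labels are illustrative.
runs = {
    "fngram, word trigram": -2760.87,
    "ngram -factored, word trigram": -2761.16,
    "fngram, class-context trigram": -2826.73,
    "ngram -factored, class-context trigram": -2863.71,
}

for label, logprob in runs.items():
    ppl, ppl1 = srilm_perplexities(logprob, sentences=61, words=1009, oovs=26)
    print(f"{label}: ppl={ppl:.3f} ppl1={ppl1:.3f}")
```

The recomputed values agree with the quoted output up to the rounding of logprob to two decimals.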


Last modified Nov 21, 2008