Re: -gt1min

From: Andreas Stolcke <stolcke at ADDRESS HIDDEN>
Date: Wed, 01 Nov 2006 09:27:32 PST

In message <45484C95.4030401 at ADDRESS HIDDEN> you wrote:
> Andreas Stolcke wrote:
> > In message <45475E03.4040105 at ADDRESS HIDDEN> you wrote:
> >> Hi Andreas,
> >>
> >> ngram-count effectively ignores the -gt1min option, i.e. the cutoff
> >> value for unigrams. Is that the desired behavior?
> >
> > How do you reach this conclusion?
> >
> > Andreas
> >
> >
> e.g.,
> ngram-count -order 1 -gt1min 1 -text <text> -lm lm1
> ngram-count -order 1 -gt1min 5 -text <text> -lm lm5
> both produce the same list of unigrams (same length); only the logprobs
> change. I would have expected unigrams with counts below gt1min to be
> pruned (as n-grams of higher order are), and hence the list in lm5 to be
> shorter...
>
> Ronny
>
> --
> ------------------------------------
> Ronny Melz
> IfI, NLP Dept, University of Leipzig
> Augustusplatz 10/11
> 04109 Leipzig, Germany
> ------------------------------------
>

Ronny,

the fact that all words appear in the unigrams does not mean that -gt1min
doesn't work.  For historical reasons the unigram list also serves the
purpose of listing the vocabulary of the LM.  Therefore SRILM always
includes all words in the unigrams.  However, the words that are excluded
by -gt1min get only the zero-order backoff probability.  Zero-order
backoff probabilities are obtained by distributing the probability mass
left over from unigram discounting over all words.
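
To make the effect concrete, here is a toy sketch in Python. It is only one
simplified reading of the description above, not SRILM's actual code: the
function name is made up for illustration, and absolute discounting with a
fixed discount of 0.5 is an assumed stand-in for whichever discounting
method is really in use.

```python
from collections import Counter


def toy_unigram_probs(counts, min_count, discount=0.5):
    """Toy unigram estimate (illustrative only, not SRILM's algorithm).

    Words with a count below min_count keep their entry in the unigram
    list but receive only a share of the left-over probability mass.
    """
    total = sum(counts.values())
    vocab = sorted(counts)
    # Discounted estimate for words meeting the cutoff, zero for the rest.
    discounted = {
        w: (counts[w] - discount) / total if counts[w] >= min_count else 0.0
        for w in vocab
    }
    # Mass freed by discounting and by the cutoff.
    leftover = 1.0 - sum(discounted.values())
    # Assumed zero-order backoff: spread that mass uniformly over all words.
    floor = leftover / len(vocab)
    return {w: discounted[w] + floor for w in vocab}, floor


counts = Counter("a a a a a a b b b c".split())
probs1, floor1 = toy_unigram_probs(counts, min_count=1)
probs5, floor5 = toy_unigram_probs(counts, min_count=5)

# Same word list in both cases; only the probabilities differ.
print(sorted(probs1) == sorted(probs5))
print(probs5["b"] == floor5, probs5["c"] == floor5)
```

With the cutoff at 5, "b" and "c" are still listed, but both end up with the
same floor probability, which matches the observation that the unigram list
keeps its length while the logprobs change.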

If you want to exclude certain words from the LM altogether, use the
-vocab option.
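
One way to get the pruning effect you expected is to build the vocabulary
file yourself from the word counts and pass it with -vocab. The sketch below
assumes whitespace-tokenized text and a vocabulary file with one word per
line; the function names and the output path are made up for illustration.
Check the ngram-count manual page for how words outside the vocabulary are
treated.

```python
from collections import Counter


def build_vocab(lines, min_count):
    """Return the sorted words that occur at least min_count times."""
    counts = Counter(word for line in lines for word in line.split())
    return sorted(w for w, c in counts.items() if c >= min_count)


def write_vocab(words, path):
    """Write one word per line (assumed vocabulary file layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for w in words:
            f.write(w + "\n")


sample = ["a a a b", "a a b c", "a b"]
vocab = build_vocab(sample, min_count=3)
print(vocab)
# write_vocab(vocab, "vocab.txt")  # hypothetical output path
```

The resulting file could then be used along the lines of
ngram-count -order 1 -vocab <vocab> -text <text> -lm lm5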

Andreas


Last modified Nov 21, 2008