Re: SRILM 1.4

From: Andreas Stolcke <stolcke at ADDRESS HIDDEN>
Date: Thu, 04 Mar 2004 13:35:58 PST

>
> > This would be one solution.  Use ngram-count -read
> > and then ngram -counts.  Just reorder the words in the N-grams to
> > reflect the backoff order you want.
> >
>
> So how exactly would I reorder them supposing I wanted to do the backoff
> as I explained earlier?  Can you just give a concrete example of
> reordering them...?

This works only if each backoff level drops exactly one of the history
elements.  So if you want to back off

p(a|b,c,d) -> p(a|b,c) -> p(a|c)

you are dropping history words (numbered from the predicted word a) in the
order 3 (farthest, d), then 1 (nearest, b), then 2 (c).
To achieve this, extract the N-grams (d c b a) from your data and prepare a
count file with lines of the form

d b c a <count>
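A minimal Python sketch of the extraction and reordering step, assuming each sentence is already tokenized and wrapped in <s> ... </s>. The function names are hypothetical, and only full-length 4-grams are counted; the shorter N-grams at sentence start are not handled here.

```python
from collections import Counter

def reorder(ngram):
    """Map a 4-gram (d, c, b, a) to (d, b, c, a).

    With history positions numbered from the predicted word a
    (b = 1, nearest; c = 2; d = 3, farthest), the farthest word in the
    output is the one to drop first (d), then b, leaving c.
    """
    d, c, b, a = ngram
    return (d, b, c, a)

def reordered_4gram_counts(sentences):
    """Count reordered 4-grams over tokenized sentences (hypothetical helper).

    Assumption: each sentence is a list of words already wrapped in
    <s> ... </s>; only positions with a full 4-gram window are counted.
    """
    counts = Counter()
    for words in sentences:
        for i in range(len(words) - 3):
            counts[reorder(tuple(words[i:i + 4]))] += 1
    return counts

def format_counts(counts):
    """Render counts as 'w1 w2 ... <count>' lines, one N-gram per line."""
    return [" ".join(ngram) + " " + str(n) for ngram, n in sorted(counts.items())]

if __name__ == "__main__":
    sents = [["<s>", "we", "like", "the", "cat", "</s>"]]
    for line in format_counts(reordered_4gram_counts(sents)):
        print(line)
```

For the example sentence this yields the reordered lines "<s> like we the 1", "like cat the </s> 1" and "we the like cat 1".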

For training (ngram-count) you also need to generate the lower-order counts,
i.e.,

b c a <count>
c a <count>
a <count>
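One way to produce these lower-order counts is to drop the leading (farthest) word of each reordered N-gram repeatedly, so that (d b c a) contributes to (b c a), (c a) and (a). The sketch below assumes the lower-order counts are simply summed from the reordered 4-gram counts; that is an assumption, and it misses N-grams that never occur inside a full 4-gram (e.g., near sentence starts). The helper name is hypothetical.

```python
from collections import Counter

def lower_order_counts(top_counts):
    """Add lower-order counts derived from reordered highest-order counts.

    Each reordered N-gram (d, b, c, a) adds its count to (b, c, a),
    (c, a) and (a,): the farthest remaining word is dropped at each
    step, matching the desired backoff order. Assumes top_counts holds
    only highest-order N-grams.
    """
    all_counts = Counter(top_counts)  # copy, so the input is not modified
    for ngram, n in top_counts.items():
        for k in range(1, len(ngram)):
            all_counts[ngram[k:]] += n
    return all_counts

if __name__ == "__main__":
    top = Counter({("d", "b", "c", "a"): 2, ("x", "b", "c", "a"): 1})
    merged = lower_order_counts(top)
    for ngram, n in sorted(merged.items(), key=lambda kv: (len(kv[0]), kv[0])):
        print(" ".join(ngram), n)
```

The merged counts could then be written out, one N-gram and count per line, as the count file passed to ngram-count -read.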

For testing (ngram -counts) you only need the highest-order counts
(except at the start of a sentence, where the length of the N-grams is
limited by the <s> tag).

--Andreas
