Re: SRILM help needed

From: Andreas Stolcke <stolcke at ADDRESS HIDDEN>
Date: Sat, 06 Apr 2002 14:18:58 PST

Zhu,

the default smoothing algorithm in ngram-count is Good-Turing.
The default parameters (as displayed by ngram-count -help) are:

-gt1min:       lower 1gram discounting cutoff
                Default value: 1
-gt1max:       upper 1gram discounting cutoff
                Default value: 1
-gt2min:       lower 2gram discounting cutoff
                Default value: 1
-gt2max:       upper 2gram discounting cutoff
                Default value: 7
-gt3min:       lower 3gram discounting cutoff
                Default value: 2
-gt3max:       upper 3gram discounting cutoff
                Default value: 7
-gt4min:       lower 4gram discounting cutoff
                Default value: 2
-gt4max:       upper 4gram discounting cutoff
                Default value: 7
-gt5min:       lower 5gram discounting cutoff
                Default value: 2
-gt5max:       upper 5gram discounting cutoff
                Default value: 7
-gt6min:       lower 6gram discounting cutoff
                Default value: 2
-gt6max:       upper 6gram discounting cutoff
                Default value: 7

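So all unigrams and bigrams are kept (a lower cutoff of 1 retains every
observed N-gram), but singleton N-grams of higher orders are discarded
(a lower cutoff of 2 drops anything seen only once), which is a pretty
standard choice.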
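
To make the effect of the lower cutoffs concrete, here is a minimal Python
sketch. It is not SRILM code: the function names and the toy token sequence
are made up for illustration, sentence boundaries are ignored, and it only
models the -gtNmin filtering step, not the Good-Turing discounting itself.

```python
from collections import Counter

# Lower cutoffs (-gtNmin) as quoted in the defaults above, keyed by N-gram order.
GT_MIN = {1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 2}


def count_ngrams(tokens, order):
    """Count all N-grams of the given order in a token sequence (hypothetical helper)."""
    return Counter(
        tuple(tokens[i:i + order]) for i in range(len(tokens) - order + 1)
    )


def apply_min_cutoff(counts, order, gt_min=GT_MIN):
    """Keep only N-grams whose count reaches the lower cutoff for this order."""
    threshold = gt_min[order]
    return {ngram: c for ngram, c in counts.items() if c >= threshold}


# Toy data, chosen so that exactly one trigram ("a b d") is a singleton.
tokens = "a b c a b c a b d".split()

for order in (1, 2, 3):
    counts = count_ngrams(tokens, order)
    kept = apply_min_cutoff(counts, order)
    print(f"order {order}: {len(counts)} distinct, {len(kept)} kept")
```

With this toy input, every unigram and bigram survives, while the one
trigram that occurs only once is dropped.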

I'm not sure I understand your question about hidden-ngram.
It doesn't use any "cut-offs".   Cut-offs apply in N-gram model
training, hidden-ngram only uses the model as it is produced by
ngram-count (or some other program).

--Andreas

PS.  Your message to srilm-user didn't make it to the list because you are
not a subscriber.  As a way to control junk mail, only subscribers can post
to the list.  To join, send a message containing "subscribe srilm-user"
to majordomo at ADDRESS HIDDEN.

------- Forwarded Message

Date: Thu, 4 Apr 2002 20:55:13 -0500 (EST)
From: Zhu Zhang <zhuzhang at ADDRESS HIDDEN>
X-X-Sender: zhuzhang at ADDRESS HIDDEN
To: srilm-user at ADDRESS HIDDEN
Subject: SRILM help needed
Message-ID: <Pine.SOL.4.44.0204042045410.16911-100000 at ADDRESS HIDDEN>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Content-Length: 313

Hi,

Could anybody provide the following info about SRILM, which doesn't seem
to be very clear from the documentation:

-  What is the default smoothing algorithm for ngram-count?
-  What are the smoothing parameters?
-  In hidden-ngram, what are the event cut-off frequencies?

Thanks in advance for any help!

------- End of Forwarded Message
