help with ngram-count

Andreas Stolcke stolcke at speech.sri.com
Sat Apr 19 10:13:13 PDT 2003


For ngram backoff you distribute the probability mass left over by 
ngrams of order k in proportion to the probabilities given by ngrams of
order k-1.
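Written out, this is the usual backoff-weight computation. The notation below is my own shorthand, not something ngram-count prints: h is the (k-1)-word context, h' is the same context with its oldest word dropped, and the sums run over the words w for which the model contains an explicit k-gram "h w".

```latex
\mathrm{bow}(h) \;=\;
  \frac{1 - \sum_{w \,:\, (h,w) \in \text{model}} p(w \mid h)}
       {1 - \sum_{w \,:\, (h,w) \in \text{model}} p(w \mid h')}
```

The numerator is the mass left over by the k-grams, and the denominator is the mass that the (k-1)-grams assign to the words that must be reached through backoff.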

What the error message is saying is that the k-1-grams don't assign any
probability to the words that don't already have k-grams.  This can happen
especially when you disable smoothing as you did.
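To make the failure concrete, here is a minimal Python sketch of that computation for a single context. It is an illustration only and is not taken from the SRILM source; the function name `backoff_weight` and the toy probabilities are made up for the example.

```python
def backoff_weight(higher, lower):
    """Backoff weight for one context.

    higher: word -> p(word | context), only for words that have an
            explicit k-gram in the model.
    lower:  word -> p(word | shortened context) from the (k-1)-gram model.
    (Both dictionaries are hypothetical toy inputs.)
    """
    # Mass left over by the k-grams.
    numerator = 1.0 - sum(higher.values())
    # Mass the (k-1)-grams give to words WITHOUT an explicit k-gram.
    denominator = 1.0 - sum(lower[w] for w in higher)
    if denominator <= 0.0:
        raise ValueError(
            "BOW denominator is %g <= 0, numerator is %g"
            % (denominator, numerator)
        )
    return numerator / denominator


# Healthy case: the lower-order model reserves mass for the unseen word "c".
ok = backoff_weight({"a": 0.5, "b": 0.25},
                    {"a": 0.5, "b": 0.25, "c": 0.25})

# Failing case: mass is left over at order k, but the lower-order model
# gives all of its probability to words that already have k-grams.
try:
    backoff_weight({"a": 0.6, "b": 0.3}, {"a": 0.5, "b": 0.5})
except ValueError as err:
    print(err)
```

The second call is the situation the warning describes: there is leftover mass to hand out, but nowhere for it to go.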

The problem should go away if you include all trigrams from your training 
data.  The default minimum count for trigrams is 2, so you need to use
-gt3min 1 in addition to the options you have.

--Andreas

In message <20030419045423.33794.qmail at web41604.mail.yahoo.com> you wrote:
> 
> I encountered the following problem reported from ngram-count:
>
>   BOW denominator for context "D SMALL" is 0 <= 0,numerator is 0.0909091
>
> The switches I invoked is:
>
>   zcat EN.count.1.gz EN.count.2.gz EN.count.3.gz | perl -pe 's/<UNK>/<unk>/g' | ./bin/ngram-count -memuse -read - -vocab ML.vocab -order 3 -cdiscount3 0 -cdiscount2 0 -cdiscount1 0 -unk -lm - | ./bin/add-dummy-bows - | perl -pe 's/<unk>/<UNK>/g' | gzip >! EN.arpabo.3.gz
>
> Could someone help me to get rid of that warning msg?
>
> Thanks, June
> 
> 


