Where have all the 3-grams gone?

Karl Weilhammer weilkar at phonetik.uni-muenchen.de
Tue Mar 18 14:37:13 PST 2003


Hi Andreas,

While experimenting a little with SRILM, I found that ngram-count does not
enter trigrams that occur only once into the language model, although it
does so for bigrams. The command

echo "the man hit the ball" | ngram-count -order 3 -text - -cdiscount3 0.5 \
    -cdiscount2 0.5 -cdiscount1 0.5 -unk -lm test_C3gram.lm

results in the following language model:
__________________________________________

\data\
ngram 1=7
ngram 2=6
ngram 3=0

\1-grams:
-1.079181       </s>
-99     <s>     -0.1760913
-0.3802113      <unk>
-1.079181       ball    -0.2632414
-1.079181       hit     -0.1760913
-1.079181       man     -0.2632414
-0.60206        the     -0.2218487

\2-grams:
-0.30103        <s> the
-0.30103        ball </s>
-0.30103        hit the
-0.30103        man hit
-0.60206        the ball
-0.60206        the man

\3-grams:

\end\
_________________________________________
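
As a cross-check on the listing above, here is a minimal Python sketch (not
part of SRILM; the helper names ngram_counts and discounted_log10 are
hypothetical). It assumes the sentence is wrapped in <s> ... </s>, as the
2-gram entries suggest, and that a bigram probability is computed as
(count - 0.5) / context count. Under those assumptions it reproduces the six
2-gram log probabilities and shows that the sentence contains five distinct
trigrams, each occurring exactly once, none of which made it into the model.

```python
from collections import Counter
from math import log10


def ngram_counts(tokens, n):
    """Count every n-gram of length n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def discounted_log10(count, context_count, discount=0.5):
    """log10 of an absolutely discounted relative frequency (assumed formula)."""
    return log10((count - discount) / context_count)


# Assumption: sentence boundary markers are added around the input line.
tokens = ["<s>"] + "the man hit the ball".split() + ["</s>"]

unigrams = ngram_counts(tokens, 1)
bigrams = ngram_counts(tokens, 2)
trigrams = ngram_counts(tokens, 3)

# Print the bigrams in the same layout as the \2-grams: section.
for (w1, w2), c in sorted(bigrams.items()):
    print(f"{discounted_log10(c, unigrams[(w1,)]):.5f}\t{w1} {w2}")

# Every trigram in the sentence is a singleton.
for gram, c in sorted(trigrams.items()):
    print(c, " ".join(gram))
```

The bigram values agree with the listing, so the discounting itself behaves as
expected; only the singleton trigrams are absent.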

The same command with "-order 2" results in basically the same language
model (only the lines "ngram 3=0" and "\3-grams:" are missing).
Using "-minprune 4" and "-prune 0" did not change the result.

Is there a way to get entries for singleton trigrams (those that occur only
once) into the language model?

Karl

----------------------------------------------------------------------------
Karl Weilhammer
Institut fuer Phonetik und Sprachliche Kommunikation
Ludwig-Maximilians-Universitaet Muenchen         Tel.: +49-(0)89-2180-2454
Schellingstr. 3                                  Fax : +49-(0)89-2800362
80799 Muenchen                 Email: weilkar at phonetik.uni-muenchen.de
GERMANY                        www  : http://www.phonetik.uni-muenchen.de/
----------------------------------------------------------------------------



