Language Model output problem using FLM

From: Antoine Ghaoui <Antoine.Ghaoui at ADDRESS HIDDEN>
Date: Thu, 15 Feb 2007 10:09:39 +0200

Hello,

I'm trying to use fngram-count to generate a language model based on
morphology. As a first step, I'm generating a word trigram model to
become familiar with the tool.

The factor file is:

## word trigram
1
W : 2 W(-1) W(-2) ntextfile_99.flm.cnt ntextfile_99.flm.lm 3
W1W2    W2      kndiscount gtmin 1 interpolate
W1      W1      kndiscount gtmin 1 interpolate
0       0       kndiscount gtmin 1

The command line used is:
fngram-count -factor-file flm_spc.1 -text ntextfile_99.flm -lm ntextfile_99.flm.lm -vocab ntextfile.vocab.flm

The generated LM file looks strange to me. Part of it is shown below:
\data\
ngram 0x0=18119
ngram 0x1=2855740
ngram 0x2=0
ngram 0x3=6490198

\0x0-grams:
-2.313375       </s>
-99     <s>
.
.
\0x1-grams:
-0.9892201      <s> W-LTN       -1.629908
.
.
\0x2-grams:

\0x3-grams:
-0.9725394      <s> <s> W-LTN   -1.654503
.
.
\end\

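Can you please help with this? Is it normal to have "ngram 0x2=0", with
an empty \0x2-grams: section? And how can I get the old format (the
standard ARPA-style file with 1-grams, 2-grams and 3-grams)?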
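
A minimal sketch for reading the \data\ header, assuming that each hexadecimal label is a bitmask over the parents in the order they appear in the factor file, so that bit 0 is W(-1) and bit 1 is W(-2). Under that assumption, 0x2 would be the node conditioned on W(-2) alone, which the backoff path W1W2 -> W1 -> 0 in the factor file never visits. The helper names decode_node and read_counts are illustrative only and are not part of SRILM.

```python
import re

def decode_node(label, parents):
    """Map a hex node label such as '0x3' to the parents whose bits are set.

    Assumption: bit i of the label corresponds to parents[i].
    """
    mask = int(label, 16)
    return [p for i, p in enumerate(parents) if mask & (1 << i)]

def read_counts(header_lines):
    """Collect the 'ngram 0xN=count' entries of a \\data\\ header into a dict."""
    counts = {}
    for line in header_lines:
        m = re.match(r"ngram\s+(0x[0-9A-Fa-f]+)=(\d+)", line.strip())
        if m:
            counts[m.group(1)] = int(m.group(2))
    return counts

# Parents in the order given in the factor file line "W : 2 W(-1) W(-2) ..."
PARENTS = ["W(-1)", "W(-2)"]

# Header lines copied from the LM file quoted in this message
HEADER = [
    "ngram 0x0=18119",
    "ngram 0x1=2855740",
    "ngram 0x2=0",
    "ngram 0x3=6490198",
]

for label, n in read_counts(HEADER).items():
    print(label, decode_node(label, PARENTS), n)
```

Read this way, 0x0 plays the role of the unigram section, 0x1 the bigram section, and 0x3 the trigram section.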

Thanks for your help

Antoine
