Hello,
I'm using FLM to test some models.
I'm using the same data and the same vocabulary with both tools,
ngram-count and fngram-count, but I'm not able to generate the same
trigram model. The numbers of bigrams and trigrams in the generated
LM files are different.
Using ngram-count, I get:
\data\
ngram 1=315
ngram 2=23800
ngram 3=120408
Using fngram-count, I get:
\data\
ngram 0x0=315
ngram 0x1=23523
ngram 0x2=0
ngram 0x3=86366
ngram-count is run with its default parameters, and the factor file
for fngram-count is:
##rule trigram
1
U : 2 U(-1) U(-2) ntextfile.flm.cnt ntextfile.flm.lm 3
U1U2 U2 wbdiscount gtmin 3 interpolate
U1 U1 wbdiscount gtmin 1 interpolate
0 0
What parameters should I use in the factor file to get the same LM
output from both tools?
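
To make the mismatch concrete, here is a minimal Python sketch that parses the two \data\ headers quoted above and prints the per-order difference. The helper names (parse_data_header, compare) are hypothetical and not part of SRILM. The alignment of the fngram-count labels with n-gram orders (0x0 as unigram, 0x1 as bigram, 0x3 as trigram) is an assumption, based on reading the label as a bitmask of the parents used.

```python
import re

# Matches header lines such as "ngram 2=23800" or "ngram 0x1=23523".
NGRAM_LINE = re.compile(r"^ngram\s+(\S+)=(\d+)$")

# Headers copied from the message above.
NGRAM_COUNT_HEADER = """\\data\\
ngram 1=315
ngram 2=23800
ngram 3=120408
"""

FNGRAM_COUNT_HEADER = """\\data\\
ngram 0x0=315
ngram 0x1=23523
ngram 0x2=0
ngram 0x3=86366
"""

# ASSUMPTION: the fngram-count label is a bitmask of the parents in use,
# so 0x0 ~ unigram, 0x1 ~ bigram on U(-1), 0x3 ~ trigram on U(-1), U(-2).
ORDER_TO_FLM_LABEL = {"1": "0x0", "2": "0x1", "3": "0x3"}


def parse_data_header(text):
    """Return {label: count} for every 'ngram <label>=<count>' line."""
    counts = {}
    for line in text.splitlines():
        match = NGRAM_LINE.match(line.strip())
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def compare(ngram_header, fngram_header, alignment=ORDER_TO_FLM_LABEL):
    """Return (order, ngram-count total, fngram-count total, difference)."""
    a = parse_data_header(ngram_header)
    b = parse_data_header(fngram_header)
    return [
        (order, a[order], b[label], a[order] - b[label])
        for order, label in alignment.items()
    ]


for order, n_std, n_flm, diff in compare(NGRAM_COUNT_HEADER, FNGRAM_COUNT_HEADER):
    print(f"order {order}: ngram-count={n_std} fngram-count={n_flm} diff={diff}")
```

With the numbers above, the unigram counts agree, while the bigram and trigram counts differ by 277 and 34042 entries respectively.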
Thanks
Antoine