<div dir="ltr"><div><div><div>Dear fellow users,<br><br>I am trying to build a factored language model for Estonian, using text that has been morphologically tagged with TreeTagger. The fngram-count program seems to run without issues. However, when I use the fngram program to estimate the perplexity of a test sample, I get an error.<br>


<br></div><div>I found the same question asked here before (<a href="http://www.speech.sri.com/pipermail/srilm-user/2011q3/001088.html">http://www.speech.sri.com/pipermail/srilm-user/2011q3/001088.html</a>), but I could not find a response to this email. Hence, I am posting it to the list again.<br>


</div><div><br></div><div>Below I am pasting the error I get when running the fngram program, along with the contents of the factor file that I used with both the fngram-count and fngram programs. Please let me know if any more information is needed.<br>


<br></div><div>The error:<br></div><div><br>***<br>w_g4_w1w2m1m2.count.gz: line 14172: malformed N-gram count or more than 100 words per line<br>warning: no singleton counts<br>GT discounting disabled<br>warning: no singleton counts<br>


GT discounting disabled<br>warning: no singleton counts<br>GT discounting disabled<br>warning: no singleton counts<br>GT discounting disabled<br>warning: no singleton counts<br>GT discounting disabled<br>warning: no singleton counts<br>


GT discounting disabled<br>warning: no singleton counts<br>GT discounting disabled<br>warning: no singleton counts<br>GT discounting disabled<br>warning: no singleton counts<br>GT discounting disabled<br>warning: no singleton counts<br>


GT discounting disabled<br>warning: no singleton counts<br>GT discounting disabled<br>s_g4_w1w2m1m2.lm.gz: line 21: error, ngram line has invalid number (1) of fields, expecting either 2 or 3<br>format error in lm file<br>


*******<br><br></div>I am still new to factored models, and so far I am only using the example settings given in the Kirchhoff, Bilmes and Duh tutorial.<br><br></div>Here is what my factor file looks like:<br><br>


******<br>##word given word-1 word-2 morph-1 morph-2<br>1<br>W : 4 W(-1) W(-2) M(-1) M(-2) w_g4_w1w2m1m2.count.gz s_g4_w1w2m1m2.lm.gz 5<br>0b0111 0b0010 wbdiscount gtmin 4 interpolate<br>0b1101 0b1000 wbdiscount gtmin 3 interpolate<br>


0b0101 0b0001 wbdiscount gtmin 2 interpolate<br>0b0100 0b0100 wbdiscount gtmin 1 interpolate<br>0b0000 0b0000 wbdiscount gtmin 1<br>******<br><br></div>My training data look like this:<br>&lt;s&gt;<br>W-Eksamitöö:M-S.com.pl.nom<br>


W-I.:M-Y.nominal.?<br>W-Pange:M-V.main.imper.pres<br>W-sulgudes:M-S.com.pl.in<br>W-olevad:M-A.pos.pl.nom<br>W-sõnad:M-S.com.pl.nom<br>W-õigesse:M-A.pos.sg.ill<br>W-vormi:M-S.com.sg.adit<br>


W-!:M-Z.Exc<br>W-Piret:M-S.prop.sg.nom<br>W-Toomet:M-S.prop.sg.abl<br>W-on:M-V.main.indic.pres.ps3<br>W-ettevõtlik:M-A.pos.sg.nom<br>W-naine:M-S.com.sg.nom<br>W-.:M-Z.Fst<br>&lt;/s&gt;<br>******<br><div><div><br></div><div>
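To make the expected token format explicit, here is a minimal sketch of how tokens like the ones above split into factors. It assumes that every token is a colon-separated list of Tag-value factors carrying exactly the tags W and M, and that sentence-boundary markers have been filtered out beforehand. The names parse_factored_token and find_bad_tokens are hypothetical helpers for illustration, not part of SRILM.<br>
<pre>
```python
def parse_factored_token(token):
    """Split a token such as 'W-on:M-V.main' into a dict of tag to value."""
    factors = {}
    for part in token.split(":"):
        # Split on the first hyphen only, so values may contain hyphens.
        tag, sep, value = part.partition("-")
        if not sep or not tag or not value:
            raise ValueError("malformed factor %r in token %r" % (part, token))
        factors[tag] = value
    return factors


def find_bad_tokens(lines, required=("W", "M")):
    """Return (line_number, token) pairs whose factor tags differ from `required`."""
    bad = []
    for number, line in enumerate(lines, start=1):
        for token in line.split():
            try:
                tags = set(parse_factored_token(token))
            except ValueError:
                tags = set()  # unparsable tokens are reported as bad
            if tags != set(required):
                bad.append((number, token))
    return bad


if __name__ == "__main__":
    sample = ["W-on:M-V.main.indic.pres.ps3", "W-naine"]
    print(find_bad_tokens(sample))
```
</pre>
Under these assumptions, a word that itself contains a colon would be split at the wrong place and would usually end up in the list of bad tokens, so a check like this may help to spot such cases in the training data.<br>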


Thanks,<br></div><div>Pasya.<br>

</div></div></div>
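<div>As a possible first diagnostic for the two messages quoted above, the lines they point to (line 14172 of the count file and line 21 of the LM file) can be printed directly. This is only a sketch: it assumes both files are gzip-compressed text, as their .gz names suggest, and show_line is a hypothetical helper, not part of SRILM.</div>
<pre>
```python
import gzip
from itertools import islice


def show_line(path, line_number, encoding="utf-8"):
    """Return line `line_number` (1-based) of a gzip-compressed text file, or None."""
    with gzip.open(path, "rt", encoding=encoding, errors="replace") as handle:
        # islice skips the first line_number - 1 lines without loading the whole file.
        for line in islice(handle, line_number - 1, line_number):
            return line.rstrip("\n")
    return None


if __name__ == "__main__":
    # File names and line numbers are taken from the error messages above.
    for path, number in (("w_g4_w1w2m1m2.count.gz", 14172),
                         ("s_g4_w1w2m1m2.lm.gz", 21)):
        try:
            print(path, number, repr(show_line(path, number)))
        except OSError as exc:
            print(path, "could not be read:", exc)
```
</pre>
<div>Printing the line with repr makes stray whitespace and unexpected characters visible, which may help to show why the field count differs from what the reader expects.</div>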