Hi again.<br>What about using the -unk option in the ngram-count command?<br>Would the OOVs and the zeroprobs disappear?<br><br><br><div class="gmail_quote">On Mon, Jan 11, 2010 at 12:00 PM, Manuel Alves <span dir="ltr">&lt;<a href="mailto:beleira@gmail.com">beleira@gmail.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><br><br><div class="gmail_quote"><div class="im">---------- Forwarded message ----------<br>

From: <b class="gmail_sendername">Manuel Alves</b> <span dir="ltr">&lt;<a href="mailto:beleira@gmail.com" target="_blank">beleira@gmail.com</a>&gt;</span><br></div><div><div></div><div class="h5">Date: Mon, Jan 11, 2010 at 11:49 AM<br>

Subject: Re: [SRILM User List] Fwd: Fwd: ngram-count<br>To: Andreas Stolcke &lt;<a href="mailto:stolcke@speech.sri.com" target="_blank">stolcke@speech.sri.com</a>&gt;<br><br><br>Hi  Andreas.<br>The output of the ngram-count was:<br>

                                               [root@localhost Corporas]# ../srilm/bin/i686/ngram-count -order 3 -text CETEMPublico1.7 -lm LM<br>

                                               warning: discount coeff 1 is out of range: 1.44451e-17<br>

<br>I don't know whether there is a problem with the GT discounting method.<br><br><br><div class="gmail_quote"><div><div></div><div>On Fri, Jan 8, 2010 at 9:52 PM, Andreas Stolcke <span dir="ltr">&lt;<a href="mailto:stolcke@speech.sri.com" target="_blank">stolcke@speech.sri.com</a>&gt;</span> wrote:<br>

</div></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div><div></div><div>

<div bgcolor="#ffffff" text="#000000"><div>

On 1/8/2010 3:57 AM, Manuel Alves wrote:

</div><blockquote type="cite"><br>

  <br>

  <div class="gmail_quote"><div>---------- Forwarded message ----------<br>

From: <b class="gmail_sendername">Manuel Alves</b> <span dir="ltr">&lt;<a href="mailto:beleira@gmail.com" target="_blank">beleira@gmail.com</a>&gt;</span><br></div><div>

Date: Fri, Jan 8, 2010 at 10:40 AM<br>

Subject: Re: Fwd: ngram-count<br>

To: Andreas Stolcke &lt;<a href="mailto:stolcke@speech.sri.com" target="_blank">stolcke@speech.sri.com</a>&gt;<br>

  <br>

  <br>

1. ngram-count -text CETEMPublico1.7 -lm LM<br>

2. I tested it this way:<br>

                             I used the client-server architecture of SRILM<br>

                             SERVER : ngram -lm ../$a -server-port 100

-order 3 <br>

                             CLIENT   : ngram -use-server

100\@localhost -cache-served-ngrams -ppl $ficheiro -debug 2 2&gt;&amp;1<br>

                             where $ficheiro is this:<br>

                                                                 </div></div>

</blockquote>

<br>

<blockquote type="cite">

  <div class="gmail_quote">                <br>

  <br><div>

    p( observássemos | que ...)     =  0 [ -inf ]<br>

  </div></div>

</blockquote>

<br><div>

<blockquote type="cite">

  <div class="gmail_quote">file final.txt: 6 sentences, 126 words, 0

OOVs<br>

6 zeroprobs, logprob= -912.981 ppl= 1.7615e+07 ppl1= 4.05673e+07<br>

  </div>

</blockquote>

<br></div>

It looks to me like everything is working as intended.   You are

getting zeroprobs, but not a large number of them.<br>

They are low-frequency words (like the one above), so it makes sense,

since they are probably not contained in the training corpus.<br>

<br>

The perplexity is quite high, but that could be because of a small or

mismatched training corpus.   You didn&#39;t include the output of the

ngram-count program; it&#39;s possible that the GT (default) discounting

method reported some problems that are not evident from your mail.<br>

<br>

One thing to note is that with network-server LMs you don&#39;t get OOVs,

because all words are implicitly added to the vocabulary. Consequently,

OOVs are counted as zeroprobs instead, but both types of tokens are

equivalent for perplexity computation.<br>

Still, you could run <br>

         ngram -lm ../$a -order 3  -ppl $ficheiro -debug 2<br>

just to make sure you&#39;re getting the same result.<br><font color="#888888">

<br>

Andreas</font><div><br>

<br>

<blockquote type="cite">

  <div class="gmail_quote"><u><font color="#888888">Manuel Alves.  </font></u><br>

  <div>

  <div><br>

  <div class="gmail_quote">On Thu, Jan 7, 2010 at 8:35 PM, Andreas

Stolcke <span dir="ltr">&lt;<a href="mailto:stolcke@speech.sri.com" target="_blank">stolcke@speech.sri.com</a>&gt;</span>

wrote:<br>

  <blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

    <div bgcolor="#ffffff" text="#000000">

    <div>

    <div>On 1/6/2010 10:34 AM, Manuel Alves wrote:

    <blockquote type="cite"><br>

      <br>

      <div class="gmail_quote">---------- Forwarded message ----------<br>

From: <b class="gmail_sendername">Manuel Alves</b> <span dir="ltr">&lt;<a href="mailto:beleira@gmail.com" target="_blank">beleira@gmail.com</a>&gt;</span><br>

Date: Wed, Jan 6, 2010 at 6:33 PM<br>

Subject: ngram-count<br>

To: <a href="mailto:srilm-user@speech.sri.com" target="_blank">srilm-user@speech.sri.com</a><br>

      <br>

      <br>

Hi people.<br>

I need help with ngram-count: I am training a model, but when I then

try it on some test examples it gives me zeroprobs in the

output.<br>

Does this mean that the model is badly trained?<br>

Please answer me.<br>

Best regards,<br>

      <font color="#888888">Manuel Alves.<br>

      </font></div>

    </blockquote>

    <br>

    </div>

    </div>

    </div>

  </blockquote>

  </div>

  </div>

  </div>

  </div>

</blockquote>

</div></div>

<br></div></div>_______________________________________________<br>

SRILM-User site list<br>

<a href="mailto:SRILM-User@speech.sri.com" target="_blank">SRILM-User@speech.sri.com</a><br>

<a href="http://www.speech.sri.com/mailman/listinfo/srilm-user" target="_blank">http://www.speech.sri.com/mailman/listinfo/srilm-user</a><br></blockquote></div><br>

</div></div></div><br>

</blockquote></div><br>
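Andreas's remark that OOVs and zeroprobs are equivalent for perplexity computation can be checked against the summary line quoted in the thread. The sketch below is not SRILM code. It assumes the usual SRILM convention that both kinds of tokens are left out of the word count used for normalization, and that ppl also counts one end-of-sentence token per sentence while ppl1 does not. The function name srilm_perplexity is hypothetical.

```python
def srilm_perplexity(logprob, words, sentences, oovs=0, zeroprobs=0):
    """Return (ppl, ppl1) from the summary counts printed by `ngram -ppl`.

    Assumed convention: OOVs and zeroprobs are both excluded from the
    normalization; ppl adds one end-of-sentence token per sentence.
    logprob is the total base-10 log probability.
    """
    scored = words - oovs - zeroprobs
    if scored <= 0:
        raise ValueError("no scored words left after excluding OOVs and zeroprobs")
    ppl = 10.0 ** (-logprob / (scored + sentences))
    ppl1 = 10.0 ** (-logprob / scored)
    return ppl, ppl1


# Counts from the thread: 6 sentences, 126 words, 0 OOVs, 6 zeroprobs.
reported = srilm_perplexity(-912.981, words=126, sentences=6, oovs=0, zeroprobs=6)

# The same six tokens counted as OOVs instead (as a local, non-server LM might do).
as_oovs = srilm_perplexity(-912.981, words=126, sentences=6, oovs=6, zeroprobs=0)

print("ppl = %.4e, ppl1 = %.4e" % reported)
print("same result when counted as OOVs:", reported == as_oovs)
```

Under these assumptions the computed values should be close to the ppl= 1.7615e+07 and ppl1= 4.05673e+07 reported above, and they do not change when the six tokens are moved from the zeroprob count to the OOV count.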