<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
On 4/10/2012 1:29 AM, Saman Noorzadeh wrote:
<blockquote
cite="mid:1334046577.74819.YahooMailNeo@web162005.mail.bf1.yahoo.com"
type="cite">
<div style="color: rgb(0, 0, 0); background-color: rgb(255, 255,
255); font-family: verdana,helvetica,sans-serif; font-size:
12pt;">
<div>Hello</div>
<div>I am getting confused about the models that ngram-count
make:</div>
<div>ngram-count -order 2 -write-vocab vocabulary.voc -text
mytext.txt -write model1.bo<br>
ngram-count -order 2 -read model1.bo -lm model2.BO</div>
<div><br>
</div>
<div>forexample: (the text is very large and these words are
just a sample)<br>
</div>
<div><br>
</div>
<div>in model1.bo:</div>
<div>cook 14 <br>
</div>
<div>cook was 1</div>
<div><br>
</div>
<div>in model2.BO:</div>
<div>-1.904738 cook was </div>
<div><br>
</div>
<div>my question: shouldn't the probability of the bigram
'cook was' be log10(1/14)? But the ngram-count result shows
log10(1/80) == -1.9047</div>
<div>How are these probabilities computed?</div>
</div>
</blockquote>
<br>
It's called "smoothing" or "discounting", and it ensures that
ngrams never seen in the training data receive nonzero
probability.<br>
Please consult any of the basic LM tutorial sources listed at
<a class="moz-txt-link-freetext" href="http://www.speech.sri.com/projects/srilm/manpages/">http://www.speech.sri.com/projects/srilm/manpages/</a>, or specifically
<a class="moz-txt-link-freetext" href="http://www.speech.sri.com/projects/srilm/manpages/ngram-discount.7.html">http://www.speech.sri.com/projects/srilm/manpages/ngram-discount.7.html</a>
.<br>
<br>
To obtain the unsmoothed probability estimates that you are
expecting, you need to change the smoothing parameters. Try ngram-count
-addsmooth 0 .... <br>
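For illustration, here is a minimal sketch of the add-delta (Lidstone) estimate that -addsmooth controls. The counts c("cook") = 14 and c("cook was") = 1 are taken from your example; the vocabulary size is a made-up placeholder. With delta = 0 the estimate reduces to the maximum-likelihood value 1/14 that you expected, while the default discounting is what yields the smaller -1.9047 value.<br>

```python
import math

# Counts from the question: c("cook") = 14, c("cook was") = 1.
# VOCAB_SIZE is a hypothetical placeholder, not from the data.
VOCAB_SIZE = 5000

def addsmooth_prob(bigram_count, history_count, vocab_size, delta):
    """Add-delta (Lidstone) estimate: (c(h,w) + delta) / (c(h) + delta * V)."""
    return (bigram_count + delta) / (history_count + delta * vocab_size)

# With delta = 0 this is the plain MLE c(h,w)/c(h) = 1/14:
p = addsmooth_prob(1, 14, VOCAB_SIZE, 0.0)
print(round(math.log10(p), 4))  # -1.1461, i.e. log10(1/14)

# With any delta > 0, even an unseen bigram gets nonzero probability:
p_unseen = addsmooth_prob(0, 14, VOCAB_SIZE, 1.0)
print(p_unseen > 0.0)  # True
```

<br>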
<br>
Andreas<br>
<br>
</body>
</html>