<div dir="ltr"><div>Hi Andreas and <span style="color:rgb(84,84,84);font-family:arial,sans-serif;line-height:16.545454025268555px">贺天行,</span><span style="color:rgb(51,51,51);font-family:'normal arial',sans-serif;font-size:16px;line-height:20px"><br>
</span><br></div><div>Thanks. I understand now.</div><div><br></div><div>Jian</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, Apr 16, 2014 at 5:50 PM, Andreas Stolcke <span dir="ltr"><<a href="mailto:stolcke@icsi.berkeley.edu" target="_blank">stolcke@icsi.berkeley.edu</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF" text="#000000"><div><div class="h5">
    <div>On 4/16/2014 3:20 AM, jian zhang wrote:<br>
    </div>
    <blockquote type="cite">
      <div dir="ltr">Hi Andreas,
        <div><br>
        </div>
        <div>I am confused about the ppl output from ngram.</div>
        <div>The following is the output for two sentences:</div>
        <div><br>
        </div>
        <div>
          <div>resumption of the session</div>
          <div><span style="white-space:pre-wrap"> </span>p(
            resumption | &lt;s&gt; ) <span style="white-space:pre-wrap"> </span>= [1gram] 6.41856e-07 [
            -6.19256 ]</div>
          <div><span style="white-space:pre-wrap"> </span>p( of |
            resumption ...) <span style="white-space:pre-wrap"> </span>=
            [2gram] 0.547254 [ -0.261811 ]</div>
          <div><span style="white-space:pre-wrap"> </span><b>p( the
              | of ...) <span style="white-space:pre-wrap"> </span>=
              [2gram] 0.0826684 [ -1.08266 ]</b></div>
          <div><span style="white-space:pre-wrap"> </span>p(
            session | the ...) <span style="white-space:pre-wrap">
            </span>= [1gram] 1.21666e-06 [ -5.91483 ]</div>
          <div><span style="white-space:pre-wrap"> </span>p(
            &lt;/s&gt; | session ...) <span style="white-space:pre-wrap"> </span>= [1gram] 0.00150439 [
            -2.82264 ]</div>
          <div>1 sentences, 4 words, 0 OOVs</div>
          <div>0 zeroprobs, logprob= -16.2745 ppl= 1798.46 ppl1= 11711.9</div>
          <div>4 words, rank1= 0.25 rank5= 0.5 rank10= 0.5</div>
          <div>5 words+sents, rank1wSent= 0.2 rank5wSent= 0.4
            rank10wSent= 0.4 qloss= 0.899274 absloss= 0.873714</div>
          <div><br>
          </div>
          <div>
            <div>you have requested a debate on this subject in the
              course of the next few days , during this part-session .</div>
            <div><span style="white-space:pre-wrap"> </span>p( you
              | &lt;s&gt; ) <span style="white-space:pre-wrap"> </span>=
              [2gram] 0.000716442 [ -3.14482 ]</div>
            <div><span style="white-space:pre-wrap"> </span>p( have
              | you ...) <span style="white-space:pre-wrap"> </span>=
              [2gram] 0.0179397 [ -1.74618 ]</div>
            <div><span style="white-space:pre-wrap"> </span>p(
              requested | have ...) <span style="white-space:pre-wrap"> </span>= [1gram] 6.43992e-06 [
              -5.19112 ]</div>
            <div><span style="white-space:pre-wrap"> </span>p( a |
              requested ...) <span style="white-space:pre-wrap"> </span>=
              [1gram] 0.00378035 [ -2.42247 ]</div>
            <div><span style="white-space:pre-wrap"> </span>p(
              debate | a ...) <span style="white-space:pre-wrap"> </span>=
              [2gram] 0.000358849 [ -3.44509 ]</div>
            <div><span style="white-space:pre-wrap"> </span>p( on |
              debate ...) <span style="white-space:pre-wrap"> </span>=
              [2gram] 0.0598839 [ -1.22269 ]</div>
            <div><span style="white-space:pre-wrap"> </span>p( this
              | on ...) <span style="white-space:pre-wrap"> </span>=
              [2gram] 0.00443142 [ -2.35346 ]</div>
            <div><span style="white-space:pre-wrap"> </span>p(
              subject | this ...) <span style="white-space:pre-wrap"> </span>= [2gram] 9.54276e-05 [
              -4.02033 ]</div>
            <div><span style="white-space:pre-wrap"> </span>p( in |
              subject ...) <span style="white-space:pre-wrap"> </span>=
              [2gram] 0.0436281 [ -1.36023 ]</div>
            <div><span style="white-space:pre-wrap"> </span>p( the
              | in ...) <span style="white-space:pre-wrap"> </span>=
              [2gram] 0.147714 [ -0.830578 ]</div>
            <div><span style="white-space:pre-wrap"> </span>p(
              course | the ...) <span style="white-space:pre-wrap">
              </span>= [3gram] 0.00139691 [ -2.85483 ]</div>
            <div><span style="white-space:pre-wrap"> </span>p( of |
              course ...) <span style="white-space:pre-wrap"> </span>=
              [3gram] 0.579381 [ -0.237035 ]</div>
            <div><span style="white-space:pre-wrap"> </span><b>p(
                the | of ...) <span style="white-space:pre-wrap"> </span>=
                [2gram] 0.0762541 [ -1.11774 ]</b></div>
            <div><span style="white-space:pre-wrap"> </span>p( next
              | the ...) <span style="white-space:pre-wrap"> </span>=
              [3gram] 0.00123622 [ -2.9079 ]</div>
            <div><span style="white-space:pre-wrap"> </span>p( few
              | next ...) <span style="white-space:pre-wrap"> </span>=
              [3gram] 0.0245328 [ -1.61025 ]</div>
            <div><span style="white-space:pre-wrap"> </span>p( days
              | few ...) <span style="white-space:pre-wrap"> </span>=
              [2gram] 0.00340647 [ -2.46769 ]</div>
            <div><span style="white-space:pre-wrap"> </span>p( , |
              days ...) <span style="white-space:pre-wrap"> </span>=
              [2gram] 0.15756 [ -0.802555 ]</div>
            <div><span style="white-space:pre-wrap"> </span>p(
              during | , ...) <span style="white-space:pre-wrap"> </span>=
              [2gram] 0.000749831 [ -3.12504 ]</div>
            <div><span style="white-space:pre-wrap"> </span>p( this
              | during ...) <span style="white-space:pre-wrap"> </span>=
              [3gram] 0.0352358 [ -1.45302 ]</div>
            <div><span style="white-space:pre-wrap"> </span>p(
              &lt;unk&gt; | this ...) <span style="white-space:pre-wrap"> </span>= [1gram] 9.0905e-07 [
              -6.04141 ]</div>
            <div><span style="white-space:pre-wrap"> </span>p( . |
              &lt;unk&gt; ...) <span style="white-space:pre-wrap">
              </span>= [1gram] 0.0254746 [ -1.59389 ]</div>
            <div><span style="white-space:pre-wrap"> </span>p(
              &lt;/s&gt; | . ...) <span style="white-space:pre-wrap"> </span>= [2gram] 0.809733 [
              -0.091658 ]</div>
            <div>1 sentences, 21 words, 0 OOVs</div>
            <div>0 zeroprobs, logprob= -50.04 ppl= 188.168 ppl1= 241.466</div>
            <div>21 words, rank1= 0.142857 rank5= 0.428571 rank10=
              0.47619</div>
            <div>22 words+sents, rank1wSent= 0.181818 rank5wSent=
              0.454545 rank10wSent= 0.5 qloss= 0.930912 absloss=
              0.909386</div>
          </div>
          <div><br>
          </div>
          <div>My two questions:</div>
          <div>1. Both sentences contain the 2-gram p( the | of ...).
            Why do the two probabilities differ (0.0826684 in the first
            sentence, 0.0762541 in the second)?</div>
        </div>
      </div>
    </blockquote></div></div>
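    Because the backoff weight depends on the trigram context. In both cases the model has no trigram ending in "the" (hence the [2gram] tag), so the bigram probability is scaled by the backoff weight of the preceding two-word context.<br>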
    So the first probability equals<br>
            bow("resumption of") * p("the"| "of") <br>
    whereas the second probability is <br>
            bow("course of") * p("the" | "of")<div class=""><br>
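    As a rough check of this explanation, the shared bigram term cancels when the two reported log10 probabilities are subtracted, leaving only the ratio of the two backoff weights. The sketch below uses the two log values from the output quoted above; the function and constant names are made up for illustration and are not part of SRILM.<br>

```python
# Log10 probabilities copied from the two "p( the | of ...)" lines in the
# quoted ngram output. Names here are illustrative, not SRILM identifiers.
LOGP_AFTER_RESUMPTION_OF = -1.08266
LOGP_AFTER_COURSE_OF = -1.11774


def bow_log_ratio(logp_a, logp_b):
    """Return log10(bow_a / bow_b) for two backed-off estimates.

    Assumes both values have the form log10(bow(context)) + log10(p(w | h))
    with the same lower-order term p(w | h), so that term cancels.
    """
    return logp_a - logp_b


log_ratio = bow_log_ratio(LOGP_AFTER_RESUMPTION_OF, LOGP_AFTER_COURSE_OF)
ratio = 10 ** log_ratio

# The ratio of the printed linear probabilities should agree with this
# value up to the rounding of the printed numbers.
print(round(log_ratio, 5), round(ratio, 4), round(0.0826684 / 0.0762541, 4))
```

    The individual backoff weights cannot be recovered from the output alone; only their ratio can, unless the bigram entry is looked up in the model file.<br>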
    <blockquote type="cite">
      <div dir="ltr">
        <div>
          <div>2. Is there a parameter setting for ngram that prints
            the actual tokens instead of the ellipsis?</div>
          <div><br>
          </div>
          <br>
        </div>
      </div>
    </blockquote></div>
    No, unfortunately.  The idea behind the output format was to keep
    the number of fields constant so as to facilitate parsing with
    awk/perl/etc.<span class="HOEnZb"><font color="#888888"><br>
    <br>
    Andreas<br>
    <br>
  </font></span></div>
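  <div>Since the ellipsis cannot be switched off, the full context can be rebuilt outside of ngram from the sentence line that precedes each block of p( ... ) lines. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the model order is supplied by the caller (3 is assumed in the usage note because the output above contains [3gram] entries), and the sentence-boundary tags are passed in as arguments (&lt;s&gt; and &lt;/s&gt; in the output above).</div>

```python
def full_histories(sentence, order, bos, eos):
    """Pair each predicted token with its full history of up to order-1 tokens.

    `bos` and `eos` are the sentence-start and sentence-end tags of the model.
    The end tag is included because ngram also prints a line for it.
    """
    tokens = sentence.split()
    padded = [bos] + tokens
    pairs = []
    for i, word in enumerate(tokens + [eos]):
        seen = padded[:i + 1]  # start tag plus the i tokens before `word`
        history = tuple(seen[-(order - 1):]) if order > 1 else ()
        pairs.append((word, history))
    return pairs


def matched_context(history, k):
    """Return the part of `history` actually used by a line tagged [kgram]."""
    if k > 1:
        return history[max(0, len(history) - (k - 1)):]
    return ()
```

  <div>For the first sentence above with order 3, the third pair is "the" with the history ("resumption", "of"), and matched_context on that history with k = 2 gives ("of",), which corresponds to the [2gram] tag on that line.</div>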

</blockquote></div><br><br clear="all"><div><br></div>-- <br><div dir="ltr">Jian Zhang<br><a href="http://www.cngl.ie/index.html" target="_blank">Centre for Next Generation Localisation (CNGL)</a><br><a href="http://www.dcu.ie/" target="_blank">Dublin City University</a></div>

</div>