<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix">On 10/2/2013 1:16 AM, E wrote:<br>
    </div>
    <blockquote
      cite="mid:8D08D5EC52F045F-1094-3E82E@webmail-d268.sysops.aol.com"
      type="cite"><font color="black" face="arial" size="2"><font
          style="color: black;" face="arial" size="2">Thanks for the
          pointers! Three questions - </font>
        <div style="color: black; font-family: arial; font-size: 10pt;"><br>
        </div>
        <div style="color: black; font-family: arial; font-size: 10pt;">1.
          The same number of bins is used for all n-gram orders, even though
          the number of n-grams for each N may differ. In web1T, </div>
        <div style="color: black; font-family: arial; font-size: 10pt;">
          <pre style="word-wrap: break-word; white-space: pre-wrap;">Number of unigrams:         13,588,391
Number of fivegrams:     1,176,470,663</pre>
        </div>
        <div>
          <div id="AOLMsgPart_1_f25a2f23-e6ce-45eb-ac97-16c58b5fe90e">
            <div class="aolReplacedBody" bgcolor="#FFFFFF"
              text="#000000" style="color: black; font-family: arial,
              helvetica; font-size: 10pt;"><font size="2"><font
                  face="arial">Would it help if fivegrams were given
                  more bins than
                  unigrams?</font></font></div>
          </div>
        </div>
      </font></blockquote>
    <font size="2"><font face="arial">That's a good idea, but I haven't
        tried it, so I cannot say how much it would help.<br>
        It might also help to just have more bins for lower-order ngrams
        since there are more samples of them (more data, hence more
        parameters can be estimated).<br>
        <br>
      </font></font>
    <blockquote
      cite="mid:8D08D5EC52F045F-1094-3E82E@webmail-d268.sysops.aol.com"
      type="cite"><font color="black" face="arial" size="2">
        <div>
          <div id="AOLMsgPart_1_f25a2f23-e6ce-45eb-ac97-16c58b5fe90e">
            <div class="aolReplacedBody" bgcolor="#FFFFFF"
              text="#000000" style="color: black; font-family: arial,
              helvetica; font-size: 10pt;"><br>
            </div>
            <div class="aolReplacedBody" bgcolor="#FFFFFF"
              text="#000000" style="color: black;"><font face="arial"
                size="2">2. For a particular n-gram in the test data, the
                algorithm decides which bin's weights Wij to use based on
                how many times that n-gram occurred in the training data. Is
                this right?</font></div>
          </div>
        </div>
      </font></blockquote>
    <font size="2"><font face="arial">Right.<br>
        <br>
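        The lookup can be sketched roughly as follows. This is only an
        illustration, not the toolkit's code: the function names, the
        weight table, and the binning rule (integer-divide the training
        count by a modulus, then cap at the top bin) are assumptions.<br>
        <pre>
```python
def bin_index(count, count_modulus, num_bins):
    """Map a training count to a bin: scale by the modulus, cap at the top bin."""
    return min(count // count_modulus, num_bins - 1)


def mix_weight(weights, count, order, count_modulus=1):
    """Return Wij for bin i (derived from the count) and n-gram order j (1-based)."""
    i = bin_index(count, count_modulus, len(weights))
    return weights[i][order - 1]


# Hypothetical table: row i holds the weights of bin i, column j-1 holds order j.
weights = [
    [0.2, 0.1, 0.0],  # bin 0: w01 w02 w03
    [0.5, 0.4, 0.3],  # bin 1: w11 w12 w13
    [0.9, 0.8, 0.7],  # bin 2: w21 w22 w23 (all larger counts land here)
]

# A trigram whose training count is 0 uses w03.
trigram_weight = mix_weight(weights, count=0, order=3)
```
</pre>
        With this made-up table, any count of 2 or more shares the last
        row, so rare and frequent n-grams get different weights while
        the number of parameters stays small.<br>
        <br>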
      </font></font>
    <blockquote
      cite="mid:8D08D5EC52F045F-1094-3E82E@webmail-d268.sysops.aol.com"
      type="cite"><font color="black" face="arial" size="2">
        <div>
          <div id="AOLMsgPart_1_f25a2f23-e6ce-45eb-ac97-16c58b5fe90e">
            <div class="aolReplacedBody" bgcolor="#FFFFFF"
              text="#000000" style="color: black;"><font face="arial"
                size="2"><br>
              </font></div>
            <div class="aolReplacedBody" bgcolor="#FFFFFF"
              text="#000000" style="color: black;"><font face="arial"
                size="2">3. What does it mean when some weights are zero
                after tuning? I used just 10 sentences (5 repeated) in
                tune.txt and got the google.countlm shown at the
                bottom.</font></div>
          </div>
        </div>
      </font></blockquote>
    <blockquote
      cite="mid:8D08D5EC52F045F-1094-3E82E@webmail-d268.sysops.aol.com"
      type="cite"><font color="black" face="arial" size="2">
        <div>
          <div id="AOLMsgPart_1_f25a2f23-e6ce-45eb-ac97-16c58b5fe90e">
            <div class="aolReplacedBody" bgcolor="#FFFFFF"
              text="#000000" style="color: black;"><font face="arial"
                size="2"><br>
              </font></div>
            <div class="aolReplacedBody" bgcolor="#FFFFFF"
              text="#000000" style="color: black;"><font face="arial"
                size="2">For example, w01 and w02 are non-zero but w03 is
                zero. Does this mean that in the development set, there were
                no trigrams that corresponded to counts in bin 0?</font></div>
          </div>
        </div>
      </font></blockquote>
    <br>
    <font size="2"><font face="arial">Correct.<br>
        <br>
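        One way to check this on a tuning set is to tally how many
        events fall into each (bin, order) cell; a cell with no events
        gives the tuning step nothing to estimate from. The sketch
        below is a hypothetical helper, not part of the toolkit: the
        event list, the function names, and the binning rule are all
        assumptions.<br>
        <pre>
```python
from collections import Counter


def bin_index(count, count_modulus, num_bins):
    """Assumed binning rule: scale the count by the modulus, cap at the top bin."""
    return min(count // count_modulus, num_bins - 1)


def cell_usage(events, num_bins, count_modulus=1):
    """Count how many tuning events land in each (bin, order) cell."""
    usage = Counter()
    for count, order in events:
        usage[(bin_index(count, count_modulus, num_bins), order)] += 1
    return usage


def unused_cells(events, num_bins, max_order, count_modulus=1):
    """List the (bin, order) cells that no tuning event touched."""
    usage = cell_usage(events, num_bins, count_modulus)
    return [
        (i, j)
        for i in range(num_bins)
        for j in range(1, max_order + 1)
        if usage[(i, j)] == 0
    ]


# Hypothetical tuning events as (training count, n-gram order) pairs.
events = [(0, 1), (0, 2), (4, 3), (12, 1)]
empty = unused_cells(events, num_bins=3, max_order=3)
```
</pre>
        In this made-up example the cell (0, 3), which corresponds to
        w03, receives no events, matching the situation described in
        the question.<br>
        <br>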
        Andreas<br>
      </font></font><br>
  </body>
</html>