<div class="moz-cite-prefix">On 9/30/2013 10:46 PM, E wrote:<br>
</div>
> Hello,
>
> I'm trying to understand the meaning of the "google.count.lm0" file
> given in the FAQ section on creating an LM from the Web1T corpus.
> From what I read in Sec. 11.4.1, Deleted Interpolation Smoothing, in
> Spoken Language Processing by Huang et al. (equation 11.22), the
> bigram case is
>
> P(w_i | w_{i-1}) = \lambda * P_{MLE}(w_i | w_{i-1}) + (1 - \lambda) * P(w_i)
>
> They call the \lambda's the mixture weights. I wonder if they are
> conceptually the same as the ones used in google.countlm. If so, why
> are they arranged in a 15x5 matrix? Where can I read more about this?
<font size="2"><font face="arial"><br>
I don't have access to the book chapter you cite, but from the
equation it looks like a single fixed interpolation weight is
used.<br>
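
For concreteness, here is a minimal sketch (Python, with toy counts I
made up) of that fixed-weight interpolation:

    # Fixed-weight deleted interpolation for a bigram (toy example).
    # lam is the single interpolation weight lambda from the equation.
    def interp_bigram(count_bigram, count_context, p_unigram, lam):
        p_mle = count_bigram / count_context  # maximum-likelihood bigram estimate
        return lam * p_mle + (1 - lam) * p_unigram

    # e.g. C(w_{i-1} w_i) = 20, C(w_{i-1}) = 100, P(w_i) = 0.001, lambda = 0.7:
    print(interp_bigram(20, 100, 0.001, 0.7))  # 0.7*0.2 + 0.3*0.001 = 0.1403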

In the SRILM count-lm implementation you have separate lambdas
assigned to different groups of context ngrams, as a function of the
frequency of those contexts. This is what is called "Jelinek-Mercer"
smoothing in http://acl.ldc.upenn.edu/P/P96/P96-1041.pdf, where the
bucketing of the contexts is done based on frequency (as suggested in
the paper). The specifics are spelled out in the ngram(1) man page.
The relevant bits are:

    mixweights M
    w01 w02 ... w0N
    w11 w12 ... w1N
    ...
    wM1 wM2 ... wMN
    countmodulus m

    M specifies the number of mixture weight bins (minus 1). m is the
    width of a mixture weight bin. Thus, wij is the mixture weight
    used to interpolate the j-th order maximum-likelihood estimate
    with lower-order estimates, given that the (j-1)-gram context has
    been seen with a frequency between i*m and (i+1)*m-1 times. (For
    contexts with frequency greater than M*m, the i=M weights are
    used.)
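
(So the 15x5 matrix you mention would be M = 14, i.e. 15 frequency
bins, with one weight per ngram order up to the 5-grams in Web1T.)

Here is a sketch of how such weights get applied, assuming the
bucketing rule above (the function and variable names are mine, not
SRILM's):

    # Jelinek-Mercer interpolation with frequency-bucketed mixture
    # weights, per the ngram(1) excerpt above.  mixweights is the
    # (M+1) x N matrix of w[i][j]; m is the countmodulus.
    def bin_index(context_count, m, M):
        # Contexts seen i*m to (i+1)*m - 1 times fall in bin i;
        # anything seen M*m times or more uses the i = M weights.
        return min(context_count // m, M)

    def interp(p_mle, p_lower, context_count, order, mixweights, m):
        M = len(mixweights) - 1
        w = mixweights[bin_index(context_count, m, M)][order - 1]
        return w * p_mle + (1 - w) * p_lower

    # e.g. a hypothetical 3-bin, 2-order weight matrix, countmodulus 8,
    # and a bigram whose unigram context was seen 12 times (bin 1):
    W = [[0.2, 0.5], [0.4, 0.7], [0.6, 0.9]]
    print(interp(0.3, 0.05, 12, 2, W, 8))  # 0.7*0.3 + 0.3*0.05 = 0.225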

Andreas