<font color='black' size='2' face='arial'>Hello,

<div><br>

</div>


<div>I'm trying to understand the meaning of the "google.count.lm0"<span style="font-family: Helvetica, Arial, sans-serif; font-size: 10pt;"> file given in the FAQ section on creating an LM from the Web1T corpus. Section 11.4.1, "Deleted Interpolation Smoothing", of Spoken Language Processing by Huang et al. gives the following</span></div>


<div><span style="font-family: Helvetica, Arial, sans-serif; font-size: 10pt;">(equation 11.22, bigram case):</span></div>


<div><span style="font-family: Helvetica, Arial, sans-serif; font-size: 10pt;"><br>

</span></div>


<div><span style="font-family: Helvetica, Arial, sans-serif; font-size: 10pt;">P(w_i | w_{i-1}) = \lambda P_{MLE}(w_i | w_{i-1}) + (1 - \lambda) P(w_i)</span></div>
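<div>To make equation 11.22 concrete, here is a minimal Python sketch (my own illustration, not SRILM code). It computes the interpolated bigram probability from raw counts and re-estimates \lambda on held-out text with EM, the usual way deleted-interpolation weights are fitted. The names BigramCounts, interp_prob and estimate_lambda are hypothetical. For simplicity it assumes one global \lambda and a single held-out set, rather than rotating which part of the data is held out.</div>

```python
import math
from collections import Counter


class BigramCounts:
    """Unigram, bigram and history counts from a training token list (hypothetical helper)."""

    def __init__(self, tokens):
        self.unigrams = Counter(tokens)
        self.bigrams = Counter(zip(tokens, tokens[1:]))
        # c(w_{i-1}) used as a history: every token except the last starts a bigram.
        self.histories = Counter(tokens[:-1])
        self.total = len(tokens)

    def p_unigram(self, w):
        # P(w_i) = c(w_i) / N
        return self.unigrams[w] / self.total

    def p_mle(self, w, prev):
        # P_MLE(w_i | w_{i-1}) = c(w_{i-1}, w_i) / c(w_{i-1}); 0 for an unseen history.
        h = self.histories[prev]
        return self.bigrams[(prev, w)] / h if h else 0.0


def interp_prob(w, prev, counts, lam):
    """Equation 11.22: lam * P_MLE(w | prev) + (1 - lam) * P(w)."""
    return lam * counts.p_mle(w, prev) + (1.0 - lam) * counts.p_unigram(w)


def heldout_pairs(heldout, counts):
    # Drop held-out bigrams whose word has zero unigram probability (OOV):
    # the interpolated model gives them no mass, so they say nothing about lam.
    return [(p, w) for p, w in zip(heldout, heldout[1:]) if counts.p_unigram(w) > 0]


def heldout_log_likelihood(heldout, counts, lam):
    return sum(math.log(interp_prob(w, p, counts, lam))
               for p, w in heldout_pairs(heldout, counts))


def estimate_lambda(heldout, counts, lam=0.5, max_iter=500, tol=1e-12):
    """Re-estimate a single lambda with EM; start strictly between 0 and 1."""
    pairs = heldout_pairs(heldout, counts)
    if not pairs:
        return lam
    for _ in range(max_iter):
        # E-step: posterior that the bigram (MLE) component produced each held-out word.
        posts = [lam * counts.p_mle(w, p) / interp_prob(w, p, counts, lam)
                 for p, w in pairs]
        # M-step: the new weight is the average posterior.
        new_lam = sum(posts) / len(posts)
        if abs(new_lam - lam) < tol:
            return new_lam
        lam = new_lam
    return lam


if __name__ == "__main__":
    counts = BigramCounts("a b a b a c".split())
    print(round(interp_prob("b", "a", counts, 0.5), 6))  # 0.5
    print(round(estimate_lambda("b a b c".split(), counts), 6))  # 0.333333
```

<div>In the toy run, P_MLE(b | a) = 2/3 and P(b) = 1/3, so \lambda = 0.5 gives 0.5. The held-out bigram "b c" was never seen in training, which pulls the EM estimate of \lambda down toward the unigram component.</div>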


<div><span style="font-family: Helvetica, Arial, sans-serif; font-size: 10pt;"><br>

</span></div>


<div><span style="font-family: Helvetica, Arial, sans-serif; font-size: 10pt;">They call the \lambda's mixture weights. Are these conceptually the same as the mixture weights used in google.countlm? If so, why are they arranged in a 15x5 matrix? Where can I read more about this? </span></div>


<div><span style="font-family: Helvetica, Arial, sans-serif; font-size: 10pt;"><br>

</span></div>


<div><span style="font-family: Helvetica, Arial, sans-serif; font-size: 10pt;">Thanks.</span></div>

</font>