<div class="moz-cite-prefix">On 9/30/2013 10:46 PM, E wrote:<br>
</div>
> Hello,
>
> I'm trying to understand the meaning of the "google.count.lm0" file
> given in the FAQ section on creating an LM from the Web1T corpus.
> From what I read in Sec. 11.4.1, Deleted Interpolation Smoothing, in
> Spoken Language Processing by Huang et al. (equation 11.22), the
> bigram case is
>
> P(w_i | w_{i-1}) = \lambda * P_{MLE}(w_i | w_{i-1}) + (1 - \lambda) * P(w_i)
>
> They call the \lambda's the mixture weights. I wonder if they are
> conceptually the same as the ones used in google.countlm. If so, why
> are they arranged in a 15x5 matrix? Where can I read more about this?
<font size="2"><font face="arial"><br>
I don't have access to the book chapter you cite, but from the
equation it looks like a single fixed interpolation weight is
used.<br>
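
For concreteness, here is a minimal sketch (Python, with toy counts I
made up) of that fixed-weight interpolation:

    # Fixed-weight deleted interpolation for a bigram (toy example).
    # lam is the single interpolation weight lambda from the equation.
    def interp_bigram(count_bigram, count_context, p_unigram, lam):
        p_mle = count_bigram / count_context  # maximum-likelihood bigram estimate
        return lam * p_mle + (1 - lam) * p_unigram

    # e.g. C(w_{i-1} w_i) = 20, C(w_{i-1}) = 100, P(w_i) = 0.001, lambda = 0.7:
    print(interp_bigram(20, 100, 0.001, 0.7))  # 0.7*0.2 + 0.3*0.001 = 0.1403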

In the SRILM count-lm implementation you have separate lambdas
assigned to different groups of context ngrams, as a function of the
frequency of those contexts. This is what is called "Jelinek-Mercer"
smoothing in http://acl.ldc.upenn.edu/P/P96/P96-1041.pdf, where the
bucketing of the contexts is done based on frequency (as suggested in
the paper). The specifics are spelled out in the ngram(1) man page.
The relevant bits are:

    mixweights M
    w01 w02 ... w0N
    w11 w12 ... w1N
    ...
    wM1 wM2 ... wMN
    countmodulus m

    M specifies the number of mixture weight bins (minus 1). m is the
    width of a mixture weight bin. Thus, wij is the mixture weight
    used to interpolate the j-th order maximum-likelihood estimate
    with lower-order estimates, given that the (j-1)-gram context has
    been seen with a frequency between i*m and (i+1)*m-1 times. (For
    contexts with frequency greater than M*m, the i=M weights are
    used.)
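
(So the 15x5 matrix you mention would be M = 14, i.e. 15 frequency
bins, with one weight per ngram order up to the 5-grams in Web1T.)

Here is a sketch of how such weights get applied, assuming the
bucketing rule above (the function and variable names are mine, not
SRILM's):

    # Jelinek-Mercer interpolation with frequency-bucketed mixture
    # weights, per the ngram(1) excerpt above.  mixweights is the
    # (M+1) x N matrix of w[i][j]; m is the countmodulus.
    def bin_index(context_count, m, M):
        # Contexts seen i*m to (i+1)*m - 1 times fall in bin i;
        # anything seen M*m times or more uses the i = M weights.
        return min(context_count // m, M)

    def interp(p_mle, p_lower, context_count, order, mixweights, m):
        M = len(mixweights) - 1
        w = mixweights[bin_index(context_count, m, M)][order - 1]
        return w * p_mle + (1 - w) * p_lower

    # e.g. a hypothetical 3-bin, 2-order weight matrix, countmodulus 8,
    # and a bigram whose unigram context was seen 12 times (bin 1):
    W = [[0.2, 0.5], [0.4, 0.7], [0.6, 0.9]]
    print(interp(0.3, 0.05, 12, 2, W, 8))  # 0.7*0.3 + 0.3*0.05 = 0.225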

Andreas