<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">Right, thanks Andreas.<br>
It's getting clearer to me now.<br>
<br>
Regards,<br>
Ismail<br>
<br>
<br>
On 04/30/2014 01:39 PM, Andreas Stolcke wrote:<br>
</div>
<blockquote cite="mid:53609A88.1050204@icsi.berkeley.edu"
type="cite">
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
<div class="moz-cite-prefix">On 4/28/2014 7:38 PM, Ismail Rusli
wrote:<br>
</div>
<blockquote cite="mid:535F10B4.5060101@gmail.com" type="cite">
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
<div class="moz-cite-prefix">Thanks for the answer, Andreas.<br>
<br>
In the paper by<br>
Chen and Goodman (1999), they used held-out data<br>
to optimize the parameters of the language model. How do I<br>
do this in SRILM? Does SRILM optimize the parameters<br>
when I use -kndiscount?</div>
</blockquote>
SRILM just uses the formulas for estimating the discounts from the
count-of-counts, i.e., equations (26) in the <a
moz-do-not-send="true"
href="http://www.speech.sri.com/projects/srilm/manpages/pdfs/chen-goodman-tr-10-98.pdf">Chen
& Goodman technical report</a>.<br>
<br>
<blockquote cite="mid:535F10B4.5060101@gmail.com" type="cite">
<div class="moz-cite-prefix"> I tried the -kn options to save the<br>
parameters to a file and included this file<br>
when building the LM, but it turned out that<br>
the perplexity got bigger.<br>
</div>
</blockquote>
You can save the discounting parameters using:<br>
<br>
1) ngram-count -read COUNTS -kndiscount -kn1 K1 -kn2 K2 -kn3
K3<br>
(no -lm argument!)<br>
<br>
Then you can read them back in for LM estimation using <br>
<br>
2) ngram-count -read COUNTS -kndiscount -kn1 K1 -kn2 K2 -kn3 K3
-lm LM<br>
<br>
and the resulting LM will be identical to the one command 2 produces<br>
when run without the -kn1/2/3 options.<br>
<br>
Now, if you want, you can manipulate the discounting parameters
before invoking command 2.<br>
For example, you could perform a search over the D1, D2, D3
parameters, optimizing perplexity on a held-out set, just as
C&amp;G did. You have to implement that search yourself, however, by
writing some wrapper scripts.<br>
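A minimal sketch of such a wrapper, in Python rather than shell. The file names (COUNTS, LM, K1-K3, heldout.txt), the seed discounts, and the layout written by write_kn_file are assumptions to adapt; only the two ngram-count/ngram invocations mirror commands 1 and 2 above.<br>

```python
import itertools
import subprocess

def candidate_grid(center, step=0.1, radius=2):
    """Candidate discount values around a center, clipped to > 0."""
    return [center + i * step for i in range(-radius, radius + 1)
            if center + i * step > 0]

def search(centers, eval_ppl):
    """Grid-search (D1, D2, D3) for the lowest held-out perplexity."""
    best_ppl, best = float("inf"), None
    for triple in itertools.product(*(candidate_grid(c) for c in centers)):
        ppl = eval_ppl(*triple)
        if ppl < best_ppl:
            best_ppl, best = ppl, triple
    return best

def write_kn_file(path, mincount, d1, d2, d3):
    """Write a KN discount parameter file. NOTE: assumed layout --
    compare with a file produced by command 1 before relying on the
    exact key names."""
    with open(path, "w") as f:
        f.write(f"mincount {mincount}\n")
        f.write(f"discount1 {d1}\ndiscount2 {d2}\ndiscount3+ {d3}\n")

def srilm_ppl(d1, d2, d3):
    """Build an LM with fixed discounts (command 2) and score held-out
    text with ngram -ppl. For simplicity this uses the same discounts
    at every n-gram order; a fuller search would vary them per order."""
    for order in (1, 2, 3):
        write_kn_file(f"K{order}", 1, d1, d2, d3)
    subprocess.run(["ngram-count", "-read", "COUNTS", "-kndiscount",
                    "-kn1", "K1", "-kn2", "K2", "-kn3", "K3",
                    "-lm", "LM"], check=True)
    out = subprocess.run(["ngram", "-lm", "LM", "-ppl", "heldout.txt"],
                         capture_output=True, text=True, check=True).stdout
    # ngram -ppl prints a line containing "ppl= <value>"
    return float(out.split("ppl=")[1].split()[0])

# Usage (with SRILM on PATH and COUNTS/heldout.txt in place):
#   best = search((0.7, 1.1, 1.4), srilm_ppl)
```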
<br>
Also consider the interpolated version of KN smoothing. Just add
the ngram-count -interpolate option; it usually gives slightly
better results.<br>
<blockquote cite="mid:535F10B4.5060101@gmail.com" type="cite">
<div class="moz-cite-prefix"> <br>
And just one more,<br>
do you have a link to good tutorial in using<br>
class-based models with SRILM?<br>
</div>
</blockquote>
There is a basic tutorial at <a moz-do-not-send="true"
class="moz-txt-link-freetext"
href="http://ssli.ee.washington.edu/ssli/people/sarahs/srilm.html">http://ssli.ee.washington.edu/ssli/people/sarahs/srilm.html</a>.<br>
<br>
Andreas<br>
<br>
<br>
</blockquote>
<br>
</body>
</html>