<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <div class="moz-cite-prefix">On 4/28/2014 7:38 PM, Ismail Rusli

      wrote:<br>

    </div>

    <blockquote cite="mid:535F10B4.5060101@gmail.com" type="cite">

      <meta content="text/html; charset=ISO-8859-1"

        http-equiv="Content-Type">

      <div class="moz-cite-prefix">Thanks for the answer, Andreas.<br>

        <br>

        As I read the paper by<br>
        Chen and Goodman (1999), they used held-out data<br>
        to optimize the parameters of the language model. How do I<br>
        do this in SRILM? Does SRILM optimize the parameters<br>
        when I use -kndiscount?</div>

    </blockquote>

    SRILM just uses the formulas for estimating the discounts from the

    count-of-counts, i.e., equations (26) in the <a

href="http://www.speech.sri.com/projects/srilm/manpages/pdfs/chen-goodman-tr-10-98.pdf">Chen

      & Goodman technical report</a>.<br>
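    <br>
    For reference, here is a minimal Python sketch of those closed-form estimates, assuming n1..n4 are the numbers of distinct n-grams of one order that occur exactly 1..4 times in the training counts. The function names are hypothetical, and the sketch illustrates the formula only; it does not reproduce every detail of SRILM's implementation (for instance, exactly which counts it uses for the lower orders).<br>

```python
from collections import Counter


def count_of_counts(ngram_counts):
    """Map r -> n_r, the number of distinct n-grams seen exactly r times."""
    return Counter(ngram_counts.values())


def modkn_discounts(n1, n2, n3, n4):
    """Modified Kneser-Ney discounts D1, D2, D3+ from count-of-counts.

    Closed-form estimates from the Chen & Goodman report:
    Y = n1 / (n1 + 2 n2), D1 = 1 - 2Y n2/n1, D2 = 2 - 3Y n3/n2,
    D3+ = 3 - 4Y n4/n3. Requires n1, n2 and n3 to be nonzero.
    """
    y = n1 / (n1 + 2 * n2)
    d1 = 1 - 2 * y * n2 / n1
    d2 = 2 - 3 * y * n3 / n2
    d3_plus = 3 - 4 * y * n4 / n3
    return d1, d2, d3_plus


# Hypothetical count-of-counts for a single n-gram order.
print(modkn_discounts(n1=100, n2=50, n3=25, n4=10))
```

    With those made-up numbers Y = 0.5, so D1 = 0.5, D2 = 1.25 and D3+ is about 2.2.<br>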

    <br>

    <blockquote cite="mid:535F10B4.5060101@gmail.com" type="cite">

      <div class="moz-cite-prefix"> I tried -kn to save the<br>
        parameters to a file and included this file <br>
        when building the LM, but it turned out<br>
        my perplexity got bigger.<br>

      </div>

    </blockquote>

    You can save the discounting parameters using:<br>

    <br>

    1)      ngram-count -read COUNTS -kndiscount -kn1 K1 -kn2 K2 -kn3 K3<br>

    (no -lm argument!)<br>

    <br>

    Then you can read them back in for LM estimation using <br>

    <br>

    2)    ngram-count -read COUNTS -kndiscount -kn1 K1 -kn2 K2 -kn3 K3

    -lm LM<br>

    <br>

    and the result will be identical to that of the second command run
    without the -kn1/2/3 options.<br>

    <br>

    Now, if you want, you can manipulate the discounting parameters
    before invoking command 2.<br>

    For example, you could perform a search over the D1, D2, D3

    parameters optimizing perplexity on a held-out set, just like

    C&G did.  But you have to implement that search yourself by

    writing some wrapper scripts.<br>
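    <br>
    A wrapper of that kind could look roughly like the following Python sketch. It assumes ngram-count and ngram are on the PATH, that command 1 has already written K1, K2, K3, and that each of those files holds one "key value" pair per line with the discount keys starting with "discount" (check this against the files your SRILM version actually writes). The greedy coordinate search over multiplicative factors is just one simple choice, not necessarily what C&amp;G did, and all function names are hypothetical.<br>

```python
"""Sketch: tune SRILM modified-KN discounts on held-out data (hypothetical).

Assumes ngram-count and ngram are on PATH and that each -knN file written by
command 1 holds "key value" lines whose discount keys start with "discount".
"""
import re
import subprocess
import tempfile
from pathlib import Path

PPL_RE = re.compile(r"\bppl=\s*([0-9.eE+-]+)")


def read_params(path):
    """Parse a discount file into an ordered list of (key, value) strings."""
    pairs = []
    for line in Path(path).read_text().splitlines():
        if line.strip():
            key, value = line.split(None, 1)
            pairs.append((key, value.strip()))
    return pairs


def write_params(path, pairs):
    """Write (key, value) pairs back in the same one-pair-per-line layout."""
    Path(path).write_text("".join(f"{k} {v}\n" for k, v in pairs))


def scale_discount(pairs, key, factor):
    """Copy of pairs with the value of `key` multiplied by `factor`.

    Values are not range-checked; extreme factors may give invalid discounts.
    """
    return [(k, f"{float(v) * factor:.6g}" if k == key else v) for k, v in pairs]


def parse_ppl(output):
    """Extract the ppl= figure from `ngram -ppl` output."""
    match = PPL_RE.search(output)
    if match is None:
        raise ValueError("no ppl= field in ngram output")
    return float(match.group(1))


def heldout_ppl(counts, kn_files, heldout, workdir):
    """Run command 2 with the given discount files, then score held-out text."""
    order = str(len(kn_files))
    lm = str(Path(workdir) / "trial.lm")
    cmd = ["ngram-count", "-order", order, "-read", counts, "-kndiscount", "-lm", lm]
    for n, kn in enumerate(kn_files, start=1):
        cmd += [f"-kn{n}", str(kn)]
    subprocess.run(cmd, check=True)
    result = subprocess.run(["ngram", "-order", order, "-lm", lm, "-ppl", heldout],
                            check=True, capture_output=True, text=True)
    return parse_ppl(result.stdout)


def tune(counts, kn_files, heldout, factors=(0.8, 0.9, 1.1, 1.2)):
    """Greedy coordinate search: scale one discount at a time, keep improvements."""
    workdir = Path(tempfile.mkdtemp(prefix="kn-tune-"))
    files = [workdir / f"kn{n}" for n in range(1, len(kn_files) + 1)]
    params = [read_params(p) for p in kn_files]
    for f, p in zip(files, params):
        write_params(f, p)
    best = heldout_ppl(counts, files, heldout, workdir)
    for i in range(len(params)):
        for key in [k for k, _ in params[i] if k.startswith("discount")]:
            base = params[i]  # every factor is applied to this starting point
            for factor in factors:
                trial = scale_discount(base, key, factor)
                write_params(files[i], trial)
                ppl = heldout_ppl(counts, files, heldout, workdir)
                if ppl < best:
                    best, params[i] = ppl, trial
            write_params(files[i], params[i])  # leave the best setting on disk
    return best, files
```

    For example, best, files = tune("COUNTS", ["K1", "K2", "K3"], "heldout.txt") (file names hypothetical) returns the best held-out perplexity and the tuned discount files, which you then pass to command 2 together with -lm. Each trial rebuilds the whole LM, so keep the factor list short for large count files, and keep the held-out set separate from your final test set.<br>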

    <br>

    Also consider the interpolated version of KN smoothing. Just add
    the ngram-count -interpolate option; it usually gives slightly
    better results.<br>
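    <br>
    To show what interpolation changes, here is a toy Python sketch of interpolated Kneser-Ney for bigrams. It is simplified to a single fixed discount D (SRILM's modified KN uses three discounts per n-gram order): every bigram probability mixes the discounted bigram estimate with the continuation unigram, weighted by the mass freed by discounting, instead of using the lower order only for unseen bigrams. The class and method names, the BOS/EOS markers and D = 0.75 are illustrative assumptions.<br>

```python
from collections import Counter


class InterpolatedKNBigram:
    """Toy interpolated Kneser-Ney bigram model with one fixed discount D."""

    def __init__(self, sentences, discount=0.75):
        if not 0.0 < discount <= 1.0:
            raise ValueError("discount must be in (0, 1]")
        self.d = discount
        self.bigrams = Counter()
        for words in sentences:
            tokens = ["BOS"] + list(words) + ["EOS"]
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.history = Counter()    # c(v): total count of v as a history
        self.followers = Counter()  # N1+(v .): distinct words seen after v
        self.preceders = Counter()  # N1+(. w): distinct words seen before w
        for (v, w), c in self.bigrams.items():
            self.history[v] += c
            self.followers[v] += 1
            self.preceders[w] += 1
        self.bigram_types = len(self.bigrams)  # N1+(. .)

    def p_continuation(self, w):
        """Lower-order KN distribution: fraction of bigram types ending in w."""
        return self.preceders[w] / self.bigram_types

    def prob(self, w, v):
        """P(w | v) = max(c(v,w) - D, 0)/c(v) + D * N1+(v .)/c(v) * Pcont(w)."""
        cv = self.history[v]
        if cv == 0:  # unseen history: use the continuation distribution alone
            return self.p_continuation(w)
        discounted = max(self.bigrams[(v, w)] - self.d, 0.0) / cv
        lambda_v = self.d * self.followers[v] / cv
        return discounted + lambda_v * self.p_continuation(w)


model = InterpolatedKNBigram([["a", "b", "a"], ["b", "a", "c"]])
print(model.prob("a", "b"))
```

    In a backoff model the lower-order term would only be consulted when c(v,w) is zero; here it contributes to every P(w | v), which is the property -interpolate turns on.<br>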

    <blockquote cite="mid:535F10B4.5060101@gmail.com" type="cite">

      <div class="moz-cite-prefix"> <br>
        And just one more thing:<br>
        do you have a link to a good tutorial on using<br>
        class-based models with SRILM?<br>

      </div>

    </blockquote>

    There is a basic tutorial at

    <a class="moz-txt-link-freetext" href="http://ssli.ee.washington.edu/ssli/people/sarahs/srilm.html">http://ssli.ee.washington.edu/ssli/people/sarahs/srilm.html</a> .<br>

    <br>

    Andreas<br>

    <br>

    <br>

  </body>

</html>