<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">Right, thanks Andreas.<br>
It's getting clearer to me now.<br>
<br>
Regards,<br>
Ismail<br>
<br>
<br>
On 04/30/2014 01:39 PM, Andreas Stolcke wrote:<br>
</div>
<blockquote cite="mid:53609A88.1050204@icsi.berkeley.edu"
type="cite">
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
<div class="moz-cite-prefix">On 4/28/2014 7:38 PM, Ismail Rusli
wrote:<br>
</div>
<blockquote cite="mid:535F10B4.5060101@gmail.com" type="cite">
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
<div class="moz-cite-prefix">Thanks for the answer, Andreas.<br>
<br>
In the paper by<br>
Chen and Goodman (1999), they used held-out data<br>
to optimize the parameters of the language model. How do I<br>
do this in SRILM? Does SRILM optimize the parameters<br>
when I use -kndiscount?</div>
</blockquote>
SRILM just uses the formulas for estimating the discounts from the
count-of-counts, i.e., equations (26) in the <a
moz-do-not-send="true"
href="http://www.speech.sri.com/projects/srilm/manpages/pdfs/chen-goodman-tr-10-98.pdf">Chen
& Goodman technical report</a>.<br>
<br>
<blockquote cite="mid:535F10B4.5060101@gmail.com" type="cite">
<div class="moz-cite-prefix"> I tried the -kn options to save the<br>
parameters to a file and included this file<br>
when building the LM, but it turned out that<br>
the perplexity got bigger.<br>
</div>
</blockquote>
You can save the discounting parameters using:<br>
<br>
1) ngram-count -read COUNTS -kndiscount -kn1 K1 -kn2 K2 -kn3
K3<br>
(no -lm argument!)<br>
<br>
Then you can read them back in for LM estimation using <br>
<br>
2) ngram-count -read COUNTS -kndiscount -kn1 K1 -kn2 K2 -kn3 K3
-lm LM<br>
<br>
and the resulting LM will be identical to the one command 2 produces<br>
when run without the -kn1/2/3 options.<br>
<br>
Now, if you want, you can manipulate the discounting parameters
before invoking command 2.<br>
For example, you could perform a search over the D1, D2, D3
parameters, optimizing perplexity on a held-out set, just as
C&amp;G did. You have to implement that search yourself, however, by
writing some wrapper scripts.<br>
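A minimal sketch of such a wrapper, in Python rather than shell. The file names (COUNTS, LM, K1-K3, heldout.txt), the seed discounts, and the layout written by write_kn_file are assumptions to adapt; only the two ngram-count/ngram invocations mirror commands 1 and 2 above.<br>

```python
import itertools
import subprocess

def candidate_grid(center, step=0.1, radius=2):
    """Candidate discount values around a center, clipped to > 0."""
    return [center + i * step for i in range(-radius, radius + 1)
            if center + i * step > 0]

def search(centers, eval_ppl):
    """Grid-search (D1, D2, D3) for the lowest held-out perplexity."""
    best_ppl, best = float("inf"), None
    for triple in itertools.product(*(candidate_grid(c) for c in centers)):
        ppl = eval_ppl(*triple)
        if ppl < best_ppl:
            best_ppl, best = ppl, triple
    return best

def write_kn_file(path, mincount, d1, d2, d3):
    """Write a KN discount parameter file. NOTE: assumed layout --
    compare with a file produced by command 1 before relying on the
    exact key names."""
    with open(path, "w") as f:
        f.write(f"mincount {mincount}\n")
        f.write(f"discount1 {d1}\ndiscount2 {d2}\ndiscount3+ {d3}\n")

def srilm_ppl(d1, d2, d3):
    """Build an LM with fixed discounts (command 2) and score held-out
    text with ngram -ppl. For simplicity this uses the same discounts
    at every n-gram order; a fuller search would vary them per order."""
    for order in (1, 2, 3):
        write_kn_file(f"K{order}", 1, d1, d2, d3)
    subprocess.run(["ngram-count", "-read", "COUNTS", "-kndiscount",
                    "-kn1", "K1", "-kn2", "K2", "-kn3", "K3",
                    "-lm", "LM"], check=True)
    out = subprocess.run(["ngram", "-lm", "LM", "-ppl", "heldout.txt"],
                         capture_output=True, text=True, check=True).stdout
    # ngram -ppl prints a line containing "ppl= <value>"
    return float(out.split("ppl=")[1].split()[0])

# Usage (with SRILM on PATH and COUNTS/heldout.txt in place):
#   best = search((0.7, 1.1, 1.4), srilm_ppl)
```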
<br>
Also consider the interpolated version of KN smoothing. Just add
the ngram-count -interpolate option; it usually gives slightly
better results.<br>
<blockquote cite="mid:535F10B4.5060101@gmail.com" type="cite">
<div class="moz-cite-prefix"> <br>
And just one more,<br>
do you have a link to good tutorial in using<br>
class-based models with SRILM?<br>
</div>
</blockquote>
There is a basic tutorial at <a moz-do-not-send="true"
class="moz-txt-link-freetext"
href="http://ssli.ee.washington.edu/ssli/people/sarahs/srilm.html">http://ssli.ee.washington.edu/ssli/people/sarahs/srilm.html</a>.<br>
<br>
Andreas<br>
<br>
<br>
</blockquote>
<br>
</body>
</html>