<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">On 8/2/2012 2:30 AM, Meng Chen wrote:<br>

    </div>

    <blockquote

cite="mid:CA+bc0mq1Sa8_KsSuHQ9LMv+GP=8+SVVhu16hRDdareVcQbzJWw@mail.gmail.com"

      type="cite">Hi, I am training LM using <b>make-batch-counts</b>,

      <b>merge-batch-counts</b> and <b>make-big-lm</b>. I compared the

      modified Kneser-Ney and Good-Turing smoothing algorithm in <b>make-big-lm</b>,

      and found that training is much slower with modified

      Kneser-Ney. I checked the debug information and found that it runs

      <b>make-kn-counts</b> and <b>merge-batch-counts</b>, which cost

      most of the time. I wonder if the extra two steps could run in <b>make-batch-counts</b>,

      so it could save much time.</blockquote>

    KN is slower because it first has to compute the regular ngram

    counts and then, in a second pass, run make-kn-counts, which takes

    the merged ngram counts as input.  That second pass counts ngram

    types (how many distinct ngrams occur), not token frequencies, and

    whether an ngram is distinct is only known once the counts from all

    batches have been merged, so you need to do it in this order.<br>
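    <br>
    A minimal sketch of why the order matters, written in Python and not
    taken from the SRILM scripts: the function names and the toy data are
    made up for illustration, and it assumes the type count in question is
    the number of distinct words preceding a lower-order ngram.<br>
<pre>
```python
from collections import Counter, defaultdict


def token_counts(sentences, order=2):
    """Count ngram tokens in one batch (the first, per-batch pass)."""
    counts = Counter()
    for words in sentences:
        for i in range(len(words) - order + 1):
            counts[tuple(words[i:i + order])] += 1
    return counts


def merge(batches):
    """Sum token counts across batches (the merge step)."""
    merged = Counter()
    for batch in batches:
        merged.update(batch)
    return merged


def type_counts(ngram_counts):
    """For each lower-order ngram, count the distinct words preceding it.

    Token frequencies are ignored: only which ngrams exist matters.
    """
    left_contexts = defaultdict(set)
    for ngram in ngram_counts:
        left_contexts[ngram[1:]].add(ngram[0])
    return {suffix: len(words) for suffix, words in left_contexts.items()}


# Hypothetical toy data: two batches that share the bigram ("a", "b").
batch_a = token_counts([["a", "b"], ["c", "b"]])
batch_b = token_counts([["a", "b"]])

# Correct order: merge the token counts first, then count types.
merged_then_typed = type_counts(merge([batch_a, batch_b]))

# Wrong order: count types per batch and add them up afterwards.
typed_then_summed = Counter()
for batch in (batch_a, batch_b):
    typed_then_summed.update(type_counts(batch))
```
</pre>
    In this toy example "b" follows two distinct words ("a" and "c"), so
    the merged-first result for it is 2, while summing the per-batch type
    counts gives 3 because the bigram "a b" is counted once in each
    batch.<br>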

    <br>

    Andreas<br>

    <br>

  </body>

</html>