<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Hi, I have two questions:<br>
<br>
1. If I generate the language model with Kneser-Ney smoothing (or
Modified Kneser-Ney), why does the parameter "-gtnmin" apply to the
already modified counts? <br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>For example, if in the training data the 2-gram "markov model"
occurs only in the context "hidden markov model" and gt2min = 2,
then the modified count for "markov model" is n(* markov model) =
1 &lt; gt2min, and <br>
prob("markov model") = bow("markov") * prob("model"), <br>
instead of prob("markov model") = (n(* markov model) - D) /
n(* markov *). <br>
<br>
2. Let's say I use ngram-count to generate the language model as
follows: <br>
ngram-count -text text.txt -vocab vocab.txt -gt1min 5 -lm sri.lm<br>
Suppose the word "hello" exists in "vocab.txt" but occurs only 4 times
in "text.txt". Then the probability of "hello" is calculated as the
zeroton probability. Is that correct?<br>
</div>
</blockquote>
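To make question 1 concrete, here is a tiny Python sketch (toy data, not SRILM code) of the modified-count computation I have in mind: the modified count of a bigram is the number of distinct left contexts it appears in, and the gt2min cutoff is then checked against that modified count rather than the raw count.<br>

```python
from collections import defaultdict

# Toy corpus of trigrams: "markov model" occurs only after "hidden",
# while "model is" occurs after two distinct words.
trigrams = [
    ("hidden", "markov", "model"),
    ("hidden", "markov", "model"),
    ("the", "model", "is"),
    ("a", "model", "is"),
]

# Modified (continuation) count of a bigram (w2, w3):
# the number of distinct left contexts w1 it follows.
left_contexts = defaultdict(set)
for w1, w2, w3 in trigrams:
    left_contexts[(w2, w3)].add(w1)
modified_count = {bg: len(cs) for bg, cs in left_contexts.items()}

# Applying the cutoff to the modified counts, as in the question:
gt2min = 2
for bg, c in sorted(modified_count.items()):
    status = "kept" if c >= gt2min else "backed off"
    print(bg, "modified count =", c, "->", status)
```

Here "markov model" gets modified count 1 (only the context "hidden"), so with gt2min = 2 it is backed off even though its raw count is 2, which is exactly the behavior I am asking about.<br>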
<br>
<pre class="moz-signature" cols="72">Thanks
Anna Bulusheva
</pre>
</body>
</html>