<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <div class="moz-cite-prefix">On 1/8/2013 6:07 PM, Marta Ruiz wrote:<br>

    </div>

    <blockquote

cite="mid:CABEBqHJ798PkMfSe_DJ9YLFASabk1S8Wk65nJHZgdtfoJ8tSpQ@mail.gmail.com"

      type="cite">Thanks, Andreas. Two more questions:<br>

      <div class="gmail_quote">

        <blockquote class="gmail_quote" style="margin:0 0 0

          .8ex;border-left:1px #ccc solid;padding-left:1ex">

          <br>

          1. Create a word-based version of each model. For example,
          you can construct a POS-based LM and combine it with a class
          membership mapping (in classes-format, see the man page) to get a
          word-level POS-based model. The same applies to lemma-based LMs
          (the lemmas are effectively word classes).<br>

          <br>

        </blockquote>

        <div><br>

          What are the instructions for doing this?<br>

        </div>

      </div>

    </blockquote>

    <br>

    1. Create the class-to-word mapping file (in the format
    described <a
href="http://www.speech.sri.com/projects/srilm/manpages/classes-format.5.html">here</a>)
    so that it encodes either your POS-to-word or your lemma-to-word mapping.<br>

    2. Process the training data to replace each word with its POS tag
    or lemma, as appropriate.<br>

    3. Train the n-gram portion of the LM by running ngram-count on the
    training data, now represented as a sequence of POS tags or lemmas
    (from step 2).<br>
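    <br>
    The three steps above can be sketched in Python as follows. This is
    a minimal illustration under assumptions that are not part of the
    instructions: the tagged training text is assumed to contain one
    sentence per line with tokens written as "word/TAG", and the class
    membership probability p in each classes-format line ("class p word")
    is estimated as the relative frequency of the word among all tokens
    carrying that tag. The function names are hypothetical.<br>
<pre>
```python
from collections import Counter


def split_token(token, sep="/"):
    # Hypothetical input convention: each token is "word/TAG". Split on the
    # last separator so that words containing "/" still parse.
    word, _, tag = token.rpartition(sep)
    return word, tag


def build_class_lm_inputs(tagged_lines):
    """Return (class_lines, tag_lines) from lines of "word/TAG" tokens.

    class_lines: classes-format entries "TAG p word" (step 1), where p is
    the relative-frequency estimate of P(word | TAG).
    tag_lines: the training text with each word replaced by its tag
    (step 2), ready to be written out for ngram-count (step 3).
    """
    pair_counts = Counter()  # counts of (tag, word) pairs
    tag_counts = Counter()   # counts of each tag
    tag_lines = []
    for line in tagged_lines:
        tags = []
        for token in line.split():
            word, tag = split_token(token)
            pair_counts[(tag, word)] += 1
            tag_counts[tag] += 1
            tags.append(tag)
        tag_lines.append(" ".join(tags))

    class_lines = []
    for (tag, word), count in sorted(pair_counts.items()):
        prob = count / tag_counts[tag]
        class_lines.append(f"{tag} {prob:.6g} {word}")
    return class_lines, tag_lines


if __name__ == "__main__":
    # Toy corpus for illustration only.
    corpus = ["the/DT dog/NN barks/VBZ", "the/DT cat/NN sleeps/VBZ"]
    classes, tags = build_class_lm_inputs(corpus)
    print("\n".join(classes))
    print("\n".join(tags))
```
</pre>
    The tag lines would then be written to a text file and passed to
    ngram-count (e.g. ngram-count -text tags.txt -order 3 -lm pos.lm,
    where the file names are placeholders), and the class lines to the
    file given to ngram with -classes. For a lemma-based model the same
    code applies with lemmas in place of tags.<br>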

    <br>

    <br>

    <blockquote

cite="mid:CABEBqHJ798PkMfSe_DJ9YLFASabk1S8Wk65nJHZgdtfoJ8tSpQ@mail.gmail.com"

      type="cite">

      <div class="gmail_quote">


        <blockquote class="gmail_quote" style="margin:0 0 0

          .8ex;border-left:1px #ccc solid;padding-left:1ex">

          2. Then interpolate the models using<br>

          <br>

              ngram -bayes 0 -lm LM1 -mix-lm LM2 -mix-lm2 LM3 ....

          -lambda ... -mix-lambda2 ... -classes CLASSES<br>

          <br>

          where CLASSES is a classes-format(5) file defining the union

          of all the word classes used in the various component models.<br>
          <br>
        </blockquote>

        <div><br>

          To find the lambdas, I can use compute-best-mix, can't I?<br>

        </div>

      </div>

    </blockquote>

    Exactly.<br>
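    <br>
    For illustration, the estimation that compute-best-mix performs can
    be sketched as the standard EM update for linear-interpolation
    weights. compute-best-mix itself is usually run on the output of
    ngram -debug 2 -ppl for each (word-level) component model on
    held-out data; the hypothetical Python functions below instead take
    the per-token probabilities directly as lists, which is a
    simplifying assumption, and are not the script's actual code.<br>
<pre>
```python
import math


def best_mix(prob_streams, max_iter=100, tol=1e-6):
    """Estimate linear-interpolation weights by EM (illustrative sketch).

    prob_streams[m][t] is the word-level probability that component model m
    assigns to the t-th held-out token. All probabilities are assumed to be
    positive (i.e. the component LMs are smoothed).
    """
    num_models = len(prob_streams)
    num_tokens = len(prob_streams[0])
    lambdas = [1.0 / num_models] * num_models  # start from uniform weights
    for _ in range(max_iter):
        expected = [0.0] * num_models
        for t in range(num_tokens):
            # E-step: posterior probability that each model produced token t.
            weighted = [lambdas[m] * prob_streams[m][t] for m in range(num_models)]
            total = sum(weighted)
            for m in range(num_models):
                expected[m] += weighted[m] / total
        # M-step: each new weight is the average posterior responsibility.
        new_lambdas = [e / num_tokens for e in expected]
        converged = all(
            math.isclose(a, b, rel_tol=0.0, abs_tol=tol)
            for a, b in zip(new_lambdas, lambdas)
        )
        lambdas = new_lambdas
        if converged:
            break
    return lambdas


def log_likelihood(prob_streams, lambdas):
    """Held-out log-likelihood of the interpolated model."""
    return sum(
        math.log(sum(lam * stream[t] for lam, stream in zip(lambdas, prob_streams)))
        for t in range(len(prob_streams[0]))
    )


if __name__ == "__main__":
    # Toy per-token probabilities, not real LM output.
    lm1 = [0.20, 0.05, 0.30, 0.01]
    lm2 = [0.10, 0.20, 0.05, 0.02]
    weights = best_mix([lm1, lm2])
    print(weights, log_likelihood([lm1, lm2], weights))
```
</pre>
    Because the held-out log-likelihood is concave in the weights, EM
    converges toward the best mixture rather than a poor local optimum.<br>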

    <br>

    Andreas<br>

    <br>

  </body>

</html>