<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">You need to run a few sanity checks to

      make sure things are working as you expect them to.<br>

      <br>

      1.  Decode 1-best from the HTK lattice WITHOUT rescoring.  The

      results should be the same as from the HTK decoder.  If not, there

      might be a difference in the LM scaling factor, and you may have

      to adjust it via the command-line option. There might also be

      issues with the CTM output and conversion back to MLF.  <br>

      <br>

      2. Rescore the lattices with the same LM that is used in the HTK

      decoder.   Again, the results should be essentially identical.<br>

      I'm not familiar with the bigram format used by HTK, but you may

      have to convert it to ARPA format.<br>

      <br>

      3. Then try rescoring with a trigram.<br>

      <br>

      Approaching your goal in steps hopefully will help you pinpoint

      the problem(s).<br>

      <br>
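      For check 1, a minimal sketch might look like the following. It
      reuses only the lattice-tool options from your decoding command and
      leaves out -lm, so nothing is rescored. The function names and the
      output file name are made up for illustration, ctm2mlf_r is assumed
      to be your own conversion script, and I have not run this against
      your data.<br>
      <br>

```shell
#!/bin/sh
# Sketch of sanity check 1: take the 1-best path straight from the HTK
# lattices. No -lm option is given, so no rescoring takes place.
# Assumption: ctm2mlf_r is the poster's own CTM-to-MLF converter.

decode_1best() {
    # $1 = lattice list, $2 = LM scale, $3 = output MLF
    lattice-tool \
        -read-htk \
        -htk-lmscale "$2" \
        -htk-words-on-nodes \
        -in-lattice-list "$1" \
        -viterbi-decode \
        -output-ctm | ctm2mlf_r > "$3"
}

same_labels() {
    # Succeeds only if the two label files are byte-identical.
    # This is deliberately strict: time stamps, scores or path names in
    # the MLFs will make it fail even when the word sequences agree.
    cmp -s "$1" "$2"
}

# Example use (needs SRILM and ctm2mlf_r on the PATH). The scale 8.0 is
# chosen here to match the -s value given to HVite:
# decode_1best srclat.list 8.0 out.norescore.mlf
# same_labels out.bigram.HLStats.mlf out.norescore.mlf && echo identical
```

      If the byte-wise comparison is too strict for your MLF files,
      scoring both outputs against the reference labels and comparing the
      accuracies serves the same purpose.<br>
      <br>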

      Andreas<br>

      <br>

      On 11/22/2012 5:06 AM, Dmytro Prylipko wrote:<br>

    </div>

    <blockquote cite="mid:50AE235E.7060400@ovgu.de" type="cite">


      Hi,<br>

      <br>

      I found that the accuracy of the recognition results obtained with

      HVite is about 5% better than that of the hypotheses obtained

      after rescoring the lattices with lattice-tool.<br>

      <br>

      HVite does not really use an N-gram but a word net, and I cannot

      figure out why it works so much better than the SRILM

      models.<br>

      <br>

      I use the following script to generate lattices (60-best):<br>

      <br>

      <font face="Courier New">HVite -A -T 1 \<br>

        -C GENLATTICES.conf \<br>

        -n 20 60 \<br>

        -l outLatDir \<br>

        -z lat \<br>

        -H hmmDefs \<br>

        -S test.list \<br>

        -i out.bigram.HLStats.mlf \<br>

        -w bigram.HLStats.lat \<br>

        -p 0.0 \<br>

        -s 8.0 \<br>

        lexicon \<br>

        hmm.mono.list</font><br>

      <br>

      These are then rescored with:<br>

      <br>

      <font face="Courier New">lattice-tool \<br>

        -read-htk \<br>

        -write-htk \<br>

        -htk-lmscale 10.0 \<br>

        -htk-words-on-nodes \<br>

        -order 3 \<br>

        -in-lattice-list srclat.list \<br>

        -out-lattice-dir rescoredLatDir \<br>

        -lm trigram.SRILM.lm \<br>

        -overwrite<br>

        <br>

        find rescoredLatDir -name "*.lat" > rescoredLat.list<br>

        <br>

        lattice-tool \<br>

        -read-htk \<br>

        -write-htk \<br>

        -htk-lmscale 10.0 \<br>

        -htk-words-on-nodes \<br>

        -order 3  \<br>

        -in-lattice-list rescoredLat.list \<br>

        -viterbi-decode \<br>

        -output-ctm | ctm2mlf_r > out.trigram.SRILM.mlf</font><br>

      <br>

      Decoded with HVite (92.86%):<br>

      <br>

      <font face="Courier New"> LAB: &lt;A&gt; wie sieht es aus mit

        einem weiteren zweitaegigen mit einer weiteren zweitaegigen

        arbeitssitzu <br>

         REC: &lt;A&gt; wie sieht es aus mit einem weiteren zweitaegigen

        in  einer weiteren zweitaegigen arbeitssitzu</font><br>

      <br>

      ... and with lattice-tool (64.29%):<br>

      <br>

      <font face="Courier New"> LAB: &lt;A&gt; wie sieht es aus mit

        einem weiteren zweitaegigen mit  einer weiteren zweitaegigen

        arbeitssitzu<br>

         REC: &lt;A&gt; wie sieht es aus mit einen weiteren zweitaegigen

        dann bei   einem    zweitaegigen arbeitssitzung</font><br>

      <br>

      Corresponding word nets and LMs have been built using the same

      vocabulary and training data. I should say that for some sentences

      SRILM outperforms HTK, but in general it is roughly 5-7% behind.<br>

      Could you please suggest why this is so? Maybe some parameter values

      are wrong?<br>

      Or is this the expected behavior?<br>

      <br>

      I would greatly appreciate any help.<br>

      <br>

      Yours,<br>

      Dmytro Prylipko.<br>

      <br>


    </blockquote>

    <br>

  </body>

</html>