<html>

  <head>

    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">On 9/25/2014 11:28 PM, Максим

      Кореневский wrote:<br>

    </div>

    <blockquote cite="mid:1411712882.492526412@f356.i.mail.ru"

      type="cite">

      Hi, all,<br>

      <br>

      I use lattice-tool.exe to convert word lattices (in HTK-like SLF

      format) obtained from a recognition pass into word confusion

      networks (meshes). The SLFs contain both acoustic and language model

      scores, and the lm_scale parameter (used by the recognizer) in their header.

      The word insertion penalty was set to 0.<br>

      <br>

      When I scale both acoustic and LM scores with a constant factor C,

      I see that the 1-best path through the mesh depends strongly on it.

      When C is large the mesh 1-best sentence coincides with the word lattice

      1-best sentence (which is in turn the recognizer's 1-best output), but

      as C goes down to zero, the WER of the mesh 1-best sequence increases

      monotonically.<br>

    </blockquote>

    What you're seeing is expected. In fact, the scaling of scores

    can be achieved using the lattice-tool -posterior-scale option, so you

    don't have to do it yourself by manipulating the scores in the

    lattices.<br>

    <br>

    <pre>       -posterior-scale S
              Scale the transition weights by dividing by S for the purpose of
              posterior probability computation.  If the input weights represent
              combined acoustic-language model scores then this should be
              approximately the language model weight of the recognizer in
              order to avoid overly peaked posteriors (the default value is 8).</pre>

    <br>
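    To see why the mesh 1-best can change with the scale, here is a small Python sketch of a single confusion-network slot. It is only an illustration, not lattice-tool's actual code: the function name word_posteriors, the toy words, and the path scores are all made up, and the scores are assumed to be natural-log combined scores. Each path's score is divided by the posterior scale before exponentiating, and paths carrying the same word pool their probability mass.<br>

```python
import math
from collections import defaultdict


def word_posteriors(paths, posterior_scale):
    """Return {word: posterior} for one slot.

    paths is a list of (word, log_score) pairs, one per competing path.
    Scores are divided by posterior_scale before exponentiation, and
    paths that share a word add up their mass.
    """
    top = max(score for _, score in paths)  # subtract the max for numerical stability
    mass = defaultdict(float)
    for word, score in paths:
        mass[word] += math.exp((score - top) / posterior_scale)
    norm = sum(mass.values())
    return {word: m / norm for word, m in mass.items()}


# Hypothetical slot: "cat" is on the single best path, "cap" is on two weaker paths.
toy_paths = [("cat", -100.0), ("cap", -101.0), ("cap", -101.5)]

for scale in (1.0, 8.0):
    posteriors = word_posteriors(toy_paths, scale)
    best = max(posteriors, key=posteriors.get)
    print(scale, best, {w: round(p, 3) for w, p in posteriors.items()})
```

    With a scale of 1 the posteriors are peaked and the word from the single best path wins. With a scale of 8 they are flatter, and the two weaker paths that agree on a word together outvote it.<br>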

    <br>

    <blockquote cite="mid:1411712882.492526412@f356.i.mail.ru"

      type="cite"> I believed that the optimal value of this factor should

      be about 1/lm_scale (as proposed in several papers, for example,

      "Confidence measures for Large Vocabulary Speech Recognition" by

      F.Wessel et al., 2001), but I observe an average WER increase of

      about 5% absolute over a large number of files for such a factor

      value.<br>

    </blockquote>

    Now the default posterior-scale (see above) is equal to the LM score

    weight,  just as advocated in the paper you mention.  BTW, the

    rationale for this choice can be found in our earlier work on

    expected error minimization, e.g., in section 3.6 of <a

href="http://www.speech.sri.com/cgi-bin/run-distill?ftp:papers/eurospeech99-consensus.ps.gz">this

      paper</a>.<br>

    So if you scale the scores yourself and also use the default

    -posterior-scale, the scores get scaled twice and you end up with the wrong overall scaling.<br>
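    <br>
    The arithmetic behind the double scaling can be sketched in a few lines of Python. This is only an illustration: effective_divisor is a made-up helper, and lm_scale = 12 is a hypothetical recognizer LM weight. Only the default posterior scale of 8 comes from the manual excerpt above.<br>

```python
def effective_divisor(manual_factor, posterior_scale=8.0):
    """Overall divisor applied to the original scores.

    Scores are first multiplied by manual_factor (the hand-applied C),
    then divided by posterior_scale, so in total they are divided by
    posterior_scale / manual_factor.
    """
    return posterior_scale / manual_factor


lm_scale = 12.0  # hypothetical LM weight used by the recognizer

# Scaling by hand with C = 1/lm_scale and keeping the default -posterior-scale:
print(effective_divisor(1.0 / lm_scale))

# Leaving the scores alone and passing the LM weight as the posterior scale:
print(effective_divisor(1.0, posterior_scale=lm_scale))
```

    In the first case the scores end up divided by roughly 8 * lm_scale rather than lm_scale, which flattens the posteriors far more than intended.<br>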

    <br>

    If you are not seeing a lower WER using the default posterior

    scaling then you probably won't see a gain from confusion networks

    on your task. This could be for various reasons, e.g., the lattices

    are too thin, or the utterances are too short.<br>

    <br>

    Andreas<br>

    <br>

  </body>

</html>