<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 9/25/2014 11:28 PM, Максим
Кореневский wrote:<br>
</div>
<blockquote cite="mid:1411712882.492526412@f356.i.mail.ru"
type="cite">
Hi, all,<br>
<br>
I use lattice-tool.exe to convert word lattices (in HTK-like SLF
format) obtained from a recognition pass into word confusion
networks (meshes). The SLFs contain both acoustic and language model
scores, and the lm_scale parameter (used by the recognizer) in their
headers. The word insertion penalty was set to 0.<br>
<br>
When I scale both acoustic and LM scores by a constant factor C,
I see that the 1-best path through the mesh depends strongly on it.
When C is large, the mesh 1-best sentence coincides with the word
lattice 1-best sentence (which is in turn the recognizer's 1-best
output), but as C goes down to zero, the WER of the mesh 1-best
sequence increases monotonically.<br>
</blockquote>
What you're seeing is expected. In fact, the scaling of scores
can be achieved using the lattice-tool -posterior-scale option; you
don't have to do it yourself by manipulating the scores in the
lattices. <br>
<br>
-posterior-scale S<br>
Scale the transition weights by dividing by S for the purpose of
posterior probability computation. If the input weights represent
combined acoustic-language model scores then this should be
approximately the language model weight of the recognizer in order
to avoid overly peaked posteriors (the default value is 8).<br>
<br>
<br>
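To illustrate why dividing by S flattens the posteriors, here is a minimal Python sketch. It is not how lattice-tool computes posteriors (that involves a forward-backward pass over the lattice); it only normalizes the scores of a few whole paths, and the function name and the three log scores are made up for the example.<br>
<pre>
```python
import math

def scaled_posteriors(log_scores, posterior_scale):
    """Divide each log score by posterior_scale, then normalize to sum to 1."""
    scaled = [s / posterior_scale for s in log_scores]
    top = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical combined acoustic + LM log scores of three competing paths.
path_scores = [-100.0, -104.0, -108.0]

for scale in (1.0, 8.0, 50.0):
    posteriors = scaled_posteriors(path_scores, scale)
    print(scale, [round(p, 3) for p in posteriors])
```
</pre>
With a scale of 1 nearly all of the probability mass sits on the best path; larger scales spread it across the competitors.<br>
<br>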
<blockquote cite="mid:1411712882.492526412@f356.i.mail.ru"
type="cite"> I believed that the optimal value of this factor should
be about 1/lm_scale (as proposed in several papers, for example,
"Confidence measures for Large Vocabulary Speech Recognition" by
F.Wessel et al., 2001), but I observe an average WER increase of
about 5% absolute over a large number of files for such a factor
value.<br>
</blockquote>
Now, the default -posterior-scale (see above) corresponds to the LM
score weight, just as advocated in the paper you mention. BTW, the
rationale for this choice can be found in our earlier work on
expected error minimization, e.g., in section 3.6 of <a
href="http://www.speech.sri.com/cgi-bin/run-distill?ftp:papers/eurospeech99-consensus.ps.gz">this
paper</a>.<br>
So if you are scaling the scores yourself and also using the default
-posterior-scale, the two factors compound and you end up with the
wrong scaling.<br>
<br>
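The compounding can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the recognizer's lm_scale is taken to be 8, the path scores are invented, and both helper functions are hypothetical names, not part of SRILM.<br>
<pre>
```python
import math

def effective_posterior_scale(pre_scale_c, posterior_scale):
    """Divisor applied to the original scores: (score * C) / S == score / (S / C)."""
    return posterior_scale / pre_scale_c

def best_path_posterior(log_scores, divisor):
    """Posterior of the best path after dividing all log scores by divisor."""
    scaled = [s / divisor for s in log_scores]
    top = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    return max(weights) / sum(weights)

lm_scale = 8.0        # assumed recognizer LM weight (hypothetical value)
default_scale = 8.0   # documented default of -posterior-scale
path_scores = [-100.0, -104.0, -108.0]  # made-up combined log scores

# Scores left untouched: only -posterior-scale applies.
intended = effective_posterior_scale(1.0, default_scale)
# Scores pre-multiplied by C = 1/lm_scale AND divided by the default scale.
compounded = effective_posterior_scale(1.0 / lm_scale, default_scale)

print("intended divisor:", intended,
      "best-path posterior:", round(best_path_posterior(path_scores, intended), 3))
print("compounded divisor:", compounded,
      "best-path posterior:", round(best_path_posterior(path_scores, compounded), 3))
```
</pre>
Under these assumptions the scores end up divided by lm_scale twice, so the posteriors are much flatter than intended.<br>
<br>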
If you are not seeing a lower WER using the default posterior
scaling then you probably won't see a gain from confusion networks
on your task. This could be for various reasons, e.g., the lattices
are too thin, or the utterances are too short.<br>
<br>
Andreas<br>
<br>
</body>
</html>