Hi Andreas,

Thank you very much! I will test more.

Regards,
Yuan

On Wed, Oct 17, 2012 at 1:52 PM, Andreas Stolcke <stolcke@icsi.berkeley.edu> wrote:
> On 10/16/2012 5:33 PM, yuan liang wrote:
>
>> Hi Andreas,
>>
>> Thank you very much!
>>
>>>> 2) I used a Trigram in FLM format to rescore "Lattice_1":
>>>>
>>>> First I converted all word nodes (HTK format) to the FLM representation.
>>>>
>>>> Then I rescored with:
>>>>
>>>>   lattice-tool -in-lattice Lattice_1 -unk -vocab [voc_file] -read-htk \
>>>>     -no-nulls -no-htk-nulls -factored -lm [FLM_specification_file] \
>>>>     -htk-lmscale 15 -htk-logbase 2.71828183 -posterior-scale 15 \
>>>>     -write-htk -out-lattice Lattice_3
>>>>
>>>> I think "Lattice_2" and "Lattice_3" should be the same, since the
>>>> perplexity of the Trigram and of the Trigram in FLM format is the same.
>>>> However, they are different. Did I miss something?
>>>
>>> This is a question about the equivalent encoding of standard word-based
>>> LMs as FLMs, and I'm not an expert here. However, as a sanity check, I
>>> would first do a simple perplexity computation (ngram -debug 2 -ppl) with
>>> both models on some test set and make sure you get the same word-for-word
>>> conditional probabilities. If not, you can spot where the differences are
>>> and present a specific case of differing probabilities to the group for
>>> debugging.
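>>>
>>> For instance, something along these lines (just a sketch; the file names
>>> are placeholders, and the fngram option names are from memory, so check
>>> the man pages; depending on the spec, the FLM side may also need the
>>> test set in factored form):
>>>
>>>   # per-word probabilities from the standard word trigram
>>>   ngram -order 3 -lm word_trigram.lm -unk -ppl test.txt -debug 2 > ppl.word
>>>
>>>   # the same from the FLM encoding of the trigram
>>>   fngram -factor-file FLM_specification_file -unk -ppl test.txt -debug 2 > ppl.flm
>>>
>>>   diff ppl.word ppl.flm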
>>
>> Actually, I did the perplexity test on a test set of 6564 sentences
>> (72854 words). The total perplexity is the same with the standard
>> word-based Trigram LM as with the FLM Trigram. I also checked the
>> word-for-word conditional probabilities in detail: of the 72854 words,
>> only 442 have conditional probabilities that are not exactly the same,
>> and for those the difference is negligible (e.g. 0.00531048 vs.
>> 0.00531049, or 5.38809e-07 vs. 5.38808e-07). So I think we can say that
>> both models give the same word-for-word conditional probabilities.
>>
>> I also considered that it might be due to the FLM format: lattice
>> expansion with the standard Trigram seems to work differently from
>> expansion with the FLM Trigram. With the FLM Trigram the expanded
>> lattice is around 300 times larger than with the standard Trigram, so
>> maybe the expansion method is different. I'm not sure; I still need to
>> investigate more.
>
> The lattice expansion algorithm makes use of the backoff structure of the
> standard LM to minimize the number of nodes that need to be duplicated to
> correctly apply the probabilities. The FLM makes more conservative
> assumptions and always assumes you need two words of context, leading to
> more nodes after expansion. That would explain the size difference.
>
> You can also check the probabilities in the expanded lattices. The command
>
>   lattice-tool -in-lattice LATTICE -ppl TEXT -debug 2 ...
>
> will compute the probabilities assigned to the words in TEXT by traversing
> the lattice. It is worth checking first that expansion with FLMs yields the
> right probabilities.
>
> You say that Viterbi decoding gives almost the same results (this suggests
> the expansion works correctly), but posterior (confusion network) decoding
> doesn't. It is possible there is a problem with building CNs from lattices
> with factored vocabularies; I don't think I ever tried that. It would help
> to find a minimal test case that shows the problem.
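>
> If I remember the option names correctly, a minimal check would be to take
> one small expanded lattice and compare the two decodings directly (the
> lattice and output names here are just placeholders):
>
>   # Viterbi 1-best from the lattice
>   lattice-tool -in-lattice small.lat -read-htk -htk-lmscale 15 -posterior-scale 15 -viterbi-decode
>
>   # confusion-network (posterior) 1-best, plus the CN itself for inspection
>   lattice-tool -in-lattice small.lat -read-htk -htk-lmscale 15 -posterior-scale 15 -posterior-decode -write-mesh small.mesh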
>
> Andreas
>
>> Thank you very much for your advice!
>>
>> Regards,
>> Yuan