<html>
<head>
<style>
.hmmessage P
{
margin:0px;
padding:0px
}
body.hmmessage
{
FONT-SIZE: 10pt;
FONT-FAMILY:Tahoma
}
</style>
</head>
<body class='hmmessage'>
Hi Andreas, could you unsubscribe me from this mailing list please? Thanks a lot for all your help in the past.<BR>
<BR>
Regards,<BR>
<BR>
Sai<BR><BR>> Date: Wed, 8 Oct 2008 15:29:47 +0200<BR>> From: gwenole.lecorve@irisa.fr<BR>> To: stolcke@speech.sri.com<BR>> CC: srilm-user@speech.sri.com<BR>> Subject: Re: Beginning and end of sentences tags<BR>> <BR>> Thank you for this quick and precise answer.<BR>> However, when I launch my command (see below), I still do not get back <BR>> the same lattice structure.<BR>> Command:<BR>> > lattice-tool -in-lattice /path/to/input.lat -out-lattice <BR>> > /path/to/output.lat<BR>> > -lm $LM<BR>> > -htk-logbase 2.71828<BR>> > -write-htk<BR>> > -read-htk<BR>> > -print-sent-tags<BR>> > -htk-logzero '-99'<BR>> > -no-htk-nulls<BR>> > -htk-words-on-nodes<BR>> <BR>> The result is then as follows:<BR>> > # Header (generated by SRILM)<BR>> > VERSION=1.1<BR>> > UTTERANCE=/path/to/one.spf<BR>> > base=2.71828<BR>> > dir=f<BR>> > vocab=/path/to/dic<BR>> > start=0<BR>> > end=1<BR>> > NODES=6 LINKS=5<BR>> > # Nodes<BR>> > I=0 W=&lt;s&gt; t=0<BR>> > I=1 W=&lt;/s&gt; t=1.08<BR>> > I=2 W=le t=0.14 v=1<BR>> > I=3 W=chien t=0.33 v=1<BR>> > I=4 W=miaule t=0.83 v=1<BR>> > I=5 W=&lt;/s&gt; t=1.08<BR>> > # Links<BR>> > J=0 S=0 E=2 a=-55.36 l=-2.74741<BR>> > J=1 S=2 E=3 a=-72.28 l=-8.60446<BR>> > J=2 S=3 E=4 a=-72.28 l=-inf<BR>> > J=3 S=4 E=5 a=-91.5701 l=-2.87136<BR>> > J=4 S=5 E=1 l=-2.87136<BR>> <BR>> <BR>> I notice two things:<BR>> 1/ Even though the !NULL nodes are replaced by the sentence start/end tags, one more <BR>> "eos" tag is added at the end of the lattice. Isn't this a problem, since <BR>> P(&lt;/s&gt;|&lt;/s&gt;) would then be considered when computing the posteriors? <BR>> When writing words on edges the problem is the same (whereas the "bos" <BR>> tag disappears).<BR>> 2/ Despite the "-htk-logzero -99" option, "-inf" is still returned. 
<BR>> After a few additional experiments, it appears that the "-htk-logzero" <BR>> option works when, for example, no LM rescoring is applied or when the <BR>> "-no-expansion" option is enabled.<BR>> <BR>> I may be misusing the lattice-tool command, but I do not see how to preserve <BR>> the original lattice structure (even though I know that SRILM converts <BR>> HTK lattices into its own format and that my goal may be unreachable <BR>> :-) ).<BR>> <BR>> Best regards,<BR>> Gwénolé Lecorvé.<BR>> <BR>> Andreas Stolcke wrote:<BR>> > Gwénolé Lecorvé wrote:<BR>> >> Hi,<BR>> >><BR>> >> I'm currently trying to rescore the language scores of lattices generated <BR>> >> using the HTK toolkit and personal tools.<BR>> >> Here is an example of a lattice to be rescored:<BR>> >>> VERSION=1.0<BR>> >>> UTTERANCE=/path/to/one.spf<BR>> >>> acscale=1.00<BR>> >>> vocab=/path/to/dic<BR>> >>> N=290 L=942<BR>> >>> I=0 t=0.00 W=&lt;s&gt;<BR>> >>> I=1 t=0.14 W=le v=1<BR>> >>> I=2 t=0.33 W=chien v=1<BR>> >>> I=3 t=0.83 W=miaule v=1<BR>> >>> I=4 t=1.08 W=&lt;/s&gt;<BR>> >>> J=0 S=0 E=1 a=-55.36 l=-2973.43<BR>> >>> J=1 S=1 E=2 a=-72.28 l=-48.43<BR>> >>> J=2 S=2 E=3 a=-72.28 l=-87.30<BR>> >>> J=3 S=3 E=4 a=-91.57 l=-145.72<BR>> >> Note that the tags for beginning/end of sentence are present.<BR>> >> My problem is that once I launch lattice-tool (with <BR>> >> -htk-words-on-nodes and -no-htk-nulls) on such a lattice, the result <BR>> >> (HTK format) looks like this:<BR>> >>> # Header (generated by SRILM)<BR>> >>> VERSION=1.1<BR>> >>> UTTERANCE=/path/to/one.spf<BR>> >>> base=2.71828<BR>> >>> dir=f<BR>> >>> vocab=/path/to/di<BR>> >>> start=0<BR>> >>> end=1<BR>> >>> NODES=6 LINKS=5<BR>> >>> # Nodes<BR>> >>> I=0 W=!NULL t=0<BR>> >>> I=1 W=!NULL t=1.08<BR>> >>> I=2 W=le t=0.14 v=1<BR>> >>> I=3 W=chien t=0.33 v=1<BR>> >>> I=4 W=miaule t=0.83 v=1<BR>> >>> I=5 W=!NULL t=1.08<BR>> >>> # Links<BR>> >>> J=0 S=0 E=2 a=-55.36 l=-2.74741<BR>> >>> J=1 S=2 E=3 a=-72.28 l=-9.61595<BR>> >>> J=2 S=3 E=4 a=-72.28 
l=-inf<BR>> >>> J=3 S=4 E=5 a=-91.5701 l=-2.87136<BR>> >>> J=4 S=5 E=1 l=-2.87136<BR>> >> Something strange happens: the "bos" and "eos" tags disappear and <BR>> >> !NULL tags are introduced instead.<BR>> >> Why aren't the "bos" and "eos" tags printed anymore, and why are these <BR>> >> !NULL tags used instead?<BR>> >> Can't I just keep the same lattice structure as the one given as input?<BR>> >><BR>> >> I have been facing this problem for several months and still have not found <BR>> >> a solution. I would be really grateful if you could help me.<BR>> > &lt;s&gt; and &lt;/s&gt; are replaced by !NULL because they are not necessary, <BR>> > since the start and end of the sentence are implicit in the lattice structure.<BR>> > For example, when rescoring the lattice with an LM the initial node is <BR>> > implicitly treated as the &lt;s&gt; context.<BR>> ><BR>> > However, I can see how you would want to preserve these tags for some <BR>> > applications.<BR>> > If you download the beta version of srilm you will find a new option: <BR>> > lattice-tool -print-sent-tags will output &lt;s&gt; and &lt;/s&gt; in the lattice <BR>> > format (both HTK and PFSG).<BR>> ><BR>> > Andreas<BR>> >><BR>> >> Regards,<BR>> >> Gwénolé Lecorvé.<BR>> ><BR>> ><BR>> <BR><BR></body>
</html>