Beginning and end of sentences tags

SAI TANG HUANG sai_tang_huang at hotmail.com
Wed Oct 8 09:40:11 PDT 2008


Hi Andreas, could you unsubscribe me from this mailing list please? Thanks a lot for all your help in the past.
 
Regards,
 
Sai

> Date: Wed, 8 Oct 2008 15:29:47 +0200
> From: gwenole.lecorve at irisa.fr
> To: stolcke at speech.sri.com
> CC: srilm-user at speech.sri.com
> Subject: Re: Beginning and end of sentences tags
>
> Thank you for this quick and precise answer.
> However, when I launch my command (see below), I still do not get back
> the same lattice structure.
> Command:
> > lattice-tool -in-lattice /path/to/input.lat -out-lattice
> > /path/to/output.lat
> > -lm $LM
> > -htk-logbase 2.71828
> > -write-htk
> > -read-htk
> > -print-sent-tags
> > -htk-logzero '-99'
> > -no-htk-nulls
> > -htk-words-on-nodes
>
> Then the result is as follows:
> > # Header (generated by SRILM)
> > VERSION=1.1
> > UTTERANCE=/path/to/one.spf
> > base=2.71828
> > dir=f
> > vocab=/path/to/dic
> > start=0
> > end=1
> > NODES=6 LINKS=5
> > # Nodes
> > I=0 W=<s> t=0
> > I=1 W=</s> t=1.08
> > I=2 W=le t=0.14 v=1
> > I=3 W=chien t=0.33 v=1
> > I=4 W=miaule t=0.83 v=1
> > I=5 W=</s> t=1.08
> > # Links
> > J=0 S=0 E=2 a=-55.36 l=-2.74741
> > J=1 S=2 E=3 a=-72.28 l=-8.60446
> > J=2 S=3 E=4 a=-72.28 l=-inf
> > J=3 S=4 E=5 a=-91.5701 l=-2.87136
> > J=4 S=5 E=1 l=-2.87136
>
> I notice 2 things:
> 1/ Even if the !NULL nodes are replaced by the sentence start/end tags, one more
> "eos" tag is added at the end of the lattice. Isn't it a problem, since a
> P(</s>|</s>) would then be considered while computing the posteriors?
> When writing words on edges the problem is the same (whereas the "bos"
> tag disappears).
> 2/ Despite the "-htk-logzero -99" option, "-inf" is still returned.
> After a few additional experiments, it appears that the "-htk-logzero"
> option works when, for example, no LM rescoring is applied or when the
> "-no-expansion" option is enabled.
>
> I may be misusing the lattice-tool command, but I do not see how to preserve
> the original lattice structure (even though I know that SRILM converts
> HTK lattices into its own format and that my goal is maybe unreachable
> :-) ).
>
> Best regards,
> Gwénolé Lecorvé.
>
> Andreas Stolcke wrote:
> > Gwénolé Lecorvé wrote:
> >> Hi,
> >>
> >> I'm currently trying to rescore language scores of lattices generated
> >> using the HTK toolkit and personal tools.
> >> Here is an example of a lattice to be rescored:
> >>> VERSION=1.0
> >>> UTTERANCE=/path/to/one.spf
> >>> acscale=1.00
> >>> vocab=/path/to/dic
> >>> N=290 L=942
> >>> I=0 t=0.00 W=<s>
> >>> I=1 t=0.14 W=le v=1
> >>> I=2 t=0.33 W=chien v=1
> >>> I=3 t=0.83 W=miaule v=1
> >>> I=4 t=1.08 W=</s>
> >>> J=0 S=0 E=1 a=-55.36 l=-2973.43
> >>> J=1 S=1 E=2 a=-72.28 l=-48.43
> >>> J=2 S=2 E=3 a=-72.28 l=-87.30
> >>> J=3 S=3 E=4 a=-91.57 l=-145.72
> >> You can notice that the tags for beginning/end of sentence are present.
> >> My problem is that once I launch lattice-tool (with
> >> -htk-words-on-nodes and -no-htk-nulls) on such a lattice, the result
> >> (HTK format) looks like this:
> >>> # Header (generated by SRILM)
> >>> VERSION=1.1
> >>> UTTERANCE=/path/to/one.spf
> >>> base=2.71828
> >>> dir=f
> >>> vocab=/path/to/di
> >>> start=0
> >>> end=1
> >>> NODES=6 LINKS=5
> >>> # Nodes
> >>> I=0 W=!NULL t=0
> >>> I=1 W=!NULL t=1.08
> >>> I=2 W=le t=0.14 v=1
> >>> I=3 W=chien t=0.33 v=1
> >>> I=4 W=miaule t=0.83 v=1
> >>> I=5 W=!NULL t=1.08
> >>> # Links
> >>> J=0 S=0 E=2 a=-55.36 l=-2.74741
> >>> J=1 S=2 E=3 a=-72.28 l=-9.61595
> >>> J=2 S=3 E=4 a=-72.28 l=-inf
> >>> J=3 S=4 E=5 a=-91.5701 l=-2.87136
> >>> J=4 S=5 E=1 l=-2.87136
> >> Something strange happens: the "bos" and "eos" tags disappear and
> >> !NULL tags are introduced instead.
> >> Why aren't the "bos" and "eos" tags printed anymore, and why are these
> >> !NULL tags considered instead?
> >> Can't I just keep the same lattice structure as the one given in input?
> >>
> >> I have been facing this problem for several months and still have not found
> >> any solution. I would be really grateful if you could help me.
> > <s> and </s> are replaced by !NULL because they are not necessary,
> > since the start/end of sentence are implicit in the lattice structure.
> > For example, when rescoring the lattice with an LM the initial node is
> > implicitly treated as the <s> context.
> >
> > However, I can see how you would want to preserve these tags for some
> > applications.
> > If you download the beta version of srilm you will find a new option:
> > lattice-tool -print-sent-tags will output <s> and </s> in the lattice
> > format (both HTK and PFSG).
> >
> > Andreas
> >>
> >> Regards,
> >> Gwénolé Lecorvé.
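
The two observations in the quoted message (the extra </s> node at the end, and the l=-inf link that remains despite -htk-logzero) can be checked mechanically on the written lattice. The Python sketch below is only an illustration under assumptions: it assumes the output is plain HTK lattice text with space-separated key=value fields, exactly as in the example quoted above, and the helper names (parse_htk_lattice, floor_lm_scores, repeated_end_tag_links) are hypothetical and not part of SRILM. It post-processes the file; it does not change how lattice-tool behaves.

```python
import re

# One key=value field, e.g. "W=</s>" or "l=-inf".
FIELD = re.compile(r"(\S+?)=(\S+)")

def parse_htk_lattice(text):
    """Split HTK lattice text into node and link dicts keyed by index."""
    nodes, links = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = dict(FIELD.findall(line))
        if "I" in fields:        # node definition
            nodes[int(fields["I"])] = fields
        elif "J" in fields:      # link definition
            links[int(fields["J"])] = fields
        # anything else is a header line and is ignored here
    return nodes, links

def floor_lm_scores(links, logzero=-99.0):
    """Return {link index: LM score}, with -inf replaced by logzero."""
    floored = {}
    for j, fields in links.items():
        if "l" in fields:
            score = float(fields["l"])  # float() accepts "-inf"
            floored[j] = logzero if score == float("-inf") else score
    return floored

def repeated_end_tag_links(nodes, links, tag="</s>"):
    """List links whose start and end nodes both carry the end tag."""
    return [j for j, fields in sorted(links.items())
            if nodes[int(fields["S"])].get("W") == tag
            and nodes[int(fields["E"])].get("W") == tag]

# The rescored lattice from the quoted message.
SAMPLE = """\
VERSION=1.1
NODES=6 LINKS=5
# Nodes
I=0 W=<s> t=0
I=1 W=</s> t=1.08
I=2 W=le t=0.14 v=1
I=3 W=chien t=0.33 v=1
I=4 W=miaule t=0.83 v=1
I=5 W=</s> t=1.08
# Links
J=0 S=0 E=2 a=-55.36 l=-2.74741
J=1 S=2 E=3 a=-72.28 l=-8.60446
J=2 S=3 E=4 a=-72.28 l=-inf
J=3 S=4 E=5 a=-91.5701 l=-2.87136
J=4 S=5 E=1 l=-2.87136
"""

nodes, links = parse_htk_lattice(SAMPLE)
floored = floor_lm_scores(links)
repeated = repeated_end_tag_links(nodes, links)
print("floored LM scores:", floored)
print("links joining two end tags:", repeated)
```

On this sample the link J=2 is the one carrying -inf, and J=4 is the link that joins the two </s> nodes, which is where a P(</s>|</s>) term could arise. Whether that final link should be dropped or kept is a modelling decision the sketch does not make.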

