Beginning and end of sentences tags

Andreas Stolcke stolcke at speech.sri.com
Tue Oct 7 21:41:43 PDT 2008


Gwénolé Lecorvé wrote:
> Hi,
>
> I'm currently trying to rescore the language model scores of lattices 
> generated using the HTK toolkit and personal tools.
> Here is an example of a lattice to be rescored:
>> VERSION=1.0
>> UTTERANCE=/path/to/one.spf
>> acscale=1.00
>> vocab=/path/to/dic
>> N=290  L=942
>> I=0    t=0.00  W=<s>
>> I=1    t=0.14  W=le                 v=1
>> I=2    t=0.33  W=chien                 v=1
>> I=3    t=0.83  W=miaule                 v=1
>> I=4    t=1.08  W=</s>
>> J=0     S=0    E=1    a=-55.36    l=-2973.43
>> J=1     S=1    E=2    a=-72.28    l=-48.43
>> J=2     S=2    E=3    a=-72.28    l=-87.30
>> J=3     S=3    E=4    a=-91.57    l=-145.72
> Note that the tags for beginning/end of sentence are present.
> My problem is that once I run lattice-tool (with 
> -htk-words-on-nodes and -no-htk-nulls) on such a lattice, the result 
> (HTK format) looks like this:
>> # Header (generated by SRILM)
>> VERSION=1.1
>> UTTERANCE=/path/to/one.spf
>> base=2.71828
>> dir=f
>> vocab=/path/to/di
>> start=0
>> end=1
>> NODES=6 LINKS=5
>> # Nodes
>> I=0     W=!NULL t=0
>> I=1     W=!NULL t=1.08
>> I=2     W=le    t=0.14  v=1
>> I=3     W=chien t=0.33  v=1
>> I=4     W=miaule        t=0.83  v=1
>> I=5     W=!NULL t=1.08
>> # Links
>> J=0     S=0     E=2     a=-55.36        l=-2.74741
>> J=1     S=2     E=3     a=-72.28        l=-9.61595
>> J=2     S=3     E=4     a=-72.28        l=-inf
>> J=3     S=4     E=5     a=-91.5701      l=-2.87136
>> J=4     S=5     E=1     l=-2.87136
> Something strange happens: the "bos" and "eos" tags disappear and 
> !NULL tags are introduced instead.
> Why aren't the "bos" and "eos" tags printed anymore, and why are these 
> !NULL tags used instead?
> Can't I just keep the same lattice structure as the one given as input?
>
> I have been facing this problem for several months and still have not 
> found any solution. I would be really grateful if you could help me.
<s> and </s> are replaced by !NULL because they are not necessary: the 
start and end of the sentence are implicit in the lattice structure.
For example, when rescoring the lattice with an LM, the initial node is 
implicitly treated as the <s> context.

However, I can see how you would want to preserve these tags for some 
applications.
If you download the beta version of SRILM you will find a new option: 
lattice-tool -print-sent-tags will output <s> and </s> in the lattice 
output (both HTK and PFSG formats).
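
For SRILM versions that predate -print-sent-tags, one possible workaround is to post-process the HTK output and relabel the !NULL nodes named by the start= and end= header fields. The following is only a minimal sketch, not part of SRILM: the function name restore_sentence_tags is hypothetical, and it assumes the header layout shown in the output quoted above (start=/end= lines followed by I= node lines).

```python
import re


def restore_sentence_tags(htk_text):
    """Relabel the start/end !NULL nodes of an HTK lattice as <s> and </s>.

    Assumes the lattice header contains 'start=N' and 'end=N' lines, as in
    the SRILM-generated output quoted in this thread (an assumption about
    the input, not a documented guarantee).
    """
    lines = htk_text.splitlines()

    # First pass: find the node ids declared as lattice start and end.
    start = end = None
    for line in lines:
        m = re.match(r"start=(\d+)", line)
        if m:
            start = m.group(1)
        m = re.match(r"end=(\d+)", line)
        if m:
            end = m.group(1)

    # Second pass: rewrite only the node lines for those two ids.
    out = []
    for line in lines:
        fields = line.split()
        if fields and fields[0].startswith("I=") and "W=!NULL" in fields:
            node_id = fields[0][2:]
            if node_id == start:
                line = line.replace("W=!NULL", "W=<s>", 1)
            elif node_id == end:
                line = line.replace("W=!NULL", "W=</s>", 1)
        out.append(line)
    return "\n".join(out)
```

Applied to the output quoted above, this would relabel nodes 0 and 1 only; the intermediate node I=5 stays !NULL, so the result still does not have exactly the same structure as the input lattice.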

Andreas
>
> Regards,
> Gwénolé Lecorvé.
 



