ngram-count : -tagged option

Andreas Stolcke stolcke at speech.sri.com
Mon Feb 23 14:03:34 PST 2004


In message <Law10-F558I2lreZkHY0000a10c at hotmail.com> you wrote:
> Hello everybody!
> Does anybody have any experience using the -tagged option to ngram-count? I 
> thought that the -tagged option means that ngram-count creates a tag-based 
> model, but I got strange results. In the resulting counts file there appears 
> a kind of mixture of words and tags... My input text file has the following form:
> <s>word1/tag1 word2/tag2 .... wordN/tagN</s>

This option is for building N-gram LMs that use the word class for 
backoff, and thus hopefully get improved smoothing.  It is not documented,
I'm afraid, so it will be hard to use unless you are willing to look closely
at the code.  I remember someone on this list reported a bug with the
code a while back, so maybe there are some people out there who can help.
Also, there is a small example in the test suite (test/tests/tagged-ngram).

I should note that the "factored N-gram" models recently added to
SRILM (release 1.4) are a generalization of tagged N-grams, and 
there is good documentation for those.  So you might want to 
think about reformulating whatever it is you are thinking of as a factored
LM.
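
As a rough illustration of what such a reformulation might involve, the
sketch below rewrites word/tag tokens into colon-separated factor bundles
of the kind the factored LM documentation describes.  It is not part of
SRILM: the function name to_factored and the factor labels W and T are
assumptions chosen for this example, and it assumes whitespace-separated
tokens with no <s> or </s> markers in the text.  Check the factored LM
documentation for the exact conventions before relying on it.

```python
def to_factored(line, word_label="W", tag_label="T"):
    """Rewrite 'word/tag' tokens as 'W-word:T-tag' factor bundles.

    Hypothetical helper, not part of SRILM. The labels W and T are
    assumptions; use whatever factor names your model specification uses.
    """
    bundles = []
    for token in line.split():
        # Split on the LAST slash so that words containing '/' (e.g. 1/2/CD)
        # keep their internal slashes.
        word, sep, tag = token.rpartition("/")
        if not sep:
            # Token carries no tag: emit only the word factor.
            bundles.append(f"{word_label}-{token}")
        else:
            bundles.append(f"{word_label}-{word}:{tag_label}-{tag}")
    return " ".join(bundles)


if __name__ == "__main__":
    print(to_factored("the/DT dog/NN barks/VBZ"))
```

Splitting on the last slash is a design choice for tokens such as 1/2/CD,
where only the final field is the tag.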

--Andreas 



