Re: ngram-count : -tagged option

From: Andreas Stolcke <stolcke at ADDRESS HIDDEN>
Date: Mon, 23 Feb 2004 14:03:34 PST

In message <Law10-F558I2lreZkHY0000a10c at ADDRESS HIDDEN> you wrote:
> Hello everybody!
> Does anybody have any experience using the -tagged option of ngram-count? I
> thought that the -tagged option means that ngram-count creates a tag-based
> model, but I got strange results. The resulting counts file contains a kind
> of mixture of words and tags... My input text file has the following form:
> <s>word1/tag1 word2/tag2 .... wordN/tagN</s>

This option is for building ngram LMs that use the word class for
backoff, and thus hopefully give improved smoothing.  It is not documented,
I'm afraid, so it will be hard to use unless you are willing to look closely
at the code.  I remember someone on this list reported a bug with the
code a while back, so maybe there are some people out there who can help.
Also, there is a small example in the test suite (test/tests/tagged-ngram).
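
If the goal is a purely tag-based model rather than class-based backoff, one possible workaround is to separate the tags from the words before counting and train an ordinary N-gram model on the tag stream alone. The following is only a minimal sketch of that preprocessing step, assuming the word/tag input format quoted above; the function name and the "UNK" placeholder tag are hypothetical and not part of SRILM.

```python
def split_tagged_line(line, sep="/"):
    """Split a line of 'word/tag' tokens into parallel word and tag lists.

    Assumes the format from the question: <s>word1/tag1 ... wordN/tagN</s>.
    The sentence markers are dropped here; they can be re-added if needed.
    """
    # Sentence markers may be glued to the first and last tokens, so strip them.
    cleaned = line.replace("<s>", " ").replace("</s>", " ")
    words, tags = [], []
    for token in cleaned.split():
        if sep in token:
            # Split on the LAST separator so words containing '/' survive.
            word, _, tag = token.rpartition(sep)
        else:
            # Hypothetical fallback for untagged tokens.
            word, tag = token, "UNK"
        words.append(word)
        tags.append(tag)
    return words, tags


if __name__ == "__main__":
    words, tags = split_tagged_line("<s>the/DT dog/NN barks/VBZ</s>")
    print(" ".join(words))
    print(" ".join(tags))
```

Writing the tag lines to their own text file gives a corpus that can be passed to ngram-count like any other training text, without relying on -tagged.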

I should note that the "factored N-gram" models recently added to
SRILM (release 1.4) are a generalization of tagged N-grams, and
there is good documentation for those.  So you might want to
think about reformulating whatever it is you are thinking of as a factored
LM.
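
As a rough illustration of what such a reformulation might involve on the data side, the sketch below rewrites word/tag tokens as colon-separated factor bundles of the form W-word:P-tag. The factor names "W" and "P", and the function name, are assumptions chosen for this example; check the factored LM documentation for the exact input format expected by your release.

```python
def to_factored_line(line, sep="/", word_factor="W", tag_factor="P"):
    """Rewrite 'word/tag' tokens as 'W-word:P-tag' factor bundles.

    The factor labels are illustrative assumptions, not taken from the
    original message.
    """
    # Remove sentence markers that may be attached to the first/last token.
    cleaned = line.replace("<s>", " ").replace("</s>", " ")
    out = []
    for token in cleaned.split():
        if sep not in token:
            # Untagged token: emit only the word factor.
            out.append(f"{word_factor}-{token}")
            continue
        # Split on the last separator so words containing '/' are preserved.
        word, _, tag = token.rpartition(sep)
        out.append(f"{word_factor}-{word}:{tag_factor}-{tag}")
    return " ".join(out)


if __name__ == "__main__":
    print(to_factored_line("<s>the/DT dog/NN barks/VBZ</s>"))
```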

--Andreas
