-tagged option?

From: Gemma Boleda <gemma.boleda at ADDRESS HIDDEN>
Date: Tue, 17 May 2005 20:45:40 +0000

Hi,

I am using the -tagged option of ngram-count and have run into two
problems:

a) The slash is included in the n-gram counts. With the input "la/DT
nena/N5 és/V maca/JQ ./PT", the bigram counts look as follows:

<s> la 1
<s> /DT 1
la nena 1
nena és 1
és maca 1
/N5 és 1
/N5 /V 1
/V maca 1
/V /JQ 1
/DT nena 1
/DT /N5 1
maca . 1
/JQ . 1
/JQ /PT 1
. </s> 1
/PT </s> 1

Why is the slash considered as part of the tag?
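
For reference, the pattern in the listing can be reproduced with a short
Python sketch. It only describes the output observed for this one sentence
and makes no claim about how SRILM works internally; the function name
observed_bigrams is hypothetical, and every token is assumed to have the
form word/tag.

```python
from collections import Counter

def observed_bigrams(sentence):
    """Hypothetical helper: mimic the bigram pattern seen in the listing."""
    # Split each "word/tag" token at its last slash; the slash stays on the tag.
    pairs = []
    for token in sentence.split():
        word, _, tag = token.rpartition("/")
        pairs.append((word, "/" + tag))
    # Assumption: the sentence markers stand in for both word and tag.
    pairs = [("<s>", "<s>")] + pairs + [("</s>", "</s>")]

    counts = Counter()
    for (prev_word, prev_tag), (word, tag) in zip(pairs, pairs[1:]):
        # Pairings present in the listing: word->word, tag->word, tag->tag.
        # A set avoids double counting at <s> and </s>, where word == tag.
        for bigram in {(prev_word, word), (prev_tag, word), (prev_tag, tag)}:
            counts[bigram] += 1
    return counts

counts = observed_bigrams("la/DT nena/N5 és/V maca/JQ ./PT")
for (left, right), n in sorted(counts.items()):
    print(left, right, n)
```

The sketch yields the same 16 bigrams as the listing, each with count 1.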

b) As the example shows, the n-grams involving tags are only built in one
direction: a tag can be followed by a word (e.g. "/DT nena"), but no word
is followed by a tag, so there is no bigram "la /N5", which I would have
expected (and need).
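
To make the expectation concrete, here is a minimal Python sketch that
counts the missing kind of bigram (a word followed by the tag of the next
token) directly from the tagged text, outside SRILM. The function name
word_to_next_tag_bigrams is hypothetical, every token is assumed to have
the form word/tag, and keeping the slash on the tag is only a notational
choice that matches the example.

```python
from collections import Counter

def word_to_next_tag_bigrams(sentence):
    """Hypothetical helper: count (word, tag of the following token) pairs."""
    # Split each "word/tag" token at its last slash.
    parts = [token.rpartition("/") for token in sentence.split()]
    words = [word for word, _, _ in parts]
    tags = ["/" + tag for _, _, tag in parts]
    # Pair each word with the tag of the token that follows it.
    return Counter(zip(words, tags[1:]))

counts = word_to_next_tag_bigrams("la/DT nena/N5 és/V maca/JQ ./PT")
for (word, tag), n in counts.items():
    print(word, tag, n)
```

For the example sentence this produces four bigrams, including "la /N5".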

Can you help me?

Thanks a lot,

Gemma Boleda
Universitat Pompeu Fabra
Barcelona
