Re: question about vocabulary

From: Anand Venkataraman <anand at ADDRESS HIDDEN>
Date: Tue, 4 May 2004 08:53:13 -0700 (PDT)

> I would like to know if it's possible with the SRILM toolkit to generate
> a vocabulary with the 20000 most frequent words of a corpus for example.

You should be able to achieve this by running "ngram-count -order 1 -write -"
on the corpus, doing a reverse numeric sort on field 2 (the count), and
taking the top 20000 lines.
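
As a rough sketch of the sort-and-cut step described above, the helper below
assumes the counts arrive on stdin as one "word<TAB>count" pair per line,
which is how unigram counts are normally written. The function name
top_vocab and the file names corpus.txt and vocab.txt are made up for
illustration and are not part of SRILM.

```shell
# Hypothetical helper: read "word<TAB>count" lines on stdin and print
# the N most frequent words, one per line (N defaults to 20000).
top_vocab() {
    n="${1:-20000}"
    tab="$(printf '\t')"
    # Sort numerically and descending on field 2 (the count),
    # keep the first N lines, then drop the count column.
    LC_ALL=C sort -t "$tab" -k2,2nr | head -n "$n" | cut -f1
}

# Possible usage, assuming the corpus is in corpus.txt:
#   ngram-count -text corpus.txt -order 1 -write - | top_vocab 20000 > vocab.txt
```

The counts may also contain the sentence-boundary tags <s> and </s>, so you
may want to filter those out (for example with grep -v) before taking the
top N, depending on how the vocabulary will be used.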
