Re: add new words to current classes

From: Andreas Stolcke <stolcke at ADDRESS HIDDEN>
Date: Tue, 12 Jun 2007 10:49:30 -0700

Sergey Protasov wrote:
> Dear experts,
>
> I have a small corpus with a dictionary of 10K words that are split
> into 200 classes.
>
> I also have a big corpus with a dictionary of 30K words (20K of them new).
>
> I want to assign the 20K new words to the 200 existing classes.
>
> How can I do this using SRILM?
>
> I don't want to move any of the old 10K words from class to class.

I agree this would be a useful function to have, but unfortunately it is
not currently implemented.
It should be fairly straightforward to do based on the existing code.

You basically need to load an existing class definition, then create
singleton classes for the new words, and start incremental merging with
the number of classes limited to the original set.
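
The procedure just described can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not SRILM code:
the function names, the in-memory bigram-count dictionary, and the
class-bigram mutual-information score are all choices made here for the
example, and the score is recomputed from scratch for every candidate
merge, which would be far too slow for 20K new words. Old words are never
moved, and each new singleton is merged into whichever existing class
scores best.

```python
import math
from collections import Counter


def class_bigram_score(bigrams, word2class):
    """Unnormalized class-bigram mutual information for a word->class map."""
    pair, left, right = Counter(), Counter(), Counter()
    total = 0
    for (w1, w2), n in bigrams.items():
        c1, c2 = word2class[w1], word2class[w2]
        pair[c1, c2] += n
        left[c1] += n
        right[c2] += n
        total += n
    return sum(n * math.log(n * total / (left[c1] * right[c2]))
               for (c1, c2), n in pair.items())


def assign_new_words(bigrams, old_classes):
    """Merge each new word's singleton class into one existing class.

    bigrams:     dict mapping (word1, word2) -> count (hypothetical input)
    old_classes: dict mapping old word -> class label; kept fixed
    """
    word2class = dict(old_classes)
    targets = sorted(set(old_classes.values()))
    vocab = {w for pair in bigrams for w in pair}
    new_words = vocab - set(old_classes)

    # Step 1: every new word starts in its own singleton class.
    for w in new_words:
        word2class[w] = ("singleton", w)

    # Step 2: merge singletons one by one, most frequent word first,
    # so the number of classes shrinks back to the original set.
    freq = Counter()
    for (w1, w2), n in bigrams.items():
        freq[w1] += n
        freq[w2] += n
    for w in sorted(new_words, key=lambda x: (-freq[x], x)):
        word2class[w] = max(
            targets,
            key=lambda c: class_bigram_score(bigrams, {**word2class, w: c}),
        )
    return word2class
```

A real change to ngram-class.cc would presumably update the class counts
incrementally instead of rescoring the whole corpus for each candidate.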

If you care about this problem you should try to modify ngram-class.cc
and share the results with the rest of us! I'd be happy to give some
guidance and review changes if you are willing to do the work.

Andreas
