SRILM-USER Archives

ARPA format (sorting)

From: Paul Melis <melis at ADDRESS HIDDEN>
Date: Tue, 11 Mar 2003 23:21:59 +0100

Hello Andreas,

Is there an explicit sort order that LMs in ARPA format are required to follow? Specifically, is there a standard order for the words of unigrams, bigrams and trigrams (e.g. <unk> first, then diacritics, then alphabetical order, and so on)?
We've had some problems with ARPA files written by SRILM that the CMU toolkit can't handle, and we suspect the cause is the order in which the n-grams are sorted.

Regards,
Paul
--
melis at ADDRESS HIDDEN
