Re: write-vocab

From: Andreas Stolcke <stolcke at ADDRESS HIDDEN>
Date: Tue, 15 May 2007 09:28:24 -0700

B. Plank wrote:
> Dear SRILM-team,
>
> is there a parameter to get the n most frequent words out of a LM? (i.e.
> like restricting the write-vocab of "ngram -order 1" to just output the
> n-most frequent words?) I am sure there is, just now I don't see it.
>
> Thank you for any help,
> Barbara
>
>  
Actually, there is no such tool. The frequency of words is not generally
available in the LM, only their unigram probabilities. Since the unigram
probabilities are usually a monotonic function of the unigram frequencies,
you could write a small script that extracts the words from the unigram
section of the LM file and sorts them by their probabilities.
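
For illustration, such a script might look roughly like the Python sketch
below. It is not part of SRILM. It assumes the LM is in plain-text ARPA
format, where each line of the "\1-grams:" section holds a log10
probability, the word, and an optional backoff weight. The function name
top_n_unigrams is made up for this example.

```python
import heapq


def top_n_unigrams(lines, n):
    """Return the n (word, log10_prob) pairs with the highest unigram probability.

    `lines` is any iterable of text lines from an ARPA-format LM file.
    """
    in_unigrams = False
    entries = []
    for line in lines:
        line = line.strip()
        if line == "\\1-grams:":
            # Start of the unigram section.
            in_unigrams = True
            continue
        if not in_unigrams or not line:
            continue
        if line.startswith("\\"):
            # Next section marker ("\2-grams:" or "\end\"): unigrams are done.
            break
        fields = line.split()
        # fields[0] is the log10 probability, fields[1] is the word;
        # an optional backoff weight in fields[2] is ignored.
        entries.append((fields[1], float(fields[0])))
    # Highest probability first.
    return heapq.nlargest(n, entries, key=lambda e: e[1])
```

To use it, pass an open file, e.g. top_n_unigrams(open("lm.arpa"), 1000),
where lm.arpa is a placeholder file name. The result will also include
entries such as <s>, </s> and <unk> if they rank high enough, so you may
want to filter those out. The ranking only matches a frequency ranking to
the extent that the smoothing keeps unigram probabilities monotonic in the
counts.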

Andreas
