<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=GB2312" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
On 12/19/2009 4:19 AM, 王秋锋 wrote:
<blockquote cite="mid:200912192019062506438@gmail.com" type="cite">
<div><font face="Verdana" size="2">Hi all,</font></div>
<div>I built the original bigram model from the text with the ngram-count tool,</div>
<div>using "ngram-count -text corpus -lm Original_BiGram -order 2".</div>
<div>The resulting Original_BiGram is very large, so I need to prune it with something like
"ngram -lm Original_BiGram -order 2 -prune... "</div>
<div>But I found that -prune cannot prune the unigrams: the
-minprune n option requires n to be at least 2.</div>
<div>So what can I do to prune the unigrams?</div>
<div>Because every word from the corpus ends up in the unigram list, it is
too large, and some of those words are really useless. <br>
</div>
</blockquote>
Make a list of the words you want to include, then use that list as the
vocabulary of your LM:<br>
<br>
ngram-count -vocab LIST ...<br>
<br>
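One possible way to produce such a word list is to keep only the most
frequent words of the corpus. The following is a minimal sketch, not part of
SRILM: the function names build_vocab and write_vocab, the cutoff max_words
and the UTF-8 file encoding are all assumptions made for illustration, and
it assumes the vocabulary file is read as one word per line.<br>
<pre>
```python
from collections import Counter


def build_vocab(lines, max_words):
    """Return the max_words most frequent whitespace-separated tokens.

    Hypothetical helper: the cutoff is an illustrative choice, not an
    SRILM setting.
    """
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    # most_common sorts by descending count; ties keep first-seen order.
    return [word for word, _ in counts.most_common(max_words)]


def write_vocab(words, path):
    """Write one word per line (assumed vocabulary file layout)."""
    with open(path, "w", encoding="utf-8") as out:
        for word in words:
            out.write(word + "\n")
```
</pre>
The file written by write_vocab would then play the role of LIST in the
ngram-count command above, next to the -text, -order and -lm options from
the original question.<br>
<br>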
Andreas <br>
<br>
</body>
</html>