<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html; charset=GB2312" http-equiv="Content-Type">

</head>

<body bgcolor="#ffffff" text="#000000">

On 12/19/2009 4:19 AM, 王秋锋 wrote:

<blockquote cite="mid:200912192019062506438@gmail.com" type="cite">



  <div><font face="Verdana" size="2">Hi all,</font></div>

  <div>I built the original bigram model from my text with the ngram-count tool,</div>

  <div>like "ngram-count -text&nbsp; corpus&nbsp; -lm Original_BiGram&nbsp; -order 2".</div>

  <div>The resulting Original_BiGram is very large, so I need to prune it, like

"ngram -lm Original_BiGram -order 2 -prune... ".</div>

  <div>But I found that the -prune option cannot prune unigrams; the

value n for -minprune must be at least 2.</div>

  <div>So what can I do to prune the unigrams?</div>

  <div>Because every word in the corpus ends up in the unigram list, it is

too large, and some of the words are really useless. <br>

  </div>

</blockquote>

Make a list of the words you want to INclude, then use that as the

vocabulary of your LM:<br>

<br>

ngram-count -vocab LIST&nbsp;&nbsp; ...<br>

<br>

Andreas <br>

<br>
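One possible way to produce such a word list is a simple frequency cutoff over the training corpus. The sketch below is only an illustration under that assumption: the cutoff criterion, the helper names build_vocab and write_vocab, the min_count parameter, and the file names are hypothetical and do not come from the reply above. It assumes whitespace-tokenized text with one sentence per line.<br>
<br>

```python
from collections import Counter


def build_vocab(lines, min_count=2):
    """Return the sorted list of words that occur at least min_count times.

    `lines` is any iterable of whitespace-tokenized sentences.
    The frequency cutoff is an assumed selection criterion, not part of SRILM.
    """
    counts = Counter(word for line in lines for word in line.split())
    return sorted(word for word, count in counts.items() if count >= min_count)


def write_vocab(words, path, encoding="utf-8"):
    """Write one word per line, the plain format used here for the word list."""
    with open(path, "w", encoding=encoding) as handle:
        for word in words:
            handle.write(word + "\n")


# Hypothetical usage (file names are placeholders):
#   with open("corpus", encoding="utf-8") as corpus:
#       write_vocab(build_vocab(corpus, min_count=2), "LIST")
```

<br>
The resulting file could then be passed as LIST to the ngram-count -vocab command shown above. Raising min_count shrinks the vocabulary further, and the encoding argument should match that of the corpus.<br>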

</body>

</html>