Thanks a lot!
Yang Liu <yangl at ADDRESS HIDDEN> wrote:
Hi June,
After you get the automatically induced classes (the class definition in file
text.cls), you can map all the words in your training set to classes using:
replace-words-with-classes classes=text.cls training_set > training_set_classes
Then you can build a class-based LM of any order from that.
Hope this helps.
-- Yang
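
To illustrate what the mapping step does, here is a minimal Python sketch. It is not the replace-words-with-classes script itself. It assumes each line of the class file has the form "CLASS [probability] word" with a single word per expansion, and the class names, words, and helper functions (load_classes, replace_words, count_ngrams) are made up for the example. Once words are replaced by class labels, n-grams of any order can be counted over the mapped text, which is why the result is no longer limited to bigrams.

```python
from collections import Counter


def load_classes(lines):
    """Build a word -> class lookup from class-definition lines.

    Assumption: each line is 'CLASS [prob] word', one word per expansion.
    """
    word_to_class = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        # First field is the class name, last field is the word;
        # an optional probability may sit in between and is ignored here.
        word_to_class[fields[-1]] = fields[0]
    return word_to_class


def replace_words(sentence, word_to_class):
    """Replace every word that belongs to a class by its class label."""
    return [word_to_class.get(word, word) for word in sentence.split()]


def count_ngrams(tokens, order):
    """Count all n-grams of the given order in a token sequence."""
    return Counter(
        tuple(tokens[i:i + order]) for i in range(len(tokens) - order + 1)
    )


# Hypothetical class definitions and training sentence.
class_lines = [
    "CLASS-1 0.5 cat",
    "CLASS-1 0.5 dog",
    "CLASS-2 1.0 runs",
]
word_to_class = load_classes(class_lines)
mapped = replace_words("the cat runs fast", word_to_class)
trigrams = count_ngrams(mapped, 3)
```

In practice the counting and LM estimation would be done by ngram-count on the mapped training file, using its -order option to request trigrams or higher.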
>Hi,
>
> I tried to build class-based LMs in the following way:
>
> step-1: ngram-class -text test.in -numclasses 100 -class-counts text.cnt -classes text.cls -save 100
>
> step-2: ngram-count -read text.cnt -memuse -kndiscount -kndiscount1 -kndiscount2 -lm text.srilm.gz
>
> I found that the class count output "text.cnt" from step-1 contains only
> bigram counts, so the final class LM text.srilm.gz is also a bigram model.
>
> Could anyone tell me whether I am using the toolkit correctly? How do I build a
> trigram class-based LM? Also, are there any published papers or documents that I
> can look up for detailed information?
>
> Many thanks,
>
>-June