Re: Problems about srilm

From: Andreas Stolcke <stolcke at ADDRESS HIDDEN>
Date: Thu, 19 Apr 2007 13:09:37 -1000

The original poster wrote:
> Hello!
> I am a student from Taiwan.
> I ran into some difficulties while using SRILM. The problem is described
> in the attachment. When I built Google n-gram models, I ran into the
> same problem. Could you please tell me what mistake I made? Thank you!
>  
It is impossible to read the entire Google 5-gram corpus into memory,
which is what you are trying to do.
You have to use the count-based LM and estimate the deleted-interpolation
weights from a small amount of data, so that only a small portion of the
N-grams needs to be kept in memory.
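
To make the idea of deleted interpolation concrete, here is a minimal
Python sketch, not SRILM code. It mixes a uniform distribution with
maximum-likelihood estimates of each N-gram order and fits the mixture
weights by EM on a small held-out text. All function names are
hypothetical, and it assumes one global weight per order, which may differ
from how SRILM's count-based LM parameterizes its weights.

```python
from collections import Counter

def ngram_counts(tokens, order):
    """Count all n-grams of length 1..order in a token list."""
    counts = Counter()
    for n in range(1, order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def ml_probs(counts, total, vocab_size, context, word):
    """Return [uniform, unigram, bigram, ...] estimates of P(word | context)."""
    probs = [1.0 / vocab_size, counts[(word,)] / total]
    for k in range(1, len(context) + 1):
        hist = tuple(context[-k:])
        denom = counts[hist]  # simplification: history count as denominator
        probs.append(counts[hist + (word,)] / denom if denom else 0.0)
    return probs

def estimate_weights(counts, total, vocab_size, heldout, order, iters=20):
    """Fit one interpolation weight per component by EM on held-out tokens."""
    n_comp = order + 1  # uniform component plus orders 1..order
    weights = [1.0 / n_comp] * n_comp
    for _ in range(iters):
        expected = [0.0] * n_comp
        for i in range(order - 1, len(heldout)):
            context = heldout[i - order + 1:i]
            probs = ml_probs(counts, total, vocab_size, context, heldout[i])
            # mix > 0 because the uniform component is always positive
            mix = sum(w * p for w, p in zip(weights, probs))
            for j in range(n_comp):
                expected[j] += weights[j] * probs[j] / mix  # E-step
        norm = sum(expected)
        weights = [e / norm for e in expected]  # M-step
    return weights

def interp_prob(weights, counts, total, vocab_size, context, word):
    """Interpolated probability of word given context."""
    probs = ml_probs(counts, total, vocab_size, context, word)
    return sum(w * p for w, p in zip(weights, probs))

# Toy data; a real setup would look up counts from disk instead.
train = "the cat sat on the mat the cat ate".split()
heldout = "the cat sat on the mat".split()
counts = ngram_counts(train, 2)
weights = estimate_weights(counts, len(train), len(set(train)), heldout, 2)
print(weights)
```

Only the counts for N-grams that occur in the held-out text (and their
histories) are touched while fitting the weights, which is why a small
tuning set keeps the memory footprint small.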

I'm sorry there is no good documentation of this process at this point.
You can piece it together by reading the manual pages for ngram-count
and ngram, and by looking at the example in

$SRILM/test/tests/ngram-count-lm-limit-vocab/run-test

We will make complete instructions for using the Google N-grams
available in the future.

Andreas

> --
> Chaoyang University of Technology
> WebMail http://webmail.cyut.edu.tw

Last modified Nov 21, 2008