
SRILM Manual Pages

Papers and Tutorials

Novice users should consult the following papers and tutorials first, where applicable.
  • A. Stolcke, SRILM - An Extensible Language Modeling Toolkit, in Proc. Intl. Conf. Spoken Language Processing, Denver, Colorado, September 2002.
    Gives an overview of SRILM design and functionality.
  • D. Jurafsky, Language Modeling, Lecture 11 of his course on "Speech Recognition and Synthesis" at Stanford.
    Excellent introduction to the basic concepts in LM.
  • J. Goodman, The State of The Art in Language Modeling, presented at the 6th Conference of the Association for Machine Translation in the Americas (AMTA), Tiburon, CA, October, 2002.
    Tutorial presentation and overview of current LM techniques (with emphasis on machine translation).
  • K. Kirchhoff, J. Bilmes, and K. Duh, Factored Language Models Tutorial, Tech. Report UWEETR-2007-0003, Dept. of EE, U. Washington, June 2007.
    This report serves as both a tutorial and reference manual on FLMs.
  • S. F. Chen and J. Goodman, An Empirical Study of Smoothing Techniques for Language Modeling, Tech. Report TR-10-98, Computer Science Group, Harvard U., Cambridge, MA, August 1998 (original PostScript document).
    Excellent overview and comparative study of smoothing methods. Served as a reference for many of the methods implemented in SRILM.

FAQ

Answers to frequently asked questions and notes on N-gram smoothing implementations.

Programs

These are the top-level executables that are currently part of SRILM:

ngram-count(1)
count N-grams and estimate language models
ngram-merge(1)
merge N-gram counts
ngram(1)
apply N-gram language models
ngram-class(1)
induce word classes from N-gram statistics
disambig(1)
disambiguate text tokens using an N-gram model
hidden-ngram(1)
tag hidden events between words
nbest-lattice(1)
rescore N-best lists and lattices
nbest-optimize(1)
optimize score combination for N-best word error minimization
nbest-mix(1)
interpolate N-best posterior probabilities
segment(1)
segment text using an N-gram language model
segment-nbest(1)
rescore and segment N-best lists using N-gram language models
anti-ngram(1)
count posterior-weighted N-grams in N-best lists
multi-ngram(1)
build multiword N-gram models
lattice-tool(1)
manipulate word lattices
nbest-pron-score(1)
score pronunciations and pauses in N-best hypotheses
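
For readers new to the terminology, the following is a minimal Python sketch of what "counting N-grams" means in general. It is not SRILM code and does not reproduce the behavior or options of ngram-count(1); the function name `count_ngrams` and the toy sentences are hypothetical, and the sketch assumes whitespace tokenization with the conventional `<s>` and `</s>` sentence-boundary tokens.

```python
from collections import Counter

def count_ngrams(sentences, order=3):
    """Count all N-grams of length 1..order (illustrative helper, not SRILM)."""
    counts = Counter()
    for sentence in sentences:
        # Pad each sentence with begin/end markers before extracting N-grams.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for n in range(1, order + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts

# Toy input; each N-gram is stored as a tuple of words.
counts = count_ngrams(["a b a b", "a b c"], order=2)
```

Counts of this kind are the raw statistics from which a language model is estimated; the smoothing step that turns them into probabilities is covered in the Chen and Goodman report listed above.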

Utility Scripts

Additional tools implemented as scripts:

training-scripts(1)
miscellaneous conveniences for language model training
lm-scripts(1)
manipulate N-gram language models
ppl-scripts(1)
manipulate perplexities
pfsg-scripts(1)
create and manipulate finite-state networks
nbest-scripts(1)
rescore and evaluate N-best lists
select-vocab(1)
select a maximum-likelihood vocabulary from a mixture of corpora

File Formats

Some of the data formats used by SRILM:

ngram-format(5)
ARPA backoff N-gram models
classes-format(5)
Word class definitions
pfsg-format(5)
Decipher(TM) probabilistic finite-state grammars
nbest-format(5)
N-best hypotheses lists
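
As a rough illustration of the ARPA backoff layout, the sketch below reads a tiny hand-written model in Python. It assumes the commonly described structure: a `\data\` header with `ngram N=count` lines, one `\N-grams:` section per order, entries of the form log10 probability, words, and an optional log10 backoff weight, and a closing `\end\` marker. The function `parse_arpa` and the sample text are hypothetical and simplified; ngram-format(5) remains the authoritative description.

```python
import re

def parse_arpa(text):
    """Return {order: {ngram_tuple: (log10_prob, log10_backoff)}} (simplified sketch)."""
    model = {}
    order = None
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the header block with the per-order counts.
        if not line or line == "\\data\\" or line.startswith("ngram "):
            continue
        if line == "\\end\\":
            break
        header = re.fullmatch(r"\\(\d+)-grams:", line)
        if header:
            order = int(header.group(1))
            model[order] = {}
            continue
        if order is None:
            continue
        fields = line.split()
        logprob = float(fields[0])
        words = tuple(fields[1:1 + order])
        # A missing backoff weight is treated as log10(1) = 0.
        backoff = float(fields[1 + order]) if len(fields) > 1 + order else 0.0
        model[order][words] = (logprob, backoff)
    return model

# Made-up numbers for illustration only.
SAMPLE = """\\data\\
ngram 1=3
ngram 2=1

\\1-grams:
-0.5\t<s>\t-0.3
-0.4\ta\t-0.2
-0.6\t</s>

\\2-grams:
-0.1\t<s> a

\\end\\
"""

model = parse_arpa(SAMPLE)
```

The sketch ignores details such as unknown-word handling and malformed input, which a real reader of the format would need to address.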

LM Library Classes

These are some of the basic classes of the SRILM library. This list is incomplete, as most of this part of the documentation has yet to be written.

LM(3)
Generic language model
Vocab(3)
Vocabulary indexing for SRILM
Prob(3)
Probabilities for SRILM
File(3)
Wrapper for stdio streams


©2006 SRI International, 333 Ravenswood Avenue, Menlo Park, CA 94025-3493
SRI International is an independent, nonprofit corporation.

Last modified Feb 06, 2008