The LM class specifies a minimal language model interface and
provides some generic utilities.
LM inherits from Debug, and the debugging level of an LM object
determines whether and how much verbose information is printed by
various functions.
CLASS MEMBERS
LM(Vocab &vocab)
Initializing an LM object requires specifying the vocabulary
over which the LM is defined.
The vocab object can be shared among different LM instances.
The LM object can modify vocab as a side-effect, e.g., as a result
of reading an LM from a file.
LogP wordProb(VocabIndex word, const VocabIndex *context)
LogP wordProb(VocabString word, const VocabString *context)
Returns the conditional log probability of word given a history.
The history is given in reversed order (most recent word first) in
context, and terminated by Vocab_None.
The word and the history can be specified either as strings or as indices.
All functional LM subclasses have to implement at least the first version.
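Because the history is passed most-recent-first and terminated by Vocab_None, a caller scoring word i of a sentence has to build that array itself. The sketch below shows one way to do that. The typedefs, the value chosen for Vocab_None, and the helper name makeContext are assumptions for illustration, not the library's own definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the library types; the real definitions may differ.
typedef unsigned VocabIndex;
const VocabIndex Vocab_None = static_cast<VocabIndex>(-1);

// Hypothetical helper: build the context for predicting words[pos], i.e. the
// preceding words in reverse order (most recent first), terminated by Vocab_None.
std::vector<VocabIndex> makeContext(const std::vector<VocabIndex> &words,
                                    std::size_t pos)
{
    assert(pos <= words.size());
    std::vector<VocabIndex> context;
    for (std::size_t i = pos; i > 0; i--) {
        context.push_back(words[i - 1]);   // most recent word first
    }
    context.push_back(Vocab_None);         // terminator expected by wordProb()
    return context;
}
```

A caller would then write something like `lm.wordProb(words[pos], makeContext(words, pos).data())`, where `lm` is any LM instance.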
LogP wordProbRecompute(VocabIndex word, const VocabIndex *context)
Returns the same conditional log probability as wordProb(),
but relies on the caller's promise that context is identical to the
context passed in the last call to wordProb().
This often lets implementations speed up repeated
lookups in the same context.
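To illustrate why the promise about an unchanged context helps, here is a minimal sketch of a hypothetical bigram model (ToyBigramLM is invented for this example). Its wordProb() performs the context lookup and remembers the result, and its wordProbRecompute() reuses that lookup. The log10 convention for LogP and the use of -HUGE_VAL for zero probability are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <map>

typedef unsigned VocabIndex;
typedef double LogP;   // assumed: log10 probabilities
const VocabIndex Vocab_None = static_cast<VocabIndex>(-1);

// Hypothetical bigram model: wordProb() caches the distribution found for the
// context, and wordProbRecompute() skips the context search entirely.
class ToyBigramLM {
public:
    // probs[prev][word] = P(word | prev)
    std::map<VocabIndex, std::map<VocabIndex, double>> probs;
    unsigned contextLookups = 0;   // counts the (assumed costly) context searches

    LogP wordProb(VocabIndex word, const VocabIndex *context) {
        contextLookups++;
        auto it = probs.find(context[0]);   // bigram: only the most recent word is used
        lastDist = (it == probs.end()) ? nullptr : &it->second;
        return lookup(word);
    }

    // Precondition (not checked): context equals the one passed to the last wordProb().
    LogP wordProbRecompute(VocabIndex word, const VocabIndex *) {
        return lookup(word);
    }

private:
    const std::map<VocabIndex, double> *lastDist = nullptr;

    LogP lookup(VocabIndex word) const {
        if (lastDist == nullptr) return -HUGE_VAL;   // unseen context
        auto w = lastDist->find(word);
        return (w == lastDist->end()) ? -HUGE_VAL : std::log10(w->second);
    }
};
```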
Returns the total log probability of a string of words (a sentence).
The data in the stats object is incremented to reflect the
statistics of the sentence.
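The sentence log probability is the sum of the per-word conditional log probabilities, each computed with the full reversed history. The sketch below assumes log10 probabilities, passes the model as a callback, and uses a hypothetical SimpleStats struct in place of the real stats class. It omits sentence-start and end-of-sentence tags and OOV handling, which a real implementation may add.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <vector>

typedef unsigned VocabIndex;
typedef double LogP;   // assumed: log10 probabilities
const VocabIndex Vocab_None = static_cast<VocabIndex>(-1);

// Hypothetical, simplified stand-in for the stats object.
struct SimpleStats {
    unsigned numSentences = 0;
    unsigned numWords = 0;
    unsigned zeroProbs = 0;   // words assigned probability zero
    LogP prob = 0.0;          // accumulated log probability
};

typedef std::function<LogP(VocabIndex, const VocabIndex *)> WordProbFn;

// Sum log P(w | history) over the sentence and update the stats.
LogP sentenceProb(const WordProbFn &wordProb,
                  const std::vector<VocabIndex> &sentence, SimpleStats &stats)
{
    LogP total = 0.0;
    std::vector<VocabIndex> context{Vocab_None};   // empty history
    for (VocabIndex w : sentence) {
        LogP p = wordProb(w, context.data());
        if (std::isinf(p)) {
            stats.zeroProbs++;                 // keep the total finite
        } else {
            total += p;
        }
        context.insert(context.begin(), w);   // w becomes the most recent word
    }
    stats.numSentences++;
    stats.numWords += static_cast<unsigned>(sentence.size());
    stats.prob += total;
    return total;
}
```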
Reads sentences from file, computing their probabilities and
aggregate perplexity, and updating the stats.
The debugging state of the LM object determines how much information is
printed to stderr.
debuglevel 0: total statistics only;
debuglevel 1: per-sentence statistics;
debuglevel 2: word probabilities;
debuglevel 3 and greater: LM specific information.
Lines in file that start with escapeString are copied to
the output.
This allows extra information in the input file to be passed through
unchanged.
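A pplFile()-style loop can be sketched as follows. Lines that begin with escapeString are copied through, all other lines are scored, and perplexity is computed as 10 raised to the negative average log10 probability per token. The function name, the callback, and the token count are assumptions. In particular, this sketch counts only the words on each line, whereas a real implementation may also count an end-of-sentence event per sentence and exclude OOVs and zero-probability words.

```cpp
#include <cassert>
#include <cmath>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical pplFile()-style loop, assuming log10 probabilities.
// logProbOfLine stands in for sentenceProb() applied to one input line.
template <typename ScoreFn>
double perplexityOfStream(std::istream &in, std::ostream &out,
                          const std::string &escapeString, ScoreFn logProbOfLine)
{
    double totalLogProb = 0.0;
    unsigned numTokens = 0;
    std::string line;
    while (std::getline(in, line)) {
        if (!escapeString.empty() &&
            line.compare(0, escapeString.size(), escapeString) == 0) {
            out << line << '\n';              // pass escaped lines through unchanged
            continue;
        }
        std::istringstream words(line);
        std::string w;
        unsigned n = 0;
        while (words >> w) n++;
        if (n == 0) continue;                 // ignore blank lines
        totalLogProb += logProbOfLine(line);
        numTokens += n;
    }
    // Perplexity = 10^(-average log10 probability per token).
    return numTokens ? std::pow(10.0, -totalLogProb / numTokens) : 0.0;
}
```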
Reads N-best hypotheses and scores from file, replaces the
LM scores with new ones computed from the current model, and prints
the new scores (including hypotheses) to stdout.
lmScale and wtScale are the LM and word transition weights,
respectively.
oldLM is the LM whose scores are included in the aggregate scores
read from the input (provided so that they can be subtracted out),
and oldLmScale and oldWtScale are the old LM and word
transition weights, respectively.
Lines in file that start with escapeString are copied to
the output.
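The description implies simple score arithmetic for each hypothesis: subtract the old weighted LM and word-transition contributions from the aggregate score, then add the new ones. The helper below is a hypothetical sketch of that arithmetic only. It assumes the aggregate score is a weighted sum of exactly this form and that the word transition weight is applied once per word; the actual N-best file format and parsing are not shown.

```cpp
#include <cassert>

// Hypothetical helper: replace the old LM contribution in an aggregate
// hypothesis score with a new one. Assumes
//   aggregate = other + oldLmScale * oldLmScore + oldWtScale * numWords.
double rescoreHyp(double aggregateScore, unsigned numWords,
                  double oldLmScore, double oldLmScale, double oldWtScale,
                  double newLmScore, double lmScale, double wtScale)
{
    double otherScores = aggregateScore
                       - oldLmScale * oldLmScore
                       - oldWtScale * numWords;   // e.g. the acoustic score remains
    return otherScores + lmScale * newLmScore + wtScale * numWords;
}
```

For example, an aggregate of -260 built from an acoustic score of -100 and an old LM score of -20 at weight 8 becomes -245 when rescored with a new LM score of -15 at weight 10 and a word transition weight of 1 over 5 words.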
void setState(const char *state)
This is a generic interface to change the internal "state" of an LM.
The default implementation of this function does nothing, but certain
LM subclass implementations may interpret the state string to
assume different internal configurations.
Prob wordProbSum(const VocabIndex *context)
Returns the sum of the probabilities of all words given context.
Useful for checking the well-definedness of a model.
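The well-definedness check amounts to converting each conditional log probability back to a probability and summing over the vocabulary, which should give a value close to 1. This is a minimal sketch assuming log10 probabilities and a callback in place of an LM object. The caller is assumed to pass only words the model predicts; a real implementation would presumably skip non-words such as the sentence-start tag.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <vector>

typedef unsigned VocabIndex;
typedef double LogP;   // assumed: log10 probabilities
typedef double Prob;

// Hypothetical sketch: sum P(w | context) over the given vocabulary.
Prob wordProbSum(const std::function<LogP(VocabIndex, const VocabIndex *)> &wordProb,
                 const std::vector<VocabIndex> &vocabulary,
                 const VocabIndex *context)
{
    Prob total = 0.0;
    for (VocabIndex w : vocabulary) {
        total += std::pow(10.0, wordProb(w, context));   // 10^(-inf) is 0
    }
    return total;
}
```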
Generates a random sentence of length up to maxWords.
The result is placed in sentence if specified, or in a
static buffer otherwise.
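Random generation can be sketched as repeated sampling of the next word from P(w | history) until an end-of-sentence word is drawn or maxWords words have been produced. In this sketch the model is a callback, endTag is a hypothetical parameter standing in for the end-of-sentence token, and the result is returned as a std::vector rather than written into a caller-supplied or static buffer.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

typedef unsigned VocabIndex;
typedef double LogP;   // assumed: log10 probabilities
const VocabIndex Vocab_None = static_cast<VocabIndex>(-1);

// Hypothetical sketch: sample words until endTag is drawn or maxWords is reached.
// The probabilities over vocabulary must not all be zero for any history.
std::vector<VocabIndex> generateSentence(
    const std::function<LogP(VocabIndex, const VocabIndex *)> &wordProb,
    const std::vector<VocabIndex> &vocabulary, VocabIndex endTag,
    unsigned maxWords, std::mt19937 &rng)
{
    std::vector<VocabIndex> sentence;
    std::vector<VocabIndex> context{Vocab_None};   // reversed history
    while (sentence.size() < maxWords) {
        std::vector<double> weights;
        for (VocabIndex w : vocabulary)
            weights.push_back(std::pow(10.0, wordProb(w, context.data())));
        std::discrete_distribution<std::size_t> pick(weights.begin(), weights.end());
        VocabIndex next = vocabulary[pick(rng)];
        if (next == endTag) break;                 // end of sentence
        sentence.push_back(next);
        context.insert(context.begin(), next);     // most recent word first
    }
    return sentence;
}
```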
void *contextID(const VocabIndex *context)
Returns an implementation-dependent value that identifies the
word context used to compute a conditional probability.
(The context actually used may be shorter than what is specified
in context.)
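As an illustration of the "shorter context" remark, consider a hypothetical bigram model (ToyContextLM is invented here): only the most recent word matters, so two long contexts ending in the same word should yield the same ID. This sketch returns the address of the internal table for that word, with nullptr meaning no bigram state was found; what the real classes return is implementation-dependent.

```cpp
#include <cassert>
#include <map>

typedef unsigned VocabIndex;
const VocabIndex Vocab_None = static_cast<VocabIndex>(-1);

// Hypothetical bigram model: the identity of the context actually used is
// the address of the probability table for the most recent word.
class ToyContextLM {
public:
    std::map<VocabIndex, std::map<VocabIndex, double>> probs;

    void *contextID(const VocabIndex *context) {
        if (context[0] == Vocab_None) return nullptr;   // empty history
        auto it = probs.find(context[0]);
        return it == probs.end() ? nullptr : static_cast<void *>(&it->second);
    }
};
```

Callers can compare IDs to tell that two contexts will produce identical predictions, for example to share cached results.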
Boolean isNonWord(VocabIndex word)
Returns true if word is not a regular word in the LM, i.e., not
one that the LM computes probabilities for (for example, a
non-event tag such as sentence-start).
Reads an LM from file.
Returns true if the file contents were formatted correctly and
an internal LM representation could be successfully constructed from them.
The optional second argument controls whether words not already in the
vocabulary are added automatically.
void write(File &file)
Writes the LM to file in a format that can be read back by
read().
Vocab &vocab
The vocabulary object associated with the LM (set at initialization).
VocabIndex noiseIndex
The index of the noise tag, i.e., a word that is skipped when
computing probabilities.
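One way to read "skipped" is that a noise word is neither scored nor added to the history of the words that follow it. The sketch below implements that reading; the interpretation, the function name, and the callback interface are assumptions, not the library's documented behavior.

```cpp
#include <cassert>
#include <functional>
#include <vector>

typedef unsigned VocabIndex;
typedef double LogP;   // assumed: log10 probabilities
const VocabIndex Vocab_None = static_cast<VocabIndex>(-1);

typedef std::function<LogP(VocabIndex, const VocabIndex *)> WordProbFn;

// Hypothetical sketch: words equal to noiseIndex are ignored entirely,
// so they contribute no probability and do not appear in later contexts.
LogP sentenceProbSkippingNoise(const WordProbFn &wordProb,
                               const std::vector<VocabIndex> &sentence,
                               VocabIndex noiseIndex)
{
    LogP total = 0.0;
    std::vector<VocabIndex> context{Vocab_None};   // reversed history
    for (VocabIndex w : sentence) {
        if (w == noiseIndex) continue;             // skip noise tokens
        total += wordProb(w, context.data());
        context.insert(context.begin(), w);
    }
    return total;
}
```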
const char *stateTag
A string introducing "state" information that should be passed to the LM.
Input lines starting with this tag are handed to setState()
by pplFile() and rescoreFile().
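The interplay between stateTag and setState() can be sketched with a hypothetical base class whose setState() is a no-op and a hypothetical subclass that interprets the state string as a topic name. The helper shows the per-line check an input loop might perform. The tag "<LMstate>" used in the usage example, and the choice to pass only the text after the tag, are assumptions.

```cpp
#include <cassert>
#include <string>

// Hypothetical base class: setState() does nothing by default.
class BaseLM {
public:
    virtual ~BaseLM() {}
    virtual void setState(const char *) {}
};

// Hypothetical subclass that interprets the state string as a topic name.
class TopicLM : public BaseLM {
public:
    std::string topic = "general";
    void setState(const char *state) override { topic = state; }
};

// Returns true if line was a state line (and was handed to setState()).
// Assumption: the text after the tag, minus leading spaces, is passed on.
bool handleStateLine(BaseLM &lm, const std::string &line, const std::string &stateTag)
{
    if (stateTag.empty() || line.compare(0, stateTag.size(), stateTag) != 0)
        return false;                                   // ordinary input line
    std::string::size_type start = line.find_first_not_of(' ', stateTag.size());
    lm.setState(start == std::string::npos ? "" : line.c_str() + start);
    return true;
}
```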
Boolean reverseWords
If set to true, the LM reverses word order before computing
sentence probabilities.
This means wordProb() is expected to compute conditional
probabilities based on right contexts.
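Reversing the word order before scoring means that the "history" seen by wordProb() consists of the words that follow each word in the original sentence. The sketch below makes that concrete with a callback in place of an LM object; the function name and the bool parameter standing in for reverseWords are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

typedef unsigned VocabIndex;
typedef double LogP;   // assumed: log10 probabilities
const VocabIndex Vocab_None = static_cast<VocabIndex>(-1);

typedef std::function<LogP(VocabIndex, const VocabIndex *)> WordProbFn;

// Hypothetical sketch: when reverseWords is set, score the reversed sentence,
// so each word is conditioned on its right context in the original order.
LogP sentenceProbMaybeReversed(const WordProbFn &wordProb,
                               std::vector<VocabIndex> sentence, bool reverseWords)
{
    if (reverseWords) std::reverse(sentence.begin(), sentence.end());
    LogP total = 0.0;
    std::vector<VocabIndex> context{Vocab_None};   // reversed history
    for (VocabIndex w : sentence) {
        total += wordProb(w, context.data());
        context.insert(context.begin(), w);
    }
    return total;
}
```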