segment-nbest

NAME

segment-nbest - rescore and segment N-best lists using hidden segment N-gram model

SYNOPSIS

segment-nbest [ -help ] option ... nbest-file-list ...

DESCRIPTION

segment-nbest processes a series of consecutive N-best lists from a speech recognizer and applies a hidden segment N-gram language model to them. The language model is a standard backoff N-gram model in ARPA ngram-format(5) modeling sentence segmentation using the boundary tags <s> and </s>. The program reads in all N-best lists and outputs the hypotheses that have the highest aggregate (combined acoustic and language model) score. Hypothesized sentence boundaries are marked by <s> tags.
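
The search over consecutive N-best lists can be pictured as a dynamic program. The following Python sketch is an illustration under simplifying assumptions, not the program's actual implementation: it picks exactly one hypothesis per list, ignores hidden boundaries inside a hypothesis, and replaces the N-gram model with a toy scoring function. The names viterbi_over_nbest, toy_lm_score and GOOD_BIGRAMS are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# One hypothesis: (acoustic log score, words). Hypothetical toy representation.
Hyp = Tuple[float, Tuple[str, ...]]


def viterbi_over_nbest(
    lists: Sequence[Sequence[Hyp]],
    lm_score: Callable[[Tuple[str, ...], Tuple[str, ...]], float],
    lm_weight: float = 1.0,
) -> Tuple[List[int], float]:
    """Pick one hypothesis index per list so that the summed score is maximal.

    lm_score(prev_words, words) is assumed to return the LM log score of
    `words` given the words of the hypothesis chosen in the preceding list.
    """
    # best[j]: best total score of any path ending in hypothesis j of the current list
    best = [ac + lm_weight * lm_score((), words) for ac, words in lists[0]]
    back: List[List[int]] = []
    for prev, cur in zip(lists, lists[1:]):
        new_best, pointers = [], []
        for ac, words in cur:
            # Quadratic step: every hypothesis is tried against every predecessor.
            cands = [
                best[i] + lm_weight * lm_score(prev[i][1], words)
                for i in range(len(prev))
            ]
            i_best = max(range(len(cands)), key=cands.__getitem__)
            new_best.append(cands[i_best] + ac)
            pointers.append(i_best)
        best = new_best
        back.append(pointers)
    # Trace back from the best final hypothesis.
    j = max(range(len(best)), key=best.__getitem__)
    total = best[j]
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return path, total


# Toy stand-in for the language model: reward two known word pairs only.
GOOD_BIGRAMS = {("<s>", "i"), ("think", "so")}


def toy_lm_score(prev_words: Tuple[str, ...], words: Tuple[str, ...]) -> float:
    left = prev_words[-1] if prev_words else "<s>"
    return 0.0 if (left, words[0]) in GOOD_BIGRAMS else -2.0


nbest_lists = [
    [(-1.0, ("i", "think")), (-1.2, ("i", "thin"))],
    [(-1.0, ("sew",)), (-1.1, ("so",))],
]
path, total = viterbi_over_nbest(nbest_lists, toy_lm_score)
print(path, total)
```

In this toy input the second list's acoustically better hypothesis loses because the language model context carried over from the first list favors the other one, which is the effect of applying the LM across N-best boundaries.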

OPTIONS

Each filename argument can be an ASCII file, a compressed file (name ending in .Z or .gz), or "-" to indicate stdin/stdout.

-help
Print option summary.
-version
Print version information.
-order n
Set the maximal N-gram order to be used, by default 3. NOTE: The order of the model is not set automatically when a model file is read, so the same file can be used at various orders.
-debug level
Set the debugging output level (0 means no debugging output). Debugging messages are sent to stderr.
-lm file
Read the N-gram model from file.
-tolower
Map all vocabulary to lowercase. Useful if case conventions for N-best lists and language model differ.
-mix-lm file
Read a second, standard N-gram model for interpolation purposes.
-lambda weight
Set the weight of the main model when interpolating with -mix-lm. Default value is 0.5.
-bayes length
Interpolate the second and the main model using posterior probabilities for local N-gram-contexts of length length. The -lambda value is used as a prior mixture weight in this case.
-bayes-scale scale
Set the exponential scale factor on the context likelihood in conjunction with the -bayes function. Default value is 1.0.
-nbest-files list
Specifies a list of N-best files. The file list should contain a list of filenames, one per line, each corresponding to an N-best file in one of the formats described in nbest-format(5). The N-best files should correspond to consecutive speech waveforms in the order listed.
-fb-rescore
Perform forward-backward rescoring. This generates new N-best lists as output whose LM scores reflect the posterior probability of each hypothesis. The default is to perform Viterbi rescoring and output only the best combined hypothesis.
-write-nbest-dir dir
Write rescored N-best lists to directory dir instead of to stdout. The filenames from the input are preserved.
-max-nbest n
Limits the number of hypotheses read from each N-best list to the first n.
-max-rescore m
Only choose among the top m hypotheses of each list (after reordering hypotheses, see below). This is an effective way to limit the quadratic computation of the Viterbi or forward/backward dynamic programming.
-no-reorder
Do not reorder the hypotheses before limiting the computation to the top m. By default the hypotheses will first be sorted according to the acoustic and language model scores recorded in the N-best lists.
-rescore-lmw weight
Specifies the language model weight to be used in combining acoustic and language model scores to select the best hypotheses.
-rescore-wtw weight
Specifies the word transition weight to be used in selecting the best hypotheses.
-noise noise-tag
Designate noise-tag as a vocabulary item that is to be ignored by the LM. (This is typically used to identify a noise marker.)
-noise-vocab file
Read several noise tags from file, instead of, or in addition to, the single noise tag specified by -noise.
-decipher-lm model-file
Designates the N-gram backoff model (typically a bigram) that was used by the Decipher(TM) recognizer in computing composite scores. Used to compute acoustic scores from the composite scores if the N-best lists are in "NBestList1.0" format.
-decipher-lmw weight
Specifies the language model weight used by the recognizer. Used to compute acoustic scores from the composite scores.
-decipher-wtw weight
Specifies the word transition weight used by the recognizer. Used to compute acoustic scores from the composite scores.
-stag string
Use string to mark segment boundaries in the output. Default is the start-of-sentence symbol defined in the language model (<s>).
-bias b
Make a segment boundary a priori more likely by a factor of b. If b is 0, the dynamic programming algorithm is restricted to never consider hidden sentence boundaries; this is useful when segment-nbest is used merely for its ability to apply the LM across N-best boundaries.
-start-tag string
Insert a tag string at the front of every N-best hypothesis read in.
-end-tag string
Insert a tag string at the end of every N-best hypothesis read in. This and the previous option are useful if the LM marks acoustic waveform boundaries with a special tag.
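
The interaction of -max-nbest, -max-rescore, -no-reorder, -rescore-lmw and -rescore-wtw can be sketched as follows. This is a minimal Python illustration, not the program's code. The log-linear formula in combined_score is an assumption, since this page does not spell out how the weights are applied, and the names Hyp, combined_score and select_for_rescoring are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hyp:
    acoustic: float   # acoustic log score recorded in the N-best list
    lm: float         # language model log score recorded in the N-best list
    words: List[str]


def combined_score(h: Hyp, lmw: float, wtw: float) -> float:
    # ASSUMPTION: log-linear combination of acoustic score, weighted LM score,
    # and a per-word transition weight.
    return h.acoustic + lmw * h.lm + wtw * len(h.words)


def select_for_rescoring(
    hyps: List[Hyp],
    lmw: float,
    wtw: float,
    max_nbest: Optional[int] = None,
    max_rescore: Optional[int] = None,
    reorder: bool = True,
) -> List[Hyp]:
    """Return the hypotheses that would take part in the dynamic program."""
    if max_nbest is not None:
        hyps = hyps[:max_nbest]          # like -max-nbest: keep the first n as read
    if reorder:                          # reorder=False mimics -no-reorder
        hyps = sorted(hyps, key=lambda h: combined_score(h, lmw, wtw), reverse=True)
    if max_rescore is not None:
        hyps = hyps[:max_rescore]        # like -max-rescore: keep the top m
    return hyps


hyps = [
    Hyp(-100.0, -20.0, ["a", "b"]),
    Hyp(-101.0, -10.0, ["a", "c"]),
    Hyp(-90.0, -40.0, ["d"]),
]
kept = select_for_rescoring(hyps, lmw=1.0, wtw=0.0, max_rescore=2)
kept_unsorted = select_for_rescoring(hyps, lmw=1.0, wtw=0.0, max_rescore=2, reorder=False)
print([h.words for h in kept])
print([h.words for h in kept_unsorted])
```

Because the dynamic program compares every kept hypothesis with every kept hypothesis of the neighboring list, reducing m from 100 to 20 cuts the number of pairs per list boundary from 10000 to 400.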

segment-nbest also treats any command-line arguments following the options as lists of N-best files, as with the -nbest-files option. Each nbest-file-list is processed in turn, with the output for each list delimited by a line of the form

	<nbestfile nbest-file-list>
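
A file list is simply a text file with one N-best filename per line, in waveform order. The Python sketch below writes such a list and assembles a command line from options documented on this page. It only prints the command and does not run it. All file names (segment.lm.gz, utt001.nbest.gz, and so on) and the helper names write_file_list and build_command are hypothetical.

```python
import shlex
import tempfile
from pathlib import Path
from typing import List, Optional, Sequence


def write_file_list(nbest_paths: Sequence[str], list_path: Path) -> Path:
    """Write one N-best filename per line, preserving the given order."""
    list_path.write_text("".join(f"{p}\n" for p in nbest_paths))
    return list_path


def build_command(
    lm_file: str,
    list_path: Path,
    order: int = 3,
    max_rescore: Optional[int] = None,
) -> List[str]:
    """Assemble an argument vector using options described above."""
    argv = ["segment-nbest", "-lm", str(lm_file), "-order", str(order)]
    if max_rescore is not None:
        argv += ["-max-rescore", str(max_rescore)]
    argv += ["-nbest-files", str(list_path)]
    return argv


workdir = Path(tempfile.mkdtemp())
# Hypothetical N-best files for three consecutive waveforms.
nbest_paths = ["utt001.nbest.gz", "utt002.nbest.gz", "utt003.nbest.gz"]
list_path = write_file_list(nbest_paths, workdir / "nbest.list")
argv = build_command("segment.lm.gz", list_path, order=3, max_rescore=20)
print(shlex.join(argv))
```

The resulting list could equally be passed as a trailing nbest-file-list argument instead of through -nbest-files.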

SEE ALSO

ngram-count(1), segment(1), ngram-format(5), nbest-format(5).
A. Stolcke, "Modeling Linguistic Segment and Turn Boundaries for N-best Rescoring of Spontaneous Speech," Proc. Eurospeech, 2779-2782, 1997.

BUGS

N-gram models of arbitrary order can be used, but the context at the beginning of a hypothesis never extends beyond the words from the preceding N-best list.

AUTHOR

Andreas Stolcke <stolcke@speech.sri.com>.
Copyright 1997-2004 SRI International