nbest-pron-score - score pronunciations and pauses in N-best hypotheses


nbest-pron-score [ -help ] option ...


nbest-pron-score reads N-best lists and computes log probability scores for the pronunciations and pauses contained in them. Pronunciation scoring requires that the N-best lists contain phone backtraces, which means they must be in the "NBestList2.0" format; see nbest-format(5).

Pronunciation scores are computed from the probabilities in a dictionary. Pauses are binned into three length classes (none, short, long) and scored according to a trigram language model that conditions the pause length on the left and right neighboring words, in that order (so that bigram backoff uses the left neighbor only).
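
The backoff order described above can be illustrated with a small Python sketch. It is not taken from the nbest-pron-score sources: the function name and the example words are hypothetical, and the backoff weights that a real backoff LM applies are left out. It only shows which neighbors remain in the conditioning context at each level.

```python
def pause_contexts(left_word, right_word):
    """Return the conditioning contexts for a pause, most specific first.

    Illustrative only: a real backoff LM also applies backoff weights
    when it falls back to a shorter context.
    """
    return [
        (left_word, right_word),  # trigram level: both neighbors
        (left_word,),             # bigram backoff: left neighbor only
        (),                       # unigram level: no context
    ]


# Hypothetical neighbors of a pause
contexts = pause_contexts("hello", "world")
```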


Each filename argument can be an ASCII file, or a compressed file (name ending in .Z or .gz), or ``-'' to indicate stdin/stdout.

-help
Print option summary.
-version
Print version information.
-debug level
Controls the amount of output (the higher the level, the more).
-tolower
Map all vocabulary to lowercase. Useful if case conventions for text/counts and language model differ.
-multiwords
Deal with N-best lists containing multiwords joined by underscores. This only affects pause scoring: if a word adjacent to a pause is a multiword and is not in the vocabulary of the pause LM, then it is split and only the component closest to the pause is conditioned on.
-multi-char C
Character used to delimit component words in multiwords (an underscore character by default).
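
As a rough illustration of the multiword handling in pause scoring, the following Python sketch picks the component closest to the pause. It is an assumption-laden sketch, not the tool's code: the function name, the "left"/"right" convention, and the example vocabulary are all hypothetical.

```python
def pause_neighbor(word, side, vocab, multi_char="_"):
    """Pick the word to condition on for a neighbor of a pause.

    side is "left" if the word precedes the pause and "right" if it
    follows the pause.
    """
    # In-vocabulary words and plain (non-multi) words are used as they are.
    if word in vocab or multi_char not in word:
        return word
    parts = word.split(multi_char)
    # The component closest to the pause is the last one for a left
    # neighbor and the first one for a right neighbor.
    return parts[-1] if side == "left" else parts[0]


# Hypothetical pause LM vocabulary without the multiwords themselves
vocab = {"going", "to", "you", "know"}
left = pause_neighbor("going_to", "left", vocab)
right = pause_neighbor("you_know", "right", vocab)
```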
-nbest file
Score the N-best hypotheses in file.
-rescore file
Same as -nbest.
-nbest-files file
Process all N-best list filenames listed in file.
-max-nbest n
Limits the number of hypotheses read from an N-best list. Only the first n hypotheses are processed.
-dictionary file
Enable pronunciation scoring, using the pronunciation dictionary file. Each line contains a pronunciation in the format
	word [p] phone ...
The optional value p is the pronunciation probability. If the second field in a line is not a number the pronunciation is assumed to have probability one.
-intlogs
Interpret probabilities in the dictionary as intlog-scaled log probabilities (as used in the SRI Decipher(TM) system), rather than straight probabilities.
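
The dictionary line format given for -dictionary could be parsed roughly as in the Python sketch below. This is a sketch under stated assumptions rather than the tool's parser: the function name and the example entries are hypothetical, fields are assumed to be separated by whitespace, the line is assumed to be non-empty, and probabilities are treated as straight probabilities (intlog-scaled values are not converted).

```python
def parse_dictionary_line(line):
    """Parse 'word [p] phone ...' into (word, probability, phones).

    Assumes a non-empty line with whitespace-separated fields.
    """
    fields = line.split()
    word, rest = fields[0], fields[1:]
    prob = 1.0  # used when the second field is not a number
    if rest:
        try:
            # Note: float() also accepts strings such as "inf" and "nan",
            # so a stricter numeric check may be needed for real phone sets.
            prob = float(rest[0])
            rest = rest[1:]
        except ValueError:
            pass  # second field is a phone, so the probability stays 1.0
    return word, prob, rest


# Hypothetical dictionary entries
with_prob = parse_dictionary_line("tomato 0.6 t ah m ey t ow")
without_prob = parse_dictionary_line("tomato t ah m aa t ow")
```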
-pause-lm file
Enable pause scoring, using the pause LM in file.
-no-pause tag
The word used to represent the absence of a pause in the pause LM.
-short-pause tag
The word used to represent a short pause in the pause LM.
-long-pause tag
The word used to represent a long pause in the pause LM.
-min-pause-dur T
The minimum duration, in seconds, for a non-speech region to be considered a (short) pause.
-long-pause-dur T
The duration, in seconds, above which a non-speech region is considered a "long" pause.

The default values for pause tags and duration thresholds are printed by the -help option.
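
The three-way binning controlled by -min-pause-dur and -long-pause-dur can be sketched in Python as follows. The function name, the returned class labels, and the threshold values in the example are hypothetical; in particular, the thresholds shown are not the tool's defaults, and the treatment of durations exactly equal to a threshold is an assumption read off the option descriptions.

```python
def pause_class(duration, min_pause_dur, long_pause_dur):
    """Bin a non-speech duration (in seconds) into none/short/long."""
    if duration < min_pause_dur:
        return "none"   # too short to count as a pause
    if duration > long_pause_dur:
        return "long"   # above the long-pause threshold
    return "short"      # at least the minimum, not above the long threshold


# Illustrative thresholds only, not the defaults printed by -help
classes = [pause_class(d, 0.1, 0.5) for d in (0.03, 0.2, 0.8)]
```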

-pron-score-dir dir
Write pronunciation scores to dir when processing multiple N-best lists, using output filenames derived from the input files.
-pause-score-dir dir
Write pause scores to dir when processing multiple N-best lists, using output filenames derived from the input files.
-pause-score-weight W
Add pause LM scores to the pronunciation scores after multiplying them by W. This creates a single weighted combination of both models. Pause scores can still be output separately by specifying -pause-score-dir.
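
The weighted combination described for -pause-score-weight amounts to the following arithmetic, shown here as a minimal Python sketch. The function name and the score values are hypothetical; only the formula (pronunciation score plus W times pause score) comes from the option description.

```python
def combined_score(pron_score, pause_score, pause_score_weight):
    """Add the weighted pause LM score to the pronunciation score."""
    return pron_score + pause_score_weight * pause_score


# Hypothetical log probability scores for one hypothesis, with W = 0.5
total = combined_score(-12.5, -3.0, 0.5)
```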


nbest-format(5), nbest-scripts(1), nbest-optimize(1), ngram(1).
D. Vergyri, A. Stolcke, V. R. R. Gadde, L. Ferrer, & E. Shriberg, ``Prosodic Knowledge Sources for Automatic Speech Recognition''. Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, Hong Kong, April 2003.


The binning of pause lengths into three classes should be generalized.


Andreas Stolcke <stolcke@speech.sri.com>.
Copyright 2002-2008 SRI International