nbest-pron-score
NAME
nbest-pron-score - score pronunciations and pauses in N-best hypotheses
SYNOPSIS
nbest-pron-score [ -help ] option ...
DESCRIPTION
nbest-pron-score
reads N-best lists and computes log probability scores for the pronunciations
and pauses contained in them.
Pronunciation scoring requires that the N-best lists
contain phone backtraces in the "NBestList2.0" format described in
nbest-format(5).
Pronunciation scores are computed from the probabilities in a dictionary.
Pauses are binned into three length classes (none, short, long) and
scored according to a trigram language model that conditions the pause length
on the left and right neighboring words, in that order (so that bigram
backoff uses the left neighbor only).
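The pause binning and the conditioning order described above can be
illustrated with a short Python sketch. It is not taken from the
nbest-pron-score source: the tag strings, the threshold values, and the
function names are hypothetical (the actual defaults are printed by -help),
and the handling of durations exactly equal to a threshold is an assumption.
How the (left, right) context maps onto the word order in the pause LM file
is not shown here.

```python
# Illustrative sketch only; not taken from the nbest-pron-score source.
# The tag strings and threshold values are hypothetical placeholders.
NO_PAUSE = "<nopause>"
SHORT_PAUSE = "<shortpause>"
LONG_PAUSE = "<longpause>"


def pause_class(duration, min_pause_dur=0.1, long_pause_dur=0.5):
    """Bin a non-speech duration (in seconds) into one of three classes."""
    if duration < min_pause_dur:
        return NO_PAUSE      # too short to count as a pause
    if duration > long_pause_dur:
        return LONG_PAUSE    # above the long-pause threshold
    return SHORT_PAUSE       # boundary handling here is an assumption


def pause_contexts(left_word, right_word):
    """Return the trigram context and the bigram backoff context.

    The context is ordered (left, right); backing off drops the right
    neighbor, so the bigram conditions on the left neighbor only.
    """
    full = (left_word, right_word)
    return full, full[:1]


full, backoff = pause_contexts("hello", "world")
print(pause_class(0.3), full, backoff)
```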
OPTIONS
Each filename argument can be an ASCII file, or a
compressed file (name ending in .Z or .gz), or "-" to indicate
stdin/stdout.
- -help
-
Print option summary.
- -version
-
Print version information.
- -debug level
-
Controls the amount of output (the higher the
level,
the more).
- -tolower
-
Map all vocabulary to lowercase.
Useful if case conventions for text/counts and language model differ.
- -multiwords
-
Deal with N-best lists containing multiwords joined by underscores.
This only affects pause scoring: if a word adjacent to a pause is
a multiword and is not in the vocabulary of the pause LM, then it is split
and only the component closest to the pause is conditioned on.
- -multi-char C
-
Character used to delimit component words in multiwords
(an underscore character by default).
- -nbest file
-
Score the N-best hypotheses in
file.
- -rescore file
-
Same as
-nbest.
- -nbest-files file
-
Process all N-best list filenames listed in
file.
- -max-nbest n
-
Limits the number of hypotheses read from an N-best list.
Only the first
n
hypotheses are processed.
- -dictionary file
-
Enable pronunciation scoring, using the pronunciation dictionary
file.
Each line contains a pronunciation in the format
word [p] phone ...
The optional value
p
is the pronunciation probability.
If the second field in a line is not a number the pronunciation is assumed
to have probability one.
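As a rough illustration of this format, the following Python sketch parses
one dictionary line. The function name and the returned tuple layout are
hypothetical and do not reflect the tool's own code. The sketch assumes
plain probabilities (that is, no -intlogs) and treats any second field that
Python can parse as a float as the probability.

```python
# Illustrative sketch of reading the "word [p] phone ..." format.
# The function name and return layout are hypothetical.
def parse_dictionary_line(line):
    """Parse one line into (word, probability, list of phones)."""
    fields = line.split()
    if not fields:
        return None  # blank line: nothing to parse
    word, rest = fields[0], fields[1:]
    prob = 1.0  # default when the second field is not a number
    if rest:
        try:
            prob = float(rest[0])
            rest = rest[1:]  # the number was the optional probability
        except ValueError:
            pass  # second field is a phone, so keep probability one
    return word, prob, rest


print(parse_dictionary_line("hello 0.7 hh ah l ow"))
print(parse_dictionary_line("hello hh ah l ow"))
```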
- -intlogs
-
Interpret probabilities in the dictionary as intlog-scaled log probabilities
(as used in the SRI Decipher(TM) system), rather than straight probabilities.
- -pause-lm file
-
Enable pause scoring, using the pause LM in
file.
- -no-pause tag
-
The word used to represent the absence of a pause in the pause LM.
- -short-pause tag
-
The word used to represent a short pause in the pause LM.
- -long-pause tag
-
The word used to represent a long pause in the pause LM.
- -min-pause-dur T
-
The minimum duration, in seconds, for a non-speech region to be considered
a (short) pause.
- -long-pause-dur T
-
The duration, in seconds, above which a non-speech region is considered a
"long" pause.
The default values for pause tags and duration thresholds are printed by the
-help
option.
- -pron-score-dir dir
-
Write pronunciation scores to
dir
when processing multiple N-best lists,
using output filenames derived from the input files.
- -pause-score-dir dir
-
Write pause scores to
dir
when processing multiple N-best lists,
using output filenames derived from the input files.
- -pause-score-weight W
-
Add pause LM scores to the pronunciation scores after multiplying them
by
W.
This creates a single weighted combination of both models.
Pause scores can still be output separately by specifying
-pause-score-dir.
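The weighted combination amounts to simple arithmetic on log scores, as the
following Python sketch shows. The function and variable names are
hypothetical, and the sketch assumes both scores are log probabilities on
the same scale.

```python
# Illustrative sketch of the weighted combination; names are hypothetical.
def combine_scores(pron_logprob, pause_logprob, weight):
    """Add the pause LM score, scaled by weight, to the pronunciation score."""
    return pron_logprob + weight * pause_logprob


# One combined score per hypothesis, with W = 0.5 chosen arbitrarily.
pron_scores = [-2.0, -3.5]
pause_scores = [-4.0, -1.0]
combined = [combine_scores(p, q, 0.5) for p, q in zip(pron_scores, pause_scores)]
print(combined)
```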
SEE ALSO
nbest-format(5), nbest-scripts(1), nbest-optimize(1), ngram(1).
D. Vergyri, A. Stolcke, V. R. R. Gadde, L. Ferrer, & E. Shriberg,
"Prosodic Knowledge Sources for Automatic Speech Recognition".
Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing,
Hong Kong, April 2003.
BUGS
The binning of pause lengths into three classes should be generalized.
AUTHOR
Andreas Stolcke <stolcke@speech.sri.com>.
Copyright 2002-2008 SRI International