nbest-scripts

NAME

nbest-scripts, combine-rover-controls, compare-sclite, compute-sclite, fix-ctm, merge-nbest, nbest-error, nbest-posteriors, nbest-rover, nbest-vocab, nbest2-to-nbest1, rescore-acoustic, rescore-decipher, rescore-minimize-wer, rescore-reweight, sentid-to-ctm, sentid-to-sclite - rescore and evaluate N-best lists

SYNOPSIS

rescore-decipher [ -bytelog ] [ -nodecipherlm ] [ -multiwords ] \
	[ -multi-char C ] [ -pretty mapfile ] \
	[ -ngram-tool program ] [ -filter command ] \
	[ -norescore ] [ -lm-only ] [ -count-oovs ] [ -limit-vocab ] \
	[ -vocab-aliases mapfile ] [ -fast ] \
	nbest-file-list score-dir -lm ... lm-options ...
rescore-acoustic old-nbest-dir|old-file-list old-ac-weight \
	new-score-dir1 new-ac-weight1 ... new-nbest-dir [ max-nbest ]
rescore-reweight [ -multiwords ] [ -multi-char C ] score-dir|file-list \
	[ lmw [ wtw [ score-dir1 score-weight1 ... ] [ max-nbest ]]]
rescore-minimize-wer score-dir [ lmw [ wtw [ max-nbest ]]]
nbest2-to-nbest1 [ nbest-file ]
nbest-rover [ sentid-list | - ] control-file \
	[ posterior-file [ nbest-lattice-options ] ]
combine-rover-controls [ lambda=weights ] rover-control [ ... ]
nbest-posteriors [ weight=W ] [ lmw=lmw ] [ wtw=wtw ] [ postscale=S ] \
	[ max_nbest=M ] nbest-file
merge-nbest [ multiwords=1 ] [ multichar=C ] [ nopauses=1 ] \
	[ max_nbest=M ] nbest-file ...
nbest-vocab [ nbest-list ... ]
nbest-error score-dir|file-list refs [ nbest-lattice-option ... ]
sentid-to-sclite hyps
sentid-to-ctm hyps
fix-ctm ctmfile
compute-sclite -r refs -h hyps [ -h hyps ... ] [ -S subset ... ] \
	[ -multiwords|-M ] [ -noperiods ] [ -R ] [ -g glmfile ] [ -H ] \
	[ -v ] [ sclite-options ...]
compare-sclite -r refs -h1 hyps1 -h2 hyps2 [ -S subset ] \
	[ compute-sclite-options ... ]

DESCRIPTION

These scripts perform common tasks on N-best hypotheses in nbest-format(5), especially those needed for rescoring and extracting and evaluating 1-best hypotheses.

rescore-decipher applies a language model implemented by ngram(1) to the N-best lists listed in nbest-file-list. The N-best files may be in compressed format. The rescored N-best lists are stored in directory score-dir. All following arguments are passed to ngram(1) and are used to control the language model. The following options are handled by rescore-decipher itself:

-bytelog
causes scores to be output on the bytelog scale (see nbest-format(5)).
-nodecipherlm
indicates that the recognizer language model is not being provided (with -decipher-lm). (This is only possible if the N-best lists are not in ``NBestList1.0'' format.)
-multiwords
specifies that N-best lists contain words joined by underscores, which are to be split into their components prior to rescoring.
-multi-char C
defines a multiword separator character. The default is underscore ``_''.
-pretty mapfile
specifies a word mapping file that allows individual words to be globally replaced by strings of zero or more other words, e.g., to remove vocabulary mismatches between the input N-best lists and the rescoring LM. The mapfile contains one mapping per line, the first field specifying the word to be replaced and subsequent fields forming the replacement string.
-ngram-tool program
specifies a non-standard program to perform the actual LM evaluation (by default, ngram(1) is used). Such a program must understand ngram's command-line options related to N-best rescoring.
-filter command
specifies a command that is used to filter the N-best hypotheses prior to evaluating the language model. This may be used for more general textual rewriting so that non-standard LMs can be applied. The output N-best lists will contain the filtered hypotheses.
-norescore
causes N-best lists to be simply reformatted from one of the Decipher formats into the SRILM N-best format, separating acoustic and LM scores, without replacing the existing LM scores. In this case only the ngram(1) options -decipher-lmw and -decipher-wtw are relevant, and others are ignored. -norescore and -filter may be used together to perform textual rewriting of N-best lists.
-lm-only
dumps out LM scores only, instead of complete N-best lists.
-count-oovs
writes the count of out-of-vocabulary and zero-probability words to the output score files (instead of rescored N-best lists).
-limit-vocab
saves memory by arranging for ngram(1) to load only those N-gram parameters that are relevant to the vocabulary of the N-best lists to be rescored. After determining the N-best vocabulary the -limit-vocab option is passed to ngram(1).
-vocab-aliases mapfile
declares that certain words are to be treated as alternative spellings of the same word for LM evaluation; see the same option for ngram(1). The map is filtered of unused words when used in conjunction with -limit-vocab, and then passed on to ngram(1).
-fast
performs rescoring using only functions built into ngram(1). This avoids some computational and I/O overhead and therefore runs faster, but the options -filter, -pretty, and -lm-only are not supported, and -nodecipherlm is obligatory.
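
The word-level rewriting performed by -multiwords, -multi-char and -pretty can be illustrated with a short Python sketch. It is not the code used by rescore-decipher; the function names are hypothetical, and the order in which multiwords are split and the mapping is applied is an assumption made for the example.

```python
def load_pretty_map(lines):
    """Parse mapfile lines: first field is the word to replace,
    the remaining fields (possibly none) are its replacement."""
    mapping = {}
    for line in lines:
        fields = line.split()
        if fields:
            mapping[fields[0]] = fields[1:]  # empty list: the word is deleted
    return mapping


def rewrite(words, mapping, multi_char="_", split_multiwords=False):
    """Optionally split multiwords, then apply the word mapping.
    (Assumed order of operations, for illustration only.)"""
    out = []
    for word in words:
        if split_multiwords:
            parts = [p for p in word.split(multi_char) if p]
        else:
            parts = [word]
        for part in parts:
            out.extend(mapping.get(part, [part]))
    return out


# Hypothetical mapfile contents: one mapping per line.
pretty = load_pretty_map(["gonna going to", "%noise"])
print(" ".join(rewrite(["i_am", "gonna", "%noise", "go"], pretty,
                       split_multiwords=True)))
```

A mapping line with a single field, such as ``%noise'' above, replaces the word by the empty string, i.e., deletes it.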

rescore-acoustic replaces the acoustic scores in a set of N-best lists by a weighted combination of the old and new scores. The old N-best lists are given by either a directory old-nbest-dir or a file list old-file-list; old-ac-weight is the weight given to the old scores. Directories containing the new scores are listed alternating with the corresponding weights; each score directory must contain one file per waveform segment, with the same file basenames as the original N-best lists. The new scores should appear in a single column per file, one per line. The N-best lists containing the new combined acoustic scores are written to new-nbest-dir. The optional max-nbest argument can be used to limit the length of the N-best lists output. When a new score file contains fewer than max-nbest lines, the missing scores are set to the lowest score encountered so far.
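
The score combination can be sketched in Python as follows. This is an illustration under stated assumptions, not the script's implementation: the function name is hypothetical, scores are taken as already parsed into lists of numbers, and ``lowest score encountered so far'' is interpreted as the lowest score seen earlier in the same score file.

```python
def combine_acoustic(old_scores, old_weight, new_score_sets, max_nbest=None):
    """Weighted sum of old and new acoustic scores, one value per hypothesis.

    new_score_sets is a list of (scores, weight) pairs, one per new score
    directory.  Missing scores are padded with the lowest score seen so far
    in that score list (an assumed reading of the man page).
    """
    n = len(old_scores) if max_nbest is None else min(len(old_scores), max_nbest)
    combined = [old_weight * s for s in old_scores[:n]]
    for scores, weight in new_score_sets:
        if not scores:
            raise ValueError("a score file must contain at least one score")
        lowest = None
        for i in range(n):
            if i < len(scores):
                score = scores[i]
                lowest = score if lowest is None else min(lowest, score)
            else:
                score = lowest  # pad missing entries
            combined[i] += weight * score
    return combined


old = [-100.0, -110.0, -120.0]
new = [([-90.0, -95.0], 1.0)]  # one new score set, shorter than the N-best list
print(combine_acoustic(old, 0.5, new))
```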

rescore-reweight combines the scores in N-best lists with a set of weights and outputs the 1-best hypotheses. The N-best files are found in directory score-dir or listed in file-list. Optional arguments set the language model weight lmw (default 8), the word transition weight wtw (default 0), and the maximum number max-nbest of hypotheses to consider (default all). Optionally, any number of additional score directories and associated weights score-dir1 score-weight1 score-dir2 score-weight2 ... can be specified, following the wtw parameter. These additional scores are combined with those contained in the N-best lists themselves as in rescore-acoustic (using unit weight for the original acoustic scores). The -multiwords and -multi-char options have the same function as for rescore-decipher. The output format for 1-best hypotheses is

	sentid w1 w2 ...
where sentid is the sentence ID derived from the N-best filename, followed by the words.
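
As a rough Python sketch of the 1-best selection: the code below assumes N-best lines in the SRILM format of nbest-format(5), taken here to be ``ascore lscore nwords w1 w2 ...'', and assumes the usual combined score ascore + lmw * lscore + wtw * nwords. The function names and the filename suffixes stripped to form the sentence ID are hypothetical.

```python
import os


def total_score(ac, lm, nwords, lmw=8.0, wtw=0.0):
    """Combined log score (assumed form of the weighted combination)."""
    return ac + lmw * lm + wtw * nwords


def one_best(nbest_lines, lmw=8.0, wtw=0.0, max_nbest=None):
    """Return the words of the highest-scoring hypothesis."""
    best_score, best_words = None, []
    for line in nbest_lines[:max_nbest]:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        ac, lm, nwords = float(fields[0]), float(fields[1]), int(fields[2])
        score = total_score(ac, lm, nwords, lmw, wtw)
        if best_score is None or score > best_score:
            best_score, best_words = score, fields[3:]
    return best_words


def sentid_from_filename(path):
    """Derive a sentence ID from an N-best filename (assumed suffixes)."""
    name = os.path.basename(path)
    for suffix in (".gz", ".Z", ".score"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    return name


nbest = [
    "-1000 -20 3 hello big world",
    "-990 -22 3 hello bad world",
    "-1005 -19.5 2 hello world",
]
sentid = sentid_from_filename("nbest/sw2001_A_0001.score.gz")
print(sentid, " ".join(one_best(nbest)))
```

With the default lmw of 8 the first hypothesis wins; with lmw set to 0 only the acoustic scores count and the second hypothesis is chosen.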

rescore-minimize-wer is similar to rescore-reweight but picks hypotheses using the word error minimization algorithm of nbest-lattice(1).

nbest2-to-nbest1 converts an N-best list in ``NBestList2.0'' format to ``NBestList1.0'', for the benefit of programs that have not yet been updated to deal with the new format.

nbest-rover combines hypotheses from multiple N-best lists at the word level, by performing the same kind of word error minimization as nbest-lattice(1), in a generalization of the ROVER algorithm. sentid-list is a file listing sentence IDs. These must match the filenames in a set of N-best directories, which are specified in a control-file. The format for the latter is

	dir1 lmw1 wtw1 w1 [n1 [s1]]
	dir2 lmw2 wtw2 w2 [n2 [s2]]
	...
Each line specifies an N-best directory, the language model and word transition weights to be used in score combination, and a weight to be applied to the posterior probabilities. An optional next-to-last parameter for each N-best list allows the lists to be truncated to the top n1, n2, etc., hypotheses. The final optional parameter sets the posterior distribution scaling factor, which defaults to the language model weight. Optionally, control-file can also contain lines of the form

	dir w +
These indicate that additional score files can be found in directory dir and that the scores found therein should be added to the following N-best list set with weight w. Several lines of this form may occur preceding a regular N-best directory specification; the corresponding additive combination of multiple scores is performed.
If ``-'' is specified for sentid-list, the sentence IDs are inferred from the contents of the first directory dir1 specified in control-file. If posterior-file is specified on the command line, posterior word probability estimates are written to that file. Any additional arguments are passed as options to the underlying nbest-lattice(1) invocation.
nbest-rover can process N-best lists in any of the formats described in nbest-format(5), as long as all N-best lists for a given utterance are in the same format. When Decipher formats are used, only their acoustic scores are used.
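
The control-file layout described above can be read with a few lines of Python. This is a minimal sketch covering only the two line forms documented here; the function name and the dictionary keys are hypothetical, and nbest-rover itself may accept variations not handled by this code.

```python
def parse_rover_control(lines):
    """Parse nbest-rover control-file lines into a list of system entries."""
    systems = []
    pending = []  # "dir w +" entries waiting for the next N-best directory
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if len(fields) == 3 and fields[2] == "+":
            # Additional score directory, added to the following N-best set.
            pending.append((fields[0], float(fields[1])))
            continue
        lmw = float(fields[1])
        systems.append({
            "dir": fields[0],
            "lmw": lmw,
            "wtw": float(fields[2]),
            "weight": float(fields[3]),
            # Optional truncation to the top n hypotheses.
            "max_nbest": int(fields[4]) if len(fields) > 4 else None,
            # Posterior scale defaults to the language model weight.
            "postscale": float(fields[5]) if len(fields) > 5 else lmw,
            "extra_scores": pending,
        })
        pending = []
    return systems


control = [
    "sys1/nbest 8 0 1",
    "sys2/extra 0.5 +",
    "sys2/nbest 10 0 0.7 100 12",
]
for entry in parse_rover_control(control):
    print(entry)
```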

combine-rover-controls takes one or more nbest-rover control files as arguments and outputs a new control file that specifies the combination of the input files. Directory names in the input files are adjusted to reflect the relative location of the input files. By default each input system is given equal weight; the optional lambda= argument may be used to specify a space-separated list of system weights instead.

nbest-posteriors rescales the scores in an N-best list to reflect (weighted) posterior probabilities. The output is the same N-best list with acoustic scores set to the log (base 10) of the posterior hypothesis probabilities and LM scores set to zero. postscale=S attenuates the posterior distribution by dividing combined log scores by S (the default is S=lmw). If weight=W is specified the posteriors are multiplied by W. max_nbest=M limits the number of hypotheses used to the top M. This script is used mostly as a helper in nbest-rover.
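
The posterior computation can be sketched as follows. The sketch assumes that input scores are base-10 log scores and that the combined score of a hypothesis is ascore + lmw * lscore + wtw * nwords; the function name is hypothetical and the code is not taken from the script.

```python
import math


def nbest_posteriors(hyps, lmw, wtw, postscale=None, weight=1.0, max_nbest=None):
    """Return log10 of the weighted posterior for each hypothesis.

    hyps is a list of (ascore, lscore, nwords) tuples with log10 scores.
    weight must be positive.
    """
    scale = lmw if postscale is None else postscale  # default S = lmw
    hyps = hyps[:max_nbest]
    combined = [(ac + lmw * lm + wtw * n) / scale for ac, lm, n in hyps]
    top = max(combined)  # subtract the maximum for numerical stability
    norm = sum(10.0 ** (c - top) for c in combined)
    return [math.log10(weight) + (c - top) - math.log10(norm) for c in combined]


hyps = [(-1000.0, -20.0, 3), (-990.0, -22.0, 3), (-1005.0, -19.5, 2)]
for log_post in nbest_posteriors(hyps, lmw=8.0, wtw=0.0):
    print(log_post)
```

Without a weight, the posteriors (10 raised to the returned values) sum to one; with weight=W they sum to W.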

merge-nbest merges hypotheses from one or more N-best lists into a single list, collapsing hypotheses that occur in more than one input list. If all input lists use the same nbest-format(5) then the output will also be in that format and contain the information from the first list in which a hypothesis was encountered. Otherwise, the output will be in SRI Decipher(TM) NBestList1.0 format and contain acoustic scores and word strings only. The max_nbest=M option limits input to the first M hypotheses from each input list. multiwords=1 merges hypotheses that are identical after resolving multiwords, with multichar=C defining a non-default multiword separator character. nopauses=1 merges hypotheses that are identical after removal of pause words.
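
The merging logic can be illustrated with a small Python sketch. It assumes hypotheses already parsed into (scores, words) pairs and uses ``-pau-'' as the pause word, which is an assumption not stated on this page; the function name is hypothetical.

```python
def merge_nbest(lists, max_nbest=None, multiwords=False, multichar="_",
                nopauses=False, pause="-pau-"):
    """Merge N-best lists, keeping the first occurrence of each hypothesis."""

    def key(words):
        # Normalized word sequence used to decide whether two hypotheses match.
        out = []
        for word in words:
            if multiwords:
                out.extend(p for p in word.split(multichar) if p)
            else:
                out.append(word)
        if nopauses:
            out = [w for w in out if w != pause]
        return tuple(out)

    merged = {}  # dicts preserve insertion order (Python 3.7+)
    for hyps in lists:
        for scores, words in hyps[:max_nbest]:
            merged.setdefault(key(words), (scores, words))
    return list(merged.values())


list1 = [((-100.0, -10.0), ["new_york", "city"])]
list2 = [((-98.0, -11.0), ["new", "york", "-pau-", "city"]),
         ((-105.0, -9.0), ["york", "city"])]
for scores, words in merge_nbest([list1, list2], multiwords=True, nopauses=True):
    print(scores, " ".join(words))
```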

nbest-vocab outputs the vocabulary used in a set of N-best lists. (The N-best files cannot be compressed, but may be concatenated and supplied via stdin.)

nbest-error computes the overall oracle word error rate of a set of N-best lists in directory score-dir or listed in file-list. The reference answers are given in refs in the format output by rescore-reweight (see above). Additional arguments are passed to the underlying invocation of nbest-lattice(1), and can be used to limit the depth of the N-best list, compute lattice error rather than N-best error, etc.
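
The oracle word error rate is the error rate obtained by always picking, for each utterance, the hypothesis closest to the reference. The Python sketch below illustrates the idea with a plain word-level edit distance; it does not reproduce the alignment done by nbest-lattice(1), and the function names and data layout are hypothetical.

```python
def word_errors(ref, hyp):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # match or substitution
        prev = cur
    return prev[-1]


def oracle_wer(ref_lines, nbest):
    """ref_lines use the "sentid w1 w2 ..." format; nbest maps each sentid
    to a list of hypotheses, each a list of words."""
    errors = words = 0
    for line in ref_lines:
        sentid, *ref = line.split()
        errors += min(word_errors(ref, hyp) for hyp in nbest[sentid])
        words += len(ref)
    return errors / words


refs = ["utt1 the cat sat", "utt2 hello world"]
nbest = {
    "utt1": [["the", "cat", "sat", "down"], ["a", "cat", "sat"]],
    "utt2": [["hello", "word"], ["hello", "world"]],
}
print(oracle_wer(refs, nbest))
```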

sentid-to-sclite converts 1-best hypotheses and references in the format used here to the ``trn'' format expected by the NIST sclite(1) scoring software.
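
In its simplest form the conversion moves the sentence ID to the end of the line, in parentheses, as in sclite's ``trn'' format. The Python sketch below shows only that step; the actual script also makes assumptions about the internal structure of sentence IDs, which are not reproduced here, and the function name is hypothetical.

```python
def sentid_to_trn(line):
    """Convert "sentid w1 w2 ..." to "w1 w2 ... (sentid)"."""
    sentid, *words = line.split()
    return " ".join(words + [f"({sentid})"])


print(sentid_to_trn("sw2001_A_0001 hello big world"))
```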

sentid-to-ctm converts 1-best hypotheses and references in the format used here to NIST ctm(5) format. The script relies on an encoding of conversation IDs, channel, and utterance time marks in the sentence IDs and may need adjustment to local conventions.

fix-ctm converts output produced by the -output-ctm option of nbest-lattice(1) and lattice-tool(1) to a format suitable for scoring with NIST sclite(1). It, too, relies on information encoded in the sentence IDs and may need adjustments.

compute-sclite is a wrapper around the NIST sclite(1) scoring tool. refs and hyps are the reference and hypothesized transcripts, respectively. The refs file can be either in "sentid" format or in stm(5) format. In the latter case, hyps will be converted to ctm(5) format using the sentid-to-ctm helper script. The hyps file can be either in "sentid" format or in ctm(5) format. More than one -h option can be given to combine the contents of multiple hypotheses files. The remaining options are as follows:

-S subset
specifies a sorted list subset of sentence IDs to score. Multiple -S options may be given, to form the intersection of several subsets.
-multiwords or -M
splits ``multiwords'' joined by underscores into their component words prior to scoring.
-noperiods
deletes periods from the hypotheses prior to scoring (typically used to bridge different conventions for spelled letters).
-R
preserves reject words in the hypotheses for scoring (as appropriate if references also contain rejects).
-g glmfile
enables filtering of references and hypotheses by the NIST csrfilt.sh script, controlled by the filter file glmfile (this is only possible with an stm reference file).
-H
when used with -g, causes hesitations (as defined by the filter) to be deleted from the output for scoring purposes.
-v
displays the complete command used to invoke sclite.

Any additional options are passed to sclite, e.g., to control its output actions or alignment mode.

compare-sclite scores two sets of hypotheses hyps1 and hyps2 for the same test set and computes in how many cases the first or second set had lower word error. The remaining options are as for compute-sclite. The script ignores hypotheses for sentences that do not appear in both hypothesis files, to ensure comparable scoring results.
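
The comparison step can be sketched in Python as below, assuming per-sentence word error counts have already been obtained from scoring. The function name and input layout are hypothetical; the script itself works from sclite output.

```python
def compare_errors(errors1, errors2):
    """Count sentences where the first or second system has fewer word errors.

    errors1 and errors2 map sentence IDs to word error counts.  Sentences
    not present in both inputs are ignored.
    """
    common = errors1.keys() & errors2.keys()
    first = sum(1 for s in common if errors1[s] < errors2[s])
    second = sum(1 for s in common if errors2[s] < errors1[s])
    ties = len(common) - first - second
    return first, second, ties


sys1 = {"a": 1, "b": 0, "c": 2, "d": 5}
sys2 = {"a": 2, "b": 0, "c": 1, "e": 0}
print(compare_errors(sys1, sys2))
```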

SEE ALSO

nbest-format(5), ngram(1), nbest-lattice(1), nbest-optimize(1), sclite(1), stm(5), ctm(5).
J.G. Fiscus, "A Post-Processing System to Yield Reduced Word Error Rates: Recognizer Output Voting Error Reduction (ROVER)", Proc. IEEE Automatic Speech Recognition and Understanding Workshop, Santa Barbara, CA, 347-352, 1997.
A. Stolcke et al., "The SRI March 2000 Hub-5 Conversational Speech Transcription System", Proc. NIST Speech Transcription Workshop, College Park, MD, 2000.

BUGS

sentid-to-sclite has some assumptions about the structure of sentence IDs built in and may need to be modified for compute-sclite and compare-sclite to work.

rescore-decipher -pretty may not work correctly with the -limit-vocab option if the word mapping adds to the vocabulary subset used in the N-best lists.

AUTHOR

Andreas Stolcke <stolcke@speech.sri.com>.
Copyright 1995-2006 SRI International