pfsg-scripts

NAME

pfsg-scripts, add-classes-to-pfsg, add-pauses-to-pfsg, classes-to-fsm, fsm-to-pfsg, htklat-vocab, make-nbest-pfsg, make-ngram-pfsg, pfsg-from-ngram, pfsg-to-dot, pfsg-to-fsm, pfsg-vocab, wlat-stats, wlat-to-dot, wlat-to-pfsg - create and manipulate finite-state networks

SYNOPSIS

make-ngram-pfsg [ maxorder=N ] [ check_bows=0|1 ] [ no_empty_bo=1 ] \
	[ version=1 ] [ top_level_name=name ] [ null=string ] \
	[ lm-file ] > pfsg-file
add-pauses-to-pfsg [ vocab=file ] [ pauselast=1 ] [ wordwrap=0 ] \
	[ pause=pauseword ] [ version=1 ] [ top_level_name=name ] \
	[ null=string ] [ pfsg-file ] > new-pfsg-file
add-classes-to-pfsg classes=classes [ null=string ] \
	[ pfsg-file ] > new-pfsg-file
pfsg-from-ngram [ lm-file ] > pfsg-file
make-nbest-pfsg [ notree=0|1 ] [ scale=S ] [ amw=A ] [ lmw=L ] \
	[ wtw=W ] [ nbest-file ] > pfsg-file
pfsg-vocab [ pfsg-file ... ]
htklat-vocab [ quotes=1 ] [ htk-lattice-file ... ]
pfsg-to-dot [ show_probs=0|1 ] [ show_logs=0|1 ] [ show_nums=0|1 ] \
	[ pfsg-file ] > dot-file
pfsg-to-fsm [ symbolfile=symbols ] [ symbolic=0|1 ] \
	[ scale=S ] [ final_output=E ] [ pfsg-file ] > fsm-file
fsm-to-pfsg [ pfsg_name=name ] [ transducer=0|1 ] [ scale=S ] \
	[ map_epsilon=E ] [ fsm-file ] > pfsg-file
classes-to-fsm vocab=vocab [ symbolic=0|1 ] [ isymbolfile=isymbols ] \
	[ osymbolfile=osymbols ] [ classes ] > fsm-file
wlat-to-pfsg [ wlat-file ] > pfsg-file
wlat-to-dot [ show_probs=0|1 ] [ show_nums=0|1 ] \
	[ wlat-file ] > dot-file
wlat-stats [ wlat-file ]

DESCRIPTION

These scripts create and manipulate various forms of finite-state networks. Note that they take options with the gawk(1) syntax option=value instead of the more common -option value.

Also, since these tools are implemented as scripts, they do not automatically read or write compressed model files, unlike the main SRILM tools. However, since most scripts read from standard input or write to standard output (by leaving out the file argument, or specifying it as ``-''), it is easy to combine them with gunzip(1) or gzip(1) on the command line.
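For example, a gzip-compressed model file (here hypothetically named ``lm.gz'') can be converted without creating an uncompressed copy:

	gunzip -c lm.gz | \
	make-ngram-pfsg | \
	gzip -c >lm.pfsg.gz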

make-ngram-pfsg encodes a backoff N-gram model in ngram-format(5) as a finite-state network in pfsg-format(5). maxorder=N limits the N-gram length used in the PFSG construction to N; the default is to use all N-grams occurring in the input model. check_bows=1 enables a check for conditional probabilities that are smaller than the corresponding backoff probabilities; such transitions should first be removed from the model with ngram -prune-lowprobs. no_empty_bo=1 prevents empty paths through the PFSG resulting from transitions through the unigram backoff node.
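For example (file names are illustrative), to remove problematic low-probability transitions and build a PFSG restricted to bigrams:

	ngram -lm LM.bo -prune-lowprobs -write-lm - | \
	make-ngram-pfsg maxorder=2 >LM.pfsg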

add-pauses-to-pfsg replaces the word nodes in an input PFSG with sub-PFSGs that allow an optional pause before each word. It also inserts an optional pause following the last word in the sentence. A typical usage is

	make-ngram-pfsg ngram | \
	add-pauses-to-pfsg >final-pfsg
The result is a PFSG suitable for use in a speech recognizer. The option pauselast=1 switches the order of words and pause nodes in the sub-PFSGs; wordwrap=0 disables the insertion of sub-PFSGs altogether.

The options pause=pauseword and top_level_name=name allow changing the default names of the pause word and the top-level grammar, respectively. version=1 inserts a version line at the top of the output, as required by the Nuance recognition system (see NUANCE COMPATIBILITY below). add-pauses-to-pfsg uses a heuristic to distinguish word nodes in the input PFSG from other nodes (NULL or sub-PFSGs). The option vocab=file lets one specify a vocabulary of word names to override this heuristic.

add-classes-to-pfsg extends an input PFSG with expansions for word classes, defined in classes. pfsg-file should contain a PFSG generated from the N-gram portion of a class N-gram model. A typical usage is thus

	make-ngram-pfsg class-ngram | \
	add-classes-to-pfsg classes=classes | \
	add-pauses-to-pfsg >final-pfsg

pfsg-from-ngram is a wrapper script that combines removal of low-probability N-grams, conversion to PFSG, and adding of optional pauses to create a PFSG for recognition.
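For example, assuming the model is in a (hypothetical) file ``LM.bo'', a single command suffices:

	pfsg-from-ngram LM.bo >final-pfsg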

make-nbest-pfsg converts an N-best list in nbest-format(5) into a PFSG which, when used in recognition, allows exactly the hypotheses contained in the N-best list. notree=1 creates separate PFSG nodes for all word instances; the default is to construct a prefix-tree structured PFSG. scale=S multiplies the total hypothesis scores by S; the default is 0, meaning that all hypotheses have identical probability in the PFSG. Three options, amw=A, lmw=L, and wtw=W, control the score weighting in N-best lists that contain separate acoustic and language model scores, setting the acoustic model weight to A, the language model weight to L, and the word transition weight to W.
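For example (file names and weight values are illustrative), to build a PFSG that weights hypotheses by their combined scores:

	make-nbest-pfsg scale=1 lmw=8 wtw=0 \
		hyp.nbest >hyp.pfsg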

pfsg-vocab extracts the vocabulary used in one or more PFSGs. htklat-vocab does the same for lattices in HTK standard lattice format. The quotes=1 option enables processing of HTK quotes.
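For example, to extract the vocabulary of a (hypothetical) PFSG file:

	pfsg-vocab LM.pfsg >LM.vocab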

pfsg-to-dot renders a PFSG in dot(1) format for subsequent layout, printing, etc. show_probs=1 includes transition probabilities in the output. show_logs=1 includes log (base 10) transition probabilities in the output. show_nums=1 includes node numbers in the output.
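For example, to render a (hypothetical) PFSG as PostScript using dot(1):

	pfsg-to-dot show_probs=1 LM.pfsg | \
	dot -Tps >LM.ps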

pfsg-to-fsm converts a finite-state network in pfsg-format(5) into an equivalent network in AT&T fsm(5) format. This involves moving output actions from nodes to transitions. If symbolfile=symbols is specified, the mapping from words to FSM output symbols is written to symbols for later use with the -i or -o options of the fsm(1) tools. symbolic=1 preserves the word strings in the resulting FSA. scale=S scales the transition weights by a factor S; the default is -1 (to conform to the default FSM semiring). final_output=E forces the final FSA node to have output label E; this also forces creation of a unique final FSA node, which is otherwise unnecessary if the final node has a null output.

fsm-to-pfsg conversely transforms fsm(5) format into pfsg-format(5). This involves moving output actions from transitions to nodes, and generally requires an increase in the number of nodes. (The conversion is done such that pfsg-to-fsm and fsm-to-pfsg are exact inverses of each other.) pfsg_name=name sets the name field of the output PFSG. transducer=1 indicates that the input is a transducer and that input:output pairs should be preserved in the PFSG. scale=S scales the transition weights by a factor S; the default is -1 (to conform to the default FSM semiring). map_epsilon=E specifies a string E that FSM epsilon symbols are to be mapped to.
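For example (file names are illustrative), a PFSG can be converted to FSM format and back:

	pfsg-to-fsm symbolfile=LM.syms LM.pfsg >LM.fsm
	fsm-to-pfsg pfsg_name=LM LM.fsm >new-LM.pfsg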

classes-to-fsm converts a classes-format(5) file into a transducer in fsm(5) format, such that composing the transducer with an FSA encoding a class language model results in an FSA for the word language model. The word vocabulary needs to be given in file vocab. isymbolfile=isymbols and osymbolfile=osymbols allow saving the input and output symbol tables of the transducer for later use. symbolic=1 preserves the word strings in the resulting FSA.

The following commands show the creation of an FSA encoding the class N-gram grammar ``test.bo'' with vocabulary ``test.vocab'' and class expansions ``test.classes'':

	classes-to-fsm vocab=test.vocab symbolic=1 \
        	isymbolfile=CLASSES.inputs \
		osymbolfile=CLASSES.outputs \
		test.classes >CLASSES.fsm

	make-ngram-pfsg test.bo | \
	pfsg-to-fsm symbolic=1 >test.fsm
	fsmcompile -i CLASSES.inputs test.fsm  >test.fsmc

	fsmcompile -t -i CLASSES.inputs -o CLASSES.outputs \
		CLASSES.fsm >CLASSES.fsmc
	fsmcompose test.fsmc CLASSES.fsmc >result.fsmc

wlat-to-pfsg converts a word posterior lattice or mesh ("sausage") in wlat-format(5) into pfsg-format(5).

wlat-to-dot renders a wlat-format(5) word lattice in dot(1) format for subsequent layout, printing, etc. show_probs=1 includes node posterior probabilities in the output. show_nums=1 includes node indices in the output.
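For example, to render a (hypothetical) word lattice as PostScript using dot(1):

	wlat-to-dot show_probs=1 hyp.wlat | \
	dot -Tps >hyp.ps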

wlat-stats computes statistics of word posterior lattices, including the number of word hypotheses, the entropy (log base 10) of the sentence hypothesis set represented, and the posterior expected number of words. For word meshes that have been aligned with references, the 1-best and oracle lattice error rates are also computed.

NUANCE COMPATIBILITY

The Nuance recognizer (as of version 6.2) understands a variant of the PFSG format; hence the scripts above should be useful in building recognition systems for that recognizer.

A suitable PFSG can be generated from an N-gram backoff model in ARPA ngram-format(5) using the following command:

	ngram -debug 1 -order N -lm LM.bo -prune-lowprobs -write-lm - | \
	make-ngram-pfsg | \
	add-pauses-to-pfsg version=1 pauselast=1 pause=_pau_ \
		top_level_name=.TOP_LEVEL >LM.pfsg
assuming the pause word in the dictionary is ``_pau_''. Certain restrictions on the naming of words (e.g., no hyphens are allowed) have to be respected.

The resulting PFSG can then be referenced in a Nuance grammar file, e.g.,

	.TOP [NGRAM_PFSG]
	NGRAM_PFSG:lm LM.pfsg

In newer Nuance versions the name for a non-emitting node was changed to NULNOD, and inter-word optional pauses are automatically added to the grammar. This means that the PFSG should be created using

	ngram -debug 1 -order N -lm LM.bo -prune-lowprobs -write-lm - | \
	make-ngram-pfsg version=1 top_level_name=.TOP_LEVEL \
		null=NULNOD >LM.pfsg
The null=NULNOD option should also be passed to add-classes-to-pfsg.

Starting with version 8, Nuance supports N-gram LMs. However, you can still use SRILM to create LMs, as described above. The syntax for inclusion of a PFSG has changed to

	NGRAM_PFSG:slm LM.pfsg

Caveat: Compatibility with Nuance is purely a historical circumstance and is not supported.

SEE ALSO

lattice-tool(1), ngram(1), ngram-format(5), pfsg-format(5), wlat-format(5), nbest-format(5), classes-format(5), fsm(5), dot(1).

BUGS

make-ngram-pfsg should be reimplemented in C++ for speed and some size optimizations that require more global operations on the PFSG.

AUTHOR

Andreas Stolcke <stolcke@speech.sri.com>.
Copyright 1995-2005 SRI International