pfsg-format

NAME

pfsg-format - File format for Decipher(TM) probabilistic finite-state grammars

SYNOPSIS

name name
nodes N w1 ... wN
initial i
final f
transitions T
n1 n2 p
...
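
The layout above is simple enough to generate directly. Below is a minimal Python sketch of a writer, under a few assumptions: the function name write_pfsg and its arguments are hypothetical, one statement goes on each line, and costs are already integers. Because the format has no quoting mechanism, the sketch rejects names and words that contain whitespace.

```python
from typing import List, Tuple

def write_pfsg(name: str, words: List[str], initial: int, final: int,
               transitions: List[Tuple[int, int, int]]) -> str:
    """Serialize a PFSG in the layout shown above (hypothetical helper)."""
    for tok in [name, *words]:
        # Tokens are whitespace-separated; there is no way to quote them.
        if not tok or any(c.isspace() for c in tok):
            raise ValueError(f"invalid token {tok!r}")
    lines = [
        f"name {name}",
        " ".join(["nodes", str(len(words)), *words]),  # words in node order 0..N-1
        f"initial {initial}",
        f"final {final}",
        f"transitions {len(transitions)}",
    ]
    # One line per arc: originating node, target node, integer cost.
    lines += [f"{n1} {n2} {p}" for n1, n2, p in transitions]
    return "\n".join(lines) + "\n"
```

For example, write_pfsg("TOP", ["NULL", "hello", "NULL"], 0, 2, [(0, 1, 0), (1, 2, 0)]) describes a grammar that emits the single word "hello" between two non-emitting nodes.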

DESCRIPTION

Probabilistic finite-state grammars (PFSGs) are a form of finite-state automaton or transducer used by the SRI Decipher(TM) recognizer. PFSGs emit words (outputs) at the nodes, not on the arcs. Certain types of language models manipulated by SRILM can be translated into PFSGs for direct use in the recognizer.

Since it is usually fairly easy to convert between different finite-state network representations, PFSGs can serve as an intermediate format for the generation of other finite-state formats. For example, PFSGs can be converted to the AT&T fsm(5) format.
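
The central step in such a conversion is moving each node's output onto the arcs that enter the node. The Python sketch below shows only that structural step and does not reproduce the exact fsm(5) file syntax. Its names (nodes_to_arcs, Arc) are hypothetical, and it assumes that a NULL node becomes an epsilon label, represented here as None. A fresh start state is added so that the word emitted by the initial node also ends up on an arc. The final node is unchanged.

```python
from typing import List, Optional, Tuple

# Arc in the arc-labeled result: (source, target, label, cost); None = epsilon.
Arc = Tuple[int, int, Optional[str], int]

def nodes_to_arcs(words: List[str], initial: int,
                  transitions: List[Tuple[int, int, int]]) -> Tuple[int, List[Arc]]:
    """Relabel a node-emitting PFSG as an arc-labeled graph (hypothetical helper)."""
    def label(node: int) -> Optional[str]:
        w = words[node]
        return None if w == "NULL" else w  # non-emitting node -> epsilon

    start = len(words)  # new start state, numbered after the existing nodes
    arcs: List[Arc] = [(start, initial, label(initial), 0)]
    for n1, n2, p in transitions:
        # The arc now carries the word of the node it enters, and keeps its cost.
        arcs.append((n1, n2, label(n2), p))
    return start, arcs
```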

Each PFSG is given a name. The name matters when PFSGs are composed, in which case it identifies the category that the PFSG expands.

The nodes line gives the number of nodes in the state graph, followed by the word string associated with each node, in node order. If a node represents a category expanded by another PFSG, the name of that PFSG is given in place of a word. The token NULL is special and marks the corresponding node as non-emitting. By convention, words are written in lowercase, and categories and PFSG names in uppercase (avoiding "NULL", of course).
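
The upper/lowercase distinction is only a convention, so a program that reads a nodes line can more reliably tell categories from words by checking each string against the names of the PFSGs it has loaded. The Python sketch below sorts nodes into three kinds on that basis. The function name classify_nodes and the idea of a known set of PFSG names are illustrative assumptions.

```python
from typing import Dict, List, Set

def classify_nodes(words: List[str], pfsg_names: Set[str]) -> Dict[str, List[int]]:
    """Group node indices by kind (hypothetical helper)."""
    kinds: Dict[str, List[int]] = {"null": [], "category": [], "word": []}
    for i, w in enumerate(words):
        if w == "NULL":
            kinds["null"].append(i)      # non-emitting node
        elif w in pfsg_names:
            kinds["category"].append(i)  # expanded by the PFSG of that name
        else:
            kinds["word"].append(i)      # ordinary emitting node
    return kinds
```

For instance, with the nodes NULL i want CITY NULL and a loaded PFSG named CITY, node 3 is reported as a category and nodes 0 and 4 as non-emitting.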

The initial and final lines specify the start and end states of the grammar, respectively. Nodes are numbered starting at zero.

The transitions line gives the number of arcs (transitions) between states. It is followed by that many lines, each specifying one transition by its originating state n1, its target state n2, and the transition cost p. The transition cost is usually interpreted as 10000.5 times the natural logarithm of a probability (approximately the logarithm to base 1.0001), so probabilities should be normalized and then scaled accordingly.
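
The Python sketch below reads one PFSG and converts between probabilities and costs using the 10000.5 scale. Since the natural logarithm of a probability is never positive, a transition with probability 1 has cost 0 and less likely transitions have negative costs. The sketch makes several assumptions. The names (parse_pfsg, prob_to_cost, cost_to_prob, Pfsg) are hypothetical. Costs are assumed to be written as integers. The input is assumed to contain a single grammar, tokenized on whitespace, which the lack of embedded whitespace in words makes safe.

```python
import math
from typing import Iterator, List, NamedTuple, Tuple

SCALE = 10000.5  # multiplier on ln(p), as described above

def prob_to_cost(p: float) -> int:
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return round(SCALE * math.log(p))  # always <= 0

def cost_to_prob(cost: int) -> float:
    return math.exp(cost / SCALE)

class Pfsg(NamedTuple):
    name: str
    words: List[str]
    initial: int
    final: int
    transitions: List[Tuple[int, int, int]]

def parse_pfsg(text: str) -> Pfsg:
    """Parse a single PFSG from whitespace-separated tokens (hypothetical helper)."""
    tokens: Iterator[str] = iter(text.split())

    def take() -> str:
        try:
            return next(tokens)
        except StopIteration:
            raise ValueError("truncated PFSG") from None

    def expect(keyword: str) -> None:
        tok = take()
        if tok != keyword:
            raise ValueError(f"expected {keyword!r}, got {tok!r}")

    expect("name")
    name = take()
    expect("nodes")
    n = int(take())
    words = [take() for _ in range(n)]  # word of node 0, 1, ..., n-1
    expect("initial")
    initial = int(take())
    expect("final")
    final = int(take())
    if not (0 <= initial < n and 0 <= final < n):
        raise ValueError("initial or final node out of range")
    expect("transitions")
    transitions = []
    for _ in range(int(take())):
        n1, n2, p = int(take()), int(take()), int(take())
        if not (0 <= n1 < n and 0 <= n2 < n):
            raise ValueError(f"transition {n1} -> {n2} refers to a missing node")
        transitions.append((n1, n2, p))
    return Pfsg(name, words, initial, final, transitions)
```

With this scaling, prob_to_cost(0.5) gives -6932, and cost_to_prob recovers a value close to 0.5 from it.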

BUGS

File formats are a matter of taste ...
There is no way to specify words with embedded whitespace.
