Probability computation

To account for the hidden DF events potentially occurring between any two words, a forward computation is carried out to find the probability of a sentence prefix $P(w_1 \ldots w_k)$. Conditional word probabilities are then computed as

$$P(w_k \mid w_1 \ldots w_{k-1}) = \frac{P(w_1 \ldots w_k)}{P(w_1 \ldots w_{k-1})}$$
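The ratio of successive prefix probabilities can be sketched in a few lines of Python. This is only an illustration: the function name `conditional_word_probs` and the convention that the list starts with 1.0 for the empty prefix are assumptions made here, not part of the original model description.

```python
from typing import List, Sequence


def conditional_word_probs(prefix_probs: Sequence[float]) -> List[float]:
    """Turn prefix probabilities into conditional word probabilities.

    prefix_probs[k] is assumed to hold P(w_1 ... w_k), with
    prefix_probs[0] = 1.0 standing for the empty prefix.
    """
    conds: List[float] = []
    for k in range(1, len(prefix_probs)):
        prev = prefix_probs[k - 1]
        if prev <= 0.0:
            # A zero-probability prefix leaves the conditional undefined.
            raise ValueError("prefix probability must be positive")
        conds.append(prefix_probs[k] / prev)
    return conds
```

In practice the prefix probabilities shrink quickly with sentence length, so an implementation would likely work with log probabilities and subtract instead of divide.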

If the underlying N-gram model is a trigram, it is sufficient to keep eight states for each word position, distinguished by the DF prior to $w_k$: NODF (none), FP (filled pause), SDEL, DEL1, DEL2, REP1, REP2, or the second position after a REP2 event. To illustrate, the partial computation involving just the NODF and REP1 states is shown here.

eqnarray68

where $\delta(w_{k-1}, w_k) = 1$ if $w_{k-1} = w_k$, and 0 otherwise. Trigram probabilities are denoted by $p(\cdot \mid \cdot)$; these are obtained through the usual backoff procedure [5]. The total prefix probability is then computed as

$$P(w_1 \ldots w_k) = \sum_X P_X(w_1 \ldots w_k)$$

where X ranges over the hidden states representing the disfluency types (including NODF).
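The bookkeeping behind this sum can be sketched as a generic forward recursion over the eight states. The sketch below is an assumption-laden illustration, not the model's actual update equations: the `step` callback is a hypothetical interface standing in for the DF and trigram terms, the state name `REP2_SECOND` is invented here for the "second position after a REP2 event", starting with all mass in NODF is an assumption, and `toy_step` uses made-up numbers purely to exercise the code.

```python
from enum import Enum
from typing import Callable, Dict, List, Sequence


class DF(Enum):
    """The eight per-position states named in the text."""
    NODF = "NODF"
    FP = "FP"
    SDEL = "SDEL"
    DEL1 = "DEL1"
    DEL2 = "DEL2"
    REP1 = "REP1"
    REP2 = "REP2"
    REP2_SECOND = "REP2_SECOND"  # hypothetical name: second position after REP2


# Hypothetical interface: step(prev, cur, words, k) returns the probability
# mass for moving from state `prev` at position k-1 to state `cur` at
# position k while emitting words[k].
StepFn = Callable[[DF, DF, Sequence[str], int], float]


def prefix_probabilities(words: Sequence[str], step: StepFn) -> List[float]:
    """Return the total prefix probability after each word."""
    # Assumption: before any word is read, all mass sits in NODF.
    alpha: Dict[DF, float] = {s: 0.0 for s in DF}
    alpha[DF.NODF] = 1.0
    totals: List[float] = []
    for k in range(len(words)):
        # Per-state prefix probability P_X(w_1 ... w_k) for each state X.
        new_alpha = {
            cur: sum(alpha[prev] * step(prev, cur, words, k) for prev in DF)
            for cur in DF
        }
        # Total prefix probability: sum over all hidden states X.
        totals.append(sum(new_alpha.values()))
        alpha = new_alpha
    return totals


def toy_step(prev: DF, cur: DF, words: Sequence[str], k: int) -> float:
    """Toy numbers for illustration only; not estimated from any data."""
    if cur is DF.NODF:
        return 0.5
    # REP1 is only possible when the current word repeats the previous one.
    if cur is DF.REP1 and k > 0 and words[k] == words[k - 1]:
        return 0.1
    return 0.0
```

With the toy callback, `prefix_probabilities(["a", "a"], toy_step)` gives some mass to the REP1 state at the second word, whereas `["a", "b"]` does not, which mirrors the role of the word-equality indicator in the REP1 term.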



Andreas Stolcke
Fri Jun 28 19:31:43 PDT 1996