
Estimation

The backoff N-gram probabilities in the model are estimated from N-gram counts, including counts of the DF events. We used standard Good-Turing discounting in the backoff for both baseline and DF trigram models. For experiments reported here involving hidden DF events, we used a subset of the Switchboard corpus that was hand-annotated for disfluencies as well as for linguistic segments. In the absence of hand-annotated training data, an iterative reestimation (EM) algorithm could be used to estimate the N-gram probabilities for hidden DF events.
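
As a rough illustration of the discounting step, the sketch below computes Good-Turing adjusted counts r* = (r + 1) N_{r+1} / N_r from a table of N-gram counts, where N_r is the number of distinct N-grams seen exactly r times. The function name, the `max_count` cutoff, and the decision to skip counts with an empty N_r or N_{r+1} are assumptions made for this example; the text above does not specify those details of the setup that was used.

```python
from collections import Counter


def good_turing_adjusted_counts(ngram_counts, max_count=5):
    """Return {r: r*} with r* = (r + 1) * N_{r+1} / N_r for 1 <= r <= max_count.

    ngram_counts maps each N-gram to its raw count.
    max_count is a hypothetical cutoff: larger counts are left undiscounted.
    """
    # N_r: number of distinct N-grams that occur exactly r times.
    count_of_counts = Counter(ngram_counts.values())
    adjusted = {}
    for r in range(1, max_count + 1):
        n_r = count_of_counts.get(r, 0)
        n_r_plus_1 = count_of_counts.get(r + 1, 0)
        # Skip counts for which the estimate is undefined or would be zero.
        if n_r > 0 and n_r_plus_1 > 0:
            adjusted[r] = (r + 1) * n_r_plus_1 / n_r
    return adjusted


# Toy counts: four singletons, two N-grams seen twice, one seen three times.
toy_counts = {"a": 1, "b": 1, "c": 1, "d": 1, "e": 2, "f": 2, "g": 3}
toy_adjusted = good_turing_adjusted_counts(toy_counts)
```

The probability mass removed by replacing r with r* is what a backoff model redistributes to unseen N-grams through the lower-order distributions.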

When counting N-grams for the DF model, the same context modifications used in the DF cleanup operations must be performed on the training data. For example, the word sequence

     <s> SHE UH GOT REAL LUCKY
is counted as having the following trigrams:
     <s> SHE UH          <s> SHE GOT
     SHE GOT REAL        GOT REAL LUCKY
Note that the trigrams
     SHE UH GOT          UH GOT REAL
which would be generated for a standard trigram LM are not generated for the DF model.
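
The counting rule in this example can be sketched as follows. The sketch covers only the filled-pause case shown above: the pause is counted as an event in the cleaned-up context but is not added to that context. The set `FILLED_PAUSES` and both function names are assumptions made for illustration, and other DF types with their own context modifications are not handled.

```python
FILLED_PAUSES = {"UH", "UM"}  # assumed inventory, for illustration only


def df_trigrams(tokens, filled_pauses=FILLED_PAUSES):
    """Trigrams for the DF model: filled pauses are predicted, then dropped from the context."""
    context = []   # cleaned-up history, without filled pauses
    trigrams = []
    for token in tokens:
        if len(context) >= 2:
            trigrams.append((context[-2], context[-1], token))
        # A filled pause is counted as an event but does not enter the context.
        if token not in filled_pauses:
            context.append(token)
    return trigrams


def standard_trigrams(tokens):
    """Trigrams for a standard LM: every window of three adjacent tokens."""
    return [tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)]


sentence = "<s> SHE UH GOT REAL LUCKY".split()
df_result = df_trigrams(sentence)
standard_result = standard_trigrams(sentence)
```

For the example sentence, `df_result` contains the four trigrams listed above, while `standard_result` contains `SHE UH GOT` and `UH GOT REAL` instead of `<s> SHE GOT` and `SHE GOT REAL`. A full backoff model would also need the corresponding bigram and unigram counts, which are omitted here.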

Because DF and word events are represented uniformly as N-grams in the model, the standard estimation procedure normalizes DF and non-DF event probabilities jointly, so that they sum to one in each context. This is a convenient simplification compared with alternative approaches in which DFs are modeled separately from the fluent word sequences.
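
A small sketch of this point: when DF and word events share one count table, relative frequencies in a given context form a single distribution over both kinds of events. The example uses unsmoothed relative frequencies for brevity, which is a simplification of the discounted backoff estimates described above; the function name and the toy counts are hypothetical.

```python
from collections import Counter


def conditional_distribution(trigram_counts, context):
    """Relative-frequency estimate of P(event | context) from trigram counts."""
    following = {
        event: count
        for (first, second, event), count in trigram_counts.items()
        if (first, second) == context
    }
    total = sum(following.values())
    if total == 0:
        return {}  # context never observed
    return {event: count / total for event, count in following.items()}


# Toy counts: a DF event (UH) and a word event (GOT) share the same context.
toy_trigram_counts = Counter({
    ("<s>", "SHE", "UH"): 1,
    ("<s>", "SHE", "GOT"): 3,
})
distribution = conditional_distribution(toy_trigram_counts, ("<s>", "SHE"))
```

Here `UH` and `GOT` compete for the same probability mass in the context `<s> SHE`, so no separate normalization step for DF events is needed.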



Andreas Stolcke
Fri Jun 28 19:31:43 PDT 1996