The backoff N-gram probabilities in the model are estimated from N-gram
counts, including counts of the DF events.
We used standard Good-Turing discounting in the backoff for both baseline
and DF trigram models.
For experiments
reported here involving hidden DF events, we used a subset of
the Switchboard corpus that was hand-annotated for disfluencies as well
as for linguistic segments.
In the absence of hand-annotated training
data, an iterative reestimation (EM) algorithm could be used to estimate
the N-gram probabilities for hidden DF events.
When counting N-grams for the DF model, the same context modifications used in the DF cleanup operations must be performed on the training data. For example, the word sequence
    <s> SHE UH GOT REAL LUCKY
is counted as having the following trigrams:
    <s> SHE UH
    <s> SHE GOT
    SHE GOT REAL
    GOT REAL LUCKY
Note that the trigrams
    SHE UH GOT
    UH GOT REAL
which would be generated for a standard trigram LM, are not generated for the DF model.
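The counting rule in this example can be sketched in a few lines of Python. This is an illustrative sketch only, not the implementation used in the experiments: it assumes the only DF event is the filled pause UH, and the names `FILLED_PAUSES`, `df_trigrams`, and `standard_trigrams` are hypothetical. The filled pause is predicted from its left context like any other token, but it is never added to the context used for the words that follow.

```python
from collections import Counter

# Assumption: only the filled pause UH is handled; other DF types
# (e.g. repetitions, deletions) need their own cleanup rules.
FILLED_PAUSES = {"UH"}


def df_trigrams(tokens, filled_pauses=FILLED_PAUSES):
    """Trigrams for the DF model: a filled pause is predicted from the
    cleaned-up context but is then left out of that context."""
    trigrams = []
    context = []  # history with filled pauses removed
    for tok in tokens:
        if len(context) >= 2:
            trigrams.append((context[-2], context[-1], tok))
        if tok not in filled_pauses:
            context.append(tok)
    return trigrams


def standard_trigrams(tokens):
    """Trigrams as a standard trigram LM would count them."""
    return [tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)]


sentence = "<s> SHE UH GOT REAL LUCKY".split()
df_counts = Counter(df_trigrams(sentence))
std_counts = Counter(standard_trigrams(sentence))

for trigram in df_counts:
    print(" ".join(trigram))
```

Comparing `df_counts` with `std_counts` shows that SHE UH GOT and UH GOT REAL occur only in the standard counts, while <s> SHE GOT occurs only in the DF counts.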
Because DF and word events are represented uniformly as N-grams in the model, the standard estimation procedure will normalize DF and non-DF event probabilities. This is a convenient simplification over alternative approaches in which DFs are modeled separately from the fluent word sequences.
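To illustrate the normalization point, the following sketch computes plain relative-frequency estimates from the four DF-model trigrams of the example sentence. It is a simplification under stated assumptions: no Good-Turing discounting or backoff is applied, and the helper name `relative_frequencies` is hypothetical. Because UH and GOT are both counted as continuations of the context <s> SHE, their probabilities share one normalization.

```python
from collections import Counter

# The four DF-model trigrams of "<s> SHE UH GOT REAL LUCKY".
trigram_counts = Counter([
    ("<s>", "SHE", "UH"),
    ("<s>", "SHE", "GOT"),
    ("SHE", "GOT", "REAL"),
    ("GOT", "REAL", "LUCKY"),
])


def relative_frequencies(counts):
    """Undiscounted estimates P(w3 | w1 w2) = c(w1 w2 w3) / c(w1 w2 *)."""
    context_totals = Counter()
    for (w1, w2, _), c in counts.items():
        context_totals[(w1, w2)] += c
    return {tri: c / context_totals[tri[:2]] for tri, c in counts.items()}


probs = relative_frequencies(trigram_counts)

# DF event and word event compete within the same distribution.
p_uh = probs[("<s>", "SHE", "UH")]
p_got = probs[("<s>", "SHE", "GOT")]
print(p_uh, p_got, p_uh + p_got)
```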