The central assumption incorporated in our DF language model is that probability estimates for words after a DF are more accurate if conditioned on the intended fluent word sequence. A secondary assumption is that DFs themselves can be modeled as word-like events, each having a probability conditioned on the context. A standard language model, by contrast, would look only at the surface string of words and assign word probabilities in a strictly sequential manner.
Because of the central assumption, we call our DF model the `Cleanup Model.' It is implemented as a standard backoff trigram model with the following three modifications to account for DFs.
For example, the probability estimate for ``WANT'' following ``BECAUSE I I'' would be
where
denotes a repetition event.
The repeated ``I'' is deleted from the context.
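The context cleanup in this example can be illustrated with a minimal sketch. It is not the implementation described here: the tag name `<REP1>`, the helper names, and the naive rule that marks any immediately repeated word as a repetition are all assumptions made for illustration.

```python
REP1 = "<REP1>"  # hypothetical tag for a one-word repetition event


def tag_repetitions(surface):
    """Replace each immediately repeated word with REP1 (naive heuristic, an assumption)."""
    tagged = []
    previous = None
    for word in surface:
        tagged.append(REP1 if word == previous else word)
        previous = word
    return tagged


def cleanup_context(tagged):
    """Return the two-word trigram context after removing DF tags."""
    fluent = [token for token in tagged if token != REP1]
    return tuple(fluent[-2:])


surface = ["BECAUSE", "I", "I"]
tagged = tag_repetitions(surface)

print(tagged)                   # ['BECAUSE', 'I', '<REP1>']
print(tuple(surface[-2:]))      # ('I', 'I')  surface context of a standard trigram
print(cleanup_context(tagged))  # ('BECAUSE', 'I')  context after cleanup
```

Under these assumptions, a standard trigram model would condition ``WANT'' on the surface context ``I I'', whereas the cleaned-up context is ``BECAUSE I'', the intended fluent sequence.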
By representing DFs simply as another type of N-gram event, we allow DFs to be conditioned on specific lexical contexts, so that simple word-based regularities in their distribution can be captured. Furthermore, because of its simple N-gram character, the model does not embody specific assumptions or constraints about the distribution of DF events.
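A toy sketch of how a DF event can be counted like any other N-gram event, and so acquire a probability that depends on its lexical context, is given below. The training utterances are invented, the tag `<REP1>` and all function names are hypothetical, and unsmoothed relative frequencies stand in for the backoff trigram estimates.

```python
from collections import Counter

REP1 = "<REP1>"  # hypothetical tag for a one-word repetition event

# Toy tagged utterances, invented purely for illustration.
TRAINING = [
    ["BECAUSE", "I", REP1, "WANT", "TO"],
    ["BECAUSE", "I", "WANT", "IT"],
    ["AND", "I", REP1, "THINK", "SO"],
    ["BECAUSE", "I", "CAN"],
]


def count_events(utterances):
    """Count (context, event) pairs, keeping DF tags out of the context."""
    event_counts, context_counts = Counter(), Counter()
    for utterance in utterances:
        history = ["<s>", "<s>"]  # sentence-start padding
        for token in utterance:
            context = tuple(history[-2:])
            event_counts[context + (token,)] += 1
            context_counts[context] += 1
            if token != REP1:
                history.append(token)  # only fluent words extend the context
    return event_counts, context_counts


EVENT_COUNTS, CONTEXT_COUNTS = count_events(TRAINING)


def event_prob(event, context):
    """Unsmoothed relative frequency of an event (word or DF tag) given a two-word context."""
    total = CONTEXT_COUNTS[context]
    return EVENT_COUNTS[context + (event,)] / total if total else 0.0


print(event_prob(REP1, ("BECAUSE", "I")))    # 0.25
print(event_prob(REP1, ("AND", "I")))        # 0.5
print(event_prob("WANT", ("BECAUSE", "I")))  # 0.5
```

In this toy data the repetition event is more likely after ``AND I'' than after ``BECAUSE I'', which is the kind of word-based regularity that an N-gram treatment of DF events can pick up without further assumptions about their distribution.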