As shown above, the Cleanup Model as applied to filled pauses yields a higher overall perplexity than the baseline trigram model. This is largely attributable to poorer word probability estimates at locations immediately following a filled pause. In prior work, Shriberg [9] observed that filled pauses tend to occur at linguistic segment (e.g., clause) boundaries. Since the standard LM test utterances are segmented according to acoustic criteria, filled pauses at linguistic boundaries can occur in the middle of acoustic utterance segments. At such locations the assumptions of the Cleanup Model are grossly violated, since the words preceding the filled pause belong to a different linguistic segment. The baseline model, on the other hand, can still produce reasonable predictions, because the filled pause itself serves as an indicator of the boundary.
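The difference between the two models at these locations can be illustrated with a minimal sketch of the history each one conditions on. The token set `FILLED_PAUSES` and the function names are assumptions made for illustration, not the implementation used in the experiments.

```python
# Assumed filled-pause tokens; the actual inventory depends on the transcription scheme.
FILLED_PAUSES = {"uh", "um"}

def baseline_context(words, i, order=3):
    """History for predicting words[i]: the preceding order-1 tokens as they occurred."""
    n = order - 1
    return tuple(words[max(0, i - n):i])

def cleanup_context(words, i, order=3):
    """History for predicting words[i] after deleting filled pauses from the prefix."""
    n = order - 1
    cleaned = [w for w in words[:i] if w not in FILLED_PAUSES]
    return tuple(cleaned[-n:]) if n > 0 else ()

# One acoustic segment containing a clause boundary marked by a filled pause.
words = "i went to the store uh she was late".split()
i = words.index("she")
print(baseline_context(words, i))  # ('store', 'uh')
print(cleanup_context(words, i))   # ('the', 'store')
```

In this example the cleaned-up history reaches back into the previous clause, whereas the baseline history retains the filled pause as a cue that a new clause is starting.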
To test this hypothesis, we compared the perplexities of both models on a subset of the test data that had been hand-annotated for linguistic segmentation and re-segmented accordingly (10,250 words in 1,325 segments). Specifically, we compared the perplexities of words following medial filled pauses, i.e., filled pauses not occurring as the first or last word in a linguistic segment. Results are shown in Table 5.
Table 5: Local perplexities after medial filled pauses
We see that the Cleanup Model is the better predictor for words following medial filled pauses, the reverse of the result for acoustically segmented utterances. That is, the cleanup assumption holds for medial filled pauses if one models utterances based on linguistic, rather than acoustic, segments.
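The local perplexity comparison described above can be sketched as follows. This is only an illustration under stated assumptions: `FILLED_PAUSES`, `medial_fp_targets`, and `local_perplexity` are hypothetical names, and `logprob` stands in for whichever model (baseline or Cleanup) supplies the natural-log probability of a word given its segment.

```python
import math

# Assumed filled-pause tokens; the actual inventory depends on the transcription scheme.
FILLED_PAUSES = {"uh", "um"}

def medial_fp_targets(segment):
    """Indices of words that immediately follow a medial filled pause.

    A filled pause is medial if it is neither the first nor the last word
    of the segment. Consecutive filled pauses are not treated specially.
    """
    last = len(segment) - 1
    return [i + 1 for i, w in enumerate(segment)
            if w in FILLED_PAUSES and 0 < i < last]

def local_perplexity(segments, logprob):
    """Perplexity restricted to words following medial filled pauses.

    logprob(segment, i) is assumed to return ln P(segment[i] | history).
    """
    total, count = 0.0, 0
    for seg in segments:
        for i in medial_fp_targets(seg):
            total += logprob(seg, i)
            count += 1
    if count == 0:
        raise ValueError("no words follow a medial filled pause")
    return math.exp(-total / count)

# Toy usage with a placeholder model that assigns every word probability 0.25.
segments = [["uh", "i", "um", "think", "so"], ["yes", "uh"]]
uniform = lambda seg, i: math.log(0.25)
print(local_perplexity(segments, uniform))
```

Running the same function once with each model's `logprob` over the linguistically segmented data would give the two figures being compared; the toy model here is purely a placeholder.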