
Filled pauses

 



Table 2: Local perplexities at filled pause positions.

A trigram model with special disfluency (DF) modeling for filled pauses (FPs) only was trained on 1.8 million words of acoustically segmented Switchboard transcripts. The test set consisted of 1861 acoustic segments containing 17,500 words. Table 2 shows the perplexities of the baseline and FP models at the FPs themselves (UH, UM), one word after (UH+1, UM+1), and two words after (UH+2, UM+2). The surprising result is that deleting FPs from N-gram contexts does not help the language model (LM); it actually significantly increases the perplexity of the word following the FP. That is, on average, the FP itself is a better predictor of the following word than the context preceding the FP. This conclusion is also supported by the corresponding bigram perplexities, which exhibit the same pattern. Apparently, FPs correlate strongly with certain lexical choices or syntactic structures, and thus carry useful information about the words that follow them. We investigate this question further in Section 3.3.
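The comparison described above can be illustrated with a minimal Python sketch. It is not the code used for these experiments, and the names `FILLED_PAUSES`, `history`, `positions_after_fp`, and `local_perplexity` are hypothetical. It assumes that local perplexity is the exponential of the negative mean log probability over the selected positions, and that the FP model differs from the baseline only in deleting FPs from the N-gram history.

```python
import math

# Filled-pause tokens as written in the transcripts (assumption for this sketch).
FILLED_PAUSES = {"UH", "UM"}


def history(words, i, order=3, skip_fp=False):
    """Return the (order - 1)-word N-gram history used to predict words[i].

    With skip_fp=True, filled pauses are deleted from the history first,
    so the context reaches back to the words preceding the FP.
    """
    context = words[:i]
    if skip_fp:
        context = [w for w in context if w not in FILLED_PAUSES]
    n = order - 1
    return tuple(context[-n:]) if n > 0 else ()


def positions_after_fp(words, fp, offset):
    """Indices of the words `offset` positions after each occurrence of fp.

    offset=0 selects the FP itself, offset=1 the next word, and so on.
    """
    return [i + offset for i, w in enumerate(words)
            if w == fp and i + offset < len(words)]


def local_perplexity(log_probs):
    """Perplexity over a set of natural-log word probabilities."""
    return math.exp(-sum(log_probs) / len(log_probs))


words = "I UH THINK SO".split()
target = positions_after_fp(words, "UH", 1)[0]     # index of THINK
baseline_ctx = history(words, target)              # FP kept in the history
fp_model_ctx = history(words, target, skip_fp=True)  # FP deleted from the history
```

Here `baseline_ctx` contains the FP while `fp_model_ctx` does not. Scoring the same target positions under each history with a trained model and passing the resulting log probabilities to `local_perplexity` would give the kind of per-position comparison reported in Table 2.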



Andreas Stolcke
Fri Jun 28 19:31:43 PDT 1996