Automatic Linguistic Segmentation
of Conversational Speech

Andreas Stolcke - Elizabeth Shriberg
Speech Technology and Research Laboratory
SRI International, Menlo Park, CA 94025
stolcke@speech.sri.com
ees@speech.sri.com

Abstract:

As speech recognition moves toward less constrained domains such as conversational speech, we need to segment (or resegment) waveforms and recognizer output into linguistically meaningful units, such as sentences. Toward this end, we present a simple automatic segmenter of transcripts based on N-gram language modeling. We also study the relevance of several word-level features for segmentation performance. Using only word-level information, we achieve 85% recall and 70% precision on linguistic boundary detection.
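To make the recall and precision figures concrete, the following is a minimal sketch of how boundary detection could be scored, assuming boundaries are represented as the indices of words that are followed by a segment boundary. The function name `boundary_scores` and the toy data are hypothetical illustrations, not the evaluation code used in this work.

```python
def boundary_scores(reference, hypothesis):
    """Return (recall, precision) for two collections of boundary positions.

    Each position is assumed to be the index of a word that is followed
    by a segment boundary (an assumption made for this illustration).
    """
    ref, hyp = set(reference), set(hypothesis)
    correct = len(ref & hyp)  # boundaries present in both sets
    recall = correct / len(ref) if ref else 0.0      # fraction of true boundaries found
    precision = correct / len(hyp) if hyp else 0.0   # fraction of proposed boundaries that are true
    return recall, precision


# Hypothetical toy data, not taken from the paper.
reference = [3, 7, 12, 15]       # hand-labeled boundaries
hypothesis = [3, 7, 10, 12, 18]  # boundaries proposed by a segmenter
recall, precision = boundary_scores(reference, hypothesis)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

Under this scoring, a segmenter that proposes more boundaries tends to gain recall and lose precision, which is why the two figures are reported together.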





Andreas Stolcke
Fri Jun 28 19:46:11 PDT 1996