	Speech Technology and Research Laboratory


Prosody for Dialog Systems

Investigators

Elizabeth Shriberg
Andreas Stolcke
Harry Bratt
Luciana Ferrer
Kemal Sönmez

Project Summary

SRI is investigating the use of prosody, the rhythm and melody of speech, in voice input to human-computer dialog systems. Current dialog systems often model prosody on the output side to generate acceptable speech synthesis, but few use prosody on the input side, because doing so is quite difficult. Nevertheless, we believe that pursuing this goal will be worth the effort, because prosody is one of the main cues that people use in conveying information to each other. Prosody can enhance spoken interaction with dialog systems in several important ways, for example by detecting user emotions (such as frustration or boredom), detecting disfluencies and repairs, locating the endpoints of user utterances, and distinguishing statements from questions.

SRI research in these areas was funded by the Intelligent Systems program at NASA's Ames Research Center and by DARPA through the Reliable Omni-Present Automatic Recognition (ROAR) program. The project has also benefited from two prior SRI projects: Hidden Event Modeling and Information Extraction from Speech. The research on emotion involves an ongoing collaboration with ICSI.

Recent Publications and Presentations:

E. Shriberg, A. Stolcke, & J. Ang, Prosody-Based Detection of Annoyance and Frustration in Communicator Dialogs, Presentation at the DARPA ROAR Workshop, Orlando, FL, Nov. 30, 2001. (PowerPoint)
E. Shriberg & A. Stolcke, Harnessing Speech Prosody for Human-Computer Interaction, Presentation at the NASA Intelligent Systems Workshop, Pensacola, FL, Feb. 26, 2002. (PowerPoint)
J. Ang, R. Dhillon, A. Krupski, E. Shriberg, and A. Stolcke (2002), Prosody-Based Automatic Detection of Annoyance and Frustration in Human-Computer Dialog. Proc. Intl. Conf. on Spoken Language Processing, Denver, vol. 3, pp. 2037-2040. (PDF)
L. Ferrer, E. Shriberg, and A. Stolcke (2002), Is the Speaker Done Yet? Faster and More Accurate End-of-Utterance Detection Using Prosody in Human-Computer Dialog. Proc. Intl. Conf. on Spoken Language Processing, Denver, vol. 3, pp. 2061-2064. (PDF)
L. Ferrer, E. Shriberg, and A. Stolcke (2003), A prosody-based approach to end-of-utterance detection that does not require speech recognition. Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, Hong Kong, vol. 1, pp. 608-611. (PDF)

©2011 SRI International, 333 Ravenswood Avenue, Menlo Park, CA 94025-3493
SRI International is an independent, nonprofit corporation.
Last modified Aug 23, 2022