Prosody for Automatic Speech and Language Processing
This Web page is a companion to Elizabeth Shriberg's 2008 keynote talks
at the TSD and COLING conferences.
It contains links to some useful resources for researchers who may wish to
explore prosody in their work.
Presentations
The presentations differ in some details.
Tutorials
-
E. Shriberg,
Computational Modeling of Prosody for Spontaneous Speech,
given at Academia Sinica, Taiwan, Oct. 2007.
Note that this tutorial focuses on methods used in early SRI work,
and does not cover the full range of approaches discussed in the
presentation above.
-
A description of ToBI,
a system for labeling pitch accents and prosodic boundaries.
Note that this approach is NOT used in our work on prosody modeling,
but is useful as background.
-
An MIT course on prosodic labeling.
Research Papers
A selection of research papers covering various aspects of prosody modeling
and its applications in automatic speech processing.
Prosody in linguistics and communication
-
D. Hirst and A. Di Cristo (1998),
Intonation Systems. A Survey of Twenty Languages,
Cambridge University Press.
Chapter "A survey of intonation systems", pp. 1-44.
-
J. Hirschberg (2002),
Communication and Prosody: Functional Aspects of Prosody,
Speech Communication: Special Issue on Dialogue and Prosody 36,
pp. 31-43.
-
S. G. Nooteboom (1997),
The prosody of speech: Melody and rhythm.
In W. J. Hardcastle & J. Laver (eds.),
The Handbook of Phonetic Sciences,
Oxford: Blackwell Publishers, pp. 640-673.
-
A. Cutler, D. Dahan, and W. van Donselaar (1997),
Prosody in the comprehension of spoken language: A literature review,
Language and Speech 40(2), pp. 141-201.
Automatic prosody modeling and labeling
-
P. Taylor (2000),
Analysis and synthesis of intonation using the Tilt model,
J. Acoustical Society of America 107(3), pp. 1697-1714.
-
N. Campbell (1993),
Automatic detection of prosodic boundaries in speech,
Speech Communication 13(3-4), pp. 343-354.
-
M. Ostendorf, P. J. Price, and S. Shattuck-Hufnagel (1993),
Combining statistical and linguistic methods for modeling prosody,
In ESCA Workshop on Prosody, Lund, Sweden, pp. 272-275.
-
Y. Sagisaka, N. Campbell, and N. Higuchi (eds.) (1996),
Computing prosody: computational models for processing spontaneous
speech,
New York: Springer.
-
E. Shriberg and A. Stolcke (2004),
Prosody Modeling for Automatic Speech Recognition and Understanding,
in
M. Johnson, S. Khudanpur, M. Ostendorf and R. Rosenfeld (eds.),
Mathematical Foundations of Speech and Language Processing,
IMA Volumes in Mathematics and Its Applications, Vol. 138,
Springer-Verlag, New York, pp. 105-114.
-
H. Fujisaki (2004),
Information, prosody, and modeling - with emphasis on tonal features of speech,
Proc. Speech Prosody, Nara, Japan, pp. 1-10.
-
S. Ananthakrishnan and S. Narayanan (2008),
Automatic Prosody Labeling using Acoustic, Lexical, and Syntactic Evidence,
IEEE Transactions on Audio, Speech, and Language Processing 16(1),
pp. 216-228.
Recognition of emotion, deception, charisma, etc.
-
R. Cowie and E. Douglas-Cowie (1999),
Changing Emotional Tone in Dialogue and its Prosodic Correlates,
Proc. ESCA Workshop on Dialogue and Prosody, Veldhoven, The Netherlands, pp. 41-46.
-
R. Cowie, E. Douglas-Cowie, N. Tsapatsoulis, G. Votsis, S. Kollias,
W. Fellenz, J. Taylor (2001),
Emotion Recognition in Human-Computer Interaction,
IEEE Signal Processing Magazine 18(1), pp. 32-80.
-
A. Batliner, K. Fischer, R. Huber, J. Spilker, and E. Nöth (2003),
How to find trouble in communication,
Speech Communication 40(1-2), pp. 117-143.
-
C. M. Lee and S. Narayanan (2005),
Towards detecting emotions in spoken dialogs,
IEEE Transactions on Speech and Audio Processing 13(2), pp. 293-303.
-
J. Ang, R. Dhillon, A. Krupski, E. Shriberg, and A. Stolcke (2002),
Prosody-based automatic detection of annoyance
and frustration in human-computer dialog,
Proc. of ICSLP, Denver, Colo., pp. 2037-2039.
-
B. Wrede and E. Shriberg (2003),
Spotting "Hotspots" in Meetings: Human Judgments and Prosodic Cues,
Proc. Eurospeech, Geneva, pp. 2805-2808.
-
J. Liscombe, J. Venditti, and J. Hirschberg (2005),
Detecting Certainness in Spoken Tutorial Dialogues,
Proc. Interspeech, Lisbon, pp. 1837-1840.
-
J. Hirschberg, S. Benus, J. M. Brenier, F. Enos, S. Friedman, S. Gilman,
C. Girand, M. Graciarena, A. Kathol, L. Michaelis, B. Pellom, E. Shriberg,
A. Stolcke (2005),
Distinguishing Deceptive from Non-Deceptive Speech,
Proc. Interspeech, Lisbon, pp. 1833-1836.
-
A. Rosenberg and J. Hirschberg (2005),
Acoustic/Prosodic and Lexical Correlates of Charismatic Speech,
Proc. Interspeech, Lisbon, pp. 513-516.
-
K. Forbes-Riley and D. Litman (2004),
Predicting emotion in spoken dialogue from multiple knowledge sources,
Proc. 4th Meeting of HLT/NAACL, Boston, pp. 201-208.
-
F. Biadsy, J. Hirschberg, A. Rosenberg, and W. Dakka (2007),
Comparing American and Palestinian Perceptions of Charisma Using
Acoustic-Prosodic and Lexical Analysis,
Proc. Interspeech 2007, Antwerp, pp. 2221-2224.
-
J. Venditti, J. Liscombe, and J. Hirschberg (2006),
Intonational Cues to Student Questions in Tutoring Dialogs,
Proc. Interspeech, Pittsburgh, pp. 1-4.
-
M. Graciarena, E. Shriberg, A. Stolcke, F. Enos, J. Hirschberg,
and S. Kajarekar (2006),
Combining Prosodic Lexical and Cepstral Systems for Deceptive Speech Detection,
Proc. IEEE ICASSP, Toulouse, pp. 1033-1036.
-
Marc Schröder (2004),
Emotions for user friendly multimodal interfaces,
HUMAINE IST Event, Den Haag.
Prosody for dialog act tagging
-
E. Shriberg, R. Bates, A. Stolcke, P. Taylor, D. Jurafsky, K. Ries,
N. Coccaro, R. Martin, M. Meteer, and C. Van Ess-Dykema (1998),
Can Prosody Aid the Automatic Classification of Dialog Acts in
Conversational Speech?,
Language and Speech 41(3-4), pp. 439-487.
-
S. Bhagat, H. Carvey, and E. Shriberg (2003),
Automatically Generated Prosodic Cues to Lexically Ambiguous Dialog Acts
in Multiparty Meetings,
Proc. International Congress of Phonetic Sciences, Barcelona.
-
F. Yang, G. Tur, and E. Shriberg (2008),
Exploiting dialogue act tagging and prosodic information for action item
identification,
Proc. IEEE ICASSP, Las Vegas, pp. 4941-4944.
Prosody for sentence and topic segmentation
-
E. Shriberg, A. Stolcke, D. Hakkani-Tur, and G. Tur (2000),
Prosody-Based Automatic Segmentation of Speech into Sentences and Topics,
Speech Communication 32(1-2), pp. 127-154.
-
G. Tur, D. Hakkani-Tur, A. Stolcke, and E. Shriberg (2001),
Integrating Prosodic and Lexical Cues for Automatic Topic Segmentation,
Computational Linguistics 27(1), pp. 31-57.
-
L. Ferrer, E. Shriberg, and A. Stolcke (2002),
Is the Speaker Done Yet? Faster and More Accurate End-of-Utterance Detection,
Proc. ICSLP, Denver, pp. 2061-2064.
-
S. Cuendet, D. Hakkani-Tur, E. Shriberg, J. Fung, and B. Favre (2007),
Cross-Genre Feature Comparisons for Spoken Sentence Segmentation,
International Journal of Semantic Computing 1(3), pp. 335-346.
-
J. Fung, D. Hakkani-Tur, M. Magimai-Doss, E. Shriberg, S. Cuendet, and
N. Mirghafori (2007),
Prosodic Features and Feature Selection for Multi-Lingual Sentence
Segmentation,
Proc. Interspeech, Antwerp, pp. 2585-2588.
-
Y. Liu, E. Shriberg, A. Stolcke, D. Hillard, M. Ostendorf, and M. Harper (2006),
Enriching Speech Recognition with Automatic Detection of Sentence Boundaries
and Disfluencies,
IEEE Trans. Audio, Speech and Language Processing 14(5), pp. 1526-1540.
-
J. Kolar, E. Shriberg, and Y. Liu (2006),
Using Prosody for Automatic Sentence Segmentation of Multi-Party Meetings,
Proc. International Conference on Text, Speech, and Dialogue,
Czech Republic.
Speaker-specific prosody modeling
Prosody for speaker recognition
-
L. Ferrer, H. Bratt, V. R. R. Gadde, S. Kajarekar, E. Shriberg, K. Sonmez,
A. Stolcke, and A. Venkataraman (2003),
Modeling duration patterns for speaker recognition,
Proc. Eurospeech, Geneva, pp. 2017-2020.
-
A. G. Adami, R. Mihaescu, D. A. Reynolds, and J. J. Godfrey (2003),
Modeling Prosodic Dynamics for Speaker Recognition,
Proc. IEEE ICASSP.
-
E. Shriberg, L. Ferrer, S. Kajarekar, A. Venkataraman, and A. Stolcke (2005),
Modeling Prosodic Feature Sequences for Speaker Recognition,
Speech Communication 46(3-4), pp. 455-472.
-
E. E. Shriberg (2007),
Higher Level Features in Speaker Recognition,
in C. Müller (ed.),
Speaker Classification I,
Volume 4343 of Lecture Notes in Computer Science/Artificial Intelligence,
Springer: Heidelberg/Berlin/New York, pp. 241-259.
-
E. Shriberg and L. Ferrer (2007),
A Text-Constrained Prosodic System for Speaker Verification,
Proc. Interspeech, Antwerp, pp. 1226-1229.
-
L. Ferrer, E. Shriberg, S. Kajarekar, and K. Sonmez (2007),
Parameterization of Prosodic Feature Distributions for SVM Modeling in
Speaker Recognition,
Proc. IEEE ICASSP, Honolulu, vol. 4, pp. 233-236.
-
N. Dehak, P. Kenny, and P. Dumouchel (2007),
Continuous Prosodic Features and Formant Modeling with Joint Factor Analysis
for Speaker Verification,
Proc. Interspeech, Antwerp, pp. 1234-1237.
Prosody for nonnativeness detection
Conferences
Recent meetings sponsored by the
ISCA Special Interest Group on Speech Prosody
have their proceedings online.
Software
Some useful software packages for extracting prosodic features:
- Praat
- Several packages such as WaveSurfer, Snack and ESPS can be obtained from
KTH.
Snack contains a pitch tracker.
Projects
Some sample projects and systems that use prosody in a significant way:
For comments or additions to this page, please contact
.