LVCSR Publications and Presentations

System Overviews

A. Stolcke, R. Gadde, A. Venkataraman, D. Vergyri, J. Zheng, & C. Wooters (2002), The SRI RT-02 Speech-to-Text System. Presentation at the NIST Rich Transcription Workshop, Vienna, VA, May 2002. (PDF)

A. Stolcke et al. (2001), The SRI March 2001 Hub-5 Conversational Speech Transcription System. Presentation at the NIST Large Vocabulary Conversational Speech Recognition Workshop, Linthicum Heights, MD, May 3, 2001.

A. Stolcke, H. Bratt, J. Butzberger, H. Franco, V. R. Rao Gadde, M. Plauche, C. Richey, E. Shriberg, K. Sonmez, F. Weng, & J. Zheng (2000), The SRI March 2000 Hub-5 Conversational Speech Transcription System. Proc. NIST Speech Transcription Workshop, College Park, MD. (HTML, PDF)

Spontaneous Speech Modeling

J. Zheng, H. Franco, & A. Stolcke (2003), Modeling word-level rate-of-speech variation in large vocabulary conversational speech recognition. Speech Communication 41, 273-285. (PDF)

J. Zheng, H. Franco, & A. Stolcke (2000), Rate-dependent Acoustic Modeling for Large Vocabulary Conversational Speech Recognition. Proc. ISCA Tutorial and Research Workshop on Automatic Speech Recognition: Challenges for the new Millennium, pp. 145-149, Paris. (PDF)

J. Zheng, H. Franco, & A. Stolcke (2000), Rate-dependent Acoustic Modeling for Large Vocabulary Conversational Speech Recognition. Proc. NIST Speech Transcription Workshop, College Park, MD. (Preliminary version of paper above, HTML, PDF)

K. Sonmez, M. Plauche, E. Shriberg, & H. Franco (2000), Consonant discrimination in elicited and spontaneous speech: A case for signal-adaptive front ends in ASR. Proc. NIST Speech Transcription Workshop, College Park, MD. (HTML, PDF)

K. Sonmez, M. Plauche, E. Shriberg, & H. Franco (2000), Consonant discrimination in elicited and spontaneous speech: A case for signal-adaptive front ends in ASR. Proc. Intl. Conf. on Spoken Language Processing, vol. I, pp. 548-551, Beijing. (PDF)

A. Stolcke, E. Shriberg, R. Bates, M. Ostendorf, D. Hakkani, M. Plauche, G. Tur, & Y. Lu (1998), Automatic Detection of Sentence Boundaries and Disfluencies based on Recognized Words. Proc. Intl. Conf. on Spoken Language Processing, vol. 5, pp. 2247-2250, Sydney, Australia. (PDF)

E. Shriberg & A. Stolcke (1998), How far do speakers back up in their repairs? A quantitative model. Proc. Intl. Conf. on Spoken Language Processing, vol. 5, pp. 2183-2186, Sydney, Australia. (PDF)

E. Shriberg, R. Bates, & A. Stolcke (1997), A Prosody-Only Decision-Tree Model for Disfluency Detection. Proc. EUROSPEECH, vol. 5, pp. 2383-2386, Rhodes, Greece. (PDF)

A. Stolcke (1997), Modeling Linguistic Segment and Turn Boundaries for N-best Rescoring of Spontaneous Speech. Proc. EUROSPEECH, vol. 5, pp. 2779-2782, Rhodes, Greece. (PDF)

M. Weintraub, K. Taussig, K. Hunicke-Smith, & A. Snodgrass (1996), Effect of Speaking Style on LVCSR Performance, Proc. Intl. Conf. on Spoken Language Processing, Addendum, pp. 16-19, Philadelphia, PA. (PDF)

A. Stolcke & E. Shriberg (1996), Automatic linguistic segmentation of conversational speech. Proc. Intl. Conf. on Spoken Language Processing, vol. 2, pp. 1005-1008, Philadelphia, PA. (HTML, PDF)

E. Shriberg & A. Stolcke (1996), Word predictability after filled pauses: A corpus-based study. Proc. Intl. Conf. on Spoken Language Processing, vol. 3, pp. 1868-1871, Philadelphia, PA. (PDF)

A. Stolcke & E. Shriberg (1996), Statistical language modeling for speech disfluencies. Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, vol. 1, pp. 405-409, Atlanta, GA. (HTML, PDF)

Duration and Prosody Models for LVCSR

D. Vergyri, A. Stolcke, V. R. R. Gadde, L. Ferrer, & E. Shriberg (2003), Prosodic Knowledge Sources for Automatic Speech Recognition. Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, vol. 1, pp. 208-211, Hong Kong. (PDF)

V. R. Rao Gadde (2000), Modeling Word Durations. Proc. Intl. Conf. on Spoken Language Processing, vol. I, pp. 601-604, Beijing. (PDF)

V. R. Rao Gadde (2000), Modeling Word Duration for Better Speech Recognition. Proc. NIST Speech Transcription Workshop, College Park, MD. (HTML, PDF)

D. Hakkani-Tur, G. Tur, A. Stolcke, & E. Shriberg (1999), Combining Words and Prosody for Information Extraction from Speech. Proc. EUROSPEECH, vol. 5, pp. 1991-1994, Budapest. (PDF)

R. R. Gadde, E. Shriberg, A. Stolcke, D. Hakkani-Tur, & G. Tur (1999), Prosody Modeling for Speech Recognition and Understanding, Hub-5 Conversational Speech Understanding Workshop, Baltimore.

A. Stolcke, E. Shriberg, D. Hakkani-Tur, & G. Tur (1999), Modeling the Prosody of Hidden Events for Improved Word Recognition. Proc. EUROSPEECH, vol. 1, pp. 307-310, Budapest. (PDF)

Dialog Modeling for ASR

A. Stolcke, K. Ries, N. Coccaro, E. Shriberg, R. Bates, D. Jurafsky, P. Taylor, R. Martin, C. Van Ess-Dykema, & M. Meteer (2000), Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech, Computational Linguistics 26(3), 339-373. (PDF)

E. Shriberg, R. Bates, A. Stolcke, P. Taylor, D. Jurafsky, K. Ries, N. Coccaro, R. Martin, M. Meteer, & C. Van Ess-Dykema (1998), Can Prosody Aid the Automatic Classification of Dialog Acts in Conversational Speech? Language and Speech 41(3-4), 439-487. (PDF)

A. Stolcke, E. Shriberg, R. Bates, N. Coccaro, D. Jurafsky, R. Martin, M. Meteer, K. Ries, P. Taylor, & C. Van Ess-Dykema (1998), Dialog Act Modeling for Conversational Speech. In Applying Machine Learning to Discourse Processing. Papers from the 1998 AAAI Spring Symposium, Technical Report SS-98-01, pp. 98-105. AAAI Press, Menlo Park, CA. (PDF)

Discriminative Modeling

J. Zheng (2001), A New Derivation for MMIE Training. Presentation at the NIST Large Vocabulary Conversational Speech Recognition Workshop, Linthicum Heights, MD, May 3, 2001.

J. Zheng, J. Butzberger, H. Franco, & A. Stolcke (2001), Improved Maximum Mutual Information Estimation Training of Continuous Density HMMs. Proc. EUROSPEECH, vol. 2, pp. 679-682, Aalborg, Denmark. (PDF)

A. Stolcke (2001), Error Modeling and Unsupervised Language Modeling. Presentation at the NIST Large Vocabulary Conversational Speech Recognition Workshop, Linthicum Heights, MD, May 3, 2001.

F. Beaufays, M. Weintraub, & Y. Konig (1999), Discriminative Mixture Weight Estimation for Large Gaussian Mixture Models, Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, vol. 1, pp. 337-340, Phoenix, AZ. (PDF)

Y. Konig, L. Heck, M. Weintraub, & K. Sonmez (1998), Nonlinear Discriminant Feature Extraction for Robust Text-Independent Speaker Recognition, Proc. RLA2C-ESCA Speaker Recognition and its Commercial and Forensic Applications, pp. 72-75, Avignon, France. (PDF)

L. Heck & Y. Konig (1998), Discriminative Training of Minimum Cost Speaker Verification Systems, Proc. RLA2C-ESCA Speaker Recognition and its Commercial and Forensic Applications, pp. 93-96, Avignon, France. (PDF)

F. Beaufays, M. Weintraub, & Y. Konig (1998), DYNAMO: An Algorithm for Dynamic Acoustic Modeling, Proc. DARPA Broadcast News Transcription and Understanding Workshop, pp. 311-316, Lansdowne, VA. (HTML, PDF)

Wordspotting and Confidence Measures

L. Mangu, E. Brill, & A. Stolcke (2000), Finding consensus in speech recognition: word error minimization and other applications of confusion networks, Computer Speech and Language 14(4), 373-400. (PDF)

L. Mangu, E. Brill, & A. Stolcke (1999), Finding Consensus Among Words: Lattice-based Word Error Minimization. Proc. EUROSPEECH, vol. 1, pp. 495-498, Budapest. (PDF)

A. Stolcke, Y. Konig, & M. Weintraub (1997), Explicit Word Error Minimization in N-best List Rescoring. Proc. EUROSPEECH, vol. 1, pp. 163-166, Rhodes, Greece. (PDF)

M. Weintraub, F. Beaufays, Z. Rivlin, Y. Konig, & A. Stolcke (1997), Neural-Network Based Measures of Confidence for Word Recognition. Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, vol. 2, pp. 887-890, Munich. (PDF)

M. Weintraub (1995), LVCSR Log-Likelihood Ratio Rescoring for Keyword Spotting, Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, vol. 1, pp. 297-300, Detroit. (PDF)

LVCSR Summer Research Workshops

Members of SRI's LVCSR team have played an active role in several of the Johns Hopkins Summer Research Workshops on LVCSR.

LM'95

M. Weintraub, Y. Aksu, S. Dharanipragada, S. Khudanpur, H. Ney, J. Prange, A. Stolcke, F. Jelinek, & E. Shriberg (1996), Fast Training and Portability, 1995 Language Modeling Summer Research Workshop Technical Report, Research Note 1, Center for Language and Speech Processing, Johns Hopkins University, Baltimore.

R. Rosenfeld, R. Agarwal, B. Byrne, R. Iyer, M. Liberman, E. Shriberg, J. Unverferth, D. Vergyri, & E. Vidal (1996), Error Analysis and Disfluency Modeling in the Switchboard Domain, Proc. Intl. Conf. on Spoken Language Processing, Addendum, p. 15, Philadelphia, PA.

WS'96

A. Stolcke, C. Chelba, D. Engle, V. Jimenez, L. Mangu, H. Printz, E. Ristad, R. Rosenfeld, D. Wu, F. Jelinek, & S. Khudanpur (1997), Dependency Language Modeling, 1996 Large Vocabulary Continuous Speech Recognition Summer Research Workshop Technical Report, Research Note 24, Center for Language and Speech Processing, Johns Hopkins University, Baltimore.

M. Ostendorf, B. Byrne, M. Bacchiani, M. Finke, A. Gunawardana, K. Ross, S. Roweis, E. Shriberg, D. Talkin, A. Waibel, B. Wheatley, & T. Zeppenfeld (1997), Modeling Systematic Variations in Pronunciation via a Language-Dependent Hidden Speaking Mode, 1996 LVCSR Summer Research Workshop Technical Report, Research Note 24, Center for Language and Speech Processing, Johns Hopkins University, Baltimore.

M. Weintraub, E. Fosler, C. Galles, Y.-H. Kao, S. Khudanpur, M. Saraclar, & S. Wegmann (1997), Automatic Learning of Word Pronunciations from Data, 1996 Large Vocabulary Continuous Speech Recognition Summer Research Workshop Technical Report, Research Note 24, Center for Language and Speech Processing, Johns Hopkins University, Baltimore.

C. Chelba, D. Engle, F. Jelinek, V. Jimenez, S. Khudanpur, L. Mangu, H. Printz, E. Ristad, R. Rosenfeld, A. Stolcke, & D. Wu (1997), Structure and Performance of a Dependency Language Model. Proc. EUROSPEECH, vol. 5, pp. 2775-2778, Rhodes, Greece. (PDF)

WS'97

D. Jurafsky, R. Bates, N. Coccaro, R. Martin, M. Meteer, K. Ries, E. Shriberg, A. Stolcke, P. Taylor, & C. Van Ess-Dykema (1998), Discourse Language Modeling, 1997 Large Vocabulary Continuous Speech Recognition Summer Research Workshop Technical Report, Research Note 30, Center for Language and Speech Processing, Johns Hopkins University, Baltimore.

D. Jurafsky, R. Bates, N. Coccaro, R. Martin, M. Meteer, K. Ries, E. Shriberg, A. Stolcke, P. Taylor, & C. Van Ess-Dykema (1997), Automatic Detection of Discourse Structure for Speech Recognition and Understanding. Proc. IEEE Workshop on Speech Recognition and Understanding, pp. 88-95, Santa Barbara, CA. (PDF)