Hybrid Neural Network/Hidden Markov Speech Recognition

Investigators

Victor Abrash
Michael Cohen
Horacio Franco

Project Summary

This DARPA-funded project, carried out in the Speech Technology and Research Laboratory at SRI International, ended in 1997.

Most current leading-edge speech recognition systems are based on an approach called hidden Markov modeling (HMM). Traditional HMMs make some assumptions that are known to be false. For example, they assume that speech features occurring at the same time are uncorrelated with one another, and that, given the HMM state, they are independent of other recently occurring features (even features observed only ten milliseconds earlier). SRI has developed a hybrid neural network/hidden Markov model speech recognizer that improves on the accuracy of traditional HMMs by modeling correlations among simultaneously occurring speech features and between current and recent features. More recent work involved modeling longer-term correlations and developing speaker adaptation approaches within this new framework.
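The sketch below makes the hybrid idea concrete. It is written in Python using only the standard library. It assumes the hybrid formulation common in the connectionist speech literature, which is not necessarily SRI's exact recipe. In that formulation, a multilayer perceptron (MLP) receives a window of consecutive feature frames and outputs posterior probabilities of HMM states (or phone classes). Those posteriors are then divided by the state priors to give scaled likelihoods, which stand in for the HMM's usual emission densities. The window size, layer sizes, random weights, priors, and all function names (`stack_context`, `mlp_posteriors`, `scaled_log_likelihoods`, `demo`) are hypothetical and chosen only for illustration.

```python
import math
import random

# Hypothetical sizes for illustration only (not SRI's configuration):
# 3 acoustic features per 10 ms frame, +/-2 frames of context,
# 8 hidden units, 4 HMM states.
N_FEATS, CONTEXT, N_HIDDEN, N_STATES = 3, 2, 8, 4
IN_DIM = N_FEATS * (2 * CONTEXT + 1)


def stack_context(frames, t, context=CONTEXT):
    """Concatenate frames t-context .. t+context into one MLP input vector.

    Edge frames are repeated at utterance boundaries. Because the network
    sees every feature in the window jointly, it can exploit correlations
    within a frame and across neighbouring frames.
    """
    window = []
    for k in range(t - context, t + context + 1):
        idx = min(max(k, 0), len(frames) - 1)  # clamp to a valid frame index
        window.extend(frames[idx])
    return window


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def mlp_posteriors(x, w1, b1, w2, b2):
    """One-hidden-layer MLP; softmax outputs are read as P(state | window)."""
    hidden = [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
              for row, b in zip(w1, b1)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w2, b2)]
    return softmax(logits)


def scaled_log_likelihoods(posteriors, priors):
    """Bayes' rule: p(x | q) / p(x) = P(q | x) / P(q).

    The log of this ratio can replace the HMM emission log-likelihood in
    decoding. The factor p(x) is shared by every state sequence, so it does
    not change which sequence scores best.
    """
    return [math.log(p) - math.log(pr) for p, pr in zip(posteriors, priors)]


def demo(seed=0, n_frames=6):
    rng = random.Random(seed)
    # Random weights stand in for a trained network (hypothetical).
    w1 = [[rng.gauss(0, 0.5) for _ in range(IN_DIM)] for _ in range(N_HIDDEN)]
    b1 = [0.0] * N_HIDDEN
    w2 = [[rng.gauss(0, 0.5) for _ in range(N_HIDDEN)] for _ in range(N_STATES)]
    b2 = [0.0] * N_STATES
    # Hypothetical state priors, e.g. relative frequencies in training alignments.
    priors = [0.4, 0.3, 0.2, 0.1]
    frames = [[rng.gauss(0, 1) for _ in range(N_FEATS)] for _ in range(n_frames)]
    scores = []
    for t in range(n_frames):
        post = mlp_posteriors(stack_context(frames, t), w1, b1, w2, b2)
        scores.append(scaled_log_likelihoods(post, priors))
    return scores


if __name__ == "__main__":
    for t, frame_scores in enumerate(demo()):
        print(t, [round(s, 2) for s in frame_scores])
```

The network receives a whole window rather than a single frame. That is what lets it capture the within-frame and cross-frame correlations that a traditional HMM ignores. In a complete recognizer, the per-frame scores would be passed to a Viterbi decoder over the HMM state network.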

Representative Publications

H. Sedarat, R. Khadem, H. Franco (1998), Simplified Neural Network Architectures in a Hybrid System for Isolated Speech Recognition, Submitted to the International Conference on Acoustics, Speech, and Signal Processing, Seattle, WA.

H. Franco, V. Digalakis (1997), Correlation Modeling in a Hybrid Neural Network Hidden Markov Model Speech Recognizer, Submitted to IEEE Transactions on Speech and Audio Processing.

V. Abrash (1997), Mixture Input Transformations for Adaptation of Hybrid Connectionist Speech Recognizers, Proceedings of the 5th European Conference on Speech Communication and Technology, Rhodes, Greece.

J. Goldberger, D. Burshtein, H. Franco (1997), Segmental Modeling Using a Continuous Mixture of Non-Parametric Models, Proceedings of the 5th European Conference on Speech Communication and Technology, Rhodes, Greece.

H. Franco, M. Weintraub, M. Cohen (1997), Context Modeling in a Hybrid HMM-Neural Net Speech Recognition System, Proceedings of the International Conference on Neural Networks, Houston, TX.

V. Abrash, A. Sankar, H. Franco, M. Cohen (1996), Acoustic Adaptation using Nonlinear Transformations of HMM Parameters, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Atlanta, GA.

V. Abrash, H. Franco, A. Sankar, M. Cohen (1995), Connectionist Speaker Normalization and Adaptation, Proceedings of the 4th European Conference on Speech Communication and Technology, Madrid, Spain.

H. Franco, V. Digalakis (1995), Temporal Correlation Modeling in a Hybrid Neural Network/Hidden Markov Model Speech Recognizer, Proceedings of the 4th European Conference on Speech Communication and Technology, Madrid, Spain.

M. Weintraub, V. Abrash, H. Franco, M. Cohen (1995), SRI Telespot, An LVCSR Telephone Transcription and Wordspotting System, Version using Multi-Layer Perceptrons, SRI Technical Report.

H. Franco, V. Abrash, M. Cohen (1995), Neural Net Trainer for SRI's Hybrid HMM/MLP Speech Recognition System, SRI Technical Report.

H. Franco, V. Abrash, M. Cohen, A. Sankar, M. Weintraub (1994), Hybrid HMM/MLP Speech Recognition, ARPA Artificial Neural Network Technology 1994 Program Review, December 6-8, Key West, FL.

H. Franco, M. Cohen, N. Morgan, D. Rumelhart, V. Abrash (1994), Context-Dependent Connectionist Probability Estimation in a Hybrid Hidden Markov Model-Neural Net Speech Recognition System, Computer Speech & Language, 8, pp. 211-222.

V. Abrash, M. Cohen, H. Franco, I. Arima (1994), Incorporating Linguistic Features in a Hybrid HMM/MLP Speech Recognizer, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Adelaide, Australia.

H. Franco (1993), Implementing a Weight Elimination and Pruning Scheme for the Hybrid NN/HMM Speech Recognition System, SRI Technical Report.

Y. Konig, N. Morgan, C. Wooters, V. Abrash, M. Cohen, H. Franco (1993), Modeling Consistency in a Speaker Independent Continuous Speech Recognition System, Advances in Neural Information Processing Systems 5, S. J. Hanson, J. D. Cowan, C. L. Giles (eds.), Morgan Kaufmann, San Mateo, CA.

M. Cohen, H. Franco, N. Morgan, D. Rumelhart, V. Abrash (1993), Context-Dependent Multiple Distribution Phonetic Modeling with MLPs, Advances in Neural Information Processing Systems 5, S. J. Hanson et al. (eds.), Morgan Kaufmann, San Mateo, CA.

M. Cohen, H. Franco, N. Morgan, D. Rumelhart, V. Abrash (1992), Hybrid Neural Network/Hidden Markov Model Continuous Speech Recognition, Proceedings of the International Conference on Spoken Language Processing, Banff, Canada.

V. Abrash, H. Franco, M. Cohen, N. Morgan, Y. Konig (1992), Connectionist Gender Adaptation in a Hybrid Neural Network/Hidden Markov Model Speech Recognition System, Proceedings of the International Conference on Spoken Language Processing, Banff, Canada.

M. Cohen, H. Franco, N. Morgan, D. Rumelhart, V. Abrash (1992), Multiple-State Context-Dependent Phonetic Modeling with MLPs, Proceedings of Speech Research Symposium XII, Baltimore, MD.

H. Franco, M. Cohen, N. Morgan, D. Rumelhart, V. Abrash (1992), Context-Dependent Connectionist Probability Estimation in a Hybrid HMM-Neural Net Speech Recognition System, Proceedings of the International Joint Conference on Neural Networks, Beijing, China.

M. Cohen, H. Franco, N. Morgan, D. Rumelhart, V. Abrash, Y. Konig (1992), Integrating Neural Networks into Computer Speech Recognition Systems, Proceedings of GOMAC-92.

M. Cohen, H. Franco, N. Morgan, D. Rumelhart, V. Abrash, Y. Konig (1992), Combining Neural Networks and Hidden Markov Models, Proceedings of the DARPA Speech and Natural Language Workshop, Harriman, NY.

D. Rumelhart, M. Cohen, H. Franco, V. Abrash (1991), Supplementing HMM Continuous Speech Recognition with Neural Network Word Spotting, Proceedings of the Speech Research Symposium XI, Baltimore, MD.


For more information on this project, contact Horacio Franco or Victor Abrash.