Speech Technology and Research (STAR) Laboratory Seminar Series

Past talks: 2006

  • Speaker: Fei Sha, Computer Science Division, UC Berkeley
    Time: Wednesday, Oct. 25, 2006, 10:30 am
    Venue: STAR Lab, EJ 124
    Title: Large margin approaches for automatic speech recognition


    Most modern speech recognizers are based on continuous-density hidden Markov models (CD-HMMs). The hidden states in these CD-HMMs model different phonemes or sub-phonetic elements, while the observations model cepstral feature vectors. Distributions of cepstral feature vectors are most often represented by Gaussian mixture models (GMMs), and the accuracy of the recognizer depends critically on the careful estimation of the GMM parameters. The most basic approach is maximum likelihood (ML) estimation, typically carried out with the expectation-maximization (EM) algorithm. The main attraction of the EM algorithm is that no free parameters need to be tuned for its convergence. In general, however, maximum likelihood training criteria do not optimize classification error rates directly. In many cases, alternative training criteria that track error rates more explicitly tend to perform better. Two well-known discriminative examples are conditional maximum likelihood (CML)/maximum mutual information (MMI) estimation and minimum classification error (MCE) training.

    In this talk, I will present a new framework for discriminative training called large margin hidden Markov models. Inspired by the principles of large margin learning, a well-studied statistical learning framework, large margin HMMs learn their parameters by separating the correct labeling sequence from incorrect labeling sequences by a large margin. The size of the margin is directly proportional to the number of labeling mistakes. Training is cast as a convex optimization problem that maximizes the margins. I will describe the framework and the training algorithm of large margin HMMs. I will also present experimental results from applying this training criterion to build phoneme recognizers. We found significantly improved phoneme recognition accuracy on the TIMIT speech corpus. We also systematically compared our approach with other leading discriminative training methods and found greater error reduction from baseline systems than with either CML or MCE.

    Joint work with Dr. Lawrence K. Saul (U. of California, San Diego).
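
    The margin requirement described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the `scale` parameter, and the use of plain numeric scores in place of HMM sequence scores are all assumptions made for illustration. It only shows the idea that the correct labeling should outscore a competing labeling by a margin that grows with the number of labeling mistakes (the Hamming distance between the two sequences).

```python
from typing import Sequence


def hamming_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Count the positions where two equal-length label sequences differ."""
    if len(a) != len(b):
        raise ValueError("label sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def margin_violation(
    score_correct: float,
    score_competitor: float,
    correct: Sequence[str],
    competitor: Sequence[str],
    scale: float = 1.0,  # hypothetical proportionality constant
) -> float:
    """Hinge-style penalty for one competing label sequence.

    The required margin is proportional to the number of labeling
    mistakes in the competitor. The penalty is zero once the correct
    sequence outscores the competitor by at least that margin.
    """
    required_margin = scale * hamming_distance(correct, competitor)
    achieved_margin = score_correct - score_competitor
    return max(0.0, required_margin - achieved_margin)


# Toy example: the competitor mislabels two of three frames,
# so the correct sequence must win by at least 2.0.
correct_labels = ["s", "ih", "t"]
competitor_labels = ["s", "eh", "d"]
penalty_small_gap = margin_violation(5.0, 4.0, correct_labels, competitor_labels)
penalty_large_gap = margin_violation(7.0, 4.0, correct_labels, competitor_labels)
```

    In this toy setting, a score gap of 1.0 falls short of the required margin of 2.0 and is penalized, while a gap of 3.0 satisfies it. A training procedure in this spirit would sum such penalties over competing sequences and minimize them with respect to the model parameters.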


    Fei Sha and Lawrence K. Saul (2006). Large margin Gaussian mixture models for automatic speech recognition. To appear in Neural Information Processing Systems Conference 2006 (Vancouver, Canada).
    Fei Sha and Lawrence K. Saul (2006). Large margin Gaussian mixture modeling for phonetic classification and recognition. Proc. of ICASSP 2006, Toulouse, France.
    Fei Sha and Lawrence K. Saul (2007). Comparison of large margin training to other discriminative methods for phonetic recognition by hidden Markov models. Submitted to ICASSP 2007.



Last modified Mar 08, 2007