SRI's speaker-independent, continuous-speech recognition system
(DECIPHER(TM)) is based on hidden Markov models (HMMs). Optimum HMM state
clustering, Gaussian mixture modeling, and statistical language
modeling are combined to give state-of-the-art performance. In
addition, robustness to varying acoustic conditions, such as
different channels, noise, and nonnative speakers, is achieved through
noise-robust feature extraction and acoustic adaptation technology.
New research ideas are implemented within the DECIPHER software so as
to be available across different projects. Results of such research
are incorporated into applications through the use of the Nuance
recognizer, developed by Nuance Communications, a spin-off from SRI's
Speech Technology and Research (STAR) Laboratory. Close collaboration
between the STAR laboratory and Nuance Communications facilitates the
quick integration of novel research conducted by STAR laboratory
researchers into the Nuance recognizer.
We are working on the following areas:
Recognition accuracy will be improved by robust training. This will be
accomplished by developing techniques to determine the number of model
parameters that can be robustly estimated. New parameter sharing
techniques will be developed that will result in fewer and more
robustly estimated parameters. Methods based on our previous work in
acoustic adaptation will be used to robustly train large recognition
Recognition Speed and Memory
Recognition speed will be increased and memory use decreased by developing
methods to remove the large amount of redundancy that exists in
current modeling techniques. This will result in a significant
speed-up, while decreasing the number of model parameters and
increasing recognition accuracy.
Acoustic adaptation algorithms will be developed to port speech models
to new, unseen domains with only a small amount of target-specific
data. For adaptation, the information from the small amount of
target-specific data will be augmented by using correlations with
the large amount of available training data.
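
As a rough illustration of how a small amount of target-specific data can be combined with models trained on a large corpus, the sketch below re-estimates a single Gaussian mean by maximum a posteriori (MAP) interpolation. This is a standard textbook technique used here as a stand-in, not necessarily the adaptation algorithm developed for DECIPHER. The function name map_adapt_mean and the prior weight tau are assumptions made for the example.

```python
import numpy as np

def map_adapt_mean(prior_mean, target_frames, tau=10.0):
    """MAP re-estimate of one Gaussian mean (illustrative sketch).

    prior_mean    : mean estimated from the large training set
    target_frames : feature vectors from the new domain (may be empty)
    tau           : hypothetical prior weight; larger values trust the
                    large-corpus mean more
    """
    prior = np.asarray(prior_mean, dtype=float)
    frames = np.asarray(target_frames, dtype=float)
    if frames.size == 0:
        # No target data: fall back to the large-corpus estimate.
        return prior.copy()
    frames = frames.reshape(-1, prior.shape[0])
    n = frames.shape[0]
    sample_mean = frames.mean(axis=0)
    # The weight on the target data grows with the number of frames.
    weight = n / (n + tau)
    return weight * sample_mean + (1.0 - weight) * prior
```

With few target frames the adapted mean stays close to the large-corpus estimate; as more target data arrives it moves toward the target sample mean.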
A statistical class-based language modeling approach will be studied
for named-entity recognition. The classes in the grammar will
correspond to the entities to be recognized. This will allow both the
word string and the named entities to be simultaneously recognized.
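
The idea can be sketched with a toy class bigram model in which each word is generated by a class, and one class stands for a named entity. Given a word string, a Viterbi search returns the most likely class sequence, so the entity labels fall out of the same model that scores the words. The class names, probability tables, and function name below are hypothetical values chosen for illustration, not parameters of the system described here.

```python
import math

# Hypothetical toy model: P(class | previous class) and P(word | class).
TOY_CLASSES = ["OTHER", "CITY"]
TOY_TRANS = {
    ("<s>", "OTHER"): 0.9, ("<s>", "CITY"): 0.1,
    ("OTHER", "OTHER"): 0.7, ("OTHER", "CITY"): 0.3,
    ("CITY", "OTHER"): 0.8, ("CITY", "CITY"): 0.2,
}
TOY_EMIT = {
    ("OTHER", "fly"): 0.3, ("OTHER", "to"): 0.4,
    ("CITY", "boston"): 0.5, ("CITY", "denver"): 0.5,
}

def viterbi_classes(words, classes, trans, emit, start="<s>", floor=1e-6):
    """Most likely class sequence under a class bigram model (sketch)."""
    if not words:
        return []

    def logp(table, key):
        # Unseen events get a small floor probability (an assumption).
        return math.log(table.get(key, floor))

    # best[c] = (log score, path) of the best path ending in class c.
    best = {
        c: (logp(trans, (start, c)) + logp(emit, (c, words[0])), [c])
        for c in classes
    }
    for w in words[1:]:
        new_best = {}
        for c in classes:
            score, path = max(
                ((best[p][0] + logp(trans, (p, c)), best[p][1]) for p in classes),
                key=lambda t: t[0],
            )
            new_best[c] = (score + logp(emit, (c, w)), path + [c])
        best = new_best
    return max(best.values(), key=lambda t: t[0])[1]
```

For example, viterbi_classes(["fly", "to", "boston"], TOY_CLASSES, TOY_TRANS, TOY_EMIT) labels the last word as CITY and the others as OTHER. In a recognizer the same class probabilities would be applied during the search over word hypotheses rather than to a fixed word string.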