LVCSR: Large Vocabulary Conversational Speech Recognition
Investigators
Andreas Stolcke (PI)
Harry Bratt
Horacio Franco
Ramana Rao Gadde
Colleen Richey
Elizabeth Shriberg
Kemal Sönmez
Dimitra Vergyri
Former collaborators
Mitch Weintraub
Françoise Beaufays
Yochai Konig
Ananth Sankar
Project Summary
The goal of the LVCSR projects is to develop all aspects of
speech recognition in the domain of spontaneous, human-human
conversational speech (as opposed to planned, read, or human-machine
dialog).
This includes feature extraction, acoustic modeling, language modeling,
and speech understanding.
Most of our research uses the Switchboard and CallHome/CallFriend
conversational telephone speech corpora.
Research Efforts
We are presently focusing on a number of fundamental research
problems that have to be solved in order to attain the ultimate goal
of conversational speech understanding.
- Front end/Feature extraction
We seek to develop new front-end features that are automatically
trained to improve discrimination, and thereby word recognition
accuracy. This research is carried out in collaboration with
the Speaker Recognition project.
- Discriminative modeling
We are exploring new training methods for acoustic and
language models that enhance recognition accuracy by explicit
optimization of discrimination between correct and incorrect
hypotheses.
- Word spotting and confidence measures
LVCSR methods can be used to improve limited-vocabulary word spotting,
and conversely, word-spotting-like techniques can be employed to
minimize LVCSR word error. Related to this are methods for
estimating the confidence in word recognition results.
- Conversational speech phenomena
In collaboration with the
Disfluencies and
Hidden Event Modeling
projects, we aim to model and detect events that are characteristic
of spontaneous speech, such as
hesitations, self-repairs, and covert sentence boundaries.
Explicit modeling of such events is important for effective
speech understanding, and it also enhances word recognition accuracy.
- Duration and prosody modeling for recognition
We have recently started to explore the durational and other
prosodic properties of speech for improved word recognition in
conversational speech.
- Language modeling
We investigate language modeling techniques specifically for
conversational speech, especially in the context of the general
research topics above. For example, we have developed discriminative
LM training methods and LMs that capitalize on conversational
speech patterns.
Much of the SRI Language Modeling Toolkit
was developed as a by-product of LVCSR research, and
SRI often provides language modeling support for other sites in
the LVCSR community.
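For readers unfamiliar with the word error measure that several of the efforts above aim to reduce, it is conventionally computed as the word-level edit distance between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The following is a minimal illustrative sketch under that assumption. It is not SRI's scoring code or the NIST scoring tool, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative only: real evaluations also apply text normalization
    rules that are not modeled here.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # matching i reference words against an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # A hypothesis that drops a hesitation and a repeated word.
    print(word_error_rate("uh i i think so", "i think so"))
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.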
Publications and Presentations
LVCSR research
publications and presentations
by SRI staff.
Presentations from the
2001
LVCSR post-evaluation workshop at NIST.