Speech Translation Research at SRI International
Full Spontaneous Translation
SRI's newest translation technology permits bidirectional, voice-to-voice machine
translation of spontaneous utterances.
Unlike the Phraselator or BPTS,
the full spontaneous translation system is not restricted to prerecorded
translations. It can translate a wider range of utterances,
including novel utterances it has never seen before.
Our most advanced translation system is IraqComm™,
which has been in use in Iraq since early 2006. The following describes
our earlier work on a similar system for Pashto, a major language of
Speech synthesis output
In the full translation system, the computer-generated translations
(in both directions) are played through a speech synthesizer. While
the synthesized speech used in the full translation system is smooth
and fluent, it is necessarily of lower quality than the prerecorded
human translations used in the Phraselator and BPTS.
The speech synthesis technology in our translation systems
is provided by
Cepstral LLC. Translations
into English are synthesized in Cepstral's
off-the-shelf English voice. Foreign-language
voices are custom-built by Cepstral
specially for this project based on data that we provide.
The full translation system relies entirely on computer-generated
translations. This makes it more flexible than the Phraselator and
BPTS, whose translations are hand-crafted in advance.
We currently have two translation technologies: the SRInterp statistical machine translation engine and the Gemini interlingual translation engine.
SRInterp is SRI's cross-platform, large-scale statistical machine translation (SMT) decoder. It supports state-of-the-art translation techniques, including phrase-based, hierarchical, syntax-based, and string-to-dependency translation models. SRInterp has been used in major SRI projects, including GALE and TRANSTAC, and the latest IraqComm speech-to-speech translation system uses SRInterp technology.
Gemini is an interlingual machine translation system developed in SRI's
Artificial Intelligence Center. The Gemini system can both interpret and generate natural language utterances, which makes it well suited to automatic translation work. Gemini's translation abilities rely on sophisticated grammars developed by linguists for both the source and target languages. Our grammars of English and Pashto each contain thousands of words and hundreds of grammatical rules. The Gemini system can generate high-quality translations when the grammars cover the application domain, and it can complement statistical translation.
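One way a grammar-based engine and a statistical engine could complement each other is a simple fallback: use the grammar-based result when the grammar covers the input, and the statistical result otherwise. The Python sketch below illustrates only that idea. It is an assumption, not a description of how Gemini and SRInterp are actually combined; every function name and the toy coverage table are hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical toy stand-in for grammar coverage. A real interlingual system
# parses the input into a meaning representation and generates from it;
# here a lookup table merely marks which inputs count as "covered".
TOY_GRAMMAR_COVERAGE = {
    "where is the clinic": "<target: where is the clinic>",
}


def rule_based_translate(text: str) -> Optional[str]:
    """Return a translation if the toy grammar covers the input, else None."""
    return TOY_GRAMMAR_COVERAGE.get(text.lower().strip())


def statistical_translate(text: str) -> str:
    """Hypothetical stand-in for a statistical engine that always answers."""
    return f"<smt: {text}>"


def hybrid_translate(text: str) -> Tuple[str, str]:
    """Prefer the rule-based result; fall back to the statistical engine."""
    result = rule_based_translate(text)
    if result is not None:
        return "rule-based", result
    return "statistical", statistical_translate(text)
```

With this arrangement, in-domain utterances get the grammar-based output, while out-of-domain utterances still receive a translation rather than none.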
Full translation proceeds as follows.
First, the Dynaspeak speech recognition system sends a
transcribed utterance in the source language to a translation engine, which finds a grammatical, natural-sounding, and semantically equivalent utterance in the target language based on its translation model or rules. This target-language translation is then output through the speech synthesizer.
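The three stages described above (recognition, translation, synthesis) can be sketched as a chain of functions. This is a minimal illustration in Python under stated assumptions: the class name, the function signatures, and the toy stand-ins are all hypothetical and do not reflect the actual interfaces of Dynaspeak, the translation engines, or the Cepstral synthesizer.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpeechTranslationPipeline:
    """Hypothetical three-stage pipeline: recognize, translate, synthesize."""
    recognize: Callable[[bytes], str]    # audio -> source-language text
    translate: Callable[[str], str]      # source text -> target-language text
    synthesize: Callable[[str], bytes]   # target text -> audio

    def run(self, audio: bytes) -> bytes:
        transcript = self.recognize(audio)        # speech recognition
        translation = self.translate(transcript)  # machine translation
        return self.synthesize(translation)       # speech synthesis


# Toy stand-ins so the sketch runs end to end; real components would
# process actual audio and perform actual translation.
def fake_recognizer(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_translator(text: str) -> str:
    return text.upper()


def fake_synthesizer(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = SpeechTranslationPipeline(
    recognize=fake_recognizer,
    translate=fake_translator,
    synthesize=fake_synthesizer,
)
```

Because each stage is passed in as a plain function, the translation step could be either kind of engine without changing the rest of the chain. A bidirectional system would run one such pipeline per direction.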
History of full spontaneous translation
Work on the full translation system began in early 2002.
The use of Gemini for producing computer-generated translations was
inspired by a previous SRI project called the Spoken Language Translator,
which lasted from 1992 until 1999. The Spoken Language Translator,
one of the first and most successful projects in the area of automatic
speech translation, was able to translate among English, French, and
Swedish in the domain of air travel planning.
At the heart of the Spoken Language Translator was a natural language processing system called the Core Language Engine, a predecessor of Gemini.