Call for Workshop Papers

HIGHER-LEVEL LINGUISTIC AND OTHER KNOWLEDGE FOR AUTOMATIC SPEECH PROCESSING
(Workshop in conjunction with NAACL/HLT 2004)

The Park Plaza Hotel, Boston, Massachusetts
Thursday, May 6, 2004

The theme of this workshop is the use of higher-level linguistic and other types of knowledge for automatic speech processing, especially, but not limited to, speech recognition (ASR). Most current state-of-the-art speech recognizers do not explicitly use linguistic information (with the exception of pronunciation dictionaries), relying mainly on information encoded in statistical N-gram language models. Higher-level linguistic processes such as prosody, syntax, semantics, and pragmatics are obviously important, but such information is typically harder to label, model, and integrate into the standard computational frameworks (such as hidden Markov models). In addition, high-level meta-information, such as personal information stored in a database or dialogue and pragmatic coherence constraints, can also play important roles.

All these sources of information can potentially compensate for acoustic confusability resulting from noisy environments and unexpected channel and speaker mismatch, which are very challenging issues for automatic speech recognizers. Furthermore, high-level information is typically crucial when the ultimate goal is to interpret the spoken input (i.e., the same sequence of words can mean different things depending on prosodic and syntactic features, as well as pragmatic constraints). Speaker recognition is another field that has recently recognized the importance of higher-level linguistic features, due to the fact that speakers exhibit idiosyncratic prosodic, lexico-syntactic, and pragmatic patterns ("conversational biometrics").

This workshop seeks to bring together researchers in speech, NLP, and linguistics, exploring novel ideas on the use of information beyond the low-level approaches traditionally used in speech processing (frame-level acoustic modeling and N-gram based language modeling). Many vigorous research efforts in this direction are well-established, and some have proven to be very successful, such as structured/dependency language models for speech recognition, or prosodic information for speaker recognition. For limited domains (e.g. travel reservation and financial transactions), semantic information has clearly been useful for improving speech recognition.

Recently, more human knowledge resources that encode different aspects of syntax, semantics, ontology, and common-sense knowledge have become available, and could well be used to augment language models to improve speech recognition. Such resources may include, but are not limited to, annotated corpora such as the Penn Treebank and PropBank, as well as FrameNet, WordNet, OpenCyc, etc. One challenge is that conditioning a language model on such information typically leads to data sparseness/fragmentation, so a proper representation of such knowledge is absolutely critical to success.

This workshop seeks to improve the dissemination and exchange of ideas, methods, and data resources that are relevant to further progress. This workshop is seeking papers that present novel ideas of how higher-level linguistic and other types of information can be utilized for automatic speech processing, as well as experimental results.

IMPORTANT DATES

Wed, Jan 21, 2004    Submissions due
Fri, Feb 20, 2004    Acceptance/rejection notification
Mon, Mar 8, 2004     Camera ready copy due
Thu, May 6, 2004     Workshop

SUBMISSION FORMAT

The format and length requirements will be the same as for full papers of NAACL/HLT 2004, except that submissions need not be anonymized. For details, go to http://www1.cs.columbia.edu/~pablo/hlt-naacl04/callpapers.html.

SUBMISSION PROCEDURE

Papers should be sent to hlt-workshop@speech.sri.com. The paper should be an attachment in PDF format and the heading on the email should read "PAPER SUBMISSION". Notification of acceptance or rejection will be sent to the originating email address.

PROGRAM COMMITTEE

Yuqing Gao (Co-chair) (IBM TJ Watson Research Center)
Hong-Kwang Jeff Kuo (Co-chair) (IBM TJ Watson Research Center)
Andreas Stolcke (Co-chair) (SRI & ICSI)
Jerome Bellegarda (Apple Computer)
Ciprian Chelba (Microsoft Research)
Jennifer Chu-Carroll (IBM TJ Watson Research Center)
Dan Jurafsky (University of Colorado)
Sanjeev Khudanpur (Johns Hopkins University)
Martha Palmer (U. Penn)
Barbara Peskin (ICSI)
Roberto Pieraccini (IBM TJ Watson Research Center)
Roni Rosenfeld (CMU)
Julia Hirschberg (Columbia University)
Stephanie Seneff (MIT)

CONTACT INFORMATION

All inquiries should be sent to hlt-workshop@speech.sri.com with the SUBJECT heading "NAACL/HLT WORKSHOP INQUIRY".