Speech Technology and Research Laboratory

Socio-Cultural Information from Spoken Interactions


Kristin Precoda
Dilek Hakkani-Tür
Geoffrey Raymond
Colleen Richey
Elizabeth Shriberg
Gokhan Tur
Sibel Yaman
Wen Wang
Ximena Avila
Gabriel Jiva

Project Summary

The Socio-Cultural Content in Language (SCIL) Program from the Intelligence Advanced Research Projects Activity (IARPA) aims to develop methods to correlate the social goals of the members of a group with the language that they use. SRI is carrying out research within the SCIL Program to model and automatically detect speaker roles, social relations, and speaker characteristics from spoken interactions. SRI's work is informed by the insights and theories of conversation analysis, and builds upon SRI's state-of-the-art speech recognizer and spoken language processing tools for multiple languages. The approach is to develop a cross-language and cross-genre computational framework, making use of not only lexical information but also phenomena such as turn-taking behavior, prosodic information, dialog act tags, features related to social network analysis, and many others. We are working with three languages (American English, Modern Standard Arabic, and Mandarin Chinese) to ensure that our approaches are generalizable to other languages, and to learn about differences between languages and cultures.

SCIL Overview

One focus of our work is on speaker roles related to a person's institutional identity within an interaction. Because such institutional identities are constituted in interaction through recurrent patterns of action and a specialization of forms that speakers produce in ordinary conversation, the identities can be automatically detected. Critical to this approach is the leveraging of basic patterns that underpin a broad range of institutional settings, without losing the specificity that differentiates these settings. For example, patterns of questioning and answering are constitutive of distinct speaker roles across a range of institutional environments: talk show hosts gather information by asking experts and other sources questions, doctors seek to understand the problems confronting their patients by asking them questions about their symptoms and experiences, police seek information by asking questions of witnesses, lawyers and judges conduct trials by questioning witnesses, and so on. As a consequence, the detection of such patterns provides important clues about the social roles that speakers occupy in an interaction. In addition to these patterns, other features are characteristically used or characteristically avoided in specific institutional environments. For example, doctors almost never respond to patient answers to medically relevant questions with "oh" since that would treat the information as personally relevant to the doctor, or with "you're kidding" since that would indicate surprise. Instead, the forms that they typically use, such as "right" or "okay," provide a more neutral form of acknowledgement.
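
As a rough illustration of the cues described above, and not a description of SRI's actual system, the sketch below counts per-speaker questions and acknowledgement tokens in a transcribed interaction. The function name `role_cues`, the token lists, and the use of a trailing question mark to stand in for a real question detector are all assumptions made for the example; spoken data would normally require a dialog act tagger instead.

```python
from collections import defaultdict

# Hypothetical token lists, taken from the examples in the text above.
NEUTRAL_ACKS = {"right", "okay"}
INVOLVED_ACKS = {"oh", "you're kidding"}


def role_cues(turns):
    """Compute simple per-speaker cue rates.

    turns: list of (speaker, text) pairs, one per turn.
    Returns {speaker: {"question_rate": ..., "neutral_ack_rate": ...,
                       "involved_ack_rate": ...}}.
    """
    counts = defaultdict(
        lambda: {"turns": 0, "questions": 0, "neutral": 0, "involved": 0}
    )
    for speaker, text in turns:
        c = counts[speaker]
        c["turns"] += 1
        normalized = text.strip().lower()
        # Assumption: a trailing "?" marks a question in the transcript.
        if normalized.endswith("?"):
            c["questions"] += 1
        # A turn counts as an acknowledgement only if it is the bare token.
        token = normalized.rstrip(".!?,")
        if token in NEUTRAL_ACKS:
            c["neutral"] += 1
        elif token in INVOLVED_ACKS:
            c["involved"] += 1
    return {
        speaker: {
            "question_rate": c["questions"] / c["turns"],
            "neutral_ack_rate": c["neutral"] / c["turns"],
            "involved_ack_rate": c["involved"] / c["turns"],
        }
        for speaker, c in counts.items()
    }


turns = [
    ("A", "Where does it hurt?"),
    ("B", "My knee."),
    ("A", "Okay."),
    ("A", "How long has it hurt?"),
    ("B", "Two weeks."),
    ("A", "Right."),
]
cues = role_cues(turns)
```

In this toy exchange speaker A asks all the questions and uses only neutral acknowledgements, which is the kind of pattern the text associates with a role such as doctor or interviewer. Rates like these would at most serve as input features to a trained model, not as a classifier on their own.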

Another focus of the work is the social relations between individuals within a group. An important factor that researchers have linked to the social efficacy of groups (how likely the group is to affect others' beliefs and how likely the participants are to act as a group) and to their durability (e.g., how stable or unstable a group is, and what factors are most likely to contribute to, or undermine, its sustained organization and efficacy) is the degree of cohesion of the group or social relationship. Does a group consist of like-minded individuals who share a common world-view and underlying set of values, or are their interactions fraught with disagreement and conflict? Inferences can be drawn by evaluating how speakers produce and respond to yes/no questions, and from patterns of agreement and disagreement more generally. We begin with the understanding that the basic sequences of actions through which groups are formed, sustained, and transformed in interaction are systematically biased against conflict (Heritage 1984, Schegloff 2007). Against this backdrop, we examine how speakers pose questions and statements to recipients, and how those recipients respond, to infer information about group and social relations.
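
One simple way to picture how agreement and disagreement patterns could feed an estimate of cohesion is sketched below. It assumes that each response has already been labeled `"agree"`, `"disagree"`, or something else by an upstream detector; the labels and the function name `agreement_ratio` are hypothetical, and this is not presented as the measure used in the project.

```python
def agreement_ratio(response_labels):
    """Share of agreeing responses among those that agree or disagree.

    response_labels: iterable of strings, e.g. "agree", "disagree", "other".
    Returns a float in [0, 1], or None if no response takes a stance.
    """
    labels = list(response_labels)
    agree = sum(1 for label in labels if label == "agree")
    disagree = sum(1 for label in labels if label == "disagree")
    total = agree + disagree
    if total == 0:
        return None  # no evidence either way
    return agree / total


ratio = agreement_ratio(["agree", "agree", "disagree", "other"])
```

Because interaction is described above as systematically biased against conflict, a ratio like this would presumably be read against a baseline for the genre and language rather than against an even split; any specific threshold would be a further assumption.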

Recent Publications

Wen Wang, Kristin Precoda, Colleen Richey, Geoffrey Raymond. "Identifying Agreement/Disagreement in Conversational Speech: A Cross-lingual Study", Interspeech 2011, Florence, Italy, August 2011.

Wen Wang, Sibel Yaman, Kristin Precoda, Colleen Richey, Geoffrey Raymond. "Detection of Agreement and Disagreement in Broadcast Conversations", ACL/HLT 2011, Portland, OR, June 2011.

Wen Wang, Sibel Yaman, Kristin Precoda, Colleen Richey. "Automatic Identification of Speaker Role and Agreement/Disagreement in Broadcast Conversation", ICASSP 2011, Prague, Czech Republic, May 2011.

Sibel Yaman, Dilek Hakkani-Tür, Gokhan Tur. "Detection of Social Roles in Conversations Using Dynamic Bayesian Networks", Interspeech 2010, Makuhari, Japan, September 2010.


We have developed and delivered a software system that uses the Service-Oriented Architecture (SOA) paradigm and models speaker roles and group cohesion in the context of broadcast conversations and meetings.


©2011 SRI International, 333 Ravenswood Avenue, Menlo Park, CA 94025-3493
SRI International is an independent, nonprofit corporation.

Last modified Oct 31, 2011