Socio-Cultural Information from Spoken Interactions
The Socio-Cultural Content in Language (SCIL) Program
from the Intelligence Advanced Research Projects Activity (IARPA)
aims to develop methods to correlate the social goals of the members of a group with the language that they use. SRI is carrying out research within the SCIL Program to model and automatically
detect speaker roles, social relations, and speaker characteristics from spoken interactions. SRI's work is informed by the insights and theories of conversation analysis, and builds upon SRI's
state-of-the-art speech recognizer and spoken language processing tools for multiple languages. The approach is to develop a cross-language and cross-genre computational framework, making use of not only
lexical information but also phenomena such as turn-taking behavior, prosodic information, dialog act tags, features related to social network analysis, and many others. We are working with three
languages (American English, Modern Standard Arabic, and Mandarin Chinese) to ensure that our approaches are generalizable to other languages, and to learn about differences between languages and cultures.
One focus of our work is on speaker roles related to a person's institutional identity within an interaction. Because such institutional identities are constituted in interaction through recurrent patterns
of action and a specialization of forms that speakers produce in ordinary conversation, the identities can be automatically detected. This approach depends on leveraging basic patterns that underpin a
broad range of institutional settings, without losing the specificity that differentiates these settings. For example, patterns of questioning and answering are constitutive of distinct speaker roles across a
range of institutional environments: talk show hosts gather information by asking experts and other sources questions, doctors seek to understand the problems confronting their patients by asking them questions about their
symptoms and experiences, police seek information by asking questions of witnesses, lawyers and judges conduct trials by questioning witnesses, and so on. As a consequence, the detection of such patterns provides important
clues about the social roles that speakers occupy in an interaction. In addition to these patterns, other features are characteristically used or characteristically avoided in specific institutional environments. For example, doctors
almost never respond to patient answers to medically relevant questions with "oh" since that would treat the information as personally relevant to the doctor, or with "you're kidding" since that would indicate surprise. Instead,
the forms that they typically use, such as "right" or "okay," provide a more neutral form of acknowledgement.
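As a rough illustration of the cues described above, the following Python sketch counts, per speaker, how often a turn is a question and how often it opens with a neutral or a marked acknowledgement token. It is not SRI's system: the `Turn` type, the `role_cues` function, the two token lists, and the use of a trailing question mark to spot questions are all assumptions made for this example. A real system would more plausibly draw on dialog act tags and prosodic information than on transcript punctuation.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical token lists, chosen only to mirror the examples in the text.
NEUTRAL_ACKS = {"right", "okay"}
MARKED_ACKS = {"oh", "you're kidding"}


@dataclass
class Turn:
    speaker: str
    text: str


def _first_phrase(text):
    """Return the lowercased turn-initial phrase, cut at the first , . ! or ?"""
    lowered = text.lower().strip()
    for mark in ",.!?":
        lowered = lowered.split(mark)[0]
    return lowered.strip()


def role_cues(turns):
    """Count simple role cues for each speaker in a list of turns.

    Assumption: transcripts are punctuated, so a turn ending in "?" is
    treated as a question.
    """
    cues = {}
    for turn in turns:
        counts = cues.setdefault(turn.speaker, Counter())
        counts["turns"] += 1
        if turn.text.strip().endswith("?"):
            counts["questions"] += 1
        opener = _first_phrase(turn.text)
        if opener in NEUTRAL_ACKS:
            counts["neutral_acks"] += 1
        elif opener in MARKED_ACKS:
            counts["marked_acks"] += 1
    return cues
```

Under these assumptions, a speaker who asks most of the questions and acknowledges answers mainly with neutral tokens would look more like the questioner in an institutional setting, such as a doctor or a talk show host, than like the person being questioned. The counts are only candidate features; turning them into a role decision would require a trained model.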
Another focus of the work is the social relations between individuals within a group. Researchers have linked the degree of cohesion of a group or social relationship both to the group's social efficacy (how likely the group is to
affect others' beliefs and how likely the participants are to act as a group) and to its durability (e.g., how stable or unstable a group is, and what factors are most likely to contribute to, or
undermine, its sustained organization and efficacy). Does a group consist of like-minded individuals who share a common world-view and
underlying set of values, or are their interactions fraught with disagreement and conflict? Inferences can be drawn by evaluating how speakers produce and respond to yes/no questions, and from
patterns of agreement and disagreement more generally. We begin with the understanding that the basic sequences of actions through which groups are formed, sustained, and transformed in interaction are
systematically biased against conflict (Heritage 1984, Schegloff 2007). Against this backdrop, we examine how speakers pose questions and statements to recipients, and how those recipients respond,
to infer information about group and social relations.
Wen Wang, Kristin Precoda, Colleen Richey, Geoffrey Raymond. "Identifying Agreement/Disagreement in Conversational Speech: A Cross-lingual Study", Interspeech 2011, Florence, Italy, August 2011.
Wen Wang, Sibel Yaman, Kristin Precoda, Colleen Richey, Geoffrey Raymond. "Detection of Agreement and Disagreement in Broadcast Conversations", ACL/HLT 2011, Portland, OR, June 2011.
Wen Wang, Sibel Yaman, Kristin Precoda, Colleen Richey. "Automatic Identification of Speaker Role and Agreement/Disagreement in Broadcast Conversation", ICASSP 2011, Prague, Czech Republic, May 2011.
Sibel Yaman, Dilek Hakkani-Tür, Gokhan Tur. "Detection of Social Roles in Conversations Using Dynamic Bayesian Networks", Interspeech 2010, Makuhari, Japan, September 2010.
We have developed and delivered a software system, built on the Service-Oriented Architecture (SOA) paradigm, that models speaker roles and group cohesion in broadcast conversations and meetings.