329 research outputs found

    Toward a social signaling framework : activity and emphasis in speech

    Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 67-70). Language is not the only form of verbal communication. Loudness, pitch, speaking rate, and other non-linguistic speech features are crucial aspects of human spoken interaction. In this thesis, we separate these speech features into two categories -- vocal Activity and vocal Emphasis -- and propose a framework for classifying high-level social behavior according to those metrics. We present experiments showing that non-linguistic speech analysis alone can account for appreciable portions of social phenomena. We report statistically significant results in measuring the persuasiveness of pitches, the effectiveness of customer service representatives, and the severity of depression. Effect sizes of these studies explain up to 60% of the sample variances and yield binary decision accuracies nearing 90%. By William T. Stoltzman. M.Eng.

    Computational modeling of turn-taking dynamics in spoken conversations

    The study of human interaction dynamics has been central to multiple research disciplines, including computer and social sciences, conversation analysis, and psychology, for decades. Recent work has aimed at designing computational models that improve human-machine interaction systems and support humans in their decision-making process. Turn-taking is one of the key aspects of conversational dynamics in dyadic conversations and is an integral part of human-human and human-machine interaction systems. It is used for discourse organization of a conversation by means of explicit phrasing, intonation, and pausing, and it involves intricate timing. In verbal (e.g., telephone) conversation, turn transitions are facilitated by inter- and intra-speaker silences and overlaps. Early research on turn-taking in the speech community studied durational aspects of turns, cues for turn-yielding intention, and, lastly, turn-transition models for spoken dialog agents. Compared with studies of turn transitions, very little work has addressed the classification of overlap discourse, especially the competitive act of overlaps and the function of silences. Given the limitations of the current state of the art, this dissertation focuses on two aspects of conversational dynamics: (1) designing automated computational models for analyzing turn-taking behavior in a dyadic conversation, and (2) predicting the outcome of the conversations, i.e., observed user satisfaction, using turn-taking descriptors. These two aspects are later used to design a conversational profile for each speaker from turn-taking behavior and the outcome of the conversations. The analysis, experiments, and evaluation have been carried out on a large dataset of Italian call-center spoken conversations in which customers and agents are engaged in real problem-solving tasks.
The challenges in reaching our research goal include automatically segmenting and aligning the speakers’ channels from the speech signal, and identifying and labeling the turn types and their functional aspects. The task becomes more challenging due to the presence of overlapping speech. To model turn-taking behavior, the intention behind these overlapping turns needs to be considered. However, the most critical question is how to model observed user satisfaction in a dyadic conversation and which properties of turn-taking behavior can be used to represent and predict the outcome. Thus, the computational models for analyzing turn-taking dynamics in this dissertation include automatic segmentation and labeling of turn types, categorization of competitive vs. non-competitive overlaps, silences (e.g., lapses, pauses), and functions of turns in terms of dialog acts. The novel contributions of the work presented here are: (1) the design of a fully automated turn segmentation and labeling system (e.g., agent vs. customer turns, lapses within a speaker, and overlaps); (2) the design of annotation guidelines for segmenting and annotating speech overlaps with competitive and non-competitive labels; (3) a demonstration of how different channels of information, such as acoustic, linguistic, and psycholinguistic feature sets, perform in the classification of competitive vs. non-competitive overlaps; (4) a study of the role of speakers and context (i.e., agents’ and customers’ speech) in conveying information about competitiveness, for each individual feature set and their combinations; and (5) an investigation of the function of long silences in the information flow of a dyadic conversation.
The extracted turn-taking cues are then used to automatically predict the outcome of the conversation, which is modeled from continuous manifestations of emotion. The contributions include: (1) modeling the state of the observed user satisfaction in terms of the final emotional manifestation of the customer (i.e., user); (2) analyzing and modeling turn-taking properties to show how each turn type influences user satisfaction; and (3) studying how turn-taking behavior changes within each emotional state. Based on the studies conducted in this work, it is demonstrated that turn-taking behavior, especially the competitiveness of overlaps, is more than just an organizational tool in daily human interactions. It carries useful information and has the power to predict the outcome of the conversation in terms of satisfied vs. not satisfied. Combining the turn-taking behavior and the outcome of the conversation, the final goal is to design a conversational profile for each speaker. Such profile information would not only help domain experts but would also be useful to the call center agent in real time. These systems are fully automated and require no human intervention. The findings are potentially relevant to research on overlapping speech and the automatic analysis of human-human and human-machine interactions.

    Criteria for developing banking interactive voice response user interface

    The goal of this paper was to facilitate the evolution of a usable and consistent style of user interface for IVR banking. The idea was to design an IVR banking user interface that helps customers, banks, and other interested parties access banking telephony services. To that end, telephone-based user interface cases for banking purposes were gathered through a far-reaching library search and online questionnaires. The results were used to develop draft guidelines. Following that, extensive structured interviews with audio-based user interface experts as well as experienced IVR banking users were carried out in order to validate the guidelines and to produce the user interface scripts. This method provides an effective way of collecting primary data by directly taking into account the target audience’s views, opinions, and perspectives. By the end of the project, we managed to tabulate most aspects of telephone-based user interfaces that should be considered when developing an IVR banking user interface. The outcome was used in designing IVR banking user interface hypotheses. This paper also illustrates how the general hypotheses obtained provide guidance for script development, which in turn enables the translation of those hypotheses into actual scripts by categorizing them into three types of interactions, namely messages, prompts, and information. The hypotheses and the actual IVR banking scripts are a significant contribution to IVR banking system developers, banks, and the field of audio-based human-computer interaction (HCI).

    Interactively skimming recorded speech

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1994. Includes bibliographical references (p. 143-156). Barry Michael Arons. Ph.D.

    REAL-TIME ANGER DETECTION IN ARABIC SPEECH DIALOGS
