1,322 research outputs found

    Surface and Contextual Linguistic Cues in Dialog Act Classification: A Cognitive Science View

    Full text link
    What role do linguistic cues on a surface and contextual level have in identifying the intention behind an utterance? Drawing on the wealth of studies and corpora from the computational task of dialog act classification, we studied this question from a cognitive science perspective. We first reviewed the role of linguistic cues in dialog act classification studies that evaluated model performance on three of the most commonly used English dialog act corpora. Findings show that frequency‐based, machine learning, and deep learning methods all yield similar performance. Classification accuracies, moreover, generally do not explain which specific cues yield high performance. Using a cognitive science approach, in two analyses, we systematically investigated the role of cues in the surface structure of the utterance and cues of the surrounding context individually and combined. By comparing the explained variance, rather than the prediction accuracy of these cues in a logistic regression model, we found that (1) while surface and contextual linguistic cues can complement each other, surface linguistic cues form the backbone in human dialog act identification, (2) with word frequency statistics being particularly important for the dialog act, and (3) the similar trends across corpora, despite differences in the type of dialog, corpus setup, and dialog act tagset. The importance of surface linguistic cues in dialog act classification sheds light on how both computers and humans take advantage of these cues in speech act recognition

    Computational modeling of turn-taking dynamics in spoken conversations

    Get PDF
    The study of human interaction dynamics has been at the center for multiple research disciplines in- cluding computer and social sciences, conversational analysis and psychology, for over decades. Recent interest has been shown with the aim of designing computational models to improve human-machine interaction system as well as support humans in their decision-making process. Turn-taking is one of the key aspects of conversational dynamics in dyadic conversations and is an integral part of human- human, and human-machine interaction systems. It is used for discourse organization of a conversation by means of explicit phrasing, intonation, and pausing, and it involves intricate timing. In verbal (e.g., telephone) conversation, the turn transitions are facilitated by inter- and intra- speaker silences and over- laps. In early research of turn-taking in the speech community, the studies include durational aspects of turns, cues for turn yielding intention and lastly designing turn transition modeling for spoken dia- log agents. Compared to the studies of turn transitions very few works have been done for classifying overlap discourse, especially the competitive act of overlaps and function of silences. Given the limitations of the current state-of-the-art, this dissertation focuses on two aspects of con- versational dynamics: 1) design automated computational models for analyzing turn-taking behavior in a dyadic conversation, 2) predict the outcome of the conversations, i.e., observed user satisfaction, using turn-taking descriptors, and later these two aspects are used to design a conversational profile for each speaker using turn-taking behavior and the outcome of the conversations. The analysis, experiments, and evaluation has been done on a large dataset of Italian call-center spoken conversations where customers and agents are engaged in real problem-solving tasks. Towards solving our research goal, the challenges include automatically segmenting and aligning speakers’ channel from the speech signal, identifying and labeling the turn-types and its functional aspects. The task becomes more challenging due to the presence of overlapping speech. To model turn- taking behavior, the intension behind these overlapping turns needed to be considered. However, among all, the most critical question is how to model observed user satisfaction in a dyadic conversation and what properties of turn-taking behavior can be used to represent and predict the outcome. Thus, the computational models for analyzing turn-taking dynamics, in this dissertation includes au- tomatic segmenting and labeling turn types, categorization of competitive vs non-competitive overlaps, silences (e.g., lapse, pauses) and functions of turns in terms of dialog acts. The novel contributions of the work presented here are to 1. design of a fully automated turn segmentation and labeling (e.g., agent vs customer’s turn, lapse within the speaker, and overlap) system. 2. the design of annotation guidelines for segmenting and annotating the speech overlaps with the competitive and non-competitive labels. 3. demonstrate how different channels of information such as acoustic, linguistic, and psycholin- guistic feature sets perform in the classification of competitive vs non-competitive overlaps. 4. study the role of speakers and context (i.e., agents’ and customers’ speech) for conveying the information of competitiveness for each individual feature set and their combinations. 5. investigate the function of long silences towards the information flow in a dyadic conversation. The extracted turn-taking cues is then used to automatically predict the outcome of the conversation, which is modeled from continuous manifestations of emotion. The contributions include 1. modeling the state of the observed user satisfaction in terms of the final emotional manifestation of the customer (i.e., user). 2. analysis and modeling turn-taking properties to display how each turn type influence the user satisfaction. 3. study of how turn-taking behavior changes within each emotional state. Based on the studies conducted in this work, it is demonstrated that turn-taking behavior, specially competitiveness of overlaps, is more than just an organizational tool in daily human interactions. It represents the beneficial information and contains the power to predict the outcome of the conversation in terms of satisfaction vs not-satisfaction. Combining the turn-taking behavior and the outcome of the conversation, the final and resultant goal is to design a conversational profile for each speaker. Such profiled information not only facilitate domain experts but also would be useful to the call center agent in real time. These systems are fully automated and no human intervention is required. The findings are po- tentially relevant to the research of overlapping speech and automatic analysis of human-human and human-machine interactions

    Filled pauses and prolongations in Roman Italian task-oriented dialogue

    Get PDF
    This paper presents work in progress on two markers of hesitation in Roman Italian task-oriented dialogue, namely filled pauses and prolongations. We investigate their form, relative frequency, and distributional characteristics in Italian. Initial results suggest that Italian speakers produce prolongations more frequently than filled pauses, and that the prototypical hesitant prolongation involves a word-final vowel

    Filled pauses and prolongations in Roman Italian task-oriented dialogue

    Get PDF
    This paper presents work in progress on two markers of hesitation in Roman Italian task-oriented dialogue, namely filled pauses and prolongations. We investigate their form, relative frequency, and distributional characteristics in Italian. Initial results suggest that Italian speakers produce prolongations more frequently than filled pauses, and that the prototypical hesitant prolongation involves a word-final vowel

    Affective Brain-Computer Interfaces

    Get PDF

    A review of theories and methods in the science of face-to-face social interaction

    Get PDF
    For most of human history, face-to-face interactions have been the primary and most fundamental way to build social relationships, and even in the digital era they remain the basis of our closest bonds. These interactions are built on the dynamic integration and coordination of verbal and non-verbal information between multiple people. However, the psychological processes underlying face-to-face interaction remain difficult to study. In this Review, we discuss three ways the multimodal phenomena underlying face-to-face social interaction can be organized to provide a solid basis for theory development. Next, we review three types of theory of social interaction: theories that focus on the social meaning of actions, theories that explain actions in terms of simple behaviour rules and theories that rely on rich cognitive models of the internal states of others. Finally, we address how different methods can be used to distinguish between theories, showcasing new approaches and outlining important directions for future research. Advances in how face-to-face social interaction can be studied, combined with a renewed focus on cognitive theories, could lead to a renaissance in social interaction research and advance scientific understanding of face-to-face interaction and its underlying cognitive foundations

    A data-driven approach to spoken dialog segmentation

    Get PDF
    In This Paper, We Present A Statistical Model For Spoken Dialog Segmentation That Decides The Current Phase Of The Dialog By Means Of An Automatic Classification Process. We Have Applied Our Proposal To Three Practical Conversational Systems Acting In Different Domains. The Results Of The Evaluation Show That Is Possible To Attain High Accuracy Rates In Dialog Segmentation When Using Different Sources Of Information To Represent The User Input. Our Results Indicate How The Module Proposed Can Also Improve Dialog Management By Selecting Better System Answers. The Statistical Model Developed With Human-Machine Dialog Corpora Has Been Applied In One Of Our Experiments To Human-Human Conversations And Provides A Good Baseline As Well As Insights In The Model Limitation

    Layered structures in dialogues:from what to how and vv

    Get PDF
    corecore