
    Conversational Exploratory Search via Interactive Storytelling

    Conversational interfaces are likely to become a more efficient, intuitive, and engaging way of human-computer interaction than today's text- or touch-based interfaces. Current research efforts concerning conversational interfaces focus primarily on question-answering functionality, thereby neglecting support for search activities beyond targeted information lookup. Users engage in exploratory search when they are unfamiliar with the domain of their goal, unsure about the ways to achieve their goal, or unsure about their goal in the first place. Exploratory search is often supported by approaches from information visualization; however, such approaches cannot be directly translated to the setting of conversational search. In this paper we investigate the affordances of interactive storytelling as a tool to enable exploratory search within the framework of a conversational interface. Interactive storytelling provides a way to navigate a document collection at the pace and in the order a user prefers. In our vision, interactive storytelling is to be coupled with a dialogue-based system that provides verbal explanations and responsive design. We discuss the challenges and sketch the research agenda required to bring this vision to life. Comment: Accepted at the ICTIR'17 Workshop on Search-Oriented Conversational AI (SCAI 2017).

    Prosody-Based Adaptive Metaphoric Head and Arm Gestures Synthesis in Human Robot Interaction

    In human-human interaction, the process of communication can be established through three modalities: verbal, non-verbal (i.e., gestures), and/or para-verbal (i.e., prosody). The linguistic literature shows that para-verbal and non-verbal cues are naturally aligned and synchronized; however, the natural mechanism of this synchronization is still unexplored. The difficulty encountered in coordinating prosody and metaphoric head-arm gestures concerns the conveyed meaning, the way of performing gestures with respect to prosodic characteristics, their relative temporal arrangement, and their coordinated organization in the phrasal structure of the utterance. In this research, we focus on the mechanism of mapping between head-arm gestures and speech prosodic characteristics in order to generate robot behavior that adapts to the interacting human's emotional state. Prosody patterns and the motion curves of head-arm gestures are aligned separately into parallel hidden Markov models (HMMs). The mapping between speech and head-arm gestures is based on coupled hidden Markov models (CHMMs), which can be seen as a multi-stream collection of HMMs characterizing the segmented prosody and head-arm gesture data. An emotional-state-based audio-video database has been created for the validation of this study. The obtained results show the effectiveness of the proposed methodology.
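
    As a rough illustration of the parallel per-stream alignment described above (not the authors' implementation), the sketch below fits one Gaussian HMM to a prosody stream and one to a gesture-motion stream and tabulates how their hidden states co-occur; the CHMM used in the paper would additionally condition each chain's transitions on the other chain's previous state. The hmmlearn dependency and all feature values are assumptions for illustration only.

```python
# Illustrative sketch only -- not the paper's CHMM. Two independent Gaussian
# HMMs stand in for the parallel prosody and gesture models; their state
# co-occurrence hints at the coupling a CHMM would model explicitly.
import numpy as np
from hmmlearn import hmm  # assumed third-party dependency

rng = np.random.default_rng(0)
prosody = rng.normal(size=(200, 2))  # placeholder [pitch, energy] frames
gesture = rng.normal(size=(200, 3))  # placeholder head/arm motion-curve frames

prosody_hmm = hmm.GaussianHMM(n_components=4, covariance_type="diag", n_iter=50)
gesture_hmm = hmm.GaussianHMM(n_components=4, covariance_type="diag", n_iter=50)
prosody_hmm.fit(prosody)
gesture_hmm.fit(gesture)

p_states = prosody_hmm.predict(prosody)
g_states = gesture_hmm.predict(gesture)

# A CHMM would learn P(gesture_state_t | gesture_state_{t-1}, prosody_state_{t-1});
# here we only count how the two chains' states co-occur frame by frame.
coupling = np.zeros((4, 4))
for ps, gs in zip(p_states, g_states):
    coupling[ps, gs] += 1
print(coupling / coupling.sum())
```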

    Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech

    We describe a statistical approach for modeling dialogue acts in conversational speech, i.e., speech-act-like units such as Statement, Question, Backchannel, Agreement, Disagreement, and Apology. Our model detects and predicts dialogue acts based on lexical, collocational, and prosodic cues, as well as on the discourse coherence of the dialogue act sequence. The dialogue model is based on treating the discourse structure of a conversation as a hidden Markov model and the individual dialogue acts as observations emanating from the model states. Constraints on the likely sequence of dialogue acts are modeled via a dialogue act n-gram. The statistical dialogue grammar is combined with word n-grams, decision trees, and neural networks modeling the idiosyncratic lexical and prosodic manifestations of each dialogue act. We develop a probabilistic integration of speech recognition with dialogue modeling to improve both speech recognition and dialogue act classification accuracy. Models are trained and evaluated using a large hand-labeled database of 1,155 conversations from the Switchboard corpus of spontaneous human-to-human telephone speech. We achieved good dialogue act labeling accuracy (65% based on errorful, automatically recognized words and prosody, and 71% based on word transcripts, compared to a chance baseline accuracy of 35% and human accuracy of 84%) and a small reduction in word recognition error. Comment: 35 pages, 5 figures. Changes in copy editing (note: title spelling changed).
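
    The HMM view described above lends itself to a compact sketch: treat dialogue acts as hidden states, use a dialogue-act bigram as the transition model, and decode the most likely act sequence with the Viterbi algorithm. The probabilities and per-utterance likelihoods below are toy placeholders for the word n-gram, decision tree, and neural network scores the paper combines; this is a minimal illustration, not the authors' system.

```python
# Minimal sketch (not the authors' code): dialogue acts as hidden states of an
# HMM, a dialogue-act bigram as the transition model, and toy per-act
# likelihoods standing in for the lexical/prosodic models.
import numpy as np

ACTS = ["Statement", "Question", "Backchannel"]

# Toy dialogue-act bigram P(act_t | act_{t-1}) -- illustrative numbers only.
trans = np.array([[0.6, 0.2, 0.2],
                  [0.3, 0.1, 0.6],
                  [0.7, 0.2, 0.1]])
start = np.array([0.7, 0.2, 0.1])

def viterbi(obs_lik, start, trans):
    """obs_lik[t, k] = P(observation at utterance t | act k)."""
    T, K = obs_lik.shape
    delta = np.zeros((T, K))
    back = np.zeros((T, K), dtype=int)
    delta[0] = np.log(start) + np.log(obs_lik[0])
    for t in range(1, T):
        scores = delta[t - 1][:, None] + np.log(trans)
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + np.log(obs_lik[t])
    path = [delta[-1].argmax()]
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    return [ACTS[k] for k in reversed(path)]

# Per-utterance likelihoods from (hypothetical) lexical/prosodic classifiers.
obs_lik = np.array([[0.8, 0.1, 0.1],   # looks like a Statement
                    [0.2, 0.7, 0.1],   # looks like a Question
                    [0.1, 0.1, 0.8]])  # looks like a Backchannel
print(viterbi(obs_lik, start, trans))
```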

    Predicting video-conferencing conversation outcomes based on modeling facial expression synchronization

    Effective video-conferencing conversations are heavily influenced by each speaker's facial expression. In this study, we propose a novel probabilistic model to represent the interactional synchrony of conversation partners' facial expressions in video-conferencing communication. In particular, we use a hidden Markov model (HMM) to capture the temporal properties of each speaker's facial expression sequence. Based on the assumption of mutual influence between conversation partners, we couple their HMMs as two interacting processes. Furthermore, we summarize the multiple coupled HMMs with a stochastic process prior to discover a set of facial synchronization templates shared among the multiple conversation pairs. We validate the model by using the exhibition of these facial synchronization templates to predict the outcomes of video-conferencing conversations. The dataset includes 75 video-conferencing conversations from 150 Amazon Mechanical Turkers in the context of a new-recruit negotiation. The results show that our proposed model achieves higher accuracy in predicting negotiation winners than a support vector machine and canonical HMMs. Further analysis indicates that some synchronized nonverbal templates contribute more to predicting the negotiation outcomes.
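
    A crude analogue of the prediction step (not the paper's coupled-HMM machinery) is sketched below: each conversation pair is summarised by the joint histogram of the two partners' per-frame HMM states, and that synchrony feature is fed to an SVM, the baseline the abstract compares against. The state sequences and outcomes are simulated; scikit-learn is an assumed dependency.

```python
# Minimal sketch (not the paper's model): a joint-state histogram as a crude
# "synchrony" feature per conversation pair, classified with an SVM baseline.
import numpy as np
from sklearn.svm import SVC  # assumed third-party dependency

rng = np.random.default_rng(1)

def synchrony_features(states_a, states_b, n_states=4):
    """Flattened joint histogram of the two partners' HMM state sequences."""
    hist = np.zeros((n_states, n_states))
    for a, b in zip(states_a, states_b):
        hist[a, b] += 1
    return (hist / hist.sum()).ravel()

# 20 simulated conversation pairs with a binary outcome (e.g. negotiation winner).
X = np.stack([synchrony_features(rng.integers(0, 4, 300),
                                 rng.integers(0, 4, 300)) for _ in range(20)])
y = rng.integers(0, 2, 20)

clf = SVC(kernel="rbf").fit(X, y)
print(clf.score(X, y))  # training accuracy on the toy data
```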

    Sensing and modeling human networks

    Thesis (Ph.D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2004. Includes bibliographical references (p. 101-105). This electronic version was submitted by the student author; the certified thesis is available in the Institute Archives and Special Collections.
    Knowledge of how groups of people interact is important in many disciplines, e.g., organizational behavior, social network analysis, knowledge management, and ubiquitous computing. Existing studies of social network interactions have either been restricted to online communities, where unambiguous measurements about how people interact can be obtained (available from chat and email logs), or have been forced to rely on questionnaires, surveys, or diaries to get data on face-to-face interactions between people. The aim of this thesis is to automatically model face-to-face interactions within a community. The first challenge was to collect rich and unbiased sensor data of natural interactions. The "sociometer", a specially designed wearable sensor package, was built to address this problem by unobtrusively measuring face-to-face interactions between people. Using the sociometers, 1,518 hours of wearable sensor data from 23 individuals were collected over a two-week period (66 hours per person). This thesis develops a computational framework for learning the interaction structure and dynamics automatically from the sociometer data. Low-level sensor data are transformed into measures that can be used to learn socially relevant aspects of people's interactions, e.g., identifying when people are talking and whom they are talking to. The network structure is learned from the patterns of communication among people. The dynamics of a person's interactions, and how one person's dynamics affect another's style of interaction, are also modeled. Finally, a person's style of interaction is related to the person's role within the network. The algorithms are evaluated by comparing the output against hand-labeled and survey data.
    by Tanzeem Khalid Choudhury. Ph.D.
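
    As a small illustration of the last step of that pipeline (hypothetical data, not the thesis code), the sketch below turns per-timeslot "A was talking with B" detections into a weighted who-talks-to-whom adjacency matrix from which network structure and simple per-person interaction measures can be read off.

```python
# Illustrative sketch: aggregate pairwise conversation detections into a
# weighted interaction network. The detections themselves are hypothetical.
import numpy as np

people = ["A", "B", "C", "D"]
idx = {p: i for i, p in enumerate(people)}

# Hypothetical detections: (timeslot, speaker, interlocutor)
detections = [(0, "A", "B"), (1, "A", "B"), (2, "C", "D"),
              (3, "B", "C"), (4, "A", "B")]

adj = np.zeros((len(people), len(people)))
for _, a, b in detections:
    adj[idx[a], idx[b]] += 1
    adj[idx[b], idx[a]] += 1  # treat interactions as undirected

print(adj)              # interaction counts between each pair
print(adj.sum(axis=1))  # simple "interaction degree" per person
```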

    Analyzing Group Interactions in Conversations: a Review

    Multiparty face-to-face conversations in professional and social settings represent an emerging research domain for which automatic activity-based analysis is relevant for scientific and practical reasons. The activity patterns emerging from groups engaged in conversations are intrinsically multimodal and thus constitute interesting target problems for multistream and multisensor fusion techniques. In this paper, a summarized review of the literature on automatic analysis of group activities in face-to-face conversational settings is presented. A basic categorization of group activities is proposed based on their typical temporal scale, and existing works are then discussed for various types of activities and trends, including addressing, turn-taking, interest, and dominance.

    Auditory dialog analysis and understanding by generative modelling of interactional dynamics

    In the last few years, interest in the analysis of human behavioral schemes has grown dramatically, in particular for the interpretation of the communication modalities called social signals. These represent well-defined interaction patterns, possibly unconscious, characterizing different conversational situations and behaviors in general. In this paper, we illustrate an automatic system based on a generative structure able to analyze conversational scenarios. The generative model is built by integrating a Gaussian mixture model and the (observed) influence model, and it is fed with a novel kind of simple low-level auditory social signal, termed steady conversational periods (SCPs). These are built on the durations of continuous slots of silence or speech, also taking into account conversational turn-taking. The interactional dynamics built upon the transitions among SCPs provide a behavioral blueprint of conversational settings without relying on segmental or continuous phonetic features. Our contribution here is to show the effectiveness of our model when applied to dialog classification and clustering tasks, considering dialogs between adults and between children and adults, in both flat and arguing discussions, and showing excellent performance also in comparison with state-of-the-art frameworks.
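
    A simplified reading of the SCP construction (an assumption about the exact definition, not the authors' code) is sketched below: given a frame-level labelling of which speakers are active, an SCP is taken to be a maximal run during which that labelling does not change, and its label and duration become the low-level feature fed to the generative model.

```python
# Rough sketch of steady conversational periods (SCPs) under the stated
# assumption: maximal runs of an unchanging set of active speakers.
from itertools import groupby

# Hypothetical 100 ms frames: the set of speakers active in each frame.
frames = [frozenset(), frozenset({"A"}), frozenset({"A"}), frozenset({"A"}),
          frozenset(), frozenset({"B"}), frozenset({"B"}),
          frozenset({"A", "B"}), frozenset({"B"}), frozenset()]

scps = [(label, sum(1 for _ in run)) for label, run in groupby(frames)]
for label, n_frames in scps:
    who = ",".join(sorted(label)) or "silence"
    print(f"{who}: {n_frames * 0.1:.1f} s")
```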

    Visual recognition of American sign language using hidden Markov models

    Thesis (M.S.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1995. Includes bibliographical references (leaves 48-52). by Thad Eugene Starner. M.S.

    Automatic recognition of multiparty human interactions using dynamic Bayesian networks

    Applying statistical machine learning approaches to the automatic analysis of multiparty communicative events, such as meetings, is an ambitious research area. We have investigated automatic meeting segmentation both in terms of “Meeting Actions” and “Dialogue Acts”. Dialogue acts model the discourse structure at a fine-grained level, highlighting individual speaker intentions. Group meeting actions describe the same process at a coarse level, highlighting interactions between different meeting participants and showing overall group intentions. A framework based on probabilistic graphical models such as dynamic Bayesian networks (DBNs) has been investigated for both tasks. Our first set of experiments is concerned with the segmentation and structuring of meetings (recorded using multiple cameras and microphones) into sequences of group meeting actions such as monologue, discussion, and presentation. We outline four families of multimodal features based on speaker turns, lexical transcription, prosody, and visual motion that are extracted from the raw audio and video recordings. We relate these low-level multimodal features to complex group behaviours, proposing a multi-stream modelling framework based on dynamic Bayesian networks. Later experiments are concerned with the automatic recognition of Dialogue Acts (DAs) in multiparty conversational speech. We present a joint generative approach based on a switching DBN for DA recognition, in which segmentation and classification of DAs are carried out in parallel. This approach models a set of features related to lexical content and prosody, and incorporates a weighted interpolated factored language model. In conjunction with this joint generative model, we have also investigated the use of a discriminative approach, based on conditional random fields, to perform a reclassification of the segmented DAs. The DBN-based approach yielded significant improvements when applied both to the meeting action and the dialogue act recognition task. On both tasks, the DBN framework provided an effective factorisation of the state space and a flexible infrastructure able to integrate a heterogeneous set of resources such as continuous and discrete multimodal features and statistical language models. Although our experiments have principally targeted multiparty meetings, the features, models, and methodologies developed in this thesis can be employed for a wide range of applications. Moreover, both group meeting actions and DAs offer valuable insights into the current conversational context, providing valuable cues and features for several related research areas such as speaker addressing and focus-of-attention modelling, automatic speech recognition and understanding, and topic and decision detection.
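
    As a heavily simplified stand-in for the DBN framework (not the thesis's factored models), the sketch below concatenates placeholder speaker-turn, prosodic, and motion features into a single observation stream, decodes it with a flat Gaussian HMM whose states play the role of group meeting actions, and collapses the frame-level labels into segments. hmmlearn and all feature values are assumptions for illustration.

```python
# Simplified stand-in (not the thesis's DBN): multi-stream features are
# concatenated and decoded with a flat Gaussian HMM; the thesis instead
# factorises this state space with a dynamic Bayesian network.
import numpy as np
from hmmlearn import hmm  # assumed third-party dependency

rng = np.random.default_rng(2)
turn_feats    = rng.normal(size=(500, 2))  # placeholder speaker-turn statistics
prosody_feats = rng.normal(size=(500, 2))  # placeholder pitch/energy statistics
motion_feats  = rng.normal(size=(500, 3))  # placeholder visual-motion statistics

obs = np.hstack([turn_feats, prosody_feats, motion_feats])
meeting_hmm = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=30)
meeting_hmm.fit(obs)

actions = meeting_hmm.predict(obs)  # frame-level group-action labels
# Collapse frames into (start, end, action) segments for meeting segmentation.
changes = np.flatnonzero(np.diff(actions)) + 1
bounds = np.concatenate([[0], changes, [len(actions)]])
segments = [(int(s), int(e), int(actions[s])) for s, e in zip(bounds[:-1], bounds[1:])]
print(segments[:5])
```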