17 research outputs found

    Analyzing Nonverbal Listener Responses using Parallel Recordings of Multiple Listeners

    In this paper we study nonverbal listener responses in a corpus with multiple listeners recorded in parallel. These listeners were led to believe that they were the sole listener, while in fact three persons were listening to the same speaker. The speaker could only see one of the listeners. We analyze the impact of this particular setup on the behavior and perception of the two types of listeners: those that could be seen by the speaker and those that could not. Furthermore, we compare the nonverbal listening behaviors of these three listeners to each other with regard to timing and form, and we correlate these behaviors with behaviors of the speaker, such as pauses and whether the speaker is looking at the listeners or not.

    Continuous Interaction with a Virtual Human

    Attentive Speaking and Active Listening require that a Virtual Human be capable of simultaneous perception/interpretation and production of communicative behavior. A Virtual Human should be able to signal its attitude and attention while it is listening to its interaction partner, and be able to attend to its interaction partner while it is speaking, and modify its communicative behavior on the fly based on what it perceives from its partner. This report presents the results of a four-week summer project that was part of eNTERFACE’10. The project resulted in progress on several aspects of continuous interaction, such as scheduling and interrupting multimodal behavior, automatic classification of listener responses, generation of response-eliciting behavior, and models for appropriate reactions to listener responses. A pilot user study was conducted with ten participants. In addition, the project yielded a number of deliverables that are released for public access.

    Observations on listener responses from multiple perspectives

    Proceedings of the 3rd Nordic Symposium on Multimodal Communication. Editors: Patrizia Paggio, Elisabeth Ahlsén, Jens Allwood, Kristiina Jokinen, Costanza Navarretta. NEALT Proceedings Series, Vol. 15 (2011), 48–55. © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/22532

    Differences in Listener Responses between Procedural and Narrative Tasks

    In the long tradition of corpus-based research on listener behavior, whether it entails linguistic analysis or social signal processing, many different tasks have been used during the recording of corpora. So far, no study has treated the task given to the participants as an independent variable, and no studies have looked into the effect of this variable on listener responses. In this paper we present the results of our comparison between listening behavior elicited by the procedural and narrative tasks used during the recording of our MultiLis corpus. We will show that listeners in the procedural tasks show more agreement in their responses than listeners in the narrative tasks. Furthermore, we will show that the long procedural task elicits more responses per minute than the short procedural task. We will reflect on these results in light of cognitive load and grounding theory.

    Speaker-adaptive multimodal prediction model for listener responses

    The goal of this paper is to analyze and model the variability in speaking styles in dyadic interactions and build a predictive algorithm for listener responses that is able to adapt to these different styles. The end result of this research will be a virtual human able to automatically respond to a human speaker with proper listener responses (e.g., head nods). Our novel speaker-adaptive prediction model is created from a corpus of dyadic interactions where speaker variability is analyzed to identify a subset of prototypical speaker styles. During a live interaction, our prediction model automatically identifies the closest prototypical speaker style and predicts listener responses based on this "communicative style". Central to our approach is the idea of a "speaker profile", which uniquely identifies each speaker and enables the matching between prototypical speakers and new speakers. The paper shows the merits of our speaker-adaptive listener response prediction model by showing improvement over a state-of-the-art approach that does not adapt to the speaker. Besides the merits of speaker adaptation, our experiments highlight the importance of using multimodal features when comparing speakers to select the closest prototypical speaker style.
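
    The following is a minimal sketch of the speaker-adaptive idea described in this abstract, not the authors' implementation: training speakers are clustered into prototypical styles via their "speaker profiles", and a new speaker is routed to the predictor of the closest prototype. The feature names, cluster count, and classifiers are illustrative assumptions.

```python
# Hypothetical sketch of prototype-based speaker adaptation (illustrative only).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

np.random.seed(0)

# Assumed speaker profiles: one multimodal feature vector per training speaker
# (e.g. pause rate, pitch variation, gaze-at-listener ratio).
profiles = np.array([[0.8, 0.3, 0.6],
                     [0.2, 0.7, 0.4],
                     [0.7, 0.4, 0.5],
                     [0.1, 0.8, 0.3]])

# Step 1: identify a small set of prototypical speaker styles.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(profiles)

# Step 2: train one listener-response predictor per prototypical style
# (per-frame multimodal features -> backchannel yes/no), here on toy data.
predictors = {}
for style in range(kmeans.n_clusters):
    X_toy = np.random.rand(20, 3)            # stand-in per-frame features
    y_toy = (X_toy[:, 0] > 0.5).astype(int)  # stand-in backchannel labels
    predictors[style] = LogisticRegression().fit(X_toy, y_toy)

# Step 3: during a live interaction, match the new speaker's profile to the
# closest prototypical style and use that style's predictor.
new_profile = np.array([[0.75, 0.35, 0.55]])
style = int(kmeans.predict(new_profile)[0])
frame_features = np.array([[0.6, 0.2, 0.7]])
print("predicted listener response:", predictors[style].predict(frame_features)[0])
```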

    Iterative Perceptual Learning for Social Behavior Synthesis

    We introduce Iterative Perceptual Learning (IPL), a novel approach for learning computational models for social behavior synthesis from corpora of human-human interactions. The IPL approach combines perceptual evaluation with iterative model refinement. Human observers rate the appropriateness of synthesized individual behaviors in the context of a conversation. These ratings are in turn used to refine the machine learning models. As the ratings correspond to those moments in the conversation where the production of a specific social behavior is inappropriate, we can regard features extracted at these moments as negative samples for the training of a machine learning classifier. This is an advantage over traditional corpus-based approaches, in which negative samples are extracted at random from moments in the conversation where the specific social behavior does not occur. We perform a comparison between the IPL approach and the traditional corpus-based approach on the timing of backchannels for a listener in speaker-listener dialogs. While both models perform similarly in terms of precision and recall scores, the results of the IPL model are rated as more appropriate in the perceptual evaluation. We additionally investigate the effect of the amount of available training data and the variation of training data on the outcome of the models.
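
    Below is a minimal sketch, under stated assumptions, of the IPL loop this abstract describes: moments whose synthesized backchannels are rated inappropriate become negative training samples for the next iteration. The rate_appropriateness() function stands in for the human perceptual evaluation, and all data and model choices are purely illustrative.

```python
# Illustrative sketch of an Iterative Perceptual Learning loop (not the paper's code).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def rate_appropriateness(moment_features):
    """Placeholder for a human observer rating (1 = appropriate, 0 = inappropriate)."""
    return int(rng.random() > 0.4)

# Positive samples: features at moments where listeners actually backchanneled.
X_pos = rng.random((50, 4))
y_pos = np.ones(50, dtype=int)

# Initial negatives, here drawn at random as in the traditional corpus-based approach.
X_neg = rng.random((50, 4))
y_neg = np.zeros(50, dtype=int)

for iteration in range(3):
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    clf.fit(np.vstack([X_pos, X_neg]), np.concatenate([y_pos, y_neg]))

    # Synthesize backchannels at the moments the current model predicts,
    # ask observers to rate them, and turn inappropriate ones into new negatives.
    candidates = rng.random((30, 4))
    predicted = candidates[clf.predict(candidates) == 1]
    new_negatives = np.array([m for m in predicted if rate_appropriateness(m) == 0])
    if len(new_negatives):
        X_neg = np.vstack([X_neg, new_negatives])
        y_neg = np.concatenate([y_neg, np.zeros(len(new_negatives), dtype=int)])
    print(f"iteration {iteration}: {len(new_negatives)} new negative samples")
```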

    Backchannel relevance spaces

    This contribution introduces backchannel relevance spaces – intervals where it is relevant for a listener in a conversation to produce a backchannel. By annotating and comparing actual visual and vocal backchannels with potential backchannels established using a group of subjects acting as third-party listeners, we show (i) that visual-only backchannels represent a substantial proportion of all backchannels; and (ii) that there are more opportunities for backchannels (i.e. potential backchannels or backchannel relevance spaces) than there are actual vocal and visual backchannels. These findings indicate that backchannel relevance spaces enable more accurate acoustic, prosodic, lexical (et cetera) descriptions of backchannel-inviting cues than descriptions based on the context of actual vocal backchannels only.
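
    As an illustrative sketch only (not the paper's annotation pipeline), the two quantities discussed above can be computed from simple interval annotations: the share of visual-only backchannels among the actual backchannels, and the surplus of backchannel relevance spaces over actual backchannels. The timestamps and labels here are invented toy data.

```python
# Toy computation of visual-only backchannels and unused relevance spaces.
actual_backchannels = [
    {"t": 3.2,  "vocal": True,  "visual": True},
    {"t": 7.9,  "vocal": False, "visual": True},   # visual-only
    {"t": 12.4, "vocal": True,  "visual": False},
]

# Relevance spaces: intervals where third-party listeners judged a backchannel
# to be relevant, whether or not the original listener produced one.
relevance_spaces = [(2.8, 3.6), (7.5, 8.3), (10.0, 10.9), (12.0, 12.8), (15.1, 15.9)]

visual_only = sum(1 for bc in actual_backchannels if bc["visual"] and not bc["vocal"])
print(f"visual-only backchannels: {visual_only}/{len(actual_backchannels)}")

def contains_backchannel(space):
    """True if any actual backchannel falls inside the relevance space."""
    start, end = space
    return any(start <= bc["t"] <= end for bc in actual_backchannels)

# Relevance spaces without an actual backchannel are unused opportunities.
unused = [s for s in relevance_spaces if not contains_backchannel(s)]
print(f"relevance spaces: {len(relevance_spaces)}, unused opportunities: {len(unused)}")
```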

    Proceedings

    Proceedings of the 3rd Nordic Symposium on Multimodal Communication. Editors: Patrizia Paggio, Elisabeth Ahlsén, Jens Allwood, Kristiina Jokinen, Costanza Navarretta. NEALT Proceedings Series, Vol. 15 (2011), vi+87 pp. © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/22532