7,420 research outputs found

    Learning and Evaluating Response Prediction Models using Parallel Listener Consensus

    Get PDF
    Traditionally listener response prediction models are learned from pre-recorded dyadic interactions. Because of individual differences in behavior, these recordings do not capture the complete ground truth. Where the recorded listener did not respond to an opportunity provided by the speaker, another listener would have responded or vice versa. In this paper, we introduce the concept of parallel listener consensus where the listener responses from multiple parallel interactions are combined to better capture differences and similarities between individuals. We show how parallel listener consensus can be used for both learning and evaluating probabilistic prediction models of listener responses. To improve the learning performance, the parallel consensus helps identifying better negative samples and reduces outliers in the positive samples. We propose a new error measurement called Fconsensus which exploits the parallel consensus to better define the concepts of exactness (mislabels) and completeness (missed labels) for prediction models. We present a series of experiments using the MultiLis Corpus where three listeners were tricked into believing that they had a one-on-one conversation with a speaker, while in fact they were recorded in parallel in interaction with the same speaker. In this paper we show that using parallel listener consensus can improve learning performance and represent better evaluation criteria for predictive models

    A Survey on Evaluation Metrics for Backchannel Prediction Models

    Get PDF
    In this paper we give an overview of the evaluation metrics used to measure the performance of backchannel prediction models. Both objective and subjective evaluation metrics are discussed. The survey shows that almost every backchannel prediction model is evaluated with a different evaluation metric. This makes comparison between developed models unreliable, even beside the other variables in play, such as different corpora, language, conversational setting, amount of data and/or definition of the term backchannel

    Analyzing Nonverbal Listener Responses using Parallel Recordings of Multiple Listeners

    Get PDF
    In this paper we study nonverbal listener responses on a corpus with multiple parallel recorded listeners. These listeners were meant to believe that they were the sole listener, while in fact there were three persons listening to the same speaker. The speaker could only see one of the listeners. We analyze the impact of the particular setup of the corpus on the behavior and perception of the two types of listeners; the listeners that could be seen by the speaker and the listeners that could not be seen. Furthermore we compare the nonverbal listening behaviors of these three listeners to each other with regard to timing and form. We correlate these behaviors with behaviors of the speaker, like pauses and whether the speaker is looking at the listeners or not

    Speaker-adaptive multimodal prediction model for listener responses

    Get PDF
    The goal of this paper is to analyze and model the variability in speaking styles in dyadic interactions and build a predictive algorithm for listener responses that is able to adapt to these different styles. The end result of this research will be a virtual human able to automatically respond to a human speaker with proper listener responses (e.g., head nods). Our novel speaker-adaptive prediction model is created from a corpus of dyadic interactions where speaker variability is analyzed to identify a subset of prototypical speaker styles. During a live interaction our prediction model automatically identifies the closest prototypical speaker style and predicts listener responses based on this ``communicative style". Central to our approach is the idea of ``speaker profile" which uniquely identifies each speaker and enables the matching between prototypical speakers and new speakers. The paper shows the merits of our speaker-adaptive listener response prediction model by showing improvement over a state-of-the-art approach which does not adapt to the speaker. Besides the merits of speaker-adapta-tion, our experiments highlights the importance of using multimodal features when comparing speakers to select the closest prototypical speaker style

    Observations on listener responses from multiple perspectives

    Get PDF
    Proceedings of the 3rd Nordic Symposium on Multimodal Communication. Editors: Patrizia Paggio, Elisabeth Ahlsén, Jens Allwood, Kristiina Jokinen, Costanza Navarretta. NEALT Proceedings Series, Vol. 15 (2011), 48–55. © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/22532

    Iterative Perceptual Learning for Social Behavior Synthesis

    Get PDF
    We introduce Iterative Perceptual Learning (IPL), a novel approach for learning computational models for social behavior synthesis from corpora of human-human interactions. The IPL approach combines perceptual evaluation with iterative model refinement. Human observers rate the appropriateness of synthesized individual behaviors in the context of a conversation. These ratings are in turn used to refine the machine learning models. As the ratings correspond to those moments in the conversation where the production of a specific social behavior is inappropriate, we can regard features extracted at these moments as negative samples for the training of a machine learning classifier. This is an advantage over traditional corpusbased approaches, in which negative samples at extracted at random from moments in the conversation where the specific social behavior does not occur. We perform a comparison between the IPL approach and the traditional corpus-based approach on the timing of backchannels for a listener in speaker-listener dialogs. While both models perform similarly in terms of precision and recall scores, the results of the IPL model are rated as more appropriate in the perceptual evaluation.We additionally investigate the effect of the amount of available training data and the variation of training data on the outcome of the models

    Backchannel relevance spaces

    Get PDF
    This contribution introduces backchannel relevance spaces – intervals where it is relevant for a listener in a conversation to produce a backchannel. By annotating and comparing actual visual and vocal backchannels with potential backchannels established using a group of subjects acting as third-party listeners, we show (i) that visual only backchannels represent a substantial proportion of all backchannels; and (ii) that there are more opportunities for backchannels (i.e. potential backchannels or backchannel relevance spaces) than there are actual vocal and visual backchannels. These findings indicate that backchannel relevance spaces enable more accurate acoustic, prosodic, lexical (et cetera) descriptions of backchannel inviting cues than descriptions based on the context of actual vocal backchannels only

    Twente Debate Corpus - A Multimodal Corpus for Head Movement Analysis

    Get PDF
    This paper introduces a multimodal discussion corpus for the study into head movement and turn-taking patterns in debates. Given that participants either acted alone or in a pair, cooperation and competition and their nonverbal correlates can be analyzed. In addition to the video and audio of the recordings, the corpus contains automatically estimated head movements, and manual annotations of who is speaking and who is looking where. The corpus consists of over 2 hours of debates, in 6 groups with 18 participants in total. We describe the recording setup and present initial analyses of the recorded data. We found that the person who acted as single debater speaks more and also receives more attention compared to the other debaters, also when corrected for the time speaking.We also found that a single debater was more likely to speak after a team debater. Future work will be aimed at further analysis of the relation between speaking and looking patterns, the outcome of the debate and perceived dominance of the debaters
    corecore