249 research outputs found

    Attentive Speaking. From Listener Feedback to Interactive Adaptation

    Buschmeier H. Attentive Speaking. From Listener Feedback to Interactive Adaptation. Bielefeld: Universität Bielefeld; 2018.

    Dialogue is an interactive endeavour in which participants jointly pursue the goal of reaching understanding. Since participants enter the interaction with their individual conceptualisation of the world and their idiosyncratic way of using language, understanding cannot, in general, be reached by exchanging messages that are encoded when speaking and decoded when listening. Instead, speakers need to design their communicative acts in such a way that listeners are likely able to infer what is meant. Listeners, in turn, need to provide evidence of their understanding in such a way that speakers can infer whether their communicative acts were successful. This is often an interactive and iterative process in which speakers and listeners work towards understanding by jointly coordinating their communicative acts through feedback and adaptation. Taking part in this interactive process requires dialogue participants to have ‘interactional intelligence’. This conceptualisation of dialogue is rather uncommon in formal or technical approaches to dialogue modelling. This thesis argues that it may, nevertheless, be a promising research direction for these fields, because it de-emphasises raw language processing performance and focusses on fundamental interaction skills. Interactionally intelligent artificial conversational agents may thus be able to reach understanding with their interlocutors by drawing upon such competences. This will likely make them more robust, more understandable, more helpful, more effective, and more human-like.

    This thesis develops conceptual and computational models of interactional intelligence for artificial conversational agents that are limited to (1) the speaking role, and (2) evidence of understanding in the form of communicative listener feedback (short but expressive verbal/vocal signals, such as ‘okay’, ‘mhm’ and ‘huh’, head gestures, and gaze). This thesis argues that such ‘attentive speaker agents’ need to be able (1) to probabilistically reason about, infer, and represent their interlocutors’ listening-related mental states (e.g., their degree of understanding), based on their interlocutors’ feedback behaviour; (2) to interactively adapt their language and behaviour such that their interlocutors’ needs, derived from the attributed mental states, are taken into account; and (3) to decide when they need feedback from their interlocutors and how they can elicit it using behavioural cues.

    This thesis describes computational models for these three processes, their integration in an incremental behaviour generation architecture for embodied conversational agents, and a semi-autonomous interaction study in which the resulting attentive speaker agent is evaluated. The evaluation finds that the computational models of attentive speaking developed in this thesis enable conversational agents to interactively reach understanding with their human interlocutors (through feedback and adaptation), and that these interlocutors are willing to provide natural communicative listener feedback to such an attentive speaker agent. The thesis shows that computationally modelling interactional intelligence is generally feasible, and thereby raises many new research questions and engineering problems in the interdisciplinary fields of dialogue and artificial conversational agents.
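    The first requirement above, probabilistic attribution of listening-related mental states from feedback, can be illustrated with a minimal sketch. The state space, feedback vocabulary, and likelihood values below are invented for illustration and are not the model parameters used in the thesis; the sketch only shows the general idea of a discrete Bayesian update over the listener's degree of understanding driven by observed feedback signals.

```python
# Minimal, illustrative sketch of probabilistic listener-state attribution.
# States, feedback types, and probabilities are invented placeholders,
# not the attributed-listener-state model described in the thesis.

FEEDBACK_LIKELIHOOD = {
    # P(feedback signal | listener state)
    "okay": {"understands": 0.70, "uncertain": 0.25, "confused": 0.05},
    "mhm":  {"understands": 0.55, "uncertain": 0.35, "confused": 0.10},
    "huh":  {"understands": 0.05, "uncertain": 0.25, "confused": 0.70},
}

def update_belief(belief, feedback):
    """One Bayesian update of P(listener state) given an observed feedback signal."""
    likelihood = FEEDBACK_LIKELIHOOD[feedback]
    posterior = {state: belief[state] * likelihood[state] for state in belief}
    total = sum(posterior.values())
    return {state: p / total for state, p in posterior.items()}

# Start from a uniform belief and observe a short feedback sequence.
belief = {"understands": 1 / 3, "uncertain": 1 / 3, "confused": 1 / 3}
for signal in ["mhm", "huh", "okay"]:
    belief = update_belief(belief, signal)
    print(signal, {s: round(p, 2) for s, p in belief.items()})
```

    An attentive speaker agent would act on such a belief, for instance by elaborating or rephrasing once the probability mass shifts towards the confused state.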

    Speech-driven animation using multi-modal hidden Markov models

    The main objective of this thesis was the synthesis of speech-synchronised motion, in particular head motion. The hypothesis that head motion can be estimated from the speech signal was confirmed. In order to achieve satisfactory results, a motion-capture database was recorded, a definition of head motion in terms of articulation was discovered, a continuous stream mapping procedure was developed, and finally the synthesis was evaluated. Based on previous research into non-verbal behaviour, basic types of head motion were defined that could function as modelling units. The stream mapping method investigated in this thesis is based on Hidden Markov Models (HMMs), which employ modelling units to map between continuous signals. The objective evaluation of the modelling parameters confirmed that head motion types could be predicted from the speech signal with an accuracy above chance, close to 70%. Furthermore, a special type of HMM called trajectory HMM was used because it enables synthesis of continuous output. However, since head motion is a stochastic process, the trajectory HMM was further extended to allow for non-deterministic output. Finally, the resulting head motion synthesis was perceptually evaluated. The effects of the “uncanny valley” were also considered in the evaluation, confirming that rendering quality has an influence on our judgement of movement of virtual characters. In conclusion, a general method for synthesising speech-synchronised behaviour was developed that can be applied to a whole range of behaviours.
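    The mapping from speech features to head-motion units can be sketched as follows. The sketch assumes the hmmlearn library, MFCC-like feature arrays, and an invented three-class inventory of motion types; it is not the thesis's actual setup (which additionally uses trajectory HMMs to synthesise continuous output), but it illustrates the core idea of training one HMM per head-motion type and classifying by maximum log-likelihood.

```python
# Illustrative sketch: one Gaussian HMM per head-motion type, trained on
# speech-feature sequences, with classification by maximum log-likelihood.
# Motion types, feature dimensions, and model sizes are placeholders.
import numpy as np
from hmmlearn import hmm

MOTION_TYPES = ["nod", "shake", "tilt"]  # hypothetical modelling units

def train_models(features_per_type, n_states=3):
    """Fit one Gaussian HMM per head-motion type on its speech-feature sequences."""
    models = {}
    for motion_type, sequences in features_per_type.items():
        X = np.vstack(sequences)                   # concatenate all sequences
        lengths = [len(seq) for seq in sequences]  # sequence boundaries for hmmlearn
        model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=20)
        model.fit(X, lengths)
        models[motion_type] = model
    return models

def classify(models, sequence):
    """Return the head-motion type whose HMM assigns the highest log-likelihood."""
    return max(models, key=lambda t: models[t].score(sequence))

# Toy data: random 13-dimensional "speech feature" sequences per motion type.
rng = np.random.default_rng(0)
train_data = {t: [rng.normal(i, 1.0, size=(50, 13)) for _ in range(5)]
              for i, t in enumerate(MOTION_TYPES)}
models = train_models(train_data)
print(classify(models, rng.normal(1, 1.0, size=(40, 13))))  # likely "shake"
```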

    Social talk capabilities for dialogue systems

    Small talk capabilities are an important but very challenging extension to dialogue systems. Small talk (or social talk) refers to a kind of conversation that does not focus on the exchange of information but on the negotiation of social roles and situations. The goal of this thesis is to provide knowledge, processes and structures that can be used by dialogue systems to satisfactorily participate in social conversations. For this purpose, the thesis presents research in the areas of natural-language understanding, dialogue management and error handling. Nine new models of social talk, based on a data analysis of small-talk conversations, are described. The functionally motivated and content-abstract models can be used for small-talk conversations on various topics. The basic elements of the models consist of dialogue acts for social talk, newly developed on the basis of social science theory. The thesis also presents conversation strategies for the treatment of so-called out-of-domain (OoD) utterances that can be used to avoid errors in the input understanding of dialogue systems. Additionally, the thesis describes a new extension to dialogue management that flexibly manages interwoven dialogue threads, as sketched below. The small talk models as well as the strategies for handling OoD utterances are encoded as computational dialogue threads.
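    A minimal sketch of the interwoven-thread idea follows. The class names, the thread-switching policy, and the out-of-domain test are invented for illustration and do not mirror the thesis's implementation; the sketch only shows how an OoD utterance can be routed to a social-talk thread instead of producing an understanding error, before control returns to the task thread.

```python
# Illustrative sketch of interleaved dialogue threads with a simple
# out-of-domain (OoD) fallback. All names and the OoD test are placeholders.
from dataclasses import dataclass, field

@dataclass
class DialogueThread:
    name: str
    utterances: list = field(default_factory=list)

    def handle(self, utterance: str) -> str:
        self.utterances.append(utterance)
        return f"[{self.name}] handling: {utterance!r}"

class DialogueManager:
    """Keeps a stack of threads; the topmost thread handles the next utterance."""

    def __init__(self):
        self.threads = [DialogueThread("task")]

    def is_out_of_domain(self, utterance: str) -> bool:
        # Placeholder test; a real system would use its NLU confidence instead.
        return "weather" in utterance.lower()

    def process(self, utterance: str) -> str:
        if self.is_out_of_domain(utterance):
            # Open a social-talk thread instead of failing on the utterance.
            self.threads.append(DialogueThread("social"))
        reply = self.threads[-1].handle(utterance)
        if self.threads[-1].name == "social" and len(self.threads) > 1:
            self.threads.pop()  # return to the interrupted task thread
        return reply

dm = DialogueManager()
print(dm.process("Book a table for two."))
print(dm.process("Nice weather today, isn't it?"))  # OoD -> social thread
print(dm.process("Make it eight o'clock."))         # back on the task thread
```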


    Proceedings

    Proceedings of the 3rd Nordic Symposium on Multimodal Communication. Editors: Patrizia Paggio, Elisabeth Ahlsén, Jens Allwood, Kristiina Jokinen, Costanza Navarretta. NEALT Proceedings Series, Vol. 15 (2011), vi+87 pp. © 2011 The editors and contributors. Published by the Northern European Association for Language Technology (NEALT), http://omilia.uio.no/nealt. Electronically published at Tartu University Library (Estonia): http://hdl.handle.net/10062/22532

    Gesture generation by imitation : from human behavior to computer character animation

    This dissertation shows how to generate conversational gestures for an animated agent based on annotated text input. The central idea is to imitate the gestural behavior of human individuals. Using TV show recordings as empirical data, gestural key parameters are extracted for the generation of natural and individual gestures. For each of the three tasks in the generation pipeline, a software tool was developed. The generic ANVIL annotation tool makes it possible to transcribe gesture and speech in the empirical data. The NOVALIS module uses the annotations to compute individual gesture profiles with statistical methods. The NOVA generator creates gestures based on these profiles and heuristic rules, and outputs them in a linear script. In all, this work presents a complete pipeline from collecting empirical data to obtaining an executable script, and provides the necessary software as well.
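    The profile-then-generate pipeline can be illustrated with a short sketch. The gesture categories, the profile representation, and the output "script" format below are invented for illustration and are not the formats used by ANVIL, NOVALIS, or NOVA; the sketch only shows the imitation idea of estimating a per-speaker gesture profile from annotations and then sampling gestures from it for new utterances.

```python
# Illustrative sketch of gesture generation by imitation: estimate a
# per-speaker gesture profile from annotated data, then sample gestures
# from that profile for new utterances. All categories are placeholders.
import random
from collections import Counter

def gesture_profile(annotations):
    """Relative frequency of each gesture type in a speaker's annotated data."""
    counts = Counter(annotations)
    total = sum(counts.values())
    return {gesture: n / total for gesture, n in counts.items()}

def generate_script(utterances, profile, seed=0):
    """Assign one sampled gesture per utterance, imitating the speaker's profile."""
    rng = random.Random(seed)
    gestures, weights = zip(*profile.items())
    return [(u, rng.choices(gestures, weights=weights)[0]) for u in utterances]

annotations = ["beat", "beat", "iconic", "deictic", "beat", "iconic"]
profile = gesture_profile(annotations)
for line in generate_script(["Hello there.", "Look at this chart."], profile):
    print(line)
```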
