145 research outputs found

    Addressee and Response Selection in Multi-Party Conversations with Speaker Interaction RNNs

    Full text link
    In this paper, we study the problem of addressee and response selection in multi-party conversations. Understanding multi-party conversations is challenging because of complex speaker interactions: multiple speakers exchange messages with each other, playing different roles (sender, addressee, observer), and these roles vary across turns. To tackle this challenge, we propose the Speaker Interaction Recurrent Neural Network (SI-RNN). Whereas the previous state-of-the-art system updated speaker embeddings only for the sender, SI-RNN uses a novel dialog encoder to update speaker embeddings in a role-sensitive way. Additionally, unlike the previous work that selected the addressee and response separately, SI-RNN selects them jointly by viewing the task as a sequence prediction problem. Experimental results show that SI-RNN significantly improves the accuracy of addressee and response selection, particularly in complex conversations with many speakers and responses to distant messages many turns in the past.Comment: AAAI 201

    A comparison of addressee detection methods for multiparty conversations

    Get PDF
    Several algorithms have recently been proposed for recognizing addressees in a group conversational setting. These algorithms can rely on a variety of factors including previous conversational roles, gaze and type of dialogue act. Both statistical supervised machine learning algorithms as well as rule based methods have been developed. In this paper, we compare several algorithms developed for several different genres of muliparty dialogue, and propose a new synthesis algorithm that matches the performance of machine learning algorithms while maintaning the transparancy of semantically meaningfull rule-based algorithms

    Are you talking to me? Improving the robustness of dialogue systems in a multi party HRI scenario by incorporating gaze direction and lip movement of attendees

    Get PDF
    Richter V, Carlmeyer B, Lier F, et al. Are you talking to me? Improving the robustness of dialogue systems in a multi party HRI scenario by incorporating gaze direction and lip movement of attendees. In: Proceedings of the Fourth International Conference on Human-agent Interaction. Proceedings of the Fourth International Conference on Human-agent Interaction. Singapore: ACM Digital Library; 2016.In this paper we present our humanoid robot “Meka”, partici- pating in a multi party human robot dialogue scenario. Active arbitration of the robot's attention based-on multi-modal stim- uli is utilised to attain persons which are outside of the robots field of view. We investigate the impact of this attention management and an addressee recognition on the robot's capability to distinguish utterances directed at it from communication between humans. Based on the results of a user study, we show that mutual gaze at the end of an utterance, as a means of yielding a turn, is a substantial cue for addressee recognition. Verification of a speaker through the detection of lip movements can be used to further increase precision. Further- more, we show that even a rather simplistic fusion of gaze and lip movement cues allows a considerable enhancement in addressee estimation, and can be altered to adapt to the requirements of a particular scenario

    Moving together: the organisation of non-verbal cues during multiparty conversation

    Get PDF
    PhDConversation is a collaborative activity. In face-to-face interactions interlocutors have mutual access to a shared space. This thesis aims to explore the shared space as a resource for coordinating conversation. As well demonstrated in studies of two-person conversations, interlocutors can coordinate their speech and non-verbal behaviour in ways that manage the unfolding conversation. However, when scaling up from two people to three people interacting, the coordination challenges that the interlocutors face increase. In particular speakers must manage multiple listeners. This thesis examines the use of interlocutors’ bodies in shared space to coordinate their multiparty dialogue. The approach exploits corpora of motion captured triadic interactions. The thesis first explores how interlocutors coordinate their speech and non-verbal behaviour. Inter-person relationships are examined and compared with artificially created triples who did not interact. Results demonstrate that interlocutors avoid speaking and gesturing over each other, but tend to nod together. Evidence is presented that the two recipients of an utterance have different patterns of head and hand movement, and that some of the regularities of movement are correlated with the task structure. The empirical section concludes by uncovering a class of coordination events, termed simultaneous engagement events, that are unique to multiparty dialogue. They are constructed using combinations of speaker head orientation and gesture orientation. The events coordinate multiple recipients of the dialogue and potentially arise as a result of the greater coordination challenges that interlocutors face. They are marked in requiring a mutually accessible shared space in order to be considered an effective interactional cue. The thesis provides quantitative evidence that interlocutors’ head and hand movements are organised by their dialogue state and the task responsibilities that the bear. It is argued that a shared interaction space becomes a more important interactional resource when conversations scale up to three people

    Towards Automatic Dialogue Understanding

    Get PDF
    In this paper we will present work carried out to scale up the system for text understanding called GETARUNS, and port it to be used in dialogue understanding. The current goal is that of extracting automatically argumentative information in order to build argumentative structure. The long term goal is using argumentative structure to produce automatic summarization of spoken dialogues. Very much like other deep linguistic processing systems (see Allen et al, 2007), our system is a generic text/dialogue understanding system that can be used in connection with an ontology – WordNet – and other similar repositories of commonsense knowledge. Word sense disambiguation takes place at the level of semantic interpretation and is represented in the Discourse Model. We will present the adjustments we made in order to cope with transcribed spoken dialogues like those produced in the ICSI Berkely project. The low level component is organized according to LFG theory; at this level, the system does pronominal binding, quantifier raising and temporal interpretation. The high level component is where the Discourse Model is created from the Logical Form. For longer sentences the system switches from the top-down to the bottom-up system. In case of failure it will back off to the partial system which produces a very lean and shallow semantics with no inference rules. In a final section, we present preliminary evaluation of the system on two tasks: the task of automatic argumentative labelling and another frequently addressed task: referential vs. non-referential pronominal detection. Results obtained fair much higher than those reported in similar experiments with machine learning approaches

    Context-based multimodal interpretation : an integrated approach to multimodal fusion and discourse processing

    Get PDF
    This thesis is concerned with the context-based interpretation of verbal and nonverbal contributions to interactions in multimodal multiparty dialogue systems. On the basis of a detailed analysis of context-dependent multimodal discourse phenomena, a comprehensive context model is developed. This context model supports the resolution of a variety of referring and elliptical expressions as well as the processing and reactive generation of turn-taking signals and the identification of the intended addressee(s) of a contribution. A major goal of this thesis is the development of a generic component for multimodal fusion and discourse processing. Based on the integration of this component into three distinct multimodal dialogue systems, the generic applicability of the approach is shown.Diese Dissertation befasst sich mit der kontextbasierten Interpretation von verbalen und nonverbalen Gesprächsbeiträgen im Rahmen von multimodalen Dialogsystemen. Im Rahmen dieser Arbeit wird, basierend auf einer detaillierten Analyse multimodaler Diskursphänomene, ein umfassendes Modell des Gesprächskontextes erarbeitet. Dieses Modell soll sowohl die Verarbeitung einer Vielzahl von referentiellen und elliptischen Ausdrücken, als auch die Erzeugung reaktiver Aktionen wie sie für den Sprecherwechsel benötigt werden unterstützen. Ein zentrales Ziel dieser Arbeit ist die Entwicklung einer generischen Komponente zur multimodalen Fusion und Diskursverarbeitung. Anhand der Integration dieser Komponente in drei unterschiedliche Dialogsysteme soll der generische Charakter dieser Komponente gezeigt werden

    Automatic Context-Driven Inference of Engagement in HMI: A Survey

    Full text link
    An integral part of seamless human-human communication is engagement, the process by which two or more participants establish, maintain, and end their perceived connection. Therefore, to develop successful human-centered human-machine interaction applications, automatic engagement inference is one of the tasks required to achieve engaging interactions between humans and machines, and to make machines attuned to their users, hence enhancing user satisfaction and technology acceptance. Several factors contribute to engagement state inference, which include the interaction context and interactants' behaviours and identity. Indeed, engagement is a multi-faceted and multi-modal construct that requires high accuracy in the analysis and interpretation of contextual, verbal and non-verbal cues. Thus, the development of an automated and intelligent system that accomplishes this task has been proven to be challenging so far. This paper presents a comprehensive survey on previous work in engagement inference for human-machine interaction, entailing interdisciplinary definition, engagement components and factors, publicly available datasets, ground truth assessment, and most commonly used features and methods, serving as a guide for the development of future human-machine interaction interfaces with reliable context-aware engagement inference capability. An in-depth review across embodied and disembodied interaction modes, and an emphasis on the interaction context of which engagement perception modules are integrated sets apart the presented survey from existing surveys
    corecore