Addressee and Response Selection in Multi-Party Conversations with Speaker Interaction RNNs
In this paper, we study the problem of addressee and response selection in
multi-party conversations. Understanding multi-party conversations is
challenging because of complex speaker interactions: multiple speakers exchange
messages with each other, playing different roles (sender, addressee,
observer), and these roles vary across turns. To tackle this challenge, we
propose the Speaker Interaction Recurrent Neural Network (SI-RNN). Whereas the
previous state-of-the-art system updated speaker embeddings only for the
sender, SI-RNN uses a novel dialog encoder to update speaker embeddings in a
role-sensitive way. Additionally, unlike the previous work that selected the
addressee and response separately, SI-RNN selects them jointly by viewing the
task as a sequence prediction problem. Experimental results show that SI-RNN
significantly improves the accuracy of addressee and response selection,
particularly in complex conversations with many speakers and responses to
distant messages many turns in the past. Comment: AAAI 201
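The role-sensitive update described above can be sketched in a few lines. This is a toy illustration, not the authors' code: the paper's GRU-based dialog encoder is replaced here with simple tanh updates, and all names, dimensions, and weights are illustrative. The key idea it shows is that every speaker's embedding is updated at every turn, but with a different update function depending on whether that speaker was the sender, the addressee, or an observer.

```python
import numpy as np

DIM = 8  # toy embedding size, chosen arbitrarily

def make_role_update(dim, seed):
    # A toy role-specific update: new_state = tanh(W @ [state; message]).
    # Stands in for the role-specific GRU cells of the dialog encoder.
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(dim, 2 * dim))
    return lambda state, msg: np.tanh(W @ np.concatenate([state, msg]))

# One distinct update function per conversational role.
role_update = {
    "sender": make_role_update(DIM, 1),
    "addressee": make_role_update(DIM, 2),
    "observer": make_role_update(DIM, 3),
}

def si_rnn_step(speaker_states, sender, addressee, msg_vec):
    """Update every speaker's embedding according to their role this turn."""
    new_states = {}
    for spk, state in speaker_states.items():
        if spk == sender:
            role = "sender"
        elif spk == addressee:
            role = "addressee"
        else:
            role = "observer"
        new_states[spk] = role_update[role](state, msg_vec)
    return new_states

# Three speakers; A sends a message to B, C observes.
states = {s: np.zeros(DIM) for s in ["A", "B", "C"]}
msg = np.random.default_rng(0).normal(size=DIM)
states = si_rnn_step(states, sender="A", addressee="B", msg_vec=msg)
```

Because each role has its own parameters, the same message moves the sender's, addressee's, and observer's embeddings in different directions, which is what distinguishes this from updating only the sender's embedding.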
A comparison of addressee detection methods for multiparty conversations
Several algorithms have recently been proposed for recognizing addressees in a group conversational setting. These algorithms can rely on a variety of factors, including previous conversational roles, gaze, and type of dialogue act. Both statistical supervised machine learning algorithms and rule-based methods have been developed. In this paper, we compare several algorithms developed for several different genres of multiparty dialogue, and propose a new synthesis algorithm that matches the performance of machine learning algorithms while maintaining the transparency of semantically meaningful rule-based algorithms.
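The contrast drawn above between opaque learned models and transparent rules can be made concrete with a toy rule cascade. The rules and their priority order are hypothetical, chosen only to illustrate the kind of semantically meaningful factors the abstract mentions (conversational roles, gaze, dialogue-act type); they are not the paper's algorithm.

```python
def rule_based_addressee(turn, participants):
    """Toy rule cascade for addressee detection (illustrative only).

    Rules, in priority order:
      1. An explicit vocative ("Alice, ...") names the addressee.
      2. The speaker's gaze target, if any, is the addressee.
      3. A response dialogue act defaults to the previous speaker.
      4. Otherwise the turn is addressed to the whole group.
    """
    # Rule 1: vocative at the start of the utterance.
    for p in participants:
        if turn["text"].lower().startswith(p.lower() + ","):
            return p
    # Rule 2: gaze direction.
    if turn.get("gaze_target") in participants:
        return turn["gaze_target"]
    # Rule 3: responses go back to the previous speaker.
    if turn.get("dialogue_act") == "response" and turn.get("prev_speaker"):
        return turn["prev_speaker"]
    # Rule 4: fall back to group address.
    return "GROUP"
```

Every decision such a cascade makes can be traced to a named rule, which is the transparency property the abstract contrasts with statistical classifiers.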
Are you talking to me? Improving the robustness of dialogue systems in a multi party HRI scenario by incorporating gaze direction and lip movement of attendees
Richter V, Carlmeyer B, Lier F, et al. Are you talking to me? Improving the robustness of dialogue systems in a multi party HRI scenario by incorporating gaze direction and lip movement of attendees. In: Proceedings of the Fourth International Conference on Human-agent Interaction. Singapore: ACM Digital Library; 2016. In this paper we present our humanoid robot "Meka", participating in a multi-party human-robot dialogue scenario. Active arbitration of the robot's attention based on multi-modal stimuli is utilised to attend to persons who are outside of the robot's field of view. We investigate the impact of this attention management and of addressee recognition on the robot's capability to distinguish utterances directed at it from communication between humans. Based on the results of a user study, we show that mutual gaze at the end of an utterance, as a means of yielding a turn, is a substantial cue for addressee recognition. Verification of a speaker through the detection of lip movements can be used to further increase precision. Furthermore, we show that even a rather simplistic fusion of gaze and lip movement cues allows a considerable enhancement in addressee estimation, and can be adapted to the requirements of a particular scenario.
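The "rather simplistic fusion" of the two cues can be pictured as a weighted score over binary detections. The weights and threshold below are illustrative knobs, not values from the paper; the abstract reports only that combining mutual gaze at utterance end with lip-movement verification improves addressee estimation and that such a fusion can be tuned per scenario.

```python
def addressed_to_robot(gaze_at_robot_end, lip_movement_detected,
                       gaze_weight=0.7, lip_weight=0.3, threshold=0.5):
    """Toy weighted fusion of two binary addressee cues.

    gaze_at_robot_end: mutual gaze with the robot at the end of the utterance,
    the stronger cue per the study; lip_movement_detected: speaker verification
    via lip activity, used to raise precision. Adjusting the weights and
    threshold adapts the detector to a particular scenario.
    """
    score = (gaze_weight * float(gaze_at_robot_end)
             + lip_weight * float(lip_movement_detected))
    return score >= threshold
```

With these toy weights, gaze alone suffices to accept an utterance as robot-directed, while lip movement alone does not; raising the threshold above 0.7 would require both cues, trading recall for precision.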
Moving together: the organisation of non-verbal cues during multiparty conversation
PhD thesis. Conversation is a collaborative activity. In face-to-face interactions interlocutors have mutual
access to a shared space. This thesis aims to explore the shared space as a resource for coordinating
conversation. As is well demonstrated in studies of two-person conversations, interlocutors
can coordinate their speech and non-verbal behaviour in ways that manage the unfolding conversation.
However, when scaling up from two people to three people interacting, the coordination
challenges that the interlocutors face increase. In particular, speakers must manage multiple listeners.
This thesis examines the use of interlocutors’ bodies in shared space to coordinate their
multiparty dialogue.
The approach exploits corpora of motion captured triadic interactions. The thesis first explores
how interlocutors coordinate their speech and non-verbal behaviour. Inter-person relationships
are examined and compared with artificially created triples who did not interact. Results demonstrate
that interlocutors avoid speaking and gesturing over each other, but tend to nod together.
Evidence is presented that the two recipients of an utterance have different patterns of head and
hand movement, and that some of the regularities of movement are correlated with the task structure.
The empirical section concludes by uncovering a class of coordination events, termed simultaneous
engagement events, that are unique to multiparty dialogue. They are constructed using
combinations of speaker head orientation and gesture orientation. The events coordinate multiple
recipients of the dialogue and potentially arise as a result of the greater coordination challenges
that interlocutors face. They are marked in requiring a mutually accessible shared space in order
to be considered an effective interactional cue.
The thesis provides quantitative evidence that interlocutors’ head and hand movements are
organised by their dialogue state and the task responsibilities that they bear. It is argued that a
shared interaction space becomes a more important interactional resource when conversations
scale up to three people.
Towards Automatic Dialogue Understanding
In this paper we will present work carried out to scale up the system for text understanding called GETARUNS, and port it to be used in dialogue understanding. The current goal is that of extracting automatically argumentative information in order to build argumentative structure. The long term goal is using argumentative structure to produce automatic summarization of spoken dialogues.
Very much like other deep linguistic processing systems (see Allen et al., 2007), our system is a generic text/dialogue understanding system that can be used in connection with an ontology – WordNet – and other similar repositories of commonsense knowledge. Word sense disambiguation takes place at the level of semantic interpretation and is represented in the Discourse Model. We will present the adjustments we made in order to cope with transcribed spoken dialogues like those produced in the ICSI Berkeley project. The low-level component is organized according to LFG theory; at this level, the system does pronominal binding, quantifier raising and temporal interpretation. The high-level component is where the Discourse Model is created from the Logical Form. For longer sentences the system switches from the top-down to the bottom-up system. In case of failure it backs off to the partial system, which produces a very lean and shallow semantics with no inference rules.
In a final section, we present a preliminary evaluation of the system on two tasks: automatic argumentative labelling and another frequently addressed task, referential vs. non-referential pronominal detection. The results obtained fare much better than those reported in similar experiments with machine learning approaches.
Context-based multimodal interpretation : an integrated approach to multimodal fusion and discourse processing
This thesis is concerned with the context-based interpretation of verbal and nonverbal contributions to interactions in multimodal multiparty dialogue systems. On the basis of a detailed analysis of context-dependent multimodal discourse phenomena, a comprehensive context model is developed. This context model supports the resolution of a variety of referring and elliptical expressions as well as the processing and reactive generation of turn-taking signals and the identification of the intended addressee(s) of a contribution. A major goal of this thesis is the development of a generic component for multimodal fusion and discourse processing. Based on the integration of this component into three distinct multimodal dialogue systems, the generic applicability of the approach is shown.
Automatic Context-Driven Inference of Engagement in HMI: A Survey
An integral part of seamless human-human communication is engagement, the
process by which two or more participants establish, maintain, and end their
perceived connection. Therefore, to develop successful human-centered
human-machine interaction applications, automatic engagement inference is one
of the tasks required to achieve engaging interactions between humans and
machines, and to make machines attuned to their users, hence enhancing user
satisfaction and technology acceptance. Several factors contribute to
engagement state inference, which include the interaction context and
interactants' behaviours and identity. Indeed, engagement is a multi-faceted
and multi-modal construct that requires high accuracy in the analysis and
interpretation of contextual, verbal and non-verbal cues. Thus, developing an
automated and intelligent system that accomplishes this task has so far proven
challenging. This paper presents a comprehensive survey on
previous work in engagement inference for human-machine interaction, entailing
interdisciplinary definition, engagement components and factors, publicly
available datasets, ground truth assessment, and most commonly used features
and methods, serving as a guide for the development of future human-machine
interaction interfaces with reliable context-aware engagement inference
capability. An in-depth review across embodied and disembodied interaction
modes, and an emphasis on the interaction context into which engagement
perception modules are integrated, set the presented survey apart from
existing surveys.