A comparison of addressee detection methods for multiparty conversations
Several algorithms have recently been proposed for recognizing addressees in a group conversational setting. These algorithms can rely on a variety of factors, including previous conversational roles, gaze, and type of dialogue act. Both statistical supervised machine learning algorithms and rule-based methods have been developed. In this paper, we compare several algorithms developed for different genres of multiparty dialogue, and propose a new synthesis algorithm that matches the performance of machine learning algorithms while maintaining the transparency of semantically meaningful rule-based algorithms.
Addressee Identification In Face-to-Face Meetings
We present results on addressee identification in four-participant face-to-face meetings using Bayesian Network and Naive Bayes classifiers. First, we investigate how well the addressee of a dialogue act can be predicted based on gaze, utterance, and conversational context features. Then, we explore whether information about meeting context can aid classifier performance. Both classifiers perform best when conversational context and utterance features are combined with the speaker's gaze information. The classifiers show little gain from information about meeting context.
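A Naive Bayes classifier over categorical features of the kind described above can be sketched as follows. This is a generic illustration, not the paper's implementation: the feature names (gaze target, dialogue-act type) and the toy training data are invented for the example.

```python
from collections import Counter, defaultdict
import math

def train_nb(X, y):
    """Naive Bayes with Laplace smoothing over categorical features.
    X: list of feature tuples, y: list of class labels (addressees)."""
    classes = Counter(y)
    counts = defaultdict(Counter)  # (class, feature_index) -> value counts
    for xs, c in zip(X, y):
        for i, v in enumerate(xs):
            counts[(c, i)][v] += 1
    # Per-feature vocabularies, needed for the smoothing denominator
    vocab = [set(xs[i] for xs in X) for i in range(len(X[0]))]
    return classes, counts, vocab

def predict_nb(model, xs):
    classes, counts, vocab = model
    total = sum(classes.values())
    best, best_lp = None, -math.inf
    for c, n in classes.items():
        lp = math.log(n / total)               # log prior
        for i, v in enumerate(xs):
            num = counts[(c, i)][v] + 1        # Laplace smoothing
            den = sum(counts[(c, i)].values()) + len(vocab[i])
            lp += math.log(num / den)          # log likelihood per feature
        if lp > best_lp:
            best, best_lp = c, lp
    return best

# Toy data: (speaker gaze target, dialogue-act type) -> addressee
X = [("at_P2", "question"), ("at_group", "statement"),
     ("at_P2", "question"), ("at_group", "statement"),
     ("at_P2", "statement")]
y = ["P2", "group", "P2", "group", "P2"]
model = train_nb(X, y)
print(predict_nb(model, ("at_P2", "question")))  # → P2
```

The sketch treats features as conditionally independent given the addressee, which is the Naive Bayes assumption; a Bayesian Network classifier would instead model dependencies between the gaze and utterance features.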
Virtual Meeting Rooms: From Observation to Simulation
Virtual meeting rooms are used to simulate real meeting behavior: they can show how people behave, gesture, move their heads and bodies, and direct their gaze during conversations. They are used for visualising models of meeting behavior and for evaluating these models. They are also used to show the effects of controlling certain parameters on behavior, and in experiments on how communication is affected when various channels of information (speech, gaze, gesture, posture) are switched off or otherwise manipulated. The paper presents the various stages in the development of a virtual meeting room and illustrates its uses by presenting some results of experiments on whether human judges can infer conversational roles in a virtual meeting situation when they only see the head movements of participants in the meeting.
Inferring Intentions to Speak Using Accelerometer Data In-the-Wild
Humans have a good natural intuition for recognizing when another person has something to say. It would be useful if an AI could also recognize intentions to speak, especially in scenarios where an AI is guiding a group discussion. This work studies the inference of successful and unsuccessful intentions to speak from accelerometer data. This modality is chosen because it is privacy-preserving and feasible for in-the-wild settings, since the sensor can be placed in a smart badge. Data from a real-life social networking event is used to train a machine-learning model that aims to infer intentions to speak. A subset of unsuccessful intention-to-speak cases in the data is annotated. The model is trained on the successful intentions to speak and evaluated on both the successful and unsuccessful cases. In conclusion, there is useful information in accelerometer data, but not enough to reliably capture intentions to speak. For example, posture shifts are correlated with intentions to speak, but people also often shift posture without having an intention to speak, or have an intention to speak without shifting their posture. More modalities are likely needed to reliably infer intentions to speak.
Detecting Low Rapport During Natural Interactions in Small Groups from Non-Verbal Behaviour
Rapport, the close and harmonious relationship in which interaction partners
are "in sync" with each other, has been shown to result in smoother social
interactions, improved collaboration, and better interpersonal outcomes. In
this work, we are the first to investigate automatic prediction of low rapport
during natural interactions within small groups. This task is challenging given
that rapport only manifests in subtle non-verbal signals that are, in addition,
subject to influences of group dynamics as well as inter-personal
idiosyncrasies. We record videos of unscripted discussions of three to four
people using a multi-view camera system and microphones. We analyse a rich set
of non-verbal signals for rapport detection, namely facial expressions, hand
motion, gaze, speaker turns, and speech prosody. Using facial features, we can
detect low rapport with an average precision of 0.7 (chance level at 0.25),
while incorporating prior knowledge of participants' personalities can even
achieve early prediction without a drop in performance. We further provide a
detailed analysis of different feature sets and the amount of information
contained in different temporal segments of the interactions.
Turn-taking patterns in human discourse and their impact on group communication service design
Recent studies have demonstrated the benefit of integrating speaker-prediction features into the design of group-communication services supporting multiparty online discourse. This paper delivers a more elaborate analysis of speaker prediction by analyzing a larger volume of data. Moreover, it tests whether speaking time is dominated by particular speakers. Towards this end, we analyze tens of hours of recorded meeting and lecture sessions. Our principal result for meeting-like interaction is that the next speaker is one of the last four speakers with over 90% probability. This is seen consistently across our data with little variance (standard deviation of 8.71%), independent of the total number of potential speakers. Furthermore, lecture time is in most cases significantly dominated by the tutor. In meetings, although a single dominating speaker is always evident, domination exhibited high variability. Generally, our findings strengthen and further motivate the incorporation of user-behavior awareness into group communication service design.
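The headline statistic, that the next speaker is among the last few speakers, can be computed from a chronological turn sequence roughly as follows. This is a minimal sketch, not the paper's code, and the toy turn list is invented for illustration:

```python
from collections import deque

def recency_hit_rate(turns, window=4):
    """Fraction of turns whose speaker is among the last `window`
    distinct speakers. `turns` is a chronological list of speaker IDs."""
    recent = deque(maxlen=window)  # most recent distinct speakers
    hits = considered = 0
    for speaker in turns:
        if recent:                 # skip the very first turn: no history yet
            considered += 1
            if speaker in recent:
                hits += 1
        if speaker in recent:
            recent.remove(speaker)  # refresh: move speaker to most recent slot
        recent.append(speaker)
    return hits / considered if considered else 0.0

# Toy example: a meeting-like rotation among five participants
turns = ["A", "B", "A", "C", "B", "A", "D", "C", "A", "B", "E", "A"]
print(round(recency_hit_rate(turns), 2))  # → 0.64
```

The deque keeps the last four *distinct* speakers, so a speaker who talks repeatedly occupies only one slot; a real analysis would also need to decide how to segment overlapping speech into turns.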
Social behavior modeling based on Incremental Discrete Hidden Markov Models
Modeling multimodal face-to-face interaction is a crucial step in the process of building social robots or user-aware Embodied Conversational Agents (ECA). In this context, we present a novel approach for human behavior analysis and generation based on what we call an "Incremental Discrete Hidden Markov Model" (IDHMM). Joint multimodal activities of interlocutors are first modeled by a set of DHMMs that are specific to supposed joint cognitive states of the interlocutors. Respecting a task-specific syntax, the IDHMM is then built from these DHMMs and split into i) a recognition model that determines the most likely sequence of cognitive states given the multimodal activity of the interlocutor, and ii) a generative model that computes the most likely activity of the speaker given this estimated sequence of cognitive states. Short-Term Viterbi (STV) decoding is used to incrementally recognize and generate behavior. The proposed model is applied to parallel speech and gaze data of interacting dyads.
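The recognition step rests on Viterbi decoding of a discrete HMM, which can be illustrated as follows. Note this is the classic batch Viterbi pass, not the paper's incremental Short-Term Viterbi variant, and the two "cognitive state" and gaze-observation matrices are toy values invented for the example:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden-state sequence for a discrete HMM.
    obs: observation indices; pi: initial probs, shape (S,);
    A: transition probs, shape (S, S); B: emission probs, shape (S, O).
    Works in log space to avoid numerical underflow."""
    S, T = len(pi), len(obs)
    logd = np.log(pi) + np.log(B[:, obs[0]])   # delta at t = 0
    back = np.zeros((T, S), dtype=int)         # backpointers
    for t in range(1, T):
        scores = logd[:, None] + np.log(A)     # scores[i, j]: best via i -> j
        back[t] = scores.argmax(axis=0)
        logd = scores.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(logd.argmax())]
    for t in range(T - 1, 0, -1):              # follow backpointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy model: states 0/1 (e.g. two cognitive states),
# observations 0/1 (e.g. gaze at partner / gaze away)
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(viterbi([0, 0, 1], pi, A, B))  # → [0, 0, 1]
```

The incremental STV variant used in the paper commits to a prefix of this path once all candidate paths agree on it, allowing decoding to run online rather than waiting for the full observation sequence.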
Moving together: the organisation of non-verbal cues during multiparty conversation
Conversation is a collaborative activity. In face-to-face interactions, interlocutors have mutual
access to a shared space. This thesis aims to explore the shared space as a resource for coordinating
conversation. As is well demonstrated in studies of two-person conversations, interlocutors
can coordinate their speech and non-verbal behaviour in ways that manage the unfolding conversation.
However, when scaling up from two people to three people interacting, the coordination
challenges that the interlocutors face increase. In particular, speakers must manage multiple listeners.
This thesis examines the use of interlocutors’ bodies in shared space to coordinate their
multiparty dialogue.
The approach exploits corpora of motion captured triadic interactions. The thesis first explores
how interlocutors coordinate their speech and non-verbal behaviour. Inter-person relationships
are examined and compared with artificially created triples who did not interact. Results demonstrate
that interlocutors avoid speaking and gesturing over each other, but tend to nod together.
Evidence is presented that the two recipients of an utterance have different patterns of head and
hand movement, and that some of the regularities of movement are correlated with the task structure.
The empirical section concludes by uncovering a class of coordination events, termed simultaneous
engagement events, that are unique to multiparty dialogue. They are constructed using
combinations of speaker head orientation and gesture orientation. The events coordinate multiple
recipients of the dialogue and potentially arise as a result of the greater coordination challenges
that interlocutors face. They are distinctive in that they require a mutually accessible shared space
in order to function as an effective interactional cue.
The thesis provides quantitative evidence that interlocutors’ head and hand movements are
organised by their dialogue state and the task responsibilities that they bear. It is argued that a
shared interaction space becomes a more important interactional resource when conversations
scale up to three people.