    A comparison of addressee detection methods for multiparty conversations

    Several algorithms have recently been proposed for recognizing addressees in a group conversational setting. These algorithms can rely on a variety of factors, including previous conversational roles, gaze, and type of dialogue act. Both statistical supervised machine-learning algorithms and rule-based methods have been developed. In this paper, we compare several algorithms developed for different genres of multiparty dialogue, and propose a new synthesis algorithm that matches the performance of machine-learning algorithms while maintaining the transparency of semantically meaningful rule-based algorithms.
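
    As a rough illustration of the rule-based side of this comparison, the sketch below predicts an addressee from gaze and dialogue-act cues. The rule ordering, act labels, and fallback to group addressing are illustrative assumptions, not the paper's synthesis algorithm.

```python
def predict_addressee(gaze_target, dialogue_act, prev_speaker):
    """Return the predicted addressee of an utterance, or 'GROUP'."""
    # Elicitation acts accompanied by directed gaze are usually
    # individually addressed.
    if dialogue_act in {"question", "request"} and gaze_target is not None:
        return gaze_target
    # Responses tend to address the previous speaker.
    if dialogue_act in {"answer", "backchannel"} and prev_speaker is not None:
        return prev_speaker
    # Otherwise, sustained gaze at one participant marks them as addressee.
    if gaze_target is not None:
        return gaze_target
    # Fallback: treat the utterance as addressed to the whole group.
    return "GROUP"

print(predict_addressee("B", "question", prev_speaker="C"))    # -> B
print(predict_addressee(None, "statement", prev_speaker="C"))  # -> GROUP
```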

    Addressee Identification In Face-to-Face Meetings

    We present results on addressee identification in four-participant face-to-face meetings using Bayesian Network and Naive Bayes classifiers. First, we investigate how well the addressee of a dialogue act can be predicted from gaze, utterance, and conversational context features. Then, we explore whether information about the meeting context can aid the classifiers' performance. Both classifiers perform best when conversational context and utterance features are combined with the speaker's gaze information. The classifiers show little gain from information about the meeting context.
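
    A minimal sketch of the Naive Bayes half of this setup, assuming categorical encodings of gaze target, dialogue-act type, and previous addressee; the feature set and toy data are stand-ins, not the paper's corpus.

```python
from sklearn.naive_bayes import CategoricalNB

# Rows: [gaze target, dialogue-act type, previous addressee], integer-coded.
X_train = [[0, 1, 2], [1, 0, 0], [2, 1, 1], [0, 0, 3], [3, 1, 0]]
y_train = [0, 1, 2, 3, 0]  # addressee labels: participants 0-2, 3 = group

clf = CategoricalNB()
clf.fit(X_train, y_train)
print(clf.predict([[0, 1, 2]]))
```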

    Virtual Meeting Rooms: From Observation to Simulation

    Virtual meeting rooms are used to simulate real meeting behavior and can show how people behave during conversations: how they gesture, move their heads and bodies, and direct their gaze. They are used for visualising models of meeting behavior and for evaluating these models. They are also used to show the effects of controlling certain parameters on behavior, and in experiments that examine the effect on communication when various channels of information (speech, gaze, gesture, posture) are switched off or manipulated in other ways. The paper presents the various stages in the development of a virtual meeting room and illustrates its uses by presenting results of experiments testing whether human judges can infer conversational roles in a virtual meeting situation when they see only the head movements of the participants.

    Inferring Intentions to Speak Using Accelerometer Data In-the-Wild

    Humans have good natural intuition for recognizing when another person has something to say. It would be useful if an AI could also recognize intentions to speak, especially in scenarios where an AI is guiding a group discussion. This work studies the inference of successful and unsuccessful intentions to speak from accelerometer data. This modality is chosen because it is privacy-preserving and feasible for in-the-wild settings, since the sensor can be placed in a smart badge. Data from a real-life social networking event is used to train a machine-learning model that aims to infer intentions to speak. A subset of unsuccessful intention-to-speak cases in the data is annotated. The model is trained on the successful intentions to speak and evaluated on both the successful and unsuccessful cases. In conclusion, there is useful information in accelerometer data, but not enough to reliably capture intentions to speak. For example, posture shifts are correlated with intentions to speak, but people often shift posture without intending to speak, or intend to speak without shifting their posture. More modalities are likely needed to reliably infer intentions to speak.
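
    A hedged sketch of such a pipeline: per-window accelerometer statistics (a crude summary of posture and motion energy) fed to a binary classifier. The window length, feature set, and model choice are assumptions for illustration, not the study's exact configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_features(acc, win=150):
    """Per-window mean and std of each axis of a (T, 3) accelerometer stream."""
    n = len(acc) // win
    return np.array([
        np.concatenate([acc[i*win:(i+1)*win].mean(axis=0),
                        acc[i*win:(i+1)*win].std(axis=0)])
        for i in range(n)
    ])

rng = np.random.default_rng(0)
acc = rng.normal(size=(1500, 3))     # stand-in for one badge recording
X = window_features(acc)             # 10 windows x 6 features
y = rng.integers(0, 2, size=len(X))  # stand-in intention-to-speak labels
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict(X[:3]))
```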

    Detecting Low Rapport During Natural Interactions in Small Groups from Non-Verbal Behaviour

    Rapport, the close and harmonious relationship in which interaction partners are "in sync" with each other, has been shown to result in smoother social interactions, improved collaboration, and improved interpersonal outcomes. In this work, we are the first to investigate automatic prediction of low rapport during natural interactions within small groups. This task is challenging given that rapport manifests only in subtle non-verbal signals that are, in addition, subject to the influence of group dynamics as well as interpersonal idiosyncrasies. We record videos of unscripted discussions of three to four people using a multi-view camera system and microphones. We analyse a rich set of non-verbal signals for rapport detection, namely facial expressions, hand motion, gaze, speaker turns, and speech prosody. Using facial features, we can detect low rapport with an average precision of 0.7 (chance level at 0.25), while incorporating prior knowledge of participants' personalities can even achieve early prediction without a drop in performance. We further provide a detailed analysis of different feature sets and the amount of information contained in different temporal segments of the interactions.
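
    The headline number here is an average precision whose chance level equals the positive rate (0.25 for low-rapport segments). A small synthetic sketch of that evaluation; the labels and detector scores below are made up for illustration.

```python
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(1)
y_true = (rng.random(200) < 0.25).astype(int)  # ~25% low-rapport segments
scores = y_true * 0.5 + rng.random(200) * 0.7  # imperfect detector scores

print("AP:    ", round(average_precision_score(y_true, scores), 3))
print("chance:", round(y_true.mean(), 3))      # AP of a random ranking
```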

    Turn-taking patterns in human discourse and their impact on group communication service design

    Recent studies demonstrated the benefit of integrating speaker-prediction features into the design of group-communication services supporting multiparty online discourse. This paper delivers a more elaborate analysis of speaker prediction by analyzing a larger volume of data. Moreover, it tests whether speaking time is dominated by particular speakers. To this end, we analyze tens of hours of recorded meeting and lecture sessions. Our principal result for meeting-like interaction is that the next speaker is one of the last four speakers with over 90% probability. This holds consistently across our data with little variance (standard deviation of 8.71%), independent of the total number of potential speakers. Furthermore, lecture time is in most cases significantly dominated by the tutor. In meetings, although a single dominating speaker is always evident, domination exhibits high variability. Generally, our findings strengthen and further motivate the incorporation of user-behavior awareness into group communication service design.
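
    The recency result can be checked on any turn sequence in a few lines; the sketch below counts how often the next speaker appears among the last four distinct speakers (a synthetic sequence stands in for the recorded sessions).

```python
def recency_hit_rate(turns, k=4):
    """Fraction of turns taken by one of the k most recent distinct speakers."""
    hits = total = 0
    for i in range(1, len(turns)):
        recent = []
        for s in reversed(turns[:i]):  # walk back over the prior turns
            if s not in recent:
                recent.append(s)
            if len(recent) == k:
                break
        total += 1
        hits += turns[i] in recent
    return hits / total

turns = ["A", "B", "A", "C", "B", "D", "A", "E", "B", "A"]
print(f"{recency_hit_rate(turns):.2f}")
```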

    Social behavior modeling based on Incremental Discrete Hidden Markov Models

    Modeling multimodal face-to-face interaction is a crucial step in the process of building social robots or user-aware Embodied Conversational Agents (ECAs). In this context, we present a novel approach for human behavior analysis and generation based on what we call the "Incremental Discrete Hidden Markov Model" (IDHMM). Joint multimodal activities of interlocutors are first modeled by a set of DHMMs that are specific to supposed joint cognitive states of the interlocutors. Respecting a task-specific syntax, the IDHMM is then built from these DHMMs and split into i) a recognition model that determines the most likely sequence of cognitive states given the multimodal activity of the interlocutor, and ii) a generative model that computes the most likely activity of the speaker given this estimated sequence of cognitive states. Short-Term Viterbi (STV) decoding is used to incrementally recognize and generate behavior. The proposed model is applied to parallel speech and gaze data of interacting dyads.
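
    The recognition half rests on standard Viterbi decoding of a discrete HMM; below is a minimal full-sequence decoder for reference. The paper's Short-Term Viterbi variant decodes incrementally over a sliding window, which this sketch does not implement, and the toy parameters are arbitrary.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely state sequence for discrete observations `obs`.
    pi: (S,) initial probs, A: (S, S) transitions, B: (S, O) emissions."""
    S, T = len(pi), len(obs)
    logd = np.log(pi) + np.log(B[:, obs[0]])  # log-delta at t = 0
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = logd[:, None] + np.log(A)      # all prev-state -> state scores
        back[t] = cand.argmax(axis=0)
        logd = cand.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(logd.argmax())]
    for t in range(T - 1, 0, -1):             # backtrack through pointers
        path.append(int(back[t][path[-1]]))
    return path[::-1]

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(viterbi([0, 0, 1, 1], pi, A, B))
```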

    Moving together: the organisation of non-verbal cues during multiparty conversation

    Conversation is a collaborative activity. In face-to-face interactions, interlocutors have mutual access to a shared space. This thesis aims to explore the shared space as a resource for coordinating conversation. As is well demonstrated in studies of two-person conversations, interlocutors can coordinate their speech and non-verbal behaviour in ways that manage the unfolding conversation. However, when scaling up from two people to three people interacting, the coordination challenges that the interlocutors face increase; in particular, speakers must manage multiple listeners. This thesis examines how interlocutors use their bodies in shared space to coordinate their multiparty dialogue. The approach exploits corpora of motion-captured triadic interactions. The thesis first explores how interlocutors coordinate their speech and non-verbal behaviour. Inter-person relationships are examined and compared with artificially created triples who did not interact. Results demonstrate that interlocutors avoid speaking and gesturing over each other, but tend to nod together. Evidence is presented that the two recipients of an utterance have different patterns of head and hand movement, and that some of the regularities of movement are correlated with the task structure. The empirical section concludes by uncovering a class of coordination events, termed simultaneous engagement events, that are unique to multiparty dialogue. They are constructed using combinations of speaker head orientation and gesture orientation. The events coordinate multiple recipients of the dialogue and potentially arise as a result of the greater coordination challenges that interlocutors face. They are marked in requiring a mutually accessible shared space in order to be considered an effective interactional cue. The thesis provides quantitative evidence that interlocutors' head and hand movements are organised by their dialogue state and the task responsibilities that they bear. It is argued that a shared interaction space becomes a more important interactional resource when conversations scale up to three people.
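
    A toy version of the real-versus-artificial-triples comparison, assuming binary per-frame nod tracks: coordinated nodding in a real triad should produce more simultaneous nods than a triple assembled from unrelated recordings. The data below are synthetic stand-ins for the motion-capture corpora, with a shared "driver" signal playing the role of dialogue events that real interlocutors respond to together.

```python
import numpy as np

rng = np.random.default_rng(2)

def co_nod_rate(tracks):
    """Fraction of frames in which at least two of the three people nod."""
    return float((tracks.sum(axis=0) >= 2).mean())

frames = 1000
driver = rng.random(frames) < 0.10  # shared dialogue events driving nods
real = np.array([driver | (rng.random(frames) < 0.03) for _ in range(3)])
fake = np.array([rng.random(frames) < 0.13 for _ in range(3)])  # independent

print("real triad:      ", co_nod_rate(real))
print("artificial triad:", co_nod_rate(fake))
```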