590 research outputs found

    Spotting Agreement and Disagreement: A Survey of Nonverbal Audiovisual Cues and Tools

    While detecting and interpreting temporal patterns of non-verbal behavioural cues in a given context is a natural and often unconscious process for humans, it remains a rather difficult task for computer systems. Nevertheless, it is an important one to achieve if the goal is to realise naturalistic communication between humans and machines. Machines that are able to sense social attitudes like agreement and disagreement, and respond to them in a meaningful way, are likely to be welcomed by users due to the more natural, efficient and human-centered interaction they are bound to experience. This paper surveys the nonverbal cues that can be present during displays of agreement and disagreement, lists a number of tools that could be useful in detecting them, and describes a few publicly available databases that could be used to train these tools for the analysis of spontaneous, audiovisual instances of agreement and disagreement.

    LOMo: Latent Ordinal Model for Facial Analysis in Videos

    We study the problem of facial analysis in videos. We propose a novel weakly supervised learning method that models a video event (expression, pain, etc.) as a sequence of automatically mined, discriminative sub-events (e.g. onset and offset phases for a smile, brow lowering and cheek raising for pain). The proposed model is inspired by recent work on Multiple Instance Learning and latent SVM/HCRF: it extends such frameworks to approximately model the ordinal, or temporal, aspect of the videos. We obtain consistent improvements over relevant competitive baselines on four challenging and publicly available video-based facial analysis datasets for the prediction of expression, clinical pain and intent in dyadic conversations. In combination with complementary features, we report state-of-the-art results on these datasets.
    Comment: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
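    The abstract above describes inference over an ordered sequence of mined sub-events. A minimal sketch of that inference step is shown below, assuming per-frame scores for each sub-event have already been produced by separately trained detectors; the mining and learning stages are not shown, and none of this is taken from the paper's code.

```python
import numpy as np

def best_ordered_subevents(frame_scores: np.ndarray):
    """Assign each of K sub-events to one frame so that the chosen frames
    appear in temporal order and the summed score is maximal.

    frame_scores: (T, K) array; frame_scores[t, k] is the (precomputed)
    score of sub-event k evaluated at frame t.
    Returns (total_score, list of K frame indices in increasing order).
    """
    T, K = frame_scores.shape
    assert T >= K, "need at least one frame per sub-event"
    dp = np.full((T, K), -np.inf)      # dp[t, k]: best score with sub-event k at frame t
    back = np.zeros((T, K), dtype=int)  # frame chosen for sub-event k-1
    dp[:, 0] = frame_scores[:, 0]
    for k in range(1, K):
        best_prev = np.maximum.accumulate(dp[:, k - 1])
        argbest = np.zeros(T, dtype=int)
        cur = 0
        for t in range(T):               # argmax of the running maximum
            if dp[t, k - 1] >= dp[cur, k - 1]:
                cur = t
            argbest[t] = cur
        # sub-event k at frame t must come strictly after sub-event k-1
        dp[1:, k] = best_prev[:-1] + frame_scores[1:, k]
        back[1:, k] = argbest[:-1]
    t = int(np.argmax(dp[:, K - 1]))
    total = float(dp[t, K - 1])
    frames = [t]
    for k in range(K - 1, 0, -1):
        t = int(back[t, k])
        frames.append(t)
    return total, frames[::-1]
```

    For example, `best_ordered_subevents(np.random.rand(100, 3))` returns the best temporally ordered placement of three hypothetical sub-events over 100 frames.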

    Measuring, analysing and artificially generating head nodding signals in dyadic social interaction

    Social interaction involves rich and complex behaviours in which verbal and non-verbal signals are exchanged in dynamic patterns. The aim of this thesis is to explore new ways of measuring and analysing interpersonal coordination as it naturally occurs in social interactions. Specifically, we want to understand what different types of head nods mean in different social contexts, how they are used during face-to-face dyadic conversation, and whether they relate to memory and learning. Many current methods are limited by time-consuming, low-resolution data, which cannot capture the full richness of a dyadic social interaction. This thesis explores ways to demonstrate how high-resolution data in this area can give new insights into the study of social interaction. Furthermore, we also want to demonstrate the benefit of using virtual reality to artificially generate interpersonal coordination in order to test our hypotheses about the meaning of head nodding as a communicative signal. The first study aims to capture two patterns of head nodding signals – fast nods and slow nods – and to determine what they mean and how they are used across different conversational contexts. We find that fast nodding signals the receipt of new information and carries a different meaning from slow nodding. The second study investigates a link between memory and head nodding behaviour. This exploratory study provided initial hints that there might be a relationship, though further analyses were less clear. In the third study, we test whether interactive head nodding in virtual agents can be used to measure how much we like the virtual agent, and whether we learn better from virtual agents that we like. We find no causal link between memory performance and interactivity. In the fourth study, we perform a cross-experimental analysis of how the level of interactivity in different contexts (i.e., real, virtual, and video) impacts memory, and find clear differences between them.
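    One simple way to operationalise the fast-versus-slow nod distinction described above is to look at the dominant frequency of the head pitch signal within a nod episode. The 2 Hz split and the pipeline below are illustrative assumptions, not the thesis's actual measurement procedure.

```python
import numpy as np

def classify_nod(pitch: np.ndarray, fs: float, split_hz: float = 2.0) -> str:
    """Label a nod episode as 'fast' or 'slow' from its dominant frequency.

    pitch: head pitch angle (degrees) sampled at fs Hz over one nod episode.
    split_hz: assumed boundary between slow and fast nodding; 2 Hz is an
    illustrative value, not a threshold taken from the thesis.
    """
    x = pitch - pitch.mean()                    # remove the static head pose
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    band = freqs > 0.2                          # ignore very slow drift
    dominant = freqs[band][np.argmax(spectrum[band])]
    return "fast" if dominant >= split_hz else "slow"
```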

    Robust Modeling of Epistemic Mental States

    This work identifies and advances research challenges in the analysis of facial features and their temporal dynamics in relation to epistemic mental states in dyadic conversations. The epistemic states considered are Agreement, Concentration, Thoughtful, Certain, and Interest. In this paper, we perform a number of statistical analyses and simulations to identify the relationship between facial features and epistemic states. Non-linear relations are found to be more prevalent, while temporal features derived from the original facial features show a strong correlation with intensity changes. We then propose a novel prediction framework that takes facial features and their non-linear relation scores as input and predicts different epistemic states in videos. The prediction of epistemic states is further boosted when the classification of emotion-change regions, such as rising, falling, or steady-state, is incorporated with the temporal features. The proposed predictive models predict the epistemic states with significantly improved accuracy: the correlation coefficient (CoERR) is 0.827 for Agreement, 0.901 for Concentration, 0.794 for Thoughtful, 0.854 for Certain, and 0.913 for Interest.
    Comment: Accepted for publication in Multimedia Tools and Applications, Special Issue: Socio-Affective Technologies.
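    The prediction framework sketched in the abstract pairs facial features with non-linear relation scores. A toy version of that idea is shown below, using mutual information as a stand-in for the paper's relation scores and a random forest as a generic regressor; both choices are assumptions, not the authors' pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import mutual_info_regression

def fit_epistemic_model(face_feats: np.ndarray, labels: np.ndarray):
    """Toy 'features + non-linear relation scores -> state intensity' model.

    face_feats: (N, D) per-video facial feature vectors.
    labels:     (N,) continuous intensity of one epistemic state (e.g. Agreement).
    In practice the data would be split or cross-validated before fitting.
    """
    # Non-linear dependence of each feature on the target, used here as a
    # stand-in for the paper's non-linear relation scores.
    relation = mutual_info_regression(face_feats, labels)
    # Augment the features with a relation-weighted copy of themselves.
    augmented = np.hstack([face_feats, face_feats * relation])
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(augmented, labels)
    return model, relation
```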

    Are You on My Wavelength? Interpersonal Coordination in Dyadic Conversations

    Conversation between two people involves subtle nonverbal coordination in addition to speech. However, the precise parameters and timing of this coordination remain unclear, which limits our ability to theorize about the neural and cognitive mechanisms of social coordination. In particular, it is unclear whether conversation is dominated by synchronization (with no time lag), rapid and reactive mimicry (with lags under 1 s) or traditionally observed mimicry (with a lag of several seconds), each of which demands a different neural mechanism. Here we describe data from high-resolution motion capture of the head movements of pairs of participants (n = 31 dyads) engaged in structured conversations. In a pre-registered analysis pathway, we calculated the wavelet coherence of head motion within dyads as a measure of their nonverbal coordination and report two novel results. First, low-frequency coherence (0.2–1.1 Hz) is consistent with traditional observations of mimicry, and modeling shows this behavior is generated by a mechanism with a constant 600 ms lag between leader and follower. This is in line with rapid reactive (rather than predictive or memory-driven) models of mimicry behavior, and could be implemented in mirror neuron systems. Second, we find an unexpected pattern of lower-than-chance coherence between participants, or hypo-coherence, at high frequencies (2.6–6.5 Hz). Exploratory analyses show that this systematic decoupling is driven by fast nodding from the listening member of the dyad, and may be a newly identified social signal. These results provide a step towards the quantification of real-world human behavior in high resolution and provide new insights into the mechanisms of social coordination.
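    The 600 ms leader-follower lag reported above comes from wavelet-coherence modelling. A much simpler way to get a first estimate of such a lag from two head-motion traces is band-limited cross-correlation, sketched below; this is a stand-in for illustration, not the paper's pre-registered analysis.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def estimate_lag_ms(head_a, head_b, fs, band=(0.2, 1.1), max_lag_s=2.0):
    """Estimate the time lag (ms) at which head_b best follows head_a.

    head_a, head_b: 1-D head-motion signals (e.g. pitch velocity), same length.
    fs: sampling rate in Hz; band: frequency band of interest in Hz.
    A positive result means head_b lags behind head_a.
    """
    # Band-pass both signals to the low-frequency range of interest.
    b, a = butter(2, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    xa = filtfilt(b, a, head_a - np.mean(head_a))
    xb = filtfilt(b, a, head_b - np.mean(head_b))
    max_lag = int(max_lag_s * fs)
    lags = np.arange(-max_lag, max_lag + 1)
    # Correlation between xa[t] and xb[t + lag] for each candidate lag.
    corr = [np.corrcoef(xa[max(0, -l):len(xa) - max(0, l)],
                        xb[max(0, l):len(xb) - max(0, -l)])[0, 1]
            for l in lags]
    best = lags[int(np.nanargmax(corr))]
    return 1000.0 * best / fs
```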

    Discriminatively Trained Latent Ordinal Model for Video Classification

    We study the problem of video classification for facial analysis and human action recognition. We propose a novel weakly supervised learning method that models a video as a sequence of automatically mined, discriminative sub-events (e.g. onset and offset phases for "smile", running and jumping for "highjump"). The proposed model is inspired by recent work on Multiple Instance Learning and latent SVM/HCRF: it extends such frameworks to approximately model the ordinal aspect of the videos. We obtain consistent improvements over relevant competitive baselines on four challenging and publicly available video-based facial analysis datasets for the prediction of expression, clinical pain and intent in dyadic conversations, and on three challenging human action datasets. We also validate the method with qualitative results and show that they largely support the intuitions behind the method.
    Comment: Paper accepted in IEEE TPAMI. arXiv admin note: substantial text overlap with arXiv:1604.0150

    Moving together: the organisation of non-verbal cues during multiparty conversation

    Conversation is a collaborative activity. In face-to-face interactions interlocutors have mutual access to a shared space. This thesis explores that shared space as a resource for coordinating conversation. As is well demonstrated in studies of two-person conversations, interlocutors can coordinate their speech and non-verbal behaviour in ways that manage the unfolding conversation. However, when scaling up from two to three people interacting, the coordination challenges that interlocutors face increase; in particular, speakers must manage multiple listeners. This thesis examines how interlocutors use their bodies in shared space to coordinate multiparty dialogue. The approach exploits corpora of motion-captured triadic interactions. The thesis first explores how interlocutors coordinate their speech and non-verbal behaviour. Inter-person relationships are examined and compared with artificially created triples of people who did not interact. Results demonstrate that interlocutors avoid speaking and gesturing over each other, but tend to nod together. Evidence is presented that the two recipients of an utterance have different patterns of head and hand movement, and that some of the regularities of movement are correlated with the task structure. The empirical section concludes by uncovering a class of coordination events, termed simultaneous engagement events, that are unique to multiparty dialogue. They are constructed using combinations of speaker head orientation and gesture orientation. These events coordinate multiple recipients of the dialogue and potentially arise as a result of the greater coordination challenges that interlocutors face; notably, they require a mutually accessible shared space in order to be an effective interactional cue. The thesis provides quantitative evidence that interlocutors' head and hand movements are organised by their dialogue state and the task responsibilities that they bear. It is argued that a shared interaction space becomes a more important interactional resource when conversations scale up to three people.
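    The comparison with artificially created triples is, in effect, a permutation baseline. A minimal sketch of that idea for co-nodding is given below; the co-nodding measure and the shuffling scheme are assumptions for illustration, not the thesis's exact analysis.

```python
import numpy as np

def conod_rate(nod_masks: np.ndarray) -> float:
    """Fraction of frames on which at least two of the three people nod together.

    nod_masks: (3, T) boolean array, True where that person is nodding.
    """
    return float(np.mean(nod_masks.sum(axis=0) >= 2))

def permutation_baseline(all_triads, n_perm=1000, rng=None):
    """Compare co-nodding in real triads against randomly assembled triples.

    all_triads: list of (3, T) boolean arrays, one per recorded conversation.
    Returns (mean co-nodding rate over real triads,
             array of rates for n_perm randomly assembled triples).
    """
    rng = rng or np.random.default_rng(0)
    real = np.mean([conod_rate(t) for t in all_triads])
    # Pool all individual nod tracks; a shuffled triple may, for simplicity,
    # occasionally re-draw two people from the same original conversation.
    people = [t[i] for t in all_triads for i in range(3)]
    T = min(len(p) for p in people)
    fake = []
    for _ in range(n_perm):
        idx = rng.choice(len(people), size=3, replace=False)
        fake.append(conod_rate(np.stack([people[i][:T] for i in idx])))
    return real, np.array(fake)
```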

    SEWA DB: A rich database for audio-visual emotion and sentiment research in the wild

    Natural human-computer interaction and audio-visual human behaviour sensing systems that achieve robust performance in the wild are needed more than ever, as digital devices are becoming an indispensable part of our lives. Accurately annotated real-world data are the crux of devising such systems. However, existing databases usually consider controlled settings, low demographic variability, and a single task. In this paper, we introduce the SEWA database of more than 2000 minutes of audio-visual data of 398 people from six cultures, 50% female, uniformly spanning the age range of 18 to 65 years. Subjects were recorded in two different contexts: while watching adverts and while discussing the adverts in a video chat. The database includes rich annotations of the recordings in terms of facial landmarks, facial action units (FAU), various vocalisations, mirroring, continuously valued valence, arousal, liking and agreement, and prototypic examples of (dis)liking. This database aims to be an extremely valuable resource for researchers in affective computing and automatic human sensing, and is expected to push forward research in human behaviour analysis, including cultural studies. Along with the database, we provide extensive baseline experiments for automatic FAU detection and automatic valence, arousal and (dis)liking intensity estimation.