Multi-modal Social Signal Analysis for Predicting Agreement in Conversation Settings
In this paper we present a non-invasive ambient intelligence framework for the analysis of non-verbal communication applied to conversational settings. In particular, we apply feature extraction techniques to multi-modal audio-RGB-depth data. We compute a set of behavioral indicators that define communicative cues coming from the fields of psychology and observational methodology. We test our methodology on data captured in victim-offender mediation scenarios. Using different state-of-the-art classification approaches, our system achieves a recognition rate of around 75% when predicting agreement among the parties involved in the conversations, using the experts' opinions as ground truth.
Non-Verbal Communication Analysis in Victim-Offender Mediations
In this paper we present a non-invasive ambient intelligence framework for
the semi-automatic analysis of non-verbal communication applied to the
restorative justice field. In particular, we propose the use of computer vision
and social signal processing technologies in real scenarios of Victim-Offender
Mediations, applying feature extraction techniques to multi-modal
audio-RGB-depth data. We compute a set of behavioral indicators that define
communicative cues from the fields of psychology and observational methodology.
We test our methodology on data captured in real-world Victim-Offender
Mediation sessions in Catalonia in collaboration with the regional government.
We define the ground truth based on expert opinions when annotating the
observed social responses. Using different state-of-the-art binary
classification approaches, our system achieves recognition accuracies of 86%
when predicting satisfaction, and 79% when predicting both agreement and
receptivity. Applying a regression strategy, we obtain a mean deviation for the
predictions between 0.5 and 0.7 in the range [1-5] for the computed social
signals. Comment: Please find the supplementary video material at:
http://sunai.uoc.edu/~vponcel/video/VOMSessionSample.mp
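The regression result above is reported as a mean deviation on a 1-5 scale. As a minimal sketch, assuming that "mean deviation" means the average absolute difference between predicted and expert-annotated scores (the abstract does not define it), the figure could be computed as follows. The function name and the sample scores are hypothetical and are not taken from the paper.

```python
def mean_absolute_deviation(predicted, annotated):
    """Average absolute gap between predicted and expert-annotated scores."""
    if len(predicted) != len(annotated) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, annotated)) / len(predicted)


# Hypothetical scores on the 1-5 scale, for illustration only (not paper data).
expert_scores = [4, 2, 5, 3]
predicted_scores = [3.5, 2.5, 4.0, 3.5]

mad = mean_absolute_deviation(predicted_scores, expert_scores)
print(f"mean deviation: {mad}")
```

Under this reading, a value between 0.5 and 0.7 means predictions land, on average, well within one point of the expert rating.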
Twente Debate Corpus - A Multimodal Corpus for Head Movement Analysis
This paper introduces a multimodal discussion corpus for the study of head movement and turn-taking patterns in debates. Given that participants acted either alone or in a pair, cooperation and competition and their nonverbal correlates can be analyzed. In addition to the video and audio of the recordings, the corpus contains automatically estimated head movements and manual annotations of who is speaking and who is looking where. The corpus consists of over 2 hours of debates, in 6 groups with 18 participants in total. We describe the recording setup and present initial analyses of the recorded data. We found that the person who acted as the single debater spoke more and also received more attention than the other debaters, even when corrected for speaking time. We also found that a single debater was more likely to speak after a team debater. Future work will be aimed at further analysis of the relation between speaking and looking patterns, the outcome of the debate and the perceived dominance of the debaters.
Backchannels: Quantity, Type and Timing Matters
In a perception experiment, we systematically varied the quantity, type and timing of backchannels. Participants viewed stimuli of a real speaker side-by-side with an animated listener and rated how human-like they perceived the latter's backchannel behavior. In addition, we obtained measures of appropriateness and optionality for each backchannel from key strokes. This approach allowed us to analyze the influence of each of the factors on entire fragments and on individual backchannels. The originally performed type and timing of a backchannel appeared to be more human-like, compared to a switched type or random timing. In addition, we found that nods are more often appropriate than vocalizations. For quantity, too few or too many backchannels per minute appeared to reduce the quality of the behavior. These findings are important for the design of algorithms for the automatic generation of backchannel behavior for artificial listeners
Automatic Context-Driven Inference of Engagement in HMI: A Survey
An integral part of seamless human-human communication is engagement, the
process by which two or more participants establish, maintain, and end their
perceived connection. Therefore, to develop successful human-centered
human-machine interaction applications, automatic engagement inference is one
of the tasks required to achieve engaging interactions between humans and
machines, and to make machines attuned to their users, hence enhancing user
satisfaction and technology acceptance. Several factors contribute to
engagement state inference, which include the interaction context and
interactants' behaviours and identity. Indeed, engagement is a multi-faceted
and multi-modal construct that requires high accuracy in the analysis and
interpretation of contextual, verbal and non-verbal cues. Thus, the development
of an automated and intelligent system that accomplishes this task has been
proven to be challenging so far. This paper presents a comprehensive survey on
previous work in engagement inference for human-machine interaction, entailing
interdisciplinary definition, engagement components and factors, publicly
available datasets, ground truth assessment, and most commonly used features
and methods, serving as a guide for the development of future human-machine
interaction interfaces with reliable context-aware engagement inference
capability. An in-depth review across embodied and disembodied interaction
modes, and an emphasis on the interaction context in which engagement
perception modules are integrated, set the presented survey apart from existing
surveys.
ConfLab: A Rich Multimodal Multisensor Dataset of Free-Standing Social Interactions in the Wild
Recording the dynamics of unscripted human interactions in the wild is
challenging due to the delicate trade-offs between several factors: participant
privacy, ecological validity, data fidelity, and logistical overheads. To
address these, following a 'datasets for the community by the community' ethos,
we propose the Conference Living Lab (ConfLab): a new concept for multimodal
multisensor data collection of in-the-wild free-standing social conversations.
For the first instantiation of ConfLab described here, we organized a real-life
professional networking event at a major international conference. Involving 48
conference attendees, the dataset captures a diverse mix of status,
acquaintance, and networking motivations. Our capture setup improves upon the
data fidelity of prior in-the-wild datasets while retaining privacy
sensitivity: 8 videos (1920x1080, 60 fps) from a non-invasive overhead view,
and custom wearable sensors with onboard recording of body motion (full 9-axis
IMU), privacy-preserving low-frequency audio (1250 Hz), and Bluetooth-based
proximity. Additionally, we developed custom solutions for distributed hardware
synchronization at acquisition, and time-efficient continuous annotation of
body keypoints and actions at high sampling rates. Our benchmarks showcase some
of the open research tasks related to in-the-wild privacy-preserving social
data analysis: keypoints detection from overhead camera views, skeleton-based
no-audio speaker detection, and F-formation detection. Comment: v2 is the version submitted to NeurIPS 2022 Datasets and Benchmarks
Track