No-audio speaking status detection in crowded settings via visual pose-based filtering and wearable acceleration
Recognizing who is speaking in a crowded scene is a key challenge towards the
understanding of the social interactions going on within. Detecting speaking
status from body movement alone opens the door for the analysis of social
scenes in which personal audio is not obtainable. Video and wearable sensors
make it possible to recognize speaking in an unobtrusive, privacy-preserving way.
When considering the video modality, in action recognition problems, a bounding
box is traditionally used to localize and segment out the target subject, to
then recognize the action taking place within it. However, cross-contamination,
occlusion, and the articulated nature of the human body, make this approach
challenging in a crowded scene. Here, we leverage articulated body poses both for
subject localization and in the subsequent speech detection stage. We show that
the selection of local features around pose keypoints has a positive effect on
generalization performance while also significantly reducing the number of
local features considered, making for a more efficient method. Using two
in-the-wild datasets with different viewpoints of subjects, we investigate the
role of cross-contamination in this effect. We additionally make use of
acceleration measured through wearable sensors for the same task, and present a
multimodal approach combining both methods.
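As an illustration of the localization idea above, the sketch below keeps small patches around pose keypoints instead of one large bounding box; the patch size and the keypoint interface are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch: crop small patches around pose keypoints rather than
# one large bounding box, reducing the number of local features.
# Patch size and keypoint format are illustrative assumptions.
import numpy as np

def keypoint_patches(frame: np.ndarray, keypoints: np.ndarray,
                     patch: int = 16) -> list:
    """frame: H x W x 3 video frame; keypoints: K x 2 pixel coordinates."""
    h, w = frame.shape[:2]
    half = patch // 2
    crops = []
    for x, y in keypoints.astype(int):
        x0, x1 = max(0, x - half), min(w, x + half)
        y0, y1 = max(0, y - half), min(h, y + half)
        crops.append(frame[y0:y1, x0:x1])  # local region around one joint
    return crops
```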
Who is where? Matching people in video to wearable acceleration during crowded mingling events
We address the challenging problem of associating acceleration data from a wearable sensor with the corresponding spatio-temporal region of a person in video during crowded mingling scenarios. This is an important first step for multi-sensor behavior analysis using these two modalities. Clearly, as the number of people in a scene increases, there is also a need to robustly and automatically associate a region of the video with each person's device. We propose a hierarchical association approach which exploits the spatial context of the scene, outperforming the state-of-the-art approaches significantly. Moreover, we present experiments on matching from 3 to more than 130 acceleration and video streams which, to our knowledge, is significantly larger than prior works, where only up to 5 device streams are associated.
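A much-simplified baseline for this association problem (not the paper's hierarchical method) is to correlate each device's acceleration magnitude with the motion energy of each tracked video region and solve the resulting one-to-one assignment:

```python
# Simplified association baseline: Pearson correlation between device
# and video motion signals, then optimal one-to-one matching.
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(accel: np.ndarray, video_motion: np.ndarray) -> np.ndarray:
    """accel: D x T device magnitudes; video_motion: P x T per-person
    motion magnitudes. Returns the matched person index per device."""
    a = accel - accel.mean(axis=1, keepdims=True)
    v = video_motion - video_motion.mean(axis=1, keepdims=True)
    a /= a.std(axis=1, keepdims=True) + 1e-8
    v /= v.std(axis=1, keepdims=True) + 1e-8
    cost = -(a @ v.T) / accel.shape[1]   # negative correlation as cost
    rows, cols = linear_sum_assignment(cost)
    return cols
```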
Impact of annotation modality on label quality and model performance in the automatic assessment of laughter in-the-wild
Laughter is considered one of the most overt signals of joy. Laughter is
well-recognized as a multimodal phenomenon but is most commonly detected by
sensing the sound of laughter. It is unclear how perception and annotation of
laughter differ when annotated from other modalities like video, via the body
movements of laughter. In this paper we take a first step in this direction by
asking if and how well laughter can be annotated when only audio, only video
(containing full body movement information) or audiovisual modalities are
available to annotators. We ask whether annotations of laughter are congruent
across modalities, and compare the effect that labeling modality has on machine
learning model performance. We compare annotations and models for laughter
detection, intensity estimation, and segmentation, three tasks common in
previous studies of laughter. Our analysis of more than 4000 annotations
acquired from 48 annotators revealed evidence of incongruity in the perception
of laughter and its intensity across modalities. Further analysis of
annotations against consolidated audiovisual reference annotations revealed
that recall was lower on average for video when compared to the audio
condition, but tended to increase with the intensity of the laughter samples.
Our machine learning experiments compared the performance of state-of-the-art
unimodal (audio-based, video-based and acceleration-based) and multi-modal
models for different combinations of input modalities, training label modality,
and testing label modality. Models with video and acceleration inputs had
similar performance regardless of training label modality, suggesting that it
may be entirely appropriate to train models for laughter detection from body
movements using video-acquired labels, despite their lower inter-rater
agreement.
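The recall comparison mentioned above can be pictured with a simple frame-level computation; the binary per-frame label format is an assumption for illustration:

```python
# Illustrative frame-level recall of one annotation modality against a
# consolidated audiovisual reference (binary labels are an assumption).
import numpy as np

def frame_recall(reference: np.ndarray, candidate: np.ndarray) -> float:
    """Both arrays hold 0/1 laughter labels, one entry per frame."""
    positives = reference == 1
    if not positives.any():
        return float("nan")  # recall undefined without reference laughter
    return float((candidate[positives] == 1).mean())
```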
Estimating self-assessed personality from body movements and proximity in crowded mingling scenarios
This paper focuses on the automatic classification of self-assessed personality traits from the HEXACO inventory during crowded mingle scenarios. We exploit acceleration and proximity data from a wearable device hung around the neck. Unlike most state-of-the-art studies, addressing personality estimation during mingle scenarios provides a challenging social context, as people interact dynamically and freely in a face-to-face setting. While many former studies use audio to extract speech-related features, we present a novel method of extracting an individual's speaking status from a single body-worn triaxial accelerometer, which scales easily to large populations. Moreover, by fusing both speech- and movement-energy-related cues from acceleration alone, our experimental results show improvements on the estimation of Humility over features extracted from a single behavioral modality. We validated our method on 71 participants, obtaining an accuracy of 69% for Honesty, Conscientiousness, and Openness to Experience. To our knowledge, this is the largest validation of personality estimation carried out in such a social context with simple wearable sensors.
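A minimal sketch of the accelerometer-based speaking-status idea follows; the frequency band, window length, and threshold are illustrative assumptions rather than the paper's tuned parameters.

```python
# Hedged sketch: band-pass the acceleration magnitude and threshold its
# short-term energy as a binary speaking-status signal. Band, window
# and threshold values are assumptions, not the paper's parameters.
import numpy as np
from scipy.signal import butter, filtfilt

def speaking_status(acc: np.ndarray, fs: float = 20.0,
                    win_s: float = 1.0, thresh: float = 0.01) -> np.ndarray:
    """acc: T x 3 raw acceleration; returns one 0/1 label per window."""
    mag = np.linalg.norm(acc, axis=1)
    # Keep torso vibrations plausibly tied to speech; drop gravity/drift.
    b, a = butter(2, [1.0, 8.0], btype="bandpass", fs=fs)
    filt = filtfilt(b, a, mag)
    win = int(win_s * fs)
    n = len(filt) // win
    energy = (filt[: n * win].reshape(n, win) ** 2).mean(axis=1)
    return (energy > thresh).astype(int)
```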
Towards Analyzing and Predicting the Experience of Live Performances with Wearable Sensing
We present an approach to interpret the response of audiences to live performances by processing mobile sensor data. We apply our method on three different datasets obtained from three live performances, where each audience member wore a single tri-axial accelerometer and proximity sensor embedded inside a smart sensor pack. Using these sensor data, we developed a novel approach to predict audience members' self-reported experience of the performances in terms of enjoyment, immersion, willingness to recommend the event to others, and change in mood. The proposed approach uses an unsupervised method to identify informative intervals of the event, using the linkage of the audience members' bodily movements, and uses data from these intervals only to estimate the audience members' experience. We also analyze how the relative location of members of the audience can affect their experience and present an automatic way of recovering neighborhood information based on proximity sensors. We further show that the linkage of the audience members' bodily movements is informative of memorable moments which were later reported by the audience.
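One way to picture the "linkage" computation described above is a mean pairwise correlation of movement magnitudes over sliding windows; the window length and selection rule are assumptions for illustration.

```python
# Illustrative "linkage" score: windows in which audience members'
# movements correlate strongly are treated as informative intervals.
import numpy as np

def informative_intervals(mag: np.ndarray, win: int = 200,
                          top_k: int = 5) -> list:
    """mag: N x T movement magnitudes; returns start indices of the
    top_k non-overlapping windows by mean pairwise correlation."""
    n, t = mag.shape
    starts = list(range(0, t - win + 1, win))
    scores = []
    for s in starts:
        c = np.corrcoef(mag[:, s:s + win])   # N x N correlation matrix
        scores.append(c[np.triu_indices(n, k=1)].mean())
    order = np.argsort(scores)[::-1][:top_k]
    return [starts[i] for i in order]
```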
The MatchNMingle dataset: a novel multi-sensor resource for the analysis of social interactions and group dynamics in-the-wild during free-standing conversations and speed dates
We present MatchNMingle, a novel multimodal/multisensor dataset for the analysis of free-standing conversational groups and speed-dates in-the-wild. MatchNMingle leverages the use of wearable devices and overhead cameras to record the social interactions of 92 people during real-life speed-dates, followed by a cocktail party. To our knowledge, MatchNMingle has the largest number of participants, longest recording time, and largest set of manual annotations for social actions available in this context in a real-life scenario. It consists of 2 hours of data from wearable acceleration, binary proximity, video, audio, personality surveys, frontal pictures, and speed-date responses. Participants' positions and group formations were manually annotated, as were social actions (e.g., speaking, hand gestures) for 30 minutes at 20 fps, making it the first dataset to incorporate the annotation of such cues in this context. We present an empirical analysis of the performance of crowdsourcing workers against trained annotators in simple and complex annotation tasks, finding that although efficient for simple tasks, using crowdsourcing workers for more complex tasks like social action annotation led to additional overhead and poor inter-annotator agreement compared to trained annotators (differences up to 0.4 in Fleiss' Kappa coefficients). We also provide example experiments of how MatchNMingle can be used.
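The inter-annotator agreement figures quoted above use Fleiss' Kappa; the standard computation from an items-by-categories count matrix looks as follows (generic formula, not MatchNMingle-specific code).

```python
# Standard Fleiss' kappa from an S x C count matrix, where counts[i, j]
# is how many of the n annotators assigned category j to item i.
import numpy as np

def fleiss_kappa(counts: np.ndarray) -> float:
    s = counts.shape[0]
    n = counts.sum(axis=1)[0]               # annotators per item
    p_j = counts.sum(axis=0) / (s * n)      # overall category proportions
    p_i = ((counts ** 2).sum(axis=1) - n) / (n * (n - 1))
    p_bar, p_e = p_i.mean(), (p_j ** 2).sum()
    return float((p_bar - p_e) / (1 - p_e))
```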
Listen to the real experts: Detecting need of caregiver response in a NICU using multimodal monitoring signals
Vital signs are used in Neonatal Intensive Care Units (NICUs) to monitor the state of multiple patients at once. Alarms are triggered if a vital sign falls below or rises above a predefined threshold. Numerous alarms sound each hour, which can translate into an overload for the medical team, known as alarm fatigue. Yet many of these alarms do not require immediate clinical action by the caregivers. In this paper we automatically detect moments that need an immediate response (i.e. interaction with the patient) from the medical team in NICUs by using caregiver response to the patient, which is based on the interpretation of vital signs and of nonverbal cues (e.g. movements) delivered by patients. The ultimate goal of such an approach is to reduce the overload of alarms while maintaining patient safety. We use features extracted from the electrocardiogram (ECG) and pulse oximetry (SpO2) sensors of the patient, as most unplanned interactions between patient and caregivers are due to deteriorations. Since in our unit an alarm can only be paused or silenced manually at the bedside, we used this information as a prior for caregiver response. We also propose different labeling schemes for classification, each representative of a possible interaction scenario within the nature of our problem. We achieved a general detection of caregiver response with a mean AUC of 0.82. We also show that when trained only with stable and truly deteriorating (critical state) samples, the classifiers can better learn the difference between alarms that need no immediate response and those that do. In addition, we present an analysis of the posterior probabilities over time for different labeling schemes, and use it to speculate about the reasons behind some failure cases.
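A hedged sketch of this classification setup: simple window statistics from the monitoring signals feed a classifier evaluated by AUC. The features and model here are illustrative, not the paper's pipeline.

```python
# Illustrative window features from heart-rate and SpO2 traces feeding
# a classifier scored with ROC AUC; not the paper's actual pipeline.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def window_features(hr: np.ndarray, spo2: np.ndarray) -> np.ndarray:
    """hr, spo2: W x T arrays, one row of samples per window."""
    return np.stack([hr.mean(1), hr.std(1),
                     spo2.mean(1), spo2.min(1)], axis=1)

def evaluate(hr: np.ndarray, spo2: np.ndarray, y: np.ndarray) -> float:
    """y: 1 if the window was followed by a caregiver interaction."""
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, window_features(hr, spo2), y,
                           scoring="roc_auc", cv=5).mean()
```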
Estimation of Heart Rate Directly from ECG Spectrogram in Neonate Intensive Care Units
This paper presents a simple yet novel method to estimate the heart frequency (HF) of neonates directly from the ECG signal, instead of using the RR-interval signals as generally done in clinical practice. From this, the heart rate (HR) can be derived. Thus, we avoid the use of peak detectors and the inherent errors that come with them. Our method leverages the highest Power Spectral Densities (PSD) of the ECG, for the bins around the frequencies related to neonatal heart rates, as they change in time (spectrograms). We tested our approach on 6 days of monitoring data for 52 patients in a Neonate Intensive Care Unit (NICU) and compared against the HR from a commercial monitor, which produced one sample per second. The comparison showed that 92.4% of the samples have a difference lower than 5 bpm. Moreover, we obtained a median MAE (Mean Absolute Error) between subjects of 2.28 bpm and a median RMSE (Root Mean Square Error) of 5.82 bpm. Although tested on neonates, we hypothesize that this method can also be customized for other populations. Finally, we analyzed the failure cases of our method and found that errors due to moments with higher PSD in the lower frequencies co-occurred with the presence of critical alarms related to other physiological systems (e.g. desaturation).
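A minimal sketch of the described estimator: compute the ECG spectrogram and read the heart frequency as the peak-PSD bin inside a plausible neonatal band. The band limits and window length are assumptions, not the paper's settings.

```python
# Hedged sketch: heart rate per spectrogram column as 60x the frequency
# of the strongest PSD bin within an assumed neonatal band.
import numpy as np
from scipy.signal import spectrogram

def hr_from_ecg(ecg: np.ndarray, fs: float,
                lo_bpm: float = 90.0, hi_bpm: float = 230.0) -> np.ndarray:
    """Returns one heart-rate estimate (bpm) per time step."""
    f, _, sxx = spectrogram(ecg, fs=fs, nperseg=int(8 * fs))
    band = (f >= lo_bpm / 60.0) & (f <= hi_bpm / 60.0)
    peak = sxx[band].argmax(axis=0)       # strongest bin per column
    return 60.0 * f[band][peak]
```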