304 research outputs found
ConfLab: A Rich Multimodal Multisensor Dataset of Free-Standing Social Interactions in the Wild
Recording the dynamics of unscripted human interactions in the wild is
challenging due to the delicate trade-offs between several factors: participant
privacy, ecological validity, data fidelity, and logistical overheads. To
address these, following a 'datasets for the community by the community' ethos,
we propose the Conference Living Lab (ConfLab): a new concept for multimodal
multisensor data collection of in-the-wild free-standing social conversations.
For the first instantiation of ConfLab described here, we organized a real-life
professional networking event at a major international conference. Involving 48
conference attendees, the dataset captures a diverse mix of status,
acquaintance, and networking motivations. Our capture setup improves upon the
data fidelity of prior in-the-wild datasets while retaining privacy
sensitivity: 8 videos (1920x1080, 60 fps) from a non-invasive overhead view,
and custom wearable sensors with onboard recording of body motion (full 9-axis
IMU), privacy-preserving low-frequency audio (1250 Hz), and Bluetooth-based
proximity. Additionally, we developed custom solutions for distributed hardware
synchronization at acquisition, and time-efficient continuous annotation of
body keypoints and actions at high sampling rates. Our benchmarks showcase some
of the open research tasks related to in-the-wild privacy-preserving social
data analysis: keypoints detection from overhead camera views, skeleton-based
no-audio speaker detection, and F-formation detection.
Comment: v2 is the version submitted to the NeurIPS 2022 Datasets and Benchmarks Track.
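One of the benchmark tasks above, F-formation (conversational group) detection, can be illustrated with a naive greedy grouping over position and body orientation: two people join a group when they are close and mutually facing. This is only a toy sketch with hypothetical thresholds and function names (`facing`, `f_formations`), not the ConfLab benchmark method.

```python
import math

def facing(p, q, fov_deg=120.0):
    """True if person p's body orientation (degrees) points toward person q's position."""
    dx, dy = q["x"] - p["x"], q["y"] - p["y"]
    angle_to_q = math.degrees(math.atan2(dy, dx))
    # Smallest signed angular difference between p's heading and the direction to q
    diff = (angle_to_q - p["theta"] + 180.0) % 360.0 - 180.0
    return abs(diff) <= fov_deg / 2.0

def f_formations(people, max_dist=1.5):
    """Greedy grouping: person i joins a group only if close to AND mutually
    facing every current member; otherwise a new singleton group is opened."""
    groups = []
    for i, p in enumerate(people):
        placed = False
        for g in groups:
            if all(math.dist((p["x"], p["y"]), (people[j]["x"], people[j]["y"])) <= max_dist
                   and facing(p, people[j]) and facing(people[j], p)
                   for j in g):
                g.append(i)
                placed = True
                break
        if not placed:
            groups.append([i])
    return groups
```

For example, two people a metre apart and facing each other form one group, while a distant third person ends up alone.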
High reliability Android application for multidevice multimodal mobile data acquisition and annotation
We have completed the collection of one of the richest accurately annotated mobile datasets of modes of transportation and locomotion. To do this, we developed a highly reliable Android application called DataLogger, capable of recording multisensor data from multiple synchronized smartphones simultaneously. The application allows real-time data annotation. We explain how we designed the app to achieve high reliability and ease of use. We also present an evaluation of the application in a big-data collection (750 hours, 950 GB of data, 17 different sensor modalities), analysing the data loss (less than 0.4‰) and battery consumption (≈6% on average per hour). The application is available as open source.
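The two reported reliability metrics are simple rates; a minimal sketch of how they could be computed (function names are ours, not DataLogger's API):

```python
def data_loss_permille(expected_samples, received_samples):
    """Data loss rate in per-mille (the paper reports < 0.4 per-mille)."""
    return 1000.0 * (expected_samples - received_samples) / expected_samples

def battery_per_hour(start_pct, end_pct, duration_hours):
    """Average battery drain in percentage points per hour (the paper reports about 6%/h)."""
    return (start_pct - end_pct) / duration_hours
```

For instance, receiving 999,650 of 1,000,000 expected samples corresponds to 0.35‰ loss, within the reported bound.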
Protocol for PD SENSORS: Parkinson’s Disease Symptom Evaluation in a Naturalistic Setting producing Outcomes measuRes using SPHERE technology. An observational feasibility study of multi-modal multi-sensor technology to measure symptoms and activities of daily living in Parkinson’s disease
Introduction: The impact of disease-modifying agents on disease progression in Parkinson’s disease is largely assessed in clinical trials using clinical rating scales. These scales have drawbacks in terms of their ability to capture the fluctuating nature of symptoms while living in a naturalistic environment. The SPHERE (Sensor Platform for HEalthcare in a Residential Environment) project has designed a multi-sensor platform with multimodal devices to allow continuous, relatively inexpensive, unobtrusive sensing of motor, non-motor and activities of daily living metrics in a home or a home-like environment. The aim of this study is to evaluate how the SPHERE technology can measure aspects of Parkinson’s disease.

Methods and analysis: This is a small-scale feasibility and acceptability study during which 12 pairs of participants (comprising a person with Parkinson’s and a healthy control participant) will stay and live freely for 5 days in a home-like environment embedded with SPHERE technology, including environmental, appliance monitoring, wrist-worn accelerometry and camera sensors. These data will be collected alongside clinical rating scales, participant diary entries and expert clinician annotations of colour video images. Machine learning will be used to look for a signal to discriminate between Parkinson’s disease and control, and between Parkinson’s disease symptoms ‘on’ and ‘off’ medications. Additional outcome measures, including bradykinesia, activity level, sleep parameters and some activities of daily living, will be explored. Acceptability of the technology will be evaluated qualitatively using semi-structured interviews.

Ethics and dissemination: Ethical approval has been given to commence this study; the results will be disseminated as widely as appropriate.
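The ‘on’/‘off’ discrimination from wrist-worn accelerometry could, in the simplest case, start from movement variability per time window, since bradykinesia suppresses movement amplitude. The sketch below is a toy proxy with a hypothetical threshold, not the machine-learning pipeline the study proposes:

```python
import statistics

def windowed_activity(acc_magnitudes, window=100):
    """Per-window population standard deviation of accelerometer magnitude,
    a crude proxy for movement variability / bradykinesia."""
    return [statistics.pstdev(acc_magnitudes[i:i + window])
            for i in range(0, len(acc_magnitudes) - window + 1, window)]

def label_state(activity, threshold=0.2):
    """Hypothetical rule: very low variability flags a possible bradykinetic ('off') window."""
    return ["off" if a < threshold else "on" for a in activity]
```

A real system would feed such window features into a trained classifier rather than a fixed threshold.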
Real-Time Management of Multimodal Streaming Data for Monitoring of Epileptic Patients
This is the Accepted Manuscript version of the following article: I. Mporas, D. Triantafyllopoulos, V. Megalooikonomou, “Real-Time Management of Multimodal Streaming Data for Monitoring of Epileptic Patients”, Journal of Medical Systems, Vol. 40(45), December 2015. The final published version is available at: https://link.springer.com/article/10.1007%2Fs10916-015-0403-3 © Springer Science+Business Media New York 2015.

A new generation of healthcare is represented by wearable health monitoring systems, which provide real-time monitoring of a patient’s physiological parameters. It is expected that continuous ambulatory monitoring of vital signals will improve treatment of patients and enable proactive personal health management. In this paper, we present the implementation of a multimodal real-time system for epilepsy management. The proposed methodology is based on a data streaming architecture and efficient management of a big flow of physiological parameters. The performance of this architecture is examined for varying spatial resolution of the recorded data. Peer reviewed; Final Accepted Version.
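A core ingredient of any such streaming architecture is a bounded per-channel buffer that keeps latency in check when consumers fall behind. The class below is an illustrative sketch (names and the downsampling knob are ours, not the paper's implementation):

```python
from collections import deque

class StreamBuffer:
    """Bounded FIFO for one physiological channel: when the consumer falls
    behind, the oldest samples are silently dropped, bounding memory and latency."""

    def __init__(self, maxlen=1024):
        self.buf = deque(maxlen=maxlen)  # deque discards from the left at capacity

    def push(self, sample):
        self.buf.append(sample)

    def drain(self, downsample=1):
        """Return and clear buffered samples, optionally keeping every n-th one
        to trade resolution for throughput."""
        out = list(self.buf)[::downsample]
        self.buf.clear()
        return out
```

With `maxlen=4`, pushing six samples leaves only the last four; draining with `downsample=2` halves the resolution of what is handed to the analysis stage.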
The Multimodal Tutor: Adaptive Feedback from Multimodal Experiences
This doctoral thesis describes the journey of ideation, prototyping and empirical testing of the Multimodal Tutor, a system designed to provide digital feedback that supports psychomotor skill acquisition using learning and multimodal data capture. The feedback is given in real time with machine-driven assessment of the learner's task execution. The predictions are tailored by supervised machine learning models trained with human-annotated samples. The main contributions of this thesis are: a literature survey on multimodal data for learning, a conceptual model (the Multimodal Learning Analytics Model), a technological framework (the Multimodal Pipeline), a data annotation tool (the Visual Inspection Tool) and a case study in Cardiopulmonary Resuscitation training (CPR Tutor). The CPR Tutor generates real-time, adaptive feedback using kinematic and myographic data and neural networks.
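One concrete piece of CPR feedback from kinematic data is compression rate, for which 100-120 compressions per minute is the standard guideline band. The sketch below uses naive peak counting on a chest-depth signal; it is an illustration of the feedback idea, not the thesis's actual neural-network pipeline:

```python
import math

def compression_rate(depth_signal, fs):
    """Compressions per minute from a chest-depth signal via simple local-maximum counting."""
    peaks = sum(1 for i in range(1, len(depth_signal) - 1)
                if depth_signal[i] > depth_signal[i - 1]
                and depth_signal[i] >= depth_signal[i + 1])
    minutes = len(depth_signal) / fs / 60.0
    return peaks / minutes

def feedback(rate):
    """Map a measured rate onto the 100-120/min guideline band."""
    if rate < 100:
        return "push faster"
    if rate > 120:
        return "push slower"
    return "good rate"
```

A synthetic sinusoidal "compression" signal at 110 cycles per minute yields "good rate", while 90 or 130 trigger corrective prompts.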
End-to-end Audiovisual Speech Activity Detection with Bimodal Recurrent Neural Models
Speech activity detection (SAD) plays an important role in current speech
processing systems, including automatic speech recognition (ASR). SAD is
particularly difficult in environments with acoustic noise. A practical
solution is to incorporate visual information, increasing the robustness of the
SAD approach. An audiovisual system has the advantage of being robust to
different speech modes (e.g., whisper speech) or background noise. Recent
advances in audiovisual speech processing using deep learning have opened
opportunities to capture in a principled way the temporal relationships between
acoustic and visual features. This study explores this idea proposing a
\emph{bimodal recurrent neural network} (BRNN) framework for SAD. The approach
models the temporal dynamic of the sequential audiovisual data, improving the
accuracy and robustness of the proposed SAD system. Instead of estimating
hand-crafted features, the study investigates an end-to-end training approach,
where acoustic and visual features are directly learned from the raw data
during training. The experimental evaluation considers a large audiovisual
corpus with over 60.8 hours of recordings, collected from 105 speakers. The
results demonstrate that the proposed framework yields absolute improvements of
up to 1.2% under practical scenarios over an audio-only voice activity
detection (VAD) baseline implemented with a deep neural network (DNN). The
proposed approach achieves a 92.7% F1-score when evaluated using the sensors of
a portable tablet in a noisy acoustic environment, only 1.0% lower than the
performance obtained under ideal conditions (e.g., clean speech captured with a
high-definition camera and a close-talking microphone).
Comment: Submitted to Speech Communication.
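The bimodal fusion idea — one recurrent branch per modality, hidden states fused into a single speech probability — can be reduced to a scalar-feature toy. The weights below are hand-picked for illustration; the paper's BRNN learns features end-to-end from raw audio and video, which this sketch does not attempt:

```python
import math

def rnn_branch(seq, wx, wh):
    """Minimal Elman recurrence over a scalar feature sequence:
    h_t = tanh(wx * x_t + wh * h_{t-1}), starting from h_0 = 0."""
    h = 0.0
    for x in seq:
        h = math.tanh(wx * x + wh * h)
    return h

def bimodal_sad(audio, video, p):
    """Run one recurrent branch per modality, fuse the final hidden states
    linearly, and squash to a speech-activity probability."""
    ha = rnn_branch(audio, p["wxa"], p["wha"])
    hv = rnn_branch(video, p["wxv"], p["whv"])
    z = p["wa"] * ha + p["wv"] * hv + p["b"]
    return 1.0 / (1.0 + math.exp(-z))
```

With positive weights, a high-energy sequence in both modalities produces a higher speech probability than silence, which is the qualitative behaviour the trained model exploits.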
A transparent framework towards the context-sensitive recognition of conversational engagement
Modelling and recognising affective and mental user states is a pressing topic in multiple research fields. This work suggests an approach to the recognition of such states by combining state-of-the-art behaviour recognition classifiers in a transparent and explainable modelling framework that also makes it possible to consider contextual aspects in the inference process. More precisely, in this paper we exemplify the idea of our framework with the recognition of conversational engagement in bi-directional conversations. We introduce a multi-modal annotation scheme for conversational engagement. We further introduce our hybrid approach that combines the accuracy of state-of-the-art machine learning techniques, such as deep learning, with the capabilities of Bayesian networks, which are inherently interpretable and offer an important capability that modern approaches lack: causal inference. In an evaluation on a large multi-modal corpus of bi-directional conversations, we show that this hybrid approach can even outperform state-of-the-art black-box approaches by considering context information and causal relations.
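The hybrid idea — a black-box classifier's score reweighted by a context-dependent prior — can be reduced to a two-node toy via Bayes' rule. The numbers and the reading of the score as a likelihood are illustrative assumptions, not the authors' actual Bayesian network:

```python
def posterior_engaged(classifier_score, context_prior):
    """Bayes' rule over the binary variable 'engaged': the black-box score is
    read as the likelihood P(evidence | engaged), and the prior encodes
    context (e.g., whether the interlocutor just asked a question)."""
    num = classifier_score * context_prior
    den = num + (1.0 - classifier_score) * (1.0 - context_prior)
    return num / den
```

With a flat prior of 0.5 the classifier score passes through unchanged, while a skeptical context prior of 0.2 pulls the same 0.7 score below 0.5 — exactly the kind of context-driven correction the framework argues for, in miniature.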
Toward Emotion Recognition From Physiological Signals in the Wild: Approaching the Methodological Issues in Real-Life Data Collection
Emotion, mood, and stress recognition (EMSR) has been studied in laboratory settings for decades. In particular, physiological signals are widely used to detect and classify affective states in lab conditions. However, physiological reactions to emotional stimuli have been found to differ between laboratory and natural settings. Thanks to recent technological progress (e.g., in wearables), the creation of EMSR systems for a large number of consumers during their everyday activities is increasingly possible. Therefore, datasets created in the wild are needed to ensure the validity and the exploitability of EMSR models for real-life applications. In this paper, we initially present common techniques used in laboratory settings to induce emotions for the purpose of physiological dataset creation. Next, advantages and challenges of data collection in the wild are discussed. To assess the applicability of existing datasets to real-life applications, we propose a set of categories to guide and compare at a glance the different methodologies used by researchers to collect such data. For this purpose, we also introduce a visual tool called Graphical Assessment of Real-life Application-Focused Emotional Dataset (GARAFED). In the last part of the paper, we apply the proposed tool to compare existing physiological datasets for EMSR in the wild and to show possible improvements and future directions of research. We wish for this paper and GARAFED to be used as guidelines for researchers and developers who aim to collect affect-related data for real-life EMSR-based applications.