304 research outputs found

    ConfLab: A Rich Multimodal Multisensor Dataset of Free-Standing Social Interactions in the Wild

    Recording the dynamics of unscripted human interactions in the wild is challenging due to the delicate trade-offs between several factors: participant privacy, ecological validity, data fidelity, and logistical overheads. To address these, following a 'datasets for the community by the community' ethos, we propose the Conference Living Lab (ConfLab): a new concept for multimodal multisensor data collection of in-the-wild free-standing social conversations. For the first instantiation of ConfLab described here, we organized a real-life professional networking event at a major international conference. Involving 48 conference attendees, the dataset captures a diverse mix of status, acquaintance, and networking motivations. Our capture setup improves upon the data fidelity of prior in-the-wild datasets while retaining privacy sensitivity: 8 videos (1920x1080, 60 fps) from a non-invasive overhead view, and custom wearable sensors with onboard recording of body motion (full 9-axis IMU), privacy-preserving low-frequency audio (1250 Hz), and Bluetooth-based proximity. Additionally, we developed custom solutions for distributed hardware synchronization at acquisition, and time-efficient continuous annotation of body keypoints and actions at high sampling rates. Our benchmarks showcase some of the open research tasks related to in-the-wild privacy-preserving social data analysis: keypoint detection from overhead camera views, skeleton-based no-audio speaker detection, and F-formation detection.
    Comment: v2 is the version submitted to the NeurIPS 2022 Datasets and Benchmarks Track.
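
    As a rough illustration of handling streams at these rates, the sketch below aligns a wearable-sensor stream (e.g., the 9-axis IMU or the 1250 Hz audio) to the 60 fps video clock by taking, for each frame time, the first sample at or after it. The function and the NumPy-based approach are illustrative assumptions, not part of the ConfLab tooling.

    import numpy as np

    VIDEO_FPS = 60  # overhead cameras: 1920x1080 at 60 fps

    def align_to_video_clock(timestamps, values, n_frames):
        # timestamps: sorted sample times in seconds; values: np.ndarray of samples.
        # For each video frame time, take the first sensor sample at or after it;
        # assumes both clocks were synchronized at acquisition, as in ConfLab.
        frame_times = np.arange(n_frames) / VIDEO_FPS
        idx = np.clip(np.searchsorted(timestamps, frame_times), 0, len(values) - 1)
        return values[idx]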

    Protocol for PD SENSORS: Parkinson’s Disease Symptom Evaluation in a Naturalistic Setting producing Outcomes measuRes using SPHERE technology. An observational feasibility study of multi-modal multi-sensor technology to measure symptoms and activities of daily living in Parkinson’s disease

    Introduction: The impact of disease-modifying agents on disease progression in Parkinson’s disease is largely assessed in clinical trials using clinical rating scales. These scales have drawbacks in terms of their ability to capture the fluctuating nature of symptoms experienced while living in a naturalistic environment. The SPHERE (Sensor Platform for HEalthcare in a Residential Environment) project has designed a multi-sensor platform with multimodal devices that allow continuous, relatively inexpensive, unobtrusive sensing of motor, non-motor and activities-of-daily-living metrics in a home or home-like environment. The aim of this study is to evaluate how the SPHERE technology can measure aspects of Parkinson’s disease.
    Methods and analysis: This is a small-scale feasibility and acceptability study during which 12 pairs of participants (comprising a person with Parkinson’s and a healthy control participant) will stay and live freely for 5 days in a home-like environment embedded with SPHERE technology, including environmental, appliance-monitoring, wrist-worn accelerometry and camera sensors. These data will be collected alongside clinical rating scales, participant diary entries and expert clinician annotations of colour video images. Machine learning will be used to look for a signal that discriminates between Parkinson’s disease and control, and between Parkinson’s disease symptoms ‘on’ and ‘off’ medications. Additional outcome measures, including bradykinesia, activity level, sleep parameters and some activities of daily living, will be explored. Acceptability of the technology will be evaluated qualitatively using semi-structured interviews.
    Ethics and dissemination: Ethical approval has been given to commence this study; the results will be disseminated as widely as appropriate.
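
    As a minimal sketch of the discrimination analysis described above, the snippet below computes simple per-window magnitude features from wrist-worn accelerometry and fits a linear classifier; the sampling rate, window length and use of scikit-learn are illustrative assumptions, not the study's protocol.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def window_features(acc, fs=50, win_s=10):
        # acc: (n_samples, 3) accelerometer array; fs and win_s are assumed values.
        win = fs * win_s
        mag = np.linalg.norm(acc, axis=1)      # gravity-inclusive magnitude
        n = len(mag) // win
        mag = mag[: n * win].reshape(n, win)   # non-overlapping windows
        return np.column_stack([mag.mean(axis=1), mag.var(axis=1)])

    # X = np.vstack([window_features(rec) for rec in recordings])
    # y: 1 = person with Parkinson's, 0 = healthy control
    # clf = LogisticRegression().fit(X, y)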

    Real-Time Management of Multimodal Streaming Data for Monitoring of Epileptic Patients

    This is the Accepted Manuscript version of the following article: I. Mporas, D. Triantafyllopoulos, V. Megalooikonomou, “Real-Time Management of Multimodal Streaming Data for Monitoring of Epileptic Patients”, Journal of Medical Systems, Vol. 40(45), December 2015. The final published version is available at: https://link.springer.com/article/10.1007%2Fs10916-015-0403-3 © Springer Science+Business Media New York 2015.
    A new generation of healthcare is represented by wearable health monitoring systems, which provide real-time monitoring of patients’ physiological parameters. It is expected that continuous ambulatory monitoring of vital signals will improve treatment of patients and enable proactive personal health management. In this paper, we present the implementation of a multimodal real-time system for epilepsy management. The proposed methodology is based on a data streaming architecture and efficient management of a large flow of physiological parameters. The performance of this architecture is examined for varying spatial resolution of the recorded data.
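
    The abstract does not detail the streaming architecture; purely as an illustrative sketch, one fixed-horizon ring buffer per channel is a common way to manage a continuous flow of physiological samples, with the examined 'spatial resolution' mapping to how many channels are retained. Channel names and rate below are hypothetical.

    import collections
    import numpy as np

    class StreamBuffer:
        # Fixed-horizon ring buffer for one physiological channel.
        def __init__(self, rate_hz, horizon_s=5):
            self.buf = collections.deque(maxlen=rate_hz * horizon_s)

        def push(self, sample):
            self.buf.append(sample)

        def snapshot(self):
            return np.asarray(self.buf)   # window handed to downstream analysis

    # More channels kept = finer spatial resolution (and a bigger data flow).
    buffers = {ch: StreamBuffer(rate_hz=256) for ch in ("Fp1", "Fp2", "Cz")}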

    The Multimodal Tutor: Adaptive Feedback from Multimodal Experiences

    This doctoral thesis describes the journey of ideation, prototyping and empirical testing of the Multimodal Tutor, a system designed to provide digital feedback that supports psychomotor skill acquisition through multimodal data capture and machine learning. Feedback is given in real time, driven by machine assessment of the learner's task execution. The predictions are tailored by supervised machine learning models trained with human-annotated samples. The main contributions of this thesis are: a literature survey on multimodal data for learning, a conceptual model (the Multimodal Learning Analytics Model), a technological framework (the Multimodal Pipeline), a data annotation tool (the Visual Inspection Tool) and a case study in Cardiopulmonary Resuscitation training (CPR Tutor). The CPR Tutor generates real-time, adaptive feedback using kinematic and myographic data and neural networks.
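
    A hedged sketch of the fusion idea (kinematic and myographic features feeding a neural network that scores task execution) is shown below in PyTorch; the feature dimensions, layer sizes and target names are placeholders, not the CPR Tutor's actual architecture.

    import torch
    import torch.nn as nn

    class FusionNet(nn.Module):
        def __init__(self, kin_dim=12, emg_dim=8, n_targets=3):
            super().__init__()
            self.kin = nn.Sequential(nn.Linear(kin_dim, 32), nn.ReLU())
            self.emg = nn.Sequential(nn.Linear(emg_dim, 32), nn.ReLU())
            # hypothetical feedback targets, e.g. compression rate/depth/release
            self.head = nn.Linear(64, n_targets)

        def forward(self, kin, emg):
            z = torch.cat([self.kin(kin), self.emg(emg)], dim=-1)
            return self.head(z)   # one logit per feedback target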

    End-to-end Audiovisual Speech Activity Detection with Bimodal Recurrent Neural Models

    Speech activity detection (SAD) plays an important role in current speech processing systems, including automatic speech recognition (ASR). SAD is particularly difficult in environments with acoustic noise. A practical solution is to incorporate visual information, increasing the robustness of the SAD approach. An audiovisual system has the advantage of being robust to different speech modes (e.g., whisper speech) and background noise. Recent advances in audiovisual speech processing using deep learning have opened opportunities to capture in a principled way the temporal relationships between acoustic and visual features. This study explores this idea, proposing a bimodal recurrent neural network (BRNN) framework for SAD. The approach models the temporal dynamics of the sequential audiovisual data, improving the accuracy and robustness of the proposed SAD system. Instead of estimating hand-crafted features, the study investigates an end-to-end training approach, where acoustic and visual features are directly learned from the raw data during training. The experimental evaluation considers a large audiovisual corpus with over 60.8 hours of recordings, collected from 105 speakers. The results demonstrate that the proposed framework leads to absolute improvements of up to 1.2% in practical scenarios over an audio-only voice activity detection (VAD) baseline implemented with a deep neural network (DNN). The proposed approach achieves a 92.7% F1-score when evaluated using the sensors of a portable tablet in a noisy acoustic environment, only 1.0% lower than the performance obtained under ideal conditions (e.g., clean speech captured with a high-definition camera and a close-talking microphone).
    Comment: Submitted to Speech Communication.
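
    The sketch below illustrates one plausible shape of such a bimodal recurrent model: one recurrent branch per modality, concatenated and projected to a per-frame speech logit. Layer types and sizes are assumptions, and the paper's end-to-end system learns features from raw data rather than the precomputed per-frame features assumed here.

    import torch
    import torch.nn as nn

    class BimodalRNN(nn.Module):
        def __init__(self, audio_dim=40, video_dim=64, hidden=64):
            super().__init__()
            self.audio_rnn = nn.LSTM(audio_dim, hidden, batch_first=True)
            self.video_rnn = nn.LSTM(video_dim, hidden, batch_first=True)
            self.out = nn.Linear(2 * hidden, 1)   # per-frame speech logit

        def forward(self, audio, video):
            a, _ = self.audio_rnn(audio)   # (batch, time, hidden)
            v, _ = self.video_rnn(video)
            return self.out(torch.cat([a, v], dim=-1)).squeeze(-1)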

    A transparent framework towards the context-sensitive recognition of conversational engagement

    Modelling and recognising affective and mental user states is a pressing topic in multiple research fields. This work suggests an approach towards adequate recognition of such states by combining state-of-the-art behaviour recognition classifiers in a transparent and explainable modelling framework that also allows contextual aspects to be considered in the inference process. More precisely, in this paper we exemplify the idea of our framework with the recognition of conversational engagement in bi-directional conversations. We introduce a multi-modal annotation scheme for conversational engagement. We further introduce our hybrid approach, which combines the accuracy of state-of-the-art machine learning techniques, such as deep learning, with the capabilities of Bayesian networks, which are inherently interpretable and offer an important capability that modern approaches lack: causal inference. In an evaluation on a large multi-modal corpus of bi-directional conversations, we show that this hybrid approach can even outperform state-of-the-art black-box approaches by considering context information and causal relations.
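
    As a toy illustration of the hybrid idea, the snippet below treats a black-box classifier's decision as a noisy observation of a binary engagement state inside a single Bayesian update; the prior and observation probabilities are invented for illustration and are not taken from the paper.

    # P(classifier flags "engaged" | true state); made-up numbers.
    P_FLAG = {"engaged": 0.8, "not_engaged": 0.3}
    P_ENGAGED = 0.5   # prior on engagement

    def posterior_engaged(flagged: bool) -> float:
        # Bayes' rule over the binary engagement state.
        like_e = P_FLAG["engaged"] if flagged else 1 - P_FLAG["engaged"]
        like_n = P_FLAG["not_engaged"] if flagged else 1 - P_FLAG["not_engaged"]
        num = like_e * P_ENGAGED
        return num / (num + like_n * (1 - P_ENGAGED))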

    Toward Emotion Recognition From Physiological Signals in the Wild: Approaching the Methodological Issues in Real-Life Data Collection

    Emotion, mood, and stress recognition (EMSR) has been studied in laboratory settings for decades. In particular, physiological signals are widely used to detect and classify affective states in lab conditions. However, physiological reactions to emotional stimuli have been found to differ between laboratory and natural settings. Thanks to recent technological progress (e.g., in wearables), the creation of EMSR systems for a large number of consumers during their everyday activities is increasingly possible. Therefore, datasets created in the wild are needed to ensure the validity and the exploitability of EMSR models for real-life applications. In this paper, we initially present common techniques used in laboratory settings to induce emotions for the purpose of physiological dataset creation. Next, the advantages and challenges of data collection in the wild are discussed. To assess the applicability of existing datasets to real-life applications, we propose a set of categories to guide and compare, at a glance, the different methodologies used by researchers to collect such data. For this purpose, we also introduce a visual tool called Graphical Assessment of Real-life Application-Focused Emotional Dataset (GARAFED). In the last part of the paper, we apply the proposed tool to compare existing physiological datasets for EMSR in the wild and to show possible improvements and future directions of research. We hope that this paper and GARAFED will serve as guidelines for researchers and developers who aim to collect affect-related data for real-life EMSR-based applications.