75 research outputs found

    3rd International Workshop on Multisensory Approaches to Human-Food Interaction

    Get PDF
    This is the introduction paper to the third edition of the workshop on 'Multisensory Approaches to Human-Food Interaction', organized at the 20th ACM International Conference on Multimodal Interaction in Boulder, Colorado, on October 16th, 2018. The workshop is a venue where the fast-growing research on Multisensory Human-Food Interaction is presented. Here we summarize the workshop's key objectives and contributions.

    EmotiW 2018: Audio-Video, Student Engagement and Group-Level Affect Prediction

    Full text link
    This paper details the sixth Emotion Recognition in the Wild (EmotiW) challenge. EmotiW 2018 is a grand challenge at the ACM International Conference on Multimodal Interaction 2018, Colorado, USA. The challenge aims to provide a common platform for researchers in the affective computing community to benchmark their algorithms on 'in the wild' data. This year EmotiW comprises three sub-challenges: a) audio-video based emotion recognition; b) student engagement prediction; and c) group-level emotion recognition. The databases, protocols, and baselines are discussed in detail.

    Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition

    Full text link
    Automatic speech recognition can potentially benefit from lip motion patterns, which complement acoustic speech and improve overall recognition performance, particularly in noise. In this paper we propose an audio-visual fusion strategy that goes beyond simple feature concatenation and learns to automatically align the two modalities, leading to enhanced representations that increase recognition accuracy in both clean and noisy conditions. We test our strategy on the TCD-TIMIT and LRS2 datasets, designed for large-vocabulary continuous speech recognition, applying three types of noise at different power ratios. We also exploit state-of-the-art sequence-to-sequence architectures, showing that our method can be easily integrated. Results show relative improvements from 7% up to 30% on TCD-TIMIT over the acoustic modality alone, depending on the acoustic noise level. We anticipate that the fusion strategy can easily generalise to many other multimodal tasks which involve correlated modalities. Code available online on GitHub: https://github.com/georgesterpu/Sigmedia-AVSR. Comment: In ICMI'18, October 16-20, 2018, Boulder, CO, USA. Equation (2) corrected in this version.
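    A minimal sketch of the cross-modal attention idea described above, assuming a PyTorch setting; the class name, layer sizes, and projection scheme are illustrative assumptions, not the authors' released Sigmedia-AVSR code. Acoustic frames act as queries that attend over visual frames, producing an aligned visual context that is fused with the audio stream before a sequence-to-sequence encoder.

        # Hedged sketch: attention-based audio-visual fusion (illustrative, not the paper's exact model).
        import torch
        import torch.nn as nn

        class AttentionAVFusion(nn.Module):
            """Audio frames attend over visual frames so the two streams are
            aligned automatically instead of being naively concatenated."""

            def __init__(self, audio_dim=240, video_dim=512, fused_dim=256, num_heads=4):
                super().__init__()
                # audio_dim/video_dim/fused_dim are assumed values for illustration.
                self.audio_proj = nn.Linear(audio_dim, fused_dim)
                self.video_proj = nn.Linear(video_dim, fused_dim)
                # Cross-modal attention: acoustic queries, visual keys/values.
                self.cross_attn = nn.MultiheadAttention(fused_dim, num_heads, batch_first=True)
                self.out = nn.Linear(2 * fused_dim, fused_dim)

            def forward(self, audio, video):
                # audio: (batch, T_a, audio_dim); video: (batch, T_v, video_dim)
                a = self.audio_proj(audio)
                v = self.video_proj(video)
                aligned_v, _ = self.cross_attn(query=a, key=v, value=v)
                # Concatenate each audio frame with its attended visual context.
                return self.out(torch.cat([a, aligned_v], dim=-1))  # (batch, T_a, fused_dim)

        if __name__ == "__main__":
            audio = torch.randn(2, 100, 240)   # e.g. 100 acoustic frames per utterance
            video = torch.randn(2, 25, 512)    # e.g. 25 lip-region frames per utterance
            print(AttentionAVFusion()(audio, video).shape)  # torch.Size([2, 100, 256])

    The fused sequence keeps the acoustic frame rate, so it can be dropped into an existing sequence-to-sequence recogniser in place of the audio-only features.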

    Group Interaction Frontiers in Technology

    Get PDF
    Over the last decade, the study of group behavior for multimodal interaction technologies has increased. However, we believe that, despite its potential benefits to society, there could be more activity in this area. The aim of this workshop is to create a forum for more interdisciplinary dialogue on this topic and thereby accelerate growth in the field. The workshop has been very successful in attracting submissions addressing important facets of technologies for analyzing and aiding groups. This paper provides a summary of the workshop's activities and the accepted papers.

    I Smell Trouble: Using Multiple Scents To Convey Driving-Relevant Information

    Get PDF
    Cars provide drivers with task-related information (e.g. "Fill gas") mainly through visual and auditory stimuli. However, those stimuli may distract or overwhelm the driver, causing unnecessary stress. Here, we propose olfactory stimulation as a novel feedback modality to support the perception of visual notifications, reducing the driver's visual demand. Based on previous research, we explore the use of lavender, peppermint, and lemon scents to convey three driving-relevant messages (i.e. "Slow down", "Short inter-vehicle distance", "Lane departure"). Our paper is the first to demonstrate olfactory conditioning in the context of driving and to explore how multiple olfactory notifications change driving behaviour. Our findings show that olfactory notifications are perceived as less distracting, more comfortable, and more helpful than visual notifications. Drivers also make fewer driving mistakes when exposed to olfactory notifications. We discuss how these findings inform the design of future in-car user interfaces.

    Do I Have Your Attention: A Large Scale Engagement Prediction Dataset and Baselines

    Full text link
    The degree of concentration, enthusiasm, optimism, and passion displayed by individuals while interacting with a machine is referred to as 'user engagement'. Engagement comprises behavioral, cognitive, and affect-related cues. To create engagement prediction systems that work in real-world conditions, it is essential to learn from rich, diverse datasets. To this end, a large-scale, multi-faceted engagement-in-the-wild dataset, EngageNet, is proposed. 31 hours of data from 127 participants, recorded under different illumination conditions, are collected. Thorough experiments are performed exploring the applicability of different features: action units, eye gaze, head pose, and MARLIN. Data from user interactions (question-answer) are analyzed to understand the relationship between effective learning and user engagement. To further validate the rich nature of the dataset, evaluation is also performed on the EngageWild dataset. The experiments show the usefulness of the proposed dataset. The code, models, and dataset link are publicly available at https://github.com/engagenet/engagenet_baselines
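    A minimal sketch of a feature-based engagement baseline in the spirit of the experiments above, assuming per-frame face descriptors (e.g. action units, eye gaze, and head pose concatenated) are already extracted; the feature dimension, clip length, model, and four-level label set are assumptions, not the released engagenet_baselines code.

        # Hedged sketch: clip-level engagement classification from per-frame descriptors (illustrative only).
        import torch
        import torch.nn as nn

        class EngagementBaseline(nn.Module):
            """Bidirectional GRU over per-frame face descriptors, followed by a
            classifier over assumed engagement levels."""

            def __init__(self, feat_dim=49, hidden=128, num_classes=4):
                super().__init__()
                # feat_dim and num_classes are assumed values for illustration.
                self.rnn = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
                self.cls = nn.Linear(2 * hidden, num_classes)

            def forward(self, frames):
                # frames: (batch, T, feat_dim) descriptors for one clip
                _, h = self.rnn(frames)
                clip = torch.cat([h[0], h[1]], dim=-1)  # final states of both directions
                return self.cls(clip)                   # engagement logits per clip

        if __name__ == "__main__":
            clips = torch.randn(8, 300, 49)  # 8 clips, 300 frames each
            print(EngagementBaseline()(clips).shape)  # torch.Size([8, 4])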