3rd International Workshop on Multisensory Approaches to Human-Food Interaction
This is the introduction paper to the third edition of the workshop on 'Multisensory Approaches to Human-Food Interaction', organized at the 20th ACM International Conference on Multimodal Interaction in Boulder, Colorado, on October 16th, 2018. The workshop is a forum where the fast-growing research on Multisensory Human-Food Interaction is presented. Here we summarize the workshop's key objectives and contributions.
EmotiW 2018: Audio-Video, Student Engagement and Group-Level Affect Prediction
This paper details the sixth Emotion Recognition in the Wild (EmotiW) challenge. EmotiW 2018 is a grand challenge at the ACM International Conference on Multimodal Interaction 2018, Colorado, USA. The challenge aims to provide a common platform for researchers in the affective computing community to benchmark their algorithms on 'in the wild' data. This year EmotiW comprises three sub-challenges: a) audio-video based emotion recognition; b) student engagement prediction; and c) group-level emotion recognition. The databases, protocols, and baselines are discussed in detail.
Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition
Automatic speech recognition can potentially benefit from lip motion patterns, which complement acoustic speech to improve overall recognition performance, particularly in noise. In this paper we propose an audio-visual fusion strategy that goes beyond simple feature concatenation and learns to automatically align the two modalities, leading to enhanced representations that increase recognition accuracy in both clean and noisy conditions. We test our strategy on the TCD-TIMIT and LRS2 datasets, designed for large-vocabulary continuous speech recognition, applying three types of noise at different power ratios. We also exploit state-of-the-art sequence-to-sequence architectures, showing that our method can be easily integrated. Results show relative improvements from 7% up to 30% on TCD-TIMIT over the acoustic modality alone, depending on the acoustic noise level. We anticipate that the fusion strategy can easily generalise to many other multimodal tasks involving correlated modalities. Code available online on GitHub: https://github.com/georgesterpu/Sigmedia-AVSR
Comment: In ICMI'18, October 16-20, 2018, Boulder, CO, USA. Equation (2) corrected in this version.
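The alignment idea above can be illustrated with a minimal sketch: each audio frame attends over the (shorter, differently sampled) video sequence via scaled dot-product attention, and the attention-weighted video features are concatenated with the audio features. This is a hypothetical simplification for illustration, not the paper's exact architecture; the function names, dimensions, and the use of plain NumPy are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fusion(audio, video):
    """Align video frames to audio frames with scaled dot-product
    attention, then concatenate the aligned video features with the
    audio features. Sketch only, not the authors' exact model."""
    d_k = audio.shape[-1]
    scores = audio @ video.T / np.sqrt(d_k)   # (T_audio, T_video) alignment scores
    weights = softmax(scores, axis=-1)        # each audio frame attends over video
    aligned_video = weights @ video           # (T_audio, D) video resampled to audio rate
    return np.concatenate([audio, aligned_video], axis=-1)

rng = np.random.default_rng(0)
audio = rng.standard_normal((50, 64))  # 50 audio frames, 64-dim features (assumed)
video = rng.standard_normal((20, 64))  # 20 video frames, same feature dim (assumed)
fused = attention_fusion(audio, video)
print(fused.shape)  # (50, 128)
```

The key point is that the learned alignment replaces naive frame-rate matching: the fused representation stays at the audio frame rate, so it can feed a standard sequence-to-sequence recognizer unchanged.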
Group Interaction Frontiers in Technology
Over the last decade, the study of group behavior for multimodal interaction technologies has increased. However, despite its potential benefits to society, we believe this area remains under-explored. The aim of this workshop is to create a forum for more interdisciplinary dialogue on this topic and thereby accelerate its growth. The workshop has been very successful in attracting submissions addressing important facets of technologies for analyzing and aiding groups. This paper provides a summary of the workshop's activities and the accepted papers.
I Smell Trouble: Using Multiple Scents To Convey Driving-Relevant Information
Cars provide drivers with task-related information (e.g. "Fill gas") mainly using visual and auditory stimuli. However, those stimuli may distract or overwhelm the driver, causing unnecessary stress. Here, we propose olfactory stimulation as a novel feedback modality to support the perception of visual notifications, reducing the visual demand on the driver. Based on previous research, we explore the application of the scents of lavender, peppermint, and lemon to convey three driving-relevant messages (i.e. "Slow down", "Short inter-vehicle distance", "Lane departure"). Our paper is the first to demonstrate the application of olfactory conditioning in the context of driving and to explore how multiple olfactory notifications change driving behaviour. Our findings demonstrate that olfactory notifications are perceived as less distracting, more comfortable, and more helpful than visual notifications. Drivers also make fewer driving mistakes when exposed to olfactory notifications. We discuss how these findings inform the design of future in-car user interfaces.
Do I Have Your Attention: A Large Scale Engagement Prediction Dataset and Baselines
The degree of concentration, enthusiasm, optimism, and passion displayed by individuals while interacting with a machine is referred to as 'user engagement'. Engagement comprises behavioral, cognitive, and affect-related cues. To create engagement prediction systems that can work in real-world conditions, it is essential to learn from rich, diverse datasets. To this end, we propose EngageNet, a large-scale, multi-faceted engagement-in-the-wild dataset. Thirty-one hours of data from 127 participants are recorded under different illumination conditions. Thorough experiments explore the applicability of different features: action units, eye gaze, head pose, and MARLIN. Data from user interactions (question-answer) are analyzed to understand the relationship between effective learning and user engagement. To further validate the rich nature of the dataset, evaluation is also performed on the EngageWild dataset. The experiments show the usefulness of the proposed dataset. The code, models, and dataset link are publicly available at https://github.com/engagenet/engagenet_baselines.