37,385 research outputs found
Multimodal person recognition for human-vehicle interaction
Next-generation vehicles will undoubtedly feature biometric person recognition as part of an effort to improve the driving experience. Today's technology prevents such systems from operating satisfactorily under adverse conditions. A proposed framework for achieving person recognition successfully combines different biometric modalities, borne out in two case studies
CentralNet: a Multilayer Approach for Multimodal Fusion
This paper proposes a novel multimodal fusion approach, aiming to produce
best possible decisions by integrating information coming from multiple media.
While most of the past multimodal approaches either work by projecting the
features of different modalities into the same space, or by coordinating the
representations of each modality through the use of constraints, our approach
borrows from both visions. More specifically, assuming each modality can be
processed by a separated deep convolutional network, allowing to take decisions
independently from each modality, we introduce a central network linking the
modality specific networks. This central network not only provides a common
feature embedding but also regularizes the modality specific networks through
the use of multi-task learning. The proposed approach is validated on 4
different computer vision tasks on which it consistently improves the accuracy
of existing multimodal fusion approaches
Looking Beyond a Clever Narrative: Visual Context and Attention are Primary Drivers of Affect in Video Advertisements
Emotion evoked by an advertisement plays a key role in influencing brand
recall and eventual consumer choices. Automatic ad affect recognition has
several useful applications. However, the use of content-based feature
representations does not give insights into how affect is modulated by aspects
such as the ad scene setting, salient object attributes and their interactions.
Neither do such approaches inform us on how humans prioritize visual
information for ad understanding. Our work addresses these lacunae by
decomposing video content into detected objects, coarse scene structure, object
statistics and actively attended objects identified via eye-gaze. We measure
the importance of each of these information channels by systematically
incorporating related information into ad affect prediction models. Contrary to
the popular notion that ad affect hinges on the narrative and the clever use of
linguistic and social cues, we find that actively attended objects and the
coarse scene structure better encode affective information as compared to
individual scene objects or conspicuous background elements.Comment: Accepted for publication in the Proceedings of 20th ACM International
Conference on Multimodal Interaction, Boulder, CO, US
- …