Looking Beyond a Clever Narrative: Visual Context and Attention are Primary Drivers of Affect in Video Advertisements
Emotion evoked by an advertisement plays a key role in influencing brand
recall and eventual consumer choices. Automatic ad affect recognition has
several useful applications. However, the use of content-based feature
representations does not give insights into how affect is modulated by aspects
such as the ad scene setting, salient object attributes and their interactions.
Neither do such approaches inform us on how humans prioritize visual
information for ad understanding. Our work addresses these lacunae by
decomposing video content into detected objects, coarse scene structure, object
statistics and actively attended objects identified via eye-gaze. We measure
the importance of each of these information channels by systematically
incorporating related information into ad affect prediction models. Contrary to
the popular notion that ad affect hinges on the narrative and the clever use of
linguistic and social cues, we find that actively attended objects and the
coarse scene structure better encode affective information as compared to
individual scene objects or conspicuous background elements.
Comment: Accepted for publication in the Proceedings of the 20th ACM International Conference on Multimodal Interaction, Boulder, CO, US
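The abstract above identifies "actively attended objects" from eye-gaze. A minimal sketch of how gaze fixations might be attributed to detected objects is given below; the box-containment rule and all names (Fixation, Box, attended_objects) are illustrative assumptions, not the authors' actual pipeline.

```python
# Hypothetical sketch: count which detected object classes are fixated,
# by testing whether gaze points fall inside object bounding boxes.
from collections import Counter
from typing import List, Tuple

Box = Tuple[str, float, float, float, float]  # (label, x1, y1, x2, y2)
Fixation = Tuple[float, float]                # (x, y) in frame coordinates

def attended_objects(fixations: List[Fixation], boxes: List[Box]) -> Counter:
    """Count how often each detected object class contains a fixation."""
    counts: Counter = Counter()
    for fx, fy in fixations:
        for label, x1, y1, x2, y2 in boxes:
            if x1 <= fx <= x2 and y1 <= fy <= y2:
                counts[label] += 1
    return counts
```

Per-class fixation counts like these could then be fed, alongside coarse scene features, into an affect prediction model to measure the importance of the attention channel.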
Expected exponential loss for gaze-based video and volume ground truth annotation
Many recent machine learning approaches used in medical imaging are highly
reliant on large amounts of image and ground truth data. In the context of
object segmentation, pixel-wise annotations are extremely expensive to collect,
especially in video and 3D volumes. To reduce this annotation burden, we
propose a novel framework to allow annotators to simply observe the object to
segment and record where they looked with a $200 eye-gaze tracker. Our
method then estimates pixel-wise probabilities for the presence of the object
throughout the sequence, from which we train a classifier in a semi-supervised
setting using a novel Expected Exponential loss function. We show that our
framework provides superior performance on a wide range of medical imaging
settings compared to existing strategies, and that our method can be combined
with current crowd-sourcing paradigms as well.
Comment: 9 pages, 5 figures, MICCAI 2017 - LABELS Workshop
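One plausible reading of an "Expected Exponential loss" is the expectation of the standard exponential loss over the uncertain pixel label, weighted by the gaze-derived probability p that the pixel belongs to the object. The closed form below is an assumption made for illustration, not the paper's verified definition.

```python
import math

def expected_exponential_loss(score: float, p: float) -> float:
    """Expected exponential loss for a pixel with an uncertain label.

    score: classifier output f(x).
    p: estimated probability that the true label is +1 (object pixel).
    Takes the expectation of exp(-y * f(x)) over y in {+1, -1}:
        p * exp(-f(x)) + (1 - p) * exp(f(x))
    NOTE: this form is an assumption about the paper's loss, used
    only to illustrate how label uncertainty enters the objective.
    """
    return p * math.exp(-score) + (1.0 - p) * math.exp(score)
```

With p = 1 this reduces to the ordinary exponential loss exp(-f(x)); with p = 0.5 the loss is symmetric in the score, so uncertain pixels exert no directional pull on the classifier.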
Learn to Interpret Atari Agents
Deep Reinforcement Learning (DeepRL) agents surpass human-level performance
in a multitude of tasks. However, the direct mapping from states to actions
makes it hard to interpret the rationale behind the decision making of agents.
In contrast to previous a posteriori methods of visualizing DeepRL policies, we
propose an end-to-end trainable framework based on Rainbow, a representative
Deep Q-Network (DQN) agent. Our method automatically learns important regions
in the input domain, which enables characterizations of the decision making and
interpretations for non-intuitive behaviors. Hence we name it Region Sensitive
Rainbow (RS-Rainbow). RS-Rainbow utilizes a simple yet effective mechanism to
incorporate visualization ability into the learning model, improving not only
model interpretability but also performance. Extensive
experiments on the challenging platform of Atari 2600 demonstrate the
superiority of RS-Rainbow. In particular, our agent achieves state-of-the-art
performance using just 25% of the training frames. Demonstrations and code are available at
https://github.com/yz93/Learn-to-Interpret-Atari-Agents
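The "region sensitive" mechanism described above can be pictured as a learned spatial attention mask over convolutional features. The sketch below is a generic soft-attention pooling step, not RS-Rainbow's actual architecture; the function name and the softmax-over-positions choice are assumptions.

```python
import numpy as np

def region_sensitive_pool(features: np.ndarray, attn_logits: np.ndarray) -> np.ndarray:
    """Reweight convolutional features by a spatial attention map.

    features: (C, H, W) feature maps from a convolutional encoder.
    attn_logits: (H, W) unnormalized scores from a small learned
    sub-network (assumed here; not RS-Rainbow's exact design).
    A softmax over spatial positions yields a mask highlighting
    "important regions"; the same mask can be rendered on top of the
    input frame as a visualization of what the agent attends to.
    """
    flat = attn_logits.reshape(-1)
    mask = np.exp(flat - flat.max())          # numerically stable softmax
    mask = (mask / mask.sum()).reshape(attn_logits.shape)  # (H, W), sums to 1
    return features * mask[None, :, :]        # broadcast mask over channels
```

Because the mask is produced inside the forward pass, it is trained end-to-end with the Q-network rather than computed post hoc, which matches the abstract's contrast with a posteriori visualization methods.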