A spatiotemporal model with visual attention for video classification
High-level understanding of sequential visual input is important for safe and
stable autonomy, especially in localization and object detection. Although
traditional object classification and tracking approaches are explicitly
designed to handle variations in rotation and scale, current state-of-the-art
approaches based on deep learning outperform them. This paper
focuses on developing a spatiotemporal model to handle videos containing moving
objects with rotation and scale changes. Building on models that combine
Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to
classify sequential data, this work investigates the effectiveness of
incorporating attention modules in the CNN stage for video classification. The
superiority of the proposed spatiotemporal model is demonstrated on the Moving
MNIST dataset augmented with rotation and scaling.

Comment: Accepted by the Robotics: Science and Systems 2017 Workshop on Articulated Model Tracking.
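
As a rough illustration of the kind of architecture this abstract describes, the sketch below combines a per-frame CNN, a soft spatial-attention module in the CNN stage, and an RNN that aggregates the attended features over time. It is a minimal PyTorch sketch with assumed module names and dimensions, not the authors' implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AttentiveSpatiotemporalClassifier(nn.Module):
        """CNN -> spatial attention -> RNN video classifier (illustrative)."""
        def __init__(self, num_classes: int = 10, hidden: int = 128):
            super().__init__()
            # CNN stage: per-frame feature extractor.
            self.cnn = nn.Sequential(
                nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            # Attention module in the CNN stage: one logit per spatial location.
            self.attn = nn.Conv2d(64, 1, kernel_size=1)
            # RNN stage: aggregates attended per-frame features over time.
            self.rnn = nn.LSTM(input_size=64, hidden_size=hidden, batch_first=True)
            self.head = nn.Linear(hidden, num_classes)

        def forward(self, video):                 # video: (B, T, 1, H, W)
            B, T = video.shape[:2]
            feats = []
            for t in range(T):
                f = self.cnn(video[:, t])         # (B, 64, h, w)
                a = self.attn(f)                  # (B, 1, h, w) attention logits
                a = F.softmax(a.flatten(2), dim=-1).view_as(a)
                feats.append((f * a).sum(dim=(2, 3)))  # attention-weighted pooling
            seq = torch.stack(feats, dim=1)       # (B, T, 64)
            out, _ = self.rnn(seq)
            return self.head(out[:, -1])          # classify from last hidden state

On Moving-MNIST-style input, the model would be called on a tensor of shape (batch, frames, 1, 64, 64) and trained with a standard cross-entropy loss.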
Towards Robust Image Classification Using Sequential Attention Models
In this paper we propose to augment a modern neural-network architecture with
an attention model inspired by human perception. Specifically, we adversarially
train and analyze a neural model incorporating a human-inspired visual
attention component that is guided by a recurrent top-down sequential process.
Our experimental evaluation uncovers several notable findings about the
robustness and behavior of this new model. First, introducing attention to the
model significantly improves adversarial robustness, yielding
state-of-the-art ImageNet accuracies under a wide range of random targeted
attack strengths. Second, we show that by varying the number of attention steps
(glances/fixations) for which the model is unrolled, we are able to make its
defense capabilities stronger, even against stronger attacks, resulting
in a "computational race" between the attacker and the defender. Finally, we
show that some of the adversarial examples generated by attacking our model are
quite different from conventional adversarial examples: they contain global,
salient, and spatially coherent structures from the target class that would be
recognizable even to a human, and they work by distracting the model's
attention away from the main object in the original image.
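
To make the "unrolled attention steps" idea concrete, here is a minimal sketch of a recurrent top-down attention classifier. The controller state issues a query over CNN features at each glance, and the number of unrolled steps is a runtime knob; the module names, sizes, and glance loop are illustrative assumptions, not the paper's actual architecture.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SequentialAttentionClassifier(nn.Module):
        """Recurrent top-down attention unrolled for several glances (illustrative)."""
        def __init__(self, num_classes: int = 1000, feat: int = 64, hidden: int = 128):
            super().__init__()
            self.cnn = nn.Sequential(
                nn.Conv2d(3, feat, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            # Top-down query: project the controller state into feature space
            # and match it against every spatial location (dot-product attention).
            self.query = nn.Linear(hidden, feat)
            self.controller = nn.GRUCell(feat, hidden)
            self.head = nn.Linear(hidden, num_classes)

        def forward(self, image, steps: int = 4):  # steps = number of glances
            f = self.cnn(image)                    # (B, feat, h, w)
            keys = f.flatten(2).transpose(1, 2)    # (B, h*w, feat)
            h = f.new_zeros(f.shape[0], self.controller.hidden_size)
            for _ in range(steps):                 # unroll the sequential process
                q = self.query(h)                              # (B, feat)
                scores = torch.bmm(keys, q.unsqueeze(-1))      # (B, h*w, 1)
                attn = F.softmax(scores, dim=1)                # where to look next
                glimpse = (keys * attn).sum(dim=1)             # (B, feat)
                h = self.controller(glimpse, h)                # top-down state update
            return self.head(h)

Varying the steps argument at inference time mirrors the abstract's observation that unrolling the model for more glances can strengthen its defense, at the cost of extra computation.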