Recurrent Models of Visual Attention
Applying convolutional neural networks to large images is computationally
expensive because the amount of computation scales linearly with the number of
image pixels. We present a novel recurrent neural network model that is capable
of extracting information from an image or video by adaptively selecting a
sequence of regions or locations and only processing the selected regions at
high resolution. Like convolutional neural networks, the proposed model has a
degree of translation invariance built-in, but the amount of computation it
performs can be controlled independently of the input image size. While the
model is non-differentiable, it can be trained using reinforcement learning
methods to learn task-specific policies. We evaluate our model on several image
classification tasks, where it significantly outperforms a convolutional neural
network baseline on cluttered images, and on a dynamic visual control problem,
where it learns to track a simple object without an explicit training signal
for doing so.
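The recurrent attention loop described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the glimpse size, state dimension, and the `glimpse`/`W_g`/`W_h`/`W_l` names are assumptions, and the REINFORCE training step is only indicated in a comment.

```python
import numpy as np

rng = np.random.default_rng(0)

def glimpse(image, loc, size=8):
    """Crop a size x size patch centred at loc = (row, col), clipped to the image."""
    r = int(np.clip(loc[0], size // 2, image.shape[0] - size // 2))
    c = int(np.clip(loc[1], size // 2, image.shape[1] - size // 2))
    return image[r - size // 2:r + size // 2, c - size // 2:c + size // 2]

# Toy parameters: a recurrent core and two linear maps.
H = 32
W_g = rng.normal(scale=0.1, size=(H, 8 * 8))   # glimpse features -> hidden
W_h = rng.normal(scale=0.1, size=(H, H))       # hidden -> hidden
W_l = rng.normal(scale=0.1, size=(2, H))       # hidden -> next-location policy mean

image = rng.random((64, 64))
h = np.zeros(H)
loc = np.array([32.0, 32.0])                   # start at the image centre

for t in range(6):                             # a fixed budget of glimpses
    g = glimpse(image, loc).ravel()
    h = np.tanh(W_g @ g + W_h @ h)             # recurrent state update
    mean = W_l @ h                             # mean of the location policy
    loc = 32 + 28 * np.tanh(mean)              # keep the next fixation in bounds
    # In training, `loc` would be sampled around `mean` and the policy trained
    # with REINFORCE, since the hard crop is non-differentiable.
```

Note how the per-step cost depends only on the glimpse size, not on the full image resolution, which is the computational advantage the abstract claims.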
Temporal Attention-Gated Model for Robust Sequence Classification
Typical techniques for sequence classification are designed for
well-segmented sequences which have been edited to remove noisy or irrelevant
parts. Therefore, such methods cannot be easily applied to the noisy sequences
expected in real-world applications. In this paper, we present the Temporal
Attention-Gated Model (TAGM) which integrates ideas from attention models and
gated recurrent networks to better deal with noisy or unsegmented sequences.
Specifically, we extend the concept of the attention model to measure the relevance
of each observation (time step) of a sequence. We then use a novel gated
recurrent network to learn the hidden representation for the final prediction.
An important advantage of our approach is interpretability since the temporal
attention weights provide a meaningful value for the salience of each time step
in the sequence. We demonstrate the merits of our TAGM approach, both for
prediction accuracy and interpretability, on three different tasks: spoken
digit recognition, text-based sentiment analysis, and visual event recognition.
Comment: Accepted by CVPR 201
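The attention-gated update the abstract describes — a per-step salience weight deciding how much each observation contributes to the hidden state — can be sketched like this. The function and weight names (`tagm_forward`, `W_a`, `W_x`, `W_h`) are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

rng = np.random.default_rng(1)

def tagm_forward(x, W_a, W_x, W_h):
    """Attention-gated recurrence over a sequence x of shape (T, D).

    A scalar attention weight a_t in (0, 1) scores the relevance of each
    time step; the hidden state is a convex mix of the new candidate and
    the previous state, so irrelevant steps are largely passed over.
    Returns the final hidden state and the per-step attention weights.
    """
    T, _ = x.shape
    H = W_h.shape[0]
    h = np.zeros(H)
    attn = np.zeros(T)
    for t in range(T):
        a = 1.0 / (1.0 + np.exp(-(W_a @ x[t])))  # salience of step t
        cand = np.tanh(W_x @ x[t] + W_h @ h)     # candidate state update
        h = a * cand + (1.0 - a) * h             # gated interpolation
        attn[t] = a
    return h, attn

D, H, T = 5, 4, 10
W_a = rng.normal(size=D)          # attention scorer (scalar output)
W_x = rng.normal(size=(H, D))
W_h = rng.normal(size=(H, H))
x = rng.normal(size=(T, D))
h, attn = tagm_forward(x, W_a, W_x, W_h)
```

The returned `attn` vector is what makes the model interpretable: each entry is a bounded salience score for one time step, which can be inspected directly.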
Going in circles is the way forward: the role of recurrence in visual inference
Biological visual systems exhibit abundant recurrent connectivity.
State-of-the-art neural network models for visual recognition, by contrast,
rely heavily or exclusively on feedforward computation. Any finite-time
recurrent neural network (RNN) can be unrolled along time to yield an
equivalent feedforward neural network (FNN). This important insight suggests
that computational neuroscientists may not need to engage recurrent
computation, and that computer-vision engineers may be limiting themselves to a
special case of FNN if they build recurrent models. Here we argue, to the
contrary, that FNNs are a special case of RNNs and that computational
neuroscientists and engineers should engage recurrence to understand how brains
and machines can (1) achieve greater and more flexible computational depth, (2)
compress complex computations into limited hardware, (3) integrate priors and
priorities into visual inference through expectation and attention, (4) exploit
sequential dependencies in their data for better inference and prediction, and
(5) leverage the power of iterative computation.
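The unrolling insight at the heart of this abstract — that any finite-time RNN is equivalent to a feedforward network with tied weights, making FNNs the special case — can be verified numerically in a few lines. The state size, depth, and tanh nonlinearity below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

H, T = 6, 4
W = rng.normal(scale=0.3, size=(H, H))
b = rng.normal(scale=0.1, size=H)
x0 = rng.normal(size=H)

# Recurrent form: one weight matrix applied T times.
h = x0.copy()
for _ in range(T):
    h = np.tanh(W @ h + b)

# Unrolled form: T feedforward layers whose weights are tied copies of W.
layers = [(W.copy(), b.copy()) for _ in range(T)]
z = x0.copy()
for Wi, bi in layers:
    z = np.tanh(Wi @ z + bi)

assert np.allclose(h, z)  # the unrolled FNN computes exactly the same function
```

The weight tying is the point: the recurrent form stores one matrix and reuses it at every step (the hardware-compression argument), whereas a general FNN of the same depth would need T independent matrices.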