Areas of Attention for Image Captioning
We propose "Areas of Attention", a novel attention-based model for automatic
image captioning. Our approach models the dependencies between image regions,
caption words, and the state of an RNN language model, using three pairwise
interactions. In contrast to previous attention-based approaches that associate
image regions only to the RNN state, our method allows a direct association
between caption words and image regions. During training these associations are
inferred from image-level captions, akin to weakly-supervised object detector
training. These associations help to improve captioning by localizing the
corresponding regions during testing. We also propose and compare different
ways of generating attention areas: CNN activation grids, object proposals, and
spatial transformer networks applied in a convolutional fashion. Spatial
transformers give the best results. They allow for image-specific attention
areas, and can be trained jointly with the rest of the network. Our attention
mechanism and spatial transformer attention areas together yield
state-of-the-art results on the MSCOCO dataset.
Comment: Accepted in ICCV 201
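The three pairwise interactions described above can be illustrated with a minimal sketch. It assumes the word-state, word-region, and region-state scores are already available as plain numbers (in a trained model they would come from learned embeddings), and the function names are hypothetical rather than taken from the paper. The scores are summed and normalized into a joint distribution over (word, region) pairs, whose marginals give a word prediction and an attention map over regions.

```python
import math


def joint_word_region_distribution(word_state, word_region, region_state):
    """Return p(word, region | state) built from three pairwise score tables.

    word_state[w]     : score between word w and the current RNN state
    word_region[w][r] : score between word w and image region r
    region_state[r]   : score between region r and the current RNN state
    All scores are assumed to be precomputed scalars (illustrative only).
    """
    scores = [
        [word_state[w] + word_region[w][r] + region_state[r]
         for r in range(len(region_state))]
        for w in range(len(word_state))
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(max(row) for row in scores)
    exps = [[math.exp(s - top) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]


def marginals(joint):
    """Sum the joint over regions (word distribution) and over words (attention)."""
    p_word = [sum(row) for row in joint]
    p_region = [sum(col) for col in zip(*joint)]
    return p_word, p_region
```

Because the word-region term enters the score directly, a strong match between one word and one region raises both that word's probability and that region's attention weight, which is the direct association the abstract contrasts with state-only attention.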
Reversible Recursive Instance-level Object Segmentation
In this work, we propose a novel Reversible Recursive Instance-level Object
Segmentation (R2-IOS) framework to address the challenging instance-level
object segmentation task. R2-IOS consists of a reversible proposal refinement
sub-network that predicts bounding box offsets for refining the object proposal
locations, and an instance-level segmentation sub-network that generates the
foreground mask of the dominant object instance in each proposal. By being
recursive, R2-IOS iteratively optimizes the two sub-networks during joint
training, in which the refined object proposals and improved segmentation
predictions are alternately fed into each other to progressively increase the
network capabilities. By being reversible, the proposal refinement sub-network
adaptively determines an optimal number of refinement iterations required for
each proposal during both training and testing. Furthermore, to handle multiple
overlapped instances within a proposal, an instance-aware denoising autoencoder
is introduced into the segmentation sub-network to distinguish the dominant
object from other distracting instances. Extensive experiments on the
challenging PASCAL VOC 2012 benchmark demonstrate the superiority of
R2-IOS over other state-of-the-art methods. In particular, the score that
R2-IOS achieves across classes at the evaluated IoU threshold significantly
outperforms the results of PFN~\cite{PFN} and of~\cite{liu2015multi}.
Comment: 9 page
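The proposal refinement loop with a per-proposal number of iterations can be sketched roughly as follows. This is an illustration under stated assumptions, not the R2-IOS implementation: boxes are (center x, center y, width, height) tuples, offsets use the common R-CNN style parameterization, and `predict` is a hypothetical stand-in for the refinement sub-network that returns offsets together with a score saying whether further refinement is worthwhile.

```python
import math


def apply_offsets(box, offsets):
    """Shift and rescale a (cx, cy, w, h) box by predicted offsets.

    Assumes the R-CNN style parameterization: center shifts are relative
    to the box size, and size changes are in log space.
    """
    cx, cy, w, h = box
    dx, dy, dw, dh = offsets
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))


def refine_proposal(box, predict, max_iters=4, stop_threshold=0.5):
    """Refine a box until the predictor signals that no more steps are needed.

    `predict(box)` is a hypothetical callable returning
    (offsets, continue_score); refinement stops once continue_score
    drops below stop_threshold or max_iters is reached.
    """
    iterations = 0
    for _ in range(max_iters):
        offsets, continue_score = predict(box)
        if continue_score < stop_threshold:
            break  # keep the current box rather than applying this step
        box = apply_offsets(box, offsets)
        iterations += 1
    return box, iterations
```

Each proposal therefore stops after its own number of steps: a well-placed box exits on the first check, while a poorly placed one keeps being refined up to the cap.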