SANet: Structure-Aware Network for Visual Tracking
Convolutional neural networks (CNNs) have drawn increasing interest in visual
tracking owing to their power in feature extraction. Most existing
CNN-based trackers treat tracking as a classification problem. However, these
trackers are sensitive to similar distractors because their CNN models mainly
focus on inter-class classification. To address this problem, we use
self-structure information of the object to distinguish it from distractors.
Specifically, we use a recurrent neural network (RNN) to model object
structure and incorporate it into the CNN to improve its robustness to similar
distractors. Considering that convolutional layers at different levels
characterize the object from different perspectives, we use multiple RNNs to
model object structure at each level separately. Extensive experiments
on three benchmarks, OTB100, TC-128 and VOT2015, show that the proposed
algorithm outperforms other methods. Code is released at
http://www.dabi.temple.edu/~hbling/code/SANet/SANet.html.
Comment: In CVPR Deep Vision Workshop, 201
ReSeg: A Recurrent Neural Network-based Model for Semantic Segmentation
We propose a structured prediction architecture, which exploits the local
generic features extracted by Convolutional Neural Networks and the capacity of
Recurrent Neural Networks (RNN) to retrieve distant dependencies. The proposed
architecture, called ReSeg, is based on the recently introduced ReNet model for
image classification. We modify and extend it to perform the more challenging
task of semantic segmentation. Each ReNet layer is composed of four RNNs that
sweep the image horizontally and vertically in both directions, encoding
patches or activations, and providing relevant global information. Moreover,
ReNet layers are stacked on top of pre-trained convolutional layers, benefiting
from generic local features. Upsampling layers follow ReNet layers to recover
the original image resolution in the final predictions. The proposed ReSeg
architecture is efficient, flexible and suitable for a variety of semantic
segmentation tasks. We evaluate ReSeg on several widely used semantic
segmentation datasets (Weizmann Horse, Oxford Flower, and CamVid), achieving
state-of-the-art performance. Results show that ReSeg can act as a suitable
architecture for semantic segmentation tasks, and may have further applications
in other structured prediction problems. The source code and model
hyperparameters are available at https://github.com/fvisin/reseg.
Comment: In CVPR Deep Vision Workshop, 201
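
The ReNet-style layer described in the ReSeg abstract above (four RNNs sweeping the image vertically and horizontally, each in both directions) can be sketched roughly as follows. This is a toy illustration and not the authors' implementation: the function names (rnn_sweep, bidirectional, renet_like_layer), the scalar hidden state, and the fixed constant weights are all assumptions made for brevity. A real layer would use learned weight matrices, vector-valued hidden states, and patch inputs.

```python
import math

def rnn_sweep(seq, w_in, w_rec):
    """Run a toy tanh RNN with a scalar hidden state along a sequence of vectors."""
    h = 0.0
    out = []
    for x in seq:
        h = math.tanh(sum(w * v for w, v in zip(w_in, x)) + w_rec * h)
        out.append(h)
    return out

def bidirectional(seq, w_fwd, w_bwd, w_rec):
    """Sweep a sequence in both directions; pair the two hidden states per position."""
    fwd = rnn_sweep(seq, w_fwd, w_rec)
    bwd = rnn_sweep(seq[::-1], w_bwd, w_rec)[::-1]
    return [[f, b] for f, b in zip(fwd, bwd)]

def renet_like_layer(grid, w_rec=0.5):
    """grid: H x W nested list of feature vectors. Returns an H x W grid of 2-d vectors.

    All weights are hypothetical fixed constants; nothing here is trained.
    """
    height, width = len(grid), len(grid[0])
    channels = len(grid[0][0])

    # RNNs 1 and 2: sweep every column top-down and bottom-up.
    w_down, w_up = [0.3] * channels, [-0.2] * channels
    cols = [
        bidirectional([grid[r][c] for r in range(height)], w_down, w_up, w_rec)
        for c in range(width)
    ]
    vert = [[cols[c][r] for c in range(width)] for r in range(height)]

    # RNNs 3 and 4: sweep every row of the vertical output left-to-right and right-to-left.
    w_right, w_left = [0.4, 0.1], [0.1, 0.4]
    return [bidirectional(row, w_right, w_left, w_rec) for row in vert]
```

Because the horizontal sweeps run over the output of the vertical sweeps, each output position can depend on every input position, which is one way to read the abstract's point about providing global information.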