Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking
Discriminative Correlation Filters (DCF) have demonstrated excellent
performance for visual object tracking. The key to their success is the ability
to efficiently exploit available negative data by including all shifted
versions of a training sample. However, the underlying DCF formulation is
restricted to single-resolution feature maps, significantly limiting its
potential. In this paper, we go beyond the conventional DCF framework and
introduce a novel formulation for training continuous convolution filters. We
employ an implicit interpolation model to pose the learning problem in the
continuous spatial domain. Our proposed formulation enables efficient
integration of multi-resolution deep feature maps, leading to superior results
on three object tracking benchmarks: OTB-2015 (+5.1% in mean OP), Temple-Color
(+4.6% in mean OP), and VOT2015 (20% relative reduction in failure rate).
Additionally, our approach is capable of sub-pixel localization, crucial for
the task of accurate feature point tracking. We also demonstrate the
effectiveness of our learning formulation in extensive feature point tracking
experiments. Code and supplementary material are available at
http://www.cvl.isy.liu.se/research/objrec/visualtracking/conttrack/index.html.

Comment: Accepted at ECCV 2016
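The conventional DCF baseline that this work generalizes has a compact closed form: treating all cyclic shifts of a training sample as implicit negatives makes the ridge regression diagonal in the Fourier domain. A minimal single-channel sketch of that baseline (MOSSE-style filter; this is not the paper's continuous multi-resolution formulation, and all names are illustrative):

```python
import numpy as np

def gaussian_label(shape, sigma=2.0):
    # Desired response: a Gaussian peak, rolled so the peak sits at (0, 0).
    h, w = shape
    ys = np.arange(h) - h // 2
    xs = np.arange(w) - w // 2
    g = np.exp(-(ys[:, None] ** 2 + xs[None, :] ** 2) / (2 * sigma ** 2))
    return np.roll(g, (-(h // 2), -(w // 2)), axis=(0, 1))

def learn_dcf(x, y, lam=1e-2):
    # Closed-form single-channel DCF: with every cyclic shift of x as an
    # implicit sample, ridge regression diagonalizes per Fourier frequency.
    X, Y = np.fft.fft2(x), np.fft.fft2(y)
    return np.conj(X) * Y / (np.conj(X) * X + lam)

def detect(f_hat, z):
    # Response map for a new sample z; the argmax gives the translation.
    resp = np.real(np.fft.ifft2(f_hat * np.fft.fft2(z)))
    return np.unravel_index(np.argmax(resp), resp.shape)

# Toy usage: a bright blob, then the same scene cyclically shifted by (3, 5).
rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32))
x[0, 0] += 10.0
f_hat = learn_dcf(x, gaussian_label(x.shape))
z = np.roll(x, (3, 5), axis=(0, 1))
print(detect(f_hat, z))  # -> (3, 5)
```

The paper's contribution replaces this single-resolution grid with an implicit interpolation model, so filters live in a continuous spatial domain and multi-resolution feature maps can be fused directly, which is also what enables sub-pixel localization.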
Learning Spatial-Aware Regressions for Visual Tracking
In this paper, we analyze the spatial information of deep features, and
propose two complementary regressions for robust visual tracking. First, we
propose a kernelized ridge regression model wherein the kernel value is defined
as the weighted sum of similarity scores of all pairs of patches between two
samples. We show that this model can be formulated as a neural network and thus
can be efficiently solved. Second, we propose a fully convolutional neural
network with spatially regularized kernels, through which the filter kernel
corresponding to each output channel is forced to focus on a specific region of
the target. Distance transform pooling is further exploited to determine the
effectiveness of each output channel of the convolution layer. The outputs from
the kernelized ridge regression model and the fully convolutional neural
network are combined to obtain the ultimate response. Experimental results on
two benchmark datasets validate the effectiveness of the proposed method.

Comment: To appear in CVPR 2018
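The first of the two regressions is a kernelized ridge regression, which in general admits a closed-form dual solution. A minimal sketch with a standard RBF kernel as a stand-in (an assumption for illustration: the paper's kernel is instead a weighted sum of patch-pair similarity scores, and the model is solved as a neural network rather than by direct matrix inversion):

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    # Standard RBF kernel between row vectors; a simple stand-in for the
    # paper's weighted patch-pair similarity kernel.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def krr_fit(X, y, lam=1e-3):
    # Dual ridge regression: alpha = (K + lam * I)^{-1} y.
    K = rbf_kernel(X, X)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_test):
    # Prediction is a kernel-weighted sum over training samples.
    return rbf_kernel(X_test, X_train) @ alpha

# Toy usage on synthetic data.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 4))
y = np.sin(X[:, 0])
alpha = krr_fit(X, y)
pred = krr_predict(X, alpha, X[:5])
```

Formulating this regression as a neural network, as the paper does, lets it be trained jointly with the spatially regularized convolutional branch instead of being solved in isolation.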
Good Features to Correlate for Visual Tracking
In recent years, correlation filters have shown dominant and spectacular
results in visual object tracking. The features employed in this family of
trackers significantly affect tracking performance. The ultimate goal is to
utilize features that are robust to any kind of appearance change of the
object, while predicting the object location as accurately as in the case of
no appearance change. With the emergence of deep learning based methods, the
study of learning features for specific tasks has accelerated. For instance,
discriminative visual tracking methods based on deep architectures have been
studied with promising performance. Nevertheless, correlation filter based
(CFB) trackers confine themselves to pre-trained networks that were trained
for the object classification problem. To this end, this manuscript
formulates the problem of learning deep fully convolutional features for CFB
visual tracking. To learn the proposed model, a novel and efficient
backpropagation algorithm is presented based on the loss function of the
network. The proposed learning framework makes the network model flexible for
custom design and alleviates the dependency on networks trained for
classification. Extensive performance analysis shows the efficacy of the
proposed custom design in the CFB tracking framework. By fine-tuning the
convolutional parts of a state-of-the-art network and integrating this model
into the top-performing CFB tracker of VOT2016, an 18% increase in expected
average overlap is achieved and tracking failures are reduced by 25%, while
superiority over state-of-the-art methods on the OTB-2013 and OTB-2015
tracking datasets is maintained.

Comment: Accepted version, IEEE Transactions on Image Processing