Deep Motion Features for Visual Tracking
Robust visual tracking is a challenging computer vision problem, with many
real-world applications. Most existing approaches employ hand-crafted
appearance features, such as HOG or Color Names. Recently, deep RGB features
extracted from convolutional neural networks have been successfully applied to
tracking. Despite their success, these features capture only appearance
information. Motion cues, on the other hand, provide discriminative and
complementary information that can improve tracking performance. Unlike in
visual tracking, deep motion features have been successfully applied to action
recognition and video classification. Typically, the motion features are
learned by training a CNN on optical flow images extracted from large amounts
of labeled videos.
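As a concrete illustration of this pipeline, here is a minimal sketch (assuming OpenCV; motion_cnn is a hypothetical flow-trained network, not taken from the paper) of extracting deep motion features from optical flow images:

    # Minimal sketch, not the authors' code: compute dense optical flow
    # between consecutive frames, encode it as an image, and pass it through
    # a CNN trained on optical flow data ("motion_cnn" is hypothetical).
    import cv2
    import numpy as np

    def deep_motion_features(prev_frame, next_frame, motion_cnn):
        prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
        next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)
        # Farneback dense flow; flow-based recognition pipelines often
        # use TV-L1 instead.
        flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        # Encode the 2-channel flow field as a 3-channel uint8 image:
        # x-flow, y-flow, and flow magnitude, each normalized per frame.
        mag = np.linalg.norm(flow, axis=2)
        channels = [flow[..., 0], flow[..., 1], mag]
        flow_img = np.stack(
            [cv2.normalize(c, None, 0, 255, cv2.NORM_MINMAX) for c in channels],
            axis=2).astype(np.uint8)
        return motion_cnn(flow_img)  # deep motion feature maps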
This paper investigates the impact of deep motion features in a
tracking-by-detection framework. We further show that hand-crafted, deep RGB,
and deep motion features contain complementary information. To the best of our
knowledge, we are the first to propose fusing appearance information with deep
motion features for visual tracking. Comprehensive experiments clearly suggest
that our fusion approach with deep motion features outperforms standard methods
relying on appearance information alone.
Comment: ICPR 2016. Best paper award in the "Computer Vision and Robot Vision" track.
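A short sketch of the fusion idea described above (the helpers hog_features, rgb_cnn, and motion_cnn are placeholders, and plain channel-wise concatenation stands in for whatever fusion scheme the tracker actually employs):

    # Hedged illustration: resample each feature map to a common spatial
    # size and stack the channels; none of these helpers come from the paper.
    import cv2
    import numpy as np

    def fuse_features(patch, flow_img, hog_features, rgb_cnn, motion_cnn,
                      size=(31, 31)):  # (width, height) for cv2.resize
        maps = [hog_features(patch),    # hand-crafted appearance
                rgb_cnn(patch),         # deep RGB appearance
                motion_cnn(flow_img)]   # deep motion
        resized = [cv2.resize(m, size) for m in maps]
        return np.concatenate(resized, axis=2)  # fused feature channels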
Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking
Discriminative Correlation Filters (DCF) have demonstrated excellent
performance for visual object tracking. The key to their success is the ability
to efficiently exploit available negative data by including all shifted
versions of a training sample. However, the underlying DCF formulation is
restricted to single-resolution feature maps, significantly limiting its
potential. In this paper, we go beyond the conventional DCF framework and
introduce a novel formulation for training continuous convolution filters. We
employ an implicit interpolation model to pose the learning problem in the
continuous spatial domain. Our proposed formulation enables efficient
integration of multi-resolution deep feature maps, leading to superior results
on three object tracking benchmarks: OTB-2015 (+5.1% in mean OP), Temple-Color
(+4.6% in mean OP), and VOT2015 (20% relative reduction in failure rate).
Additionally, our approach is capable of sub-pixel localization, crucial for
the task of accurate feature point tracking. We also demonstrate the
effectiveness of our learning formulation in extensive feature point tracking
experiments. Code and supplementary material are available at
http://www.cvl.isy.liu.se/research/objrec/visualtracking/conttrack/index.html.
Comment: Accepted at ECCV 2016.
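The implicit interpolation model mentioned in the abstract above can be sketched as follows (the notation here is chosen for illustration, not quoted from the paper): a discrete feature channel x^d with N_d spatial samples is mapped to the continuous interval [0, T) by an interpolation kernel b_d,

    J_d\{x^d\}(t) = \sum_{n=0}^{N_d - 1} x^d[n] \, b_d\!\left(t - \frac{T}{N_d} n\right),

so that feature maps of different resolutions N_d live in one continuous domain, and a single set of continuous convolution filters can be learned across all of them.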
End-to-end Flow Correlation Tracking with Spatial-temporal Attention
Discriminative correlation filters (DCF) with deep convolutional features
have achieved favorable performance on recent tracking benchmarks. However,
most existing DCF trackers consider only the appearance features of the current
frame and hardly benefit from motion and inter-frame information. The lack of
temporal information degrades tracking performance under challenges such as
partial occlusion and deformation. In this work, we focus on exploiting the
rich flow information in consecutive frames to improve both the feature
representation and the tracking accuracy. First, the individual components,
including optical flow estimation, feature extraction, aggregation, and
correlation filter tracking, are formulated as special layers in one network.
To the best of our knowledge, this is the first work to jointly train the flow
and tracking tasks in a deep learning framework. Then, historical feature maps
at predefined intervals are warped and aggregated with the current ones under
the guidance of flow. For adaptive aggregation, we propose a novel
spatial-temporal
attention mechanism. Extensive experiments are performed on four challenging
tracking datasets: OTB2013, OTB2015, VOT2015 and VOT2016, and the proposed
method achieves superior results on these benchmarks.
Comment: Accepted in CVPR 2018.
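The flow-guided warping and adaptive aggregation described above can be illustrated with a small sketch (assuming PyTorch; the cosine-similarity weighting below is a stand-in for the learned spatial-temporal attention, not the paper's exact mechanism):

    # Sketch under stated assumptions: warp historical feature maps toward
    # the current frame using flow, then aggregate with softmax weights.
    import torch
    import torch.nn.functional as F

    def warp(feat, flow):
        # feat: (1, C, H, W); flow: (1, 2, H, W), current -> past, in pixels.
        _, _, h, w = feat.shape
        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
        gx = (xs + flow[:, 0]) / (w - 1) * 2 - 1   # normalize to [-1, 1]
        gy = (ys + flow[:, 1]) / (h - 1) * 2 - 1
        grid = torch.stack([gx, gy], dim=-1)       # (1, H, W, 2)
        return F.grid_sample(feat, grid, align_corners=True)

    def aggregate(current, past_feats, past_flows):
        warped = [warp(f, fl) for f, fl in zip(past_feats, past_flows)]
        stack = torch.stack([current] + warped)    # (T, 1, C, H, W)
        # Cosine similarity to the current features as surrogate attention.
        sim = F.cosine_similarity(stack, current.unsqueeze(0), dim=2)
        weights = torch.softmax(sim, dim=0).unsqueeze(2)  # (T, 1, 1, H, W)
        return (weights * stack).sum(dim=0)        # aggregated (1, C, H, W)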
Remove Cosine Window from Correlation Filter-based Visual Trackers: When and How
Correlation filters (CFs) have continuously advanced the state-of-the-art
tracking performance and have been extensively studied in recent years. Most
existing CF trackers adopt a cosine window to spatially reweight the base image
and alleviate boundary discontinuity. However, the cosine window emphasizes the
central region of the base image and risks contaminating negative training
samples during model learning. On the other hand, the spatial regularization
deployed in many recent CF trackers plays a role similar to the cosine window
by enforcing a spatial penalty on the CF coefficients. Therefore, in this paper
we investigate the feasibility of removing the cosine window from CF trackers
with spatial regularization. When the cosine window is simply removed, a CF
with spatial regularization still suffers from a small degree of boundary
discontinuity. To tackle this issue, we further introduce binary and
Gaussian-shaped mask functions that eliminate boundary discontinuity while
reweighting the estimation error of each training sample; these masks can be
incorporated into multiple CF trackers with spatial regularization. Compared to
their counterparts with a cosine window, our methods are effective in handling
boundary discontinuity and sample contamination, thereby benefiting
tracking performance. Extensive experiments on three benchmarks show that our
methods perform favorably against state-of-the-art trackers using either
handcrafted or deep CNN features. The code is publicly available at
https://github.com/lifeng9472/Removing_cosine_window_from_CF_trackers.
Comment: 13 pages, 7 figures, submitted to IEEE Transactions on Image Processing.
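The contrast between the cosine window and the proposed mask functions can be sketched as follows (a simple illustration, not the released code; per the abstract, the masks reweight the per-pixel estimation error in the CF objective rather than the base image itself):

    import numpy as np

    def cosine_window(h, w):
        # Multiplied onto the base image; emphasizes the central region.
        return np.outer(np.hanning(h), np.hanning(w))

    def binary_mask(h, w, th, tw):
        # 1 inside a centered th x tw target region, 0 outside.
        m = np.zeros((h, w))
        y0, x0 = (h - th) // 2, (w - tw) // 2
        m[y0:y0 + th, x0:x0 + tw] = 1.0
        return m

    def gaussian_mask(h, w, sy, sx):
        # Smooth alternative to the binary mask.
        ys = np.arange(h) - (h - 1) / 2
        xs = np.arange(w) - (w - 1) / 2
        return np.exp(-ys[:, None] ** 2 / (2 * sy ** 2)
                      - xs[None, :] ** 2 / (2 * sx ** 2))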