Deep Motion Features for Visual Tracking
Robust visual tracking is a challenging computer vision problem, with many
real-world applications. Most existing approaches employ hand-crafted
appearance features, such as HOG or Color Names. Recently, deep RGB features
extracted from convolutional neural networks have been successfully applied to
tracking. Despite their success, these features capture only appearance
information. Motion cues, on the other hand, provide discriminative and
complementary information that can improve tracking performance. In contrast to
visual tracking, deep motion features have been successfully applied to action
recognition and video classification tasks. Typically, such motion features are
learned by training a CNN on optical flow images extracted from large amounts
of labeled video data.
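
As a rough sketch of this kind of motion input (not this particular paper's
pipeline), dense optical flow can be computed with OpenCV and encoded as a
three-channel image for a flow-trained CNN; the flow_image helper and its
clipping/scaling constants are illustrative assumptions, not values from the
paper:

import cv2
import numpy as np

def flow_image(prev_bgr, next_bgr):
    """Encode dense optical flow between two frames as a 3-channel
    uint8 image, the usual input format for flow-trained CNNs."""
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_bgr, cv2.COLOR_BGR2GRAY)
    # Farneback dense flow: one (dx, dy) displacement per pixel.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    # Illustrative encoding: clip displacements to [-20, 20] pixels,
    # map them to [0, 255], and use flow magnitude as a third channel.
    xy = np.clip(flow, -20, 20) * (255.0 / 40.0) + 127.5
    mag = np.clip(np.linalg.norm(flow, axis=2) * 12.75, 0, 255)
    return np.dstack([xy, mag]).astype(np.uint8)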
This paper investigates the impact of deep motion features in
a tracking-by-detection framework. We further show that hand-crafted, deep RGB,
and deep motion features contain complementary information. To the best of our
knowledge, we are the first to propose fusing appearance information with deep
motion features for visual tracking. Comprehensive experiments clearly suggest
that our fusion approach with deep motion features outperforms standard methods
relying on appearance information alone.
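
The fusion itself can be as simple as channel-wise concatenation of the
different feature maps. A minimal sketch, assuming all three maps have already
been resampled to a common spatial grid (the fuse_features helper and its L2
normalization are illustrative assumptions, not the paper's exact scheme):

import numpy as np

def fuse_features(hog_feat, deep_rgb_feat, deep_motion_feat):
    """Concatenate hand-crafted and deep feature maps along the channel
    axis; each input is an (H, W, C_i) array on a shared spatial grid."""
    fused = []
    for feat in (hog_feat, deep_rgb_feat, deep_motion_feat):
        # L2-normalize each cue so no single feature type dominates.
        fused.append(feat / (np.linalg.norm(feat) + 1e-8))
    return np.concatenate(fused, axis=-1)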
Comment: ICPR 2016. Best paper award in the "Computer Vision and Robot Vision"
track.
Discriminative Scale Space Tracking
Accurate scale estimation of a target is a challenging research problem in
visual object tracking. Most state-of-the-art methods employ an exhaustive
scale search to estimate the target size. The exhaustive search strategy is
computationally expensive and struggles with large scale variations. This
paper investigates the problem of accurate and robust scale
estimation in a tracking-by-detection framework. We propose a novel scale
adaptive tracking approach by learning separate discriminative correlation
filters for translation and scale estimation. The explicit scale filter is
learned online using the target appearance sampled at a set of different
scales. Contrary to standard approaches, our method directly learns the
appearance change induced by variations in the target scale. Additionally, we
investigate strategies to reduce the computational cost of our approach.
Extensive experiments are performed on the OTB and the VOT2014 datasets.
Compared to the standard exhaustive scale search, our approach achieves a gain
of 2.5% in average overlap precision on the OTB dataset. Additionally, our
method is computationally efficient, operating at a 50% higher frame rate
compared to the exhaustive scale search. Our method obtains the top rank in
performance by outperforming 19 state-of-the-art trackers on OTB and 37
state-of-the-art trackers on VOT2014.
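
A minimal sketch of the core idea, a one-dimensional MOSSE-style correlation
filter learned along the scale axis; the helper names and the (S, D)
scale_feats layout (one feature row per scale-resampled target patch) are
assumptions for illustration, and only a single-frame solve is shown:

import numpy as np

def learn_scale_filter(scale_feats, sigma=1.0, lam=1e-2):
    """Learn a 1-D discriminative correlation filter over scale.
    scale_feats: (S, D) array, one D-dim feature vector per scale sample."""
    S = scale_feats.shape[0]
    # Desired response: a Gaussian peaked at the current (middle) scale.
    g = np.exp(-0.5 * ((np.arange(S) - S // 2) / sigma) ** 2)
    G = np.fft.fft(g)
    F = np.fft.fft(scale_feats, axis=0)            # per-channel spectra
    A = np.conj(G)[:, None] * F                    # filter numerator
    B = np.sum(np.conj(F) * F, axis=1).real + lam  # shared denominator
    return A, B

def estimate_scale(A, B, scale_feats):
    """Correlate the filter with new scale samples; the argmax of the
    response gives the estimated relative scale change."""
    Z = np.fft.fft(scale_feats, axis=0)
    resp = np.fft.ifft(np.sum(np.conj(A) * Z, axis=1) / B).real
    return int(np.argmax(resp))

In the full tracker the numerator and denominator would be updated online with
a running average over frames, in line with the online learning described
above.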
Comment: To appear in TPAMI. This is the journal extension of the
VOT2014-winning DSST tracking method.