Deep Motion Features for Visual Tracking
Robust visual tracking is a challenging computer vision problem, with many
real-world applications. Most existing approaches employ hand-crafted
appearance features, such as HOG or Color Names. Recently, deep RGB features
extracted from convolutional neural networks have been successfully applied for
tracking. Despite their success, these features only capture appearance
information. On the other hand, motion cues provide discriminative and
complementary information that can improve tracking performance. Contrary to
visual tracking, deep motion features have been successfully applied for action
recognition and video classification tasks. Typically, the motion features are
learned by training a CNN on optical flow images extracted from large amounts
of labeled videos.
This paper presents an investigation of the impact of deep motion features in
a tracking-by-detection framework. We further show that hand-crafted, deep RGB,
and deep motion features contain complementary information. To the best of our
knowledge, we are the first to propose fusing appearance information with deep
motion features for visual tracking. Comprehensive experiments clearly suggest
that our fusion approach with deep motion features outperforms standard methods
relying on appearance information alone.
Comment: ICPR 2016. Best paper award in the "Computer Vision and Robot Vision"
trac
Visual object tracking performance measures revisited
Visual tracking evaluation involves a large variety of
performance measures and suffers from a lack of consensus about which
measures should be used in experiments. This makes cross-paper tracker
comparison difficult. Furthermore, as some measures may be less effective than
comparison difficult. Furthermore, as some measures may be less effective than
others, the tracking results may be skewed or biased towards particular
tracking aspects. In this paper we revisit the popular performance measures and
tracker performance visualizations and analyze them theoretically and
experimentally. We show that several measures are equivalent in terms of the
information they provide for tracker comparison and, crucially, that some are
more brittle than others. Based on our analysis we narrow down the set of
potential measures to only two complementary ones, describing accuracy and
robustness, thus pushing towards homogenization of the tracker evaluation
methodology. These two measures can be intuitively interpreted and visualized
and have been employed by the recent Visual Object Tracking (VOT) challenges as
the foundation for the evaluation methodology.
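The two measures named in this abstract can be illustrated with a minimal sketch. The abstract does not give their exact definitions, so the code below makes assumptions: accuracy is taken as the mean bounding-box overlap (intersection over union) on frames where the tracker still overlaps the target, and robustness as the count of frames with zero overlap. The names `overlap` and `accuracy_and_robustness` are hypothetical, and the sketch omits the re-initialization step that a challenge protocol may apply after a failure.

```python
from typing import List, Tuple

# Assumed box format: (x, y, width, height) in pixels.
Box = Tuple[float, float, float, float]


def overlap(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the intersection rectangle (zero if disjoint).
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def accuracy_and_robustness(pred: List[Box], gt: List[Box]) -> Tuple[float, int]:
    """Return (mean overlap on tracked frames, number of failure frames).

    Assumption: a frame counts as a failure when the overlap is exactly zero.
    """
    overlaps = [overlap(p, g) for p, g in zip(pred, gt)]
    failures = sum(1 for o in overlaps if o == 0.0)
    tracked = [o for o in overlaps if o > 0.0]
    accuracy = sum(tracked) / len(tracked) if tracked else 0.0
    return accuracy, failures


if __name__ == "__main__":
    predictions = [(0, 0, 10, 10), (0, 0, 10, 10), (100, 100, 10, 10)]
    ground_truth = [(0, 0, 10, 10), (5, 0, 10, 10), (0, 0, 10, 10)]
    print(accuracy_and_robustness(predictions, ground_truth))
```

Keeping the two numbers separate reflects the abstract's point that they are complementary: a tracker can overlap the target well while it tracks and still lose it often, or the reverse.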
Learning Target-oriented Dual Attention for Robust RGB-T Tracking
RGB-Thermal object tracking attempts to locate a target object using
complementary visual and thermal infrared data. Existing RGB-T trackers fuse
different modalities by robust feature representation learning or adaptive
modal weighting. However, how to integrate a dual attention mechanism into visual
tracking has not yet been studied. In this paper, we
propose two visual attention mechanisms for robust RGB-T object tracking.
Specifically, the local attention is implemented by exploiting the common
visual attention of RGB and thermal data to train deep classifiers. We also
introduce the global attention, which is a multi-modal target-driven attention
estimation network. It can provide global proposals for the classifier together
with local proposals extracted from the previous tracking result. Extensive
experiments on two RGB-T benchmark datasets validate the effectiveness of our
proposed algorithm.
Comment: Accepted by IEEE ICIP 201
Semantic-Aware Fine-Grained Correspondence
Establishing visual correspondence across images is a challenging and
essential task. Recently, an influx of self-supervised methods has been
proposed to better learn representations for visual correspondence. However, we
find that these methods often fail to leverage semantic information and
over-rely on the matching of low-level features. In contrast, human vision is
capable of distinguishing between distinct objects as a pretext to tracking.
Inspired by this paradigm, we propose to learn semantic-aware fine-grained
correspondence. Firstly, we demonstrate that semantic correspondence is
implicitly available through a rich set of image-level self-supervised methods.
We further design a pixel-level self-supervised learning objective which
specifically targets fine-grained correspondence. For downstream tasks, we fuse
these two kinds of complementary correspondence representations together,
demonstrating that they boost performance synergistically. Our method surpasses
previous state-of-the-art self-supervised methods using convolutional networks
on a variety of visual correspondence tasks, including video object
segmentation, human pose tracking, and human part tracking.
Comment: 26 page