Learning Target-oriented Dual Attention for Robust RGB-T Tracking
RGB-Thermal object tracking attempts to locate a target object using
complementary visual and thermal infrared data. Existing RGB-T trackers fuse
the two modalities through robust feature representation learning or adaptive
modal weighting. However, how to integrate dual attention mechanisms into
visual tracking has not yet been studied. In this paper, we propose two visual
attention mechanisms for robust RGB-T object tracking. Specifically, local
attention is implemented by exploiting the common visual attention of RGB and
thermal data to train deep classifiers. We also introduce global attention, a
multi-modal target-driven attention estimation network that provides global
proposals for the classifier alongside the local proposals extracted from the
previous tracking result. Extensive experiments on two RGB-T benchmark
datasets validate the effectiveness of the proposed algorithm.

Comment: Accepted by IEEE ICIP 201
Drone Shadow Tracking
Aerial videos taken by a drone flying not far above the ground may contain the
drone's own shadow projected onto the scene, which degrades the aesthetic
quality of the videos. In the presence of other shadows, shadow removal cannot
be applied directly, and the drone's shadow must first be tracked. Tracking a
drone's shadow in a video is, however, challenging: the shadow's varying size,
shape, and orientation, together with changes in drone altitude, pose
difficulties, and the shadow can easily disappear over dark areas. A shadow
nevertheless has specific properties, beyond its geometric shape, that can be
leveraged. In this paper, we incorporate knowledge of the shadow's physical
properties, in the form of shadow detection masks, into a correlation-based
tracking algorithm. We capture a test set of aerial videos taken under
different settings and compare our results to those of a state-of-the-art
tracking algorithm.

Comment: 5 pages, 4 figures
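To illustrate how a shadow detection mask might be folded into a correlation-based tracker, the sketch below re-weights a correlation response map so that peaks outside detected shadow regions are suppressed. The blend weight `alpha` and the multiplicative combination are assumptions for this sketch; the paper's exact rule may differ.

```python
import numpy as np

def masked_correlation_response(response: np.ndarray,
                                shadow_mask: np.ndarray,
                                alpha: float = 0.7) -> np.ndarray:
    """Re-weight a correlation-filter response map with a shadow mask.

    `response` is the tracker's raw response over the search window and
    `shadow_mask` is a same-sized map in [0, 1] from any shadow detector.
    The blend weight `alpha` is an assumed hyper-parameter.
    """
    mask = shadow_mask.astype(np.float32)
    # Suppress peaks that fall outside detected shadow regions.
    return response * (alpha * mask + (1.0 - alpha))

def locate_target(response: np.ndarray, shadow_mask: np.ndarray):
    """Return the (row, col) of the strongest shadow-consistent peak."""
    weighted = masked_correlation_response(response, shadow_mask)
    return np.unravel_index(np.argmax(weighted), weighted.shape)

# Example with synthetic data: a stronger distractor peak outside the
# shadow region is suppressed, so the in-shadow peak wins.
resp = np.zeros((64, 64)); resp[20, 20] = 0.9; resp[40, 40] = 1.0
mask = np.zeros((64, 64)); mask[15:25, 15:25] = 1.0
print(locate_target(resp, mask))  # (20, 20)
```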
Deformable Object Tracking with Gated Fusion
The tracking-by-detection framework has received growing attention through its
integration with Convolutional Neural Networks (CNNs). Existing
tracking-by-detection methods, however, fail to track objects with severe
appearance variations, because the traditional convolution operates on a fixed
grid and thus may not find the correct response when the object changes pose
or the environmental conditions vary. In this paper, we propose a deformable
convolution layer to enrich the target appearance representations in the
tracking-by-detection framework. We aim to capture target appearance
variations via deformable convolution, which adaptively enhances the original
features. In addition, we propose a gated fusion scheme to control how the
variations captured by the deformable convolution affect the original
appearance. The enriched feature representation helps the CNN classifier
discriminate the target object from the background. Extensive experiments on
standard benchmarks show that the proposed tracker performs favorably against
state-of-the-art methods.
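A minimal PyTorch sketch of the two ingredients, using torchvision's `DeformConv2d`: a plain convolution predicts sampling offsets, the deformable convolution produces enhanced features, and a sigmoid gate blends them with the original appearance. Layer sizes and the gating rule are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class GatedDeformableBlock(nn.Module):
    """Sketch of a deformable-convolution feature enhancer with a gate.

    The layer sizes and gating rule are assumptions for illustration.
    """

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        # 2 offsets (dx, dy) per kernel tap, predicted from the input.
        self.offset_pred = nn.Conv2d(channels, 2 * kernel_size ** 2,
                                     kernel_size, padding=pad)
        self.deform = DeformConv2d(channels, channels, kernel_size,
                                   padding=pad)
        self.gate = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        offsets = self.offset_pred(x)
        deformed = self.deform(x, offsets)
        g = torch.sigmoid(self.gate(torch.cat([x, deformed], dim=1)))
        # Gated fusion: blend original and deformed appearance features.
        return g * deformed + (1.0 - g) * x

feats = torch.randn(1, 256, 28, 28)
print(GatedDeformableBlock(256)(feats).shape)  # torch.Size([1, 256, 28, 28])
```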
A Robust Structured Tracker Using Local Deep Features
Deep features extracted from convolutional neural networks have recently been
utilized in visual tracking to obtain a generic and semantic representation of
target candidates. In this paper, we propose a robust structured tracker using
local deep features (STLDF). The tracker exploits the deep features of local
patches inside target candidates and sparsely represents them by a set of
templates within the particle filter framework. The proposed STLDF uses a new
optimization model whose group-sparsity regularization term incorporates the
local and spatial information of the target candidates and captures the
spatial layout structure among them. To solve the optimization model, we
propose an efficient and fast numerical algorithm that consists of two
subproblems with closed-form solutions. Evaluations in terms of success and
precision on challenging benchmark sequences (e.g., OTB50 and OTB100)
demonstrate the superior performance of STLDF against several state-of-the-art
trackers.
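The group-sparsity idea can be illustrated with a generic group-sparse coding problem, min_c 0.5 * ||x - Dc||^2 + lam * sum_g ||c_g||_2, solved here by proximal gradient descent (ISTA) with group soft-thresholding. STLDF's actual model couples the local patches of each candidate and is solved by a different two-subproblem, closed-form scheme, so this sketch only conveys the effect of the regularizer.

```python
import numpy as np

def group_soft_threshold(c: np.ndarray, tau: float, groups) -> np.ndarray:
    """Proximal operator of the group-lasso penalty tau * sum_g ||c_g||_2."""
    out = c.copy()
    for g in groups:
        norm = np.linalg.norm(c[g])
        out[g] = 0.0 if norm <= tau else (1.0 - tau / norm) * c[g]
    return out

def group_sparse_code(x: np.ndarray, D: np.ndarray, groups,
                      lam: float = 0.1, n_iter: int = 200) -> np.ndarray:
    """Solve min_c 0.5*||x - Dc||^2 + lam * sum_g ||c_g||_2 by ISTA.

    Generic group-sparse coding sketch, not the paper's STLDF solver.
    """
    c = np.zeros(D.shape[1])
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = D.T @ (D @ c - x)            # gradient of the data term
        c = group_soft_threshold(c - step * grad, step * lam, groups)
    return c

# Example: an 8-atom template dictionary split into two groups of 4.
rng = np.random.default_rng(0)
D = rng.standard_normal((32, 8))
x = D[:, :4] @ np.array([1.0, -0.5, 0.3, 0.0])  # signal from group 0 only
c = group_sparse_code(x, D, groups=[slice(0, 4), slice(4, 8)])
print(np.round(c, 2))  # the second group's coefficients shrink toward zero
```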