Deformable Object Tracking with Gated Fusion
The tracking-by-detection framework has received growing attention through its
integration with Convolutional Neural Networks (CNNs). Existing
tracking-by-detection based methods, however, fail to track objects with severe
appearance variations. This is because the traditional convolution operates on
a fixed grid and may therefore fail to produce the correct response when the
object changes pose or the environment varies. In this paper, we propose a
deformable convolution layer to enrich
the target appearance representations in the tracking-by-detection framework.
We aim to capture the target appearance variations via deformable convolution,
which adaptively enhances the original features. In addition, we propose a
gated fusion scheme to control how the variations captured by the deformable
convolution affect the original appearance. The feature representation
enriched by deformable convolution helps the CNN classifier discriminate the
target object from the background. Extensive experiments on the
standard benchmarks show that the proposed tracker performs favorably against
state-of-the-art methods.
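As a rough illustration of the idea, a deformable convolution can enrich the backbone features while a learned gate controls how much of the enriched signal is mixed back into the original one. The following PyTorch sketch is only our reading of the abstract; all module and layer names are illustrative, not the authors' implementation:

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class GatedDeformableFusion(nn.Module):
    """Hypothetical sketch: enrich features with a deformable convolution
    and blend them with the original features through a learned gate."""

    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        # Offsets for the deformable convolution are predicted from the input.
        self.offset_conv = nn.Conv2d(channels, 2 * kernel_size * kernel_size,
                                     kernel_size, padding=pad)
        self.deform_conv = DeformConv2d(channels, channels, kernel_size,
                                        padding=pad)
        # The gate decides, per pixel and channel, how much of the
        # deformation-enriched features to mix into the original ones.
        self.gate_conv = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):
        offsets = self.offset_conv(x)
        deformed = self.deform_conv(x, offsets)
        gate = torch.sigmoid(self.gate_conv(torch.cat([x, deformed], dim=1)))
        return gate * deformed + (1.0 - gate) * x  # gated residual fusion
```

The gated residual keeps the original appearance intact wherever the predicted deformations are unreliable, which matches the stated goal of controlling how the captured variations affect the original features.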
Real-time Multiple People Tracking with Deeply Learned Candidate Selection and Person Re-Identification
Online multi-object tracking is a fundamental problem in time-critical video
analysis applications. A major challenge in the popular tracking-by-detection
framework is how to associate unreliable detection results with existing
tracks. In this paper, we propose to handle unreliable detection by collecting
candidates from outputs of both detection and tracking. The intuition behind
generating redundant candidates is that detection and tracks can complement
each other in different scenarios. Detection results of high confidence prevent
tracking drifts in the long term, and predictions of tracks can handle noisy
detection caused by occlusion. To select the optimal candidates from this
large pool in real time, we present a novel scoring function based on a fully
convolutional neural network that shares most computations over the entire
image. Moreover, we adopt a deeply learned
appearance representation, which is trained on large-scale person
re-identification datasets, to improve the identification ability of our
tracker. Extensive experiments show that our tracker achieves real-time and
state-of-the-art performance on a widely used people tracking benchmark.
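A stripped-down sketch of the candidate-pooling idea follows (Python/NumPy; in the paper the scores come from a shared fully convolutional network, for which precomputed score arrays stand in here, and all function names are ours):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes in x1,y1,x2,y2 format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def collect_and_select(det_boxes, det_scores, track_boxes, track_scores,
                       iou_thresh=0.5):
    """Pool candidates from both the detector and per-track predictions,
    then keep a non-overlapping subset ordered by score."""
    boxes = np.concatenate([det_boxes, track_boxes])     # (N, 4)
    scores = np.concatenate([det_scores, track_scores])  # (N,)
    keep = []
    for i in np.argsort(-scores):                        # best score first
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return boxes[keep], scores[keep]
```

This captures the complementarity the abstract describes: high-confidence detections dominate when available, while track predictions survive the selection when occlusion suppresses the detector's scores.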
Learning Adaptive Discriminative Correlation Filters via Temporal Consistency Preserving Spatial Feature Selection for Robust Visual Tracking
With efficient appearance learning models, the Discriminative Correlation
Filter (DCF) has proven very successful in recent video object tracking
benchmarks and competitions. However, the existing DCF paradigm suffers from
two major issues, i.e., spatial boundary effect and temporal filter
degradation. To mitigate these challenges, we propose a new DCF-based tracking
method. The key innovations of the proposed method include adaptive spatial
feature selection and temporally consistent constraints, with which the new
tracker enables joint spatial-temporal filter learning on a lower-dimensional
discriminative manifold. More specifically, we apply structured spatial
sparsity constraints to the multi-channel filters, so that learning the
spatial filters can be approximated by lasso regularisation. To
encourage temporal consistency, the filter model is restricted to lie around
its historical value and updated locally to preserve the global structure in
the manifold. Finally, a unified optimisation framework is proposed to jointly
select temporal consistency preserving spatial features and learn
discriminative filters with the augmented Lagrangian method. Qualitative and
quantitative evaluations have been conducted on a number of well-known
benchmarking datasets such as OTB2013, OTB50, OTB100, Temple-Colour, UAV123 and
VOT2018. The experimental results demonstrate the superiority of the proposed
method over state-of-the-art approaches.
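Read literally, the described objective suggests a joint formulation along these lines (the notation, including the weights λ₁, λ₂ and the historical filter w_{t-1}, is ours, not the paper's):

```latex
\min_{\mathbf{w}} \;
  \Big\| \mathbf{y} - \sum_{c=1}^{C} \mathbf{x}_{c} \ast \mathbf{w}_{c} \Big\|_{2}^{2}
  \;+\; \lambda_{1} \sum_{c=1}^{C} \big\| \mathbf{w}_{c} \big\|_{1}
  \;+\; \lambda_{2} \, \big\| \mathbf{w} - \mathbf{w}_{t-1} \big\|_{2}^{2}
```

Here the first term is the standard correlation-filter regression over C feature channels, the l1 term induces the lasso-style structured spatial sparsity, and the last term restricts the filter to lie around its historical value; the non-smooth l1 part is what makes an augmented Lagrangian (ADMM-style) solver the natural choice.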
Siamese Object Tracking for Unmanned Aerial Vehicle: A Review and Comprehensive Analysis
Unmanned aerial vehicle (UAV)-based visual object tracking has enabled a wide
range of applications and attracted increasing attention in the field of
intelligent transportation systems because of its versatility and
effectiveness. As an emerging force in the revolutionary trend of deep
learning, Siamese networks shine in UAV-based object tracking with their
promising balance of accuracy, robustness, and speed. Thanks to the development
of embedded processors and the gradual optimization of deep neural networks,
Siamese trackers have received extensive research attention and have been
preliminarily combined with UAVs. However, due to a UAV's limited onboard computational
resources and the complex real-world circumstances, aerial tracking with
Siamese networks still faces severe obstacles in many aspects. To further
explore the deployment of Siamese networks in UAV-based tracking, this work
presents a comprehensive review of leading-edge Siamese trackers, along with an
exhaustive UAV-specific analysis based on the evaluation using a typical UAV
onboard processor. Onboard tests are then conducted to validate the
feasibility and efficacy of representative Siamese trackers in real-world UAV
deployment. Furthermore, to better promote the development of the tracking
community, this work analyzes the limitations of existing Siamese trackers and
conducts additional experiments represented by low-illumination evaluations. In
the end, prospects for the development of Siamese tracking for UAV-based
intelligent transportation systems are deeply discussed. The unified framework
of leading-edge Siamese trackers, i.e., the code library, and the results of their
experimental evaluations are available at
https://github.com/vision4robotics/SiameseTracking4UAV
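For readers unfamiliar with the family, the core of a Siamese tracker reduces to cross-correlating template and search-region features; a minimal SiamFC-style PyTorch sketch (shapes illustrative, not tied to any specific tracker in the review):

```python
import torch
import torch.nn.functional as F

def siamese_response(template_feat, search_feat):
    """The template feature map acts as a correlation kernel slid over the
    search-region feature map; the peak of the response locates the target."""
    # template_feat: (B, C, h, w), search_feat: (B, C, H, W) with H >= h.
    b, c, h, w = template_feat.shape
    # Grouped convolution correlates each sample with its own template.
    search = search_feat.reshape(1, b * c, *search_feat.shape[2:])
    kernel = template_feat.reshape(b * c, 1, h, w)
    resp = F.conv2d(search, kernel, groups=b * c)
    # Sum per-channel correlations into one response map per sample.
    return resp.reshape(b, c, *resp.shape[2:]).sum(dim=1, keepdim=True)

# Example: the peak of each (1, 9, 9) response map indicates the target shift.
resp = siamese_response(torch.randn(2, 64, 6, 6), torch.randn(2, 64, 14, 14))
```

Because both branches share one backbone and the correlation itself is cheap, this structure is what gives Siamese trackers the accuracy-speed balance the review highlights for embedded UAV processors.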
RGBT Tracking via Progressive Fusion Transformer with Dynamically Guided Learning
Existing Transformer-based RGBT tracking methods either use cross-attention
to fuse the two modalities, or use self-attention and cross-attention to model
both modality-specific and modality-sharing information. However, the
significant appearance gap between modalities limits the feature representation
ability of certain modalities during the fusion process. To address this
problem, we propose a novel Progressive Fusion Transformer called ProFormer,
which progressively integrates single-modality information into the multimodal
representation for robust RGBT tracking. In particular, ProFormer first uses a
self-attention module to collaboratively extract the multimodal representation,
and then uses two cross-attention modules to interact it with the features of
the dual modalities respectively. In this way, the modality-specific
information can well be activated in the multimodal representation. Finally, a
feed-forward network is used to fuse two interacted multimodal representations
for the further enhancement of the final multimodal representation. In
addition, existing learning methods of RGBT trackers either fuse multimodal
features into one for final classification, or exploit the relationship between
unimodal branches and fused branch through a competitive learning strategy.
However, they either ignore the learning of single-modality branches or result
in one branch failing to be well optimized. To solve these problems, we propose
a dynamically guided learning algorithm that adaptively uses well-performing
branches to guide the learning of other branches, for enhancing the
representation ability of each branch. Extensive experiments demonstrate that
our proposed ProFormer sets a new state-of-the-art performance on RGBT210,
RGBT234, LasHeR, and VTUAV datasets.
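A hedged PyTorch sketch of how the described progressive fusion could be wired follows; the layer layout is our guess from the abstract, not the authors' code:

```python
import torch
import torch.nn as nn

class ProgressiveFusionBlock(nn.Module):
    """Hypothetical sketch: self-attention builds a joint multimodal
    representation, two cross-attention modules let it re-attend to the RGB
    and thermal features, and a feed-forward network fuses the two
    interacted results."""

    def __init__(self, dim, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_rgb = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_tir = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(2 * dim, dim), nn.GELU(),
                                 nn.Linear(dim, dim))

    def forward(self, rgb_tokens, tir_tokens):
        # Joint multimodal representation from the concatenated modalities.
        joint = torch.cat([rgb_tokens, tir_tokens], dim=1)
        joint, _ = self.self_attn(joint, joint, joint)
        # Interact the joint representation with each modality separately,
        # so modality-specific cues get re-activated in it.
        from_rgb, _ = self.cross_rgb(joint, rgb_tokens, rgb_tokens)
        from_tir, _ = self.cross_tir(joint, tir_tokens, tir_tokens)
        # Fuse the two interacted representations for the final output.
        return self.ffn(torch.cat([from_rgb, from_tir], dim=-1))
```

The two separate cross-attention paths are what make the fusion "progressive": each modality contributes to the shared representation on its own terms before the feed-forward network combines the results, rather than being merged in a single attention step.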