DroTrack: High-speed Drone-based Object Tracking Under Uncertainty
We present DroTrack, a high-speed visual single-object tracking framework for
drone-captured video sequences. Most of the existing object tracking methods
are designed to tackle well-known challenges, such as occlusion and cluttered
backgrounds. The complex motion of drones, i.e., multiple degrees of freedom in
three-dimensional space, causes high uncertainty. The uncertainty problem leads
to inaccurate location predictions and fuzziness in scale estimations. DroTrack
solves such issues by discovering the dependency between object representation
and motion geometry. We implement an effective object segmentation based on
Fuzzy C-Means (FCM). We incorporate spatial information into the membership
function to cluster the most discriminative segments. We then enhance the
object segmentation by using a pre-trained Convolutional Neural Network (CNN)
model. DroTrack also leverages the geometrical angular motion to estimate a
reliable object scale. We discuss the experimental results and performance
evaluation using two datasets of 51,462 drone-captured frames. The combination
of the FCM segmentation and the angular scaling increased DroTrack precision by
up to and decreased the centre location error by pixels on average.
DroTrack outperforms all the high-speed trackers and achieves comparable
results to the deep-learning trackers. DroTrack offers high frame
rates of up to 1,000 frames per second (fps) with better location precision
than a set of state-of-the-art real-time trackers.
Comment: 10 pages, 12 figures, FUZZ-IEEE 202
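The core clustering step the abstract builds on can be sketched in NumPy. This is plain Fuzzy C-Means only; the spatial membership term and the CNN refinement that DroTrack adds are omitted, and the function name and parameters are illustrative, not from the paper:

```python
import numpy as np

def fcm(X, c=2, m=2.0, iters=100, eps=1e-9, seed=0):
    """Plain Fuzzy C-Means: returns soft memberships U (n x c) and
    centroids V (c x d). X is an (n x d) array of feature vectors."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)             # each row is a distribution
    for _ in range(iters):
        Um = U ** m                               # fuzzified memberships
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]  # membership-weighted centroids
        # distances from every sample to every centroid (eps avoids divide-by-zero)
        D = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + eps
        inv = D ** (-2.0 / (m - 1.0))             # standard FCM membership update
        U = inv / inv.sum(axis=1, keepdims=True)
    return U, V
```

A spatially-aware variant, as hinted at in the abstract, would additionally weight each sample's membership by the memberships of its neighbouring pixels before normalizing.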
Rethinking Batch Sample Relationships for Data Representation: A Batch-Graph Transformer based Approach
Exploring sample relationships within each mini-batch has shown great
potential for learning image representations. Existing works generally adopt
the regular Transformer to model visual content relationships, ignoring
cues of semantic/label correlations between samples. Also, they generally adopt
the "full" self-attention mechanism, which is obviously redundant and
sensitive to noisy samples. To overcome these issues, in this paper, we
design a simple yet flexible Batch-Graph Transformer (BGFormer) for mini-batch
sample representations by deeply capturing the relationships of image samples
from both visual and semantic perspectives. BGFormer has three main aspects.
(1) It employs a flexible graph model, termed Batch Graph, to jointly encode the
visual and semantic relationships of samples within each mini-batch. (2) It
explores the neighborhood relationships of samples by borrowing the idea of
sparse graph representation, and thus performs robustly w.r.t. noisy
samples. (3) It devises a novel Transformer architecture that mainly adopts
dual structure-constrained self-attention (SSA), together with graph
normalization, an FFN, etc., to carefully exploit the batch-graph information
for sample token (node) representations. As an application, we apply BGFormer to
the metric learning tasks. Extensive experiments on four popular datasets
demonstrate the effectiveness of the proposed model.
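The sparse, label-aware attention described above can be sketched as follows. This is a minimal NumPy illustration: the k-NN rule for visual edges, the same-label rule for semantic edges, and the single-head masked attention are all simplifying assumptions, not the paper's actual dual structure-constrained self-attention:

```python
import numpy as np

def batch_graph_attention(X, labels=None, k=3):
    """Attention over a mini-batch, restricted to a sparse 'batch graph'.
    Edges: each sample's k nearest neighbours in feature space (visual cue),
    plus all pairs sharing a label (semantic cue). Returns the attended
    features and the attention matrix."""
    n, d = X.shape
    # visual edges: k nearest neighbours per sample (self included, distance 0)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    nn = np.argsort(D, axis=1)[:, :k + 1]
    A = np.zeros((n, n), dtype=bool)
    np.put_along_axis(A, nn, True, axis=1)
    # semantic edges: connect samples with the same label, when labels exist
    if labels is not None:
        A |= labels[:, None] == labels[None, :]
    # scaled dot-product attention, masked to the batch graph
    scores = (X @ X.T) / np.sqrt(d)
    scores = np.where(A, scores, -np.inf)         # non-edges get zero weight
    P = np.exp(scores - scores.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)
    return P @ X, P
```

Restricting attention to the graph is what gives the robustness to noisy samples claimed above: an outlier only mixes with its few neighbours and same-label peers rather than with the entire batch.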