Learning Target-oriented Dual Attention for Robust RGB-T Tracking
RGB-Thermal object tracking attempts to locate a target object using
complementary visual and thermal infrared data. Existing RGB-T trackers fuse
the two modalities through robust feature representation learning or adaptive
modality weighting. However, how to integrate dual attention mechanisms into
visual tracking remains an unstudied subject. In this paper, we
propose two visual attention mechanisms for robust RGB-T object tracking.
Specifically, the local attention is implemented by exploiting the common
visual attention of RGB and thermal data to train deep classifiers. We also
introduce a global attention mechanism, a multi-modal target-driven attention
estimation network that provides the classifier with global proposals alongside
local proposals extracted from the previous tracking result. Extensive
experiments on two RGB-T benchmark datasets validate the effectiveness of the
proposed algorithm.
Comment: Accepted by IEEE ICIP 201
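The idea of exploiting a common visual attention shared by the RGB and thermal streams can be illustrated with a minimal sketch. This is not the paper's actual formulation; the channel-pooling and softmax choices below are illustrative assumptions for how a shared spatial attention map might reweight both modalities' feature maps.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a flat array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def common_attention_fusion(feat_rgb, feat_t):
    """Toy sketch of common-attention fusion (illustrative, not the
    paper's exact method). feat_rgb, feat_t: (C, H, W) feature maps.
    Returns both maps reweighted by a shared spatial attention map."""
    # channel-pooled activation map per modality
    act_rgb = feat_rgb.mean(axis=0)            # (H, W)
    act_t = feat_t.mean(axis=0)                # (H, W)
    # common attention: softmax over spatial locations of the joint map
    attn = softmax((act_rgb + act_t).ravel()).reshape(act_rgb.shape)
    return feat_rgb * attn, feat_t * attn, attn
```

The shared map highlights locations that are salient in either modality, so the reweighted features emphasize regions where the two streams agree on the target.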
Hy-Tracker: A Novel Framework for Enhancing Efficiency and Accuracy of Object Tracking in Hyperspectral Videos
Hyperspectral object tracking has recently emerged as a topic of great
interest in the remote sensing community. The hyperspectral image, with its
many bands, provides a rich source of material information about an object that
can be effectively used for object tracking. While most hyperspectral trackers
are based on detection-based techniques, none has yet attempted to employ
YOLO for detecting and tracking objects. This is due to the presence of
multiple spectral bands, the scarcity of annotated hyperspectral videos, and
YOLO's limitations in handling occlusion and distinguishing objects
in cluttered backgrounds. Therefore, in this paper, we propose a novel
framework called Hy-Tracker, which aims to bridge the gap between hyperspectral
data and state-of-the-art object detection methods to leverage the strengths of
YOLOv7 for object tracking in hyperspectral videos. Hy-Tracker not only
introduces YOLOv7 but also innovatively incorporates a refined tracking module
on top of YOLOv7. The tracker refines the initial detections produced by
YOLOv7, leading to improved object-tracking performance. Furthermore, we
incorporate a Kalman filter into the tracker, which addresses the challenges
posed by scale variation and occlusion. Experimental results on
hyperspectral benchmark datasets demonstrate the effectiveness of Hy-Tracker
in accurately tracking objects across frames.
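How a Kalman filter can refine per-frame detections and coast through occlusions can be sketched as follows. This is a generic constant-velocity filter on a bounding-box center, not Hy-Tracker's actual implementation; the state layout, noise values, and missed-detection handling are assumptions for illustration.

```python
import numpy as np

def make_cv_kalman(dt=1.0, q=1e-2, r=1.0):
    """Constant-velocity Kalman filter matrices for a 2-D point
    (e.g. a bounding-box center). State = [cx, cy, vx, vy]."""
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1,  0],
                  [0, 0, 0,  1]], float)   # state transition
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], float)    # only position is observed
    Q = q * np.eye(4)                      # process noise
    R = r * np.eye(2)                      # measurement noise
    return F, H, Q, R

def kalman_step(x, P, z, F, H, Q, R):
    """One predict+update cycle. z=None means the detector missed the
    target (e.g. occlusion), so the filter coasts on the motion model."""
    x = F @ x
    P = F @ P @ F.T + Q
    if z is not None:
        y = z - H @ x                       # innovation
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
        x = x + K @ y
        P = (np.eye(4) - K @ H) @ P
    return x, P
```

During an occlusion the predict step alone propagates the target along its estimated velocity, which is what lets the tracker re-associate the detection once the object reappears.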
Visual Prompt Multi-Modal Tracking
Visible-modal object tracking gives rise to a series of downstream
multi-modal tracking branches. To inherit the powerful representations of
the foundation model, a natural approach for multi-modal tracking is full
fine-tuning of the RGB-based parameters. Although effective, this manner is
suboptimal due to the scarcity of downstream data and poor transferability.
In this paper, inspired by the recent success of prompt learning in
language models, we develop Visual Prompt multi-modal Tracking (ViPT), which
learns the modal-relevant prompts to adapt the frozen pre-trained foundation
model to various downstream multimodal tracking tasks. ViPT finds a better way
to stimulate the knowledge of the RGB-based model that is pre-trained at scale,
meanwhile only introducing a few trainable parameters (less than 1% of model
parameters). ViPT outperforms the full fine-tuning paradigm on multiple
downstream tracking tasks including RGB+Depth, RGB+Thermal, and RGB+Event
tracking. Extensive experiments show the potential of visual prompt learning
for multi-modal tracking, and ViPT can achieve state-of-the-art performance
while satisfying parameter efficiency. Code and models are available at
https://github.com/jiawen-zhu/ViPT.
Comment: Accepted by CVPR202
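The parameter-efficiency claim (under 1% trainable parameters) can be made concrete with a toy sketch: freeze the large pretrained weights and train only a small set of prompt vectors. The class, its dimensions, and the toy forward pass are illustrative assumptions, not ViPT's actual architecture.

```python
import numpy as np

class PromptedBackbone:
    """Toy sketch of prompt-style adaptation: the large 'foundation'
    weights are frozen and only a small prompt matrix is trainable.
    Names and shapes here are illustrative, not ViPT's actual ones."""

    def __init__(self, d_model=256, n_layers=12, n_prompts=4, seed=0):
        rng = np.random.default_rng(seed)
        # frozen pretrained weights: one matrix per layer
        self.frozen = [rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                       for _ in range(n_layers)]
        # small trainable modal prompts prepended to the token sequence
        self.prompts = rng.standard_normal((n_prompts, d_model)) * 0.02

    def trainable_fraction(self):
        """Share of parameters that would receive gradients."""
        n_frozen = sum(w.size for w in self.frozen)
        n_trainable = self.prompts.size
        return n_trainable / (n_trainable + n_frozen)

    def forward(self, tokens):
        """Prepend prompts, then run the frozen stack (toy linear+ReLU)."""
        h = np.concatenate([self.prompts, tokens], axis=0)
        for w in self.frozen:
            h = np.maximum(h @ w, 0.0)
        return h
```

Because only `prompts` is updated, the downstream task adapts the frozen model through the prompt tokens alone, keeping the trainable share far below 1% of the total parameters in this toy configuration.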