Learning Target-oriented Dual Attention for Robust RGB-T Tracking
RGB-Thermal object tracking attempts to locate a target object using
complementary visual and thermal infrared data. Existing RGB-T trackers fuse
the two modalities through robust feature representation learning or adaptive
modality weighting. However, how to integrate dual attention mechanisms into
visual tracking remains an unstudied subject. In this paper, we
propose two visual attention mechanisms for robust RGB-T object tracking.
Specifically, the local attention is implemented by exploiting the common
visual attention of RGB and thermal data to train deep classifiers. We also
introduce a global attention mechanism, a multi-modal target-driven attention
estimation network that provides the classifier with global proposals alongside
local proposals extracted from the previous tracking result. Extensive
experiments on two RGB-T benchmark datasets validate the effectiveness of the
proposed algorithm.
Comment: Accepted by IEEE ICIP 201
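The idea of exploiting a common visual attention shared by the RGB and thermal streams can be illustrated with a minimal sketch. This is not the paper's actual formulation; the channel-pooling and softmax choices below are illustrative assumptions for how a shared spatial attention map might reweight both modalities' feature maps.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a flat array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def common_attention_fusion(feat_rgb, feat_t):
    """Toy sketch of common-attention fusion (illustrative, not the
    paper's exact method). feat_rgb, feat_t: (C, H, W) feature maps.
    Returns both maps reweighted by a shared spatial attention map."""
    # channel-pooled activation map per modality
    act_rgb = feat_rgb.mean(axis=0)            # (H, W)
    act_t = feat_t.mean(axis=0)                # (H, W)
    # common attention: softmax over spatial locations of the joint map
    attn = softmax((act_rgb + act_t).ravel()).reshape(act_rgb.shape)
    return feat_rgb * attn, feat_t * attn, attn
```

The shared map highlights locations that are salient in either modality, so the reweighted features emphasize regions where the two streams agree on the target.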
Hy-Tracker: A Novel Framework for Enhancing Efficiency and Accuracy of Object Tracking in Hyperspectral Videos
Hyperspectral object tracking has recently emerged as a topic of great
interest in the remote sensing community. The hyperspectral image, with its
many bands, provides a rich source of material information about an object that
can be effectively used for object tracking. While most hyperspectral trackers
are based on detection-based techniques, none has yet attempted to employ
YOLO for detecting and tracking objects. This is due to the presence of
multiple spectral bands, the scarcity of annotated hyperspectral videos, and
YOLO's limitations in handling occlusion and distinguishing objects
in cluttered backgrounds. Therefore, in this paper, we propose a novel
framework called Hy-Tracker, which aims to bridge the gap between hyperspectral
data and state-of-the-art object detection methods to leverage the strengths of
YOLOv7 for object tracking in hyperspectral videos. Hy-Tracker not only
introduces YOLOv7 but also innovatively incorporates a refined tracking module
on top of YOLOv7. The tracker refines the initial detections produced by
YOLOv7, leading to improved object-tracking performance. Furthermore, we
incorporate a Kalman filter into the tracker, which addresses the challenges
posed by scale variation and occlusion. Experimental results on
hyperspectral benchmark datasets demonstrate the effectiveness of Hy-Tracker
in accurately tracking objects across frames.
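How a Kalman filter can refine per-frame detections and coast through occlusions can be sketched as follows. This is a generic constant-velocity filter on a bounding-box center, not Hy-Tracker's actual implementation; the state layout, noise values, and missed-detection handling are assumptions for illustration.

```python
import numpy as np

def make_cv_kalman(dt=1.0, q=1e-2, r=1.0):
    """Constant-velocity Kalman filter matrices for a 2-D point
    (e.g. a bounding-box center). State = [cx, cy, vx, vy]."""
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1,  0],
                  [0, 0, 0,  1]], float)   # state transition
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], float)    # only position is observed
    Q = q * np.eye(4)                      # process noise
    R = r * np.eye(2)                      # measurement noise
    return F, H, Q, R

def kalman_step(x, P, z, F, H, Q, R):
    """One predict+update cycle. z=None means the detector missed the
    target (e.g. occlusion), so the filter coasts on the motion model."""
    x = F @ x
    P = F @ P @ F.T + Q
    if z is not None:
        y = z - H @ x                       # innovation
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
        x = x + K @ y
        P = (np.eye(4) - K @ H) @ P
    return x, P
```

During an occlusion the predict step alone propagates the target along its estimated velocity, which is what lets the tracker re-associate the detection once the object reappears.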
Visual Prompt Multi-Modal Tracking
Visible-modal object tracking gives rise to a series of downstream
multi-modal tracking branches. To inherit the powerful representations of
the foundation model, a natural approach for multi-modal tracking is full
fine-tuning of the RGB-based parameters. Although effective, this manner is
suboptimal due to the scarcity of downstream data and poor transferability.
In this paper, inspired by the recent success of prompt learning in
language models, we develop Visual Prompt multi-modal Tracking (ViPT), which
learns the modal-relevant prompts to adapt the frozen pre-trained foundation
model to various downstream multimodal tracking tasks. ViPT finds a better way
to stimulate the knowledge of the RGB-based model that is pre-trained at scale,
meanwhile only introducing a few trainable parameters (less than 1% of model
parameters). ViPT outperforms the full fine-tuning paradigm on multiple
downstream tracking tasks including RGB+Depth, RGB+Thermal, and RGB+Event
tracking. Extensive experiments show the potential of visual prompt learning
for multi-modal tracking, and ViPT can achieve state-of-the-art performance
while satisfying parameter efficiency. Code and models are available at
https://github.com/jiawen-zhu/ViPT.
Comment: Accepted by CVPR202
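The parameter-efficiency claim (under 1% trainable parameters) can be made concrete with a toy sketch: freeze the large pretrained weights and train only a small set of prompt vectors. The class, its dimensions, and the toy forward pass are illustrative assumptions, not ViPT's actual architecture.

```python
import numpy as np

class PromptedBackbone:
    """Toy sketch of prompt-style adaptation: the large 'foundation'
    weights are frozen and only a small prompt matrix is trainable.
    Names and shapes here are illustrative, not ViPT's actual ones."""

    def __init__(self, d_model=256, n_layers=12, n_prompts=4, seed=0):
        rng = np.random.default_rng(seed)
        # frozen pretrained weights: one matrix per layer
        self.frozen = [rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                       for _ in range(n_layers)]
        # small trainable modal prompts prepended to the token sequence
        self.prompts = rng.standard_normal((n_prompts, d_model)) * 0.02

    def trainable_fraction(self):
        """Share of parameters that would receive gradients."""
        n_frozen = sum(w.size for w in self.frozen)
        n_trainable = self.prompts.size
        return n_trainable / (n_trainable + n_frozen)

    def forward(self, tokens):
        """Prepend prompts, then run the frozen stack (toy linear+ReLU)."""
        h = np.concatenate([self.prompts, tokens], axis=0)
        for w in self.frozen:
            h = np.maximum(h @ w, 0.0)
        return h
```

Because only `prompts` is updated, the downstream task adapts the frozen model through the prompt tokens alone, keeping the trainable share far below 1% of the total parameters in this toy configuration.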