RGB-T Tracking Based on Mixed Attention

Authors: Dong Mingtao; Guo Xiqing; Luo Yang; Yu Jin
Publication date: 10/04/2023

RGB-T tracking involves the use of images from both visible and thermal modalities. The primary objective is to adaptively leverage the relatively dominant modality in varying conditions to achieve more robust tracking than single-modality tracking. This paper proposes an RGB-T tracker, referred to as MACFT, that uses a mixed attention mechanism to achieve complementary fusion of the modalities. In the feature extraction stage, we utilize different transformer backbone branches to extract specific and shared information from the different modalities. By performing mixed attention operations in the backbone to enable information interaction and self-enhancement between the template and search images, the tracker constructs a robust feature representation that better captures the high-level semantic features of the target. Then, in the feature fusion stage, modality-adaptive fusion is achieved through a mixed attention-based modality fusion network, which suppresses noise from the low-quality modality while enhancing the information of the dominant modality. Evaluation on multiple public RGB-T datasets demonstrates that our proposed tracker outperforms other RGB-T trackers on general evaluation metrics while also being able to adapt to long-term tracking scenarios.
Comment: 14 pages, 10 figures
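The mixed attention idea in the abstract above (template and search tokens attending to each other and to themselves in one pass) can be illustrated with a minimal sketch. This is not the MACFT implementation: the function names are hypothetical, the learned query/key/value projections of a real transformer are omitted, and the tokens are toy 2-dimensional vectors.

```python
import math


def attention(queries, keys, values):
    """Plain scaled dot-product attention over lists of equal-length vectors."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        outputs.append([
            sum(e / total * v[d] for e, v in zip(exps, values))
            for d in range(len(values[0]))
        ])
    return outputs


def mixed_attention(template_tokens, search_tokens):
    """Hypothetical helper: attend over the concatenated token sequence.

    Every token can attend to tokens of its own image (self-enhancement)
    and to tokens of the other image (information interaction).
    Assumption: no learned projections, so queries = keys = values.
    """
    tokens = template_tokens + search_tokens
    mixed = attention(tokens, tokens, tokens)
    n = len(template_tokens)
    return mixed[:n], mixed[n:]


# Toy data, not taken from any dataset.
template = [[1.0, 0.0], [0.0, 1.0]]
search = [[1.0, 1.0], [0.5, 0.5], [0.0, 0.0]]
template_out, search_out = mixed_attention(template, search)
```

Because the all-zero search token scores every key equally, its output is the plain average of all five tokens, which makes the sketch easy to check by hand.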

arXiv.org e-Print Archive

The seventh visual object tracking VOT2019 challenge results

Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 26/06/2020

The Visual Object Tracking challenge VOT2019 is the seventh annual tracker benchmarking activity organized by the VOT initiative. Results of 81 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years. The evaluation included the standard VOT and other popular methodologies for short-term tracking analysis as well as the standard VOT methodology for long-term tracking analysis. The VOT2019 challenge was composed of five challenges focusing on different tracking domains: (i) the VOT-ST2019 challenge focused on short-term tracking in RGB, (ii) the VOT-RT2019 challenge focused on "real-time" short-term tracking in RGB, and (iii) the VOT-LT2019 challenge focused on long-term tracking, namely coping with target disappearance and reappearance. Two new challenges were introduced: (iv) the VOT-RGBT2019 challenge focused on short-term tracking in RGB and thermal imagery and (v) the VOT-RGBD2019 challenge focused on long-term tracking in RGB and depth imagery. The VOT-ST2019, VOT-RT2019 and VOT-LT2019 datasets were refreshed, while new datasets were introduced for VOT-RGBT2019 and VOT-RGBD2019. The VOT toolkit has been updated to support standard short-term tracking, long-term tracking, and tracking with multi-channel imagery. The performance of the tested trackers typically far exceeds the standard baselines. The source code for most of the trackers is publicly available from the VOT page.
The dataset, the evaluation kit and the results are publicly available at the challenge website.
Authors: Kristan M.; Matas J.; Leonardis A.; Felsberg M.; Pflugfelder R.; Kamarainen J.-K.; Zajc L.C.; Drbohlav O.; Lukezic A.; Berg A.; Eldesokey A.; Kapyla J.; Fernandez G.; Gonzalez-Garcia A.; Memarmoghadam A.; Lu A.; He A.; Varfolomieiev A.; Chan A.; Tripathi A.S.; Smeulders A.; Pedasingu B.S.; Chen B.X.; Zhang B.; Baoyuanwu B.; Li B.; He B.; Yan B.; Bai B.; Li B.; Li B.; Kim B.H.; Ma C.; Fang C.; Qian C.; Chen C.; Li C.; Zhang C.; Tsai C.-Y.; Luo C.; Micheloni C.; Zhang C.; Tao D.; Gupta D.; Song D.; Wang D.; Gavves E.; Yi E.; Khan F.S.; Zhang F.; Wang F.; Zhao F.; De Ath G.; Bhat G.; Chen G.; Wang G.; Li G.; Cevikalp H.; Du H.; Zhao H.; Saribas H.; Jung H.M.; Bai H.; Yu H.; Peng H.; Lu H.; Li H.; Li J.; Li J.; Fu J.; Chen J.; Gao J.; Zhao J.; Tang J.; Li J.; Wu J.; Liu J.; Wang J.; Qi J.; Zhang J.; Tsotsos J.K.; Lee J.H.; Van De Weijer J.; Kittler J.; Ha Lee J.; Zhuang J.; Zhang K.; Wang K.; Dai K.; Chen L.; Liu L.; Guo L.; Zhang L.; Wang L.; Wang L.; Zhang L.; Wang L.; Zhou L.; Zheng L.; Rout L.; Van Gool L.; Bertinetto L.; Danelljan M.; Dunnhofer M.; Ni M.; Kim M.Y.; Tang M.; Yang M.-H.; Paluru N.; Martinel N.; Xu P.; Zhang P.; Zheng P.; Zhang P.; Torr P.H.S.; Wang Q.Z.Q.; Guo Q.; Timofte R.; Gorthi R.K.; Everson R.; Han R.; Zhang R.; You S.; Zhao S.-C.; Zhao S.; Li S.; Li S.; Ge S.; Bai S.; Guan S.; Xing T.; Xu T.; Yang T.; Zhang T.; Vojir T.; Feng W.; Hu W.; Wang W.; Tang W.; Zeng W.; Liu W.; Chen X.; Qiu X.; Bai X.; Wu X.-J.; Yang X.; Chen X.; Li X.; Sun X.; Chen X.; Tian X.; Tang X.; Zhu X.-F.; Huang Y.; Chen Y.; Lian Y.; Gu Y.; Liu Y.; Chen Y.; Zhang Y.; Xu Y.; Wang Y.; Li Y.; Zhou Y.; Dong Y.; Xu Y.; Zhang Y.; Li Y.; Luo Z.W.Z.; Zhang Z.; Feng Z.-H.; He Z.; Song Z.; Chen Z.; Zhang Z.; Wu Z.; Xiong Z.; Huang Z.; Teng Z.; Ni Z.

Archivio istituzionale della ricerca - Università degli Studi di Udine

RGBT Tracking via Progressive Fusion Transformer with Dynamically Guided Learning

Authors: Huang Zhixiang; Li Chenglong; Tang Jin; Wang Xiao; Zhu Yabin
Publication date: 26/03/2023

Existing Transformer-based RGBT tracking methods either use cross-attention to fuse the two modalities, or use self-attention and cross-attention to model both modality-specific and modality-sharing information. However, the significant appearance gap between modalities limits the feature representation ability of certain modalities during the fusion process. To address this problem, we propose a novel Progressive Fusion Transformer called ProFormer, which progressively integrates single-modality information into the multimodal representation for robust RGBT tracking. In particular, ProFormer first uses a self-attention module to collaboratively extract the multimodal representation, and then uses two cross-attention modules to let it interact with the features of each of the two modalities. In this way, the modality-specific information can be well activated in the multimodal representation. Finally, a feed-forward network fuses the two interacted multimodal representations to further enhance the final multimodal representation. In addition, existing learning methods for RGBT trackers either fuse multimodal features into one for final classification, or exploit the relationship between the unimodal branches and the fused branch through a competitive learning strategy. However, they either ignore the learning of the single-modality branches or leave one branch poorly optimized. To solve these problems, we propose a dynamically guided learning algorithm that adaptively uses well-performing branches to guide the learning of the other branches, enhancing the representation ability of each branch. Extensive experiments demonstrate that our proposed ProFormer sets a new state-of-the-art performance on the RGBT210, RGBT234, LasHeR, and VTUAV datasets.
Comment: 13 pages, 9 figures
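The three stages named in the abstract above (self-attention for a multimodal representation, two cross-attention passes against the single-modality features, then a fusion step) can be sketched as follows. This is an illustration under stated assumptions, not ProFormer itself: the joint tokens are assumed to be the element-wise sum of the RGB and thermal tokens, learned projections are omitted, and the feed-forward network is replaced by a fixed average followed by ReLU. All function names are hypothetical.

```python
import math


def attention(queries, keys, values):
    """Plain scaled dot-product attention over lists of equal-length vectors."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        outputs.append([
            sum(e / total * v[d] for e, v in zip(exps, values))
            for d in range(len(values[0]))
        ])
    return outputs


def progressive_fusion(rgb, thermal):
    """Hypothetical three-stage fusion of two aligned token lists."""
    # Assumption: joint tokens are the element-wise sum of both modalities.
    joint = [[a + b for a, b in zip(r, t)] for r, t in zip(rgb, thermal)]
    # Stage 1: self-attention yields the multimodal representation.
    multimodal = attention(joint, joint, joint)
    # Stage 2: the multimodal representation queries each modality in turn.
    with_rgb = attention(multimodal, rgb, rgb)
    with_thermal = attention(multimodal, thermal, thermal)
    # Stage 3: stand-in for the feed-forward network (average, then ReLU).
    return [
        [max(0.0, 0.5 * (a + b)) for a, b in zip(x, y)]
        for x, y in zip(with_rgb, with_thermal)
    ]


# Toy tokens, chosen so the result can be verified by hand.
rgb_tokens = [[1.0, 0.0], [0.0, 1.0]]
thermal_tokens = [[0.0, 1.0], [1.0, 0.0]]
fused = progressive_fusion(rgb_tokens, thermal_tokens)
```

With these symmetric toy tokens every attention pass weights its keys equally, so each fused token comes out as [0.5, 0.5].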

arXiv.org e-Print Archive

Generative-based Fusion Mechanism for Multi-Modal Tracking

Authors: Kittler Josef; Tang Zhangyong; Wu Xiao-Jun; Xu Tianyang; Zhu Xuefeng
Publication date: 30/11/2023

Generative models (GMs) have received increasing research interest for their remarkable capacity to achieve comprehensive understanding. However, their potential application in the domain of multi-modal tracking has remained relatively unexplored. In this context, we seek to uncover the potential of generative techniques for addressing a critical challenge in multi-modal tracking: information fusion. In this paper, we delve into two prominent GM techniques, namely Conditional Generative Adversarial Networks (CGANs) and Diffusion Models (DMs). Unlike the standard fusion process, where the features from each modality are fed directly into the fusion block, we condition these multi-modal features with random noise in the GM framework, effectively transforming the original training samples into harder instances. This design excels at extracting discriminative clues from the features, enhancing the ultimate tracking performance. To quantitatively gauge the effectiveness of our approach, we conduct extensive experiments across two multi-modal tracking tasks, three baseline methods, and three challenging benchmarks. The experimental results demonstrate that the proposed generative-based fusion mechanism achieves state-of-the-art performance, setting new records on LasHeR and RGBD1K.
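The abstract above does not say exactly how the features are conditioned on noise. As one possible illustration only, the sketch below applies the standard forward noising step used in diffusion models, x_t = sqrt(alpha_bar) * x + sqrt(1 - alpha_bar) * eps with eps drawn from a standard normal, to a concatenated feature vector before it would enter a fusion block. The function name, the toy feature values, and the choice of alpha_bar are assumptions made for this example.

```python
import math
import random


def noisy_features(features, alpha_bar, rng):
    """Diffusion-style forward noising of a feature vector.

    alpha_bar = 1.0 keeps the features unchanged; smaller values mix in
    more Gaussian noise, giving a harder input for the fusion block.
    """
    keep = math.sqrt(alpha_bar)
    spread = math.sqrt(1.0 - alpha_bar)
    return [keep * x + spread * rng.gauss(0.0, 1.0) for x in features]


# Toy features, not taken from any tracker.
rgb_feat = [0.2, -0.5, 1.0]
thermal_feat = [0.7, 0.1, -0.3]
fusion_input = rgb_feat + thermal_feat  # simple concatenation

clean = noisy_features(fusion_input, 1.0, random.Random(0))
harder = noisy_features(fusion_input, 0.5, random.Random(0))
```

Passing a seeded `random.Random` instance keeps the noise reproducible, which is convenient when comparing a clean and a noised version of the same sample.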

arXiv.org e-Print Archive

Exploring multi-modal spatial-temporal contexts for high-performance RGB-T tracking

Authors: Han J.; Jiao Q.; Zhang Q.; Zhang T.
Publication venue: Institute of Electrical and Electronics Engineers
Publication date: 19/07/2024

In RGB-T tracking, there are rich spatial relationships between the target and the background within multi-modal data, as well as strong consistency of these spatial relationships across successive frames, both of which are crucial for boosting tracking performance. However, most existing RGB-T trackers overlook such multi-modal spatial relationships and temporal consistencies within RGB-T videos, which hinders robust tracking and practical application in complex scenarios. In this paper, we propose a novel Multi-modal Spatial-Temporal Context (MMSTC) network for RGB-T tracking, which employs a Transformer architecture to construct reliable multi-modal spatial context information and to propagate temporal context information effectively. Specifically, a Multi-modal Transformer Encoder (MMTE) is designed to encode reliable multi-modal spatial contexts and to fuse multi-modal features. Furthermore, a Quality-aware Transformer Decoder (QATD) is proposed to effectively propagate the tracking cues from historical frames to the current frame, which facilitates the object searching process. Moreover, the proposed MMSTC network can be easily extended to various tracking frameworks. New state-of-the-art results on five prevalent RGB-T tracking benchmarks demonstrate the superiority of our proposed trackers over existing ones.

White Rose Research Online