Dense Feature Aggregation and Pruning for RGBT Tracking
How to fuse information from different modalities effectively is a core
factor in boosting the performance of RGBT tracking. This paper presents a
novel deep fusion algorithm based on the representations from an end-to-end
trained convolutional neural network. To exploit the complementarity of features
from all layers, we propose a recursive strategy that densely aggregates these
features, yielding robust representations of target objects in each modality.
Across modalities, we propose to prune the densely aggregated features of all
modalities in a collaborative way. Specifically, we employ global average
pooling and weighted random selection to perform channel scoring and selection,
which can remove redundant and noisy features and achieve a more robust feature
representation. Experimental results on two RGBT tracking benchmark datasets
suggest that our tracker achieves clear state-of-the-art performance against
other RGB and RGBT tracking methods.
Comment: arXiv admin note: text overlap with arXiv:1811.0985
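To make the channel scoring and selection step concrete, here is a minimal sketch in plain Python, not the authors' implementation. Several details are assumptions: feature maps are nested lists shaped [channels][height][width] with non-negative (post-ReLU) activations; "collaborative" pruning is interpreted as scoring the concatenated channels of both modalities in one shared pool; and weighted random selection is done with Efraimidis-Spirakis sampling keys. All function names (`global_average_pool`, `weighted_random_select`, `collaborative_prune`) and the `keep_ratio` parameter are hypothetical.

```python
import random
from typing import List, Sequence

FeatureMap = List[List[List[float]]]  # [channels][height][width]


def global_average_pool(features: FeatureMap) -> List[float]:
    """Score each channel by its mean activation over all spatial positions."""
    scores = []
    for channel in features:
        total = sum(sum(row) for row in channel)
        count = sum(len(row) for row in channel)
        scores.append(total / count)
    return scores


def weighted_random_select(scores: Sequence[float], k: int,
                           rng: random.Random, eps: float = 1e-8) -> List[int]:
    """Sample k distinct channel indices with probability favouring high scores.

    Uses Efraimidis-Spirakis keys u ** (1 / w); the k largest keys form a
    weighted sample without replacement. Assumes non-negative scores; the eps
    floor gives zero-score channels a negligible (but valid) weight.
    """
    if not 0 <= k <= len(scores):
        raise ValueError("k must be between 0 and the number of channels")
    keys = []
    for idx, score in enumerate(scores):
        weight = max(score, eps)
        keys.append((rng.random() ** (1.0 / weight), idx))
    keys.sort(reverse=True)
    return sorted(idx for _, idx in keys[:k])


def collaborative_prune(rgb: FeatureMap, thermal: FeatureMap,
                        keep_ratio: float, seed: int = 0) -> FeatureMap:
    """Hypothetical collaborative pruning: score RGB and thermal channels
    jointly, then keep a weighted random subset of the concatenated channels."""
    stacked = rgb + thermal  # concatenate along the channel axis
    scores = global_average_pool(stacked)
    k = max(1, round(keep_ratio * len(stacked)))
    kept = weighted_random_select(scores, k, random.Random(seed))
    return [stacked[i] for i in kept]


if __name__ == "__main__":
    rgb = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
    thermal = [[[5.0, 5.0], [5.0, 5.0]], [[0.1, 0.1], [0.1, 0.1]]]
    pruned = collaborative_prune(rgb, thermal, keep_ratio=0.5)
    print(len(pruned), "channels kept")
```

Because selection is random rather than a hard top-k cut, low-scoring channels still have a small chance of surviving. A fixed seed makes the sketch reproducible.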
RGB-T Tracking Based on Mixed Attention
RGB-T tracking uses images from both the visible and thermal modalities.
The primary objective is to adaptively leverage whichever modality is relatively
dominant under varying conditions, achieving more robust tracking than
single-modality tracking. This paper proposes MACFT, an RGB-T tracker that uses
a mixed attention mechanism to achieve complementary fusion of modalities.
In the feature extraction stage, we use different transformer backbone branches
to extract modality-specific and shared information from the different
modalities. Mixed attention operations in the backbone enable information
interaction and self-enhancement between the template and search images,
producing a robust feature representation that better captures the high-level
semantic features of the target. Then, in the feature fusion stage, a mixed
attention-based modality fusion network achieves modality-adaptive fusion,
suppressing noise from the low-quality modality while enhancing the information
of the dominant modality. Evaluation on multiple public RGB-T datasets
demonstrates that our proposed tracker outperforms other RGB-T trackers on
general evaluation metrics while also adapting to long-term tracking scenarios.
Comment: 14 pages, 10 figures
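The two stages described above can be illustrated with a small, dependency-free sketch under stated assumptions. It is not MACFT's architecture. `mixed_attention` runs single-head scaled dot-product attention over the concatenated template and search tokens. Each token therefore attends both to its own image (self-enhancement) and to the other image (interaction). Learned query/key/value projections, multiple heads, and any asymmetric masking the paper may use are omitted. `modality_adaptive_fusion` swaps in a deliberately simpler technique than the paper's attention-based fusion network: a softmax over two scalar modality-quality scores, which a real network would predict. All names here are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_attention(template: List[Vector],
                    search: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
    """Joint self-attention over template + search tokens (Q = K = V = tokens).

    Returns the updated template tokens and the updated search tokens.
    """
    tokens = template + search
    d = len(tokens[0])
    out = []
    for q in tokens:
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(logits)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out[:len(template)], out[len(template):]


def modality_adaptive_fusion(rgb: List[Vector], thermal: List[Vector],
                             rgb_quality: float,
                             thermal_quality: float) -> List[Vector]:
    """Blend per-token features, giving more weight to the higher-quality modality.

    The quality scores are assumed inputs here; they stand in for whatever
    the paper's fusion network learns.
    """
    w_rgb, w_thermal = softmax([rgb_quality, thermal_quality])
    return [[w_rgb * a + w_thermal * b for a, b in zip(r, t)]
            for r, t in zip(rgb, thermal)]


if __name__ == "__main__":
    template_out, search_out = mixed_attention([[1.0, 0.0]],
                                               [[1.0, 0.0], [0.0, 1.0]])
    fused = modality_adaptive_fusion(search_out, search_out[::-1],
                                     rgb_quality=2.0, thermal_quality=0.0)
    print(template_out, search_out, fused)
```

Concatenating the template and search tokens before attention is what lets one operation provide both self-enhancement and cross-image interaction. The softmax in the fusion step guarantees that the modality weights are positive and sum to one, so a low-quality modality is down-weighted rather than discarded.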