RXFOOD: Plug-in RGB-X Fusion for Object of Interest Detection

Guo, Qing; Li, Jinlong; Lin, Yuewei; Ma, Jin; Yu, Hongkai; Zhang, Tianyun

Authors: Qing Guo, Jinlong Li, Yuewei Lin, Jin Ma, Hongkai Yu, Tianyun Zhang
Publication date: 21 June 2023
Abstract

The emergence of different sensors (Near-Infrared, Depth, etc.) is a remedy for the limited application scenarios of traditional RGB cameras. RGB-X tasks, which rely on RGB input together with another type of data input to solve specific problems, have become a popular research topic in multimedia. A crucial question in two-branch RGB-X deep neural networks is how to fuse information across modalities. Given the large amount of information inside RGB-X networks, previous works typically apply naive fusion (e.g., average or max fusion) or focus only on feature fusion at the same scale(s). In this paper, we propose a novel method called RXFOOD that fuses features across different scales within the same modality branch and across different modality branches simultaneously, in a unified attention mechanism. An Energy Exchange Module is designed for the interaction of each feature map's energy matrix, which reflects the inter-relationship of different positions and different channels inside a feature map. The RXFOOD method can be easily incorporated into any dual-branch encoder-decoder network as a plug-in module, and helps the original backbone network better focus on important positions and channels for object of interest detection. Experimental results on RGB-NIR salient object detection, RGB-D salient object detection, and RGB-Frequency image manipulation detection demonstrate the effectiveness of the proposed RXFOOD.

Comment: 10 pages
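To make the notion of an energy matrix more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not define the energy matrix, so the sketch assumes the Gram-matrix form common in self-attention: for a feature map flattened to C channels by N positions, the channel energy is C x C and the position energy is N x N. The `exchange` function is a hypothetical stand-in (a plain element-wise mean) for the Energy Exchange Module, whose actual design is not given here; all function names are illustrative.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def channel_energy(feat):
    """feat is C x N (channels x flattened positions); returns C x C affinities."""
    return matmul(feat, transpose(feat))

def position_energy(feat):
    """feat is C x N; returns N x N affinities between spatial positions."""
    return matmul(transpose(feat), feat)

def softmax_rows(m):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def exchange(e_rgb, e_x):
    """ASSUMPTION: placeholder exchange, the element-wise mean of two
    same-shaped energy matrices. The paper's module is not specified here."""
    return [[(a + b) / 2 for a, b in zip(ra, rb)] for ra, rb in zip(e_rgb, e_x)]

def reweight_channels(feat, energy):
    """Channel attention: softmax(energy) (C x C) times feat (C x N)."""
    return matmul(softmax_rows(energy), feat)

# Toy feature maps with C=2 channels and N=3 positions (made-up values).
rgb = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
nir = [[0.5, 0.5, 0.0], [1.0, 0.0, 1.0]]

shared = exchange(channel_energy(rgb), channel_energy(nir))
rgb_out = reweight_channels(rgb, shared)  # same C x N shape as rgb
```

In this toy setup both branches share the same channel count, so their channel energies have matching shapes and can be mixed directly. Fusing across scales, as the abstract describes, would additionally require aligning energies of different sizes, which this sketch does not attempt.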

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2306.12621

Last time updated on 26/06/2023