77 research outputs found

Visual Attention Consistency under Image Transforms for Multi-Label Image Classification

Author: Fan Xiaochuan
Guo Hao
Wang Song
Yu Hongkai
Zheng Kang
Publication venue: ScholarWorks @ UTRGV
Publication date: 01/06/2020

Human visual perception shows good consistency for many multi-label image classification tasks under certain spatial transforms, such as scaling, rotation, flipping and translation. This has motivated the data augmentation strategy widely used in CNN classifier training -- transformed images are included for training by assuming the same class labels as their original images. In this paper, we further propose the assumption of perceptual consistency of visual attention regions for classification under such transforms, i.e., the attention region for a classification follows the same transform if the input image is spatially transformed. While the attention regions of CNN classifiers can be derived as an attention heatmap in the middle layers of the network, we find that their consistency under many transforms is not preserved. To address this problem, we propose a two-branch network that takes an original image and its transformed image as inputs, and we introduce a new attention consistency loss that measures the attention heatmap consistency between the two branches. This new loss is then combined with the multi-label image classification loss for network training. Experiments on three datasets verify the superiority of the proposed network, which achieves new state-of-the-art classification performance.
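The attention consistency idea in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names `attention_consistency_loss` and `hflip`, the use of a mean squared difference, and the toy heatmap values are all assumptions made for illustration. The sketch only shows the stated principle that the heatmap of a transformed image should match the transformed heatmap of the original image.

```python
import numpy as np


def hflip(heatmap):
    """Horizontal flip along the last (width) axis; stands in for any spatial transform."""
    return heatmap[..., ::-1]


def attention_consistency_loss(heatmap_orig, heatmap_trans, transform):
    """Hypothetical loss: mean squared difference between the transformed
    original-branch heatmap and the transformed-branch heatmap.

    The squared-error form is an assumption of this sketch, not taken
    from the abstract.
    """
    expected = transform(heatmap_orig)
    return float(np.mean((expected - heatmap_trans) ** 2))


# Toy 2x3 attention heatmap for the original image (made-up values).
h = np.array([[0.0, 0.2, 0.9],
              [0.1, 0.3, 0.8]])

# A branch whose attention follows the flip gives zero loss ...
loss_consistent = attention_consistency_loss(h, hflip(h), hflip)

# ... while a branch whose attention ignores the flip is penalised.
loss_inconsistent = attention_consistency_loss(h, h.copy(), hflip)

print(loss_consistent, loss_inconsistent)
```

In a training setup of the kind the abstract describes, a term like this would be added to the classification loss; the weighting between the two terms is not specified here.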

Crossref

Scholarworks@UTRGV Univ. of Texas Rio Grande Valley

RXFOOD: Plug-in RGB-X Fusion for Object of Interest Detection

Author: Guo Qing
Li Jinlong
Lin Yuewei
Ma Jin
Yu Hongkai
Zhang Tianyun
Publication date: 21/06/2023

The emergence of different sensors (Near-Infrared, Depth, etc.) is a remedy for the limited application scenarios of traditional RGB cameras. RGB-X tasks, which rely on RGB input and another type of data input to resolve specific problems, have become a popular research topic in multimedia. A crucial part of two-branch RGB-X deep neural networks is how to fuse information across modalities. Given the tremendous information inside RGB-X networks, previous works typically apply naive fusion (e.g., average or max fusion) or only focus on feature fusion at the same scale(s). In this paper, we propose a novel method called RXFOOD for the fusion of features across different scales within the same modality branch and from different modality branches simultaneously in a unified attention mechanism. An Energy Exchange Module is designed for the interaction of each feature map's energy matrix, which reflects the inter-relationship of different positions and different channels inside a feature map. The RXFOOD method can be easily incorporated into any dual-branch encoder-decoder network as a plug-in module, and helps the original backbone network better focus on important positions and channels for object of interest detection. Experimental results on RGB-NIR salient object detection, RGB-D salient object detection, and RGB-Frequency image manipulation detection demonstrate the clear effectiveness of the proposed RXFOOD. Comment: 10 pages

arXiv.org e-Print Archive

ActionPrompt: Action-Guided 3D Human Pose Estimation With Text and Pose Prompting

Author: Dai Wenrui
Guo Min
Li Han
Shi Bowen
Sun Yu
Wan Botao
Xiong Hongkai
Zheng Hongwei
Publication date: 18/07/2023

Recent 2D-to-3D human pose estimation (HPE) utilizes temporal consistency across sequences to alleviate the depth ambiguity problem but ignores the action-related prior knowledge hidden in the pose sequence. In this paper, we propose a plug-and-play module named Action Prompt Module (APM) that effectively mines different kinds of action clues for 3D HPE. The highlight is that the mining scheme of APM can be widely adapted to different frameworks and brings consistent benefits. Specifically, we first present a novel Action-related Text Prompt module (ATP) that directly embeds action labels and transfers the rich language information in the label to the pose sequence. In addition, we introduce an Action-specific Pose Prompt module (APP) to mine the position-aware pose pattern of each action, and exploit the correlation between the mined patterns and the input pose sequence for further pose refinement. Experiments show that APM can improve the performance of most video-based 2D-to-3D HPE frameworks by a large margin. Comment: 6 pages, 4 figures, 2023ICM

arXiv.org e-Print Archive

Inclusive $\eta'$ production in B decays and the Enhancement due to charged technipions

Author: Ahmady M R
Anderson S (CLEO Collaboration)
Browder T E (CLEO Collaboration)
Buras A J
Gongru Lu
Hongkai Guo
Kagan A L
Lane K
Linxia Lü
Yuan F
Zhenjun Xiao
Publication venue: 'IOP Publishing'
Publication date: 01/01/1999

The new contributions to the charmless B decay $B \to X_{s}\eta^{\prime}$ from the unit-charged technipions $P^{\pm}$ and $P^{\pm}_{8}$ are estimated. The technipions can provide a large enhancement to the inclusive branching ratio: $Br(B \to X_{s}\eta^{\prime}) \sim 7\times 10^{-4}$ for $m_{p1}=100$ GeV and $m_{p8}=250 \sim 350$ GeV when the effect of QCD gluon anomaly is also taken into account. The new physics effect is essential to interpret the CLEO data. Comment: Latex file, 7 pages with two EPS figures

arXiv.org e-Print Archive

Crossref

CERN Document Server