77 research outputs found
Visual Attention Consistency under Image Transforms for Multi-Label Image Classification
Human visual perception shows good consistency for many multi-label image classification tasks under certain spatial transforms, such as scaling, rotation, flipping and translation. This has motivated the data augmentation strategy widely used in CNN classifier training -- transformed images are included for training by assuming the same class labels as their original images. In this paper, we further propose the assumption of perceptual consistency of visual attention regions for classification under such transforms, i.e., the attention region for a classification follows the same transform if the input image is spatially transformed. While the attention regions of CNN classifiers can be derived as an attention heatmap in middle layers of the network, we find that their consistency under many transforms is not preserved. To address this problem, we propose a two-branch network with an original image and its transformed image as inputs and introduce a new attention consistency loss that measures the attention heatmap consistency between the two branches. This new loss is then combined with the multi-label image classification loss for network training. Experiments on three datasets verify the superiority of the proposed network by achieving new state-of-the-art classification performance.
RXFOOD: Plug-in RGB-X Fusion for Object of Interest Detection
The emergence of different sensors (Near-Infrared, Depth, etc.) is a remedy
for the limited application scenarios of traditional RGB cameras. The RGB-X
tasks, which rely on RGB input and another type of data input to resolve
specific problems, have become a popular research topic in multimedia. A
crucial part in two-branch RGB-X deep neural networks is how to fuse
information across modalities. Given the tremendous information inside RGB-X
networks, previous works typically apply naive fusion (e.g., average or max
fusion) or only focus on the feature fusion at the same scale(s). In this
paper, we propose a novel method called RXFOOD for the fusion of features
across different scales within the same modality branch and from different
modality branches simultaneously in a unified attention mechanism. An Energy
Exchange Module is designed for the interaction of each feature map's energy
matrix, which reflects the inter-relationship of different positions and
different channels inside a feature map. The RXFOOD method can be easily
incorporated into any dual-branch encoder-decoder network as a plug-in module,
and help the original backbone network better focus on important positions and
channels for object of interest detection. Experimental results on RGB-NIR
salient object detection, RGB-D salient object detection, and RGB-Frequency
image manipulation detection demonstrate the clear effectiveness of the
proposed RXFOOD.
Comment: 10 page
ActionPrompt: Action-Guided 3D Human Pose Estimation With Text and Pose Prompting
Recent 2D-to-3D human pose estimation (HPE) utilizes temporal consistency
across sequences to alleviate the depth ambiguity problem but ignores the
action-related prior knowledge hidden in the pose sequence. In this paper, we propose
a plug-and-play module named Action Prompt Module (APM) that effectively mines
different kinds of action clues for 3D HPE. The highlight is that the mining
scheme of APM can be widely adapted to different frameworks and bring
consistent benefits. Specifically, we first present a novel Action-related Text
Prompt module (ATP) that directly embeds action labels and transfers the rich
language information in the label to the pose sequence. Besides, we further
introduce an Action-specific Pose Prompt module (APP) to mine the position-aware
pose pattern of each action, and exploit the correlation between the mined
patterns and input pose sequence for further pose refinement. Experiments show
that APM can improve the performance of most video-based 2D-to-3D HPE
frameworks by a large margin.
Comment: 6 pages, 4 figures, 2023ICM
Inclusive production in B decays and the Enhancement due to charged technipions
The new contributions to the charmless B decay
from the unit-charged technipions and are estimated.
The technipions can provide a large enhancement to the inclusive branching
ratio: for
and when the effect of QCD gluon anomaly is also
taken into account. The new physics effect is essential to interpret the CLEO
data.
Comment: Latex file, 7 pages with two EPS figure