77 research outputs found

    Visual Attention Consistency under Image Transforms for Multi-Label Image Classification

    Get PDF
    Human visual perception shows good consistency for many multi-label image classification tasks under certain spatial transforms, such as scaling, rotation, flipping and translation. This has motivated the data augmentation strategy widely used in CNN classifier training -- transformed images are included for training by assuming the same class labels as their original images. In this paper, we further propose the assumption of perceptual consistency of visual attention regions for classification under such transforms, i.e., the attention region for a classification follows the same transform if the input image is spatially transformed. While the attention regions of CNN classifiers can be derived as an attention heatmap in middle layers of the network, we find that their consistency under many transforms are not preserved. To address this problem, we propose a two-branch network with an original image and its transformed image as inputs and introduce a new attention consistency loss that measures the attention heatmap consistency between two branches. This new loss is then combined with multi-label image classification loss for network training. Experiments on three datasets verify the superiority of the proposed network by achieving new state-of-the-art classification performance

    RXFOOD: Plug-in RGB-X Fusion for Object of Interest Detection

    Full text link
    The emergence of different sensors (Near-Infrared, Depth, etc.) is a remedy for the limited application scenarios of traditional RGB camera. The RGB-X tasks, which rely on RGB input and another type of data input to resolve specific problems, have become a popular research topic in multimedia. A crucial part in two-branch RGB-X deep neural networks is how to fuse information across modalities. Given the tremendous information inside RGB-X networks, previous works typically apply naive fusion (e.g., average or max fusion) or only focus on the feature fusion at the same scale(s). While in this paper, we propose a novel method called RXFOOD for the fusion of features across different scales within the same modality branch and from different modality branches simultaneously in a unified attention mechanism. An Energy Exchange Module is designed for the interaction of each feature map's energy matrix, who reflects the inter-relationship of different positions and different channels inside a feature map. The RXFOOD method can be easily incorporated to any dual-branch encoder-decoder network as a plug-in module, and help the original backbone network better focus on important positions and channels for object of interest detection. Experimental results on RGB-NIR salient object detection, RGB-D salient object detection, and RGBFrequency image manipulation detection demonstrate the clear effectiveness of the proposed RXFOOD.Comment: 10 page

    ActionPrompt: Action-Guided 3D Human Pose Estimation With Text and Pose Prompting

    Full text link
    Recent 2D-to-3D human pose estimation (HPE) utilizes temporal consistency across sequences to alleviate the depth ambiguity problem but ignore the action related prior knowledge hidden in the pose sequence. In this paper, we propose a plug-and-play module named Action Prompt Module (APM) that effectively mines different kinds of action clues for 3D HPE. The highlight is that, the mining scheme of APM can be widely adapted to different frameworks and bring consistent benefits. Specifically, we first present a novel Action-related Text Prompt module (ATP) that directly embeds action labels and transfers the rich language information in the label to the pose sequence. Besides, we further introduce Action-specific Pose Prompt module (APP) to mine the position-aware pose pattern of each action, and exploit the correlation between the mined patterns and input pose sequence for further pose refinement. Experiments show that APM can improve the performance of most video-based 2D-to-3D HPE frameworks by a large margin.Comment: 6 pages, 4 figures, 2023ICM

    Inclusive η′\eta' production in B decays and the Enhancement due to charged technipions

    Get PDF
    The new contributions to the charmless B decay B→Xsη′B \to X_{s}\eta^{\prime} from the unit-charged technipions P±P^{\pm} and P8±P^{\pm}_{8} are estimated. The technipions can provide a large enhancement to the inclusive branching ratio: Br(B→Xsη′)∼7×10−4Br(B \to X_{s}\eta^{\prime}) \sim 7\times 10^{-4} for mp1=100GeVm_{p1}=100GeV and mp8=250∼350GeVm_{p8}=250 \sim 350 GeV when the effect of QCD gluon anomaly is also taken into account. The new physics effect is essential to interpret the CLEO data.Comment: Latex file, 7 pages with two EPS figure
    • …
    corecore