Semantic-Aware Dual Contrastive Learning for Multi-label Image Classification
Extracting image semantics effectively and assigning corresponding labels to
multiple objects or attributes for natural images is challenging due to the
complex scene contents and confusing label dependencies. Recent works have
focused on modeling label relationships with graphs and understanding object
regions using class activation maps (CAM). However, these methods ignore the
complex intra- and inter-category relationships among specific semantic
features, and CAM is prone to generating noisy information. To this end, we
propose a novel semantic-aware dual contrastive learning framework that
incorporates sample-to-sample contrastive learning (SSCL) as well as
prototype-to-sample contrastive learning (PSCL). Specifically, we leverage
semantic-aware representation learning to extract category-related local
discriminative features and construct category prototypes. Then, based on SSCL,
label-level visual representations of the same category are aggregated, and
features belonging to distinct categories are separated.
Meanwhile, we construct a novel PSCL module to narrow the distance between
positive samples and category prototypes and push negative samples away from
the corresponding category prototypes. Finally, the discriminative label-level
features related to the image content are accurately captured by the joint
training of the above three parts. Experiments on five challenging large-scale
public datasets demonstrate that our proposed method is effective and
outperforms the state-of-the-art methods. Code and supplementary materials are
released at https://github.com/yu-gi-oh-leilei/SADCL.
Comment: 8 pages, 6 figures, accepted by the European Conference on Artificial Intelligence (ECAI 2023).
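The prototype-to-sample objective described above can be illustrated with a minimal sketch. The paper defines its own loss formulation; here we assume an InfoNCE-style form over L2-normalized label-level features and category prototypes, with illustrative function names, a single category per sample, and a hypothetical temperature value:

```python
import numpy as np

def pscl_loss(features, prototypes, labels, tau=0.1):
    """Prototype-to-sample contrastive loss (illustrative InfoNCE form).

    features:   (N, D) label-level sample features
    prototypes: (C, D) one prototype per category
    labels:     (N,) category index of each sample
    """
    # L2-normalize so dot products are cosine similarities
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = f @ p.T / tau                       # (N, C) similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # pull each sample toward its own category prototype, implicitly
    # pushing it away from the prototypes of all other categories
    return -log_prob[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))   # 8 label-level features, dim 16
protos = rng.normal(size=(4, 16))  # 4 category prototypes
labels = rng.integers(0, 4, size=8)
loss = pscl_loss(feats, protos, labels)
```

Minimizing this loss narrows the distance between positive samples and their category prototype while pushing negatives away, matching the stated goal of the PSCL module; the paper's SSCL term operates analogously between pairs of samples rather than against prototypes.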
Hybrid attention mechanism of feature fusion for medical image segmentation
Abstract: Traditional convolutional neural networks (CNNs) have achieved good performance in multi-organ segmentation of medical images. However, because CNNs lack the ability to model long-range dependencies and correlations between image pixels, they usually ignore information in the channel dimension. To further improve multi-organ segmentation performance, a hybrid attention mechanism model is proposed. First, a CNN extracts multi-scale feature maps, which are fed into the Channel Attention Enhancement Module (CAEM) to selectively attend to target organs in medical images, while a Transformer encodes tokenized image patches from the CNN feature maps as an input sequence to model long-range dependencies. Second, the decoder upsamples the Transformer output and fuses it with the CAEM features at multiple scales through skip connections. Finally, a Refinement Module (RM) after the decoder improves feature correlations within the same organ and feature discriminability between different organs. The model outperforms comparison methods in Dice coefficient (%) and HD95 on both the Synapse multi-organ segmentation and cardiac diagnosis challenge datasets. The hybrid attention mechanisms exhibit high efficiency and high segmentation accuracy on medical images.
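The abstract describes the CAEM only at a high level. As an illustration of the general idea of channel attention, here is a minimal squeeze-and-excitation-style sketch; the function name, two-layer bottleneck, and reduction ratio are assumptions, not the paper's exact CAEM design:

```python
import numpy as np

def channel_attention(fmap, w1, w2):
    """Reweight channels of a (C, H, W) feature map.

    Squeeze: global average pool per channel.
    Excite:  bottleneck with ReLU, then sigmoid gates in (0, 1).
    """
    squeeze = fmap.mean(axis=(1, 2))             # (C,) channel descriptors
    hidden = np.maximum(0, w1 @ squeeze)         # (C // r,) bottleneck + ReLU
    gates = 1 / (1 + np.exp(-(w2 @ hidden)))     # (C,) per-channel weights
    return fmap * gates[:, None, None]           # rescale each channel

rng = np.random.default_rng(1)
C, H, W, r = 8, 4, 4, 2                          # r: reduction ratio (assumed)
fmap = rng.normal(size=(C, H, W))                # one multi-scale feature map
w1 = rng.normal(size=(C // r, C))                # bottleneck weights (random here;
w2 = rng.normal(size=(C, C // r))                # learned in a real network)
out = channel_attention(fmap, w1, w2)
```

Because the gates lie in (0, 1), each channel is attenuated according to its learned importance, which is how channel attention lets the network emphasize feature maps corresponding to target organs.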