Shaping Visual Representations with Attributes for Few-Shot Recognition
Few-shot recognition aims to recognize novel categories under low-data
regimes. Some recent few-shot recognition methods introduce an auxiliary
semantic modality, i.e., category attribute information, into representation learning,
which enhances the feature discrimination and improves the recognition
performance. Most of these existing methods only consider the attribute
information of the support set while ignoring the query set, resulting in a
potential loss of performance. In this letter, we propose a novel
attribute-shaped learning (ASL) framework, which can jointly perform query
attribute generation and discriminative visual representation learning for
few-shot recognition. Specifically, a visual-attribute predictor (VAP) is
constructed to predict the attributes of queries. By leveraging the attribute
information, an attribute-visual attention module (AVAM) is designed, which can
adaptively utilize attributes and visual representations to learn more
discriminative features. Under the guidance of attribute modality, our method
can learn enhanced semantic-aware representation for classification.
Experiments demonstrate that our method can achieve competitive results on CUB
and SUN benchmarks. Our source code is available at:
\url{https://github.com/chenhaoxing/ASL}.
Comment: Accepted by IEEE Signal Processing Letters.
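The VAP-plus-AVAM pipeline can be illustrated with a small numerical sketch. The NumPy toy below is a hedged stand-in, not the authors' implementation: the dimensions, the linear attribute predictor, and the channel-gating form of the attention are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical sizes: 5 query images, 64-d visual features, 10 attributes.
n_query, d_vis, d_attr = 5, 64, 10
visual = rng.standard_normal((n_query, d_vis))

# Visual-attribute predictor (VAP), sketched as a linear map from visual
# features to attribute scores.
W_vap = rng.standard_normal((d_vis, d_attr)) * 0.1
attr_pred = softmax(visual @ W_vap)          # (n_query, d_attr)

# Attribute-visual attention (AVAM), sketched as predicted attributes
# producing channel-wise gates over the visual representation.
W_att = rng.standard_normal((d_attr, d_vis)) * 0.1
gates = softmax(attr_pred @ W_att)           # (n_query, d_vis), rows sum to 1
shaped = visual * gates * d_vis              # attribute-shaped representation
```

The point of the sketch is only the data flow: queries get attributes predicted first, and those predictions then modulate the visual features used for classification.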
Sparse Spatial Transformers for Few-Shot Learning
Learning from limited data is a challenging task since the scarcity of data
leads to poor generalization of the trained model. The classical global
pooled representation is likely to lose useful local information. Recently,
many few-shot learning methods address this challenge by using deep descriptors
and learning a pixel-level metric. However, using deep descriptors as feature
representations may lose the contextual information of the image. Moreover, most of
these methods deal with each class in the support set independently, which
cannot sufficiently utilize discriminative information and task-specific
embeddings. In this paper, we propose a novel Transformer-based neural network
architecture called Sparse Spatial Transformers (SSFormers), which can find
task-relevant features and suppress task-irrelevant features. Specifically, we
first divide each input image into several image patches of different sizes to
obtain dense local features. These features retain contextual information while
expressing local information. Then, a sparse spatial transformer layer is
proposed to find spatial correspondence between the query image and the entire
support set to select task-relevant image patches and suppress task-irrelevant
image patches. Finally, we propose an image patch matching module for
calculating the distance between dense local representations, thereby
determining which support-set category the query image belongs to. Extensive
experiments on popular few-shot learning benchmarks show that our method
achieves state-of-the-art performance.
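The select-then-match idea can be sketched in a few lines of NumPy. Everything concrete here is an illustrative assumption rather than the paper's exact layer: the patch counts, the median-based sparsity rule, and the best-match class scoring.

```python
import numpy as np

rng = np.random.default_rng(1)

def l2norm(x, axis=-1):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-8)

# Hypothetical 5-way task: each image is a bag of 16 local patch embeddings.
n_way, n_patches, dim = 5, 16, 32
support = l2norm(rng.standard_normal((n_way, n_patches, dim)))
query = l2norm(rng.standard_normal((n_patches, dim)))

# Spatial correspondence: each query patch is compared against ALL support
# patches; patches whose best match is weak are treated as task-irrelevant.
sims = query @ support.reshape(-1, dim).T          # (n_patches, n_way*n_patches)
relevance = sims.max(axis=1)                       # best match per query patch
keep = relevance >= np.median(relevance)           # sparse selection (top half)

# Patch matching: score each class by the best-matching support patch for
# every retained query patch, then average over retained patches.
per_class = (query[keep] @ support.transpose(0, 2, 1)).max(axis=2)  # (n_way, kept)
scores = per_class.mean(axis=1)                    # higher = closer class
pred = int(scores.argmax())
```

The sparse selection step is what distinguishes this from plain dense patch matching: task-irrelevant patches never contribute to the class scores.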
Segment Anything Model Meets Image Harmonization
Image harmonization is a crucial technique in image composition that aims to
seamlessly match the background by adjusting the foreground of composite
images. Current methods adopt either global-level or pixel-level feature
matching. Global-level feature matching ignores the proximity prior, treating
foreground and background as separate entities. On the other hand, pixel-level
feature matching loses contextual information. Therefore, it is necessary to
use the information from semantic maps that describe different objects to guide
harmonization. In this paper, we propose Semantic-guided Region-aware Instance
Normalization (SRIN) that can utilize the semantic segmentation maps output by
a pre-trained Segment Anything Model (SAM) to guide the visual consistency
learning of foreground and background features. Extensive experiments
demonstrate the superiority of our method for image harmonization over
state-of-the-art methods.
Comment: Accepted by ICASSP 202
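A minimal sketch of segmentation-guided, region-aware normalization, assuming a precomputed semantic map. The statistics-alignment form below is a common AdaIN-style stand-in, not necessarily the SRIN layer itself; the image size, mask, and two-region map are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 8x8 single-channel composite with a binary foreground mask
# and a semantic map with two regions (standing in for SAM segments).
H, W = 8, 8
img = rng.standard_normal((H, W)) + 2.0
fg = np.zeros((H, W), bool); fg[2:6, 2:6] = True
seg = np.zeros((H, W), int); seg[:, W // 2:] = 1   # two semantic regions

out = img.copy()
for r in np.unique(seg):
    f = fg & (seg == r)              # foreground pixels in this region
    b = (~fg) & (seg == r)           # background pixels in this region
    if f.any() and b.any():
        mu_f, sd_f = img[f].mean(), img[f].std() + 1e-6
        mu_b, sd_b = img[b].mean(), img[b].std() + 1e-6
        # Region-aware normalization: whiten the foreground statistics,
        # then re-color them with the same region's background statistics.
        out[f] = (img[f] - mu_f) / sd_f * sd_b + mu_b
```

Doing this per semantic region, rather than globally or per pixel, is exactly the middle ground the abstract argues for: statistics are shared only among pixels that belong to the same object.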
Joint Projection Learning and Tensor Decomposition Based Incomplete Multi-view Clustering
Incomplete multi-view clustering (IMVC) has received increasing attention
since, in practice, some views of samples are often incomplete. Most
existing methods learn similarity subgraphs from original incomplete multi-view
data and seek complete graphs by exploring the incomplete subgraphs of each
view for spectral clustering. However, the graphs constructed on the original
high-dimensional data may be suboptimal due to feature redundancy and noise.
Besides, previous methods generally ignore the graph noise caused by the
inter-class and intra-class structure variation during the transformation of
incomplete graphs and complete graphs. To address these problems, we propose a
novel Joint Projection Learning and Tensor Decomposition Based method (JPLTD)
for IMVC. Specifically, to alleviate the influence of redundant features and
noise in high-dimensional data, JPLTD introduces an orthogonal projection
matrix to project the high-dimensional features into a lower-dimensional space
for compact feature learning. Meanwhile, based on the lower-dimensional space,
the similarity graphs corresponding to instances of different views are
learned, and JPLTD stacks these graphs into a third-order low-rank tensor to
explore the high-order correlations across different views. We further consider
the graph noise of projected data caused by missing samples and use a
tensor-decomposition-based graph filter for robust clustering. JPLTD decomposes
the original tensor into an intrinsic tensor and a sparse tensor. The intrinsic
tensor models the true data similarities. An effective optimization algorithm
is adopted to solve the JPLTD model. Comprehensive experiments on several
benchmark datasets demonstrate that JPLTD outperforms the state-of-the-art
methods. The code of JPLTD is available at https://github.com/weilvNJU/JPLTD.
Comment: IEEE Transactions on Neural Networks and Learning Systems, 202
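The intrinsic-plus-sparse tensor split can be sketched with a generic RPCA-style alternation. The thresholds, iteration count, and slice-wise solver below are illustrative choices, not the paper's actual optimization algorithm, and the synthetic "graph tensor" is fabricated for the demo.

```python
import numpy as np

rng = np.random.default_rng(3)

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def svd_shrink(m, tau):
    # Singular-value thresholding: shrink the spectrum toward low rank.
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    return u @ np.diag(soft_threshold(s, tau)) @ vt

# Hypothetical stacked graph tensor: 3 views of 20x20 similarity graphs,
# low-rank structure plus sparse spikes standing in for graph noise.
n, views = 20, 3
base = rng.standard_normal((n, 2)) @ rng.standard_normal((2, n))
T = np.stack([base + 5.0 * (rng.random((n, n)) < 0.02) for _ in range(views)])

# Alternate between an intrinsic low-rank part L (singular-value shrinkage)
# and a sparse noise part S (elementwise soft thresholding), per slice.
L, S = np.zeros_like(T), np.zeros_like(T)
for _ in range(30):
    for v in range(views):
        L[v] = svd_shrink(T[v] - S[v], 0.5)
    S = soft_threshold(T - L, 0.5)

residual = np.linalg.norm(T - L - S)
```

After the split, L plays the role of the intrinsic tensor modeling true data similarities, while S absorbs the sparse corruption introduced by missing samples.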
10-Formyl-2,4,6,8,12-pentanitro-2,4,6,8,10,12-hexaazatetracyclo[5.5.0.03,11.05,9]dodecane
The title compound, C7H7N11O11 (PNMFIW), is a caged heterocycle substituted with five nitro groups and one formyl group. It is related to the hexaazaisowurtzitane family of high-density high-energy polycyclic cage compounds. Four nitro groups are appended to the four N atoms of the two five-membered rings, while a nitro group and a formyl group are attached to the two N atoms of the six-membered ring.
Attention-Guided Contrastive Role Representations for Multi-Agent Reinforcement Learning
Real-world multi-agent tasks usually involve dynamic team composition with
the emergence of roles, which should also be a key to efficient cooperation in
multi-agent reinforcement learning (MARL). Drawing inspiration from the
correlation between roles and agents' behavior patterns, we propose a novel
framework of **A**ttention-guided **CO**ntrastive **R**ole representation
learning for **M**ARL (**ACORM**) to promote behavior heterogeneity, knowledge
transfer, and skillful coordination across agents. First, we introduce mutual
information maximization to formalize role representation learning, derive a
contrastive learning objective, and concisely approximate the distribution of
negative pairs. Second, we leverage an attention mechanism to prompt the global
state to attend to learned role representations in value decomposition,
implicitly guiding agent coordination in a skillful role space to yield more
expressive credit assignment. Experiments on challenging StarCraft II
micromanagement and Google Research Football tasks demonstrate the
state-of-the-art performance of our method and its advantages over existing
approaches. Our code is available at
[https://github.com/NJU-RL/ACORM](https://github.com/NJU-RL/ACORM).
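The contrastive role-representation objective is, at its core, an InfoNCE-style loss over role embeddings. The NumPy toy below is a hedged illustration of that objective only: the embedding dimension and the positive/negative construction (same-role vs. other-role agents) are hypothetical, not ACORM's actual training setup.

```python
import numpy as np

rng = np.random.default_rng(4)

def info_nce(anchor, positive, negatives, temp=0.1):
    """InfoNCE loss for one anchor: pull the positive close and push
    negatives away, using cosine similarity scaled by a temperature."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / temp
    logits -= logits.max()           # numerical stability
    return float(-np.log(np.exp(logits[0]) / np.exp(logits).sum()))

# Hypothetical role embeddings: an agent sharing the anchor's role is the
# positive; agents with other roles serve as negatives.
dim = 16
role_a = rng.standard_normal(dim)
same_role = role_a + 0.1 * rng.standard_normal(dim)      # similar behavior
other_roles = [rng.standard_normal(dim) for _ in range(5)]

loss_aligned = info_nce(role_a, same_role, other_roles)
```

Minimizing this loss is one concrete way to maximize a lower bound on the mutual information between an agent's role and its behavior, which is the formalization the abstract refers to.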
Contrastive Latent Space Reconstruction Learning for Audio-Text Retrieval
Cross-modal retrieval (CMR) has been extensively applied in various domains,
such as multimedia search engines and recommendation systems. Most existing CMR
methods focus on image-to-text retrieval, whereas audio-to-text retrieval, a
less explored domain, poses a great challenge due to the difficulty of
uncovering discriminative features from audio clips and texts. Existing studies
are restricted in the following two ways: 1) Most researchers utilize
contrastive learning to construct a common subspace where similarities among
data can be measured. However, they consider only cross-modal transformation,
neglecting the intra-modal separability. Besides, the temperature parameter is
not adaptively adjusted along with semantic guidance, which degrades the
performance. 2) These methods do not take latent representation reconstruction
into account, which is essential for semantic alignment. This paper introduces
a novel audio-text oriented CMR approach, termed Contrastive Latent Space
Reconstruction Learning (CLSR). CLSR improves contrastive representation
learning by taking intra-modal separability into account and adopting an
adaptive temperature control strategy. Moreover, the latent representation
reconstruction modules are embedded into the CMR framework, which improves
modal interaction. Experiments in comparison with some state-of-the-art methods
on two audio-text datasets have validated the superiority of CLSR.
Comment: Accepted by the 35th IEEE International Conference on Tools with
Artificial Intelligence (ICTAI 2023).
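The ingredients named above (a cross-modal contrastive loss, an adaptive temperature, and latent reconstruction) can be combined in a toy sketch. The similarity-driven temperature rule and the least-squares reconstruction below are illustrative stand-ins, not CLSR's actual modules, and the paired batch is fabricated.

```python
import numpy as np

rng = np.random.default_rng(5)

def l2norm(x):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)

# Hypothetical paired batch: 4 audio clips and their 4 captions embedded in
# a shared 32-d subspace; diagonal pairs are the true matches.
batch, dim = 4, 32
audio = l2norm(rng.standard_normal((batch, dim)))
text = l2norm(audio + 0.2 * rng.standard_normal((batch, dim)))

def contrastive(a, b, temp):
    # Batch contrastive loss: each row of `a` must pick out its matching
    # row of `b` among all rows, with similarities scaled by `temp`.
    logits = (a @ b.T) / temp
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return float(-np.log(np.diag(p)).mean())

# Adaptive temperature (illustrative rule): loosen the temperature when
# matched pairs are already similar, standing in for semantic guidance.
temp = 0.07 * (1.0 + float(np.diag(audio @ text.T).mean()))
loss_cross = contrastive(audio, text, temp)

# Latent reconstruction (illustrative): map text latents back toward audio
# latents by least squares and penalize the reconstruction error.
W, *_ = np.linalg.lstsq(text, audio, rcond=None)
loss_rec = float(np.mean((text @ W - audio) ** 2))
loss = loss_cross + loss_rec
```

The reconstruction term is what distinguishes this from a plain contrastive objective: it forces the common subspace to retain enough information to recover one modality's latent from the other's.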
10-Formyl-2,4,6,8,12-pentanitro-2,4,6,8,10,12-hexaazatetracyclo[5.5.0.05,9.03,11]dodecane acetone solvate
The title compound, C7H7N11O11·C3H6O, consisting of one molecule of 10-formyl-2,4,6,8,12-pentanitro-2,4,6,8,10,12-hexaazatetracyclo[5.5.0.05,9.03,11]dodecane (pentanitromonoformylhexaazaisowurtzitane, PNMFIW) and one acetone solvent molecule, is a member of the caged hexaazaisowurtzitane family. PNMFIW has a cage structure constructed from one six-membered and two five-membered rings, which are linked by a C—C bond, thus creating two seven-membered rings. In the PNMFIW molecule, one formyl group is bonded to the N heteroatom of the six-membered ring, and five nitro groups are appended to the other five N heteroatoms of the caged structure. The acetone solvent molecule is arranged beside a five-membered plane of PNMFIW, with an O atom and an H atom close (with respect to the sum of the van der Waals radii) to the neighbouring nitro O atoms [O⋯O = 2.957 (3) and 2.852 (3) Å; O⋯H = 2.692 (2), 2.526 (3) and 2.432 (3) Å].