
    Shaping Visual Representations with Attributes for Few-Shot Recognition

    Few-shot recognition aims to recognize novel categories under low-data regimes. Some recent few-shot recognition methods introduce an auxiliary semantic modality, i.e., category attribute information, into representation learning, which enhances feature discrimination and improves recognition performance. Most existing methods consider only the attribute information of the support set while ignoring the query set, resulting in a potential loss of performance. In this letter, we propose a novel attribute-shaped learning (ASL) framework that jointly performs query attribute generation and discriminative visual representation learning for few-shot recognition. Specifically, a visual-attribute predictor (VAP) is constructed to predict the attributes of queries. Leveraging this attribute information, an attribute-visual attention module (AVAM) is designed to adaptively combine attributes and visual representations to learn more discriminative features. Under the guidance of the attribute modality, our method learns enhanced semantic-aware representations for classification. Experiments demonstrate that our method achieves competitive results on the CUB and SUN benchmarks. Our source code is available at: \url{https://github.com/chenhaoxing/ASL}. Comment: accepted by IEEE Signal Process. Lett.
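
    A minimal PyTorch sketch of the two components named in the abstract. The module names mirror the paper's acronyms (VAP, AVAM), but all layer sizes and the sigmoid-gated attention form are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

class VisualAttributePredictor(nn.Module):
    """VAP: regress category attributes from pooled visual features."""
    def __init__(self, feat_dim: int, num_attrs: int):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, num_attrs), nn.Sigmoid(),  # attributes in [0, 1]
        )

    def forward(self, visual_feat: torch.Tensor) -> torch.Tensor:
        return self.head(visual_feat)  # (B, num_attrs)

class AttributeVisualAttention(nn.Module):
    """AVAM: let (predicted or ground-truth) attributes gate visual channels."""
    def __init__(self, feat_dim: int, num_attrs: int):
        super().__init__()
        self.proj = nn.Linear(num_attrs, feat_dim)

    def forward(self, visual_feat, attrs):
        gate = torch.sigmoid(self.proj(attrs))   # channel attention from attributes
        return visual_feat * gate + visual_feat  # residual, attribute-shaped feature
```

    In this sketch, support features would use ground-truth attributes while query features use VAP outputs, after which both feed a standard metric-based few-shot classifier.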

    Sparse Spatial Transformers for Few-Shot Learning

    Learning from limited data is challenging since data scarcity leads to poor generalization of the trained model. The classical globally pooled representation is likely to lose useful local information. Recently, many few-shot learning methods have addressed this challenge by using deep descriptors and learning a pixel-level metric. However, using deep descriptors as feature representations may lose the contextual information of the image, and most of these methods treat each class in the support set independently, which cannot sufficiently exploit discriminative information and task-specific embeddings. In this paper, we propose a novel Transformer-based neural network architecture called Sparse Spatial Transformers (SSFormers), which can find task-relevant features and suppress task-irrelevant ones. Specifically, we first divide each input image into several image patches of different sizes to obtain dense local features. These features retain contextual information while expressing local information. Then, a sparse spatial transformer layer is proposed to find spatial correspondences between the query image and the entire support set, selecting task-relevant image patches and suppressing task-irrelevant ones. Finally, we propose an image patch matching module that calculates the distance between dense local representations to determine which category in the support set the query image belongs to. Extensive experiments on popular few-shot learning benchmarks show that our method achieves state-of-the-art performance.
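
    A condensed sketch of the patch-matching idea, assuming precomputed patch embeddings. The top-k sparsification stands in for the paper's sparse spatial transformer layer; the exact attention form, scoring, and hyper-parameters are assumptions.

```python
import torch
import torch.nn.functional as F

def sparse_patch_match(query_patches, support_patches, k: int = 5):
    """query_patches: (Nq, D); support_patches: (Ns, D) for one class, Ns >= k.
    Returns a scalar similarity between the query image and the class."""
    q = F.normalize(query_patches, dim=-1)
    s = F.normalize(support_patches, dim=-1)
    sim = q @ s.t()                       # (Nq, Ns) patch-to-patch similarity
    topk = sim.topk(k, dim=-1).values     # keep only task-relevant correspondences
    return topk.mean()                    # irrelevant patches are suppressed by omission
```

    The query would then be assigned to the support class with the highest returned score.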

    Segment Anything Model Meets Image Harmonization

    Image harmonization is a crucial technique in image composition that aims to seamlessly match the background by adjusting the foreground of composite images. Current methods adopt either global-level or pixel-level feature matching. Global-level feature matching ignores the proximity prior, treating foreground and background as separate entities, while pixel-level feature matching loses contextual information. It is therefore necessary to use the information from semantic maps that describe different objects to guide harmonization. In this paper, we propose Semantic-guided Region-aware Instance Normalization (SRIN), which utilizes the semantic segmentation maps output by a pre-trained Segment Anything Model (SAM) to guide the visual consistency learning of foreground and background features. Extensive experiments demonstrate the superiority of our method for image harmonization over state-of-the-art methods. Comment: Accepted by ICASSP 2024.
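
    A minimal sketch of region-aware instance normalization guided by a segmentation map, in the spirit of SRIN. Treating each segmented region as a normalization unit and matching foreground statistics to the background's within that region is an assumption about the mechanism, not the paper's exact layer.

```python
import torch

def region_aware_in(feat, fg_mask, regions, eps: float = 1e-5):
    """feat: (C, H, W) features; fg_mask: (H, W) bool foreground mask;
    regions: (H, W) int region ids from a segmentation model (e.g. SAM)."""
    out = feat.clone()
    for rid in regions.unique():
        region = regions == rid
        fg = region & fg_mask
        bg = region & ~fg_mask
        if fg.sum() < 2 or bg.sum() < 2:
            continue  # region lies (almost) wholly inside or outside the foreground
        f = feat[:, fg]                                   # (C, Nf) foreground pixels
        mu_f, std_f = f.mean(1, keepdim=True), f.std(1, keepdim=True) + eps
        b = feat[:, bg]                                   # (C, Nb) background pixels
        mu_b, std_b = b.mean(1, keepdim=True), b.std(1, keepdim=True)
        out[:, fg] = (f - mu_f) / std_f * std_b + mu_b    # match background statistics
    return out
```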

    Joint Projection Learning and Tensor Decomposition Based Incomplete Multi-view Clustering

    Incomplete multi-view clustering (IMVC) has received increasing attention since, in reality, some views of samples are often incomplete. Most existing methods learn similarity subgraphs from the original incomplete multi-view data and seek complete graphs by exploring the incomplete subgraphs of each view for spectral clustering. However, graphs constructed on the original high-dimensional data may be suboptimal due to feature redundancy and noise. Moreover, previous methods generally ignore the graph noise caused by inter-class and intra-class structure variation during the transformation from incomplete graphs to complete graphs. To address these problems, we propose a novel Joint Projection Learning and Tensor Decomposition based method (JPLTD) for IMVC. Specifically, to alleviate the influence of redundant features and noise in high-dimensional data, JPLTD introduces an orthogonal projection matrix that projects the high-dimensional features into a lower-dimensional space for compact feature learning. Meanwhile, in this lower-dimensional space, the similarity graphs corresponding to instances of different views are learned, and JPLTD stacks these graphs into a third-order low-rank tensor to explore the high-order correlations across views. We further consider the graph noise of projected data caused by missing samples and use a tensor-decomposition-based graph filter for robust clustering. JPLTD decomposes the original tensor into an intrinsic tensor, which models the true data similarities, and a sparse tensor. An effective optimization algorithm is adopted to solve the JPLTD model. Comprehensive experiments on several benchmark datasets demonstrate that JPLTD outperforms state-of-the-art methods. The code of JPLTD is available at https://github.com/weilvNJU/JPLTD. Comment: IEEE Transactions on Neural Networks and Learning Systems, 2023.
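
    A toy numpy sketch of the tensor-side idea: stack per-view similarity graphs into a third-order tensor and split it into a low-rank "intrinsic" part and a sparse noise part. A single pass of singular-value and soft thresholding on one unfolding stands in for the paper's full alternating optimization; the thresholds and update scheme are assumptions.

```python
import numpy as np

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def split_intrinsic_sparse(graphs, rank_tau=1.0, sparse_tau=0.05):
    """graphs: list of (n, n) similarity matrices, one per view."""
    T = np.stack(graphs, axis=2)                  # (n, n, V) graph tensor
    n, _, V = T.shape
    M = T.reshape(n, n * V)                       # mode-1 unfolding
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    L = (U * soft_threshold(s, rank_tau)) @ Vt    # low-rank (intrinsic) estimate
    S = soft_threshold(M - L, sparse_tau)         # sparse (noise) residual
    return L.reshape(n, n, V), S.reshape(n, n, V)
```

    The intrinsic tensor's frontal slices would be the denoised per-view graphs handed on to spectral clustering.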

    10-Formyl-2,4,6,8,12-pentanitro-2,4,6,8,10,12-hexaazatetracyclo[5.5.0.0³,¹¹.0⁵,⁹]dodecane

    The title compound, C7H7N11O11 (PNMFIW), is a caged heterocycle substituted with five nitro groups and one formyl group. It is related to the hexaazaisowurtzitane family of high-density high-energy polycyclic cage compounds. Four nitro groups are appended to the four N atoms of the two five-membered rings, while a nitro group and a formyl group are attached to the two N atoms of the six-membered ring.

    Attention-Guided Contrastive Role Representations for Multi-Agent Reinforcement Learning

    Real-world multi-agent tasks usually involve dynamic team composition with the emergence of roles, which should also be a key to efficient cooperation in multi-agent reinforcement learning (MARL). Drawing inspiration from the correlation between roles and agents' behavior patterns, we propose a novel framework of **A**ttention-guided **CO**ntrastive **R**ole representation learning for **M**ARL (**ACORM**) to promote behavior heterogeneity, knowledge transfer, and skillful coordination across agents. First, we introduce mutual information maximization to formalize role representation learning, derive a contrastive learning objective, and concisely approximate the distribution of negative pairs. Second, we leverage an attention mechanism to prompt the global state to attend to learned role representations in value decomposition, implicitly guiding agent coordination in a skillful role space to yield more expressive credit assignment. Experiments on challenging StarCraft II micromanagement and Google Research Football tasks demonstrate the state-of-the-art performance of our method and its advantages over existing approaches. Our code is available at [https://github.com/NJU-RL/ACORM](https://github.com/NJU-RL/ACORM).
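
    A minimal sketch of the contrastive role-learning objective, assuming per-agent role embeddings and role labels that mark which agents act as positives for one another. The InfoNCE form follows the abstract's description; the batching and label construction are assumptions.

```python
import torch
import torch.nn.functional as F

def role_infonce(role_emb, role_ids, temperature: float = 0.1):
    """role_emb: (N, D) per-agent role representations; role_ids: (N,) role labels.
    Agents sharing a role id are positives; all other agents are negatives."""
    z = F.normalize(role_emb, dim=-1)
    logits = z @ z.t() / temperature                  # (N, N) pairwise similarities
    logits.fill_diagonal_(-1e9)                       # exclude self-pairs
    pos = (role_ids[:, None] == role_ids[None, :]).float()
    pos.fill_diagonal_(0.0)
    log_prob = logits - logits.logsumexp(dim=-1, keepdim=True)
    denom = pos.sum(-1).clamp(min=1)                  # agents with >= 1 positive
    return -(pos * log_prob).sum(-1).div(denom).mean()
```

    The learned role embeddings would then be attended to by the global state inside the value-decomposition mixer, yielding role-aware credit assignment.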

    Contrastive Latent Space Reconstruction Learning for Audio-Text Retrieval

    Cross-modal retrieval (CMR) has been extensively applied in various domains, such as multimedia search engines and recommendation systems. Most existing CMR methods focus on image-to-text retrieval, whereas audio-to-text retrieval, a less explored domain, poses a great challenge due to the difficulty of uncovering discriminative features from audio clips and texts. Existing studies are restricted in two ways: 1) most researchers utilize contrastive learning to construct a common subspace where similarities among data can be measured, but they consider only cross-modal transformation, neglecting intra-modal separability; besides, the temperature parameter is not adaptively adjusted along with semantic guidance, which degrades performance; 2) these methods do not take latent representation reconstruction into account, which is essential for semantic alignment. This paper introduces a novel audio-text oriented CMR approach, termed Contrastive Latent Space Reconstruction Learning (CLSR). CLSR improves contrastive representation learning by taking intra-modal separability into account and adopting an adaptive temperature control strategy. Moreover, latent representation reconstruction modules are embedded into the CMR framework, improving modal interaction. Experiments comparing CLSR with state-of-the-art methods on two audio-text datasets validate its superiority. Comment: Accepted by the 35th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2023).
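
    A compact sketch of the three ingredients the abstract names: cross-modal contrast, an intra-modal separability term, and latent reconstruction, with a learnable temperature standing in for the paper's adaptive temperature control. The loss weights and the exact form of semantic guidance are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CLSRLoss(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.log_tau = nn.Parameter(torch.zeros(()))   # learnable temperature
        self.rec_audio = nn.Linear(dim, dim)           # latent reconstruction heads
        self.rec_text = nn.Linear(dim, dim)

    def forward(self, audio, text):
        a, t = F.normalize(audio, dim=-1), F.normalize(text, dim=-1)
        tau = self.log_tau.exp()
        logits = a @ t.t() / tau                       # (B, B) cross-modal similarities
        target = torch.arange(a.size(0), device=a.device)
        cross = F.cross_entropy(logits, target) + F.cross_entropy(logits.t(), target)
        sim_a, sim_t = a @ a.t() / tau, t @ t.t() / tau
        sim_a.fill_diagonal_(-1e9)                     # ignore self-similarity
        sim_t.fill_diagonal_(-1e9)
        intra = sim_a.logsumexp(-1).mean() + sim_t.logsumexp(-1).mean()  # push same-modality items apart
        rec = F.mse_loss(self.rec_audio(text), audio) \
            + F.mse_loss(self.rec_text(audio), text)   # cross-reconstruct latents
        return cross + 0.1 * intra + 0.1 * rec         # weights are illustrative
```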

    10-Formyl-2,4,6,8,12-pentanitro-2,4,6,8,10,12-hexaazatetracyclo[5.5.0.0⁵,⁹.0³,¹¹]dodecane acetone solvate

    The title compound, C7H7N11O11·C3H6O, consisting of one molecule of 10-formyl-2,4,6,8,12-pentanitro-2,4,6,8,10,12-hexaazatetracyclo[5.5.0.0⁵,⁹.0³,¹¹]dodecane (pentanitromonoformylhexaazaisowurtzitane, PNMFIW) and one acetone solvent molecule, is a member of the caged hexaazaisowurtzitane family. PNMFIW has a cage structure constructed from one six-membered and two five-membered rings linked by a C—C bond, thus creating two seven-membered rings. In the PNMFIW molecule, one formyl group is bonded to an N heteroatom of the six-membered ring, and five nitro groups are appended to the other five N heteroatoms of the caged structure. The acetone solvent molecule is arranged beside a five-membered plane of PNMFIW, with O and H atoms close (with respect to the sum of the van der Waals radii) to neighbouring nitro O atoms [O⋯O = 2.957 (3) and 2.852 (3) Å; O⋯H = 2.692 (2), 2.526 (3) and 2.432 (3) Å].