Shaping Visual Representations with Attributes for Few-Shot Recognition
Few-shot recognition aims to recognize novel categories under low-data
regimes. Some recent few-shot recognition methods introduce an auxiliary
semantic modality, i.e., category attribute information, into representation learning,
which enhances the feature discrimination and improves the recognition
performance. Most of these existing methods only consider the attribute
information of the support set while ignoring the query set, resulting in a
potential loss of performance. In this letter, we propose a novel
attribute-shaped learning (ASL) framework, which can jointly perform query
attribute generation and discriminative visual representation learning for
few-shot recognition. Specifically, a visual-attribute predictor (VAP) is
constructed to predict the attributes of queries. By leveraging the attribute
information, an attribute-visual attention module (AVAM) is designed, which can
adaptively utilize attributes and visual representations to learn more
discriminative features. Under the guidance of attribute modality, our method
can learn enhanced semantic-aware representation for classification.
Experiments demonstrate that our method can achieve competitive results on CUB
and SUN benchmarks. Our source code is available at:
\url{https://github.com/chenhaoxing/ASL}.
Comment: Accepted by IEEE Signal Processing Letters.
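The VAP-plus-AVAM pipeline can be illustrated with a small numerical sketch. The NumPy toy below is a hedged stand-in, not the authors' implementation: the dimensions, the linear attribute predictor, and the channel-gating form of the attention are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical sizes: 5 query images, 64-d visual features, 10 attributes.
n_query, d_vis, d_attr = 5, 64, 10
visual = rng.standard_normal((n_query, d_vis))

# Visual-attribute predictor (VAP), sketched as a linear map from visual
# features to attribute scores.
W_vap = rng.standard_normal((d_vis, d_attr)) * 0.1
attr_pred = softmax(visual @ W_vap)          # (n_query, d_attr)

# Attribute-visual attention (AVAM), sketched as predicted attributes
# producing channel-wise gates over the visual representation.
W_att = rng.standard_normal((d_attr, d_vis)) * 0.1
gates = softmax(attr_pred @ W_att)           # (n_query, d_vis), rows sum to 1
shaped = visual * gates * d_vis              # attribute-shaped representation
```

The point of the sketch is only the data flow: queries get attributes predicted first, and those predictions then modulate the visual features used for classification.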
Sparse Spatial Transformers for Few-Shot Learning
Learning from limited data is a challenging task since the scarcity of data
leads to poor generalization of the trained model. The classical global
pooled representation is likely to lose useful local information. Recently,
many few-shot learning methods address this challenge by using deep descriptors
and learning a pixel-level metric. However, using deep descriptors as feature
representations may lose the contextual information of the image. Moreover, most of
these methods deal with each class in the support set independently, which
cannot sufficiently utilize discriminative information and task-specific
embeddings. In this paper, we propose a novel Transformer-based neural network
architecture called Sparse Spatial Transformers (SSFormers), which can find
task-relevant features and suppress task-irrelevant features. Specifically, we
first divide each input image into several image patches of different sizes to
obtain dense local features. These features retain contextual information while
expressing local information. Then, a sparse spatial transformer layer is
proposed to find spatial correspondence between the query image and the entire
support set to select task-relevant image patches and suppress task-irrelevant
image patches. Finally, we propose an image patch matching module for
calculating the distance between dense local representations, thereby
determining which support-set category the query image belongs to. Extensive
experiments on popular few-shot learning benchmarks show that our method
achieves state-of-the-art performance.
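The select-then-match idea can be sketched in a few lines of NumPy. Everything concrete here is an illustrative assumption rather than the paper's exact layer: the patch counts, the median-based sparsity rule, and the best-match class scoring.

```python
import numpy as np

rng = np.random.default_rng(1)

def l2norm(x, axis=-1):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-8)

# Hypothetical 5-way task: each image is a bag of 16 local patch embeddings.
n_way, n_patches, dim = 5, 16, 32
support = l2norm(rng.standard_normal((n_way, n_patches, dim)))
query = l2norm(rng.standard_normal((n_patches, dim)))

# Spatial correspondence: each query patch is compared against ALL support
# patches; patches whose best match is weak are treated as task-irrelevant.
sims = query @ support.reshape(-1, dim).T          # (n_patches, n_way*n_patches)
relevance = sims.max(axis=1)                       # best match per query patch
keep = relevance >= np.median(relevance)           # sparse selection (top half)

# Patch matching: score each class by the best-matching support patch for
# every retained query patch, then average over retained patches.
per_class = (query[keep] @ support.transpose(0, 2, 1)).max(axis=2)  # (n_way, kept)
scores = per_class.mean(axis=1)                    # higher = closer class
pred = int(scores.argmax())
```

The sparse selection step is what distinguishes this from plain dense patch matching: task-irrelevant patches never contribute to the class scores.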
Segment Anything Model Meets Image Harmonization
Image harmonization is a crucial technique in image composition that aims to
seamlessly match the background by adjusting the foreground of composite
images. Current methods adopt either global-level or pixel-level feature
matching. Global-level feature matching ignores the proximity prior, treating
foreground and background as separate entities. On the other hand, pixel-level
feature matching loses contextual information. Therefore, it is necessary to
use the information from semantic maps that describe different objects to guide
harmonization. In this paper, we propose Semantic-guided Region-aware Instance
Normalization (SRIN) that can utilize the semantic segmentation maps output by
a pre-trained Segment Anything Model (SAM) to guide the visual consistency
learning of foreground and background features. Extensive experiments
demonstrate the superiority of our method for image harmonization over
state-of-the-art methods.
Comment: Accepted by ICASSP 202
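A minimal sketch of segmentation-guided, region-aware normalization, assuming a precomputed semantic map. The statistics-alignment form below is a common AdaIN-style stand-in, not necessarily the SRIN layer itself; the image size, mask, and two-region map are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 8x8 single-channel composite with a binary foreground mask
# and a semantic map with two regions (standing in for SAM segments).
H, W = 8, 8
img = rng.standard_normal((H, W)) + 2.0
fg = np.zeros((H, W), bool); fg[2:6, 2:6] = True
seg = np.zeros((H, W), int); seg[:, W // 2:] = 1   # two semantic regions

out = img.copy()
for r in np.unique(seg):
    f = fg & (seg == r)              # foreground pixels in this region
    b = (~fg) & (seg == r)           # background pixels in this region
    if f.any() and b.any():
        mu_f, sd_f = img[f].mean(), img[f].std() + 1e-6
        mu_b, sd_b = img[b].mean(), img[b].std() + 1e-6
        # Region-aware normalization: whiten the foreground statistics,
        # then re-color them with the same region's background statistics.
        out[f] = (img[f] - mu_f) / sd_f * sd_b + mu_b
```

Doing this per semantic region, rather than globally or per pixel, is exactly the middle ground the abstract argues for: statistics are shared only among pixels that belong to the same object.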
Joint Projection Learning and Tensor Decomposition Based Incomplete Multi-view Clustering
Incomplete multi-view clustering (IMVC) has received increasing attention
since, in practice, some views of samples are often incomplete. Most
existing methods learn similarity subgraphs from original incomplete multi-view
data and seek complete graphs by exploring the incomplete subgraphs of each
view for spectral clustering. However, the graphs constructed on the original
high-dimensional data may be suboptimal due to feature redundancy and noise.
Besides, previous methods generally ignore the graph noise caused by the
inter-class and intra-class structure variation during the transformation of
incomplete graphs and complete graphs. To address these problems, we propose a
novel Joint Projection Learning and Tensor Decomposition Based method (JPLTD)
for IMVC. Specifically, to alleviate the influence of redundant features and
noise in high-dimensional data, JPLTD introduces an orthogonal projection
matrix to project the high-dimensional features into a lower-dimensional space
for compact feature learning. Meanwhile, based on the lower-dimensional space,
the similarity graphs corresponding to instances of different views are
learned, and JPLTD stacks these graphs into a third-order low-rank tensor to
explore the high-order correlations across different views. We further consider
the graph noise of projected data caused by missing samples and use a
tensor-decomposition-based graph filter for robust clustering. JPLTD decomposes
the original tensor into an intrinsic tensor and a sparse tensor. The intrinsic
tensor models the true data similarities. An effective optimization algorithm
is adopted to solve the JPLTD model. Comprehensive experiments on several
benchmark datasets demonstrate that JPLTD outperforms the state-of-the-art
methods. The code of JPLTD is available at https://github.com/weilvNJU/JPLTD.
Comment: IEEE Transactions on Neural Networks and Learning Systems, 202
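The intrinsic-plus-sparse tensor split can be sketched with a generic RPCA-style alternation. The thresholds, iteration count, and slice-wise solver below are illustrative choices, not the paper's actual optimization algorithm, and the synthetic "graph tensor" is fabricated for the demo.

```python
import numpy as np

rng = np.random.default_rng(3)

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def svd_shrink(m, tau):
    # Singular-value thresholding: shrink the spectrum toward low rank.
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    return u @ np.diag(soft_threshold(s, tau)) @ vt

# Hypothetical stacked graph tensor: 3 views of 20x20 similarity graphs,
# low-rank structure plus sparse spikes standing in for graph noise.
n, views = 20, 3
base = rng.standard_normal((n, 2)) @ rng.standard_normal((2, n))
T = np.stack([base + 5.0 * (rng.random((n, n)) < 0.02) for _ in range(views)])

# Alternate between an intrinsic low-rank part L (singular-value shrinkage)
# and a sparse noise part S (elementwise soft thresholding), per slice.
L, S = np.zeros_like(T), np.zeros_like(T)
for _ in range(30):
    for v in range(views):
        L[v] = svd_shrink(T[v] - S[v], 0.5)
    S = soft_threshold(T - L, 0.5)

residual = np.linalg.norm(T - L - S)
```

After the split, L plays the role of the intrinsic tensor modeling true data similarities, while S absorbs the sparse corruption introduced by missing samples.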
10-Formyl-2,4,6,8,12-pentanitro-2,4,6,8,10,12-hexaazatetracyclo[5.5.0.03,11.05,9]dodecane
The title compound, C7H7N11O11 (PNMFIW), is a caged heterocycle substituted with five nitro groups and one formyl group. It is related to the hexaazaisowurtzitane family of high-density high-energy polycyclic cage compounds. Four nitro groups are appended to the four N atoms of the two five-membered rings, while a nitro group and a formyl group are attached to the two N atoms of the six-membered ring.
Attention-Guided Contrastive Role Representations for Multi-Agent Reinforcement Learning
Real-world multi-agent tasks usually involve dynamic team composition with
the emergence of roles, which should also be a key to efficient cooperation in
multi-agent reinforcement learning (MARL). Drawing inspiration from the
correlation between roles and agents' behavior patterns, we propose a novel
framework of **A**ttention-guided **CO**ntrastive **R**ole representation
learning for **M**ARL (**ACORM**) to promote behavior heterogeneity, knowledge
transfer, and skillful coordination across agents. First, we introduce mutual
information maximization to formalize role representation learning, derive a
contrastive learning objective, and concisely approximate the distribution of
negative pairs. Second, we leverage an attention mechanism to prompt the global
state to attend to learned role representations in value decomposition,
implicitly guiding agent coordination in a skillful role space to yield more
expressive credit assignment. Experiments on challenging StarCraft II
micromanagement and Google Research Football tasks demonstrate the
state-of-the-art performance of our method and its advantages over existing
approaches. Our code is available at
[https://github.com/NJU-RL/ACORM](https://github.com/NJU-RL/ACORM).
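The contrastive role-representation objective is, at its core, an InfoNCE-style loss over role embeddings. The NumPy toy below is a hedged illustration of that objective only: the embedding dimension and the positive/negative construction (same-role vs. other-role agents) are hypothetical, not ACORM's actual training setup.

```python
import numpy as np

rng = np.random.default_rng(4)

def info_nce(anchor, positive, negatives, temp=0.1):
    """InfoNCE loss for one anchor: pull the positive close and push
    negatives away, using cosine similarity scaled by a temperature."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / temp
    logits -= logits.max()           # numerical stability
    return float(-np.log(np.exp(logits[0]) / np.exp(logits).sum()))

# Hypothetical role embeddings: an agent sharing the anchor's role is the
# positive; agents with other roles serve as negatives.
dim = 16
role_a = rng.standard_normal(dim)
same_role = role_a + 0.1 * rng.standard_normal(dim)      # similar behavior
other_roles = [rng.standard_normal(dim) for _ in range(5)]

loss_aligned = info_nce(role_a, same_role, other_roles)
```

Minimizing this loss is one concrete way to maximize a lower bound on the mutual information between an agent's role and its behavior, which is the formalization the abstract refers to.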
Contrastive Latent Space Reconstruction Learning for Audio-Text Retrieval
Cross-modal retrieval (CMR) has been extensively applied in various domains,
such as multimedia search engines and recommendation systems. Most existing CMR
methods focus on image-to-text retrieval, whereas audio-to-text retrieval, a
less explored domain, poses a great challenge due to the difficulty of
uncovering discriminative features from audio clips and texts. Existing studies
are restricted in the following two ways: 1) Most researchers utilize
contrastive learning to construct a common subspace where similarities among
data can be measured. However, they consider only cross-modal transformation,
neglecting the intra-modal separability. Besides, the temperature parameter is
not adaptively adjusted along with semantic guidance, which degrades the
performance. 2) These methods do not take latent representation reconstruction
into account, which is essential for semantic alignment. This paper introduces
a novel audio-text oriented CMR approach, termed Contrastive Latent Space
Reconstruction Learning (CLSR). CLSR improves contrastive representation
learning by taking intra-modal separability into account and adopting an
adaptive temperature control strategy. Moreover, the latent representation
reconstruction modules are embedded into the CMR framework, which improves
modal interaction. Experiments in comparison with some state-of-the-art methods
on two audio-text datasets have validated the superiority of CLSR.
Comment: Accepted by the 35th IEEE International Conference on Tools with
Artificial Intelligence (ICTAI 2023).
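The ingredients named above (a cross-modal contrastive loss, an adaptive temperature, and latent reconstruction) can be combined in a toy sketch. The similarity-driven temperature rule and the least-squares reconstruction below are illustrative stand-ins, not CLSR's actual modules, and the paired batch is fabricated.

```python
import numpy as np

rng = np.random.default_rng(5)

def l2norm(x):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)

# Hypothetical paired batch: 4 audio clips and their 4 captions embedded in
# a shared 32-d subspace; diagonal pairs are the true matches.
batch, dim = 4, 32
audio = l2norm(rng.standard_normal((batch, dim)))
text = l2norm(audio + 0.2 * rng.standard_normal((batch, dim)))

def contrastive(a, b, temp):
    # Batch contrastive loss: each row of `a` must pick out its matching
    # row of `b` among all rows, with similarities scaled by `temp`.
    logits = (a @ b.T) / temp
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return float(-np.log(np.diag(p)).mean())

# Adaptive temperature (illustrative rule): loosen the temperature when
# matched pairs are already similar, standing in for semantic guidance.
temp = 0.07 * (1.0 + float(np.diag(audio @ text.T).mean()))
loss_cross = contrastive(audio, text, temp)

# Latent reconstruction (illustrative): map text latents back toward audio
# latents by least squares and penalize the reconstruction error.
W, *_ = np.linalg.lstsq(text, audio, rcond=None)
loss_rec = float(np.mean((text @ W - audio) ** 2))
loss = loss_cross + loss_rec
```

The reconstruction term is what distinguishes this from a plain contrastive objective: it forces the common subspace to retain enough information to recover one modality's latent from the other's.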
10-Formyl-2,4,6,8,12-pentanitro-2,4,6,8,10,12-hexaazatetracyclo[5.5.0.05,9.03,11]dodecane acetone solvate
The title compound, C7H7N11O11·C3H6O, consisting of one molecule of 10-formyl-2,4,6,8,12-pentanitro-2,4,6,8,10,12-hexaazatetracyclo[5.5.0.05,9.03,11]dodecane (pentanitromonoformylhexaazaisowurtzitane, PNMFIW) and one acetone solvent molecule, is a member of the caged hexaazaisowurtzitane family. PNMFIW has a cage structure constructed from one six-membered and two five-membered rings, which are linked by a C—C bond, thus creating two seven-membered rings. In the PNMFIW molecule, one formyl group is bonded to the N heteroatom of the six-membered ring, and five nitro groups are appended to the other five N heteroatoms of the caged structure. The acetone solvent molecule is arranged beside a five-membered plane of PNMFIW, with an O atom and an H atom close (with respect to the sum of the van der Waals radii) to the neighbouring nitro O atoms [O⋯O = 2.957 (3) and 2.852 (3) Å; O⋯H = 2.692 (2), 2.526 (3) and 2.432 (3) Å].