PEACE: Prototype lEarning Augmented transferable framework for Cross-domain rEcommendation
To help merchants and customers provide and access a variety of services through
miniapps, online service platforms have come to occupy a critical position in
effective content delivery, and recommending items in a new domain launched by
the service provider to customers has become increasingly urgent. However,
the non-negligible gap between the source and diversified target domains poses
a considerable challenge to cross-domain recommendation systems, which often
leads to performance bottlenecks in industrial settings. While entity graphs
have the potential to serve as a bridge between domains, rudimentary
utilization still fails to distill useful knowledge and can even induce negative
transfer. To this end, we propose PEACE, a Prototype lEarning Augmented
transferable framework for Cross-domain rEcommendation. For domain gap
bridging, PEACE is built upon a multi-interest and entity-oriented pre-training
architecture that not only benefits the learning of generalized knowledge
in a multi-granularity manner, but also helps leverage more structural
information in the entity graph. Then, we bring prototype learning into the
pre-training over source domains, so that representations of users and items
are greatly improved by the contrastive prototype learning module and the
prototype enhanced attention mechanism for adaptive knowledge utilization. To
ease the pressure of online serving, PEACE is carefully deployed in a
lightweight manner, and significant performance improvements are observed in
both online and offline environments.
Comment: Accepted by WSDM 202
Enhancing Representation in Radiography-Reports Foundation Model: A Granular Alignment Algorithm Using Masked Contrastive Learning
Recently, multi-modal vision-language foundation models have gained
significant attention in the medical field. While these models offer great
opportunities, they still face a number of challenges, such as the requirement
for fine-grained knowledge understanding in computer-aided diagnosis and
capability of utilizing very limited or no task-specific labeled data in
real-world clinical applications. In this study, we present MaCo, a novel
multi-modal medical foundation model that explores masked contrastive learning
to achieve granular alignment and zero-shot learning for a variety of medical
imaging tasks. MaCo incorporates a correlation weighting mechanism to adjust
the correlation between masked image patches and their corresponding reports,
thereby enhancing the representation learning capabilities. We evaluate MaCo on
six well-known open-source X-ray datasets, and the experimental results show it
outperforms seven state-of-the-art approaches for classification, segmentation,
and zero-shot phrase grounding, demonstrating its great potential to promote a
wide range of medical image analysis tasks.
Instance Adaptive Prototypical Contrastive Embedding for Generalized Zero Shot Learning
Generalized zero-shot learning (GZSL) aims to classify samples from seen and
unseen labels, assuming unseen labels are not accessible during training.
Recent advancements in GZSL have been expedited by incorporating
contrastive-learning-based (instance-based) embedding in generative networks
and leveraging the semantic relationship between data points. However, existing
embedding architectures suffer from two limitations: (1) limited
discriminability of synthetic features' embedding without considering
fine-grained cluster structures; (2) inflexible optimization due to restricted
scaling mechanisms on existing contrastive embedding networks, leading to
overlapped representations in the embedding space. To enhance the quality of
representations in the embedding space, as mentioned in (1), we propose a
margin-based prototypical contrastive learning embedding network that reaps the
benefits of prototype-data (cluster quality enhancement) and implicit data-data
(fine-grained representations) interaction while providing substantial cluster
supervision to the embedding network and the generator. To tackle (2), we
propose an instance adaptive contrastive loss that leads to generalized
representations for unseen labels with increased inter-class margin. Through
comprehensive experimental evaluation, we show that our method can outperform
the current state-of-the-art on three benchmark datasets. Our approach also
consistently achieves the best unseen performance in the GZSL setting.
Comment: 7 pages, 4 figures. Accepted in IJCAI 2023 Workshop on Generalizing
from Limited Resources in the Open Worl
Exploiting the Relationship Between Visual and Textual Features in Social Networks for Image Classification with Zero-Shot Deep Learning
One of the main issues related to unsupervised machine learning is the cost of processing and extracting useful information from large datasets. In this work, we propose a classifier ensemble based on the transferable learning capabilities of the CLIP neural network architecture in multimodal environments (image and text) from social media. For this purpose, we used the InstaNY100K dataset and proposed a validation approach based on sampling techniques. Our experiments, based on image classification tasks according to the labels of the Places dataset, are performed by first considering only the visual part, and then adding the associated texts as support. The results demonstrated that trained neural networks such as CLIP can be successfully applied to image classification with little fine-tuning, and that considering the texts associated with the images can help to improve accuracy, depending on the goal. This appears to be a promising research direction. This work was funded by the University of Alicante UAPOSTCOVID19-10 grant for the “Collecting and publishing open data for the revival of the tourism sector post-COVID-19” project.
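The two-stage evaluation described above (visual part only, then associated text as support) can be illustrated with a minimal sketch of zero-shot classification by embedding similarity. This is not the authors' ensemble: the toy vectors stand in for real CLIP embeddings, and the function name `zero_shot_label` and the fusion weight `text_weight` are hypothetical choices made only for illustration.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def zero_shot_label(image_emb, label_embs, text_emb=None, text_weight=0.5):
    """Pick the label whose embedding is most similar to the image embedding.

    If an embedding of the post's associated text is given, its similarity to
    each label is blended in with weight `text_weight` (an assumed, simple
    late-fusion rule, not taken from the paper).
    """
    image_emb = normalize(image_emb)
    text_unit = normalize(text_emb) if text_emb is not None else None
    scores = {}
    for label, emb in label_embs.items():
        emb = normalize(emb)
        score = sum(a * b for a, b in zip(image_emb, emb))
        if text_unit is not None:
            text_score = sum(a * b for a, b in zip(text_unit, emb))
            score = (1 - text_weight) * score + text_weight * text_score
        scores[label] = score
    return max(scores, key=scores.get)

# Toy label embeddings (placeholders for encoded label prompts).
labels = {"beach": [1.0, 0.0], "museum": [0.0, 1.0]}
visual_only = zero_shot_label([0.6, 0.5], labels)
with_text = zero_shot_label([0.6, 0.5], labels, text_emb=[0.0, 1.0])
```

In this toy case the image alone is slightly closer to one label, while the accompanying text tips the decision toward the other, which mirrors the abstract's remark that the benefit of text depends on the goal.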
GENNAPE: Towards Generalized Neural Architecture Performance Estimators
Predicting neural architecture performance is a challenging task and is
crucial to neural architecture design and search. Existing approaches either
rely on neural performance predictors which are limited to modeling
architectures in a predefined design space involving specific sets of operators
and connection rules, and cannot generalize to unseen architectures, or resort
to zero-cost proxies which are not always accurate. In this paper, we propose
GENNAPE, a Generalized Neural Architecture Performance Estimator, which is
pretrained on open neural architecture benchmarks, and aims to generalize to
completely unseen architectures through combined innovations in network
representation, contrastive pretraining, and fuzzy clustering-based predictor
ensemble. Specifically, GENNAPE represents a given neural network as a
Computation Graph (CG) of atomic operations which can model an arbitrary
architecture. It first learns a graph encoder via Contrastive Learning to
encourage network separation by topological features, and then trains multiple
predictor heads, which are soft-aggregated according to the fuzzy membership of
a neural network. Experiments show that GENNAPE pretrained on NAS-Bench-101 can
achieve superior transferability to 5 different public neural network
benchmarks, including NAS-Bench-201, NAS-Bench-301, MobileNet and ResNet
families under no or minimum fine-tuning. We further introduce 3 challenging
newly labelled neural network benchmarks: HiAML, Inception and Two-Path, whose
architectures can concentrate in narrow accuracy ranges. Extensive experiments show that
GENNAPE can correctly discern high-performance architectures in these families.
Finally, when paired with a search algorithm, GENNAPE can find architectures
that improve accuracy while reducing FLOPs on three families.
Comment: AAAI 2023 Oral Presentation; includes supplementary materials with
more details on introduced benchmarks; 14 Pages, 6 Figures, 10 Table
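The soft aggregation of predictor heads by fuzzy membership mentioned in the GENNAPE abstract can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: memberships are computed with the standard fuzzy c-means formula, the graph embedding and cluster centers are toy 2-D vectors, and the heads are constant functions. The names `fuzzy_memberships` and `soft_aggregate` are hypothetical.

```python
import math

def fuzzy_memberships(embedding, centers, m=2.0, eps=1e-12):
    """Standard fuzzy c-means membership of one point in each cluster.

    u_k = 1 / sum_j (d_k / d_j) ** (2 / (m - 1)), where d_k is the distance
    to center k. `eps` avoids division by zero when the point sits on a center.
    """
    dists = [math.dist(embedding, c) + eps for c in centers]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_k / d_j) ** power for d_j in dists) for d_k in dists]

def soft_aggregate(embedding, centers, heads):
    """Weight each head's prediction by the embedding's cluster membership."""
    weights = fuzzy_memberships(embedding, centers)
    return sum(w * head(embedding) for w, head in zip(weights, heads))

# Toy setup: two clusters, each with its own (constant) accuracy predictor.
centers = [[0.0, 0.0], [1.0, 0.0]]
heads = [lambda e: 0.9, lambda e: 0.5]
prediction = soft_aggregate([0.5, 0.0], centers, heads)  # equidistant point
```

A network embedded halfway between two cluster centers receives equal memberships, so its predicted accuracy is the plain average of the two heads; as the embedding moves toward one center, that head dominates.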
Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work
Inspired by the fact that human brains can emphasize discriminative parts of
the input and suppress irrelevant ones, substantial local mechanisms have been
designed to boost the development of computer vision. They can not only focus
on target parts to learn discriminative local representations, but also process
information selectively to improve the efficiency. In terms of application
scenarios and paradigms, local mechanisms have different characteristics. In
this survey, we provide a systematic review of local mechanisms for various
computer vision tasks and approaches, including fine-grained visual
recognition, person re-identification, few-/zero-shot learning, multi-modal
learning, self-supervised learning, Vision Transformers, and so on.
Categorization of local mechanisms in each field is summarized. Then,
advantages and disadvantages for every category are analyzed deeply, leaving
room for exploration. Finally, future research directions for local
mechanisms that may benefit future work are discussed. To the best
of our knowledge, this is the first survey of local mechanisms in computer
vision. We hope that this survey can shed light on future research in the
computer vision field.
CAVL: Learning Contrastive and Adaptive Representations of Vision and Language
Visual and linguistic pre-training aims to learn vision and language
representations together, which can be transferred to visual-linguistic
downstream tasks. However, there exists semantic confusion between language and
vision during the pre-training stage. Moreover, current pre-trained models tend
to take lots of computation resources for fine-tuning when transferred to
downstream tasks. In this work, we present a simple but effective approach for
learning Contrastive and Adaptive representations of Vision and Language,
namely CAVL. Specifically, we introduce a pair-wise contrastive loss to learn
alignments between the whole sentence and each image in the same batch during
the pre-training process. At the fine-tuning stage, we introduce two
lightweight adaptation networks to reduce model parameters and increase
training speed for saving computation resources. We evaluate our CAVL on six
main downstream tasks, including Visual Question Answering (VQA), Visual
Commonsense Reasoning (VCR), Natural Language for Visual Reasoning (NLVR),
Region-to-Phrase Grounding (RPG), Text-to-Image Retrieval (TIR), and Zero-shot
Text-to-Image Retrieval (ZS-TIR). Compared to baselines, we achieve superior
performance and reduce the fine-tuning time by a large margin (in particular,
76.17%). Extensive experiments and ablation studies demonstrate the efficiency
of contrastive pre-training and adaptive fine-tuning proposed in our CAVL.
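The abstract does not give the form of the pair-wise contrastive loss, so the following is only a generic InfoNCE-style sketch of aligning each sentence with its own image against the other images in the same batch. The function name `info_nce`, the temperature value, and the toy embeddings are assumptions made for illustration and should not be read as CAVL's actual objective.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(text_embs, image_embs, temperature=0.1):
    """Mean cross-entropy of matching sentence i to image i within the batch.

    Row i of the similarity matrix holds the scores of sentence i against
    every image; the diagonal entry is the positive pair.
    """
    n = len(text_embs)
    total = 0.0
    for i in range(n):
        logits = [cosine(text_embs[i], image_embs[j]) / temperature
                  for j in range(n)]
        # Numerically stable log-sum-exp over the row.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / n

# Toy batch: aligned pairs should yield a much lower loss than swapped pairs.
texts = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = info_nce(texts, [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = info_nce(texts, [[0.0, 1.0], [1.0, 0.0]])
```

Minimizing such a loss pulls each sentence embedding toward its paired image and pushes it away from the other images in the batch, which is the general idea behind batch-level sentence-image alignment.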
CLAMP: A Contrastive Language And Molecule Pre-training Network
This paper highlights a shift in how to approach material generation. Instead
of material-to-material, we propose a language-to-material generation
architecture that utilizes millions of untapped data points. Using a web
scraper to collect crystal text pairs from open-source research papers, a
contrastive model can be trained using a convolutional graph neural network
encoder and a language encoder. This would allow unsupervised zero-shot
classification which can be trained by taking advantage of linguistic
structure. Without any specific training data, ~82% accuracy was achieved,
and ~75% accuracy for photocatalyst prediction with an extremely small
dataset. This novel network could ideally be cross-applied to any reaction that
can be described via text, opening completely new methods to think about 3D
chemical framework generation. In the full experiment diffusion models would
likely be incorporated to fully exploit the latent space.
Comment: 3 pages, 1 figure, Presenting @ NeurIPS23 & Workshop - source @
https://github.com/neelr/clamp - dataset @
https://www.kaggle.com/datasets/programgeek01/cif-summary-dat