Prompt Tuning on Graph-augmented Low-resource Text Classification
Text classification is a fundamental problem in information retrieval with
many real-world applications, such as predicting the topics of online articles
and the categories of e-commerce product descriptions. However, low-resource
text classification, with no or few labeled samples, presents a serious concern
for supervised learning. Meanwhile, many text data are inherently grounded on a
network structure, such as a hyperlink/citation network for online articles,
and a user-item purchase network for e-commerce products. These graph
structures capture rich semantic relationships, which can potentially augment
low-resource text classification. In this paper, we propose a novel model
called Graph-Grounded Pre-training and Prompting (G2P2) to address low-resource
text classification in a two-pronged approach. During pre-training, we propose
three graph interaction-based contrastive strategies to jointly pre-train a
graph-text model; during downstream classification, we explore handcrafted
discrete prompts and continuous prompt tuning for the jointly pre-trained model
to achieve zero- and few-shot classification, respectively. Besides, for
generalizing continuous prompts to unseen classes, we propose conditional
prompt tuning on graphs (G2P2). Extensive experiments on four real-world
datasets demonstrate the strength of G2P2 in zero- and few-shot low-resource
text classification tasks, and illustrate the advantage of G2P2 in dealing
with unseen classes.
Comment: 14 pages, journal under review. arXiv admin note: substantial text
overlap with arXiv:2305.0332
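The G2P2 abstract above describes jointly pre-training a graph-text model with contrastive strategies, without giving the objectives. As a rough illustration only, the sketch below shows a generic symmetric contrastive loss that pulls each text embedding toward the embedding of its own graph node and away from the other nodes in the batch. The function name, the temperature value, and the use of cosine similarity are assumptions made for this example; this is not the authors' implementation and does not reproduce their three strategies.

```python
import math


def _normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def _log_softmax_at(scores, index):
    """Numerically stable log-softmax of scores[index]."""
    peak = max(scores)
    log_sum = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return scores[index] - log_sum


def symmetric_contrastive_loss(text_emb, node_emb, temperature=0.1):
    """Hypothetical text-node contrastive loss; row i of each input is a matched pair."""
    texts = [_normalize(v) for v in text_emb]
    nodes = [_normalize(v) for v in node_emb]
    n = len(texts)
    # logits[i][j]: temperature-scaled cosine similarity of text i and node j.
    logits = [
        [sum(a * b for a, b in zip(t, g)) / temperature for g in nodes]
        for t in texts
    ]
    columns = [list(col) for col in zip(*logits)]
    # Text-to-node direction: each text should pick out its own node.
    text_to_node = -sum(_log_softmax_at(logits[i], i) for i in range(n)) / n
    # Node-to-text direction: each node should pick out its own text.
    node_to_text = -sum(_log_softmax_at(columns[i], i) for i in range(n)) / n
    return 0.5 * (text_to_node + node_to_text)
```

The loss is small when matched pairs are the most similar entries in their row and column, and grows when the pairing is scrambled.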
Localized Sparse Incomplete Multi-view Clustering
Incomplete multi-view clustering, which aims to solve the clustering problem
on the incomplete multi-view data with partial view missing, has received more
and more attention in recent years. Although numerous methods have been
developed, most of the methods either cannot flexibly handle the incomplete
multi-view data with arbitrary missing views or do not consider the negative
factor of information imbalance among views. Moreover, some methods do not
fully explore the local structure of all incomplete views. To tackle these
problems, this paper proposes a simple but effective method, named localized
sparse incomplete multi-view clustering (LSIMVC). Different from the existing
methods, LSIMVC intends to learn a sparse and structured consensus latent
representation from the incomplete multi-view data by optimizing a sparse
regularized and novel graph embedded multi-view matrix factorization model.
Specifically, in this matrix-factorization-based model, an l1-norm-based
sparse constraint is introduced to obtain the sparse low-dimensional
individual representations and the sparse consensus representation. Moreover, a
novel local graph embedding term is introduced to learn the structured
consensus representation. Different from the existing works, our local graph
embedding term aggregates the graph embedding task and consensus representation
learning task into a concise term. Furthermore, to reduce the imbalance factor
of incomplete multi-view learning, an adaptive weighted learning scheme is
introduced to LSIMVC. Finally, an efficient optimization strategy is given to
solve the optimization problem of our proposed model. Comprehensive
experimental results performed on six incomplete multi-view databases verify
that the performance of our LSIMVC is superior to the state-of-the-art IMC
approaches. The code is available at https://github.com/justsmart/LSIMVC.
Comment: Published in IEEE Transactions on Multimedia (TMM). The code is
available at GitHub https://github.com/justsmart/LSIMV
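The LSIMVC abstract above relies on an l1-norm sparse constraint but does not say how it is optimized. One common way to handle an l1 penalty is the soft-thresholding (proximal) operator, sketched below for a single vector. The function name and the threshold parameter are chosen for illustration, and whether LSIMVC uses this exact update is an assumption; the linked repository is the authoritative source.

```python
import math


def soft_threshold(values, threshold):
    """Proximal operator of threshold * ||x||_1, applied elementwise.

    Each entry is shrunk toward zero by `threshold`; entries whose magnitude
    is at most `threshold` become zero, which is what produces sparsity.
    """
    return [
        math.copysign(max(abs(v) - threshold, 0.0), v)
        for v in values
    ]
```

For example, `soft_threshold([3.0, -0.5, 1.0, -2.0], 1.0)` zeroes the two small entries and shrinks the two large ones by 1.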
Voucher Abuse Detection with Prompt-based Fine-tuning on Graph Neural Networks
Voucher abuse detection is an important anomaly detection problem in
E-commerce. While many GNN-based solutions have emerged, the supervised
paradigm depends on a large quantity of labeled data. A popular alternative is
to adopt self-supervised pre-training using label-free data, and further
fine-tune on a downstream task with limited labels. Nevertheless, the
"pre-train, fine-tune" paradigm is often plagued by the objective gap between
pre-training and downstream tasks. Hence, we propose VPGNN, a prompt-based
fine-tuning framework on GNNs for voucher abuse detection. We design a novel
graph prompting function to reformulate the downstream task into a similar
template as the pretext task in pre-training, thereby narrowing the objective
gap. Extensive experiments on both proprietary and public datasets demonstrate
the strength of VPGNN in both few-shot and semi-supervised scenarios. Moreover,
an online deployment of VPGNN in a production environment shows a 23.4%
improvement over two existing deployed models.
Comment: 7 pages, Accepted by CIKM23 Applied Research Trac
Information Recovery-Driven Deep Incomplete Multiview Clustering Network
Incomplete multi-view clustering is a hot and emerging topic. It is well
known that unavoidable data incompleteness greatly weakens the effective
information of multi-view data. To date, existing incomplete multi-view
clustering methods usually bypass unavailable views according to prior missing
information, which is considered as a second-best scheme based on evasion.
Other methods that attempt to recover missing information are mostly applicable
to specific two-view datasets. To handle these problems, in this paper, we
propose an information recovery-driven deep incomplete multi-view clustering
network, termed as RecFormer. Concretely, a two-stage autoencoder network with
the self-attention structure is built to synchronously extract high-level
semantic representations of multiple views and recover the missing data.
Besides, we develop a recurrent graph reconstruction mechanism that cleverly
leverages the restored views to promote the representation learning and the
further data reconstruction. Visualizations of the recovery results are given, and
sufficient experimental results confirm that our RecFormer has obvious
advantages over other top methods.
Comment: Accepted by TNNLS 2023. Please contact me if you have any questions:
[email protected]. The code is available at:
https://github.com/justsmart/RecForme
Learning to Detect Noisy Labels Using Model-Based Features
Label noise is ubiquitous in various machine learning scenarios such as
self-labeling with model predictions and erroneous data annotation. Many
existing approaches are based on heuristics such as sample losses, which might
not be flexible enough to achieve optimal solutions. Meta learning based
methods address this issue by learning a data selection function, but can be
hard to optimize. In light of these pros and cons, we propose
Selection-Enhanced Noisy label Training (SENT) that does not rely on meta
learning while having the flexibility of being data-driven. SENT transfers the
noise distribution to a clean set and trains a model to distinguish noisy
labels from clean ones using model-based features. Empirically, on a wide range
of tasks including text classification and speech recognition, SENT improves
performance over strong baselines under the settings of self-training and label
corruption.
Activity-assisted barrier-crossing of self-propelled colloids over parallel microgrooves
We report a systematic study of the dynamics of self-propelled particles
(SPPs) over a one-dimensional periodic potential landscape, which is fabricated
on a microgroove-patterned polydimethylsiloxane (PDMS) substrate. From the
measured non-equilibrium probability density function of the SPPs, we find that
the escape dynamics of the slow-rotating SPPs across the potential landscape
can be described by an effective potential, once the self-propulsion force is
included into the potential under the fixed angle approximation. This work
demonstrates that the parallel microgrooves provide a versatile platform for a
quantitative understanding of the interplay among the self-propulsion force,
spatial confinement by the potential landscape, and thermal noise, as well as
its effects on activity-assisted escape dynamics and transport of the SPPs.
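The abstract above says the self-propulsion force can be folded into an effective potential under the fixed angle approximation, without writing the expression out. A plausible form, given here only as a sketch, treats the propulsion force as constant in magnitude and direction while a particle crosses a barrier, so that it acts as a linear tilt of the landscape. The symbols $U(x)$ for the groove potential, $F_0$ for the propulsion force magnitude, and $\theta$ for the fixed angle between the propulsion direction and the $x$ axis are assumptions introduced for this illustration, not notation taken from the paper.

```latex
% Sketch: constant propulsion force F_0 at a fixed angle \theta to the x axis
% contributes a linear term to the one-dimensional potential U(x).
U_{\mathrm{eff}}(x) = U(x) - F_0 \cos\theta \, x
```

In this picture, propulsion directed along $+x$ ($\cos\theta > 0$) lowers the barrier for escape in that direction and raises it in the opposite direction, which is one way to read the term "activity-assisted" barrier crossing.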
Host-Guest Complexation of Amphiphilic Molecules at the Air-Water Interface Prevents Oxidation by Hydroxyl Radicals and Singlet Oxygen
The oxidation of antioxidants by oxidizers imposes great challenges to both living organisms and the food industry. Here we show that the host-guest complexation of the carefully designed, positively charged, amphiphilic guanidinocalix[5]arene pentadodecyl ether (GC5A-12C) and negatively charged oleic acid (OA), a well-known cell membrane antioxidant, prevents the oxidation of the complex monolayers at the air-water interface from two potent oxidizers hydroxyl radicals (OH) and singlet delta oxygen (SDO). OH is generated from the gas phase and attacks from the top of the monolayer, while SDO is generated inside the monolayer and attacks amphiphiles from a lateral direction. Field-induced droplet ionization mass spectrometry results have demonstrated that the host-guest complexation achieves steric shielding and prevents both types of oxidation as a result of the tight and "sleeved in" physical arrangement, rather than the chemical reactivity, of the complexes.