
    Prompt Tuning on Graph-augmented Low-resource Text Classification

    Text classification is a fundamental problem in information retrieval with many real-world applications, such as predicting the topics of online articles and the categories of e-commerce product descriptions. However, low-resource text classification, with few or no labeled samples, poses a serious challenge for supervised learning. Meanwhile, much text data is inherently grounded in a network structure, such as a hyperlink/citation network for online articles or a user-item purchase network for e-commerce products. These graph structures capture rich semantic relationships that can potentially augment low-resource text classification. In this paper, we propose a novel model called Graph-Grounded Pre-training and Prompting (G2P2) that addresses low-resource text classification with a two-pronged approach. During pre-training, we propose three graph interaction-based contrastive strategies to jointly pre-train a graph-text model; during downstream classification, we explore handcrafted discrete prompts and continuous prompt tuning for the jointly pre-trained model to achieve zero- and few-shot classification, respectively. In addition, to generalize continuous prompts to unseen classes, we propose conditional prompt tuning on graphs (G2P2*). Extensive experiments on four real-world datasets demonstrate the strength of G2P2 in zero- and few-shot low-resource text classification and illustrate the advantage of G2P2* in handling unseen classes.
    Comment: 14 pages, journal under review. arXiv admin note: substantial text overlap with arXiv:2305.0332
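
    The abstract does not specify the pre-training losses. As a rough, hypothetical illustration of contrastive graph-text alignment in general, the Python sketch below computes a CLIP-style symmetric InfoNCE loss that pulls each text embedding toward the embedding of its own graph node and away from the other nodes in the batch, then reuses the same cosine similarity for prompt-based zero-shot prediction. It does not reproduce G2P2's three interaction-based strategies; the function names (`info_nce`, `zero_shot_predict`), the temperature `tau = 0.1`, and the assumption that class-name prompts have already been embedded are all illustrative.

```python
import math
from typing import List

Vector = List[float]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two nonzero, equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cross_entropy(logits: List[float], target: int) -> float:
    """-log softmax(logits)[target], computed with the max-subtraction trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def info_nce(text_emb: List[Vector], node_emb: List[Vector], tau: float = 0.1) -> float:
    """Symmetric InfoNCE over a batch: text i is the positive for node i and vice versa.

    Hypothetical illustration only, not G2P2's exact objective.
    """
    n = len(text_emb)
    sims = [[cosine(t, g) / tau for g in node_emb] for t in text_emb]
    # Text -> node direction: row i should peak at column i.
    text_to_node = sum(cross_entropy(sims[i], i) for i in range(n)) / n
    # Node -> text direction: column i should peak at row i.
    node_to_text = sum(cross_entropy([sims[j][i] for j in range(n)], i) for i in range(n)) / n
    return 0.5 * (text_to_node + node_to_text)

def zero_shot_predict(node_vec: Vector, class_prompt_emb: List[Vector]) -> int:
    """Index of the class whose (pre-embedded, hypothetical) prompt is most similar to the node."""
    scores = [cosine(node_vec, c) for c in class_prompt_emb]
    return max(range(len(scores)), key=lambda i: scores[i])
```

    Averaging both directions (text to node and node to text) keeps the loss symmetric. Inputs are assumed to be nonzero vectors, since `cosine` divides by their norms.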

    Localized Sparse Incomplete Multi-view Clustering

    Incomplete multi-view clustering (IMC), which aims to cluster multi-view data in which some views are partially missing, has received increasing attention in recent years. Although numerous methods have been developed, most either cannot flexibly handle incomplete multi-view data with arbitrary missing views or do not consider the negative effect of information imbalance among views. Moreover, some methods do not fully explore the local structure of all incomplete views. To tackle these problems, this paper proposes a simple but effective method named localized sparse incomplete multi-view clustering (LSIMVC). Unlike existing methods, LSIMVC learns a sparse and structured consensus latent representation from the incomplete multi-view data by optimizing a novel sparse-regularized, graph-embedded multi-view matrix factorization model. Specifically, in this matrix factorization model, an ℓ1-norm sparse constraint is introduced to obtain sparse low-dimensional individual representations and a sparse consensus representation. In addition, a novel local graph embedding term is introduced to learn the structured consensus representation. Unlike existing works, our local graph embedding term combines the graph embedding task and the consensus representation learning task into a single concise term. Furthermore, to reduce the imbalance factor in incomplete multi-view learning, an adaptive weighted learning scheme is introduced into LSIMVC. Finally, an efficient optimization strategy is given to solve the optimization problem of the proposed model. Comprehensive experiments on six incomplete multi-view databases verify that LSIMVC outperforms state-of-the-art IMC approaches. The code is available at https://github.com/justsmart/LSIMVC.
    Comment: Published in IEEE Transactions on Multimedia (TMM).
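
    The abstract names an ℓ1-norm sparse constraint but not the solver. One standard way to handle an ℓ1 penalty is proximal gradient descent (ISTA), whose proximal step is element-wise soft-thresholding. The sketch below applies it to a single sample and a fixed basis matrix; it is a generic illustration of how the ℓ1 term produces exact zeros, not LSIMVC's actual update rules, and the names `soft_threshold` and `ista` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # row-major, shape (d, k)

def soft_threshold(z: float, lam: float) -> float:
    """Proximal operator of lam * |z|: shrink toward zero and clip small values to exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def ista(x: List[float], basis: Matrix, lam: float, step: float, iters: int = 200) -> List[float]:
    """Minimize 0.5 * ||x - basis @ v||^2 + lam * ||v||_1 by proximal gradient descent.

    `step` should be at most 1 / L, where L is the largest eigenvalue of basis^T basis.
    """
    d, k = len(basis), len(basis[0])
    v = [0.0] * k
    for _ in range(iters):
        # Residual r = basis @ v - x.
        r = [sum(basis[i][j] * v[j] for j in range(k)) - x[i] for i in range(d)]
        # Gradient of the smooth least-squares term: basis^T @ r.
        grad = [sum(basis[i][j] * r[i] for i in range(d)) for j in range(k)]
        # Gradient step on the smooth term, then the l1 proximal (soft-thresholding) step.
        v = [soft_threshold(v[j] - step * grad[j], step * lam) for j in range(k)]
    return v
```

    With an identity basis and `step=1.0`, the iteration settles on the soft-thresholded input: `ista([3.0, 0.2, -1.5], identity, lam=0.5, step=1.0)` yields approximately `[2.5, 0.0, -1.0]`, so the small middle coefficient is set exactly to zero.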

    Voucher Abuse Detection with Prompt-based Fine-tuning on Graph Neural Networks

    Voucher abuse detection is an important anomaly detection problem in e-commerce. While many GNN-based solutions have emerged, the supervised paradigm depends on a large quantity of labeled data. A popular alternative is to adopt self-supervised pre-training on label-free data and then fine-tune on a downstream task with limited labels. Nevertheless, the "pre-train, fine-tune" paradigm is often plagued by the objective gap between pre-training and downstream tasks. Hence, we propose VPGNN, a prompt-based fine-tuning framework on GNNs for voucher abuse detection. We design a novel graph prompting function that reformulates the downstream task into a template similar to that of the pretext task used in pre-training, thereby narrowing the objective gap. Extensive experiments on both proprietary and public datasets demonstrate the strength of VPGNN in both few-shot and semi-supervised scenarios. Moreover, an online deployment of VPGNN in a production environment shows a 23.4% improvement over two existing deployed models.
    Comment: 7 pages, accepted by CIKM23 Applied Research Track

    Information Recovery-Driven Deep Incomplete Multiview Clustering Network

    Incomplete multi-view clustering is an active and emerging research topic. It is well known that unavoidable data incompleteness greatly weakens the effective information in multi-view data. To date, existing incomplete multi-view clustering methods usually bypass unavailable views according to prior knowledge of which views are missing, which amounts to a second-best, evasion-based scheme. Other methods that attempt to recover missing information are mostly limited to specific two-view datasets. To handle these problems, in this paper we propose an information recovery-driven deep incomplete multi-view clustering network, termed RecFormer. Concretely, a two-stage autoencoder network with a self-attention structure is built to simultaneously extract high-level semantic representations of multiple views and recover the missing data. Besides, we develop a recurrent graph reconstruction mechanism that leverages the restored views to promote representation learning and further data reconstruction. Visualizations of the recovery results are given, and extensive experimental results confirm that RecFormer has clear advantages over other state-of-the-art methods.
    Comment: Accepted by TNNLS 2023. The code is available at: https://github.com/justsmart/RecForme

    Learning to Detect Noisy Labels Using Model-Based Features

    Label noise is ubiquitous in machine learning scenarios such as self-labeling with model predictions and erroneous data annotation. Many existing approaches rely on heuristics such as sample losses, which might not be flexible enough to achieve optimal solutions. Meta-learning-based methods address this issue by learning a data selection function, but can be hard to optimize. In light of these pros and cons, we propose Selection-Enhanced Noisy label Training (SENT), which does not rely on meta learning yet retains the flexibility of being data-driven. SENT transfers the noise distribution to a clean set and trains a model to distinguish noisy labels from clean ones using model-based features. Empirically, on a wide range of tasks including text classification and speech recognition, SENT improves performance over strong baselines under both the self-training and label-corruption settings.
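
    To make the idea of training a noisy-label detector on a clean set concrete, the toy sketch below injects synthetic label noise into a clean set, records which labels were flipped, and fits a logistic-regression detector on model-based features. Several simplifications are assumptions rather than details from the abstract: noise is injected by uniform random flipping (SENT instead transfers an estimated noise distribution), the features (the model's probability for the given label and its maximum class probability) are illustrative choices, and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

def corrupt_labels(labels: Sequence[int], num_classes: int, noise_rate: float,
                   rng: random.Random) -> Tuple[List[int], List[int]]:
    """Flip each label to a different, uniformly chosen class with probability noise_rate.

    Returns the new labels and 0/1 flags marking which labels were flipped.
    Uniform flipping is a simplifying stand-in for SENT's noise transfer.
    """
    new_labels: List[int] = []
    is_noisy: List[int] = []
    for y in labels:
        if rng.random() < noise_rate:
            new_labels.append(rng.choice([c for c in range(num_classes) if c != y]))
            is_noisy.append(1)
        else:
            new_labels.append(y)
            is_noisy.append(0)
    return new_labels, is_noisy

def model_features(probs: Sequence[float], label: int) -> List[float]:
    """Illustrative features: p(given label), max class probability, and a bias term."""
    return [probs[label], max(probs), 1.0]

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_detector(X: List[List[float]], y: List[int], lr: float = 0.5,
                   epochs: int = 500) -> List[float]:
    """Logistic regression (1 = noisy label) fitted by full-batch gradient descent."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def is_flagged(w: List[float], x: List[float]) -> bool:
    """Predict 'noisy' when the detector's probability exceeds 0.5."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x))) > 0.5
```

    For example, with a hypothetical model that puts probability 0.8 on the true class and 0.1 on each of two other classes, flipped labels get a given-label probability of 0.1 and clean labels 0.8, so the detector learns a negative weight on that feature and flags the flipped samples.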

    Activity-assisted barrier-crossing of self-propelled colloids over parallel microgrooves

    We report a systematic study of the dynamics of self-propelled particles (SPPs) over a one-dimensional periodic potential landscape fabricated on a microgroove-patterned polydimethylsiloxane (PDMS) substrate. From the measured non-equilibrium probability density function of the SPPs, we find that the escape dynamics of slowly rotating SPPs across the potential landscape can be described by an effective potential, once the self-propulsion force is incorporated into the potential under the fixed-angle approximation. This work demonstrates that parallel microgrooves provide a versatile platform for quantitatively understanding the interplay among the self-propulsion force, spatial confinement by the potential landscape, and thermal noise, as well as its effects on the activity-assisted escape dynamics and transport of SPPs.
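
    Under the fixed-angle approximation, a propulsion force F_a whose orientation θ (measured from the x axis, normal to the grooves) stays frozen contributes a constant force F_a·cos θ along x, which acts like a linear tilt of the landscape: U_eff(x) = U(x) − F_a·cos(θ)·x. The abstract does not give the groove profile, so the sketch below assumes a sinusoidal potential U(x) = (U0/2)·cos(2πx/λ) and compares a grid estimate of the forward escape barrier with the closed-form tilted-washboard result ΔU = U0·(√(1 − s²) − s·arccos s), where s = F_a·cos(θ)·λ/(πU0). The function names and the sinusoidal profile are illustrative assumptions, not the authors' analysis.

```python
import math
from typing import Callable

Potential = Callable[[float], float]

def tilted_potential(U0: float, wavelength: float, force: float, theta: float) -> Potential:
    """Return U_eff(x) = (U0/2) cos(2 pi x / wavelength) - force * cos(theta) * x.

    The sinusoidal groove profile is an assumption; the frozen-angle propulsion
    force enters only through its component along x (normal to the grooves).
    """
    k = 2.0 * math.pi / wavelength
    f = force * math.cos(theta)
    return lambda x: 0.5 * U0 * math.cos(k * x) - f * x

def forward_barrier_numeric(U_eff: Potential, wavelength: float, n_per_period: int = 20000) -> float:
    """Barrier for moving one period toward +x, estimated on a grid.

    Finds the first local minimum in the first period, then takes the maximum of
    U_eff over the following period. Returns 0.0 if no local minimum exists.
    """
    dx = wavelength / n_per_period
    u = [U_eff(i * dx) for i in range(2 * n_per_period + 1)]
    for i in range(1, n_per_period + 1):
        if u[i - 1] > u[i] <= u[i + 1]:
            return max(u[i:i + n_per_period + 1]) - u[i]
    return 0.0

def forward_barrier_analytic(U0: float, wavelength: float, force: float, theta: float) -> float:
    """Closed-form forward barrier of the tilted cosine: U0 * (sqrt(1 - s^2) - s * acos(s)).

    s = force * cos(theta) * wavelength / (pi * U0) is the tilt relative to the
    critical tilt. Valid for -1 < s < 1; the barrier vanishes for s >= 1.
    """
    s = force * math.cos(theta) * wavelength / (math.pi * U0)
    if s >= 1.0:
        return 0.0
    if s <= -1.0:
        raise ValueError("no local minimum: the landscape slopes uphill toward +x everywhere")
    return U0 * (math.sqrt(1.0 - s * s) - s * math.acos(s))
```

    Propulsion pointing across the grooves (θ = 0) lowers the forward barrier and removes it once s ≥ 1, while propulsion along the grooves (θ = π/2) leaves it unchanged; this is the sense in which activity assists barrier crossing in this simplified picture.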

    Host-Guest Complexation of Amphiphilic Molecules at the Air-Water Interface Prevents Oxidation by Hydroxyl Radicals and Singlet Oxygen

    The oxidation of antioxidants by oxidizers poses great challenges for both living organisms and the food industry. Here we show that the host–guest complexation of the carefully designed, positively charged, amphiphilic guanidinocalix[5]arene pentadodecyl ether (GC5A‐12C) and negatively charged oleic acid (OA), a well‐known cell membrane antioxidant, prevents the oxidation of the complex monolayers at the air–water interface by two potent oxidizers: hydroxyl radicals (•OH) and singlet delta oxygen (SDO). •OH is generated in the gas phase and attacks the monolayer from the top, while SDO is generated inside the monolayer and attacks the amphiphiles from a lateral direction. Field‐induced droplet ionization mass spectrometry results demonstrate that host–guest complexation achieves steric shielding and prevents both types of oxidation as a result of the tight, “sleeved-in” physical arrangement of the complexes rather than their chemical reactivity.