
    SMILE: Twitter emotion classification using domain adaptation

    Despite widespread research interest in social media sentiment analysis, sentiment and emotion classification across different domains, and on Twitter data in particular, remains a challenging task. Here we set out to find an effective approach to a cross-domain emotion classification task on a set of Twitter data involving social media discourse around arts and cultural experiences, in the context of museums. While most existing work in domain adaptation has focused on feature-based and/or instance-based adaptation methods, in this work we study a model-based adaptive SVM approach, as we believe its flexibility and efficiency are better suited to the task at hand. We conduct a series of experiments and compare our system with a set of baseline methods. Our results not only show superior performance in terms of accuracy and computational efficiency compared to the baselines, but also shed light on how different ratios of labelled target-domain data used for adaptation affect classification performance.
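
    The model-based idea can be sketched in code. Below is a minimal, hypothetical illustration (synthetic data, sklearn's LinearSVC), not the paper's exact adaptive-SVM formulation: a source classifier is trained once, and a lightweight target classifier learns a correction on a small labelled target set by using the source model's decision score as an extra feature.

```python
# Hypothetical sketch, not the paper's method: adapt a source SVM to a small
# labelled target set by learning a correction on top of its decision score.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X_src, y_src = rng.normal(size=(500, 20)), rng.integers(0, 2, 500)     # source domain
X_tgt, y_tgt = rng.normal(0.5, 1.0, (50, 20)), rng.integers(0, 2, 50)  # small labelled target set

src_clf = LinearSVC().fit(X_src, y_src)

def augment(X):
    # Append the source model's decision score as an extra feature, so the
    # target SVM only needs to learn a shift relative to the source model.
    return np.hstack([X, src_clf.decision_function(X).reshape(-1, 1)])

adapted_clf = LinearSVC().fit(augment(X_tgt), y_tgt)
print(adapted_clf.predict(augment(X_tgt))[:10])
```

    Varying the size of the labelled target set in such a setup mirrors the paper's question of how the adaptation ratio affects classification performance.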

    Cross-Domain Labeled LDA for Cross-Domain Text Classification

    Cross-domain text classification aims at building a classifier for a target domain by leveraging data from both the source and target domains. One promising idea is to minimize the feature distribution differences between the two domains. Most existing studies explicitly minimize such differences through an exact alignment mechanism (aligning features one-to-one, via a projection matrix, etc.). Such exact alignment, however, restricts the model's learning ability and further impairs classification performance when the semantic distributions of the domains differ greatly. To address this problem, we propose a novel group alignment, which aligns semantics at the group level. In addition, to help the model learn better semantic groups and the semantics within them, we propose a partial supervision for the model's learning in the source domain. To this end, we embed the group alignment and partial supervision into a cross-domain topic model, and propose Cross-Domain Labeled LDA (CDL-LDA). On the standard 20 Newsgroups and Reuters datasets, extensive quantitative (classification, perplexity, etc.) and qualitative (topic detection) experiments are conducted to show the effectiveness of the proposed group alignment and partial supervision. Comment: ICDM 2018.
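
    As context for the topic-model route, here is a generic cross-domain topic-feature pipeline, not CDL-LDA itself and without its group alignment or partial supervision: one topic model is fit over both corpora so the domains share a topic space, and a classifier trained on source topic mixtures is evaluated on the target.

```python
# Generic cross-domain topic-feature baseline, NOT the CDL-LDA model: fit one
# topic model over both corpora so the domains share a topic space, then train
# on source topic mixtures and evaluate on the target. The 20 Newsgroups
# train/test split stands in for two domains here purely for illustration.
from sklearn.datasets import fetch_20newsgroups
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

cats = ["sci.space", "rec.autos"]
src = fetch_20newsgroups(subset="train", categories=cats)
tgt = fetch_20newsgroups(subset="test", categories=cats)

vec = CountVectorizer(max_features=5000, stop_words="english")
X_all = vec.fit_transform(src.data + tgt.data)  # shared vocabulary across domains
lda = LatentDirichletAllocation(n_components=20, random_state=0).fit(X_all)

# Train on source-domain topic mixtures, evaluate on the target domain.
clf = LogisticRegression(max_iter=1000).fit(lda.transform(vec.transform(src.data)), src.target)
print("target accuracy:", clf.score(lda.transform(vec.transform(tgt.data)), tgt.target))
```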

    Natural Language Processing for Financial Regulation

    This article provides an overview of Natural Language Processing techniques in the context of financial regulation, specifically for performing semantic matching between rules and policy text when no dataset is available for supervised learning. We outline how to outperform simple pre-trained sentence-transformer models using freely available resources, and explain the mathematical concepts behind the key building blocks of Natural Language Processing. Comment: 20 pages, 3 figures.
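
    The pre-trained baseline the article sets out to beat can be reproduced in a few lines with the sentence-transformers library; the model name and the rule/policy snippets below are illustrative, not from the article.

```python
# Baseline semantic matching with a pre-trained sentence-transformer; the
# model name and the rule/policy snippets are illustrative.
from sentence_transformers import SentenceTransformer, util

rules = ["Client funds must be held in segregated accounts.",
         "Firms must report suspicious transactions within 24 hours."]
policies = ["All customer money is kept separate from company accounts.",
            "Staff escalate unusual payments to compliance the same day."]

model = SentenceTransformer("all-MiniLM-L6-v2")
scores = util.cos_sim(model.encode(rules), model.encode(policies))

# For each rule, report the policy sentence with the highest cosine similarity.
for i, rule in enumerate(rules):
    j = scores[i].argmax().item()
    print(f"{rule!r} -> {policies[j]!r} ({scores[i][j].item():.2f})")
```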

    Efficiently Reusing Natural Language Processing Models for Phenotype Identification in Free-text Electronic Medical Records: Methodological Study

    Background: Many efforts have been put into using automated approaches, such as natural language processing (NLP), to mine or extract data from free-text medical records in order to construct comprehensive patient profiles for delivering better health care. Reusing NLP models in new settings, however, remains cumbersome, as it requires iterative validation and/or retraining on new data to achieve convergent results. Objective: The aim of this work is to minimise the effort involved in reusing NLP models on free-text medical records. Methods: We formally define and analyse the model adaptation problem in phenotype identification tasks. We identify “duplicate waste” and “imbalance waste”, which collectively impede efficient model reuse. We propose a concept-embedding-based approach to minimise these sources of waste without the need for labelled data from new settings. Results: We conduct experiments on data from a large mental health registry to reuse NLP models in four phenotype identification tasks. The proposed approach can choose the best model for a new task, identifying up to 76% of phenotype mentions without the need for validation and model retraining, and with very good performance (93-97% accuracy). It can also provide guidance for validating and retraining the selected model for novel language patterns in new tasks, saving around 80% of the effort required in “blind” model-adaptation approaches. Conclusions: Adapting pre-trained NLP models to new tasks can be more efficient and effective if the language-pattern landscapes of old and new settings are made explicit and comparable. Our experiments show that the phenotype embedding approach is an effective way to model language patterns for phenotype identification tasks, and that its use can guide efficient NLP model reuse.
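
    A rough sketch of the model-selection idea, with made-up embeddings: each phenotype's language patterns are summarised as one averaged context-embedding vector, and the existing model whose source phenotype is closest to the new task is chosen for reuse. The helper and phenotype names below are hypothetical, not from the study.

```python
# Hypothetical sketch of the model-selection idea; embeddings and phenotype
# names are made up. Each phenotype's mention contexts are summarised as one
# unit vector, and the closest existing model is chosen for reuse.
import numpy as np

def landscape(context_vectors):
    """Summarise a phenotype's mention contexts as a single unit vector."""
    v = np.asarray(context_vectors).mean(axis=0)
    return v / np.linalg.norm(v)

# Stand-ins for context embeddings of phenotypes that already have models.
existing = {name: landscape(np.random.default_rng(i).normal(size=(100, 64)))
            for i, name in enumerate(["smoking_status", "depression_mention"])}
new_task = landscape(np.random.default_rng(9).normal(size=(40, 64)))

# Reuse the model whose source phenotype's language landscape is closest.
best = max(existing, key=lambda name: float(existing[name] @ new_task))
print("reuse model trained for:", best)
```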

    Semi-supervised Domain Adaptation on Graphs with Contrastive Learning and Minimax Entropy

    Label scarcity in a graph is frequently encountered in real-world applications due to the high cost of data labeling. To this end, semi-supervised domain adaptation (SSDA) on graphs aims to leverage the knowledge of a labeled source graph to aid node classification on a target graph with limited labels. SSDA tasks need to overcome the domain gap between the source and target graphs, yet to date this challenging research problem has not been formally considered by existing approaches designed for cross-graph node classification. To tackle the SSDA problem on graphs, we propose a novel method called SemiGCL, which benefits from graph contrastive learning and minimax entropy training. SemiGCL generates informative node representations by contrasting the representations learned from a graph's local and global views. Additionally, SemiGCL is adversarially optimized with the entropy loss of unlabeled target nodes to reduce domain divergence. Experimental results on benchmark datasets demonstrate that SemiGCL outperforms state-of-the-art baselines on SSDA tasks.
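
    The minimax-entropy component can be illustrated with a gradient-reversal layer, in the spirit of MME-style SSDA rather than SemiGCL's full graph pipeline (the encoder below is a plain linear stand-in for a GNN): the classifier is updated to maximise the entropy of unlabeled target predictions while the encoder is updated to minimise it.

```python
# MME-style minimax entropy on unlabeled target nodes, with a plain linear
# encoder standing in for SemiGCL's GNN. The classifier maximises prediction
# entropy while the encoder minimises it, via a gradient-reversal layer.
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -grad  # flip gradients flowing back into the encoder

encoder = torch.nn.Linear(16, 8)   # stand-in for a GNN encoder
classifier = torch.nn.Linear(8, 3)
opt = torch.optim.Adam(list(encoder.parameters()) + list(classifier.parameters()))

x_unlabeled = torch.randn(32, 16)  # features of unlabeled target nodes
feats = GradReverse.apply(encoder(x_unlabeled))
probs = F.softmax(classifier(feats), dim=1)
entropy = -(probs * torch.log(probs + 1e-8)).sum(dim=1).mean()

opt.zero_grad()
(-entropy).backward()  # classifier ascends entropy; reversal makes the encoder descend it
opt.step()
```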

    Learn2Weight: Parameter Adaptation against Similar-domain Adversarial Attacks

    Recent work on black-box adversarial attacks against NLP systems has attracted much attention. Prior black-box attacks assume that attackers can observe output labels from the target model for selected inputs. In this work, inspired by adversarial transferability, we propose a new type of black-box NLP adversarial attack in which an attacker chooses a similar domain, transfers adversarial examples to the target domain, and thereby degrades the target model's performance. Based on domain adaptation theory, we then propose a defensive strategy, called Learn2Weight, which is trained to predict the weight adjustments for a target model in order to defend against attacks using similar-domain adversarial examples. Using Amazon multi-domain sentiment classification datasets, we empirically show that Learn2Weight is more effective against the attack than standard black-box defense methods such as adversarial training and defensive distillation. This work contributes to the growing literature on machine learning safety. Comment: Accepted at COLING 2022.
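
    Schematically, and heavily simplified relative to the paper's architecture, Learn2Weight can be pictured as a meta-network that maps a summary of incoming data to a perturbation of the target model's weights; all shapes and names below are illustrative.

```python
# Heavily simplified schematic of the Learn2Weight idea, not the paper's
# architecture: a meta-network maps a summary of incoming data to a small
# perturbation of the target model's weights before classification.
import torch
import torch.nn.functional as F

target = torch.nn.Linear(100, 2)                         # stand-in target classifier
n_params = sum(p.numel() for p in target.parameters())   # 2*100 weights + 2 biases
meta = torch.nn.Linear(100, n_params)                    # predicts weight adjustments

def adapted_logits(x_batch):
    delta = meta(x_batch.mean(dim=0))                    # domain summary -> weight delta
    flat = torch.cat([p.flatten() for p in target.parameters()]) + 0.01 * delta
    w, b = flat[:200].view(2, 100), flat[200:]           # unflatten into Linear(100, 2)
    return F.linear(x_batch, w, b)

print(adapted_logits(torch.randn(8, 100)).shape)         # torch.Size([8, 2])
```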