Transductive Learning with String Kernels for Cross-Domain Text Classification

Author: AM Fernández
D Bollegala
G Ifrim
H Lodhi
J Shawe-Taylor
M Franco-Salvador
M Long
Marius Popescu
RT Ionescu
TG Dietterich
Publication date: 02/11/2018

For many text classification tasks, there is a major problem posed by the lack of labeled data in a target domain. Although classifiers for a target domain can be trained on labeled text data from a related source domain, the accuracy of such classifiers is usually lower in the cross-domain setting. Recently, string kernels have obtained state-of-the-art results in various text classification tasks such as native language identification or automatic essay scoring. Moreover, classifiers based on string kernels have been found to be robust to the distribution gap between different domains. In this paper, we formally describe an algorithm composed of two simple yet effective transductive learning approaches to further improve the results of string kernels in cross-domain settings. By adapting string kernels to the test set without using the ground-truth test labels, we report significantly better accuracy rates in cross-domain English polarity classification.

Comment: Accepted at ICONIP 2018. arXiv admin note: substantial text overlap with arXiv:1808.0840
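As a rough illustration of what a string kernel measures, the sketch below computes a character n-gram spectrum kernel between two texts. The abstract does not say which string kernel or which transductive steps the paper uses, so the kernel choice, the function names, and the default n = 3 are all assumptions made for this example. Note that the kernel needs only raw text, which is why it can be computed over unlabeled test documents as well as training documents.

```python
from collections import Counter
import math


def char_ngrams(text, n):
    """Count every contiguous character n-gram in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def spectrum_kernel(a, b, n=3):
    """Inner product of the n-gram count vectors of a and b.

    Assumption: a plain spectrum kernel, chosen for illustration only.
    """
    ca, cb = char_ngrams(a, n), char_ngrams(b, n)
    return sum(count * cb[gram] for gram, count in ca.items())


def normalized_kernel(a, b, n=3):
    """Scale the kernel so that identical non-trivial texts score 1.0."""
    denom = math.sqrt(spectrum_kernel(a, a, n) * spectrum_kernel(b, b, n))
    return spectrum_kernel(a, b, n) / denom if denom else 0.0
```

A kernel matrix built this way over training and test texts together uses no labels at all; how the paper actually exploits the test set is not described in the abstract.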

arXiv.org e-Print Archive

Crossref

Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing

Author: Berzak Yevgeni
Korhonen Anna
O'Horan Helen
Poibeau Thierry
Ponti Edoardo Maria
Reichart Roi
Shutova Ekaterina
Vulić Ivan
Publication date: 27/02/2019

Linguistic typology aims to capture structural and semantic variation across the world's languages. A large-scale typology could provide excellent guidance for multilingual Natural Language Processing (NLP), particularly for languages that suffer from a lack of human-labeled resources. We present an extensive literature survey on the use of typological information in the development of NLP techniques. Our survey demonstrates that to date, the use of information in existing typological databases has resulted in consistent but modest improvements in system performance. We show that this is due to both intrinsic limitations of databases (in terms of coverage and feature granularity) and under-employment of the typological features included in them. We advocate for a new approach that adapts the broad and discrete nature of typological categories to the contextual and continuous nature of machine learning algorithms used in contemporary NLP. In particular, we suggest that such an approach could be facilitated by recent developments in data-driven induction of typological knowledge.

arXiv.org e-Print Archive

Edinburgh Research Explorer

Apollo (Cambridge)

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Distributional correspondence indexing for cross-lingual and cross-domain sentiment classification

Author: Alejandro Fernández
Andrea Esuli
Fabrizio Sebastiani
Publication date: 20/01/2016

Domain Adaptation (DA) techniques aim at enabling machine learning methods to learn effective classifiers for a "target" domain when the only available training data belongs to a different "source" domain. In this paper we present the Distributional Correspondence Indexing (DCI) method for domain adaptation in sentiment classification. DCI derives term representations in a vector space common to both domains where each dimension reflects its distributional correspondence to a pivot, i.e., to a highly predictive term that behaves similarly across domains. Term correspondence is quantified by means of a distributional correspondence function (DCF). We propose a number of efficient DCFs that are motivated by the distributional hypothesis, i.e., the hypothesis according to which terms with similar meaning tend to have similar distributions in text. Experiments show that DCI obtains better performance than current state-of-the-art techniques for cross-lingual and cross-domain sentiment classification. DCI also brings about a significantly reduced computational cost, and requires a smaller amount of human intervention. As a final contribution, we discuss a more challenging formulation of the domain adaptation problem, in which both the cross-domain and cross-lingual dimensions are tackled simultaneously.
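The idea of representing a term by its correspondence to each pivot can be sketched as follows. This is a toy illustration, not the authors' implementation: the abstract does not name the DCFs, so cosine similarity between document-occurrence vectors is an assumed stand-in, and computing each profile from the term's own domain is likewise an assumption. The corpora, pivot list, and function names are made up for the example.

```python
import math


def occurrence_vector(term, docs):
    """Binary vector marking which documents contain the term."""
    return [1.0 if term in doc else 0.0 for doc in docs]


def cosine(u, v):
    """Cosine similarity; used here as an assumed, illustrative DCF."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def dci_profile(term, pivots, docs):
    """One dimension per pivot: the term's correspondence to that pivot."""
    t = occurrence_vector(term, docs)
    return [cosine(t, occurrence_vector(p, docs)) for p in pivots]


# Hypothetical toy corpora: each document is a set of tokens.
books = [{"gripping", "excellent"}, {"boring", "poor"}, {"gripping", "excellent"}]
kitchen = [{"sturdy", "excellent"}, {"flimsy", "poor"}, {"sturdy", "excellent"}]
pivots = ["excellent", "poor"]

# Each profile is computed from unlabeled documents of the term's own domain.
gripping = dci_profile("gripping", pivots, books)
sturdy = dci_profile("sturdy", pivots, kitchen)
```

In this toy data, "gripping" and "sturdy" never occur in the same domain, yet they receive the same profile because both co-occur with "excellent" and never with "poor". That shared pivot space is what lets a classifier trained on one domain be applied to the other.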

Open Access Repository

Distributional Correspondence Indexing for Cross-Lingual and Cross-Domain Sentiment Classification.

Publication venue: 'AI Access Foundation'

Crossref