
728 research outputs found

Cross-lingual Transfer Can Worsen Bias in Sentiment Analysis

Author: Goldfarb-Tarrant Seraphina
Lopez Adam
Ross Björn
Publication date: 06/12/2023

Sentiment analysis (SA) systems are widely deployed in many of the world's languages, and there is well-documented evidence of demographic bias in these systems. In languages beyond English, scarcer training data is often supplemented with transfer learning using pre-trained models, including multilingual models trained on other languages. In some cases, even supervision data comes from other languages. Does cross-lingual transfer also import new biases? To answer this question, we use counterfactual evaluation to test whether gender or racial biases are imported when using cross-lingual transfer, compared to a monolingual transfer setting. Across five languages, we find that systems using cross-lingual transfer usually become more biased than their monolingual counterparts. We also find racial biases to be much more prevalent than gender biases. To spur further research on this topic, we release the sentiment models we used for this study, and the intermediate checkpoints throughout training, yielding 1,525 distinct models; we also release our evaluation code.

Edinburgh Research Explorer

Bleaching Text: Abstract Features for Cross-lingual Gender Prediction

Author: Ljubešić Nikola
Matroos Ian
Nissim Malvina
Plank Barbara
van der Goot Rob
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2018

Gender prediction has typically focused on lexical and social network features, yielding good performance, but making systems highly language-, topic-, and platform-dependent. Cross-lingual embeddings circumvent some of these limitations, but capture gender-specific style less. We propose an alternative: bleaching text, i.e., transforming lexical strings into more abstract features. This study provides evidence that such features allow for better transfer across languages. Moreover, we present a first study on the ability of humans to perform cross-lingual gender prediction. We find that human predictive power proves similar to that of our bleached models, and both perform better than lexical models.

Comment: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics

arXiv.org e-Print Archive

Crossref

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

The IT University of Copenhagen's Repository

Dissertations of the University of Groningen

Cross-lingual Offensive Language Detection: A Systematic Review of Datasets, Transfer Approaches and Challenges

Author: Jiang Aiqi
Zubiaga Arkaitz
Publication date: 17/01/2024

The growing prevalence and rapid evolution of offensive language in social media amplify the complexities of detection, particularly highlighting the challenges in identifying such content across diverse languages. This survey presents a systematic and comprehensive exploration of Cross-Lingual Transfer Learning (CLTL) techniques in offensive language detection in social media. Our study stands as the first holistic overview to focus exclusively on the cross-lingual scenario in this domain. We analyse 67 relevant papers and categorise these studies across various dimensions, including the characteristics of multilingual datasets used, the cross-lingual resources employed, and the specific CLTL strategies implemented. According to "what to transfer", we also summarise three main CLTL transfer approaches: instance, feature, and parameter transfer. Additionally, we shed light on the current challenges and future research opportunities in this field. Furthermore, we have made our survey resources available online, including two comprehensive tables that provide accessible references to the multilingual datasets and CLTL methods used in the reviewed literature.

Comment: 35 pages, 7 figures

arXiv.org e-Print Archive

How Different is Stereotypical Bias in Different Languages?

Author: Öztürk Ibrahim Tolga
Publication date: 28/09/2022

Open Access LMU

SexWEs: Domain-Aware Word Embeddings via Cross-Lingual Semantic Specialisation for Chinese Sexism Detection in Social Media

Author: Jiang A
Zubiaga A
Publication venue: Seventeenth International AAAI Conference on Web and Social Media, Association for the Advancement of Artificial Intelligence (AAAI)
Publication date: 02/06/2023

Queen Mary Research Online