
Disembodied Machine Learning: On the Illusion of Objectivity in NLP

Authors: Augenstein Isabelle; Bingel Joachim; Lulz Smarika; Waseem Zeerak
Publication date: 01/01/2020

Machine Learning seeks to identify and encode bodies of knowledge within provided datasets. However, data encodes subjective content, which determines the possible outcomes of the models trained on it. Because such subjectivity enables marginalisation of parts of society, it is termed (social) 'bias' and sought to be removed. In this paper, we contextualise this discourse of bias in the ML community against the subjective choices in the development process. Through a consideration of how choices in data and model development construct subjectivity, or biases that are represented in a model, we argue that addressing and mitigating biases is near-impossible. This is because both data and ML models are objects for which meaning is made in each step of the development pipeline, from data selection through annotation to model training and analysis. Accordingly, we find the prevalent discourse of bias limiting in its ability to address social marginalisation. We recommend being conscientious of this, and accepting that de-biasing methods only correct for a fraction of biases.
Comment: In review

arXiv.org e-Print Archive

Copenhagen University Research Information System

Automatic Translation of Hate Speech to Non-hate Speech in Social Media Texts

Authors: Kolesnikova Olga; Kostiuk Yevhen; Sidorov Grigori; Tonja Atnafu Lambebo
Publication date: 02/06/2023

In this paper, we investigate the issue of hate speech by presenting a novel task: translating hate speech into non-hate speech text while preserving its meaning. As a case study, we use Spanish texts. We provide a dataset and several baselines as a starting point for further research on the task. We evaluated our baseline results using multiple metrics, including BLEU scores. The aim of this study is to contribute to the development of more effective methods for reducing the spread of hate speech in online communities.

arXiv.org e-Print Archive

Simplifying, reading, and machine translating health content: an empirical investigation of usability

Author: Rossetti Alessandra
Publication venue: Dublin City University. Centre for Translation and Textual Studies (CTTS)
Publication date: 01/11/2019

Text simplification, through plain language (PL) or controlled language (CL), is adopted to increase readability, comprehension and machine translatability of (health) content. Cochrane is a non-profit organisation where volunteer authors summarise and simplify health-related English texts on the impact of treatments and interventions into plain language summaries (PLS), which are then disseminated online to the lay audience and translated. Cochrane's simplification approach is non-automated, and involves the manual checking and implementation of different sets of PL guidelines, which can be an unsatisfactory, challenging and time-consuming task. This thesis examined whether using the Acrolinx CL checker to automatically and consistently check PLS for readability and translatability issues would increase the usability of Cochrane's simplification approach and, more precisely: (i) authors' satisfaction; and (ii) authors' effectiveness in terms of readability, comprehensibility, and machine translatability into Spanish. Data on satisfaction were collected from twelve Cochrane authors by means of the System Usability Scale and follow-up preference questions. Readability was analysed through the computational tool Coh-Metrix. Evidence on comprehensibility was gathered through ratings and recall protocols produced by lay readers, both native and non-native speakers of English. Machine translatability was assessed in terms of adequacy and fluency with forty-one Cochrane contributors, all native speakers of Spanish. Authors seemed to welcome the introduction of Acrolinx, and the adoption of this CL checker reduced word length, sentence length, and syntactic complexity. No significant impact on comprehensibility and machine translatability was identified. We observed that reading skills and characteristics other than simplified language (e.g. formatting) might influence comprehension. Machine translation quality was relatively high, with mainly style issues.
This thesis presented an environment that could boost volunteer authors' satisfaction and foster their adoption of simple language. We also discussed strategies to increase the accessibility of online health content among lay readers with different skills and language backgrounds.

Irish Universities

DCU Online Research Access Service