Getting Gender Right in Neural Machine Translation
Speakers of different languages must attend to and encode strikingly different aspects of the world in order to use their language correctly (Sapir, 1921; Slobin, 1996). One such difference relates to the way gender is expressed in a language. Saying "I am happy" in English does not encode any additional knowledge about the speaker who uttered the sentence. Many other languages, however, have grammatical gender systems, and in them such knowledge would be encoded. To translate such a sentence correctly into, say, French, the inherent gender information needs to be retained or recovered: the same sentence becomes either "Je suis heureux" for a male speaker or "Je suis heureuse" for a female one. Apart from morphological agreement, demographic factors (gender, age, etc.) also influence our use of language in terms of word choices or even syntactic constructions (Tannen, 1991; Pennebaker et al., 2003). We integrate gender information into NMT systems. Our contribution is twofold: (1) the compilation of large datasets with speaker information for 20 language pairs, and (2) a simple set of experiments that incorporate gender information into NMT for multiple language pairs. Our experiments show that adding a gender feature to an NMT system significantly improves translation quality for some language pairs.
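One common way to supply such a speaker-gender feature to a sequence-to-sequence system is to prepend a tag token to each source sentence at training and inference time. The sketch below illustrates that idea; the tag names and the helper function are assumptions for illustration, not necessarily the paper's exact mechanism.

```python
def add_gender_tag(source_sentence: str, gender: str) -> str:
    """Prepend a speaker-gender token (e.g. <F> or <M>) to the source sentence.

    Unknown or missing gender falls back to a neutral <UNK> tag so the
    model can still be applied when no speaker metadata is available.
    """
    tag = {"female": "<F>", "male": "<M>"}.get(gender, "<UNK>")
    return f"{tag} {source_sentence}"

print(add_gender_tag("I am happy", "female"))  # <F> I am happy
```

The tag tokens would be added to the source vocabulary, letting the decoder condition on them when choosing gender-marked target forms such as "heureux" vs. "heureuse".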
Joint translation and unit conversion for end-to-end localization
A variety of natural language tasks require processing of textual data which
contains a mix of natural language and formal languages such as mathematical
expressions. In this paper, we take unit conversions as an example and propose
a data augmentation technique which leads to models learning both the translation and the conversion task, as well as how to adequately switch between them, for end-to-end localization.
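Such augmentation can be pictured as synthesising parallel sentence pairs in which a measurement is simultaneously translated and converted to the target locale's units, so the model observes both tasks in a single example. The generator below is a hypothetical sketch of that idea (the sentence templates and the English-French pair are illustrative assumptions).

```python
MILES_TO_KM = 1.609344  # exact length of the international mile in km

def unit_conversion_pair(miles: int) -> tuple[str, str]:
    """Build one synthetic (source, target) training pair in which the
    target is both translated into French and converted to kilometres."""
    km = miles * MILES_TO_KM
    src = f"The town is {miles} miles away."
    tgt = f"La ville est à {km:.1f} km."
    return src, tgt

print(unit_conversion_pair(10))
# ('The town is 10 miles away.', 'La ville est à 16.1 km.')
```

Varying the quantities, unit types, and templates yields augmented data from which a model can learn when to translate text and when to hand off to arithmetic conversion.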
Disembodied Machine Learning: On the Illusion of Objectivity in NLP
Machine Learning seeks to identify and encode bodies of knowledge within
provided datasets. However, data encodes subjective content, which determines
the possible outcomes of the models trained on it. Because such subjectivity
enables marginalisation of parts of society, it is termed (social) `bias' and
sought to be removed. In this paper, we contextualise this discourse of bias in
the ML community against the subjective choices in the development process.
Through a consideration of how choices in data and model development construct
subjectivity, or biases that are represented in a model, we argue that
addressing and mitigating biases is near-impossible. This is because both data and ML models are objects for which meaning is made at each step of the development pipeline, from data selection through annotation to model training and analysis. Accordingly, we find the prevalent discourse of bias limited in its ability to address social marginalisation. We recommend being conscientious of this, and accepting that de-biasing methods correct for only a fraction of biases.
Comment: In review
Gender Representation in Open Source Speech Resources
With the rise of artificial intelligence (AI) and the growing use of
deep-learning architectures, the question of ethics, transparency and fairness
of AI systems has become a central concern within the research community. We
address transparency and fairness in spoken language systems by proposing a
study about gender representation in speech resources available through the
Open Speech and Language Resource platform. We show that finding gender
information in open source corpora is not straightforward and that gender
balance depends on other corpus characteristics (elicited/non-elicited speech, low- or high-resource language, targeted speech task). The paper ends with recommendations about metadata and gender information to help researchers ensure better transparency of the speech systems built using such corpora.
Comment: accepted to LREC202
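A gender-representation audit of this kind ultimately reduces to tallying speaker-gender labels across a corpus's metadata, including the utterances for which no label can be recovered. The helper below is a minimal sketch assuming each utterance is a record with an optional "gender" field; the field name and label scheme are assumptions, not the OpenSLR corpora's actual metadata format.

```python
from collections import Counter

def gender_balance(records: list[dict]) -> dict[str, float]:
    """Return the fraction of utterances per speaker-gender label.

    Records lacking gender metadata are counted under 'unknown', so the
    result also surfaces how much of the corpus is undocumented.
    """
    counts = Counter(r.get("gender", "unknown") for r in records)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

corpus = [{"gender": "F"}, {"gender": "F"}, {"gender": "M"}, {}]
print(gender_balance(corpus))  # {'F': 0.5, 'M': 0.25, 'unknown': 0.25}
```

Reporting the 'unknown' share alongside the female/male split directly supports the paper's recommendation that corpus metadata make gender information explicit.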