1,479 research outputs found

Evaluating Gender Bias in Speech Translation

Author: Basta Christine
Costa-jussà Marta R.
Gállego Gerard I.
Publication date: 02/07/2021

The scientific community is increasingly aware of the necessity to embrace pluralism and consistently represent major and minor social groups. Currently, there are no standard evaluation techniques for different types of biases. Accordingly, there is an urgent need to provide evaluation sets and protocols to measure existing biases in our automatic systems. Evaluating the biases should be an essential step towards mitigating them in the systems. This paper introduces WinoST, a new freely available challenge set for evaluating gender bias in speech translation. WinoST is the speech version of WinoMT, which is an MT challenge set, and both follow an evaluation protocol to measure gender accuracy. Using a state-of-the-art end-to-end speech translation system, we report the gender bias evaluation on four language pairs and we show that gender accuracy in speech translation is more than 23% lower than in MT. Comment: Preprin

arXiv.org e-Print Archive

UPCommons. Portal del coneixement obert de la UPC

Investigating Gender Bias in Machine Translation. A Case Study between English and Italian

Author: Alessandra Luccioli Ester Dolei, Chiara Xausa
Publication date: 01/01/2020

Neural machine translation systems have substantially improved the quality of translation output, yet many issues still need to be addressed: one major problem to be addressed concerns the presence of gender bias, the prejudice against one gender based on the perception that women and men are not equal. In this work, we will manually evaluate the translation of a sentence pattern previously employed for similar purposes by Escudé Font and Costa-jussà (2019) in the English-Italian language combination using two of the most popular MT systems, DeepL and Google Translate. The sets of sentences include 40 male- and female-dominated occupations and three adjectives, beautiful, wise and strong. The aim of this study is to evaluate gender bias, that becomes apparent when translating from a gender-neutral language to a gender-marked language, and to verify whether adjectives usually associated with female or male entities can affect the final MT output. Furthermore, we provide some relevant insights about gender bias in MT for post-editors and MT users, with a particular focus on the under-representation of women in the Italian language.

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Addressing the Blind Spots in Spoken Language Processing

Author: Moryossef Amit
Publication date: 01/09/2023

This paper explores the critical but often overlooked role of non-verbal cues, including co-speech gestures and facial expressions, in human communication and their implications for Natural Language Processing (NLP). We argue that understanding human communication requires a more holistic approach that goes beyond textual or spoken words to include non-verbal elements. Borrowing from advances in sign language processing, we propose the development of universal automatic gesture segmentation and transcription models to transcribe these non-verbal cues into textual form. Such a methodology aims to bridge the blind spots in spoken language understanding, enhancing the scope and applicability of NLP models. Through motivating examples, we demonstrate the limitations of relying solely on text-based models. We propose a computationally efficient and flexible approach for incorporating non-verbal cues, which can seamlessly integrate with existing NLP pipelines. We conclude by calling upon the research community to contribute to the development of universal transcription methods and to validate their effectiveness in capturing the complexities of real-world, multi-modal interactions.

ZORA

Addressing the Blind Spots in Spoken Language Processing

Author: Moryossef Amit
Publication date: 06/09/2023

arXiv.org e-Print Archive

In the Name of Fairness: Assessing the Bias in Clinical Record De-identification

Author: Ghassemi Marzyeh
Lim Shulammite
Pollard Tom Joseph
Xiao Yuxin
Publication date: 02/01/2024

Data sharing is crucial for open science and reproducible research, but the legal sharing of clinical data requires the removal of protected health information from electronic health records. This process, known as de-identification, is often achieved through the use of machine learning algorithms by many commercial and open-source systems. While these systems have shown compelling results on average, the variation in their performance across different demographic groups has not been thoroughly examined. In this work, we investigate the bias of de-identification systems on names in clinical notes via a large-scale empirical analysis. To achieve this, we create 16 name sets that vary along four demographic dimensions: gender, race, name popularity, and the decade of popularity. We insert these names into 100 manually curated clinical templates and evaluate the performance of nine public and private de-identification methods. Our findings reveal that there are statistically significant performance gaps along a majority of the demographic dimensions in most methods. We further illustrate that de-identification quality is affected by polysemy in names, gender context, and clinical note characteristics. To mitigate the identified gaps, we propose a simple and method-agnostic solution by fine-tuning de-identification methods with clinical context and diverse names. Overall, it is imperative to address the bias in existing methods immediately so that downstream stakeholders can build high-quality systems to serve all demographic parties fairly. Comment: Accepted by FAccT 2023; updated appendix with the de-identification performance of GPT-

arXiv.org e-Print Archive

Gender in Danger? Evaluating Speech Translation Technology on the MuST-SHE Corpus

Author: Bentivogli Luisa
Cattoni Roldano
Di Gangi Mattia Antonino
Negri Matteo
Savoldi Beatrice
Turchi Marco
Publication date: 10/06/2020

Translating from languages without productive grammatical gender like English into gender-marked languages is a well-known difficulty for machines. This difficulty is also due to the fact that the training data on which models are built typically reflect the asymmetries of natural languages, gender bias included. Exclusively fed with textual data, machine translation is intrinsically constrained by the fact that the input sentence does not always contain clues about the gender identity of the referred human entities. But what happens with speech translation, where the input is an audio signal? Can audio provide additional information to reduce gender bias? We present the first thorough investigation of gender bias in speech translation, contributing with: i) the release of a benchmark useful for future studies, and ii) the comparison of different technologies (cascade and end-to-end) on two language directions (English-Italian/French). Comment: 9 pages of content, accepted at ACL 202

arXiv.org e-Print Archive

Archivio della ricerca - Fondazione Bruno Kessler

Gender Bias in Machine Translation and The Era of Large Language Models

Author: Vanmassenhove Eva
Publication date: 18/01/2024

This chapter examines the role of Machine Translation in perpetuating gender bias, highlighting the challenges posed by cross-linguistic settings and statistical dependencies. A comprehensive overview of relevant existing work related to gender bias in both conventional Neural Machine Translation approaches and Generative Pretrained Transformer models employed as Machine Translation systems is provided. Through an experiment using ChatGPT (based on GPT-3.5) in an English-Italian translation context, we further assess ChatGPT's current capacity to address gender bias. The findings emphasize the ongoing need for advancements in mitigating bias in Machine Translation systems and underscore the importance of fostering fairness and inclusivity in language technologies. Comment: 24 page

arXiv.org e-Print Archive

Monolingual and Multilingual Reduction of Gender Bias in Contextualized Representations

Author: Dufter Philipp
Liang Sheng
Schütze Hinrich
Publication date: 01/01/2020

Crossref

Open Access LMU