Good, but not always Fair: An Evaluation of Gender Bias for three Commercial Machine Translation Systems
Machine Translation (MT) continues to make significant strides in quality and is increasingly adopted on a larger scale. Consequently, analyses have been redirected to more nuanced aspects, intricate phenomena, as well as potential risks that may arise from the widespread use of MT tools. Along this line, this paper offers a meticulous assessment of three commercial MT systems - Google Translate, DeepL, and Modern MT - with a specific focus on gender translation and bias. For three language pairs (English-Spanish, English-Italian, and English-French), we scrutinize the behavior of such systems at several levels of granularity and on a variety of naturally occurring gender phenomena in translation. Our study takes stock of the current state of online MT tools by revealing significant discrepancies in the gender translation of the three systems, with each system displaying varying degrees of bias despite their overall translation quality.
Comment: Under review at HERMES Journal
Test Suites Task: Evaluation of Gender Fairness in MT with MuST-SHE and INES
As part of the WMT-2023 “Test suites” shared task, in this paper we summarize the results of two test suite evaluations: MuST-SHE-WMT23 and INES. Focusing on the en-de and de-en language pairs, we rely on these newly created test suites to investigate systems’ ability to translate feminine and masculine gender and produce gender-inclusive translations. Furthermore, we discuss metrics associated with our test suites and validate them by means of human evaluations. Our results indicate that systems achieve reasonable and comparable performance in correctly translating both feminine and masculine gender forms for naturalistic gender phenomena. By contrast, the generation of inclusive language forms in translation emerges as a challenging task for all the evaluated MT models, indicating room for future improvements and research on the topic. We make MuST-SHE-WMT23 and INES freely available.
Comment: Accepted at WMT 2023
On the Dynamics of Gender Learning in Speech Translation
Due to the complexity of bias and the opaque nature of current neural approaches, there is a rising interest in auditing language technologies. In this work, we contribute to such a line of inquiry by exploring the emergence of gender bias in Speech Translation (ST). As a new perspective, rather than focusing on the final systems only, we examine their evolution over the course of training. In this way, we are able to account for different variables related to the learning dynamics of gender translation, and investigate when and how gender divides emerge in ST. Accordingly, for three language pairs (en → es, fr, it) we compare how ST systems behave for masculine and feminine translation at several levels of granularity. We find that masculine and feminine curves are dissimilar, with the feminine one being characterized by more erratic behaviour and late improvements over the course of training. Also, depending on the considered phenomena, their learning trends can be either antiphase or parallel. Overall, we show how such a progressive analysis can inform on the reliability and time-wise acquisition of gender, which is concealed by static evaluations and standard metrics.
Gender Neutralization for an Inclusive Machine Translation: from Theoretical Foundations to Open Challenges
Gender inclusivity in language technologies has become a prominent research topic. In this study, we explore gender-neutral translation (GNT) as a form of gender inclusivity and a goal to be achieved by machine translation (MT) models, which have been found to perpetuate gender bias and discrimination. Specifically, we focus on translation from English into Italian, a language pair representative of salient gender-related linguistic transfer problems. To define GNT, we review a selection of relevant institutional guidelines for gender-inclusive language, discuss its scenarios of use, and examine the technical challenges of performing GNT in MT. We conclude with a discussion of potential solutions to encourage advancements toward greater inclusivity in MT.
Gender in Danger? Evaluating Speech Translation Technology on the MuST-SHE Corpus
Translating from languages without productive grammatical gender like English
into gender-marked languages is a well-known difficulty for machines. This
difficulty is also due to the fact that the training data on which models are
built typically reflect the asymmetries of natural languages, gender bias
included. Exclusively fed with textual data, machine translation is
intrinsically constrained by the fact that the input sentence does not always
contain clues about the gender identity of the referred human entities. But
what happens with speech translation, where the input is an audio signal? Can
audio provide additional information to reduce gender bias? We present the
first thorough investigation of gender bias in speech translation, contributing
with: i) the release of a benchmark useful for future studies, and ii) the
comparison of different technologies (cascade and end-to-end) on two language
directions (English-Italian/French).
Comment: 9 pages of content, accepted at ACL 202