Automatic Reference-Based Evaluation of Pronoun Translation Misses the Point
We compare the performance of the APT and AutoPRF metrics for pronoun
translation against a manually annotated dataset comprising human judgements as
to the correctness of translations of the PROTEST test suite. Although there is
some correlation with the human judgements, a range of issues limit the
performance of the automated metrics. Instead, we recommend the use of
semi-automatic metrics and test suites in place of fully automatic metrics.
Comment: EMNLP 201
Arabic and English Relative Clauses and Machine Translation Challenges
The study performs an error analysis and evaluates the quality of neural machine translation (NMT), represented by Google Translate, when translating relative clauses. It uses two test suites composed of sentences that contain relative clauses: the first consists of 108 sentence pairs translated from English into Arabic, whereas the second consists of 72 Arabic sentences translated into English. Error annotation is performed by six professional annotators. The study presents a list of the annotated errors, divided into accuracy and fluency errors according to the MQM framework. Manual evaluation is also performed by the six professionals, along with automatic BLEU evaluation using the Tilde Me platform. The results show that fluency errors are more frequent than accuracy errors. They also show that errors are more frequent, and MT quality lower, when translating from English into Arabic than when translating from Arabic into English. Based on the error analysis and on both the manual and automatic evaluation, the study concludes that the gap between MT and professional human translation is still large.
Toward Gender-Inclusive Coreference Resolution
Correctly resolving textual mentions of people fundamentally entails making
inferences about those people. Such inferences raise the risk of systemic
biases in coreference resolution systems, including biases that can harm binary
and non-binary trans and cis stakeholders. To better understand such biases, we
foreground nuanced conceptualizations of gender from sociology and
sociolinguistics, and develop two new datasets for interrogating bias in crowd
annotations and in existing coreference resolution systems. Through these
studies, conducted on English text, we confirm that without acknowledging and
building systems that recognize the complexity of gender, we build systems that
lead to many potential harms.
Comment: 28 pages; ACL version