Optimising Multiple Metrics with MERT

Author: Schwenk Holger
Servan Christophe
Publication venue: HAL CCSD
Publication date: 01/01/2011

The main metric used for SMT system evaluation and optimisation is the BLEU score, but its relevance to human evaluation has been questioned. Other metrics exist, but none of them is in perfect harmony with human evaluation. On the other hand, most evaluations use multiple metrics (BLEU, TER, METEOR, etc.). Systems can optimise toward metrics other than BLEU, but doing so tends to decrease the BLEU score. As Machine Translation evaluations still use BLEU as the main metric, it is important to minimise this decrease. We propose to optimise toward a metric combination such as BLEU-TER. This proposal includes two new open-source scorers for MERT, the SMT optimisation tool. The first is a TER scorer that allows us to optimise toward TER; the second is a combination scorer, which enables the combination of two or more metrics in the optimisation process. This paper also presents experiments on MERT optimisation in the Statistical Machine Translation system Moses with the TER and BLEU metrics and some metric combinations.
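
As an illustration of the combination-scorer idea described in the abstract above, here is a minimal sketch. It is not the authors' MERT scorer: the function name `combine_metrics`, the equal weights, the sign convention (error metrics such as TER are negated so that higher is always better), and the example scores are all assumptions made for illustration.

```python
from typing import Dict, Iterable


def combine_metrics(
    scores: Dict[str, float],
    weights: Dict[str, float],
    lower_is_better: Iterable[str] = (),
) -> float:
    """Weighted average of metric scores, with error metrics negated.

    Hypothetical helper: the weighting scheme is an assumption, not the
    formula used by the paper or by Moses.
    """
    error_metrics = set(lower_is_better)
    total_weight = sum(weights[name] for name in scores)
    combined = 0.0
    for name, value in scores.items():
        # Flip the sign of error rates so that "higher is better" holds
        # for every term of the combination.
        signed = -value if name in error_metrics else value
        combined += weights[name] * signed
    return combined / total_weight


# Made-up scores for two candidate weight settings of a decoder.
systems = {
    "A": {"BLEU": 0.30, "TER": 0.50},
    "B": {"BLEU": 0.29, "TER": 0.45},
}
equal_weights = {"BLEU": 1.0, "TER": 1.0}

# Pick the setting with the highest combined score.
best = max(
    systems,
    key=lambda s: combine_metrics(systems[s], equal_weights, lower_is_better={"TER"}),
)
```

With these invented numbers, setting B wins under the combined objective even though its BLEU is slightly lower, because its TER is better by a larger margin. This is the kind of trade-off a combined objective is meant to expose.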

HAL Descartes

Masked Language Model Scoring

Author: Kirchhoff Katrin
Liang Davis
Nguyen Toan Q.
Salazar Julian
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2020

Pretrained masked language models (MLMs) require finetuning for most NLP tasks. Instead, we evaluate MLMs out of the box via their pseudo-log-likelihood scores (PLLs), which are computed by masking tokens one by one. We show that PLLs outperform scores from autoregressive language models like GPT-2 in a variety of tasks. By rescoring ASR and NMT hypotheses, RoBERTa reduces an end-to-end LibriSpeech model's WER by 30% relative and adds up to +1.7 BLEU on state-of-the-art baselines for low-resource translation pairs, with further gains from domain adaptation. We attribute this success to PLL's unsupervised expression of linguistic acceptability without a left-to-right bias, greatly improving on scores from GPT-2 (+10 points on island effects, NPI licensing in BLiMP). One can finetune MLMs to give scores without masking, enabling computation in a single inference pass. In all, PLLs and their associated pseudo-perplexities (PPPLs) enable plug-and-play use of the growing number of pretrained MLMs; e.g., we use a single cross-lingual model to rescore translations in multiple languages. We release our library for language model scoring at https://github.com/awslabs/mlm-scoring. Comment: ACL 2020 camera-ready (presented July 2020)
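
The masking procedure described in the abstract above can be sketched as follows. This is a toy illustration, not the API of the released mlm-scoring library: `pseudo_log_likelihood`, `pseudo_perplexity`, and the stand-in `uniform_model` are hypothetical names, and a real setup would replace `uniform_model` with a call to a pretrained MLM that returns the log-probability of the original token at the masked position.

```python
import math
from typing import Callable, List, Sequence

MASK = "[MASK]"

# A scorer takes (masked tokens, masked position, original token) and
# returns log P(original token | all other tokens).
Scorer = Callable[[List[str], int, str], float]


def pseudo_log_likelihood(tokens: Sequence[str], masked_log_prob: Scorer) -> float:
    """Sum the log-probability of each token when it alone is masked."""
    total = 0.0
    for i, target in enumerate(tokens):
        masked = list(tokens)
        masked[i] = MASK  # mask exactly one position per pass
        total += masked_log_prob(masked, i, target)
    return total


def pseudo_perplexity(sentences: Sequence[Sequence[str]], masked_log_prob: Scorer) -> float:
    """exp of the negative mean per-token PLL over a corpus."""
    total_pll = sum(pseudo_log_likelihood(s, masked_log_prob) for s in sentences)
    n_tokens = sum(len(s) for s in sentences)
    return math.exp(-total_pll / n_tokens)


def uniform_model(masked: List[str], position: int, target: str, vocab_size: int = 4) -> float:
    """Stand-in scorer (assumption): every token is equally likely."""
    return math.log(1.0 / vocab_size)
```

Under the uniform stand-in, a three-token sentence scores 3 · log(1/4) and the pseudo-perplexity equals the assumed vocabulary size of 4, which is a quick sanity check on the arithmetic. Note that scoring a sentence of N tokens this way needs N forward passes, which is the cost that the single-pass finetuned variant mentioned in the abstract is meant to avoid.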

arXiv.org e-Print Archive

Crossref

Recent Trends in Computational Intelligence

Publication venue: 'IntechOpen'
Publication date: 20/04/2021

Traditional models struggle to cope with complexity, noise, and changing environments, while Computational Intelligence (CI) offers solutions to complicated problems as well as inverse problems. The main feature of CI is adaptability, spanning the fields of machine learning and computational neuroscience. CI also comprises biologically inspired technologies such as swarm intelligence, as part of evolutionary computation, and encompasses wider areas such as image processing, data collection, and natural language processing. This book discusses the use of CI to solve a variety of applications optimally, demonstrating its wide reach and relevance. Combining optimization methods with data mining strategies makes a strong and reliable prediction tool for handling real-life applications.

Directory of Open Access Books (DOAB)

Findings of the IWSLT 2022 Evaluation Campaign.

Author: Alexander Waibel
Anna Currey
Antonios Anastasopoulos
Barry Haddow
Benjamin Hsu
Changhan Wang
Christian Federmann
Clara Emmanuel
Dávid Javorský
Elizabeth Salesky
Georgiana Dinu
Hongyu Gong
Jan Niehues
Jiatong Shi
John Ortega
Juan Pino
Katsuhito Sudoh
Kenton Murray
Kevin Duh
Loïc Barrault
Luisa Bentivogli
Maha Elbayad
Marcello Federico
Marcely Zanon Boito
Marco Turchi
Maria Nădejde
Matteo Negri
Matthias Sperber
Ondřej Bojar
Paul McNamee
Prashant Mathur
Roldano Cattoni
Roman Grundkiewicz
Satoshi Nakamura
Sebastian Stüker
Shinji Watanabe
Souhir Gahbiche
Surafel Lakew
Věra Kloudová
Xing Niu
Xutai Ma
Yannick Estève
Yogesh Virkar
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2022

The evaluation campaign of the 19th International Conference on Spoken Language Translation featured eight shared tasks: (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Speech-to-speech translation, (iv) Low-resource speech translation, (v) Multilingual speech translation, (vi) Dialect speech translation, (vii) Formality control for speech translation, (viii) Isometric speech translation. A total of 27 teams participated in at least one of the shared tasks. This paper details, for each shared task, the purpose of the task, the data that were released, the evaluation metrics that were applied, the submissions that were received, and the results that were achieved.

Archivio della ricerca - Fondazione Bruno Kessler

A Systematic Study of Performance Disparities in Multilingual Task-Oriented Dialogue Systems

Author: Gritta Milan
Hu Songbo
Iacobacci Ignacio
Korhonen Anna
Vulić Ivan
Yuan Moy
Zhang Guchun
Zhou Han
Publication date: 19/10/2023

Achieving robust language technologies that can perform well across the world's many languages is a central goal of multilingual NLP. In this work, we take stock of and empirically analyse task performance disparities that exist between multilingual task-oriented dialogue (ToD) systems. We first define new quantitative measures of absolute and relative equivalence in system performance, capturing disparities across languages and within individual languages. Through a series of controlled experiments, we demonstrate that performance disparities depend on a number of factors: the nature of the ToD task at hand, the underlying pretrained language model, the target language, and the amount of ToD annotated data. We empirically prove the existence of the adaptation and intrinsic biases in current ToD systems: e.g., ToD systems trained for Arabic or Turkish using annotated ToD data fully parallel to English ToD data still exhibit diminished ToD task performance. Beyond providing a series of insights into the performance disparities of ToD systems in different languages, our analyses offer practical tips on how to approach ToD data collection and system development for new languages. Comment: Accepted to EMNLP 2023

arXiv.org e-Print Archive

Findings of the IWSLT 2022 Evaluation Campaign

Author: Anastasopoulos Antonios
Barrault Loïc
Bentivogli Luisa
Bojar Ondřej
Cattoni Roldano
Currey Anna
Dinu Georgiana
Duh Kevin
Elbayad Maha
Emmanuel Clara
Estève Yannick
Federico Marcello
Federmann Christian
Gahbiche Souhir
Gong Hongyu
Grundkiewicz Roman
Haddow Barry
Hsu Benjamin
Javorský Dávid
Kloudová Věra
Lakew Surafel
Ma Xutai
Mathur Prashant
McNamee Paul
Murray Kenton
Nakamura Satoshi
Negri Matteo
Niehues Jan
Niu Xing
Nădejde Maria
Ortega John
Pino Juan
Salesky Elizabeth
Shi Jiatong
Sperber Matthias
Stüker Sebastian
Sudoh Katsuhito
Turchi Marco
Virkar Yogesh
Waibel Alexander
Wang Changhan
Watanabe Shinji
Zanon Boito Marcely
Publication venue: Association for Computational Linguistics
Publication date: 21/06/2022

KITopen

Comparative Evaluation of Translation Memory (TM) and Machine Translation (MT) Systems in Translation between Arabic and English

Author: KHALED MILAD
Publication venue: 'Swansea University'
Publication date: 01/01/2021

In general, advances in translation technology tools have enhanced translation quality significantly. Unfortunately, however, it seems that this is not the case for all language pairs. A concern arises when the users of translation tools want to work between different language families such as Arabic and English. The main problems facing Arabic-English translation tools lie in Arabic’s characteristic free word order, richness of word inflection – including orthographic ambiguity – and optionality of diacritics, in addition to a lack of data resources. The aim of this study is to compare the performance of translation memory (TM) and machine translation (MT) systems in translating between Arabic and English. The research evaluates the two systems based on specific criteria relating to needs and expected results. The first part of the thesis evaluates the performance of a set of well-known TM systems when retrieving a segment of text that includes an Arabic linguistic feature. As it is widely known that TM matching metrics are based solely on the use of edit distance string measurements, it was expected that the aforementioned issues would lead to a low match percentage. The second part of the thesis evaluates multiple MT systems that use the mainstream neural machine translation (NMT) approach to translation quality. Due to a lack of training data resources and Arabic’s rich morphology, it was anticipated that Arabic features would reduce the translation quality of this corpus-based approach. The systems’ output was evaluated using both automatic evaluation metrics, including BLEU and hLEPOR, and TAUS human quality ranking criteria for adequacy and fluency. The study employed a black-box testing methodology to experimentally examine the TM systems through a test suite instrument and also to translate Arabic-English sentences to collect the MT systems’ output.
A translation threshold was used to evaluate the fuzzy matches of TM systems, while an online survey was used to collect participants’ responses to the quality of the MT systems’ output. The input to the experiments for both systems was extracted from Arabic-English corpora, which were examined by means of quantitative data analysis. The results show that, when retrieving translations, the current TM matching metrics are unable to recognise Arabic features and score them appropriately. In terms of automatic translation, MT produced good results for adequacy, especially when translating from Arabic to English, but the systems’ output appeared to need post-editing for fluency. Moreover, when retrieving from Arabic, it was found that short sentences were handled much better by MT than by TM. The findings may be given as recommendations to software developers.
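
To make the abstract's point about edit-distance matching concrete, here is a minimal sketch of a character-level fuzzy match score. It is an illustration only: commercial TM systems use their own, usually undisclosed, variants. The function names, the normalisation by the longer string, and the example word are assumptions, not material from the thesis.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insert, delete, substitute) over code points."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def fuzzy_match(a: str, b: str) -> float:
    """Similarity in [0, 1]: 1 minus distance over the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# The same Arabic word written without and with short-vowel diacritics
# (escaped code points to avoid encoding issues).
plain = "\u0643\u062A\u0628"                       # three base letters
vocalised = "\u0643\u064E\u062A\u064E\u0628\u064E"  # same letters, each followed by fatha
score = fuzzy_match(plain, vocalised)
```

Here `score` is 0.5: three inserted diacritic marks over a longer length of six code points. A purely string-based metric therefore treats two spellings of one word as only a 50% match, which illustrates why optional diacritics could lower match percentages.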

Cronfa at Swansea University

A study of the translation of sentiment in user-generated text

Author: Saadany Hadeel
Publication venue: University of Wolverhampton
Publication date: 01/10/2022

A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy. Emotions are biological states of feeling that humans may verbally express to communicate their negative or positive mood, influence others, or even inflict harm. Although emotions such as anger, happiness, affection, or fear are supposedly universal experiences, the lingual realisation of the emotional experience may vary in subtle ways across different languages. For this reason, preserving the original sentiment of the source text has always been a challenging task that draws on a translator's competence and finesse. In the professional translation industry, an incorrect translation of the sentiment-carrying lexicon is considered a critical error, as it can be misleading or in some cases harmful, since it misses the fundamental aspect of the source text, i.e. the author's sentiment. Since the advent of Neural Machine Translation (NMT), there has been a tremendous improvement in the quality of automatic translation. This has led to extensive use of online NMT tools to translate User-Generated Text (UGT) such as reviews, tweets, and social media posts, where the main message is often the author's positive or negative attitude towards an entity. In such scenarios, the process of translating the user's sentiment is entirely automatic, with no human intervention for either post-editing or accuracy checking. However, NMT output still lacks accuracy in some low-resource languages and sometimes makes critical translation errors that may not only distort the sentiment but at times flip the polarity of the source text to its exact opposite. In this thesis, we tackle the translation of sentiment in UGT by NMT systems from two perspectives: analytical and experimental.
First, the analytical approach introduces a list of linguistic features that can lead to a mistranslation of fine-grained emotions between different language pairs in the UGT domain. It also presents an error typology specific to Arabic UGT, illustrating the main linguistic phenomena that can cause mistranslation of sentiment polarity when translating Arabic UGT into English by NMT systems. Second, the experimental approach attempts to improve the translation of sentiment by addressing some of the linguistic challenges identified in the analysis as causing mistranslation of sentiment both on the word level and on the sentence level. On the word level, we propose a Transformer NMT model trained on a sentiment-oriented vector space model (VSM) of UGT data that is capable of translating the correct sentiment polarity of challenging contronyms. On the sentence level, we propose a semi-supervised approach to overcome the problem of translating sentiment expressed by dialectical language in UGT data. We take the translation of dialectical Arabic UGT into English as a case study. Our semi-supervised AR-EN NMT model shows improved performance over the online MT Twitter tool in translating dialectical Arabic UGT, not only in terms of translation quality but also in the preservation of the sentiment polarity of the source text. The experimental section also presents an empirical method to quantify the notion of sentiment transfer by an MT system and, more concretely, to modify automatic metrics such that their MT ranking comes closer to a human judgement of a poor or good translation of sentiment.

Wolverhampton Intellectual Repository and E-theses