Search CORE

130 research outputs found

Results of the WMT19 metrics shared task: segment-level and strong MT systems pose big challenges

Author: Bojar Ondřej
Graham Yvette
Ma Qingsong
Wei Johnny Tian-Zheng
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/08/2019
Field of study

This paper presents the results of the WMT19 Metrics Shared Task. Participants were asked to score the outputs of the translations systems competing in the WMT19 News Translation Task with automatic metrics. 13 research groups submitted 24 metrics, 10 of which are reference-less "metrics" and constitute submissions to the joint task with WMT19 Quality Estimation Task, "QE as a Metric". In addition, we computed 11 baseline metrics, with 8 commonly applied baselines (BLEU, SentBLEU, NIST, WER, PER, TER, CDER, and chrF) and 3 reimplementations (chrF+, sacreBLEU-BLEU, and sacreBLEU-chrF). Metrics were evaluated on the system level, how well a given metric correlates with the WMT19 official manual ranking, and segment level, how well the metric correlates with human judgements of segment quality. This year, we use direct assessment (DA) as our only form of manual evaluation

Irish Universities

DCU Online Research Access Service

Permutation forests for modeling word order in machine translation

Author: Stanojević M.
Publication venue
Publication date: 01/01/2017
Field of study

International Migration, Integration and Social Cohesion online publications

NLG-Metricverse: An End-to-End Library for Evaluating Natural Language Generation

Author: Andrea Zammarchi
Antonella Carbonaro
Giacomo Frisoni
Gianluca Moro
Marco Avagnano
Publication venue: International Committee on Computational Linguistics
Publication date: 01/01/2022
Field of study

Driven by deep learning breakthroughs, natural language generation (NLG) models have been at the center of steady progress in the last few years, with a ubiquitous task influence. However, since our ability to generate human-indistinguishable artificial text lags behind our capacity to assess it, it is paramount to develop and apply even better automatic evaluation metrics. To facilitate researchers to judge the effectiveness of their models broadly, we introduce NLG-Metricverse—an end-to-end open-source library for NLG evaluation based on Python. Our framework provides a living collection of NLG metrics in a unified and easy-to-use environment, supplying tools to efficiently apply, analyze, compare, and visualize them. This includes (i) the extensive support to heterogeneous automatic metrics with n-arity management, (ii) the meta-evaluation upon individual performance, metric-metric and metric-human correlations, (iii) graphical interpretations for helping humans better gain score intuitions, (iv) formal categorization and convenient documentation to accelerate metrics understanding. NLG-Metricverse aims to increase the comparability and replicability of NLG research, hopefully stimulating new contributions in the area

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

A Survey on Machine Reading Comprehension: Tasks, Evaluation Metrics, and Benchmark Datasets

Author: Hu Jianjun
Hu Jie
Li Qin
Li Shaobo
Zeng Chengchang
Publication venue
Publication date: 21/06/2020
Field of study

Machine Reading Comprehension (MRC) is a challenging NLP research field with wide real world applications. The great progress of this field in recent years is mainly due to the emergence of large-scale datasets and deep learning. At present, a lot of MRC models have already surpassed the human performance on many datasets despite the obvious giant gap between existing MRC models and genuine human-level reading comprehension. This shows the need of improving existing datasets, evaluation metrics and models to move the MRC models toward 'real' understanding. To address this lack of comprehensive survey of existing MRC tasks, evaluation metrics and datasets, herein, (1) we analyzed 57 MRC tasks and datasets; proposed a more precise classification method of MRC tasks with 4 different attributes (2) we summarized 9 evaluation metrics of MRC tasks and (3) 7 attributes and 10 characteristics of MRC datasets; (4) We also discussed some open issues in MRC research and highlight some future research directions. In addition, to help the community, we have collected, organized, and published our data on a companion website(https://mrc-datasets.github.io/) where MRC researchers could directly access each MRC dataset, papers, baseline projects and browse the leaderboard.Comment: 59 page

arXiv.org e-Print Archive

Multidisciplinary Digital Publishing Institute

ProseBot: Expanding Textual Diversity through Natural Language Processing

Author: Alexandra Matilde Santos Ferreira
Publication venue
Publication date: 18/07/2023
Field of study

Repositório Aberto da Universidade do Porto