27,257 research outputs found
Findings of the Third Workshop on Neural Generation and Translation
This document describes the findings of the Third Workshop on Neural
Generation and Translation, held in concert with the annual conference of the
Empirical Methods in Natural Language Processing (EMNLP 2019). First, we
summarize the research trends of papers presented in the proceedings. Second,
we describe the results of the two shared tasks 1) efficient neural machine
translation (NMT) where participants were tasked with creating NMT systems that
are both accurate and efficient, and 2) document-level generation and
translation (DGT) where participants were tasked with developing systems that
generate summaries from structured data, potentially with assistance from text
in another language.Comment: Fixed the metadata (author list
Referenceless Quality Estimation for Natural Language Generation
Traditional automatic evaluation measures for natural language generation
(NLG) use costly human-authored references to estimate the quality of a system
output. In this paper, we propose a referenceless quality estimation (QE)
approach based on recurrent neural networks, which predicts a quality score for
a NLG system output by comparing it to the source meaning representation only.
Our method outperforms traditional metrics and a constant baseline in most
respects; we also show that synthetic data helps to increase correlation
results by 21% compared to the base system. Our results are comparable to
results obtained in similar QE tasks despite the more challenging setting.Comment: Accepted as a regular paper to 1st Workshop on Learning to Generate
Natural Language (LGNL), Sydney, 10 August 201
Does Multimodality Help Human and Machine for Translation and Image Captioning?
This paper presents the systems developed by LIUM and CVC for the WMT16
Multimodal Machine Translation challenge. We explored various comparative
methods, namely phrase-based systems and attentional recurrent neural networks
models trained using monomodal or multimodal data. We also performed a human
evaluation in order to estimate the usefulness of multimodal data for human
machine translation and image description generation. Our systems obtained the
best results for both tasks according to the automatic evaluation metrics BLEU
and METEOR.Comment: 7 pages, 2 figures, v4: Small clarification in section 4 title and
conten
Identifying Semantic Divergences in Parallel Text without Annotations
Recognizing that even correct translations are not always semantically
equivalent, we automatically detect meaning divergences in parallel sentence
pairs with a deep neural model of bilingual semantic similarity which can be
trained for any parallel corpus without any manual annotation. We show that our
semantic model detects divergences more accurately than models based on surface
features derived from word alignments, and that these divergences matter for
neural machine translation.Comment: Accepted as a full paper to NAACL 201
RankME: Reliable Human Ratings for Natural Language Generation
Human evaluation for natural language generation (NLG) often suffers from
inconsistent user ratings. While previous research tends to attribute this
problem to individual user preferences, we show that the quality of human
judgements can also be improved by experimental design. We present a novel
rank-based magnitude estimation method (RankME), which combines the use of
continuous scales and relative assessments. We show that RankME significantly
improves the reliability and consistency of human ratings compared to
traditional evaluation methods. In addition, we show that it is possible to
evaluate NLG systems according to multiple, distinct criteria, which is
important for error analysis. Finally, we demonstrate that RankME, in
combination with Bayesian estimation of system quality, is a cost-effective
alternative for ranking multiple NLG systems.Comment: Accepted to NAACL 2018 (The 2018 Conference of the North American
Chapter of the Association for Computational Linguistics
- …