An Analysis of Source-Side Grammatical Errors in NMT
The quality of Neural Machine Translation (NMT) has been shown to
significantly degrade when confronted with source-side noise. We present the
first large-scale study of state-of-the-art English-to-German NMT on real
grammatical noise, by evaluating on several Grammar Correction corpora. We
present methods for evaluating NMT robustness without true references, and we
use them for extensive analysis of the effects that different grammatical
errors have on the NMT output. We also introduce a technique for visualizing
the divergence distribution caused by a source-side error, which allows for
additional insights.
Comment: Accepted and to be presented at BlackboxNLP 201
Referenceless Quality Estimation for Natural Language Generation
Traditional automatic evaluation measures for natural language generation
(NLG) use costly human-authored references to estimate the quality of a system
output. In this paper, we propose a referenceless quality estimation (QE)
approach based on recurrent neural networks, which predicts a quality score for
an NLG system output by comparing it to the source meaning representation only.
Our method outperforms traditional metrics and a constant baseline in most
respects; we also show that synthetic data helps to increase correlation
results by 21% compared to the base system. Our results are comparable to
results obtained in similar QE tasks despite the more challenging setting.
Comment: Accepted as a regular paper to 1st Workshop on Learning to Generate
Natural Language (LGNL), Sydney, 10 August 201
System Combination via Quality Estimation for Grammatical Error Correction
Quality estimation models have been developed to assess the corrections made
by grammatical error correction (GEC) models when the reference or
gold-standard corrections are not available. An ideal quality estimator can be
utilized to combine the outputs of multiple GEC systems by choosing the best
subset of edits from the union of all edits proposed by the GEC base systems.
However, we found that existing GEC quality estimation models are not good
enough at differentiating good corrections from bad ones, resulting in a low
F0.5 score when used for system combination. In this paper, we propose GRECO, a
new state-of-the-art quality estimation model that gives a better estimate of
the quality of a corrected sentence, as indicated by having a higher
correlation to the F0.5 score of a corrected sentence. It results in a combined
GEC system with a higher F0.5 score. We also propose three methods for
utilizing GEC quality estimation models for system combination with varying
generality: model-agnostic, model-agnostic with voting bias, and
model-dependent methods. The combined GEC system outperforms the state of the
art on the CoNLL-2014 test set and the BEA-2019 test set, achieving the highest
F0.5 scores published to date.
Comment: EMNLP 202
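The abstract above reports results as F0.5, the F-measure that weights precision twice as heavily as recall. As a minimal sketch of the standard F-beta formula only (not the paper's scorer or GRECO itself), the function below computes it from edit counts; the name f_beta and the count-based interface are assumptions made for illustration.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """Standard F-beta score from edit counts (hypothetical helper).

    tp: proposed edits that match a gold edit
    fp: proposed edits that match no gold edit
    fn: gold edits that the system did not propose
    beta < 1 favours precision; beta = 0.5 gives F0.5.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing matches
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Example: 8 correct edits, 2 spurious edits, 8 missed gold edits,
# so precision is 0.8 and recall is 0.5.
score = f_beta(tp=8, fp=2, fn=8)
```

Because beta is below 1, a system with high precision and modest recall scores better under F0.5 than under F1, which is why selecting only reliable edits from several systems can raise the combined score.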