System Combination via Quality Estimation for Grammatical Error Correction
Quality estimation models have been developed to assess the corrections made
by grammatical error correction (GEC) models when the reference or
gold-standard corrections are not available. An ideal quality estimator can be
utilized to combine the outputs of multiple GEC systems by choosing the best
subset of edits from the union of all edits proposed by the GEC base systems.
However, we found that existing GEC quality estimation models are not good
enough in differentiating good corrections from bad ones, resulting in a low
F0.5 score when used for system combination. In this paper, we propose GRECO, a
new state-of-the-art quality estimation model that gives a better estimate of
the quality of a corrected sentence, as indicated by having a higher
correlation to the F0.5 score of a corrected sentence. It results in a combined
GEC system with a higher F0.5 score. We also propose three methods for
utilizing GEC quality estimation models for system combination with varying
generality: model-agnostic, model-agnostic with voting bias, and
model-dependent method. The combined GEC system outperforms the state of the
art on the CoNLL-2014 test set and the BEA-2019 test set, achieving the highest
F0.5 scores published to date. Comment: EMNLP 202
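The edit-subset selection described above can be sketched as follows. This is a minimal illustration, not the paper's GRECO model: `score_sentence` stands in for any quality estimator, the edit format `(start, end, replacement)` is an assumption, and the exhaustive subset search is only feasible for the handful of edits a few base systems propose per sentence.

```python
# Hypothetical sketch of model-agnostic system combination via quality
# estimation. All names (combine, score_sentence) are illustrative,
# not the paper's actual API.
from itertools import combinations

def apply_edits(source_tokens, edits):
    """Apply non-overlapping (start, end, replacement) edits to a token list."""
    out, prev = [], 0
    for start, end, repl in sorted(edits):
        out.extend(source_tokens[prev:start])
        out.extend(repl)
        prev = end
    out.extend(source_tokens[prev:])
    return out

def combine(source_tokens, edit_sets, score_sentence):
    """Pick the edit subset, drawn from the union of all base-system edits,
    whose corrected sentence the quality estimator scores highest."""
    union = sorted({e for edits in edit_sets for e in edits})
    best, best_score = [], score_sentence(apply_edits(source_tokens, []))
    for r in range(1, len(union) + 1):
        for subset in combinations(union, r):
            # Skip subsets whose edit spans overlap.
            spans = sorted((s, e) for s, e, _ in subset)
            if any(spans[i][1] > spans[i + 1][0] for i in range(len(spans) - 1)):
                continue
            score = score_sentence(apply_edits(source_tokens, list(subset)))
            if score > best_score:
                best, best_score = list(subset), score
    return best
```

The point the abstract makes is that this procedure is only as good as the estimator: a scorer that cannot separate good edits from bad ones selects poor subsets, which is what depresses F0.5 for combinations built on earlier QE models.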
Minimum Bayes' Risk Decoding for System Combination of Grammatical Error Correction Systems
For sequence-to-sequence tasks it is challenging to combine individual system
outputs. Further, there is also often a mismatch between the decoding criterion
and the one used for assessment. Minimum Bayes' Risk (MBR) decoding can be used
to combine system outputs in a manner that encourages better alignment with the
final assessment criterion. This paper examines MBR decoding for Grammatical
Error Correction (GEC) systems, where performance is usually evaluated in terms
of edits and an associated F-score. Hence, we propose a novel MBR loss function
directly linked to this form of criterion. Furthermore, an approach to expand
the possible set of candidate sentences is described. This builds on a current
max-voting combination scheme, as well as individual edit-level selection.
Experiments on three popular GEC datasets and with state-of-the-art GEC systems
demonstrate the efficacy of the proposed MBR approach. Additionally, the paper
highlights how varying reward metrics within the MBR decoding framework can
provide control over precision, recall, and the F-score in combined GEC
systems.
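The core of MBR decoding can be sketched in a few lines: pick the candidate with the highest expected reward against the other candidates. Here a simple token-overlap F-beta score stands in for the paper's edit-based F-score loss (an assumption), and the posterior over candidates is taken as uniform.

```python
# Minimal MBR-decoding sketch for combining candidate corrections.
# The reward is a token-overlap F_beta between hypotheses, standing in
# for the edit-level F-score criterion described in the abstract.
from collections import Counter

def f_score(hyp, ref, beta=0.5):
    """Token-overlap F_beta between two token lists (illustrative reward)."""
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    p, r = overlap / len(hyp), overlap / len(ref)
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)

def mbr_select(candidates, beta=0.5):
    """Return the candidate with the highest expected reward against all
    other candidates under a uniform posterior, i.e. minimum Bayes' risk."""
    def expected_reward(hyp):
        return sum(f_score(hyp, ref, beta) for ref in candidates if ref is not hyp)
    return max(candidates, key=expected_reward)
```

Varying `beta` here mirrors the abstract's observation that the choice of reward metric trades off precision against recall in the combined output.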
RedPenNet for Grammatical Error Correction: Outputs to Tokens, Attentions to Spans
The text editing tasks, including sentence fusion, sentence splitting and
rephrasing, text simplification, and Grammatical Error Correction (GEC), share
a common trait of dealing with highly similar input and output sequences. This
area of research lies at the intersection of two well-established fields: (i)
fully autoregressive sequence-to-sequence approaches commonly used in tasks
like Neural Machine Translation (NMT) and (ii) sequence tagging techniques
commonly used to address tasks such as Part-of-speech tagging, Named-entity
recognition (NER), and similar. In the pursuit of a balanced architecture,
researchers have come up with numerous imaginative and unconventional
solutions, which we discuss in the Related Works section. Our approach to
addressing text editing tasks is called RedPenNet and is aimed at reducing
architectural and parametric redundancies presented in specific
Sequence-To-Edits models, preserving their semi-autoregressive advantages. Our
models achieve an F0.5 score of 77.60 on the BEA-2019 (test) benchmark, which
can be considered state-of-the-art with the sole exception of system
combination approaches, and 67.71 on the UAGEC+Fluency (test) benchmark.
This research is being conducted in the context of the UNLP 2023 workshop,
where it was presented as a paper for the Shared Task in Grammatical Error
Correction (GEC) for Ukrainian. This study aims to apply the RedPenNet
approach to address the GEC problem in the Ukrainian language.
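The Sequence-To-Edits framing that RedPenNet builds on can be illustrated with a toy example: instead of regenerating the whole output sequence, the model emits span edits `(start, end, replacement)` over the source. The diff-based edit extraction below is only a stand-in for how such training targets can be derived, not RedPenNet's actual architecture.

```python
# Toy illustration of the Sequence-To-Edits data format: derive span
# edits from a source/target pair with a diff, then re-apply them.
# This sketches the representation only, not RedPenNet itself.
import difflib

def edits_between(source, target):
    """Derive (start, end, replacement) span edits turning source into target."""
    edits = []
    sm = difflib.SequenceMatcher(a=source, b=target)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag != "equal":
            edits.append((i1, i2, target[j1:j2]))
    return edits

def apply_span_edits(source, edits):
    """Apply non-overlapping span edits to a token list."""
    out, prev = [], 0
    for i1, i2, repl in edits:
        out.extend(source[prev:i1])
        out.extend(repl)
        prev = i2
    out.extend(source[prev:])
    return out
```

Because input and output sequences are highly similar in text editing tasks, the edit list is typically far shorter than the target sequence, which is what makes the semi-autoregressive Sequence-To-Edits setup attractive.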
- …