Learning to combine Grammatical Error Corrections
The field of Grammatical Error Correction (GEC) has produced various systems
to deal with focused phenomena or general text editing. We propose an automatic
way to combine black-box systems. Our method automatically detects the strength
of a system or the combination of several systems per error type, improving
precision and recall while optimizing the score directly. We show consistent
improvement over the best standalone system in all the configurations tested.
This approach also outperforms average ensembling of different RNN models with
random initializations.
In addition, we analyze the use of BERT for GEC, reporting promising results.
We also present a spellchecker created for this task, which outperforms the
standard spellcheckers tested on the task of spellchecking.
This paper describes a system submission to the Building Educational
Applications (BEA) 2019 Shared Task: Grammatical Error Correction.
Combining the outputs of the top BEA 2019 shared task systems with our approach
currently yields the highest reported score in the open phase of the BEA 2019
shared task, improving F0.5 by 3.7 points over the best result reported.
Comment: BEA 201
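The abstract above reports gains in F0.5 and describes choosing among systems per error type. The sketch below illustrates only that general idea and is not the authors' implementation: it computes F-beta from edit counts and, for each error type, keeps the single system with the best development-set F0.5. The function names, the count layout, and the restriction to single systems (the paper also considers combinations of systems) are assumptions made for illustration.

```python
from typing import Dict, Tuple

# (true positives, false positives, false negatives) for one system on one error type
Counts = Tuple[int, int, int]


def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta from edit counts; beta = 0.5 weights precision twice as much as recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def best_system_per_type(dev_counts: Dict[str, Dict[str, Counts]]) -> Dict[str, str]:
    """Map each error type to the system with the highest dev-set F0.5.

    dev_counts[error_type][system_name] holds that system's counts.
    Hypothetical simplification: only single systems are compared.
    """
    return {
        etype: max(systems, key=lambda name: f_beta(*systems[name]))
        for etype, systems in dev_counts.items()
    }
```

At correction time, an edit of a given error type would then be taken from the system selected for that type. Because F0.5 favors precision, a system with few but reliable edits can be preferred over one with higher recall.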
SERRANT: a syntactic classifier for English Grammatical Error Types
SERRANT is a system and code for automatic classification of English
grammatical errors that combines SErCl and ERRANT. SERRANT uses ERRANT's
annotations when they are informative and those provided by SErCl otherwise.
Comment: Code library in: https://github.com/matanel-oren/serran
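The fallback rule described above can be sketched in a few lines. This is an illustration, not the SERRANT library's API: the function name is hypothetical, and treating an ERRANT label as uninformative exactly when its category is OTHER is an assumption the abstract does not spell out.

```python
def combine_labels(errant_type: str, sercl_type: str) -> str:
    """Return ERRANT's label unless it is uninformative, else SErCl's label.

    Assumption: "uninformative" means the ERRANT category is OTHER
    (for example "R:OTHER"); the abstract does not define the criterion.
    """
    # ERRANT labels look like "R:OTHER" or "R:VERB:TENSE"; drop the operation prefix.
    category = errant_type.split(":", 1)[-1]
    return sercl_type if category == "OTHER" else errant_type
```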
A Comprehensive Survey of Grammar Error Correction
Grammar error correction (GEC) is an important application of natural
language processing. The past decade has witnessed significant progress in GEC,
driven by the increasing popularity of machine learning and deep learning,
especially in the late 2010s, when near human-level GEC systems became
available. However, no prior work has recapitulated this progress as a whole.
We present the first survey of GEC, a comprehensive retrospective of the
literature in this area. We first introduce five public datasets, data
annotation schemas, two important shared tasks, and four standard evaluation
metrics. More importantly, we discuss four kinds of basic approaches
(statistical machine translation based, neural machine translation based,
classification based, and language model based), six commonly applied
performance boosting techniques for GEC systems, and two data augmentation
methods. Since GEC is typically viewed as a sister task of machine translation,
many GEC systems are based on neural machine translation (NMT) approaches, in
which a neural sequence-to-sequence model is applied. Similarly, some
performance boosting techniques are adapted from machine translation and
successfully combined with GEC systems to improve final performance.
Furthermore, we analyze basic approaches, performance boosting techniques, and
integrated GEC systems, each on the basis of its experimental results, to draw
clearer patterns and conclusions. Finally, we discuss five prospective
directions for future GEC research.