Neural Sequence-Labelling Models for Grammatical Error Correction

Authors: Andersen OE, Giannakoudaki E, Rei M, Yuan Zheng
Publication venue: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
Publication date: 30/09/2017

We propose an approach to N-best list reranking using neural sequence-labelling models. We train a compositional model for error detection that calculates the probability of each token in a sentence being correct or incorrect, utilising the full sentence as context. Using the error detection model, we then re-rank the N-best hypotheses generated by statistical machine translation systems. Our approach achieves state-of-the-art results on error correction for three different datasets, and it has the additional advantage of only using a small set of easily computed features that require no linguistic input.
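The reranking idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `detector` interface, the `toy_detector` stand-in, and the scoring rule (a weighted sum of each hypothesis's SMT score and its summed log probability of tokens being correct) are all hypothetical assumptions. The paper's actual feature set and weight tuning may differ.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: a detector maps a tokenised sentence to
# P(token is correct) for each token.
Detector = Callable[[Sequence[str]], List[float]]


def log_prob_correct(tokens: Sequence[str], detector: Detector) -> float:
    """Sum of log P(correct) over all tokens; higher means fewer suspected errors."""
    return sum(math.log(max(p, 1e-12)) for p in detector(tokens))


def rerank(
    nbest: List[Tuple[List[str], float]],
    detector: Detector,
    w_smt: float = 1.0,
    w_det: float = 1.0,
) -> List[List[str]]:
    """Re-rank (hypothesis tokens, SMT score) pairs by a weighted sum of the
    SMT score and the detector score. The weights are illustrative only."""
    scored = [
        (w_smt * smt_score + w_det * log_prob_correct(tokens, detector), tokens)
        for tokens, smt_score in nbest
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tokens for _, tokens in scored]


def toy_detector(tokens: Sequence[str]) -> List[float]:
    """Hypothetical stand-in for a trained model: flags "a" before a vowel-initial word."""
    probs = []
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        suspicious = tok.lower() == "a" and nxt != "" and nxt[0].lower() in "aeiou"
        probs.append(0.1 if suspicious else 0.9)
    return probs


nbest = [
    (["I", "ate", "a", "apple"], -1.0),   # top SMT hypothesis, contains an error
    (["I", "ate", "an", "apple"], -1.2),  # slightly lower SMT score, correct
]
best = rerank(nbest, toy_detector)[0]
print(" ".join(best))
```

With these toy scores, the detector's penalty on "a apple" outweighs that hypothesis's 0.2 advantage in SMT score, so the corrected hypothesis is ranked first.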

Wronging a Right: Generating Better Errors to Improve Grammatical Error Detection

Authors: Kasewa Sudhanshu, Riedel Sebastian, Stenetorp Pontus
Publication date: 01/01/2018

Grammatical error correction, like other machine learning tasks, greatly benefits from large quantities of high-quality training data, which is typically expensive to produce. While writing a program to automatically generate realistic grammatical errors would be difficult, one could learn the distribution of naturally occurring errors and attempt to introduce them into other datasets. Initial work on inducing errors in this way using statistical machine translation has shown promise; we investigate cheaply constructing synthetic samples, given a small corpus of human-annotated data, using an off-the-rack attentive sequence-to-sequence model and a straightforward post-processing procedure. Our approach yields error-filled artificial data that helps a vanilla bi-directional LSTM to outperform the previous state of the art at grammatical error detection, and a previously introduced model to gain further improvements of over 5% $F_{0.5}$ score. When attempting to determine whether a given sentence is synthetic, a human annotator at best achieves an $F_1$ score of 39.39, indicating that our model generates mostly human-like instances.

Comment: Accepted as a short paper at EMNLP 201

arXiv.org e-Print Archive
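$F_{0.5}$ is the $F_\beta$ measure with $\beta = 0.5$, that is $F_\beta = \frac{(1+\beta^2) P R}{\beta^2 P + R}$, which weights precision ($P$) more heavily than recall ($R$); it is the usual headline metric in grammatical error correction and detection. The sketch below computes it from true-positive, false-positive, and false-negative counts; the function name `f_beta` and the counts in the usage note are hypothetical.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta from raw counts; beta < 1 favours precision, beta = 1 gives F1."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing is correct
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

For example, with 6 true positives, 2 false positives and 6 false negatives ($P = 0.75$, $R = 0.5$), `f_beta(6, 2, 6)` gives about 0.682, whereas `f_beta(6, 2, 6, beta=1.0)` gives 0.6, reflecting the heavier weight placed on precision.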

EliCoDe at MultiGED2023: fine-tuning XLM-RoBERTa for multilingual grammatical error detection

Authors: Colla Davide, Delsanto Matteo, Di Nuovo Elisa
Publication venue: Linköping University Electronic Press
Publication date: 01/01/2023

Grammatical Error Correction: A Survey of the State of the Art

Authors: Briscoe Ted, Bryant Christopher, Cao Hannan, Ng Hwee Tou, Qorib Muhammad Reza, Yuan Zheng
Publication date: 25/03/2023

Grammatical Error Correction (GEC) is the task of automatically detecting and correcting errors in text. The task includes not only the correction of grammatical errors, such as missing prepositions and mismatched subject-verb agreement, but also orthographic and semantic errors, such as misspellings and word choice errors, respectively. The field has seen significant progress in the last decade, motivated in part by a series of five shared tasks, which drove the development of rule-based methods, statistical classifiers, statistical machine translation, and finally neural machine translation systems, which represent the current dominant state of the art. In this survey paper, we condense the field into a single article: we first outline some of the linguistic challenges of the task, introduce the most popular datasets that are available to researchers (for both English and other languages), and summarise the various methods and techniques that have been developed, with a particular focus on artificial error generation. We next describe the many different approaches to evaluation as well as concerns surrounding metric reliability, especially in relation to subjective human judgements, before concluding with an overview of recent progress and suggestions for future work and remaining challenges. We hope that this survey will serve as a comprehensive resource for researchers who are new to the field or who want to be kept apprised of recent developments.

arXiv.org e-Print Archive