A Nested Attention Neural Hybrid Model for Grammatical Error Correction
Grammatical error correction (GEC) systems strive to correct both global
errors in word order and usage, and local errors in spelling and inflection.
Further developing upon recent work on neural machine translation, we propose a
new hybrid neural model with nested attention layers for GEC. Experiments show
that the new model can effectively correct errors of both types by
incorporating word and character-level information, and that the model
significantly outperforms previous neural models for GEC as measured on the
standard CoNLL-14 benchmark dataset. Further analysis also shows that the
superiority of the proposed model can be largely attributed to the use of the
nested attention mechanism, which has proven particularly effective in
correcting local errors that involve small edits in orthography.
GrammarGPT: Exploring Open-Source LLMs for Native Chinese Grammatical Error Correction with Supervised Fine-Tuning
Grammatical error correction aims to correct ungrammatical sentences
automatically. Recently, some work has demonstrated the excellent capabilities
of closed-source Large Language Models (LLMs, e.g., ChatGPT) in grammatical
error correction. However, the potential of open-source LLMs remains
unexplored. In this paper, we introduced GrammarGPT, an open-source LLM, to
preliminarily explore its potential for native Chinese grammatical error
correction. The core recipe of GrammarGPT is to leverage a hybrid dataset of
ChatGPT-generated and human-annotated data. For grammatical errors with clues, we
proposed a heuristic method to guide ChatGPT to generate ungrammatical
sentences by providing those clues. For grammatical errors without clues, we
collected ungrammatical sentences from publicly available websites and manually
corrected them. In addition, we employed an error-invariant augmentation method
to enhance the ability of the model to correct native Chinese grammatical
errors. We ultimately constructed about 1k parallel data and utilized these
data to fine-tune open-source LLMs (e.g., Phoenix, released by The Chinese
University of Hong Kong, Shenzhen) with instruction tuning. The experimental
results show that GrammarGPT outperforms the existing SOTA system
significantly. Although model parameters are 20x larger than the SOTA baseline,
the required amount of data for instruction tuning is 1200x smaller,
illustrating the potential of open-source LLMs on native CGEC. Our GrammarGPT
ranks on NLPCC2023 SharedTask1, demonstrating our approach's
effectiveness. The code and data are available at
https://github.com/FreedomIntelligence/GrammarGPT.
Grammatical error correction using hybrid systems and type filtering
This paper describes our submission to the CoNLL 2014 shared task on grammatical error correction using a hybrid approach, which includes both a rule-based and an SMT system augmented by a large web-based
language model. Furthermore, we demonstrate that correction type estimation can be used to remove unnecessary corrections, improving precision without harming recall. Our best hybrid system achieves state-of-the-art results, ranking first on the original test set and second on the test set with alternative annotations. We would like to thank Cambridge English Language Assessment, a division of Cambridge Assessment, for supporting this research.
JFLEG: A Fluency Corpus and Benchmark for Grammatical Error Correction
We present a new parallel corpus, JHU FLuency-Extended GUG corpus (JFLEG) for
developing and evaluating grammatical error correction (GEC). Unlike other
corpora, it represents a broad range of language proficiency levels and uses
holistic fluency edits to not only correct grammatical errors but also make the
original text more native-sounding. We describe the types of corrections made
and benchmark four leading GEC systems on this corpus, identifying specific
areas in which they do well and how they can improve. JFLEG fulfills the need
for a new gold standard to properly assess the current state of GEC. Comment: To appear in EACL 2017 (short papers).