Automatic correction of grammatical errors in non-native English text
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Cataloged from PDF version of thesis. Includes bibliographical references (p. 99-107). Learning a foreign language requires much practice outside of the classroom. Computer-assisted language learning systems can help fill this need, and one desirable capability of such systems is the automatic correction of grammatical errors in texts written by non-native speakers. This dissertation concerns the correction of non-native grammatical errors in English text, and the closely related task of generating test items for language learning, using a combination of statistical and linguistic methods. We show that syntactic analysis enables extraction of more salient features. We address issues concerning robustness in feature extraction from non-native texts; and also design a framework for simultaneous correction of multiple error types. Our proposed methods are applied on some of the most common usage errors, including prepositions, verb forms, and articles. The methods are evaluated on sentences with synthetic and real errors, and in both restricted and open domains. A secondary theme of this dissertation is that of user customization. We perform a detailed analysis on a non-native corpus, illustrating the utility of an error model based on the mother tongue. We study the benefits of adjusting the correction models based on the quality of the input text; and also present novel methods to generate high-quality multiple-choice items that are tailored to the interests of the user. By John Sie Yuen Lee. Ph.D.
JFLEG: A Fluency Corpus and Benchmark for Grammatical Error Correction
We present a new parallel corpus, JHU FLuency-Extended GUG corpus (JFLEG) for
developing and evaluating grammatical error correction (GEC). Unlike other
corpora, it represents a broad range of language proficiency levels and uses
holistic fluency edits to not only correct grammatical errors but also make the
original text more native sounding. We describe the types of corrections made
and benchmark four leading GEC systems on this corpus, identifying specific
areas in which they do well and how they can improve. JFLEG fulfills the need
for a new gold standard to properly assess the current state of GEC. Comment: To appear in EACL 2017 (short papers)
GenERRate: generating errors for use in grammatical error detection
This paper explores the issue of automatically generated ungrammatical data and its use in error detection, with a focus on the task of classifying a sentence as grammatical or ungrammatical. We present an error generation tool called GenERRate and show how GenERRate can be used to improve the performance of a classifier on learner data. We describe
initial attempts to replicate Cambridge Learner Corpus errors using GenERRate.
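The idea of training a grammatical/ungrammatical classifier on automatically corrupted text can be illustrated with a minimal sketch. It is not GenERRate itself: the two operations (`delete_word`, `substitute_word`), the `CONFUSION` table, and `make_labelled_pairs` are hypothetical names chosen for this example, and the abstract does not specify which error types the tool produces.

```python
import random

# Hypothetical confusion table: each word maps to plausible wrong replacements.
CONFUSION = {"in": ["on", "at"], "a": ["the"], "the": ["a"]}

def delete_word(tokens, rng):
    """Return a copy of tokens with one randomly chosen word removed."""
    if len(tokens) < 2:
        return list(tokens)
    i = rng.randrange(len(tokens))
    return tokens[:i] + tokens[i + 1:]

def substitute_word(tokens, rng):
    """Replace one token that has an entry in CONFUSION, if any exists."""
    candidates = [i for i, t in enumerate(tokens) if t in CONFUSION]
    if not candidates:
        return list(tokens)
    i = rng.choice(candidates)
    out = list(tokens)
    out[i] = rng.choice(CONFUSION[tokens[i]])
    return out

def make_labelled_pairs(sentences, seed=0):
    """Label each original sentence 1 and one corrupted variant 0."""
    rng = random.Random(seed)
    data = []
    for sentence in sentences:
        tokens = sentence.split()
        data.append((sentence, 1))
        operation = rng.choice([delete_word, substitute_word])
        corrupted = operation(tokens, rng)
        if corrupted != tokens:  # keep only variants that actually changed
            data.append((" ".join(corrupted), 0))
    return data
```

The resulting `(sentence, label)` pairs could then be fed to any sentence classifier; how well such synthetic errors resemble real learner errors is the question the paper investigates.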
Automated Detection of Usage Errors in non-native English Writing
In an investigation of the use of a novelty detection algorithm for identifying inappropriate word
combinations in a raw English corpus, we employ an
unsupervised detection algorithm based on one-class
support vector machines (OC-SVMs) and extract
sentences containing word sequences whose frequency
of appearance is significantly low in native English
writing. Combined with n-gram language models and
document categorization techniques, the OC-SVM classifier assigns given sentences to two
groups: sentences containing errors and those
without errors. Accuracies are 79.30% with the bigram
model, 86.63% with the trigram model, and 34.34% with the four-gram model.
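The n-gram frequency step described above can be sketched as follows. This is a simplified illustration only: it replaces the OC-SVM classifier with a plain count threshold, and the function names (`ngrams`, `build_counts`, `rare_ngrams`), the `min_count` parameter, and the toy sentences are assumptions made for this example, not details from the paper.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_counts(native_sentences, n):
    """Count n-grams over a reference corpus of native writing."""
    counts = Counter()
    for sentence in native_sentences:
        counts.update(ngrams(sentence.lower().split(), n))
    return counts

def rare_ngrams(sentence, counts, n, min_count=1):
    """List the n-grams of a sentence seen fewer than min_count times."""
    return [g for g in ngrams(sentence.lower().split(), n) if counts[g] < min_count]

# Toy reference corpus (hypothetical data).
native = ["he depends on his parents", "she depends on her friends"]
bigram_counts = build_counts(native, 2)
suspicious = rare_ngrams("he depends of his parents", bigram_counts, 2)
```

A sentence with a non-empty `suspicious` list would be a candidate for the error class. With a realistic corpus, higher-order n-grams become sparse, so many correct sequences are also unseen.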
An Analysis of Source-Side Grammatical Errors in NMT
The quality of Neural Machine Translation (NMT) has been shown to
significantly degrade when confronted with source-side noise. We present the
first large-scale study of state-of-the-art English-to-German NMT on real
grammatical noise, by evaluating on several Grammar Correction corpora. We
present methods for evaluating NMT robustness without true references, and we
use them for extensive analysis of the effects that different grammatical
errors have on the NMT output. We also introduce a technique for visualizing
the divergence distribution caused by a source-side error, which allows for
additional insights. Comment: Accepted and to be presented at BlackboxNLP 201