Automated Detection of Usage Errors in non-native English Writing
We investigate the use of a novelty detection algorithm for identifying inappropriate word combinations in a raw English corpus. We employ an unsupervised detection algorithm based on one-class support vector machines (OC-SVMs) and extract sentences containing word sequences whose frequency of appearance is significantly low in native English writing. Combined with n-gram language models and document categorization techniques, the OC-SVM classifier assigns given sentences to two groups: sentences containing errors and those without errors. Accuracies are 79.30% with the bigram model, 86.63% with the trigram model, and 34.34% with the four-gram model.
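The core idea of flagging word sequences that are rare in native writing can be illustrated without the OC-SVM machinery. The sketch below is hypothetical (the function names are illustrative, and a raw frequency threshold stands in for the one-class decision boundary): it builds trigram counts from a small "native" reference corpus and flags trigrams that fall below the threshold.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_ngram_counts(corpus_sentences, n=3):
    """Count n-grams over a reference corpus of native sentences."""
    counts = Counter()
    for sent in corpus_sentences:
        counts.update(ngrams(sent.lower().split(), n))
    return counts

def flag_rare(sentence, counts, n=3, threshold=1):
    """Return the n-grams in `sentence` seen fewer than `threshold` times."""
    toks = sentence.lower().split()
    return [g for g in ngrams(toks, n) if counts[g] < threshold]

# Toy "native" corpus; a real system would use a large raw English corpus.
native = ["the cat sat on the mat", "the dog sat on the rug"]
counts = build_ngram_counts(native, n=3)

print(flag_rare("the cat sat on the mat", counts))  # []
print(flag_rare("the cat sitted on mat", counts))   # flags the unseen trigrams
```

A sentence would then be routed to the "contains errors" group when it contains any flagged n-gram, whereas the paper's OC-SVM learns that boundary from feature vectors rather than from a hand-set threshold.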
Automatic Grammatical Error Detection of Non-native Spoken Learner English
Automatic language assessment and learning systems are required to support the global growth in English language learning. They need to provide reliable and meaningful feedback to help learners develop their skills. This paper considers the question of detecting grammatical errors in non-native spoken English as a first step to providing feedback on a learner's use of the language. A state-of-the-art deep-learning-based grammatical error detection (GED) system designed for written texts is investigated on free speaking tasks across the full range of proficiency grades, with a mix of first languages (L1s). This presents a number of challenges. Free speech contains disfluencies that disrupt the spoken language flow but are not grammatical errors. The lower the level of the learner, the more frequently both occur, which also makes the underlying task of automatic transcription harder. The baseline written GED system is seen to perform less well on manually transcribed spoken language. When the GED model is fine-tuned to free-speech data from the target domain, the spoken system is able to match the written performance. Given the current state of the art in ASR, however, and the need to detect disfluencies, grammatical error feedback from automated transcriptions remains a challenge. This paper reports on research supported by Cambridge Assessment, University of Cambridge. Thanks to Cambridge English Language Assessment for supporting this research and providing access to the BULATS data.
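To illustrate why disfluencies complicate spoken GED, the toy preprocessor below strips filled pauses and immediate word repetitions before any error detection runs. This is purely a hypothetical sketch: real pipelines use trained disfluency-detection models rather than word lists, and the filler set here is an assumption.

```python
# Assumed toy filler list; real disfluency detection is model-based.
FILLED_PAUSES = {"um", "uh", "erm"}

def strip_disfluencies(tokens):
    """Drop filled pauses and immediate single-word repetitions."""
    out = []
    for tok in tokens:
        if tok.lower() in FILLED_PAUSES:
            continue
        if out and tok.lower() == out[-1].lower():
            continue  # "the the" -> "the": repetition, not a grammar error
        out.append(tok)
    return out

print(strip_disfluencies("I um I think the the answer is wrong".split()))
# ['I', 'think', 'the', 'answer', 'is', 'wrong']
```

Without such a step, a written-text GED model would flag "the the" and "um" as errors even though they are ordinary features of fluent-enough speech.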
Wronging a Right: Generating Better Errors to Improve Grammatical Error Detection
Grammatical error correction, like other machine learning tasks, greatly benefits from large quantities of high-quality training data, which is typically expensive to produce. While writing a program to automatically generate realistic grammatical errors would be difficult, one could learn the distribution of naturally occurring errors and attempt to introduce them into other datasets. Initial work on inducing errors in this way using statistical machine translation has shown promise; we investigate cheaply constructing synthetic samples, given a small corpus of human-annotated data, using an off-the-shelf attentive sequence-to-sequence model and a straightforward post-processing procedure. Our approach yields error-filled artificial data that helps a vanilla bidirectional LSTM to outperform the previous state of the art at grammatical error detection, and a previously introduced model to gain further improvements of over 5% in score. When attempting to determine whether a given sentence is synthetic, a human annotator at best achieves a score of 39.39, indicating that our model generates mostly human-like instances. Comment: Accepted as a short paper at EMNLP 2018.
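The paper learns the error distribution with a sequence-to-sequence model; as a minimal stand-in, the sketch below injects errors with hand-written rules (article deletion and agreement swaps, both hypothetical choices) to show the general shape of corrupting clean text into synthetic error-detection training data.

```python
import random

# Hypothetical corruption rules; the paper instead *learns* these patterns
# from a small corpus of genuine learner errors.
ERROR_RULES = [
    ("a ", ""), ("the ", ""),              # article deletion
    (" is ", " are "), (" are ", " is "),  # subject-verb agreement swap
]

def inject_error(sentence, rng):
    """Apply one applicable rule at random; report whether anything changed."""
    applicable = [(src, tgt) for src, tgt in ERROR_RULES if src in sentence]
    if not applicable:
        return sentence, False
    src, tgt = rng.choice(applicable)
    return sentence.replace(src, tgt, 1), True

rng = random.Random(0)
print(inject_error("the cats are on the mat", rng))
```

Each (clean, corrupted) pair then serves as a labeled example for a detector, with the corrupted side marked as containing an error; the learned seq2seq version produces far more varied and human-like corruptions than fixed rules can.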
Compositional sequence labeling models for error detection in learner writing
© 2016 Association for Computational Linguistics. In this paper, we present the first experiments using neural network models for the task of error detection in learner writing. We perform a systematic comparison of alternative compositional architectures and propose a framework for error detection based on bidirectional LSTMs. Experiments on the CoNLL-14 shared task dataset show that the model is able to outperform other participants on detecting errors in learner writing. Finally, the model is integrated with a publicly deployed self-assessment system, leading to performance comparable to human annotators.
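Error detection systems of this kind are conventionally scored with the precision-weighted F0.5 measure over token-level labels, since flagging a correct token as an error is costlier to a learner than missing one. A minimal computation from raw counts, assuming binary per-token predictions:

```python
def f_beta(tp, fp, fn, beta=0.5):
    """F-beta from token-level counts; beta < 1 weights precision over recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# e.g. 8 correctly flagged tokens, 2 false alarms, 4 missed errors
print(round(f_beta(8, 2, 4), 4))  # 0.7692
```

With beta = 0.5, the 2 false alarms above pull the score down more than the 4 misses do, which matches the feedback-quality priorities of a deployed self-assessment system.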
DEUCE : a test-bed for evaluating ESL competence criteria
This paper describes work in progress to apply a Web-based facility for evaluating differing criteria for English language competence. The proposed system, Discriminated Evaluation of User's Competence with English (DEUCE), addresses the problem of determining the efficacy of individual criteria for competence in English as a Second Language (ESL). We describe the rationale, design and application of DEUCE and outline its potential as a discriminator for ESL competence criteria and as a basis for low-cost mass ESL competence testing.