Automated Detection of Usage Errors in non-native English Writing
We investigate the use of a novelty detection algorithm for identifying inappropriate word combinations in a raw English corpus. We employ an unsupervised detection algorithm based on one-class support vector machines (OC-SVMs) and extract sentences containing word sequences whose frequency of appearance is significantly low in native English writing. Combined with n-gram language models and document categorization techniques, the OC-SVM classifier assigns given sentences to two groups: sentences containing errors and those without errors. Accuracies are 79.30% with the bigram model, 86.63% with the trigram model, and 34.34% with the four-gram model.
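The core idea of flagging word sequences that are rare in native writing can be illustrated without the OC-SVM machinery. The sketch below is hypothetical (the function names are illustrative, and a raw frequency threshold stands in for the one-class decision boundary): it builds trigram counts from a small "native" reference corpus and flags trigrams that fall below the threshold.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_ngram_counts(corpus_sentences, n=3):
    """Count n-grams over a reference corpus of native sentences."""
    counts = Counter()
    for sent in corpus_sentences:
        counts.update(ngrams(sent.lower().split(), n))
    return counts

def flag_rare(sentence, counts, n=3, threshold=1):
    """Return the n-grams in `sentence` seen fewer than `threshold` times."""
    toks = sentence.lower().split()
    return [g for g in ngrams(toks, n) if counts[g] < threshold]

# Toy "native" corpus; a real system would use a large raw English corpus.
native = ["the cat sat on the mat", "the dog sat on the rug"]
counts = build_ngram_counts(native, n=3)

print(flag_rare("the cat sat on the mat", counts))  # []
print(flag_rare("the cat sitted on mat", counts))   # flags the unseen trigrams
```

A sentence would then be routed to the "contains errors" group when it contains any flagged n-gram, whereas the paper's OC-SVM learns that boundary from feature vectors rather than from a hand-set threshold.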
Automatic Grammatical Error Detection of Non-native Spoken Learner English
Automatic language assessment and learning systems are required to support the global growth in English language learning. They need to provide reliable and meaningful feedback to help learners develop their skills. This paper considers the question of detecting grammatical errors in non-native spoken English as a first step to providing feedback on a learner's use of the language. A state-of-the-art deep-learning-based grammatical error detection (GED) system designed for written texts is investigated on free speaking tasks across the full range of proficiency grades, with a mix of first languages (L1s). This presents a number of challenges. Free speech contains disfluencies that disrupt the spoken language flow but are not grammatical errors. The lower the level of the learner, the more frequently both occur, which also makes the underlying task of automatic transcription harder. The baseline written GED system is seen to perform less well on manually transcribed spoken language. When the GED model is fine-tuned to free-speech data from the target domain, the spoken system is able to match the written performance. Given the current state of the art in ASR, however, and the need to detect disfluencies, grammatical error feedback from automated transcriptions remains a challenge. This paper reports on research supported by Cambridge Assessment, University of Cambridge. Thanks to Cambridge English Language Assessment for supporting this research and providing access to the BULATS data.
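To illustrate why disfluencies complicate spoken GED, the toy preprocessor below strips filled pauses and immediate word repetitions before any error detection runs. This is purely a hypothetical sketch: real pipelines use trained disfluency-detection models rather than word lists, and the filler set here is an assumption.

```python
# Assumed toy filler list; real disfluency detection is model-based.
FILLED_PAUSES = {"um", "uh", "erm"}

def strip_disfluencies(tokens):
    """Drop filled pauses and immediate single-word repetitions."""
    out = []
    for tok in tokens:
        if tok.lower() in FILLED_PAUSES:
            continue
        if out and tok.lower() == out[-1].lower():
            continue  # "the the" -> "the": repetition, not a grammar error
        out.append(tok)
    return out

print(strip_disfluencies("I um I think the the answer is wrong".split()))
# ['I', 'think', 'the', 'answer', 'is', 'wrong']
```

Without such a step, a written-text GED model would flag "the the" and "um" as errors even though they are ordinary features of fluent-enough speech.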
Wronging a Right: Generating Better Errors to Improve Grammatical Error Detection
Grammatical error correction, like other machine learning tasks, greatly benefits from large quantities of high-quality training data, which is typically expensive to produce. While writing a program to automatically generate realistic grammatical errors would be difficult, one could learn the distribution of naturally occurring errors and attempt to introduce them into other datasets. Initial work on inducing errors in this way using statistical machine translation has shown promise; we investigate cheaply constructing synthetic samples, given a small corpus of human-annotated data, using an off-the-shelf attentive sequence-to-sequence model and a straightforward post-processing procedure. Our approach yields error-filled artificial data that helps a vanilla bidirectional LSTM to outperform the previous state of the art at grammatical error detection, and a previously introduced model to gain further improvements of over 5% in score. When attempting to determine whether a given sentence is synthetic, a human annotator at best achieves a score of 39.39, indicating that our model generates mostly human-like instances. Comment: Accepted as a short paper at EMNLP 2018.
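The paper learns the error distribution with a sequence-to-sequence model; as a minimal stand-in, the sketch below injects errors with hand-written rules (article deletion and agreement swaps, both hypothetical choices) to show the general shape of corrupting clean text into synthetic error-detection training data.

```python
import random

# Hypothetical corruption rules; the paper instead *learns* these patterns
# from a small corpus of genuine learner errors.
ERROR_RULES = [
    ("a ", ""), ("the ", ""),              # article deletion
    (" is ", " are "), (" are ", " is "),  # subject-verb agreement swap
]

def inject_error(sentence, rng):
    """Apply one applicable rule at random; report whether anything changed."""
    applicable = [(src, tgt) for src, tgt in ERROR_RULES if src in sentence]
    if not applicable:
        return sentence, False
    src, tgt = rng.choice(applicable)
    return sentence.replace(src, tgt, 1), True

rng = random.Random(0)
print(inject_error("the cats are on the mat", rng))
```

Each (clean, corrupted) pair then serves as a labeled example for a detector, with the corrupted side marked as containing an error; the learned seq2seq version produces far more varied and human-like corruptions than fixed rules can.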
Compositional sequence labeling models for error detection in learner writing
© 2016 Association for Computational Linguistics. In this paper, we present the first experiments using neural network models for the task of error detection in learner writing. We perform a systematic comparison of alternative compositional architectures and propose a framework for error detection based on bidirectional LSTMs. Experiments on the CoNLL-14 shared task dataset show that the model is able to outperform other participants on detecting errors in learner writing. Finally, the model is integrated with a publicly deployed self-assessment system, leading to performance comparable to human annotators.
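Error detection systems of this kind are conventionally scored with the precision-weighted F0.5 measure over token-level labels, since flagging a correct token as an error is costlier to a learner than missing one. A minimal computation from raw counts, assuming binary per-token predictions:

```python
def f_beta(tp, fp, fn, beta=0.5):
    """F-beta from token-level counts; beta < 1 weights precision over recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# e.g. 8 correctly flagged tokens, 2 false alarms, 4 missed errors
print(round(f_beta(8, 2, 4), 4))  # 0.7692
```

With beta = 0.5, the 2 false alarms above pull the score down more than the 4 misses do, which matches the feedback-quality priorities of a deployed self-assessment system.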
DEUCE : a test-bed for evaluating ESL competence criteria
This paper describes work in progress to apply a Web-based facility for evaluating differing criteria for English language competence. The proposed system, Discriminated Evaluation of User's Competence with English (DEUCE), addresses the problem of determining the efficacy of individual criteria for competence in English as a Second Language (ESL). We describe the rationale, design and application of DEUCE and outline its potential as a discriminator for ESL competence criteria and as a basis for low-cost mass ESL competence testing.