273 research outputs found
HOO 2012 Error Recognition and Correction Shared Task: Cambridge University Submission Report
Previous work on automated error recognition and correction of texts written by learners of English as a Second Language has demonstrated experimentally that training classifiers on error-annotated ESL text generally outperforms training on native text alone and that adaptation of error correction models to the native language (L1) of the writer improves performance. Nevertheless, most extant models have poor precision, particularly when attempting error correction, and this limits their usefulness in practical applications requiring feedback. We experiment with various feature types, varying quantities of error-corrected data, and generic versus L1-specific adaptation to typical errors using Naïve Bayes (NB) classifiers and develop one model which maximizes precision. We report and discuss the results for 8 models, 5 trained on the HOO data and 3 (partly) on the full error-coded Cambridge Learner Corpus, from which the HOO data is drawn. We thank Cambridge ESOL, a division of Cambridge Assessment, for a partial grant to the first author and a research contract with iLexIR Ltd. We also thank them and Cambridge University Press for granting us access to the CLC for research purposes.
Judging grammaticality: experiments in sentence classification
A classifier which is capable of distinguishing a syntactically well-formed sentence from a syntactically ill-formed one has the potential to be useful in an L2 language-learning context. In this article, we describe a classifier which classifies English sentences as either well formed or ill formed using information gleaned from three different natural language processing techniques. We describe the issues involved in acquiring data to train such a classifier and present experimental results for this classifier on a variety of ill-formed sentences. We demonstrate that (a) the combination of information from a variety of linguistic sources is helpful, (b) the trade-off between accuracy on well-formed sentences and accuracy on ill-formed sentences can be fine-tuned by training multiple classifiers in a voting scheme, and (c) the performance of the classifier is varied, with better performance on transcribed spoken sentences produced by less advanced language learners.
Automatic Detection of Preposition Errors in Learner Writing
In this article, we present an approach to the automatic correction of preposition errors in L2 English. Our system, based on a maximum entropy classifier, achieves average precision of 42% and recall of 35% on this task. The discussion of results obtained on correct and incorrect data aims to establish what characteristics of L2 writing prove particularly problematic in this task.
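The abstract above reports precision and recall separately and gives no combined score. As a rough illustration only, the two figures can be folded into an F-score using the standard formula; the function name `f_score` and the choice of `beta` are assumptions made here, not something taken from the article.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F-beta).

    beta = 1 weights both equally; beta < 1 favours precision.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid dividing by zero when both are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Figures quoted in the abstract: precision 42%, recall 35%.
print(round(f_score(0.42, 0.35), 3))  # 0.382
```

With equal weighting this gives roughly 0.38; the figure is derived here from the quoted numbers and is not reported by the authors.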
Correcting Preposition Errors in Learner English Using Error Case Frames and Feedback Messages
This paper presents a novel framework called error case frames for correcting preposition errors. They are case frames specially designed for describing and correcting preposition errors. Their most distinct advantage is that they can correct errors with feedback messages explaining why the preposition is erroneous. This paper proposes a method for automatically generating them by comparing learner and native corpora. Experiments show (i) automatically generated error case frames achieve a performance comparable to conventional methods; (ii) error case frames are intuitively interpretable and manually modifiable to improve them; (iii) feedback messages provided by error case frames are effective in language learning assistance. Considering these advantages and the fact that it has been difficult to provide feedback messages by automatically generated rules, error case frames will likely be one of the major approaches for preposition error correction.
Automatic annotation of error types for grammatical error correction
Grammatical Error Correction (GEC) is the task of automatically detecting and correcting grammatical errors in text. Although previous work has focused on developing systems that target specific error types, the current state of the art uses machine translation to correct all error types simultaneously. A significant disadvantage of this approach is that machine translation does not produce annotated output and so error type information is lost. This means we can only evaluate a system in terms of overall performance and cannot carry out a more detailed analysis of different aspects of system performance.
In this thesis, I develop a system to automatically annotate parallel original and corrected sentence pairs with explicit edits and error types. In particular, I first extend the Damerau-Levenshtein alignment algorithm to make use of linguistic information when aligning parallel sentences, and supplement this alignment with a set of merging rules to handle multi-token edits. The output from this algorithm surpasses other edit extraction approaches in terms of approximating human edit annotations and is the current state of the art. Having extracted the edits, I next classify them according to a new rule-based error type framework that depends only on automatically obtained linguistic properties of the data, such as part-of-speech tags. This framework was inspired by existing frameworks, and human judges rated the appropriateness of the predicted error types as ‘Good’ (85%) or ‘Acceptable’ (10%) in a random sample of 200 edits. The whole system is called the ERRor ANnotation Toolkit (ERRANT) and is the first toolkit capable of automatically annotating parallel sentences with error types.
I demonstrate the value of ERRANT by applying it to the system output produced by the participants of the CoNLL-2014 shared task, and carry out a detailed error type analysis of system performance for the first time. I also develop a simple language-model-based approach to GEC that does not require annotated training data, and show how it can be improved using ERRANT error types.
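To make the alignment idea in the abstract above more concrete, here is a minimal sketch of a token-level edit distance in the restricted Damerau-Levenshtein form (often called optimal string alignment). It is not the thesis's algorithm or ERRANT's code: the function names, the uniform insertion and deletion costs, and the toy `prefix_aware_cost` function are all assumptions made for illustration. The pluggable `sub_cost` parameter only hints at how linguistic information could make some substitutions cheaper than others.

```python
from typing import Callable, List, Sequence


def osa_distance(
    src: Sequence[str],
    tgt: Sequence[str],
    sub_cost: Callable[[str, str], float] = lambda a, b: 0.0 if a == b else 1.0,
) -> float:
    """Token-level optimal string alignment (restricted Damerau-Levenshtein)."""
    n, m = len(src), len(tgt)
    # dist[i][j] is the cheapest way to turn src[:i] into tgt[:j].
    dist: List[List[float]] = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = float(i)  # delete every source token
    for j in range(1, m + 1):
        dist[0][j] = float(j)  # insert every target token
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(
                dist[i - 1][j] + 1.0,  # deletion
                dist[i][j - 1] + 1.0,  # insertion
                dist[i - 1][j - 1] + sub_cost(src[i - 1], tgt[j - 1]),  # match or substitution
            )
            # Swap of two adjacent tokens counts as a single edit.
            if (
                i > 1
                and j > 1
                and src[i - 1] == tgt[j - 2]
                and src[i - 2] == tgt[j - 1]
            ):
                best = min(best, dist[i - 2][j - 2] + 1.0)
            dist[i][j] = best
    return dist[n][m]


def prefix_aware_cost(a: str, b: str) -> float:
    """Hypothetical stand-in for a linguistic cost: tokens that share
    their first three characters are treated as cheaper to substitute."""
    if a == b:
        return 0.0
    return 0.5 if a.lower()[:3] == b.lower()[:3] else 1.0


print(osa_distance("I have seen him yesterday".split(), "I saw him yesterday".split()))  # 2.0
print(osa_distance(["only", "I"], ["I", "only"]))  # 1.0
print(osa_distance(["He", "walks"], ["He", "walked"], prefix_aware_cost))  # 0.5
```

A real system would derive substitution costs from properties such as lemmas or part-of-speech tags and would backtrace through the table to recover the edits themselves; both steps are omitted here.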