HOO 2012 Error Recognition and Correction Shared Task: Cambridge University Submission Report
Previous work on automated error recognition and correction of texts written by learners of English as a Second Language (ESL) has demonstrated experimentally that training classifiers on error-annotated ESL text generally outperforms training on native text alone, and that adapting error correction models to the native language (L1) of the writer improves performance. Nevertheless, most extant models have poor precision, particularly when attempting error correction, which limits their usefulness in practical applications requiring feedback. We experiment with various feature types, varying quantities of error-corrected data, and generic versus L1-specific adaptation to typical errors using Naïve Bayes (NB) classifiers, and develop one model which maximizes precision. We report and discuss the results for 8 models, 5 trained on the HOO data and 3 (partly) on the full error-coded Cambridge Learner Corpus (CLC), from which the HOO data is drawn.

We thank Cambridge ESOL, a division of Cambridge Assessment, for a partial grant to the first author and a research contract with iLexIR Ltd. We also thank them and Cambridge University Press for granting us access to the CLC for research purposes.
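As an illustration of the precision-oriented setup the abstract describes, the sketch below trains a multinomial Naïve Bayes classifier on a handful of invented (token, context) features and only flags a token as an error when the posterior probability clears a threshold. The training pairs, feature choice, and threshold are all hypothetical stand-ins, not the authors' actual features or data.

```python
from collections import defaultdict
import math

# Toy training data: (feature tuple, label) pairs. Features here are a
# (token, previous token) pair; labels mark whether the token is an error.
# All examples are invented for illustration.
train = [
    (("a", "is"), "err"), (("the", "is"), "ok"),
    (("a", "is"), "err"), (("the", "in"), "ok"),
    (("an", "is"), "ok"), (("a", "in"), "err"),
]

# Count feature occurrences per class (multinomial NB, add-one smoothing).
counts = {"err": defaultdict(int), "ok": defaultdict(int)}
class_totals = {"err": 0, "ok": 0}
for feats, label in train:
    for f in feats:
        counts[label][f] += 1
        class_totals[label] += 1

vocab = {f for feats, _ in train for f in feats}

def log_posterior(feats, label):
    prior = sum(1 for _, l in train if l == label) / len(train)
    lp = math.log(prior)
    for f in feats:
        lp += math.log((counts[label][f] + 1) /
                       (class_totals[label] + len(vocab)))
    return lp

def flag_error(feats, threshold=0.7):
    """Flag a token as an error only when P(err | feats) exceeds the
    threshold; raising the threshold trades recall for precision."""
    err, ok = log_posterior(feats, "err"), log_posterior(feats, "ok")
    p_err = math.exp(err) / (math.exp(err) + math.exp(ok))
    return p_err > threshold

print(flag_error(("a", "is")))    # True
print(flag_error(("the", "is")))  # False
```

Raising `threshold` is the simple knob that maximises precision at the cost of recall, which is the trade-off the abstract targets for feedback applications.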
Auxiliary Objectives for Neural Error Detection Models
We investigate the utility of different auxiliary objectives and training strategies within a neural sequence labeling approach to error detection in learner writing. Auxiliary costs provide the model with additional linguistic information, allowing it to learn general-purpose compositional features that can then be exploited for other objectives. Our experiments show that a joint learning approach trained with parallel labels on in-domain data improves performance over the previous best error detection system. While the resulting model has the same number of parameters, the additional objectives allow it to be optimised more efficiently and achieve better performance.
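The joint training idea above, a shared representation feeding the main error-detection head plus an auxiliary head used only during training, can be sketched in a few lines of numpy. The dimensions, labels, and auxiliary-loss weight below are invented for illustration and stand in for the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a shared "compositional" feature vector per token feeds two
# softmax heads -- the main error-detection labels and an auxiliary label
# set (e.g. POS tags). Dimensions and data are invented for illustration.
n_tokens, d, n_err, n_pos = 4, 8, 2, 5
shared = rng.normal(size=(n_tokens, d))      # shared token representations
W_err = rng.normal(size=(d, n_err)) * 0.1    # main error-detection head
W_pos = rng.normal(size=(d, n_pos)) * 0.1    # auxiliary head (train only)
y_err = np.array([0, 0, 1, 0])               # 1 = token is an error
y_pos = np.array([2, 0, 1, 4])               # auxiliary POS labels

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    return -np.log(probs[np.arange(len(labels)), labels]).mean()

# Joint training objective: main loss plus a weighted auxiliary loss.
# At test time only W_err is used, so the deployed model has the same
# number of parameters as a single-task model.
aux_weight = 0.5
p_err = softmax(shared @ W_err)
p_pos = softmax(shared @ W_pos)
loss = cross_entropy(p_err, y_err) + aux_weight * cross_entropy(p_pos, y_pos)
print(round(float(loss), 3))
```

Because the auxiliary head is discarded at inference, the extra supervision shapes the shared features without adding deployed parameters, which is the property the abstract highlights.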
Computational Models of Problems with Writing of English as a Second Language Learners
Learning a new language is a challenging endeavor. As a student attempts to master the grammar usage and mechanics of the new language, they make many mistakes. Detailed feedback and corrections from language tutors are invaluable to student learning, but it is time consuming to provide such feedback. In this thesis, I investigate the feasibility of building computer programs to help to reduce the efforts of English as a Second Language (ESL) tutors. Specifically, I consider three problems: (1) whether a program can identify areas that may need the tutor’s attention, such as places where the learners have used redundant words; (2) whether a program can auto-complete a tutor’s corrections by inferring the location and reason for the correction; (3) for detecting misusages of prepositions, a common ESL error type, whether a program can automatically construct a set of potential corrections by finding words that are more likely to be confused with each other (known as a confusion set).
The viability of these programs depends on whether aspects of the English language and common ESL mistakes can be described by computational models. For each task, building computational models faces unique challenges: (1) In highlighting redundant areas, it is difficult to precisely define “redundancy” in a computer’s language. (2) In auto-completing tutors’ annotations, it is difficult for computers to correctly interpret how many writing problems were addressed during revision. (3) In confusion set construction, it is difficult to infer which words are more likely confused with the given word. To address these challenges, this thesis presents different model alternatives for each task. Empirical experiments demonstrate the degrees of success to which computational models can help with detecting and correcting ESL writing problems.
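For the third problem, one simple way to induce a preposition confusion set is to count which prepositions tutors substitute for one another in a log of corrections. The toy correction pairs and frequency cutoff below are invented for illustration and are not the thesis's actual method.

```python
from collections import Counter, defaultdict

# Toy correction log: (preposition the learner wrote, tutor's correction).
# All pairs are invented for illustration.
corrections = [
    ("in", "on"), ("in", "on"), ("in", "at"),
    ("on", "in"), ("at", "in"), ("to", "at"), ("in", "on"),
]

# Count substitutions in both directions: two prepositions are confusable
# when learners frequently swap one for the other.
subs = defaultdict(Counter)
for written, corrected in corrections:
    subs[written][corrected] += 1
    subs[corrected][written] += 1

def confusion_set(prep, min_count=2):
    """Return the prepositions most often confused with `prep`."""
    return {p for p, c in subs[prep].items() if c >= min_count}

print(sorted(confusion_set("in")))  # ['at', 'on']
```

A frequency cutoff like `min_count` keeps one-off corrections out of the set, so a checker only proposes alternatives that learners actually tend to confuse.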
Automatic annotation of error types for grammatical error correction
Grammatical Error Correction (GEC) is the task of automatically detecting and correcting grammatical errors in text. Although previous work has focused on developing systems that target specific error types, the current state of the art uses machine translation to correct all error types simultaneously. A significant disadvantage of this approach is that machine translation does not produce annotated output, and so error type information is lost. This means we can only evaluate a system in terms of overall performance and cannot carry out a more detailed analysis of different aspects of system performance.

In this thesis, I develop a system to automatically annotate parallel original and corrected sentence pairs with explicit edits and error types. In particular, I first extend the Damerau-Levenshtein alignment algorithm to make use of linguistic information when aligning parallel sentences, and supplement this alignment with a set of merging rules to handle multi-token edits. The output from this algorithm surpasses other edit extraction approaches in terms of approximating human edit annotations and is the current state of the art. Having extracted the edits, I next classify them according to a new rule-based error type framework that depends only on automatically obtained linguistic properties of the data, such as part-of-speech tags. This framework was inspired by existing frameworks, and human judges rated the appropriateness of the predicted error types as ‘Good’ (85%) or ‘Acceptable’ (10%) in a random sample of 200 edits. The whole system is called the ERRor ANnotation Toolkit (ERRANT) and is the first toolkit capable of automatically annotating parallel sentences with error types.

I demonstrate the value of ERRANT by applying it to the system output produced by the participants of the CoNLL-2014 shared task, and carry out a detailed error type analysis of system performance for the first time. I also develop a simple language-model-based approach to GEC that does not require annotated training data, and show how it can be improved using ERRANT error types.
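A minimal version of the edit-extraction step can be sketched with Python's standard-library difflib as a stand-in for the linguistically informed Damerau-Levenshtein alignment ERRANT actually uses. Note that this simplified aligner lacks ERRANT's linguistic cost function and merging rules, so adjacent token changes come out as one replacement span rather than separate edits.

```python
import difflib

def extract_edits(orig, corr):
    """Extract (start, end, original span, corrected span) edits from a
    tokenised sentence pair. A simplified stand-in for ERRANT's
    linguistically informed Damerau-Levenshtein alignment."""
    matcher = difflib.SequenceMatcher(None, orig, corr)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only replace/insert/delete spans
            edits.append((i1, i2,
                          " ".join(orig[i1:i2]), " ".join(corr[j1:j2])))
    return edits

orig = "I has a apple".split()
corr = "I have an apple".split()
print(extract_edits(orig, corr))
# [(1, 3, 'has a', 'have an')]
```

Each extracted span could then be passed to a rule-based classifier over part-of-speech tags to assign an error type, which is the second stage the abstract describes.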
Neural Sequence-Labelling Models for Grammatical Error Correction
We propose an approach to N-best list reranking using neural sequence-labelling models. We train a compositional model for error detection that calculates the probability of each token in a sentence being correct or incorrect, utilising the full sentence as context. Using the error detection model, we then re-rank the N best hypotheses generated by statistical machine translation systems. Our approach achieves state-of-the-art results on error correction for three different datasets, and it has the additional advantage of only using a small set of easily computed features that require no linguistic input.
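The reranking step can be sketched by interpolating each hypothesis's translation-model score with the summed log-probabilities of its tokens being correct under the detection model. The N-best list, scores, probabilities, and interpolation weight below are all invented for illustration.

```python
import math

# Toy N-best list: each hypothesis pairs an SMT model score with
# per-token probabilities of being correct, as an error detection model
# would produce. All numbers are invented for illustration.
nbest = [
    ("I have a apple",  -2.1, [0.99, 0.98, 0.40, 0.97]),
    ("I have an apple", -2.4, [0.99, 0.98, 0.95, 0.97]),
    ("I has an apple",  -2.0, [0.99, 0.35, 0.90, 0.97]),
]

def rerank(hypotheses, weight=3.0):
    """Re-rank by interpolating the SMT score with the sentence-level
    sum of log P(token correct) from the detection model."""
    def combined(hyp):
        _, smt_score, token_probs = hyp
        detection = sum(math.log(p) for p in token_probs)
        return smt_score + weight * detection
    return sorted(hypotheses, key=combined, reverse=True)

print(rerank(nbest)[0][0])  # I have an apple
```

Even though the top SMT hypothesis still contains "has", the detection model's low correctness probability for that token pushes the fully corrected hypothesis to the top of the list.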