5 research outputs found

    Problems in Evaluating Grammatical Error Detection Systems

    ABSTRACT: Many evaluation issues for grammatical error detection have previously been overlooked, making it hard to draw meaningful comparisons between different approaches, even when they are evaluated on the same corpus. To begin with, the three-way contingency between a writer's sentence, the annotator's correction, and the system's output makes evaluation more complex than in some other NLP tasks; we address this by presenting an intuitive evaluation scheme. Of particular importance to error detection is the skew of the data (the low frequency of errors compared to non-errors), which distorts some traditional measures of performance and limits their usefulness, leading us to recommend the reporting of raw measurements (true positives, false negatives, false positives, true negatives). Other issues that are particularly vexing for error detection concern defining these raw measurements: specifying the size or scope of an error, properly treating errors as graded rather than discrete phenomena, and counting non-errors. We discuss recommendations for best practices in reporting the results of system evaluation for these cases, recommendations which depend upon making clear one's assumptions and applications for error detection. By highlighting the problems with current error detection evaluation, the field will be better able to move forward.
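
    The point about skew is easy to see with a small, purely illustrative sketch (the counts below are hypothetical, not from the paper): on a skewed test set, a system that flags nothing looks strong on accuracy while detecting no errors at all, which is why the abstract recommends reporting the raw counts alongside any aggregate metric.

```python
# Illustrative sketch: how class skew distorts aggregate metrics for error
# detection, and why raw counts (TP, FN, FP, TN) are more informative.

def summarise(tp, fn, fp, tn):
    """Return common metrics alongside the raw confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn,
            "accuracy": accuracy, "precision": precision, "recall": recall}

# Hypothetical skewed test set: 50 erroneous tokens vs. 4,950 correct ones.
print(summarise(tp=0, fn=50, fp=0, tn=4950))    # accuracy 0.99, recall 0.0
print(summarise(tp=30, fn=20, fp=60, tn=4890))  # accuracy 0.984, recall 0.6
```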

    Computational Models of Problems with Writing of English as a Second Language Learners

    Learning a new language is a challenging endeavor. As students attempt to master the grammar, usage, and mechanics of the new language, they make many mistakes. Detailed feedback and corrections from language tutors are invaluable to student learning, but providing such feedback is time consuming. In this thesis, I investigate the feasibility of building computer programs to help reduce the effort required of English as a Second Language (ESL) tutors. Specifically, I consider three problems: (1) whether a program can identify areas that may need the tutor’s attention, such as places where the learner has used redundant words; (2) whether a program can auto-complete a tutor’s corrections by inferring the location of and reason for the correction; (3) for detecting misuse of prepositions, a common ESL error type, whether a program can automatically construct a set of potential corrections by finding words that are more likely to be confused with each other (known as a confusion set). The viability of these programs depends on whether aspects of the English language and common ESL mistakes can be described by computational models. For each task, building computational models faces unique challenges: (1) in highlighting redundant areas, it is difficult to precisely define “redundancy” in a computer’s terms; (2) in auto-completing tutors’ annotations, it is difficult for computers to correctly interpret how many writing problems were addressed during revision; (3) in confusion set construction, it is difficult to infer which words are more likely to be confused with a given word. To address these challenges, this thesis presents different model alternatives for each task. Empirical experiments demonstrate the degree to which computational models can help with detecting and correcting ESL writing problems.
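
    The third task can be illustrated with a minimal, hypothetical sketch (not the thesis implementation): if we have a list of tutor corrections pairing a learner's preposition with the tutor's replacement, a confusion set for a word can be read off by counting which prepositions are substituted for it. The data and threshold below are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical (written_preposition, corrected_preposition) pairs drawn
# from imagined tutor corrections; real data would come from a corpus.
corrections = [
    ("in", "on"), ("in", "at"), ("on", "in"),
    ("at", "in"), ("in", "on"), ("of", "for"),
]

# Count how often each preposition is corrected to each alternative.
counts = defaultdict(Counter)
for written, corrected in corrections:
    counts[written][corrected] += 1

def confusion_set(word, min_count=1):
    """Prepositions observed as corrections of `word` at least `min_count` times."""
    return [w for w, c in counts[word].most_common() if c >= min_count]

print(confusion_set("in"))  # ['on', 'at'] for this toy data
```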

    Detecting grammatical errors with treebank-induced, probabilistic parsers

    Today's grammar checkers often use hand-crafted rule systems that define acceptable language. The development of such rule systems is labour-intensive and has to be repeated for each language. At the same time, grammars automatically induced from syntactically annotated corpora (treebanks) are successfully employed in other applications, for example text understanding and machine translation. At first glance, treebank-induced grammars seem unsuitable for grammar checking, as they massively over-generate and, owing to their high robustness, fail to reject ungrammatical input. We present three new methods for judging the grammaticality of a sentence with probabilistic, treebank-induced grammars, demonstrating that such grammars can be successfully applied to automatically judge the grammaticality of an input string. Our best-performing method exploits the differences between parse results for grammars trained on grammatical and ungrammatical treebanks. The second approach builds an estimator of the probability of the most likely parse using grammatical training data that has previously been parsed and annotated with parse probabilities. If the estimated probability of an input sentence (whose grammaticality is to be judged by the system) is higher by a certain amount than the actual parse probability, the sentence is flagged as ungrammatical. The third approach extracts discriminative parse tree fragments in the form of CFG rules from parsed grammatical and ungrammatical corpora and trains a binary classifier to distinguish grammatical from ungrammatical sentences. The three approaches are evaluated on a large test set of grammatical and ungrammatical sentences. The ungrammatical test set is generated automatically by inserting common grammatical errors into the British National Corpus. The results are compared to two traditional approaches: one that uses a hand-crafted, discriminative grammar, the XLE ParGram English LFG, and one based on part-of-speech n-grams. In addition, the baseline methods and the new methods are combined in a machine learning-based framework, yielding further improvements.
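
    The thresholding logic described for the second approach can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the dissertation's code: the margin value, the log-probability scale, and the example scores are all invented, and the estimator itself (e.g., a regression over sentence length and word frequencies) is left abstract.

```python
# Minimal sketch of the second approach's decision rule: flag a sentence
# as ungrammatical when the parse probability estimated from grammatical
# training data exceeds the parser's actual best-parse probability by more
# than a margin. Values are hypothetical and on a log-probability scale.

def flag_ungrammatical(estimated_logprob, actual_logprob, margin=5.0):
    """True if the actual parse is much less probable than estimated."""
    return (estimated_logprob - actual_logprob) > margin

# Hypothetical scores: the estimator predicts -45.2 for a sentence of this
# kind, but the parser's best parse only reaches -60.7, so it is flagged.
print(flag_ungrammatical(estimated_logprob=-45.2, actual_logprob=-60.7))  # True
```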