
    Problems in Evaluating Grammatical Error Detection Systems

    ABSTRACT Many evaluation issues for grammatical error detection have previously been overlooked, making it hard to draw meaningful comparisons between different approaches, even when they are evaluated on the same corpus. To begin with, the three-way contingency between a writer's sentence, the annotator's correction, and the system's output makes evaluation more complex than in some other NLP tasks, which we address by presenting an intuitive evaluation scheme. Of particular importance to error detection is the skew of the data (the low frequency of errors as compared to non-errors), which distorts some traditional measures of performance and limits their usefulness, leading us to recommend the reporting of raw measurements (true positives, false negatives, false positives, true negatives). Other issues that are particularly vexing for error detection focus on defining these raw measurements: specifying the size or scope of an error, properly treating errors as graded rather than discrete phenomena, and counting non-errors. We discuss recommendations for best practices with regard to reporting the results of system evaluation for these cases, recommendations which depend upon making clear one's assumptions and applications for error detection. By highlighting the problems with current error detection evaluation, the field will be better able to move forward.
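    The point about data skew can be illustrated with a minimal sketch. The counts below are invented for illustration (a hypothetical corpus of 1000 tokens containing 20 errors) and are not taken from the paper; the `Counts` class is likewise an assumption of this example. Two systems reach the same accuracy even though one of them never flags anything, which is why reporting the raw counts is more informative.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Raw contingency counts for an error detector (hypothetical helper)."""
    tp: int  # errors correctly flagged
    fp: int  # non-errors wrongly flagged
    fn: int  # errors missed
    tn: int  # non-errors correctly left alone

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total

    def precision(self) -> float:
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    def recall(self) -> float:
        errors = self.tp + self.fn
        return self.tp / errors if errors else 0.0


# Invented numbers: 1000 tokens, 20 of which are errors.
never_flags = Counts(tp=0, fp=0, fn=20, tn=980)    # system that flags nothing
detector = Counts(tp=10, fp=10, fn=10, tn=970)     # system that finds half the errors

for name, c in [("never_flags", never_flags), ("detector", detector)]:
    print(name, c, f"acc={c.accuracy():.3f}",
          f"P={c.precision():.2f}", f"R={c.recall():.2f}")
```

    Both systems score the same accuracy because the true negatives dominate the total; only the raw counts (or precision and recall derived from them) show that the first system detects no errors at all.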

    Robust Text Correction for Grammar and Fluency

    Grammar is one of the most important properties of natural language. It is a set of structural (i.e., syntactic and morphological) rules that are shared among native speakers in order to communicate smoothly. Automated grammatical error correction (GEC) is a natural language processing (NLP) application that aims to correct grammatical errors in a given source sentence using computational models. Since data-driven statistical methods emerged in the 1990s and early 2000s, the GEC community has worked on establishing a common framework for evaluation (i.e., datasets and metrics for benchmarking) in order to compare GEC models' performance quantitatively. The series of shared tasks held since the early 2010s is a good example of this. In the first half of this thesis, I propose character-level and token-level error correction algorithms. For character-level error correction, I introduce a semi-character recurrent neural network, which is motivated by a finding in psycholinguistics called the Cmabrigde Uinervtisy (Cambridge University) effect, or typoglycemia. For word-level error correction, I propose an error-repair dependency parsing algorithm for ungrammatical texts. The algorithm can parse sentences and correct grammatical errors simultaneously. However, it is important to note that grammatical errors are not usually limited to morphological or syntactic errors. For example, collocational errors such as *quick/fast food and *fast/quick meal are not fully explained by syntactic rules alone. This is another important property of natural language, called fluency (or acceptability). Fluency is a level of mastery that goes beyond knowledge of how to follow the rules, and includes knowing when they can be broken or flouted. In fact, the GEC community has also extended the scope of error types from closed-class errors (e.g., noun number, verb form) to fluency-oriented errors.
The second half of this thesis investigates GEC while considering fluency as well as grammaticality. In moving to “whole-sentence” correction, which extends the scope of errors to cover fluency as well as grammaticality, the GEC community has overlooked the reliability and validity of the task scheme (i.e., the evaluation metric and dataset used for benchmarking). Thus, I reassess the goals of GEC as a “whole-sentence” rewriting task while considering fluency. Following the fluency-oriented GEC framework, I introduce a new benchmark corpus that is more diverse in aspects such as proficiency, topics, and learners’ native languages. Based on the fluency-oriented metric and dataset, I propose a new “whole-sentence” error correction model trained with neural reinforcement learning. Unlike conventional maximum likelihood estimation (MLE), the model directly optimizes toward an objective that incorporates a sentence-level, task-specific evaluation metric. I demonstrate that the proposed model outperforms MLE on both human and automated evaluation metrics. Finally, I conclude the thesis and outline ideas and suggestions for future GEC research.
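    The "Cmabrigde Uinervtisy" effect mentioned in the abstract above can be made concrete with a small sketch. It assumes a word is represented by its first character, the unordered multiset of its internal characters, and its last character; the abstract does not spell out the network's input encoding, so this representation and the function name `semi_character_key` are assumptions made for illustration, and the recurrent network itself is not shown.

```python
from collections import Counter


def semi_character_key(word: str):
    """Return (first char, multiset of internal chars, last char).

    Assumed representation for illustration only: the order of the
    internal characters is discarded, so jumbled spellings that keep
    the first and last letters map to the same key.
    """
    first = word[:1]
    last = word[-1:] if len(word) > 1 else ""
    internal = word[1:-1]
    # Sorted (char, count) pairs give a hashable, order-free summary.
    return (first, tuple(sorted(Counter(internal).items())), last)


for jumbled, intended in [("Cmabrigde", "Cambridge"),
                          ("Uinervtisy", "University")]:
    same = semi_character_key(jumbled) == semi_character_key(intended)
    print(f"{jumbled} vs {intended}: same key = {same}")
```

    Under this assumed encoding, a jumbled word and its intended spelling are indistinguishable at the input, while a substitution that changes the last letter or the set of internal letters still yields a different key.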