18 research outputs found

    Correcting Preposition Errors in Learner English Using Error Case Frames and Feedback Messages

    Get PDF
    Abstract This paper presents a novel framework called error case frames for correcting preposition errors. They are case frames specially designed for describing and correcting preposition errors. Their most distinct advantage is that they can correct errors with feedback messages explaining why the preposition is erroneous. This paper proposes a method for automatically generating them by comparing learner and native corpora. Experiments show (i) automatically generated error case frames achieve a performance comparable to conventional methods; (ii) error case frames are intuitively interpretable and manually modifiable to improve them; (iii) feedback messages provided by error case frames are effective in language learning assistance. Considering these advantages and the fact that it has been difficult to provide feedback messages by automatically generated rules, error case frames will likely be one of the major approaches for preposition error correction

    Grammatical error correction using hybrid systems and type filtering

    Get PDF
    This paper describes our submission to the CoNLL 2014 shared task on grammatical error correction using a hybrid approach, which includes both a rule-based and an SMT system augmented by a large webbased language model. Furthermore, we demonstrate that correction type estimation can be used to remove unnecessary corrections, improving precision without harming recall. Our best hybrid system achieves state of-the-art results, ranking first on the original test set and second on the test set with alternative annotations.[We would like to thank] Cambridge English Language Assessment, a division of Cambridge Assessment, for supporting this research

    Grammatical Error Correction: A Survey of the State of the Art

    Full text link
    Grammatical Error Correction (GEC) is the task of automatically detecting and correcting errors in text. The task not only includes the correction of grammatical errors, such as missing prepositions and mismatched subject-verb agreement, but also orthographic and semantic errors, such as misspellings and word choice errors respectively. The field has seen significant progress in the last decade, motivated in part by a series of five shared tasks, which drove the development of rule-based methods, statistical classifiers, statistical machine translation, and finally neural machine translation systems which represent the current dominant state of the art. In this survey paper, we condense the field into a single article and first outline some of the linguistic challenges of the task, introduce the most popular datasets that are available to researchers (for both English and other languages), and summarise the various methods and techniques that have been developed with a particular focus on artificial error generation. We next describe the many different approaches to evaluation as well as concerns surrounding metric reliability, especially in relation to subjective human judgements, before concluding with an overview of recent progress and suggestions for future work and remaining challenges. We hope that this survey will serve as comprehensive resource for researchers who are new to the field or who want to be kept apprised of recent developments

    Robust Text Correction for Grammar and Fluency

    Get PDF
    Grammar is one of the most important properties of natural language. It is a set of structural (i.e., syntactic and morphological) rules that are shared among native speakers in order to engage smooth communication. Automated grammatical error correction (GEC) is a natural language processing (NLP) application, which aims to correct grammatical errors in a given source sentence by computational models. Since the data-driven statistical methods began in 1990s and early 2000s, the GEC com- munity has worked on establishing a common framework for its evaluation (i.e., dataset and metric for benchmarking) in order to compare GEC models’ performance quantitatively. A series of shared tasks since early 2010s is a good example of this. In the first half of this thesis, I propose character-level and token-level error correction algorithms. For the character-level error correction, I introduce a semi-character recurrent neural network, which is motivated by a finding in psycholinguistics, called the Cmabrigde Uinervtisy (Cambridge University) effect or typoglycemia. For word-level error correc- tion, I propose an error-repair dependency parsing algorithm for ungrammatical texts. The algorithm can parse sentences and correct grammatical errors simultaneously. However, it is important to note that grammatical errors are not usually limited to mor- phological or syntactic errors. For example, collocational errors such as *quick/fast food and *fast/quick meal are not fully explained by only syntactic rules. This is another im- portant property of natural language, called fluency (or acceptability). Fluency is a level of mastery that goes beyond knowledge of how to follow the rules, and includes know- ing when they can be broken or flouted. In fact, the GEC community has also extended the scope of error types from closed class errors (e.g., noun numbers, verb forms) to the fluency-oriented errors. The second half of this thesis investigates GEC while considering fluency as well as grammaticality. When it comes to “whole-sentence” correction, by extending the scope of errors considering fluency as well as grammaticality, the GEC community has overlooked the reliability and validity of the task scheme (i.e., evaluation metric and dataset for bench- marking). Thus, I reassess the goals of GEC as a “whole-sentence” rewriting task while considering fluency. Following the fluency-oriented GEC framework, I introduce a new benchmark corpus that is more diverse in various aspects such as proficiency, topics, and learners’ native languages. Based on the fluency-oriented metric and dataset, I propose a new “whole-sentence” error correction model with neural reinforcement learning. Unlike conventional maximum likelihood estimation (MLE), the model directly optimizes toward an objective that consid- ers a sentence-level, task-specific evaluation metric. I demonstrate that the proposed model outperforms MLE in human and automated evaluation metrics. Finally, I conclude the thesis and outline ideas and suggestions for future GEC research
    corecore