10 research outputs found

    Spell-checking in Spanish: the case of diacritic accents

    This article presents the problem of diacritic restoration (or diacritization) in the context of spell-checking, with the focus on an orthographically rich language such as Spanish. We argue that despite the large volume of work published on the topic of diacritization, currently available spell-checking tools have still not found a proper solution to the problem in those cases where both forms of a word are listed in the checker’s dictionary. This is the case, for instance, when a word form exists with and without diacritics, such as continuo ‘continuous’ and continuó ‘he/she/it continued’, or when different diacritics make other word distinctions, as in continúo ‘I continue’. We propose a very simple solution based on a word bigram model derived from correctly typed Spanish texts and evaluate the ability of this model to restore diacritics in artificial as well as real errors. The case of diacritics is only meant to be an example of the possible applications for this idea, yet we believe that the same method could be applied to other kinds of orthographic or even grammatical errors. Moreover, given that no explicit linguistic knowledge is required, the proposed model can be used with other languages provided that a large normative corpus is available. Peer reviewed. Postprint (author’s final draft).
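    The bigram idea described in this abstract can be sketched in a few lines: count word pairs in a corpus of correctly typed text, then pick whichever diacritic variant is most frequent after the preceding word. The toy corpus below is a hypothetical stand-in for the large normative corpus the paper assumes; it is a minimal sketch, not the authors' implementation.

    ```python
    from collections import Counter

    # Hypothetical toy corpus standing in for a large normative Spanish corpus.
    corpus = (
        "el movimiento continuo de la máquina . "
        "ella continuó el trabajo . "
        "yo continúo el trabajo ."
    ).split()

    # Word bigram counts from the "correctly typed" text.
    bigrams = Counter(zip(corpus, corpus[1:]))

    def restore(prev_word, candidates):
        """Pick the candidate form seen most often after prev_word."""
        return max(candidates, key=lambda w: bigrams[(prev_word, w)])

    forms = ["continuo", "continuó", "continúo"]
    print(restore("ella", forms))  # -> continuó
    print(restore("yo", forms))    # -> continúo
    ```

    With a realistic corpus the same lookup disambiguates all three forms from the paper's example; no language-specific rules are involved, which is why the method transfers to other languages.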

    Syntax-Driven Machine Translation as a Model of ESL Revision

    In this work, we model the writing revision process of English as a Second Language (ESL) students with syntax-driven machine translation methods. We compare two approaches: tree-to-string transformation

    A Web-based English Proofing System for English as a Second Language Users

    We describe an algorithm that relies on web frequency counts to identify and correct writing errors made by non-native writers of English. Evaluation of the system on a real-world ESL corpus showed very promising performance on the very difficult problem of critiquing English determiner use: 62% precision and 41% recall, with a false flag rate of only 2% (compared to a random-guessing baseline of 5% precision, 7% recall, and more than 80% false flag rate). Performance on collocation errors was weaker, suggesting that a web-based approach should be combined with local linguistic resources to achieve both effectiveness and efficiency.
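    The core comparison behind this approach can be sketched as follows: generate each determiner variant of a noun phrase and keep the one with the highest frequency count. Since we cannot query the web here, a stub table with hypothetical counts stands in for the hit-count lookup; the function name and numbers are illustrative assumptions, not the paper's API.

    ```python
    # Stub frequency table standing in for web hit counts (hypothetical numbers).
    web_counts = {
        "a useful information": 1_000,
        "an useful information": 200,
        "the useful information": 50_000,
        "useful information": 900_000,
    }

    def best_determiner(np_without_det, determiners=("a", "an", "the", "")):
        """Return the determiner whose variant of the phrase is most frequent."""
        def variant(det):
            return f"{det} {np_without_det}".strip()
        return max(determiners, key=lambda d: web_counts.get(variant(d), 0))

    # "information" is a mass noun, so the bare variant dominates the counts.
    print(repr(best_determiner("useful information")))  # -> ''
    ```

    A real system would also need the thresholding that keeps the false flag rate low: only flag the writer's choice when a competing variant is more frequent by a large margin.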

    Automatic correction of grammatical errors in non-native English text

    Ph.D. thesis, Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. By John Sie Yuen Lee. Learning a foreign language requires much practice outside of the classroom. Computer-assisted language learning systems can help fill this need, and one desirable capability of such systems is the automatic correction of grammatical errors in texts written by non-native speakers. This dissertation concerns the correction of non-native grammatical errors in English text, and the closely related task of generating test items for language learning, using a combination of statistical and linguistic methods. We show that syntactic analysis enables extraction of more salient features. We address issues concerning robustness in feature extraction from non-native texts, and also design a framework for simultaneous correction of multiple error types. Our proposed methods are applied to some of the most common usage errors, including prepositions, verb forms, and articles. The methods are evaluated on sentences with synthetic and real errors, and in both restricted and open domains. A secondary theme of this dissertation is user customization. We perform a detailed analysis on a non-native corpus, illustrating the utility of an error model based on the mother tongue. We study the benefits of adjusting the correction models based on the quality of the input text, and also present novel methods to generate high-quality multiple-choice items that are tailored to the interests of the user.

    Use of Linguistic Methods for the Automated Detection and Correction of Errors Produced by French Speakers Writing in English

    The starting point of this research is the observation that French speakers writing in English in personal or professional contexts still encounter grammatical difficulties, even at intermediate to advanced levels. The first tools they can reach for to correct those errors, automatic grammar checkers, do not offer corrections for a large number of the errors produced by French-speaking users of English, especially because those tools are rarely designed for L2 users. We propose to identify the difficulties encountered by these speakers through the detection of errors in a representative corpus, and to create a linguistic model of errors and corrections. The model is the result of a thorough linguistic analysis of the phenomena at stake, based on grammatical information available in reference grammars, corpus studies, and the analysis of erroneous segments. The validity of the use of linguistic methods is established through the implementation of detection and correction rules in a functional platform, followed by the evaluation of the results of the application of those rules on L1 and L2 English corpora.
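    A detection-and-correction rule of the kind this thesis implements can be sketched as a pattern plus a rewrite. The single rule below targets "be + agree", a well-known calque from French "être d'accord"; the rule and its form are an illustrative assumption, not the thesis's actual rule set or platform.

    ```python
    import re

    # One illustrative rule: (error pattern, correction).
    # French speakers often write "I am agree" for "I agree"
    # (calqued on "je suis d'accord").
    RULES = [
        (re.compile(r"\b(?:am|is|are|was|were)\s+agree\b", re.IGNORECASE), "agree"),
    ]

    def correct(sentence):
        """Apply each detection/correction rule in turn."""
        for pattern, replacement in RULES:
            sentence = pattern.sub(replacement, sentence)
        return sentence

    print(correct("I am agree with this proposal."))  # -> I agree with this proposal.
    ```

    Evaluating such rules on both L1 and L2 corpora, as the thesis does, checks that they fire on genuine learner errors without flagging correct native usage.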