
    The adaptation of an English spellchecker for Japanese writers

    It has been pointed out that the spelling errors made by second-language writers writing in English have features that are to some extent characteristic of their first language, and the suggestion has been made that a spellchecker could be adapted to take account of these features. In the work reported here, a corpus of spelling errors made by Japanese writers writing in English was compared with a corpus of errors made by native speakers. While the great majority of errors were common to the two corpora, some distinctively Japanese error patterns were evident against this common background, notably a difficulty in deciding between the letters b and v, and the letters l and r, and a tendency to add syllables. A spellchecker that had been developed for native speakers of English was adapted to cope with these errors. A brief account is given of the spellchecker’s mode of operation to indicate how it lent itself to modifications of this kind. The native-speaker spellchecker and the Japanese-adapted version were run over the error corpora and the results show that these adaptations produced a modest but worthwhile improvement to the spellchecker’s performance in correcting Japanese-made errors
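
    To make the kind of adaptation described above concrete, the following is a minimal sketch (not the authors' spellchecker) of a weighted edit distance in which the distinctively Japanese b/v and l/r substitutions are given a reduced cost, so that candidate corrections differing only by those letters rank higher; the cost values and the tiny candidate list are illustrative assumptions.

```python
# Minimal sketch (not the authors' system): a weighted edit distance in which
# substitutions typical of Japanese writers (b/v, l/r) are made cheaper, so that
# candidates differing only by those letters rank higher as corrections.

CHEAP_SUBS = {("b", "v"), ("v", "b"), ("l", "r"), ("r", "l")}

def weighted_edit_distance(misspelling: str, candidate: str) -> float:
    m, n = len(misspelling), len(candidate)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i
    for j in range(1, n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if misspelling[i - 1] == candidate[j - 1]:
                sub = 0.0
            elif (misspelling[i - 1], candidate[j - 1]) in CHEAP_SUBS:
                sub = 0.3          # the distinctively Japanese b/v and l/r confusions
            else:
                sub = 1.0
            d[i][j] = min(d[i - 1][j] + 1,       # deletion (e.g. an added syllable)
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + sub)
    return d[m][n]

def rank_candidates(misspelling: str, dictionary: list[str]) -> list[str]:
    return sorted(dictionary, key=lambda w: weighted_edit_distance(misspelling, w))

print(rank_candidates("bery", ["very", "berry", "bury", "beryl"]))  # 'very' ranks first
```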

    Corpus Linguistics based error analysis of first year Universiti Teknologi Malaysia students’ writing

    The ability to write in English among Malaysian university students is generally not at a satisfactory level, even though English is considered a second language in Malaysia. There has been growing research interest in the analysis of the errors students make in their English writing. The purpose of this study is to identify the errors made by first year UTM students in their writing. The study also seeks to find out how much students know about the errors they produce in their writing and how they react to these errors. For this study, 66 questionnaires were distributed to first year UTM students from the Faculty of Mechanical Engineering and the Faculty of Civil Engineering, and the students’ paragraph samples were also used to collect the intended data. Findings show that, from the 66 paragraph samples analyzed, a total of 1202 errors were found and tagged according to error type. Findings from the questionnaire show that many of the students are not sure about their English proficiency level, and most of them agreed that they would like to improve their English writing by addressing the errors that they make. The paper concludes with an overall summary of the study, its limitations, and its pedagogical implications

    Fifty years of spellchecking

    A short history of spellchecking from the late 1950s to the present day, describing its development through dictionary lookup, affix stripping, correction, confusion sets, and edit distance, to the use of gigantic databases
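
    As a concrete illustration of two of the stages mentioned (dictionary lookup and single-edit correction), here is a minimal sketch with a toy word list: a word found in the dictionary is accepted, and otherwise dictionary words reachable by one edit are proposed. It is an illustration of the general technique only, not any particular historical system.

```python
# A minimal sketch of dictionary lookup combined with single-edit candidate
# generation. The tiny word list is illustrative only.

import string

DICTIONARY = {"spell", "spelling", "checker", "correct", "correction", "history"}

def single_edits(word: str) -> set[str]:
    """All strings one deletion, transposition, substitution or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    substitutes = {a + c + b[1:] for a, b in splits if b for c in string.ascii_lowercase}
    inserts = {a + c + b for a, b in splits for c in string.ascii_lowercase}
    return deletes | transposes | substitutes | inserts

def suggest(word: str) -> set[str]:
    if word in DICTIONARY:                  # plain dictionary lookup
        return {word}
    return single_edits(word) & DICTIONARY  # edit-distance-1 correction

print(suggest("speling"))   # {'spelling'}
```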

    You can’t suggest that?!: Comparisons and improvements of speller error models

    In this article, we study the correction of spelling errors, specifically how spelling errors are made and how we can model them computationally in order to fix them. The article describes two different approaches to generating spelling correction suggestions for three Uralic languages: Estonian, North Sámi and South Sámi. The first approach to modelling spelling errors is rule-based: experts write rules that describe the kinds of errors that are made, and these are compiled into a finite-state automaton that models the errors. The second is data-based: we show a machine learning algorithm a corpus of errors that humans have made, and it creates a neural network that can model the errors. Both approaches require the collection of error corpora and an understanding of their contents; therefore we also describe the actual errors we have seen in detail. We find that while both approaches create error correction systems, with current resources the expert-built systems are still more reliable
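
    The rule-based approach can be illustrated with a toy stand-in (this is not the authors' finite-state implementation, and the rewrite rules and lexicon below are invented for illustration): each rule records how an intended character sequence tends to be mistyped, and applying the rules in reverse to a misspelt token yields candidate corrections.

```python
# A toy stand-in for a rule-based error model: each rule maps the sequence a
# writer tends to type to the sequence that was intended, and applying the
# rules to a misspelling produces candidate corrections. Only one error per
# token is modelled here. Rules and lexicon are hypothetical examples.

ERROR_RULES = [
    ("aa", "á"),     # doubled vowel written instead of a vowel with an acute accent
    ("sh", "š"),     # ASCII digraph written instead of a letter with a caron
    ("c", "č"),      # missing caron
]

def candidate_corrections(token: str, lexicon: set[str]) -> set[str]:
    """Apply every rule at every position and keep results found in the lexicon."""
    candidates = set()
    for written, intended in ERROR_RULES:
        start = token.find(written)
        while start != -1:
            candidates.add(token[:start] + intended + token[start + len(written):])
            start = token.find(written, start + 1)
    return candidates & lexicon

lexicon = {"sámi"}                               # tiny illustrative lexicon
print(candidate_corrections("saami", lexicon))   # {'sámi'}
```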

    The relationship between phonological and morphological deficits in Broca's aphasia: further evidence from errors in verb inflection

    A previous study of 10 patients with Broca’s aphasia demonstrated that the advantage for producing the past tense of irregular over regular verbs exhibited by these patients was eliminated when the two sets of past-tense forms were matched for phonological complexity (Bird, Lambon Ralph, Seidenberg, McClelland, & Patterson, 2003). The interpretation given was that a generalised phonological impairment was central to the patients’ language deficits, including their poor performance on regular past tense verbs. The current paper provides further evidence in favour of this hypothesis, on the basis of a detailed analysis of the errors produced by these same 10 patients in reading, repetition, and sentence completion for a large number of regular, irregular, and nonce verbs. The patients’ predominant error types in all tasks and for all verb types were close and distant phonologically related responses. The balance between close and distant errors varied along three continua: the severity of the patient (more distant errors produced by the more severely impaired patients); the difficulty of the task (more distant errors in sentence completion > reading > repetition); the difficulty of the item (more distant errors for novel word forms than real verbs). A position analysis for these phonologically related errors revealed that vowels were most likely to be preserved and that consonant onsets and offsets were equally likely to be incorrect. Critically, the patients’ errors exhibited a strong tendency to simplify the phonological form of the target. These results are consistent with the notion that the patients’ relatively greater difficulty with regular past tenses reflects a phonological impairment that is sensitive to the complexity of spoken forms

    A large list of confusion sets for spellchecking assessed against a corpus of real-word errors

    One of the methods that has been proposed for dealing with real-word errors (errors that occur when a correctly spelled word is substituted for the one intended) is the "confusion-set" approach - a confusion set being a small group of words that are likely to be confused with one another. Using a list of confusion sets drawn up in advance, a spellchecker, on finding one of these words in a text, can assess whether one of the other members of its set would be a better fit and, if it appears to be so, propose that word as a correction. Much of the research using this approach has suffered from two weaknesses. The first is the small number of confusion sets used. The second is that systems have largely been tested on artificial errors. In this paper we address these two weaknesses. We describe the creation of a realistically sized list of confusion sets, then the assembling of a corpus of real-word errors, and then we assess the potential of that list in relation to that corpus
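
    A minimal sketch of the confusion-set mechanism described above (not the paper's spellchecker; the confusion sets, bigram counts and threshold below are invented stand-ins for a realistically sized list and real corpus statistics):

```python
# Illustration of the confusion-set idea: when a word in the text belongs to a
# confusion set, score each member of the set in its immediate context and
# propose a swap if another member fits much better. All data here is made up.

CONFUSION_SETS = [
    {"their", "there", "they're"},
    {"loose", "lose"},
]

# Hypothetical bigram counts (previous_word, word) -> frequency in some corpus.
BIGRAM_COUNTS = {
    ("over", "there"): 120,
    ("over", "their"): 3,
    ("to", "lose"): 90,
    ("to", "loose"): 4,
}

def propose_correction(prev_word: str, word: str) -> str | None:
    """Return a better-fitting member of word's confusion set, if any."""
    for conf_set in CONFUSION_SETS:
        if word in conf_set:
            scores = {w: BIGRAM_COUNTS.get((prev_word, w), 0) for w in conf_set}
            best = max(scores, key=scores.get)
            # Require a clear margin before flagging a real-word error.
            if best != word and scores[best] > 10 * max(scores[word], 1):
                return best
    return None

print(propose_correction("over", "their"))   # 'there'
print(propose_correction("to", "lose"))      # None (already the best fit)
```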

    Detection is the central problem in real-word spelling correction

    Real-word spelling correction differs from non-word spelling correction in its aims and its challenges. Here we show that the central problem in real-word spelling correction is detection. Methods from non-word spelling correction, which focus instead on selection among candidate corrections, do not address detection adequately, because detection is either assumed in advance or heavily constrained. As we demonstrate in this paper, merely discriminating between the intended word and a random close variation of it within the context of a sentence is a task that can be performed with high accuracy using straightforward models. Trigram models are sufficient in almost all cases. The difficulty comes when every word in the sentence is a potential error, with a large set of possible candidate corrections. Despite their strengths, trigram models cannot reliably find true errors without introducing many more, at least not when used in the obvious sequential way without added structure. The detection task exposes weakness not visible in the selection task
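
    The discrimination task described above can be sketched as follows; the trigram counts and smoothing are invented stand-ins, not the paper's model:

```python
# Score a sentence with each candidate word in place, using trigram counts, and
# keep the higher-scoring candidate. Counts and smoothing are illustrative only.

import math

# Hypothetical trigram counts (w1, w2, w3) -> frequency.
TRIGRAMS = {
    ("piece", "of", "cake"): 50,
    ("peace", "of", "cake"): 0,
    ("a", "piece", "of"): 80,
    ("a", "peace", "of"): 1,
}

def sentence_logprob(words: list[str]) -> float:
    """Sum of smoothed trigram log-counts over the sentence."""
    total = 0.0
    for i in range(len(words) - 2):
        count = TRIGRAMS.get(tuple(words[i:i + 3]), 0)
        total += math.log(count + 1)        # add-one smoothing of raw counts
    return total

def pick_word(sentence: list[str], position: int, candidates: list[str]) -> str:
    """Choose the candidate that gives the sentence the highest trigram score."""
    def score(w: str) -> float:
        return sentence_logprob(sentence[:position] + [w] + sentence[position + 1:])
    return max(candidates, key=score)

sentence = ["a", "peace", "of", "cake"]
print(pick_word(sentence, 1, ["peace", "piece"]))   # 'piece'
```

    As the abstract notes, the hard part is not this pairwise choice but detection: when every word in a sentence is a potential error with many candidate corrections, applying such a score word by word in the obvious sequential way introduces many false alarms.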

    Searching by approximate personal-name matching

    We discuss the design, building and evaluation of a method to access the information about a person using his or her name as a search key, even when the name contains deformations. We present a similarity function, the DEA function, based on the probabilities of the edit operations according to the letters involved and their positions, and using a variable threshold. The efficacy of DEA, evaluated quantitatively without human relevance judgments, is shown to be markedly superior to that of known methods. A very efficient approximate search technique for the DEA function, based on a compacted trie structure, is also presented
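
    The general idea of a letter- and position-sensitive similarity function with a variable threshold can be sketched as follows (this is not the actual DEA function; the confusable letter pairs, cost weights and threshold are illustrative assumptions):

```python
# Rough sketch of a letter- and position-sensitive edit cost for name matching:
# operations cost less for letter pairs often confused in names and for positions
# near the end of the name, and a match is accepted against a threshold that
# varies with the length of the query. All weights here are invented.

CONFUSABLE = {("c", "k"), ("k", "c"), ("s", "z"), ("z", "s"), ("i", "y"), ("y", "i")}

def op_cost(a: str, b: str, position: int, length: int) -> float:
    base = 0.5 if (a, b) in CONFUSABLE else 1.0             # letter-dependent cost
    return base * (1.0 - 0.5 * position / max(length, 1))   # cheaper near the end

def name_distance(query: str, name: str) -> float:
    m, n = len(query), len(name)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + op_cost(query[i - 1], "", i - 1, m)
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + op_cost("", name[j - 1], j - 1, n)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0.0 if query[i - 1] == name[j - 1] else op_cost(query[i - 1], name[j - 1], i - 1, m)
            d[i][j] = min(d[i - 1][j] + op_cost(query[i - 1], "", i - 1, m),
                          d[i][j - 1] + op_cost("", name[j - 1], j - 1, n),
                          d[i - 1][j - 1] + sub)
    return d[m][n]

def matches(query: str, name: str) -> bool:
    return name_distance(query, name) <= 0.25 * len(query)   # length-dependent threshold

print(matches("katriona", "catriona"))   # True: only a confusable c/k substitution
print(matches("katriona", "jones"))      # False: far beyond the threshold
```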