
A large list of confusion sets for spellchecking assessed against a corpus of real-word errors

By Jennifer Pedler and Roger Mitton


One of the methods that has been proposed for dealing with real-word errors (errors that occur when a correctly spelled word is substituted for the one intended) is the "confusion-set" approach, a confusion set being a small group of words that are likely to be confused with one another. Using a list of confusion sets drawn up in advance, a spellchecker, on finding one of these words in a text, can assess whether one of the other members of its set would be a better fit and, if so, propose that word as a correction. Much of the research using this approach has suffered from two weaknesses. The first is the small number of confusion sets used. The second is that systems have largely been tested on artificial errors. In this paper we address both weaknesses. We describe the creation of a realistically sized list of confusion sets and the assembling of a corpus of real-word errors, and we then assess the potential of that list in relation to that corpus.
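
The confusion-set procedure described above can be illustrated with a minimal sketch. This is not the authors' system: the two confusion sets, the toy training text, and the add-one smoothed bigram score are all assumptions made for illustration, and the function names are hypothetical. A practical checker would use a far larger list of sets and a stronger measure of fit.

```python
from collections import Counter

# Hypothetical confusion sets, chosen only for illustration.
CONFUSION_SETS = [
    {"their", "there"},
    {"piece", "peace"},
]

# Toy training text standing in for a real corpus (an assumption).
TRAINING_TEXT = (
    "they left their coats over there . "
    "she ate a piece of cake . "
    "we hope for peace and quiet . "
    "there is a piece of paper on their desk ."
)


def build_bigram_counts(text):
    """Count adjacent token pairs in whitespace-tokenised, lowercased text."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def context_score(word, left, right, bigrams):
    """Score a word by its add-one smoothed bigram counts with both neighbours."""
    return (bigrams[(left, word)] + 1) * (bigrams[(word, right)] + 1)


def suggest_corrections(sentence, confusion_sets, bigrams):
    """Return (position, written word, proposed word) for each better-fitting member."""
    tokens = sentence.lower().split()
    padded = ["<s>"] + tokens + ["</s>"]  # sentence-boundary markers
    suggestions = []
    for i, word in enumerate(tokens):
        left, right = padded[i], padded[i + 2]
        for members in confusion_sets:
            if word not in members:
                continue
            # sorted() makes tie-breaking deterministic
            best = max(
                sorted(members),
                key=lambda w: context_score(w, left, right, bigrams),
            )
            written = context_score(word, left, right, bigrams)
            if context_score(best, left, right, bigrams) > written:
                suggestions.append((i, word, best))
    return suggestions


BIGRAMS = build_bigram_counts(TRAINING_TEXT)
```

With this toy data, `suggest_corrections("she ate a peace of cake", CONFUSION_SETS, BIGRAMS)` returns `[(3, 'peace', 'piece')]`, while the correctly written sentence yields no suggestions. A word is flagged only when another member of its set scores strictly higher in context, which mirrors the "better fit" test in the description above.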

Topics: csis
Year: 2010
OAI identifier: oai:eprints.bbk.ac.uk.oai2:3173

