Detection is the central problem in real-word spelling correction
Real-word spelling correction differs from non-word spelling correction in
its aims and its challenges. Here we show that the central problem in real-word
spelling correction is detection. Methods from non-word spelling correction,
which focus instead on selection among candidate corrections, do not address
detection adequately, because detection is either assumed in advance or heavily
constrained. As we demonstrate in this paper, merely discriminating between the
intended word and a random close variation of it within the context of a
sentence is a task that can be performed with high accuracy using
straightforward models. Trigram models are sufficient in almost all cases. The
difficulty comes when every word in the sentence is a potential error, with a
large set of possible candidate corrections. Despite their strengths, trigram
models cannot reliably find true errors without introducing many more, at least
not when used in the obvious sequential way without added structure. The
detection task exposes weaknesses not visible in the selection task.
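The discrimination task the abstract calls easy, choosing between a word and a close variant of it in context, can be sketched with a straightforward trigram model. The toy corpus, add-one smoothing, and confusion set below are illustrative stand-ins, not the authors' model:

```python
import math
from collections import Counter

# Toy training corpus; a real trigram model would be built from a large corpus.
corpus = [
    "it is hard to see the error",
    "it is hard to be sure",
    "it is easy to see the point",
]

trigrams, bigrams = Counter(), Counter()
for sent in corpus:
    toks = ["<s>", "<s>"] + sent.split() + ["</s>"]
    for i in range(len(toks) - 2):
        trigrams[tuple(toks[i:i + 3])] += 1
        bigrams[tuple(toks[i:i + 2])] += 1

VOCAB = len({t for tg in trigrams for t in tg})

def logprob(toks):
    """Add-one-smoothed trigram log-probability of a token list."""
    toks = ["<s>", "<s>"] + toks + ["</s>"]
    lp = 0.0
    for i in range(len(toks) - 2):
        num = trigrams[tuple(toks[i:i + 3])] + 1
        den = bigrams[tuple(toks[i:i + 2])] + VOCAB
        lp += math.log(num / den)
    return lp

def flag_real_word_errors(sentence, confusions):
    """Flag position i when a confusion-set variant scores higher in context."""
    toks = sentence.split()
    flags = []
    for i, w in enumerate(toks):
        for alt in confusions.get(w, []):
            if logprob(toks[:i] + [alt] + toks[i + 1:]) > logprob(toks):
                flags.append((i, w, alt))
    return flags
```

With a small confusion set, `flag_real_word_errors("it is hard to sea the error", {"sea": ["see"]})` flags position 4; restricting candidates this way is exactly the constraint the abstract argues makes the task easy, and removing it (every word a potential error) is where the sketch would break down.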
Arabic Spelling Correction using Supervised Learning
In this work, we address the problem of spelling correction in the Arabic
language utilizing the new corpus provided by QALB (Qatar Arabic Language Bank)
project which is an annotated corpus of sentences with errors and their
corrections. The corpus contains edit, add before, split, merge, add after,
move and other error types. We are concerned with the first four error types as
they contribute more than 90% of the spelling errors in the corpus. The
proposed system uses a separate model for each error type and then integrates all the models into an efficient and robust system that achieves an overall recall of 0.59, precision of 0.58, and an F1 score of 0.58 across all error types on the development set. Our system participated
in the QALB 2014 shared task "Automatic Arabic Error Correction" and achieved
an F1 score of 0.6, earning sixth place out of nine participants.
Comment: System description paper submitted to the EMNLP 2014 conference shared task "Automatic Arabic Error Correction" (Mohit et al., 2014) in the Arabic NLP workshop. 6 pages
A comparison of standard spell checking algorithms and a novel binary neural approach
In this paper, we propose a simple, flexible, and efficient hybrid spell-checking methodology based upon phonetic matching, supervised learning, and associative matching in the AURA neural system. We integrate Hamming Distance and n-gram algorithms that have high recall for typing errors with a phonetic spell-checking algorithm in a single novel architecture. Our approach is suitable for any spell-checking application, though it is aimed toward isolated word error correction, particularly spell checking user queries in a search engine. We use a novel scoring scheme to integrate the retrieved words from each spelling approach and calculate an overall score for each matched word. From the overall scores, we can rank the possible matches. In this paper, we evaluate our approach against several benchmark spell-checking algorithms for recall accuracy. Our proposed hybrid methodology has the highest recall rate of the techniques evaluated, at low computational cost.
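The score-integration idea can be illustrated with a minimal sketch. Here classic Soundex stands in for the phonetic matcher, a character-bigram Dice coefficient for the n-gram matcher, and the weights are arbitrary assumptions; none of this is the AURA implementation itself:

```python
def char_bigrams(word):
    word = f"#{word}#"  # boundary markers so prefixes and suffixes count
    return {word[i:i + 2] for i in range(len(word) - 1)}

def ngram_score(query, cand):
    """Dice coefficient over character bigrams: high recall for typos."""
    a, b = char_bigrams(query), char_bigrams(cand)
    return 2 * len(a & b) / (len(a) + len(b))

def soundex(word):
    """Classic Soundex, standing in for the phonetic matcher."""
    codes = {}
    for group, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")]:
        for ch in group:
            codes[ch] = digit
    word = word.lower()
    out, prev = word[0].upper(), codes.get(word[0], "")
    for ch in word[1:]:
        digit = codes.get(ch, "")
        if digit and digit != prev:
            out += digit
        prev = digit
    return (out + "000")[:4]

def hybrid_rank(query, lexicon, w_ngram=0.6, w_phon=0.4):
    """Sum weighted matcher scores per word, then rank by overall score."""
    scored = []
    for cand in lexicon:
        score = w_ngram * ngram_score(query, cand)
        score += w_phon * (soundex(cand) == soundex(query))
        scored.append((score, cand))
    return [cand for score, cand in sorted(scored, reverse=True)]
```

For the typo "tabel", the n-gram matcher alone slightly prefers "label", but the phonetic agreement with "table" pushes the intended word to the top, which is the point of combining evidence from several matchers.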
Holaaa!! Writin like u talk is kewl but kinda hard 4 NLP
We present work in progress aiming to build tools for the normalization of User-Generated Content (UGC). As we will see, the task requires the revisiting of the initial steps of NLP processing, since UGC (micro-blog, blog, and, generally, Web 2.0 user texts) presents a number of non-standard communicative and linguistic characteristics, and is in fact much closer to oral and colloquial language than to edited text. We present and characterize a corpus of UGC text in Spanish from three different sources: Twitter, consumer reviews and blogs. We motivate the need for UGC text normalization by analyzing the problems found when processing this type of text through a conventional language processing pipeline, particularly in the tasks of lemmatization and morphosyntactic tagging, and finally we propose a strategy for automatically normalizing UGC using a selector of correct forms on top of a pre-existing spell-checker.
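The "selector on top of a pre-existing spell-checker" strategy might be sketched as follows; the slang table and the toy candidate generator are hypothetical stand-ins for the real normalization lexicon and spell-checker:

```python
# Hypothetical table of UGC forms; a real one would be mined from UGC corpora.
SLANG = {"u": "you", "kewl": "cool", "4": "for", "writin": "writing"}

def checker_candidates(token):
    """Toy stand-in for a pre-existing spell-checker's suggestion list."""
    lexicon = ["hello", "hard", "talk", "like", "but", "kinda", "is"]
    hits = [w for w in lexicon
            if w[0] == token[0] and abs(len(w) - len(token)) <= 2]
    return hits or [token]  # fall back to the token itself

def normalize(text):
    """Selector: try the UGC form table first, else defer to the checker."""
    out = []
    for tok in text.lower().split():
        out.append(SLANG.get(tok) or checker_candidates(tok)[0])
    return " ".join(out)
```

The design point is the ordering: UGC-specific forms like "kewl" are deliberate, not typos, so a lookup table handles them before the conventional spell-checker ever sees them.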
A post-processing system for global correction of OCR-generated errors
This thesis discusses the design and implementation of an OCR post-processing system. The system is used to perform automatic spelling detection and correction on noisy, OCR-generated text. Unlike previous post-processing systems, this system works in conjunction with an inverted file database system. The initial results obtained from post-processing 10,000 pages of OCR'ed text are encouraging. These results indicate that global and local document information extracted from the inverted file system can be used effectively to correct OCR-generated spelling errors.
Fast and Accurate Spelling Correction Using Trie and Damerau-Levenshtein Distance Bigram
This research was intended to create a fast and accurate spelling correction system able to handle both kinds of spelling errors, non-word and real-word. An existing spelling correction system was analyzed and then modified to improve its accuracy and speed. The proposed spelling correction system was then built based on the method and intuition of the existing system, along with the modifications made in the previous step. The result is a set of spelling correction systems using different methods. The best result is achieved by the system that uses bigrams with a Trie and Damerau-Levenshtein distance, with a word-level accuracy of 84.62% and an average processing speed of 18.89 ms per sentence.
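Damerau-Levenshtein distance, shown here in its common optimal-string-alignment form, can be sketched as follows; the dictionary lookup is a plain linear scan, a stand-in for the Trie traversal the system above uses to prune candidates:

```python
def dl_distance(a, b):
    """Damerau-Levenshtein distance (optimal string alignment variant)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

def correct(word, lexicon, max_dist=2):
    """Nearest dictionary words within max_dist edits, closest first."""
    ranked = sorted((dl_distance(word, w), w) for w in lexicon)
    return [w for dist, w in ranked if dist <= max_dist]
```

The transposition case is what distinguishes Damerau-Levenshtein from plain Levenshtein: a swap like "hte" for "the" costs one edit rather than two, which matters because transpositions are among the most common typing errors.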
Ordering the suggestions of a spellchecker without using context.
Having located a misspelling, a spellchecker generally offers some suggestions for the intended word. Even without using context, a spellchecker can draw on various types of information in ordering its suggestions. A series of experiments is described, beginning with a basic corrector that implements a well-known algorithm for reversing single simple errors, and making successive enhancements to take account of substring matches, pronunciation, known error patterns, syllable structure and word frequency. The improvement in the ordering produced by each enhancement is measured on a large corpus of misspellings. The final version is tested on other corpora against a widely used commercial spellchecker and a research prototype.
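One of the enhancements described, refining a similarity-based ordering with word frequency, can be sketched as follows; `difflib` similarity and the small frequency table are illustrative stand-ins for the paper's substring-match and frequency evidence, not its actual scoring:

```python
import difflib

# Hypothetical frequency table; a real one comes from a large corpus.
FREQ = {"their": 2500, "there": 2300, "thief": 40}

def order_suggestions(misspelling, candidates, freq=FREQ):
    """Rank by string similarity; break ties with corpus frequency."""
    def key(cand):
        sim = difflib.SequenceMatcher(None, misspelling, cand).ratio()
        return (-sim, -freq.get(cand, 0))
    return sorted(candidates, key=key)
```

For the misspelling "thier", the candidates "their", "there" and "thief" all tie on raw similarity, so frequency alone decides the ordering, which is why frequency is a useful late tie-breaker even without any sentence context.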