
    You can’t suggest that?! Comparisons and improvements of speller error models

    In this article, we study the correction of spelling errors, specifically how spelling errors are made and how we can model them computationally in order to fix them. The article describes two different approaches to generating spelling correction suggestions for three Uralic languages: Estonian, North Sámi and South Sámi. The first approach to modelling spelling errors is rule-based, where experts write rules that describe the kinds of errors that are made, and these are compiled into a finite-state automaton that models the errors. The second is data-based, where we show a machine learning algorithm a corpus of errors that humans have made, and it creates a neural network that can model the errors. Both approaches require collecting error corpora and understanding their contents; therefore we also describe the actual errors we have seen in detail. We find that while both approaches create error correction systems, with current resources the expert-built systems are still more reliable.
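
    The contrast between the two approaches can be illustrated with a small sketch. The snippet below is not the article's code: it replaces the compiled finite-state machinery with plain string substitution, and the lexicon, confusion rules and input word are invented placeholders standing in for expert-written error rules.

```python
# A minimal sketch of the rule-based idea: expert-written confusion rules are
# applied to a misspelling to generate correction candidates, which are then
# filtered against a lexicon. Rules, lexicon and input are illustrative only;
# a real speller compiles such rules into a weighted finite-state automaton.
LEXICON = {"bearaš", "giella", "sápmi"}            # hypothetical word forms
RULES = [("s", "š"), ("a", "á"), ("c", "č")]       # hypothetical error -> correction rules

def candidates(misspelling: str) -> set[str]:
    """Apply each confusion rule at every position and keep results found in the lexicon."""
    out = set()
    for wrong, right in RULES:
        start = 0
        while (i := misspelling.find(wrong, start)) != -1:
            cand = misspelling[:i] + right + misspelling[i + len(wrong):]
            if cand in LEXICON:
                out.add(cand)
            start = i + 1
    return out

print(candidates("bearas"))   # -> {'bearaš'}; a data-based model would instead learn such
                              # substitutions from a corpus of human errors
```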

    Ordering the suggestions of a spellchecker without using context.

    Having located a misspelling, a spellchecker generally offers some suggestions for the intended word. Even without using context, a spellchecker can draw on various types of information in ordering its suggestions. A series of experiments is described, beginning with a basic corrector that implements a well-known algorithm for reversing single simple errors, and making successive enhancements to take account of substring matches, pronunciation, known error patterns, syllable structure and word frequency. The improvement in the ordering produced by each enhancement is measured on a large corpus of misspellings. The final version is tested on other corpora against a widely used commercial spellchecker and a research prototype.
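
    As an illustration of the baseline corrector and the word-frequency enhancement (only two of the knowledge sources the paper combines), a minimal sketch follows; the frequency table and the example word are invented, not taken from the paper's corpora.

```python
# A minimal sketch: generate every word reachable by reversing one simple error
# (deletion, transposition, substitution, insertion) and order the surviving
# dictionary words by corpus frequency. FREQ is an illustrative placeholder.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
FREQ = {"receive": 5000, "recede": 300, "relieve": 800}   # hypothetical counts

def single_edit_candidates(word: str) -> set[str]:
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in ALPHABET}
    inserts = {a + c + b for a, b in splits for c in ALPHABET}
    return deletes | transposes | replaces | inserts

def suggestions(misspelling: str) -> list[str]:
    """Keep candidates that are known words, most frequent first."""
    known = single_edit_candidates(misspelling) & FREQ.keys()
    return sorted(known, key=FREQ.get, reverse=True)

print(suggestions("recieve"))   # -> ['receive']; pronunciation, error patterns and syllable
                                # structure would further re-rank longer suggestion lists
```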

    Studying the Effect and Treatment of Misspelled Queries in Cross-Language Information Retrieval

    The performance of Information Retrieval systems is limited by the linguistic variation present in natural language texts. Word-level Natural Language Processing techniques have been shown to be useful in reducing this variation. In this article, we summarize our work on the extension of these techniques for dealing with phrase-level variation in European languages, taking Spanish as a case in point. We propose the use of syntactic dependencies as complex index terms in an attempt to solve the problems deriving from both syntactic and morpho-syntactic variation and, in this way, to obtain more precise index terms. Such dependencies are obtained through a shallow parser based on cascades of finite-state transducers in order to reduce as far as possible the overhead due to this parsing process. The use of different sources of syntactic information, queries or documents, has also been studied, as has the restriction of the dependencies applied to those obtained from noun phrases. Our approaches have been tested using the CLEF corpus, obtaining consistent improvements with regard to classical word-level non-linguistic techniques. Results show, on the one hand, that syntactic information extracted from documents is more useful than that from queries. On the other hand, it has been demonstrated that by restricting dependencies to those corresponding to noun phrases, important reductions of storage and management costs can be achieved, albeit at the expense of a slight reduction in performance.
    Funding: Ministerio de Economía y Competitividad, FFI2014-51978-C2-1-R; Rede Galega de Procesamento da Linguaxe e Recuperación de Información, CN2014/034; Ministerio de Economía y Competitividad, BES-2015-073768; Ministerio de Economía y Competitividad, FFI2014-51978-C2-2-
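
    The core indexing idea can be sketched briefly. The snippet below is not the authors' transducer cascade; the pre-parsed dependencies, relation labels and words are invented placeholders for what such a shallow parser would emit.

```python
# A minimal sketch of using syntactic dependencies as complex index terms:
# single-word terms are kept, and head-modifier pairs restricted to noun-phrase
# relations are added as additional, more precise terms. All data is illustrative.
parsed_doc = [                       # (head lemma, modifier lemma, relation)
    ("variación", "lingüística", "adj-mod"),
    ("recuperación", "información", "noun-comp"),
    ("limitar", "rendimiento", "verb-obj"),
]

def index_terms(dependencies, np_relations=frozenset({"adj-mod", "noun-comp"})):
    """Classical word-level terms plus complex terms from noun-phrase dependencies only."""
    terms = set()
    for head, mod, rel in dependencies:
        terms.update((head, mod))            # word-level index terms
        if rel in np_relations:
            terms.add(f"{head}#{mod}")       # complex index term
    return terms

print(sorted(index_terms(parsed_doc)))
```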

    OCRspell: An interactive spelling correction system for OCR errors in text

    In this thesis we describe a spelling correction system designed specifically for OCR (Optical Character Recognition) generated text that selects candidate words through the use of information gathered from multiple knowledge sources. This system for text correction is based on static and dynamic device mappings, approximate string matching, and n-gram analysis. Our statistically based, Bayesian system incorporates a learning feature that collects confusion information at the collection and document levels. An evaluation of the new system is presented as well.
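
    One of the underlying mechanisms, collecting character-level confusion information and using it to rank candidates, can be sketched as follows. This is not OCRspell's code; the smoothing, the equal-length restriction and the example strings are simplifications invented for illustration.

```python
# A minimal sketch: confusion counts are updated whenever a correction is accepted,
# and later candidates are ranked by a crude P(ocr_string | candidate) built from
# those counts (add-one smoothed, equal-length strings only for simplicity).
from collections import defaultdict

confusions = defaultdict(lambda: 1)   # (true_char, ocr_char) -> smoothed count

def learn(correct: str, ocr: str) -> None:
    """Update confusion counts from an accepted correction."""
    for t, o in zip(correct, ocr):
        confusions[(t, o)] += 1

def score(candidate: str, ocr: str) -> float:
    """Unnormalised likelihood of the OCR output given the candidate word."""
    if len(candidate) != len(ocr):
        return 0.0
    p = 1.0
    for t, o in zip(candidate, ocr):
        p *= confusions[(t, o)]
    return p

learn("spell", "spe11")                                    # hypothetical: '1' misread for 'l'
print(score("spell", "spe11") > score("speli", "spe11"))   # True: the learned confusion favours 'spell'
```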

    Spell checkers and correctors : a unified treatment

    The aim of this dissertation is to provide a unified treatment of various spell checkers and correctors. Firstly, the spell checking and correcting problems are formally described in mathematics in order to provide a better understanding of these tasks. An approach is adopted that is similar to the way in which denotational semantics is used to describe programming languages. Secondly, the various attributes of existing spell checking and correcting techniques are discussed. Extensive studies of selected spell checking/correcting algorithms and packages are then performed. Lastly, an empirical investigation of various spell checking/correcting packages is presented. It provides a comparison and suggests a classification of these packages in terms of their functionalities, implementation strategies, and performance. The investigation was conducted on packages for spell checking and correcting in English as well as in Northern Sotho and Chinese. The classification provides a unified presentation of the strengths and weaknesses of the techniques studied in the research. The findings provide a better understanding of these techniques in order to assist in improving some existing spell checking/correcting applications and future spell checking/correcting package designs and implementations.
    Dissertation (MSc), Computer Science, University of Pretoria, 2009.
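
    The formal separation of the two tasks can be sketched compactly. The notation and the similarity-based ordering below are mine rather than the dissertation's, and the dictionary is an invented placeholder.

```python
# A minimal sketch: over a finite dictionary D, a spell checker is a membership
# predicate and a corrector maps a rejected string to a ranked subset of D.
# Ranking by difflib similarity is just one possible realisation.
from difflib import SequenceMatcher

D = {"colour", "color", "collar", "cooler"}    # illustrative dictionary

def check(word: str) -> bool:
    """Spell checking: is the word a dictionary word?"""
    return word in D

def correct(word: str, k: int = 3) -> list[str]:
    """Spell correcting: up to k dictionary words, most similar first."""
    ranked = sorted(D, key=lambda w: SequenceMatcher(None, word, w).ratio(), reverse=True)
    return ranked[:k]

print(check("coluor"), correct("coluor"))
```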

    Effective Spell Checking Methods Using Clustering Algorithms

    This paper presents a novel approach to spell checking using dictionary clustering. The main goal is to reduce the number of times distances have to be calculated when finding target words for misspellings. The method is unsupervised and combines the application of anomalous pattern initialization and partitioning around medoids (PAM). To evaluate the method, we used an English misspelling list compiled using real examples extracted from the Birkbeck spelling error corpus.
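
    The saving in distance computations can be sketched as follows; the clusters below are hand-picked placeholders rather than the output of anomalous pattern initialization or PAM, and the string distance is a stand-in for whatever measure the method uses.

```python
# A minimal sketch: the dictionary is grouped around medoid words ahead of time,
# so a misspelling is compared to the few medoids first and then only to the
# members of the closest cluster, instead of to every dictionary entry.
from difflib import SequenceMatcher

def dist(a: str, b: str) -> float:
    return 1.0 - SequenceMatcher(None, a, b).ratio()

CLUSTERS = {                                   # medoid -> cluster members (illustrative)
    "receive": ["receive", "relieve", "believe"],
    "necessary": ["necessary", "accessory", "secretary"],
}

def nearest_word(misspelling: str) -> str:
    medoid = min(CLUSTERS, key=lambda m: dist(misspelling, m))         # |medoids| distance calls
    return min(CLUSTERS[medoid], key=lambda w: dist(misspelling, w))   # one cluster, not the whole dictionary

print(nearest_word("recieve"))   # -> 'receive', found with far fewer comparisons than a full scan
```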