Context-sensitive Spelling Correction Using Google Web 1T 5-Gram Information
In computing, spell checking is the process of detecting and sometimes
providing spelling suggestions for incorrectly spelled words in a text.
Basically, a spell checker is a computer program that uses a dictionary of
words to perform spell checking. The bigger the dictionary is, the higher is
the error detection rate. Because spell checkers are based on regular
dictionaries, they suffer from the data sparseness problem, as they cannot capture
a large vocabulary of words including proper names, domain-specific terms,
technical jargon, special acronyms, and terminologies. As a result, they
exhibit a low error detection rate and often fail to catch major errors in the
text. This paper proposes a new context-sensitive spelling correction method
for detecting and correcting non-word and real-word errors in digital text
documents. The approach relies on statistics from the Google Web 1T 5-gram
data set, which consists of a large volume of n-gram word sequences extracted
from the World Wide Web. Fundamentally, the proposed method comprises an error
detector that detects misspellings, a candidate spellings generator based on a
character 2-gram model that generates correction suggestions, and an error
corrector that performs contextual error correction. Experiments conducted on a
set of text documents from different domains, all containing misspellings,
showed an outstanding spelling error correction rate and a drastic reduction of
both non-word and real-word errors. In a further study, the proposed algorithm
is to be parallelized so as to lower the computational cost of the error
detection and correction processes.
Comment: LACSC - Lebanese Association for Computational Sciences -
http://www.lacsc.or
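The three stages named in this abstract (detector, character 2-gram candidate generator, contextual corrector) can be illustrated with a minimal sketch. It is not the authors' implementation: the toy `VOCAB` and `COUNTS` tables stand in for a real dictionary and for Google Web 1T counts, the Jaccard similarity over character bigrams is an assumed scoring choice, only word bigram context is used, and only non-word errors are handled.

```python
# Toy stand-ins (assumptions): a tiny dictionary and a few word-bigram counts
# in place of the Google Web 1T 5-gram data set.
VOCAB = {"the", "cat", "sat", "on", "mat", "car", "cut"}
COUNTS = {
    ("the", "cat"): 50,
    ("cat", "sat"): 30,
    ("the", "car"): 20,
    ("the", "cut"): 5,
}


def char_bigrams(word):
    """Return the set of character 2-grams of a word, with boundary markers."""
    padded = f"^{word}$"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def candidates(misspelling, vocabulary, top_k=3):
    """Rank dictionary words by character-bigram overlap (Jaccard similarity)."""
    target = char_bigrams(misspelling)

    def similarity(word):
        grams = char_bigrams(word)
        return len(target & grams) / len(target | grams)

    # Sort by descending similarity, breaking ties alphabetically.
    return sorted(vocabulary, key=lambda w: (-similarity(w), w))[:top_k]


def context_score(tokens, index, candidate, counts):
    """Sum the counts of the bigrams the candidate forms with its neighbours."""
    score = 0
    if index > 0:
        score += counts.get((tokens[index - 1], candidate), 0)
    if index < len(tokens) - 1:
        score += counts.get((candidate, tokens[index + 1]), 0)
    return score


def correct(tokens, index, vocabulary, counts, top_k=3):
    """Detect a non-word at `index` and replace it with the best-fitting candidate."""
    word = tokens[index]
    if word in vocabulary:  # detector: dictionary words are left untouched
        return word
    options = candidates(word, vocabulary, top_k)
    # max() keeps the earliest option on ties, i.e. the most similar spelling.
    return max(options, key=lambda c: context_score(tokens, index, c, counts))


print(correct(["the", "cst", "sat"], 1, VOCAB, COUNTS))
```

Detecting real-word errors, which the abstract also covers, would require scoring dictionary words against their context as well; the sketch omits that step.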
Learning to Resolve Natural Language Ambiguities: A Unified Approach
We analyze a few of the commonly used statistics-based and machine learning
algorithms for natural language disambiguation tasks and observe that they can
be re-cast as learning linear separators in the feature space. Each of the
methods makes a priori assumptions, which it employs, given the data, when
searching for its hypothesis. Nevertheless, as we show, it searches a space
that is as rich as the space of all linear separators. We use this to build an
argument for a data driven approach which merely searches for a good linear
separator in the feature space, without further assumptions on the domain or a
specific problem.
We present such an approach - a sparse network of linear separators,
utilizing the Winnow learning algorithm - and show how to use it in a variety
of ambiguity resolution problems. The learning approach presented is
attribute-efficient and, therefore, appropriate for domains having a very large
number of attributes.
In particular, we present an extensive experimental comparison of our
approach with other methods on several well-studied lexical disambiguation
tasks such as context-sensitive spelling correction, prepositional phrase
attachment and part-of-speech tagging. In all cases we show that our approach
either outperforms other methods tried for these tasks or performs comparably
to the best
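For readers unfamiliar with Winnow, the following is a minimal sketch of the textbook mistake-driven, multiplicative update over sparse binary features. It shows a single linear separator only, not the sparse network architecture described in the abstract; the feature names, the threshold, the promotion factor `alpha`, and the toy "there" versus "their" data are all assumptions made for illustration.

```python
class Winnow:
    """A single Winnow linear separator over sparse binary features."""

    def __init__(self, threshold, alpha=2.0):
        self.threshold = threshold
        self.alpha = alpha
        self.weights = {}  # sparse storage: unseen features default to weight 1.0

    def score(self, active):
        """Sum the weights of the features that are active in this example."""
        return sum(self.weights.get(f, 1.0) for f in active)

    def predict(self, active):
        return self.score(active) >= self.threshold

    def update(self, active, label):
        """Change weights only on a mistake, and only for active features."""
        if self.predict(active) == label:
            return
        # Promote on a missed positive, demote on a false positive.
        factor = self.alpha if label else 1.0 / self.alpha
        for f in active:
            self.weights[f] = self.weights.get(f, 1.0) * factor


def train(model, examples, epochs=2):
    for _ in range(epochs):
        for active, label in examples:
            model.update(active, label)
    return model


# Hypothetical toy task: True means the target word should be "there".
EXAMPLES = [
    ({"prev=over", "next=is"}, True),
    ({"prev=in", "next=house"}, False),
    ({"prev=over", "next=."}, True),
    ({"prev=of", "next=house"}, False),
]

model = train(Winnow(threshold=2.0), EXAMPLES)
print(model.predict({"prev=over", "next=is"}))
print(model.predict({"prev=in", "next=house"}))
```

Because each update touches only the features active in the misclassified example, the cost per example depends on the number of active features rather than on the total number of attributes.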
A comparative evaluation of deep and shallow approaches to the automatic detection of common grammatical errors
This paper compares a deep and a shallow processing approach to the problem of classifying a sentence as grammatically well-formed or ill-formed.
The deep processing approach uses the XLE LFG parser and English grammar: two versions are presented, one which uses the XLE directly to perform the classification, and another one which uses a decision tree trained on features consisting of the XLE’s output statistics.
The shallow processing approach predicts grammaticality based on n-gram frequency statistics: we present two versions, one which uses frequency thresholds and one which uses a decision tree trained on the frequencies of the rarest n-grams in the input sentence.
We find that the use of a decision tree improves on the basic approach only for the deep parser-based approach. We also show that combining both the shallow and deep decision tree features is effective.
Our evaluation is carried out using a large test set of grammatical and ungrammatical sentences. The ungrammatical test set is generated automatically by inserting grammatical errors into well-formed BNC sentences
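The frequency-threshold variant of the shallow approach can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's system: the reference corpus is a two-sentence toy, the n-gram order (2) and the threshold (1) are arbitrary choices, and tokenization is plain whitespace splitting.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_counts(corpus_sentences, n):
    """Count n-gram frequencies over a reference corpus of well-formed text."""
    counts = Counter()
    for sentence in corpus_sentences:
        counts.update(ngrams(sentence.split(), n))
    return counts


def rarest_frequency(sentence, counts, n):
    """Frequency of the least frequent n-gram in the sentence.

    Sentences shorter than n tokens have no n-grams and score 0 here.
    """
    return min((counts[g] for g in ngrams(sentence.split(), n)), default=0)


def is_wellformed(sentence, counts, n=2, threshold=1):
    """Flag a sentence as well-formed if its rarest n-gram meets the threshold."""
    return rarest_frequency(sentence, counts, n) >= threshold


# Hypothetical toy reference corpus.
CORPUS = ["the cat sat on the mat", "the dog sat on the rug"]
BIGRAM_COUNTS = build_counts(CORPUS, 2)

print(is_wellformed("the dog sat on the mat", BIGRAM_COUNTS))
print(is_wellformed("the sat dog on mat", BIGRAM_COUNTS))
```

The decision-tree variant mentioned in the abstract would use values such as `rarest_frequency` for several n-gram orders as input features instead of comparing a single value against a fixed threshold.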
Reduced structural connectivity between left auditory thalamus and the motion-sensitive planum temporale in developmental dyslexia
Developmental dyslexia is characterized by the inability to acquire typical
reading and writing skills. Dyslexia has been frequently linked to cerebral
cortex alterations; however, recent evidence also points towards sensory
thalamus dysfunctions: dyslexics showed reduced responses in the left auditory
thalamus (medial geniculate body, MGB) during speech processing in contrast to
neurotypical readers. In addition, in the visual modality, dyslexics have
reduced structural connectivity between the left visual thalamus (lateral
geniculate nucleus, LGN) and V5/MT, a cerebral cortex region involved in visual
movement processing. Higher LGN-V5/MT connectivity in dyslexics was associated
with the faster rapid naming of letters and numbers (RANln), a measure that is
highly correlated with reading proficiency. We here tested two hypotheses that
were directly derived from these previous findings. First, we tested the
hypothesis that dyslexics have reduced structural connectivity between the left
MGB and the auditory motion-sensitive part of the left planum temporale (mPT).
Second, we hypothesized that the amount of left mPT-MGB connectivity correlates
with dyslexics' RANln scores. Using diffusion tensor imaging based probabilistic
tracking we show that male adults with developmental dyslexia have reduced
structural connectivity between the left MGB and the left mPT, confirming the
first hypothesis. Stronger left mPT-MGB connectivity was not associated with
faster RANln scores in dyslexics, but in neurotypical readers. Our findings
provide the first evidence that reduced cortico-thalamic connectivity in the
auditory modality is a feature of developmental dyslexia, and that it may also
impact on reading related cognitive abilities in neurotypical readers