82,928 research outputs found

Applied morphological processing of English

Author: Carroll John
Minnen Guido
Pearce Darren
Publication venue: Cambridge University Press (CUP)
Publication date: 01/01/2001

We describe two newly developed computational tools for morphological processing: a program for analysis of English inflectional morphology, and a morphological generator, automatically derived from the analyser. The tools are fast, being based on finite-state techniques; have wide coverage, incorporating data from various corpora and machine-readable dictionaries; and are robust, in that they are able to deal effectively with unknown words. The tools are freely available. We evaluate the accuracy and speed of both tools and discuss a number of practical applications in which they have been put to use.

Crossref

UCL Discovery

Sussex Research Online

Error-tolerant Finite State Recognition with Applications to Morphological Analysis and Spelling Correction

Author: Oflazer Kemal
Publication date: 21/07/1995

Error-tolerant recognition enables the recognition of strings that deviate mildly from any string in the regular set recognized by the underlying finite state recognizer. Such recognition has applications in error-tolerant morphological processing, spelling correction, and approximate string matching in information retrieval. After a description of the concepts and algorithms involved, we give examples from two applications: In the context of morphological analysis, error-tolerant recognition allows misspelled input word forms to be corrected, and morphologically analyzed concurrently. We present an application of this to error-tolerant analysis of agglutinative morphology of Turkish words. The algorithm can be applied to morphological analysis of any language whose morphology is fully captured by a single (and possibly very large) finite state transducer, regardless of the word formation processes and morphographemic phenomena involved. In the context of spelling correction, error-tolerant recognition can be used to enumerate correct candidate forms from a given misspelled string within a certain edit distance. Again, it can be applied to any language with a word list comprising all inflected forms, or whose morphology is fully described by a finite state transducer. We present experimental results for spelling correction for a number of languages. These results indicate that such recognition works very efficiently for candidate generation in spelling correction for many European languages such as English, Dutch, French, German, Italian (and others) with very large word lists of root and inflected forms (some containing well over 200,000 forms), generating all candidate solutions within 10 to 45 milliseconds (with edit distance 1) on a SparcStation 10/41. For spelling correction in Turkish, error-tolerant
Comment: Replaces 9504031. Gzipped, uuencoded PostScript file. To appear in Computational Linguistics, Volume 22, No. 1, 1996. Also available as ftp://ftp.cs.bilkent.edu.tr/pub/ko/clpaper9512.ps.
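
The candidate-generation idea in this abstract (enumerate lexicon entries within a given edit distance of a misspelled string) can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not the paper's algorithm: the lexicon is stored as a plain trie rather than a general finite state recognizer, the distance is plain Levenshtein (insertions, deletions, substitutions only), and the names `Node`, `build_trie`, and `candidates` are hypothetical.

```python
class Node:
    """One trie state; stands in for a state of a finite state recognizer."""
    def __init__(self):
        self.children = {}
        self.is_word = False


def build_trie(words):
    root = Node()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, Node())
        node.is_word = True
    return root


def candidates(root, query, max_dist):
    """Return (word, distance) pairs for lexicon words within max_dist of query."""
    results = []

    def walk(node, prefix, prev_row):
        # prev_row[j] is the edit distance between prefix and query[:j].
        if node.is_word and prev_row[-1] <= max_dist:
            results.append((prefix, prev_row[-1]))
        for ch, child in node.children.items():
            row = [prev_row[0] + 1]
            for j in range(1, len(query) + 1):
                cost = 0 if query[j - 1] == ch else 1
                row.append(min(row[j - 1] + 1,          # skip a query character
                               prev_row[j] + 1,         # extra lexicon character
                               prev_row[j - 1] + cost)) # match or substitution
            # Row minima never decrease along a path, so a branch whose
            # minimum already exceeds the threshold can be cut off.
            if min(row) <= max_dist:
                walk(child, prefix + ch, row)

    walk(root, "", list(range(len(query) + 1)))
    return sorted(results)


lexicon = build_trie(["walk", "walks", "walked", "talk", "wall"])
print(candidates(lexicon, "walkd", 1))
```

The search explores the lexicon depth-first and abandons a path as soon as no extension can stay within the threshold, which is what keeps candidate generation cheap on large word lists.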

arXiv.org e-Print Archive

CiteSeerX

Bilkent University Institutional Repository

Processing of regular and irregular past tense morphology in highly proficient second language learners of English: a self-paced reading study

Author: Marinis Theodoros
Pliatsikas Christos
Publication venue: Cambridge University Press
Publication date: 14/03/2012

Dual-system models suggest that English past tense morphology involves two processing routes: rule application for regular verbs and memory retrieval for irregular verbs (Pinker, 1999). In second language (L2) processing research, Ullman (2001a) suggested that both verb types are retrieved from memory, but more recently Clahsen and Felser (2006) and Ullman (2004) argued that past tense rule application can be automatised with experience by L2 learners. To address this controversy, we tested highly proficient Greek-English learners with naturalistic or classroom L2 exposure compared to native English speakers in a self-paced reading task involving past tense forms embedded in plausible sentences. Our results suggest that, irrespective of the type of exposure, proficient L2 learners with extended L2 exposure apply rule-based processing.

KOPS - The Institutional Repository of the University of Konstanz

Central Archive at the University of Reading

Crossref

Kent Academic Repository

Morphological Analysis as Classification: an Inductive-Learning Approach

Author: Bosch Antal van den
Daelemans Walter
Weijters Ton
Publication date: 01/01/1996

Morphological analysis is an important subtask in text-to-speech conversion, hyphenation, and other language engineering tasks. The traditional approach to performing morphological analysis is to combine a morpheme lexicon, sets of (linguistic) rules, and heuristics to find a most probable analysis. In contrast we present an inductive learning approach in which morphological analysis is reformulated as a segmentation task. We report on a number of experiments in which five inductive learning algorithms are applied to three variations of the task of morphological analysis. Results show (i) that the generalisation performance of the algorithms is good, and (ii) that the lazy learning algorithm IB1-IG performs best on all three tasks. We conclude that lazy learning of morphological analysis as a classification task is indeed a viable approach; moreover, it has the strong advantages over the traditional approach of avoiding the knowledge-acquisition bottleneck, being fast and deterministic in learning and processing, and being language-independent.
Comment: 11 pages, 5 encapsulated postscript figures, uses non-standard NeMLaP proceedings style nemlap.sty; inputs ipamacs (international phonetic alphabet) and epsf macro
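
To make "segmentation as classification" concrete, here is a small sketch under assumptions that are not taken from the paper: each letter is encoded as a fixed-width window of surrounding letters, the class says whether a morpheme boundary precedes that letter, and classification is 1-nearest-neighbour with a plain overlap metric (a simplified stand-in for IB1-IG, without information-gain feature weighting). All function names and the toy training words are hypothetical.

```python
def windows(word, width=2, pad="_"):
    """One window of 2*width+1 letters centred on each letter of the word."""
    padded = pad * width + word + pad * width
    return [tuple(padded[i:i + 2 * width + 1]) for i in range(len(word))]


def make_instances(segmented, width=2):
    """Turn 'walk+ed' into (window, label) pairs; label 1 marks a letter
    that starts a new, non-initial morpheme."""
    word = segmented.replace("+", "")
    labels = []
    for m, morph in enumerate(segmented.split("+")):
        for k in range(len(morph)):
            labels.append(1 if (k == 0 and m > 0) else 0)
    return list(zip(windows(word, width), labels))


def overlap_distance(a, b):
    """Number of window positions at which two instances differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(window, memory):
    """Lazy learning: return the label of the nearest stored instance."""
    return min(memory, key=lambda inst: overlap_distance(window, inst[0]))[1]


def segment(word, memory, width=2):
    out = []
    for ch, win in zip(word, windows(word, width)):
        if classify(win, memory) == 1:
            out.append("+")
        out.append(ch)
    return "".join(out)


# Toy memory; a real experiment would store instances from a large lexicon.
memory = []
for example in ["walk+ed", "talk+ing", "jump+ed"]:
    memory.extend(make_instances(example))

print(segment("talked", memory))
```

Nothing is learned ahead of time here: all training instances are simply stored, and the work happens at classification time, which is the sense in which such learners are called lazy.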

arXiv.org e-Print Archive

CiteSeerX

Institutional Repository Universiteit Antwerpen

Tilburg University Repository

Developmental changes in the role of different metalinguistic awareness skills in Chinese reading acquisition from preschool to third grade

Author: A Castles
A Gottardo
A Uno
AA Roman
AE Cunningham
Bao-Guo Chen
C Chaney
C McBride-Chang
CF Hu
CSH Ho
D Lin
GK Georgiou
H Shu
Hong-Yan Bi
HS Huang
HY Bi
J Zhang
JC Ziegler
JJ Wang
JL Anthony
JL Metsala
Johan J. Bolhuis
JR Kirby
KB Cartwright
L Chan
L Ehri
LH Tan
MB Denckla
NA Badian
SA Brady
SH Deacon
SR Burgess
T Nunes
Taeko N. Wydell
TN Wydell
Tong-Qi Wei
V Muter
W Nagy
WL Liu
WT Siok
X Tong
X Wu
Xu-Chu Weng
YF Su
Ying Liu
YM Ku
Publication venue: Public Library of Science (PLoS)
Publication date: 08/05/2014

Copyright © 2014 Wei et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
The present study investigated the relationship between Chinese reading skills and metalinguistic awareness skills such as phonological, morphological, and orthographic awareness for 101 Preschool, 94 Grade-1, 98 Grade-2, and 98 Grade-3 children from two primary schools in Mainland China. The aim of the study was to examine how the influence of each of these metalinguistic awareness skills on reading success in Chinese changes with age. The results showed that all three metalinguistic awareness skills significantly predicted reading success. They further revealed that orthographic awareness played a dominant role in the early stages of reading acquisition, and its influence decreased with age, while the opposite was true for the contribution of morphological awareness. The results were in stark contrast with studies in English, where phonological awareness is typically shown to be the single most potent metalinguistic awareness factor in literacy acquisition. In order to account for the current data, a three-stage model of reading acquisition in Chinese is discussed.
Funding: National Natural Science Foundation of China and Knowledge Innovation Program of the Chinese Academy of Sciences

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

Institute of Psychology,Chinese Academy Of Sciences

PubMed Central

Institutional Repository of Institute of Psychology, Chinese Academy of Sciences

Brunel University Research Archive

FigShare

Beyond Stemming and Lemmatization: Ultra-stemming to Improve Automatic Text Summarization

Author: Torres-Moreno Juan-Manuel
Publication date: 14/09/2012

In Automatic Text Summarization, preprocessing is an important phase to reduce the space of textual representation. Classically, stemming and lemmatization have been widely used for normalizing words. However, even using normalization on large texts, the curse of dimensionality can disturb the performance of summarizers. This paper describes a new method for normalization of words to further reduce the space of representation. We propose to reduce each word to its initial letters, as a form of Ultra-stemming. The results show that Ultra-stemming not only preserves the content of summaries produced by this representation, but can often dramatically improve the performance of the systems. Summaries on trilingual corpora were evaluated automatically with Fresa. Results confirm an increase in performance, regardless of the summarizer system used.
Comment: 22 pages, 12 figures, 9 table
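
The normalization described above (reducing each word to its initial letters) is simple enough to sketch directly. The function names, the prefix length parameter `n`, and the naive whitespace tokenization are assumptions made for illustration; the abstract does not specify the paper's preprocessing pipeline.

```python
def ultra_stem(word: str, n: int = 1) -> str:
    """Reduce a word to its first n letters, lowercased."""
    return word.lower()[:n]


def ultra_stem_text(text: str, n: int = 1) -> list[str]:
    """Ultra-stem every token of a text.

    Tokenization here is naive (split on whitespace, strip surrounding
    punctuation) and is only an assumption for the sake of the example.
    """
    tokens = [tok.strip(".,;:!?\"'()") for tok in text.split()]
    return [ultra_stem(tok, n) for tok in tokens if tok]


words = ["summarization", "summarizer", "summaries", "stemming"]
full_vocabulary = set(words)
ultra_vocabulary = {ultra_stem(w, 3) for w in words}

# The representation space shrinks as distinct words share a prefix.
print(len(full_vocabulary), len(ultra_vocabulary))
```

A shorter prefix shrinks the vocabulary more aggressively but conflates more unrelated words, so `n` trades dimensionality against precision.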

arXiv.org e-Print Archive

CiteSeerX

Building Morphological Chains for Agglutinative Languages

Author: B Can
H Ishwaran
J Goldsmith
J Hankamer
K Narasimhan
Publication date: 23/04/2017

In this paper, we build morphological chains for agglutinative languages by using a log-linear model for the morphological segmentation task. The model is based on the unsupervised morphological segmentation system called MorphoChains. We extend the MorphoChains log-linear model by expanding the candidate space recursively to cover more split points for agglutinative languages such as Turkish, whereas in the original model candidates are generated by considering only binary segmentation of each word. The results show that we improve the state-of-the-art Turkish scores by 12%, reaching an F-measure of 72%, and the English scores by 3%, reaching an F-measure of 74%. Overall, the system outperforms both MorphoChains and other well-known unsupervised morphological segmentation systems. The results indicate that candidate generation plays an important role in such an unsupervised log-linear model that is learned using contrastive estimation with negative samples.
Comment: 10 pages, accepted and presented at the CICLing 2017 (18th International Conference on Intelligent Text Processing and Computational Linguistics
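
The contrast between binary and recursively expanded candidate generation can be illustrated with a toy sketch. It only enumerates split points and omits the log-linear scoring entirely. The function names and the `max_morphemes` cap are hypothetical, and the enumeration is not claimed to match the candidate generator used in MorphoChains or in this paper.

```python
def binary_splits(word):
    """All ways to cut a word into exactly two non-empty parts."""
    return [(word[:i], word[i:]) for i in range(1, len(word))]


def recursive_splits(word, max_morphemes=3):
    """All segmentations of a word into at most max_morphemes parts,
    built by recursively splitting the remainder after each cut."""
    results = [(word,)]
    if max_morphemes > 1:
        for i in range(1, len(word)):
            for rest in recursive_splits(word[i:], max_morphemes - 1):
                results.append((word[:i],) + rest)
    return results


# Turkish "evlerde" (ev + ler + de, "in the houses") needs two split points,
# so a single binary cut cannot produce its full segmentation.
word = "evlerde"
print(("ev", "ler", "de") in binary_splits(word))
print(("ev", "ler", "de") in recursive_splits(word, 3))
```

The recursive candidate space grows quickly with word length, which is why a cap such as `max_morphemes` (or some other pruning) is needed in any practical variant.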

arXiv.org e-Print Archive

Crossref

OpenMETU (Middle East Technical University)