219 research outputs found

Proceedings of the Morpho Challenge 2010 Workshop

Author: Kurimo Mikko
Turunen Ville T.
Virpioja Sami
Publication venue: Aalto-yliopiston teknillinen korkeakoulu
Publication date: 01/01/2010

In natural language processing, many practical tasks, such as speech recognition, information retrieval and machine translation, depend on a large vocabulary and statistical language models. For morphologically rich languages, such as Finnish and Turkish, constructing a vocabulary and language models with sufficient coverage is particularly difficult because of the huge number of different word forms. In Morpho Challenge 2010, unsupervised and semi-supervised algorithms are proposed to provide morpheme analyses for words in different languages and are evaluated in various practical applications. As a research theme, unsupervised morphological analysis has received wide attention in conferences and scientific journals focused on computational linguistics and its applications. These are the proceedings of the Morpho Challenge 2010 Workshop, containing one introductory article that describes the tasks, evaluation and results, and six articles describing the participating unsupervised and supervised learning algorithms. The Morpho Challenge 2010 Workshop was held in Espoo, Finland, on 2-3 September 2010. Reviewed.

Aaltodoc Publication Archive

Minimally-Supervised Morphological Segmentation using Adaptor Grammars

Author: Goldwater Sharon
Sirts Kairit
Publication date: 01/01/2013

This paper explores the use of Adaptor Grammars, a nonparametric Bayesian modelling framework, for minimally supervised morphological segmentation. We compare three training methods: unsupervised training, semi-supervised training, and a novel model selection method. In the model selection method, we train unsupervised Adaptor Grammars using an over-articulated metagrammar, then use a small labelled data set to select which potential morph boundaries identified by the metagrammar should be returned in the final output. We evaluate on five languages and show that semi-supervised training provides a boost over unsupervised training, while the model selection method yields the best average results over all languages and is competitive with state-of-the-art semi-supervised systems. Moreover, this method provides the potential to tune performance according to different evaluation metrics or downstream tasks. 12 page(s)
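The model selection step in the abstract above can be illustrated with a minimal sketch. It assumes that each candidate boundary carries a type label and that selection means picking the subset of labels that maximizes boundary F1 on a small labelled set. The labels (`prefix`, `suffix`, `submorph`), the toy words, and the exhaustive subset search are all hypothetical illustrations, not the procedure used in the paper.

```python
from itertools import combinations

def boundaries(morphs):
    """Character offsets of the morph boundaries in a segmented word."""
    cuts, pos = set(), 0
    for morph in morphs[:-1]:
        pos += len(morph)
        cuts.add(pos)
    return cuts

def f1(predicted, gold):
    """Boundary F1 over parallel lists of boundary-offset sets."""
    tp = sum(len(p & g) for p, g in zip(predicted, gold))
    if tp == 0:
        return 0.0
    precision = tp / sum(len(p) for p in predicted)
    recall = tp / sum(len(g) for g in gold)
    return 2 * precision * recall / (precision + recall)

def select_boundary_types(candidates, gold, types):
    """Try every subset of boundary types and keep the best-scoring one.

    candidates: one dict per word mapping offset -> hypothetical type label.
    gold: one set of gold boundary offsets per word.
    """
    best_subset, best_score = frozenset(), -1.0
    for k in range(len(types) + 1):
        for subset in combinations(types, k):
            chosen = set(subset)
            predicted = [
                {off for off, label in word.items() if label in chosen}
                for word in candidates
            ]
            score = f1(predicted, gold)
            if score > best_score:
                best_subset, best_score = frozenset(subset), score
    return best_subset, best_score

# Toy over-segmented candidates for "walked", "unkind", "cats" (assumed data).
candidates = [
    {2: "submorph", 4: "suffix"},
    {2: "prefix"},
    {1: "submorph", 3: "suffix"},
]
gold = [boundaries(["walk", "ed"]), boundaries(["un", "kind"]), boundaries(["cat", "s"])]
best_subset, best_score = select_boundary_types(
    candidates, gold, ["prefix", "suffix", "submorph"]
)
```

Swapping `f1` for another scoring function is one way to read the abstract's remark about tuning towards different evaluation metrics; exhaustive search is only practical for a handful of label types.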

Edinburgh Research Explorer

Macquarie University ResearchOnline

Methods and algorithms for unsupervised learning of morphology

Author: A. Gelbukh
A. de Gispert
B. Can
C. Monson
D. Blackwell
D. Harman
D.R. Morrison
E. Arısoy
E. Minkov
H. Ishwaran
H. Poon
J. Goldsmith
K. Järvelin
K. Kettunen
K. Kirchhoff
K. Sirts
K. Toutanova
L. Aunimo
M. Creutz
M. Kurimo
M.A. Hafer
M.R. Brent
N.A. Smith
P.F. Brown
R. Krovetz
S. Bordag
S. Manandhar
S. Neuvel
Z.S. Harris
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

This is an accepted manuscript of a chapter published by Springer in Computational Linguistics and Intelligent Text Processing, CICLing 2014, Lecture Notes in Computer Science, vol 8403, in 2014, available online: https://doi.org/10.1007/978-3-642-54906-9_15 The accepted version of the publication may differ from the final published version. This paper is a survey of methods and algorithms for unsupervised learning of morphology. We describe the methods and algorithms used for morphological segmentation from a computational linguistics point of view. We survey morphological segmentation methods based on MDL (minimum description length), MLE (maximum likelihood estimation) and MAP (maximum a posteriori), as well as parametric and non-parametric Bayesian approaches. A review of the evaluation schemes for unsupervised morphological segmentation is also provided, along with a summary of results from the Morpho Challenge evaluations. Published version
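As a rough illustration of the MDL idea named in the abstract above, the following sketch computes a simplified two-part code length for a segmented corpus: the bits needed to spell out the morph lexicon plus the bits needed to encode the corpus as a sequence of morphs. The cost function, the 26-letter alphabet and the toy corpus are assumptions made for this example; they are not taken from the survey or from any specific system.

```python
import math
from collections import Counter

def description_length(segmented_corpus, alphabet_size=26):
    """Simplified two-part code length in bits: lexicon cost + corpus cost."""
    tokens = [morph for word in segmented_corpus for morph in word]
    counts = Counter(tokens)
    total = len(tokens)
    # Lexicon: spell each distinct morph letter by letter, plus an end marker.
    bits_per_symbol = math.log2(alphabet_size + 1)
    lexicon_bits = sum((len(morph) + 1) * bits_per_symbol for morph in counts)
    # Corpus: each morph token costs -log2 of its maximum-likelihood probability.
    corpus_bits = -sum(c * math.log2(c / total) for c in counts.values())
    return lexicon_bits + corpus_bits

stems, suffixes = ["walk", "talk", "jump"], ["", "s", "ed"]
# Hypothesis A: every word form is stored whole.
whole_words = [[stem + suffix] for stem in stems for suffix in suffixes]
# Hypothesis B: word forms are split into stem and suffix.
split_words = [[stem, suffix] if suffix else [stem] for stem in stems for suffix in suffixes]

dl_whole = description_length(whole_words)
dl_split = description_length(split_words)
```

On this toy corpus the split hypothesis gets the shorter description because the stems and suffixes are reused across words, which is the trade-off MDL-based segmenters exploit. Real systems use more elaborate codes, for example for morph frequencies.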

Crossref

Wolverhampton Intellectual Repository and E-theses

Low-Resource Active Learning of Morphological Segmentation

Author: Grönroos Stig-Arne
Hiovain Katri
Jokinen Päivi Kristiina
Kurimo Mikko
Rauhala Ilona Erika
Smit Peter
Virpioja Sami Petteri
Publication date: 01/01/2016

Peer reviewed

Helsingin yliopiston digitaalinen arkisto

Using Statistical Models of Morphology in the Search for Optimal Units of Representation in the Human Mental Lexicon

Author: Alegre
Ambridge
Baayen
Balling
Bates
Bertram
Boston
Bozic
Butterworth
Chomsky
Creutz
Diependaele
Ettinger
Fossum
Frank
Frauenfelder
Fruchter
Goldsmith
Goldwater
Grasemann
Hafer
Hale
Harris
Hay
Hirsimäki
Hsu
Hyönä
Järvikivi
Kohonen
Kostić
Kuperman
Kurimo
Lagus
Laine
Laudanna
Lehtonen
Levy
Lidz
Milin
Moscoso del Prado Martín
Niemi
Norris
O'Reilly
Quasthoff
Rastle
Rescorla
Rissanen
Rueckl
Räsänen
Saffran
Schreuder
Smith
Soveri
Taft
Tiedemann
Tomasello
Vartiainen
Virpioja
Wu
Yang
Zipf
Publication date: 01/04/2018

Determining the optimal units for representing morphologically complex words in the mental lexicon is a central question in psycholinguistics. Here, we utilize advances in computational sciences to study human morphological processing using statistical models of morphology, particularly the unsupervised Morfessor model, which works on the principle of optimization. The aim was to see what kind of model structure corresponds best to human word recognition costs for multimorphemic Finnish nouns: a model incorporating units resembling linguistically defined morphemes, a whole-word model, or a model that seeks an optimal balance between these two extremes. Our results showed that human word recognition was predicted best by a combination of two models: a model that decomposes words at some morpheme boundaries while keeping others unsegmented, and a whole-word model. The results support dual-route models that assume that both decomposed and full-form representations are utilized to optimally process complex words within the mental lexicon. Peer reviewed

Crossref

Aaltodoc Publication Archive

Helsingin yliopiston digitaalinen arkisto

A Task-based Evaluation of French Morphological Resources and Tools

Author: Bernhard Delphine
Cartoni Bruno
Tribout Delphine
Publication venue: Stanford Calif.: CSLI Publications
Publication date: 01/01/2011

Morphology is a key component for many Language Technology applications. However, morphological relations, especially those relying on the derivation and compounding processes, are often addressed in a superficial manner. In this article, we focus on assessing the relevance of deep and motivated morphological knowledge in Natural Language Processing applications. We first describe an annotation experiment whose goal is to evaluate the role of morphology for one task, namely Question Answering (QA). We then highlight the kind of linguistic knowledge that is necessary for this particular task and propose a qualitative analysis of morphological phenomena in order to identify the morphological processes that are most relevant. Based on this study, we perform an intrinsic evaluation of existing tools and resources for French morphology, in order to quantify their coverage. Our conclusions provide helpful insights for using and building appropriate morphological resources and tools that could have a significant impact on the application performance.

Hal-Diderot

Morphological analysis for the Maltese language : the challenges of a hybrid system

Author: Borg Claudia
Gatt Albert
Publication venue: SAGE Publications
Publication date: 01/01/2017

Maltese is a morphologically rich language with a hybrid morphological system which features both concatenative and non-concatenative processes. This paper analyses the impact of this hybridity on the performance of machine learning techniques for morphological labelling and clustering. In particular, we analyse a dataset of morphologically related word clusters to evaluate the difference in results for concatenative and non-concatenative clusters. We also describe research carried out in morphological labelling, with a particular focus on the verb category. Two evaluations were carried out, one using an unseen dataset, and another one using a gold standard dataset which was manually labelled. The gold standard dataset was split into concatenative and non-concatenative to analyse the difference in results between the two morphological systems. Non peer-reviewed

OAR@UM

Morphological Segmentation for Keyword Spotting

Author: Barzilay Regina
Karakos Damianos
Narasimhan Karthik Rajagopal
Schwartz Richard
Tsakalidis Stavros
Publication date: 01/01/2014

We explore the impact of morphological segmentation on keyword spotting (KWS). Despite potential benefits, state-of-the-art KWS systems do not use morphological information. In this paper, we augment a state-of-the-art KWS system with sub-word units derived from supervised and unsupervised morphological segmentations, and compare with phonetic and syllabic segmentations. Our experiments demonstrate that morphemes improve overall performance of KWS systems. Syllabic units, however, rival the performance of morphological units when used in KWS. By combining morphological, phonetic and syllabic segmentations, we demonstrate substantial performance gains. United States. Intelligence Advanced Research Projects Activity (United States. Army Research Laboratory Contract W911NF-12-C-0013)

CiteSeerX

DSpace@MIT

Crossref