
    Proceedings of the Morpho Challenge 2010 Workshop

    In natural language processing, many practical tasks, such as speech recognition, information retrieval and machine translation, depend on a large vocabulary and statistical language models. For morphologically rich languages, such as Finnish and Turkish, constructing a vocabulary and language models with sufficient coverage is particularly difficult because of the huge number of distinct word forms. In Morpho Challenge 2010, unsupervised and semi-supervised algorithms are proposed to provide morpheme analyses for words in different languages and are evaluated in various practical applications. As a research theme, unsupervised morphological analysis has received wide attention in conferences and scientific journals focused on computational linguistics and its applications. These are the proceedings of the Morpho Challenge 2010 Workshop, which contain one introductory article describing the tasks, evaluation and results, and six articles describing the participating unsupervised and supervised learning algorithms. The Morpho Challenge 2010 Workshop was held in Espoo, Finland, on 2-3 September 2010.

    Minimally-Supervised Morphological Segmentation using Adaptor Grammars

    This paper explores the use of Adaptor Grammars, a nonparametric Bayesian modelling framework, for minimally supervised morphological segmentation. We compare three training methods: unsupervised training, semi-supervised training, and a novel model selection method. In the model selection method, we train unsupervised Adaptor Grammars using an over-articulated metagrammar, then use a small labelled data set to select which potential morph boundaries identified by the metagrammar should be returned in the final output. We evaluate on five languages and show that semi-supervised training provides a boost over unsupervised training, while the model selection method yields the best average results over all languages and is competitive with state-of-the-art semi-supervised systems. Moreover, this method provides the potential to tune performance according to different evaluation metrics or downstream tasks.

    Methods and algorithms for unsupervised learning of morphology

    This is an accepted manuscript of a chapter published by Springer in Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8403, in 2014, available online: https://doi.org/10.1007/978-3-642-54906-9_15 The accepted version of the publication may differ from the final published version. This paper is a survey of methods and algorithms for unsupervised learning of morphology. We describe the methods and algorithms used for morphological segmentation from a computational linguistics point of view. The survey covers segmentation methods based on MDL (minimum description length), MLE (maximum likelihood estimation), MAP (maximum a posteriori), and parametric and non-parametric Bayesian approaches. We also review the evaluation schemes for unsupervised morphological segmentation and summarize the results of the Morpho Challenge evaluations.

    Using Statistical Models of Morphology in the Search for Optimal Units of Representation in the Human Mental Lexicon

    Determining the optimal units for representing morphologically complex words in the mental lexicon is a central question in psycholinguistics. Here, we utilize advances in computational sciences to study human morphological processing using statistical models of morphology, particularly the unsupervised Morfessor model, which works on the principle of optimization. The aim was to see what kind of model structure corresponds best to human word recognition costs for multimorphemic Finnish nouns: a model incorporating units resembling linguistically defined morphemes, a whole-word model, or a model that seeks an optimal balance between these two extremes. Our results showed that human word recognition was predicted best by a combination of two models: a model that decomposes words at some morpheme boundaries while keeping others unsegmented, and a whole-word model. The results support dual-route models, which assume that both decomposed and full-form representations are utilized to optimally process complex words within the mental lexicon.

    A Task-based Evaluation of French Morphological Resources and Tools

    Morphology is a key component of many Language Technology applications. However, morphological relations, especially those relying on derivation and compounding, are often addressed in a superficial manner. In this article, we focus on assessing the relevance of deep and motivated morphological knowledge in Natural Language Processing applications. We first describe an annotation experiment whose goal is to evaluate the role of morphology for one task, namely Question Answering (QA). We then highlight the kind of linguistic knowledge that is necessary for this particular task and propose a qualitative analysis of morphological phenomena in order to identify the morphological processes that are most relevant. Based on this study, we perform an intrinsic evaluation of existing tools and resources for French morphology in order to quantify their coverage. Our conclusions provide helpful insights for using and building appropriate morphological resources and tools that could have a significant impact on application performance.

    Morphological analysis for the Maltese language : the challenges of a hybrid system

    Maltese is a morphologically rich language with a hybrid morphological system which features both concatenative and non-concatenative processes. This paper analyses the impact of this hybridity on the performance of machine learning techniques for morphological labelling and clustering. In particular, we analyse a dataset of morphologically related word clusters to evaluate the difference in results for concatenative and non-concatenative clusters. We also describe research carried out in morphological labelling, with a particular focus on the verb category. Two evaluations were carried out, one using an unseen dataset and another using a manually labelled gold standard dataset. The gold standard dataset was split into concatenative and non-concatenative subsets to analyse the difference in results between the two morphological systems.

    Morphological Segmentation for Keyword Spotting

    We explore the impact of morphological segmentation on keyword spotting (KWS). Despite potential benefits, state-of-the-art KWS systems do not use morphological information. In this paper, we augment a state-of-the-art KWS system with sub-word units derived from supervised and unsupervised morphological segmentations, and compare with phonetic and syllabic segmentations. Our experiments demonstrate that morphemes improve overall performance of KWS systems. Syllabic units, however, rival the performance of morphological units when used in KWS. By combining morphological, phonetic and syllabic segmentations, we demonstrate substantial performance gains. United States. Intelligence Advanced Research Projects Activity (United States. Army Research Laboratory Contract W911NF-12-C-0013)