Building Morphological Chains for Agglutinative Languages

Author: B Can
H Ishwaran
J Goldsmith
J Hankamer
K Narasimhan
Publication date: 23/04/2017

In this paper, we build morphological chains for agglutinative languages by using a log-linear model for the morphological segmentation task. The model is based on the unsupervised morphological segmentation system called MorphoChains. We extend the MorphoChains log-linear model by expanding the candidate space recursively to cover more split points for agglutinative languages such as Turkish, whereas in the original model candidates are generated by considering only binary segmentation of each word. The results show that we improve the state-of-the-art Turkish scores by 12%, reaching an F-measure of 72%, and we improve the English scores by 3%, reaching an F-measure of 74%. Overall, the system outperforms both MorphoChains and other well-known unsupervised morphological segmentation systems. The results indicate that candidate generation plays an important role in such an unsupervised log-linear model that is learned using contrastive estimation with negative samples. Comment: 10 pages, accepted and presented at CICLing 2017 (18th International Conference on Intelligent Text Processing and Computational Linguistics)
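The difference between binary and recursive candidate generation described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names `binary_candidates` and `recursive_candidates`, the `max_parts` cap, and the example word are assumptions chosen for illustration, and the log-linear scoring of candidates is left out entirely.

```python
def binary_candidates(word):
    """Return every split of `word` into exactly two non-empty parts."""
    return [(word[:i], word[i:]) for i in range(1, len(word))]


def recursive_candidates(word, max_parts=3):
    """Return every segmentation of `word` into 1..max_parts non-empty parts.

    The left part of each binary split is split again recursively, so the
    candidate set also contains segmentations with several split points.
    `max_parts` is a hypothetical cap that keeps the candidate set small.
    """
    results = {(word,)}  # the unsegmented word is always a candidate
    if max_parts > 1:
        for i in range(1, len(word)):
            suffix = word[i:]
            for left in recursive_candidates(word[:i], max_parts - 1):
                results.add(left + (suffix,))
    return results


# Turkish "evlerde" (ev + ler + de, "in the houses") needs two split points.
binary = binary_candidates("evlerde")
recursive = recursive_candidates("evlerde", max_parts=3)
```

Here `binary` contains `("ev", "lerde")` but cannot contain the three-morpheme analysis, whereas `recursive` also includes `("ev", "ler", "de")`. The candidate space grows quickly with word length, which is why the sketch caps the number of parts.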

arXiv.org e-Print Archive

Crossref

OpenMETU (Middle East Technical University)

Minimally supervised induction of morphology through bitexts

Author: Moon Taesun, Ph. D.
Publication date: 01/12/2008

A knowledge of morphology can be useful for many natural language processing systems. Thus, much effort has been expended in developing accurate computational tools for morphology that lemmatize, segment and generate new forms. The most powerful and accurate of these have been manually encoded, such endeavors being without exception expensive and time-consuming. There have been consequently many attempts to reduce this cost in the development of morphological systems through the development of unsupervised or minimally supervised algorithms and learning methods for acquisition of morphology. These efforts have yet to produce a tool that approaches the performance of manually encoded systems. Here, I present a strategy for dealing with morphological clustering and segmentation in a minimally supervised manner but one that will be more linguistically informed than previous unsupervised approaches. That is, this study will attempt to induce clusters of words from an unannotated text that are inflectional variants of each other. Then a set of inflectional suffixes by part-of-speech will be induced from these clusters. This level of detail is made possible by a method known as alignment and transfer (AT), among other names, an approach that uses aligned bitexts to transfer linguistic resources developed for one language–the source language–to another language–the target. This approach has a further advantage in that it allows a reduction in the amount of training data without a significant degradation in performance, making it useful in applications targeted at data collected from endangered languages. In the current study, however, I use English as the source and German as the target for ease of evaluation and for certain typological properties of German. The two main tasks, that of clustering and segmentation, are approached as sequential tasks with the clustering informing the segmentation to allow for greater accuracy in morphological analysis.
While the performance of these methods does not exceed the current roster of unsupervised or minimally supervised approaches to morphology acquisition, this study attempts to integrate more learning methods than previous studies. Furthermore, it attempts to learn inflectional morphology as opposed to derivational morphology, which is a crucial distinction in linguistics.

Texas ScholarWorks

Methods and algorithms for unsupervised learning of morphology

Author: A. Gelbukh
A. Gispert de
B. Can
C. Monson
D. Blackwell
D. Harman
D.R. Morrison
E. Arısoy
E. Minkov
H. Ishwaran
H. Poon
H. Poon
J. Goldsmith
K. Järvelin
K. Kettunen
K. Kirchhoff
K. Sirts
K. Toutanova
L. Aunimo
M. Creutz
M. Kurimo
M.A. Hafer
M.R. Brent
N.A. Smith
P.F. Brown
R. Krovetz
S. Bordag
S. Manandhar
S. Neuvel
Z.S. Harris
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

This is an accepted manuscript of a chapter published by Springer in Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8403, in 2014, available online: https://doi.org/10.1007/978-3-642-54906-9_15 The accepted version of the publication may differ from the final published version. This paper is a survey of methods and algorithms for unsupervised learning of morphology. We provide a description of the methods and algorithms used for morphological segmentation from a computational linguistics point of view. We survey morphological segmentation methods, covering those based on MDL (minimum description length), MLE (maximum likelihood estimation), MAP (maximum a posteriori), and parametric and non-parametric Bayesian approaches. A review of the evaluation schemes for unsupervised morphological segmentation is also provided along with a summary of evaluation results on the Morpho Challenge evaluations.

Crossref

Wolverhampton Intellectual Repository and E-theses

Tree Structured Dirichlet Processes for Hierarchical Morphological Segmentation

Author: Burcu Can
Can Burcu
Creutz Mathias
Dreyer Markus
Kurimo Mikko
Kurimo Mikko
Lignos Constantine
Mikolov Tomas
Nicolas Lionel
Snyder Benjamin
Suresh Manandhar
Teh Y. W.
Publication venue: 'MIT Press - Journals'
Publication date: 01/01/2018

This article presents a probabilistic hierarchical clustering model for morphological segmentation. In contrast to existing approaches to morphology learning, our method allows learning hierarchical organization of word morphology as a collection of tree structured paradigms. The model is fully unsupervised and based on the hierarchical Dirichlet process. Tree hierarchies are learned along with the corresponding morphological paradigms simultaneously. Our model is evaluated on Morpho Challenge and shows competitive performance when compared to state-of-the-art unsupervised morphological segmentation systems. Although we apply this model for morphological segmentation, the model itself can also be used for hierarchical clustering of other types of data.

Hacettepe University Institutional Repository

Crossref

White Rose Research Online

Wolverhampton Intellectual Repository and E-theses

Categories and paradigms : on underspecification in Russian declension

Author: Wiese Bernd
Publication date: 28/04/2009

In morphological systems of the agglutinative type we sometimes encounter a nearly perfect one-to-one relation between form and function. Turkish inflectional morphology is, of course, the standard textbook example. Things seem to be quite different in systems of the flexive type. Declension in Contemporary Standard Russian (henceforth Russian, for short) may be cited as a typical example: We find, among other things, cumulative markers, “synonymous” endings (e.g., dative singular noun forms in -i, -e, or -u), and “homonymous” endings (e.g., -i, genitive, dative, and prepositional singular). True, some endings are more of an agglutinative nature, being bound to a specific case-number combination and applying across declensions, e.g., -am (dative plural, all nouns); and some cross the boundaries of word classes, e.g., -o, which serves as the nominative/accusative singular ending of neuter forms of pronouns (and adjectives) and as the nominative/accusative singular ending of (most) neuter nouns as well. Still, many observers have been struck by the impression that what we face here are rather uneconomic or even, so to speak, unnatural structures. But perhaps flexive systems are not as complicated as they seem. What seems to be uneconomic complexity may be, at least partially, an artifact of uneconomic descriptions.

Crossref

Hochschulschriftenserver - Universität Frankfurt am Main

Unsupervised morphological segmentation using neural word embeddings

Author: Can Burcu
Üstün Ahmet
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

This is an accepted manuscript of an article published by Springer in Král P., Martín-Vide C. (eds) Statistical Language and Speech Processing. SLSP 2016. Lecture Notes in Computer Science, vol 9918 on 21/09/2016, available online: https://doi.org/10.1007/978-3-319-45925-7_4 The accepted version of the publication may differ from the final published version. We present a fully unsupervised method for morphological segmentation. Unlike many morphological segmentation systems, our method is based on semantic features rather than orthographic features. In order to capture word meanings, word embeddings are obtained from a two-level neural network [11]. We compute the semantic similarity between words using the neural word embeddings, which forms our baseline segmentation model. We model morphotactics with a bigram language model based on maximum likelihood estimates by using the initial segmentations from the baseline. Results show that using semantic features helps to improve morphological segmentation, especially in agglutinating languages like Turkish. Our method shows competitive performance compared to other unsupervised morphological segmentation systems.
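The bigram morphotactics model mentioned in this abstract can be sketched as follows. This is a minimal illustration of maximum likelihood estimation over morpheme bigrams, not the authors' code: the function name `train_bigram_mle`, the boundary symbols `<s>` and `</s>`, and the toy Turkish segmentations are all assumptions made for the example.

```python
from collections import Counter


def train_bigram_mle(segmentations):
    """Estimate P(current morph | previous morph) by relative frequency.

    `segmentations` is an iterable of morph tuples, for example the output
    of some baseline segmenter. Word boundaries are marked with the
    hypothetical symbols "<s>" and "</s>".
    """
    context_counts = Counter()
    bigram_counts = Counter()
    for morphs in segmentations:
        seq = ["<s>", *morphs, "</s>"]
        for prev, cur in zip(seq, seq[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1

    def prob(prev, cur):
        # Unsmoothed MLE: count(prev, cur) / count(prev); 0.0 for unseen contexts.
        if context_counts[prev] == 0:
            return 0.0
        return bigram_counts[(prev, cur)] / context_counts[prev]

    return prob


# Toy initial segmentations (illustrative only).
initial = [("ev", "ler"), ("ev", "de"), ("ev", "ler", "de")]
prob = train_bigram_mle(initial)
```

With these three toy words, `prob("ev", "ler")` is 2/3 because "ev" is followed by "ler" in two of its three occurrences. The estimates are unsmoothed, so any bigram absent from the initial segmentations gets probability zero.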

Crossref

Wolverhampton Intellectual Repository and E-theses

OpenMETU (Middle East Technical University)

Lexical typology : a programmatic sketch

Author: Behrens Leila
Sasse Hans-Jürgen
Publication date: 01/01/1997

The present paper is an attempt to lay the foundation for Lexical Typology as a new kind of linguistic typology. The goal of Lexical Typology is to investigate crosslinguistically significant patterns of interaction between lexicon and grammar.

Hochschulschriftenserver - Universität Frankfurt am Main