13 research outputs found

Paradigm Completion for Derivational Morphology

Author: Cotterell Ryan
Khayrallah Huda
Kirov Christo
Vylomova Ekaterina
Yarowsky David
Publication date: 01/01/2017

The generation of complex derived word forms has been an overlooked problem in NLP; we fill this gap by applying neural sequence-to-sequence models to the task. We overview the theoretical motivation for a paradigmatic treatment of derivational morphology, and introduce the task of derivational paradigm completion as a parallel to inflectional paradigm completion. State-of-the-art neural models, adapted from the inflection task, are able to learn a range of derivation patterns, and outperform a non-neural baseline by 16.4%. However, due to semantic, historical, and lexical considerations involved in derivational morphology, future work will be needed to achieve performance parity with inflection-generating systems. Comment: EMNLP 201

arXiv.org e-Print Archive

Crossref

Building Morphological Chains for Agglutinative Languages

Author: B Can
H Ishwaran
J Goldsmith
J Hankamer
K Narasimhan
Publication date: 23/04/2017

In this paper, we build morphological chains for agglutinative languages by using a log-linear model for the morphological segmentation task. The model is based on the unsupervised morphological segmentation system called MorphoChains. We extend the MorphoChains log-linear model by recursively expanding the candidate space to cover more split points for agglutinative languages such as Turkish, whereas the original model generates candidates by considering only binary segmentations of each word. The results show that we improve the state-of-the-art Turkish scores by 12%, reaching an F-measure of 72%, and the English scores by 3%, reaching an F-measure of 74%. Overall, the system outperforms both MorphoChains and other well-known unsupervised morphological segmentation systems. The results indicate that candidate generation plays an important role in such an unsupervised log-linear model that is learned using contrastive estimation with negative samples. Comment: 10 pages, accepted and presented at the CICLing 2017 (18th International Conference on Intelligent Text Processing and Computational Linguistics
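
The contrast this abstract draws between binary and recursively expanded candidates can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `max_parts` cap, and the example word `evlerde` (Turkish ev-ler-de, "in the houses") are assumptions chosen for illustration, and the log-linear scoring of candidates is left out entirely.

```python
from typing import List, Tuple


def binary_splits(word: str) -> List[Tuple[str, str]]:
    """All ways to cut a word once into two non-empty pieces."""
    return [(word[:i], word[i:]) for i in range(1, len(word))]


def recursive_splits(word: str, max_parts: int = 4) -> List[List[str]]:
    """All segmentations of a word into 1..max_parts non-empty pieces.

    max_parts is a hypothetical cap that keeps the candidate space small.
    """
    if max_parts == 1 or len(word) == 1:
        return [[word]]
    results = [[word]]  # the unsegmented word is always a candidate
    for i in range(1, len(word)):
        head, rest = word[:i], word[i:]
        # Recurse on the remainder so that further split points are covered.
        for tail in recursive_splits(rest, max_parts - 1):
            results.append([head] + tail)
    return results


if __name__ == "__main__":
    word = "evlerde"
    print(len(binary_splits(word)), "binary candidates")
    print(len(recursive_splits(word, max_parts=3)), "recursive candidates")
```

A single cut can never produce the three-morpheme analysis `["ev", "ler", "de"]`, whereas the recursive expansion includes it among its candidates.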

arXiv.org e-Print Archive

Crossref

OpenMETU (Middle East Technical University)

MORSE: Semantic-ally Drive-n MORpheme SEgment-er

Author: Bhat Suma
Sakakini Tarek
Viswanath Pramod
Publication date: 01/01/2017

We present in this paper a novel framework for morpheme segmentation which uses the morpho-syntactic regularities preserved by word representations, in addition to orthographic features, to segment words into morphemes. This framework is the first to consider vocabulary-wide syntactico-semantic information for this task. We also analyze the deficiencies of available benchmarking datasets and introduce our own dataset that was created on the basis of compositionality. We validate our algorithm across datasets and present state-of-the-art results.

arXiv.org e-Print Archive

Crossref

Unsupervised learning of allomorphs in Turkish

Author: Can Burcu
Publication venue: 'The Scientific and Technological Research Council of Turkey'
Publication date: 30/12/2016

© 2017 The Author. Published by The Scientific and Technological Research Council of Turkey. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: https://journals.tubitak.gov.tr/elektrik/issues/elk-17-25-4/elk-25-4-57-1605-216.pdf

One morpheme may have several surface forms that correspond to allomorphs. In English, ed and d are surface forms of the past tense morpheme, and s, es, and ies are surface forms of the plural or present tense morpheme. Turkish has a large number of allomorphs due to its morphophonemic processes. One morpheme can have tens of different surface forms in Turkish. This leads to a sparsity problem in natural language processing tasks in Turkish. Detection of allomorphs has not been studied much because of its difficulty. For example, tü and di are Turkish allomorphs (i.e., of the past tense morpheme), but all of their letters are different. This paper presents an unsupervised model to extract the allomorphs in Turkish. We are able to obtain an F-measure of 73.71% in the detection of allomorphs, and our model outperforms previous unsupervised models on morpheme clustering. Published versio
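
For readers unfamiliar with the metric reported above, the following sketch shows how a balanced F-measure is commonly computed from a set of predicted items and a set of gold items. The abstract does not state the evaluation protocol, so scoring unordered pairs of surface forms is an assumption made here for illustration, as are the function name and the toy data.

```python
from typing import Set


def f_measure(predicted: Set, gold: Set) -> float:
    """Balanced F-measure (harmonic mean of precision and recall)."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical toy data: pairs of surface forms judged to be allomorphs.
    gold = {frozenset({"tü", "di"}), frozenset({"ed", "d"}), frozenset({"s", "es"})}
    predicted = {frozenset({"tü", "di"}), frozenset({"ed", "d"})}
    print(round(f_measure(predicted, gold), 4))
```

In the toy data, both predicted pairs are correct and one gold pair is missed, so precision is 1 and recall is 2/3.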

Hacettepe University Institutional Repository

Crossref

Wolverhampton Intellectual Repository and E-theses

Modeling Syntactic Context Improves Morphological Segmentation

Author: Barzilay Regina
Haghighi Aria
Yoong Keok Lee
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/06/2011

The connection between part-of-speech (POS) categories and morphological properties is well-documented in linguistics but underutilized in text processing systems. This paper proposes a novel model for morphological segmentation that is driven by this connection. Our model learns that words with common affixes are likely to be in the same syntactic category and uses learned syntactic categories to refine the segmentation boundaries of words. Our results demonstrate that incorporating POS categorization yields substantial performance gains on morphological segmentation of Arabic.

United States. Army Research Office (contract/grant number W911NF-10-1-0533); U.S. Army Research Laboratory (contract/grant number W911NF-10-1-0533)

DSpace@MIT

Inferring Morphological Rules from Small Examples using 0/1 Linear Programming

Author: Claessen Koen
Lillieström Ann
Smallbone Nicholas
Publication date: 01/01/2019

We show how to express the problem of finding an optimal morpheme segmentation from a set of labelled words as a 0/1 linear programming problem, and how to build on this to analyse a language’s morphology. The result is an automatic method for segmentation and labelling that works well even when there is very little training data available.

Chalmers Research

Joint Bayesian Morphology learning of Dravidian Languages

Author: Kumar Arun
Oliver González Antoni
Padró Lluís
Publication date: 01/01/2016

In this paper, a methodology for learning the complex agglutinative morphology of some Indian languages using Adaptor Grammars and morphology rules is presented. Adaptor Grammars are a compositional Bayesian framework for grammatical inference, where we define a morphological grammar for agglutinative languages and morphological boundaries are inferred from a plain text corpus. Once morphological segmentations are produced, regular expressions for sandhi rules and orthography are applied to achieve the final segmentation. We test our algorithm on two complex languages from the Dravidian family. The same morphological model and results are evaluated by comparison with other state-of-the-art unsupervised morphology learning systems. Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC