A Joint Model of Orthography and Morphological Segmentation

Authors: Cotterell Ryan; Schütze Hinrich; Vieira Tim
Publication venue: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Publication date: 01/01/2016

We present a model of morphological segmentation that jointly learns to segment and restore orthographic changes, e.g., funniest → fun-y-est. We term this form of analysis canonical segmentation and contrast it with the traditional surface segmentation, which segments a surface form into a sequence of substrings, e.g., funniest → funn-i-est. We derive an importance sampling algorithm for approximate inference in the model and report experimental results on English, German and Indonesian.

Apollo (Cambridge)

MORSE: Semantic-ally Drive-n MORpheme SEgment-er

Authors: Bhat Suma; Sakakini Tarek; Viswanath Pramod
Publication date: 01/01/2017

We present in this paper a novel framework for morpheme segmentation which uses the morpho-syntactic regularities preserved by word representations, in addition to orthographic features, to segment words into morphemes. This framework is the first to consider vocabulary-wide syntactico-semantic information for this task. We also analyze the deficiencies of available benchmarking datasets and introduce our own dataset that was created on the basis of compositionality. We validate our algorithm across datasets and present state-of-the-art results.

arXiv.org e-Print Archive

Crossref

Canonical segmentation for Javanese-Indonesian Neural Machine Translation

Authors: Azizah Kurniawati; Jatmiko Wisnu; Wijono Sri Hartati
Publication venue: School of Engineering. Taylor’s University
Publication date: 01/08/2022

Repository Universitas Sanata Dharma

Predicting the Growth of Morphological Families from Social and Linguistic Factors

Authors: Hofmann Valentin; Pierrehumbert Janet; Schütze Hinrich
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 01/01/2020

We present the first study that examines the evolution of morphological families, i.e., sets of morphologically related words such as “trump”, “antitrumpism”, and “detrumpify”, in social media. We introduce the novel task of Morphological Family Expansion Prediction (MFEP) as predicting the increase in the size of a morphological family. We create a ten-year Reddit corpus as a benchmark for MFEP and evaluate a number of baselines on this benchmark. Our experiments demonstrate very good performance on MFEP.

Crossref

Open Access LMU

Oxford University Research Archive