53 research outputs found

A Joint Learning Model of Word Segmentation, Lexical Acquisition, and Phonetic Variability

Author: Elsner Micha
Feldman Naomi
Goldwater Sharon
Wood Frank
Publication date: 01/01/2013

Edinburgh Research Explorer

Bootstrapping a Unified Model of Lexical and Phonetic Acquisition

Author: Eisenstein Jacob
Elsner Micha
Goldwater Sharon
Publication date: 01/07/2012

Edinburgh Research Explorer

Analogy in Contact: Modeling Maltese Plural Inflection

Author: Court Sara
Elsner Micha
Sims Andrea D
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/06/2023

Maltese is often described as having a hybrid morphological system resulting from extensive contact between Semitic and Romance language varieties. Such a designation reflects an etymological divide as much as it does a larger tradition in the literature to consider concatenative and non-concatenative morphological patterns as distinct in the language architecture. Using a combination of computational modeling and information theoretic methods, we quantify the extent to which the phonology and etymology of a Maltese singular noun may predict the morphological process (affixal vs. templatic) as well as the specific plural allomorph (affix or template) relating a singular noun to its associated plural form(s) in the lexicon. The results indicate phonological pressures shape the organization of the Maltese lexicon with predictive power that extends beyond that of a word's etymology, in line with analogical theories of language change in contact.

ScholarWorks@UMass Amherst

Formalizing Inflectional Paradigm Shape with Information Theory

Author: Elsner Micha
LeFevre Grace
Sims Andrea D
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/01/2021

“Paradigm shape,” our term for the morphological structure formed by implicative relations between inflected forms, has not been formally quantified in a gradient manner. We develop a method to formalize paradigm shape by modeling the joint effect of stem alternations and affixes. Applied to Spanish verbs, our model successfully captures aspects of both allomorphic and distributional classes. These results are replicable and extendable to other languages.

ScholarWorks@UMass Amherst

Interpreting Sequence-to-Sequence Models for Russian Inflectional Morphology

Author: Elsner Micha
King David L
Sims Andrea D
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/01/2020

Morphological inflection, as an engineering task in NLP, has seen a rise in the use of neural sequence-to-sequence models (Kann et al. 2016, Cotterell et al. 2018, Aharoni et al. 2017). While these outperform traditional systems based on edit rule induction, it is hard to interpret what they are learning in linguistic terms. We propose a new method of analyzing morphological sequence-to-sequence models which groups errors into linguistically meaningful classes, making what the model learns more transparent. As a case study, we analyze a seq2seq model on Russian, finding that semantic and lexically conditioned allomorphy (e.g. inanimate nouns like zavod 'factory' and animates like otec 'father' have different, animacy-conditioned accusative forms) are responsible for its relatively low accuracy. Augmenting the model with word embeddings as a proxy for lexical semantics leads to significant improvements in predicted wordform accuracy.

ScholarWorks@UMass Amherst

Normalization may be ineffective for phonetic category learning

Author: Elsner Micha
Feldman Naomi H.
Hitczenko Kasia
Mazuka Reiko
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/01/2019

Sound categories often overlap in their acoustics, which can make phonetic learning difficult. Several studies argued that normalizing acoustics relative to context improves category separation (e.g. Dillon et al., 2013). However, recent work shows that normalization is ineffective for learning Japanese vowel length from spontaneous child-directed speech (Hitczenko et al., 2018). We show that this discrepancy arises from differences between spontaneous and controlled lab speech, and that normalization can increase category overlap when there are regularities in which contexts different sounds occur in, a hallmark of spontaneous speech. Therefore, normalization is unlikely to help in real, naturalistic phonetic learning situations.

ScholarWorks@UMass Amherst

Challenges and solutions for Latin named entity recognition

Author: Ajaka Petra
Brown Christopher
de Marneffe Marie-Catherine
Elsner Micha
Erdmann Alex
Janse Mark
Joseph Brian D.
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2016

Although spanning thousands of years and genres as diverse as liturgy, historiography, lyric and other forms of prose and poetry, the body of Latin texts is still relatively sparse compared to English. Data sparsity in Latin presents a number of challenges for traditional Named Entity Recognition techniques. Solving such challenges and enabling reliable Named Entity Recognition in Latin texts can facilitate many down-stream applications, from machine translation to digital historiography, enabling Classicists, historians, and archaeologists, for instance, to track the relationships of historical persons, places, and groups on a large scale. This paper presents the first annotated corpus for evaluating Named Entity Recognition in Latin, as well as a fully supervised model that achieves over 90% F-score on a held-out test set, significantly outperforming a competitive baseline. We also present a novel active learning strategy that predicts how many and which sentences need to be annotated for named entities in order to attain a specified degree of accuracy when recognizing named entities automatically in a given text. This maximizes the productivity of annotators while simultaneously controlling quality.

Ghent University Academic Bibliography

The Paradigm Discovery Problem

Author: Cotterell Ryan
Elsner Micha
Erdmann Alexander
Habash Nizar
Wu Shijie
Publication date: 01/01/2020

This work treats the paradigm discovery problem (PDP), the task of learning an inflectional morphological system from unannotated sentences. We formalize the PDP and develop evaluation metrics for judging systems. Using currently available resources, we construct datasets for the task. We also devise a heuristic benchmark for the PDP and report empirical results on five diverse languages. Our benchmark system first makes use of word embeddings and string similarity to cluster forms by cell and by paradigm. Then, we bootstrap a neural transducer on top of the clustered data to predict words to realize the empty paradigm slots. An error analysis of our system suggests clustering by cell across different inflection classes is the most pressing challenge for future work. Our code and data are available for public use. Comment: Forthcoming at ACL 202

arXiv.org e-Print Archive

Repository for Publications and Research Data

Crossref

Stop the Morphological Cycle, I Want to Get Off: Modeling the Development of Fusion

Author: Antetomaso Stephanie
Elsner Micha
Johnson Martha B.
Sims Andrea D
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/01/2020

Historical linguists observe that many fusional (unsegmentable) morphological structures developed from agglutinative (segmentable) predecessors. Such changes may result when learners fail to acquire a phonological alternation, and instead, “chunk” the altered versions of morphemes and memorize them as underlying representations. We present a Bayesian model of this process, which learns which morphosyntactic properties are chunked together, what their underlying representations are, and what phonological processes apply to them. In simulations using artificial data, we provide quantitative support to two claims about agglutinative and fusional structures: that optional morphological markers discourage fusion from developing, but that stress-based vowel reduction encourages it.

ScholarWorks@UMass Amherst

Influence of Visual Complexity on Referring Expression Generation

Author: Clarke Alasdair
Elsner Micha
Rohde Hannah
Publication date: 01/01/2014

Edinburgh Research Explorer