A Comparison of Feature-Based and Neural Scansion of Poetry
Automatic analysis of poetic rhythm is a challenging task that involves
linguistics, literature, and computer science. When the language to be analyzed
is known, rule-based systems or data-driven methods can be used. In this paper,
we analyze poetic rhythm in English and Spanish. We show that the
representations of data learned from character-based neural models are more
informative than the ones from hand-crafted features, and that a
Bi-LSTM+CRF model produces state-of-the-art accuracy on scansion of poetry in
two languages. Results also show that the information about whole word
structure, and not just independent syllables, is highly informative for
performing scansion.
Comment: RANLP 201
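The paper's finding that whole-word structure matters can be illustrated with a toy sketch (this is not the paper's Bi-LSTM+CRF model): treat scansion as sequence labeling over syllables, where a hypothetical per-word stress lexicon supplies labels that an isolated syllable could not determine on its own.

```python
# Toy illustration of scansion as syllable-level sequence labeling.
# WORD_STRESS is a hypothetical lexicon: '+' = stressed, '-' = unstressed.
# Stress is a property of the whole word, which is why word structure,
# not independent syllables, is informative for this task.
WORD_STRESS = {
    "shall": ["-"],
    "i": ["+"],
    "compare": ["-", "+"],
    "thee": ["-"],
}

def scan(words):
    """Concatenate per-word stress patterns into a line-level scansion."""
    return [label for w in words for label in WORD_STRESS[w]]

scan(["shall", "i", "compare", "thee"])  # ['-', '+', '-', '+', '-']
```

A real system learns these patterns from characters rather than looking them up, but the input/output shape of the task is the same.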
RNN Classification of English Vowels: Nasalized or Not
Vowel nasality is perceived and used by English listeners even though it is not phonemic. Feature-based classifiers have been built to evaluate which features are useful for perceiving and measuring nasality. These classifiers require heavy high-level feature engineering, with most features discrete and measured at isolated points in time. Recurrent neural networks can exploit sequential information; they free us from high-level feature engineering and are potentially stronger simulation models with a holistic view of the signal. We therefore constructed two types of RNN classifiers (vanilla RNN and LSTM) that take the vowel's MFCCs as input and predict whether the vowel is nasalized. The LSTM model achieved the best performance, supporting the phonetic claim about the degree of coarticulatory nasality and the use of MFCCs for automatic speech recognition.
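The sequential processing the abstract describes can be sketched in a few lines: a vanilla RNN consumes MFCC frames one at a time and a sigmoid output gives a nasality probability. Everything below is illustrative (hand-set weights, made-up frame values); the paper's models are trained, and its LSTM variant adds gating on top of this recurrence.

```python
import math

def rnn_nasality_prob(mfcc_frames, Wx, Wh, b, w_out, b_out):
    """Run a vanilla RNN over a sequence of MFCC frames and return a
    toy probability that the vowel is nasalized."""
    h = [0.0] * len(b)                          # hidden state
    for x in mfcc_frames:                       # consume frames in order
        h = [math.tanh(sum(wx * xi for wx, xi in zip(Wx[j], x))
                       + sum(wh * hi for wh, hi in zip(Wh[j], h))
                       + b[j])
             for j in range(len(b))]
    logit = sum(w * hi for w, hi in zip(w_out, h)) + b_out
    return 1.0 / (1.0 + math.exp(-logit))       # sigmoid -> probability

# Two hidden units, 3-coefficient "MFCC" frames (illustrative numbers only).
Wx = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.1]]
Wh = [[0.1, 0.0], [0.0, 0.1]]
b, w_out, b_out = [0.0, 0.0], [1.0, -1.0], 0.0
frames = [[0.2, 0.1, 0.0], [0.4, 0.3, 0.1]]
p = rnn_nasality_prob(frames, Wx, Wh, b, w_out, b_out)  # 0 < p < 1
```

Because the recurrence carries information across frames, no hand-designed measurement points are needed; the network sees the whole vowel trajectory.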
Applying the Transformer to Character-level Transduction
The transformer has been shown to outperform recurrent neural network-based
sequence-to-sequence models in various word-level NLP tasks. Yet for
character-level transduction tasks, e.g. morphological inflection generation
and historical text normalization, there are few works that outperform
recurrent models using the transformer. In an empirical study, we uncover that,
in contrast to recurrent sequence-to-sequence models, the batch size plays a
crucial role in the performance of the transformer on character-level tasks,
and we show that with a large enough batch size, the transformer does indeed
outperform recurrent models. We also introduce a simple technique to handle
feature-guided character-level transduction that further improves performance.
With these insights, we achieve state-of-the-art performance on morphological
inflection and historical text normalization. We also show that the transformer
outperforms a strong baseline on two other character-level transduction tasks:
grapheme-to-phoneme conversion and transliteration.
Comment: EACL 202
Marrying Universal Dependencies and Universal Morphology
The Universal Dependencies (UD) and Universal Morphology (UniMorph) projects
each present schemata for annotating the morphosyntactic details of language.
Each project also provides corpora of annotated text in many languages - UD at
the token level and UniMorph at the type level. As each corpus is built by
different annotators, language-specific decisions hinder the goal of universal
schemata. With compatibility of tags, each project's annotations could be used
to validate the other's. Additionally, the availability of both type- and
token-level resources would be a boon to tasks such as parsing and homograph
disambiguation. To ease this interoperability, we present a deterministic
mapping from Universal Dependencies v2 features into the UniMorph schema. We
validate our approach by lookup in the UniMorph corpora and find a
macro-average of 64.13% recall. We also note incompatibilities due to paucity
of data on either side. Finally, we present a critical evaluation of the
foundations, strengths, and weaknesses of the two annotation projects.
Comment: UDW1
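A deterministic feature mapping of the kind the abstract describes can be sketched as a lookup table from UD v2 feature-value pairs to UniMorph tags. The entries below are a small hypothetical fragment, not the paper's full mapping, and tag ordering within the UniMorph bundle is simplified here.

```python
# Hypothetical fragment of a deterministic UD v2 -> UniMorph feature mapping.
UD_TO_UNIMORPH = {
    "Number=Sing": "SG",
    "Number=Plur": "PL",
    "Case=Nom": "NOM",
    "Case=Acc": "ACC",
    "Tense=Past": "PST",
}

def convert(ud_feats):
    """Map a UD feature string like 'Case=Nom|Number=Plur' to a
    UniMorph tag bundle; features without a mapping are dropped."""
    tags = [UD_TO_UNIMORPH[f] for f in ud_feats.split("|")
            if f in UD_TO_UNIMORPH]
    return ";".join(tags)

convert("Case=Nom|Number=Plur")  # 'NOM;PL'
```

Validation by lookup then amounts to converting each UD-annotated token and checking whether the resulting bundle appears in the UniMorph corpus for that language, which is where the reported 64.13% macro-average recall comes from.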
Sound Analogies with Phoneme Embeddings
Vector space models of words in NLP---word embeddings---have been recently shown to reliably encode semantic information, offering capabilities such as solving proportional analogy tasks such as man:woman::king:queen. We study how well these distributional properties carry over to similarly learned phoneme embeddings, and whether phoneme vector spaces align with articulatory distinctive features, using several methods of obtaining such continuous-space representations. We demonstrate a statistically significant correlation between distinctive feature spaces and vector spaces learned with word-context PPMI+SVD and word2vec, showing that many distinctive feature contrasts are implicitly present in phoneme distributions. Furthermore, these distributed representations allow us to solve proportional analogy tasks with phonemes, such as p is to b as t is to X , where the solution is that X = d . This effect is even stronger when a supervision signal is added where we extract phoneme representations from the embedding layer of an recurrent neural network that is trained to solve a word inflection task, i.e. a model that is made aware of word relatedness