Acquisition of Inflectional Morphology in Artificial Neural Networks With Prior Knowledge
How does knowledge of one language’s morphology influence the learning of inflection rules in a second one? To investigate this question in artificial neural network models, we perform experiments with a sequence-to-sequence architecture, which we train on different combinations of eight source and three target languages. A detailed analysis of the model outputs suggests the following conclusions: (i) if the source and target language are closely related, acquiring the target language’s inflectional morphology is an easier task for the model; (ii) knowledge of a prefixing (resp. suffixing) language makes acquisition of a suffixing (resp. prefixing) language’s morphology more challenging; and (iii) surprisingly, a source language which exhibits an agglutinative morphology simplifies learning of a second language’s inflectional morphology, independent of their relatedness.
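As a small illustration of the task setup described above, morphological inflection is typically cast as character-level sequence-to-sequence learning: the input is the lemma's characters plus morphosyntactic feature tags, and the output is the characters of the inflected form. The serialization below is a minimal sketch under that assumption; the function name and tag format are illustrative, not taken from the paper.

```python
def make_seq2seq_pair(lemma, features, form):
    """Serialize one inflection example as (source tokens, target tokens).

    The source mixes the lemma's characters with feature tags rendered
    as special tokens; the target is just the inflected form's characters.
    """
    source = list(lemma) + [f"<{f}>" for f in features]
    target = list(form)
    return source, target

src, tgt = make_seq2seq_pair("run", ["V", "PST"], "ran")
print(src)  # ['r', 'u', 'n', '<V>', '<PST>']
print(tgt)  # ['r', 'a', 'n']
```

A model trained on such pairs for one (source) language can then be fine-tuned or jointly trained on a second (target) language, which is the transfer setting the experiments vary.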
Applying the Transformer to Character-level Transduction
The transformer has been shown to outperform recurrent neural network-based
sequence-to-sequence models in various word-level NLP tasks. Yet for
character-level transduction tasks, e.g., morphological inflection generation
and historical text normalization, few transformer-based approaches have
outperformed recurrent models. In an empirical study, we uncover that,
in contrast to recurrent sequence-to-sequence models, the batch size plays a
crucial role in the performance of the transformer on character-level tasks,
and we show that with a large enough batch size, the transformer does indeed
outperform recurrent models. We also introduce a simple technique to handle
feature-guided character-level transduction that further improves performance.
With these insights, we achieve state-of-the-art performance on morphological
inflection and historical text normalization. We also show that the transformer
outperforms a strong baseline on two other character-level transduction tasks:
grapheme-to-phoneme conversion and transliteration. Comment: EACL 202
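The abstract's central finding about batch size is empirical, but the setup it refers to is easy to picture: character sequences are padded to a common length and grouped into (large) batches before being fed to the transformer. The sketch below shows only that padding step, assuming integer character ids and a pad id of 0; both are illustrative assumptions, not details from the paper.

```python
def pad_batch(seqs, pad_id=0):
    """Pad a list of integer character-id sequences to the batch's max length.

    Batch size (len(seqs)) is the hyperparameter the abstract identifies
    as crucial for transformer performance on character-level tasks.
    """
    max_len = max(len(s) for s in seqs)
    return [s + [pad_id] * (max_len - len(s)) for s in seqs]

batch = pad_batch([[5, 3], [7, 2, 9, 4], [1]])
print(batch)  # [[5, 3, 0, 0], [7, 2, 9, 4], [1, 0, 0, 0]]
```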
From Phonology to Syntax: Unsupervised Linguistic Typology at Different Levels with Language Embeddings
A core part of linguistic typology is the classification of languages
according to linguistic properties, such as those detailed in the World Atlas
of Language Structures (WALS). Doing this manually is prohibitively
time-consuming, which is in part evidenced by the fact that only 100 out of
over 7,000 languages spoken in the world are fully covered in WALS.
We learn distributed language representations, which can be used to predict
typological properties on a massively multilingual scale. Additionally,
quantitative and qualitative analyses of these language embeddings can tell us
how language similarities are encoded in NLP models for tasks at different
typological levels. The representations are learned in an unsupervised manner
alongside tasks at three typological levels: phonology (grapheme-to-phoneme
prediction, and phoneme reconstruction), morphology (morphological inflection),
and syntax (part-of-speech tagging).
We consider more than 800 languages and find significant differences in the
language representations encoded, depending on the target task. For instance,
although Norwegian Bokmål and Danish are typologically close to one
another, they are phonologically distant, which is reflected in their language
embeddings growing relatively distant in a phonological task. We are also able
to predict typological features in WALS with high accuracies, even for unseen
language families. Comment: Accepted to NAACL 2018 (long paper). arXiv admin note: text overlap
with arXiv:1711.0546
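The Bokmål/Danish observation above rests on measuring distance between learned language embeddings. A common choice for this is cosine distance; the sketch below shows that comparison with made-up three-dimensional vectors (real learned embeddings would be higher-dimensional and task-dependent).

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Hypothetical embeddings, for illustration only: similar vectors yield
# a small distance, as typologically close languages would on most tasks.
nob = [0.9, 0.1, 0.3]   # "Norwegian Bokmål"
dan = [0.8, 0.2, 0.35]  # "Danish"
d = cosine_distance(nob, dan)
```

On a phonological task, the paper's finding is that the learned vectors for this pair drift apart, so the same metric would report a larger distance there than on, say, a syntactic task.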