In search of isoglosses: continuous and discrete language embeddings in Slavic historical phonology
This paper investigates the ability of neural network architectures to
effectively learn diachronic phonological generalizations in a multilingual
setting. We employ models using three different types of language embedding
(dense, sigmoid, and straight-through). We find that the Straight-Through model
outperforms the other two in terms of accuracy, but the Sigmoid model's
language embeddings show the strongest agreement with the traditional
subgrouping of the Slavic languages. We find that the Straight-Through model
has learned coherent, semi-interpretable information about sound change, and
outline directions for future research.
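As an illustration of what a straight-through language embedding can look like, the sketch below binarizes each embedding dimension in the forward pass and, in the backward pass, treats the threshold as the identity so that only the sigmoid contributes a derivative. This is one common variant of the straight-through estimator, not necessarily the formulation used in the paper; the function names and the 0.5 threshold are assumptions made for the example.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real-valued logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def straight_through_forward(logits):
    """Forward pass: binarize each dimension (1.0 if sigmoid(logit) > 0.5, else 0.0)."""
    return [1.0 if sigmoid(z) > 0.5 else 0.0 for z in logits]


def straight_through_backward(logits, upstream_grad):
    """Backward pass (assumed variant): ignore the hard threshold and
    pass the upstream gradient through the sigmoid derivative only."""
    return [g * sigmoid(z) * (1.0 - sigmoid(z))
            for z, g in zip(logits, upstream_grad)]


# Hypothetical 3-dimensional language embedding, stored as logits.
embedding_logits = [-2.0, 0.3, 4.0]
binary_embedding = straight_through_forward(embedding_logits)
gradients = straight_through_backward(embedding_logits, [1.0, 1.0, 1.0])
```

The forward pass yields a discrete vector, which is what makes such embeddings comparable to isogloss-like binary features, while the backward pass keeps the underlying logits trainable.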
The evolution of similarity avoidance: a phylogenetic approach to phonotactic change
The cross-linguistic under-representation of adjacent consonants sharing a place of articulation within uninflected lexical items is well documented. At the same time, little is known regarding the specific diachronic mechanisms involved in the emergence and maintenance of this pattern. Phylogenetic analyses provide some support for the idea that adjacent identical consonants within words arise infrequently, but stronger support for the idea that words containing such a pattern die out more frequently than those without. I highlight the value of the tools used in this paper for exploring the evolution of sound patterns, and also discuss some limitations of the implementation that remain to be improved upon.
Rate variation in language change: Toward distributional phylogenetic modeling
Since the advent of phylogenetic linguistics, researchers have used a large number of phylogenetic comparative methods adapted from computational biology to model and analyze the dynamics of change of a wide range of linguistic features. Models of this sort vary in complexity; the simplest models of change assume homogeneity of transition rates within families, while state-of-the-art models of heterotachy allow transition rates to vary across lineages within a family. In this contribution, I review a range of applications of biological models of rate variation to questions in diachronic linguistics and highlight some models from computational biology that have remained largely overlooked by linguists. Building off of these and other biological models, I sketch out a program for what I term DISTRIBUTIONAL PHYLOGENETIC MODELING, inspired by an analogous recently proposed family of hierarchical Bayesian models. I report the results of some work in progress carried out within this framework and present a case study illustrating the flexibility of the approach.
Reconstructing the evolution of Indo-European grammar
This study uses phylogenetic methods adopted from computational biology in order to reconstruct features of Proto-Indo-European morphosyntax. We estimate the probability of the presence of typological features in Proto-Indo-European on the assumption that these features change according to a stochastic process governed by evolutionary transition rates between them. We compare these probabilities to previous reconstructions of Proto-Indo-European morphosyntax, which use either the comparative-historical method or implicational typology. We find that our reconstruction yields strong support for a canonical model (synthetic, nominative-accusative, head-final) of the protolanguage and low support for any alternative model. Observing the evolutionary dynamics of features in our data set, we conclude that morphological features have slower rates of change, whereas syntactic traits change faster. Additionally, more frequent, unmarked traits in grammatical hierarchies have slower change rates when compared to less frequent, marked ones, which indicates that universal patterns of economy and frequency impact language change within the family.
Keywords - Indo-European linguistics, historical linguistics, phylogenetic linguistics, typology, syntactic reconstruction
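To make the idea of features changing "according to a stochastic process governed by evolutionary transition rates" concrete, the following is a minimal sketch assuming a two-state continuous-time Markov chain (feature absent = 0, present = 1) with a gain rate and a loss rate. The star-shaped tree, the stationary root prior, the rate values, and the function names are all assumptions for illustration; the study's actual model and tree are not specified in the abstract.

```python
import math


def transition_matrix(gain, loss, t):
    """Return P(t) for a binary feature, where P[i][j] is the probability
    of ending in state j after time t when starting in state i."""
    total = gain + loss
    decay = math.exp(-total * t)
    pi1 = gain / total  # stationary probability of presence
    pi0 = loss / total  # stationary probability of absence
    return [
        [pi0 + pi1 * decay, pi1 * (1.0 - decay)],
        [pi0 * (1.0 - decay), pi1 + pi0 * decay],
    ]


def root_presence_probability(gain, loss, leaves):
    """Posterior probability that the feature is present at the root of a
    star tree. `leaves` is a list of (observed_state, branch_length) pairs;
    the root prior is assumed to be the stationary distribution."""
    total = gain + loss
    prior = [loss / total, gain / total]
    joint = []
    for root_state in (0, 1):
        likelihood = prior[root_state]
        for state, t in leaves:
            likelihood *= transition_matrix(gain, loss, t)[root_state][state]
        joint.append(likelihood)
    return joint[1] / (joint[0] + joint[1])


# Hypothetical data: three daughter languages, two of which show the feature.
posterior = root_presence_probability(0.5, 0.5, [(1, 0.2), (1, 0.3), (0, 0.4)])
```

On a real phylogeny the same computation is carried out recursively over internal nodes (Felsenstein's pruning algorithm), and the rates are estimated from the data rather than fixed in advance.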
Short vs long stem alternations in Romance verbal inflection: the S-morphome
Some verbs in Romance (e.g. the reflexes of faciō 'do', dīcō 'say', habeō 'have', sapiō 'know', possum 'be able', and volō 'want') display alternations between a short (e.g. It. f-are, f-a, d-ire) and a long (e.g. It. fac-evo, dic-e, dic-evo) stem. This paper explores the lexical and paradigmatic distribution of these stem alternations across Romance varieties to trace when, how, and why they emerged. The results suggest a comparatively early emergence as a result of the interaction between preexisting morphological predictability relations within the paradigm and an evolutionary preference for shorter forms in high-frequency word forms and lexemes.
Decoupling Speed of Change and Long-Term Preference in Language Evolution: Insights From Romance Verb Stem Alternations
Romance verb stem alternations (e.g., Spanish tengo 'I have' vs. tienes 'you have') constitute seemingly unnecessary but highly inheritable morphological traits. Using novel phylogenetic methods, we assess the impact of frequency and alternation patterns on properties of their evolution, specifically on the speed of change and the long-term preference for pattern types within lemmata. We find credible differences in long-term trends between alternation patterns, and confirm the notion that frequency drives the maintenance of irregular patterns. However, our model reveals no or only weak effects of either predictor on the speed of change. Our findings call for modeling the multiple dimensions of language change jointly but with distinct parameters for speed (or rates) of change and long-term preferences.
Dialectal Layers in West Iranian: a Hierarchical Dirichlet Process Approach to Linguistic Relationships
This paper addresses a series of complex and unresolved issues in the
historical phonology of West Iranian languages. The West Iranian languages
(Persian, Kurdish, Balochi, and other languages) display a high degree of
non-Lautgesetzlich behavior. Most of this irregularity is undoubtedly due to
language contact; we argue, however, that an oversimplified view of the
processes at work has prevailed in the literature on West Iranian dialectology,
with specialists assuming that deviations from an expected outcome in a given
non-Persian language are due to lexical borrowing from some chronological stage
of Persian. It is demonstrated that this qualitative approach yields at times
problematic conclusions stemming from the lack of explicit probabilistic
inferences regarding the distribution of the data: Persian may not be the sole
donor language; additionally, borrowing at the lexical level is not always the
mechanism that introduces irregularity. In many cases, the possibility that
West Iranian languages show different reflexes in different conditioning
environments remains under-explored. We employ a novel Bayesian approach
designed to overcome these problems and tease apart the different determinants
of irregularity in patterns of West Iranian sound change. Our methodology
allows us to provisionally resolve a number of outstanding questions in the
literature on West Iranian dialectology concerning the dialectal affiliation of
certain sound changes. We outline future directions for work of this sort.