1,580 research outputs found

    Rate variation in language change: Toward distributional phylogenetic modeling

    Since the advent of phylogenetic linguistics, researchers have used a large number of phylogenetic comparative methods adapted from computational biology to model and analyze the dynamics of change of a wide range of linguistic features. Models of this sort vary in complexity; the simplest models of change assume homogeneity of transition rates within families, while state-of-the-art models of heterotachy allow transition rates to vary across lineages within a family. In this contribution, I review a range of applications of biological models of rate variation to questions in diachronic linguistics and highlight some models from computational biology that have remained largely overlooked by linguists. Building on these and other biological models, I sketch out a program for what I term DISTRIBUTIONAL PHYLOGENETIC MODELING, inspired by an analogous, recently proposed family of hierarchical Bayesian models. I report the results of some work in progress carried out within this framework and present a case study illustrating the flexibility of the approach.
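
The contrast between homogeneous and lineage-specific transition rates can be illustrated with a two-state continuous-time Markov chain for a binary feature. The sketch below is only an illustration under stated assumptions, not a model taken from the paper: the function name `gain_probability` and the parameter `rate_multiplier` are hypothetical, and a single per-branch multiplier is one of the simplest ways to let rates vary across lineages.

```python
import math

def gain_probability(q01, q10, t, rate_multiplier=1.0):
    """Probability that a binary feature absent at the start of a branch
    is present after time t under a two-state continuous-time Markov chain.

    q01 and q10 are the gain and loss rates (both assumed positive).
    rate_multiplier is a hypothetical per-lineage scaling factor:
    1.0 for every branch gives a homogeneous model, while
    branch-specific values give a simple form of rate variation.
    """
    if q01 <= 0 or q10 <= 0:
        raise ValueError("rates must be positive")
    total = (q01 + q10) * rate_multiplier
    # Standard closed form: stationary share of state 1 times the
    # fraction of the way the chain has relaxed toward stationarity.
    return (q01 / (q01 + q10)) * (1.0 - math.exp(-total * t))

# Same branch length, but the second lineage evolves twice as fast.
slow = gain_probability(0.3, 0.7, t=1.0, rate_multiplier=1.0)
fast = gain_probability(0.3, 0.7, t=1.0, rate_multiplier=2.0)
```

Scaling both rates by the same multiplier leaves the stationary distribution unchanged, so a fast lineage simply behaves like a longer branch.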

    Information-theoretic causal inference of lexical flow

    This volume seeks to infer large phylogenetic networks from phonetically encoded lexical data and contribute in this way to the historical study of language varieties. The technical step that enables progress in this case is the use of causal inference algorithms. Sample sets of words from language varieties are preprocessed into automatically inferred cognate sets, and then modeled as information-theoretic variables based on an intuitive measure of cognate overlap. Causal inference is then applied to these variables in order to determine the existence and direction of influence among the varieties. The directed arcs in the resulting graph structures can be interpreted as reflecting the existence and directionality of lexical flow, a unified model which subsumes inheritance and borrowing as the two main ways of transmission that shape the basic lexicon of languages. A flow-based separation criterion and domain-specific directionality detection criteria are developed to make existing causal inference algorithms more robust against imperfect cognacy data, giving rise to two new algorithms. The Phylogenetic Lexical Flow Inference (PLFI) algorithm requires lexical features of proto-languages to be reconstructed in advance, but yields fully general phylogenetic networks, whereas the more complex Contact Lexical Flow Inference (CLFI) algorithm treats proto-languages as hidden common causes, and only returns hypotheses of historical contact situations between attested languages. The algorithms are evaluated both against a large lexical database of Northern Eurasia spanning many language families, and against simulated data generated by a new model of language contact that builds on the opening and closing of directional contact channels as primary evolutionary events. The algorithms are found to infer the existence of contacts very reliably, whereas the inference of directionality remains difficult. 
This currently limits the new algorithms to a role as exploratory tools for quickly detecting salient patterns in large lexical datasets, but it should soon be possible to enhance the framework, e.g., with confidence values for each directionality decision.
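
The idea of testing whether the overlap between two varieties is explained by a third can be made concrete with a toy example. The sketch below is not the measure or the separation criterion defined in the volume: `cognate_overlap` and `conditional_overlap` are hypothetical names, and the set-based ratio is an assumption chosen only to show the general shape of such a check.

```python
def cognate_overlap(sets_a, sets_b):
    """Share of cognate sets attested in both languages,
    relative to the smaller of the two inventories."""
    if not sets_a or not sets_b:
        return 0.0
    return len(sets_a & sets_b) / min(len(sets_a), len(sets_b))

def conditional_overlap(sets_a, sets_b, sets_c):
    """Overlap between A and B that remains after discarding every
    cognate set also attested in a third language C."""
    return cognate_overlap(sets_a - sets_c, sets_b - sets_c)

# Hypothetical cognate-set identifiers per language.
lang_a = {1, 2, 3, 4}
lang_b = {3, 4, 5}
lang_c = {3, 4, 6}

raw = cognate_overlap(lang_a, lang_b)                  # A and B share sets 3 and 4
residual = conditional_overlap(lang_a, lang_b, lang_c)  # nothing is left without C
```

In this toy case all material shared by A and B is also found in C, so C accounts for their overlap and no direct link between A and B would be needed.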

    A Global Lexical Dataset (GLED) with cognate annotation and phonological alignments

    This repository comprises a dataset developed from a subset of ASJP, in which all lemmas are presented in a broad phonological transcription, automatically annotated for cognacy, and phonologically aligned. Per-family NEXUS files with binary annotation of presence/absence of cognate sets are also available. The dataset is intended to facilitate prototyping studies and methods in quantitative historical linguistics.
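
A binary presence/absence coding of cognate sets can be derived from cognate-annotated word lists roughly as follows. This is a minimal sketch under assumptions: the `(language, concept, cognate_class)` record layout and the function name `cognate_presence_matrix` are hypothetical and do not reflect the dataset's actual file format. Concepts with no data for a language are coded as `?`, a common convention for missing data.

```python
from collections import defaultdict

def cognate_presence_matrix(records):
    """Turn a list of (language, concept, cognate_class) tuples into
    one binary character string per language."""
    languages = sorted({lang for lang, _, _ in records})
    # Each (concept, cognate_class) pair becomes one binary character.
    characters = sorted({(concept, cog) for _, concept, cog in records})
    attested = defaultdict(set)  # language -> characters it has
    covered = defaultdict(set)   # language -> concepts with any data
    for lang, concept, cog in records:
        attested[lang].add((concept, cog))
        covered[lang].add(concept)
    matrix = {}
    for lang in languages:
        row = []
        for concept, cog in characters:
            if concept not in covered[lang]:
                row.append("?")  # no word recorded for this concept
            elif (concept, cog) in attested[lang]:
                row.append("1")
            else:
                row.append("0")
        matrix[lang] = "".join(row)
    return characters, matrix

# Hypothetical toy data: three languages, two concepts.
records = [
    ("A", "hand", 1), ("B", "hand", 1), ("C", "hand", 2),
    ("A", "eye", 3), ("B", "eye", 4),
]
characters, matrix = cognate_presence_matrix(records)
```

The resulting strings are the rows that would go into the matrix block of a NEXUS file, one per language.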

    The evolution of similarity avoidance: a phylogenetic approach to phonotactic change

    The cross-linguistic under-representation of adjacent consonants sharing a place of articulation within uninflected lexical items is well documented. At the same time, little is known regarding the specific diachronic mechanisms involved in the emergence and maintenance of this pattern. Phylogenetic analyses provide some support for the idea that adjacent identical consonants within words arise infrequently, but stronger support for the idea that words containing such a pattern die out more frequently than those without. I highlight the value of the tools used in this paper for exploring the evolution of sound patterns, and also discuss some limitations of the implementation that remain to be improved upon.

    A computer-assisted approach to the comparison of mainland Southeast Asian languages

    This cumulative thesis comprises three separate projects built on a computer-assisted language comparison (CALC) framework. Together they address common obstacles to studying the history of Mainland Southeast Asian (MSEA) languages, such as sparse and non-standardized lexical data and inadequate methods for cognate judgments, and they provide caveats to scholars who will use Bayesian phylogenetic analysis. The first project provides a format that standardizes the sound inventories, regulates language labels, and clarifies lexical items. This standardized format allows us to merge various forms of raw data. The format also summarizes information to assist linguists in researching the relatedness among words and inferring relationships among languages. The second project focuses on increasing the transparency of lexical data and cognate judgments with regard to compound words. The method enables the annotation of each part of a word with semantic meanings and syntactic features. In addition, four different conversion methods were developed to convert morpheme cognates into word cognates for input into the Bayesian phylogenetic analysis. The third project applies the methods used in the first project to create a workflow for merging linguistic data sets and inferring a language tree using a Bayesian phylogenetic algorithm. Furthermore, the project addresses the importance of integrating cross-disciplinary studies into historical linguistic research. Finally, the methods we proposed for managing lexical data for MSEA languages are discussed and summarized from six perspectives. The work can be seen as a milestone in reconstructing human prehistory in an area that has high linguistic and cultural diversity.

    A comparative phylogenetic approach to Austronesian cultural evolution

    Macro- and Microevolution of Languages: Exploring Linguistic Divergence with Approaches from Evolutionary Biology

    There are more than 7000 languages in the world, and many of these have emerged through linguistic divergence. While questions related to the drivers of linguistic diversity have been studied before, including studies with quantitative methods, there is no consensus as to which factors drive linguistic divergence, and how. In the thesis, I have studied linguistic divergence with a multidisciplinary approach, applying the framework and quantitative methods of evolutionary biology to language data. With quantitative methods, large datasets may be analyzed objectively, while approaches from evolutionary biology make it possible to revisit old questions (related to, for example, the shape of the phylogeny) with new methods, and adopt novel perspectives to pose novel questions. My chief focus was on the effects exerted on the speakers of a language by environmental and cultural factors. My approach was thus an ecological one, in the sense that I was interested in how the local environment affects humans and whether this human-environment connection plays a possible role in the divergence process. I studied this question in relation to the Uralic language family and to the dialects of Finnish, thus covering two different levels of divergence. However, as the Uralic languages have not previously been studied using quantitative phylogenetic methods, nor have population genetic methods been previously applied to any dialect data, I first evaluated the applicability of these biological methods to language data. I found the biological methodology to be applicable to language data, as my results were rather similar to traditional views as to both the shape of the Uralic phylogeny and the division of Finnish dialects. I also found environmental conditions, or changes in them, to be plausible inducers of linguistic divergence: whether in the first steps in the divergence process, i.e. dialect divergence, or on a large scale with the entire language family. 
My findings concerning Finnish dialects led me to conclude that the functional connection between linguistic divergence and environmental conditions may arise through human cultural adaptation to varying environmental conditions. This is also one possible explanation on the scale of the Uralic language family as a whole. The results of the thesis bring insights on several different issues in both a local and a global context. First, they shed light on the emergence of the Finnish dialects. If the approach used in the thesis is applied to the dialects of other languages, broader generalizations may be drawn as to the inducers of linguistic divergence. This again brings us closer to understanding the global patterns of linguistic diversity. Secondly, the quantitative phylogeny of the Uralic languages, with estimated times of language divergences, yields another hypothesis as to the shape and age of the language family tree. In addition, the Uralic languages can now be added to the growing list of language families studied with quantitative methods. This will allow broader inferences as to global patterns of language evolution, and more language families can be included in constructing the tree of the world’s languages. Studying history through language, however, is only one way to illuminate the human past. Therefore, thirdly, the findings of the thesis, when combined with studies of other language families, and those for example in genetics and archaeology, bring us again closer to an understanding of human history.
Many of the world's more than 7000 languages have arisen through a process of divergence, in which a single language is shaped over time, under the influence of various factors, into several languages. The factors affecting language divergence have been studied before, also with computational methods. It is still unclear, however, which factors can affect language divergence, and how. In my doctoral thesis I study the factors that affect language divergence. My approach is multidisciplinary, as I apply the computational methods and theories of evolutionary biology to language data. Computational methods enable the objective analysis of large datasets, while the evolutionary biological approach lets me formulate new kinds of research questions and use new methods to answer previously posed questions (for example, concerning the shape of the family tree). In my research I focused on language divergence from the viewpoint of human ecology. In other words, I was interested in the effect of environmental and/or cultural factors on speakers, and in whether this link may be involved in the divergence process. I studied the question at two stages of this process: at its beginning, before divergence has fully taken place, and after it has taken place. Dialect differentiation corresponds to the initial stage of the process, and I studied it using Finnish dialect data. I studied completed divergences from family trees that I built from lexical data of the Uralic languages. Because the Uralic languages have not previously been studied with comparable computational methods, and the population genetic methods I used have not previously been applied to any dialect data, I first tested the suitability of these methods for analysing my data. I found the biological methods to be suitable for analysing language data, as my results corresponded to traditional views on both the shape of the Uralic family tree and the division of Finnish dialects. In addition, I found that differences in environmental conditions possibly affect language divergence. This was observable both in the early stages of the divergence process among dialects and when examining divergences across the whole language group.
Because humans are known often to adapt to prevailing environmental conditions through cultural adaptations, I concluded from the results of my dialect studies that it is the cultural adaptation of speakers to local environmental conditions that might act as a factor separating speaker populations, and thus as the link between environmental differences and linguistic divergence. This could possibly also explain divergences among the Uralic languages. The results of my doctoral research bring new perspectives on language divergence at both the local and the global level. My finding of the possible effect of environmental differences on the formation of Finnish dialects raises the question of whether the finding generalizes to other languages and their dialects. Since dialect divergence is the first stage in the process of language differentiation, studying dialect formation also makes it possible to find out which factors have brought about the worldwide spectrum of languages. For this reason, similar studies are needed on the dialects of other languages. In my thesis I also present a computationally built family tree of the Uralic languages, which can be compared with trees of other language groups built with corresponding methods. Through this comparison it is possible to find out whether there are any worldwide regularities in the shape of language family trees, from which further conclusions can be drawn about the regularities governing languages. Working out the history and prehistory of humankind is a challenging puzzle, in which fitting together pieces from different disciplines can bring us closer to a general understanding of the past. My doctoral research is a small part of this whole, but by combining my findings with findings from other language groups as well as, for example, the results of archaeology and genetics, we are again one step closer to this goal.
Transferred from Doria.