
    Computational Evolutionary Linguistics

    Languages and species both evolve by a process of repeated divergences, which can be described by the branching of a phylogenetic tree, or phylogeny. Taking advantage of this fact, it is possible to study language change using computational tree-building techniques developed for evolutionary biology. Mathematical approaches to the construction of phylogenies fall into two major categories: character-based and distance-based methods. Character-based methods were used in prior work by researchers at the University of Pennsylvania applying phylogenetic methods to the Indo-European language family. A discussion of the limitations of character-based models leads to a similar presentation of distance-based models. We present an adaptation of these methods to linguistic data, together with the phylogenies generated by applying them to several modern Germanic languages and Spanish. We conclude that distance-based methods for building phylogenies are useful for historical linguistic reconstruction, and that it would be useful to extend existing tree-drawing methods to better model the evolutionary effects of language contact.
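
    The abstract does not say which distance-based algorithm was used. As a minimal sketch of the general idea, the Python code below implements UPGMA, one standard distance-based method, on a small distance matrix. The function name is our own, the language names serve only as labels, and the distances are hypothetical toy values, not results from the study.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a rooted tree by UPGMA (average-linkage) clustering.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) to the distance between taxa a and b.
    Returns the tree as nested tuples of labels (branch lengths are omitted).
    """
    # Each active cluster: key -> (subtree, number of leaves it contains).
    clusters = {name: (name, 1) for name in names}
    d = dict(dist)
    while len(clusters) > 1:
        # Merge the closest pair of active clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        merged = f"({a},{b})"
        # Distance to the merged cluster = size-weighted mean of the old distances.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = ((tree_a, tree_b), size_a + size_b)
    tree, _ = next(iter(clusters.values()))
    return tree


# Hypothetical toy distances (not data from the study).
toy = {
    frozenset(("English", "German")): 0.40,
    frozenset(("English", "Dutch")): 0.35,
    frozenset(("German", "Dutch")): 0.20,
    frozenset(("English", "Spanish")): 0.80,
    frozenset(("German", "Spanish")): 0.85,
    frozenset(("Dutch", "Spanish")): 0.82,
}
tree = upgma(["English", "German", "Dutch", "Spanish"], toy)
print(tree)  # ('Spanish', ('English', ('German', 'Dutch')))
```

    UPGMA assumes a roughly constant rate of change across lineages; neighbor joining is a common distance-based alternative when that assumption is doubtful.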

    Using ancestral state reconstruction methods for onomasiological reconstruction in multilingual word lists

    Current efforts in computational historical linguistics are predominantly concerned with phylogenetic inference, while methods for ancestral state reconstruction have only been applied sporadically. Unlike phylogenetic algorithms, automatic reconstruction methods presuppose phylogenetic information in order to explain what evolved when and where. Here we report a pilot study exploring how well automatic methods for ancestral state reconstruction perform in onomasiological reconstruction in multilingual word lists: the algorithms infer how words evolved along a given phylogeny and reconstruct which cognate classes were used to express a given meaning in the ancestral languages. We compare three methods, Maximum Parsimony, Minimal Lateral Networks, and Maximum Likelihood, on three test sets (Indo-European, Austronesian, Chinese), using binary and multi-state coding of the data as well as single and sampled phylogenies, and find that Maximum Likelihood largely outperforms the other methods. At the same time, overall performance was disappointingly low, with F-scores ranging from 0.66 (Chinese) to 0.79 (Austronesian). A closer linguistic evaluation of the reconstructions proposed by the best method and those given in the gold standards revealed that most of the cases where the algorithms failed can be attributed to independent semantic shift (homoplasy), to morphological processes in lexical change, and to wrong reconstructions in the independently created test sets that we employed.
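
    To illustrate the first of the three methods compared, the sketch below implements the bottom-up pass of Fitch's maximum-parsimony algorithm on a fixed binary tree, given one cognate-class label per language for a single meaning, plus an ordinary set-based F-score. The taxa L1 to L5, the class labels x, y, z and the function names are hypothetical; the study's own implementation and exact scoring procedure may differ.

```python
def fitch(tree, leaf_states):
    """Bottom-up pass of Fitch's maximum-parsimony algorithm.

    tree: a leaf label (str) or a pair (left_subtree, right_subtree).
    leaf_states: dict mapping each leaf label to its observed state
                 (here, a cognate-class label for one meaning).
    Returns (Fitch state set of this node, minimum number of changes in the
    subtree). At the root the set holds exactly the ancestral states that allow
    a most parsimonious history; inner nodes would need a top-down pass (omitted).
    """
    if isinstance(tree, str):
        return frozenset({leaf_states[tree]}), 0
    left, right = tree
    s_left, c_left = fitch(left, leaf_states)
    s_right, c_right = fitch(right, leaf_states)
    common = s_left & s_right
    if common:  # children can agree: no extra change needed
        return common, c_left + c_right
    return s_left | s_right, c_left + c_right + 1  # one change is unavoidable


def f_score(predicted, gold):
    """Harmonic mean of precision and recall between two sets of labels."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical tree and cognate classes for a single meaning.
tree = ((("L1", "L2"), "L3"), ("L4", "L5"))
states = {"L1": "x", "L2": "x", "L3": "y", "L4": "y", "L5": "z"}
root_states, changes = fitch(tree, states)
print(sorted(root_states), changes)  # ['y'] 2
print(f_score({"x", "y"}, {"y"}))
```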

    Reconstructing the evolution of Indo-European grammar

    This study uses phylogenetic methods adopted from computational biology to reconstruct features of Proto-Indo-European morphosyntax. We estimate the probability that typological features were present in Proto-Indo-European, assuming that these features change according to a stochastic process governed by evolutionary transition rates between them. We compare these probabilities with previous reconstructions of Proto-Indo-European morphosyntax, which use either the comparative-historical method or implicational typology. We find that our reconstruction yields strong support for a canonical model of the protolanguage (synthetic, nominative-accusative, head-final) and low support for any alternative model. Observing the evolutionary dynamics of features in our data set, we conclude that morphological features have slower rates of change, whereas syntactic traits change faster. Additionally, more frequent, unmarked traits in grammatical hierarchies have slower rates of change than less frequent, marked ones, which indicates that universal patterns of economy and frequency affect language change within the family.
    Keywords: Indo-European linguistics, historical linguistics, phylogenetic linguistics, typology, syntactic reconstruction
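
    A common way to formalise a binary feature changing "according to a stochastic process governed by evolutionary transition rates" is a two-state continuous-time Markov chain, combined with Felsenstein's pruning algorithm to obtain the probability that the feature was present at the root. The sketch below assumes exactly that model, with a hypothetical gain rate alpha, loss rate beta, known branch lengths, and the stationary frequencies as root prior; the study's actual model, rates and tree are not given here, and the example tree and function names are our own.

```python
import math


def transition_matrix(alpha, beta, t):
    """Transition probabilities of a two-state continuous-time Markov chain.

    State 0 = feature absent, 1 = feature present; alpha is the gain rate
    (0 -> 1) and beta the loss rate (1 -> 0). Over a branch of length t:
    P_01(t) = alpha / (alpha + beta) * (1 - exp(-(alpha + beta) * t)),
    P_10(t) = beta / (alpha + beta) * (1 - exp(-(alpha + beta) * t)).
    """
    total = alpha + beta
    decay = math.exp(-total * t)
    p01 = alpha / total * (1 - decay)
    p10 = beta / total * (1 - decay)
    return [[1 - p01, p01], [p10, 1 - p10]]


def partial_likelihoods(node, alpha, beta):
    """Felsenstein pruning: [P(tips below | node=0), P(tips below | node=1)].

    A tip is an int (its observed state, 0 or 1); an inner node is a list of
    (child, branch_length) pairs.
    """
    if isinstance(node, int):
        return [1.0 if state == node else 0.0 for state in (0, 1)]
    result = [1.0, 1.0]
    for child, t in node:
        child_like = partial_likelihoods(child, alpha, beta)
        p = transition_matrix(alpha, beta, t)
        for s in (0, 1):
            # Sum over the unknown state of the child at the end of the branch.
            result[s] *= sum(p[s][c] * child_like[c] for c in (0, 1))
    return result


def root_presence_probability(tree, alpha, beta):
    """Posterior probability that the feature is present at the root,
    using the chain's stationary frequencies as the root prior."""
    pi1 = alpha / (alpha + beta)
    like = partial_likelihoods(tree, alpha, beta)
    present = pi1 * like[1]
    return present / ((1 - pi1) * like[0] + present)


# Hypothetical tree: two sister tips with the feature, one outgroup tip without it.
tree = [([(1, 0.5), (1, 0.5)], 0.3), (0, 1.0)]
print(root_presence_probability(tree, alpha=0.4, beta=0.6))
```

    Fitting alpha and beta to data (for example by maximum likelihood or in a Bayesian framework) is a separate step that this sketch leaves out.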

    Kielen muutos evolutiivisena prosessina (Language change as an evolutionary process)

    This thesis discusses how concepts and methodology developed in evolutionary biology can be applied to explaining and researching language change. It explores the parallels between the mechanisms of biological evolution and language change, along with the history of the exchange of ideas between the two disciplines. Against this background, it considers the applicability of computational methods developed in evolutionary biology to the study of historical relationships between languages. The different phylogenetic methods are explained in accessible terms, avoiding the technical language of statistics. The thesis is, on the one hand, a synthesis of earlier scientific discussion and, on the other, an attempt to map out the problems of earlier approaches and to find new directions for the study of language change on that basis. The source material consists primarily of literature on the connections between evolutionary biology and language change, along with research articles applying phylogenetic methods to language change. The thesis begins by describing the early development of evolutionary biology and historical linguistics, a process that from the very beginning involved an exchange of ideas about the mechanisms of language change and biological evolution. This historical discussion lays the foundation for examining the generalised account of selection developed over recent decades. This account aims to create a theoretical framework capable of explaining both biological evolution and cultural change as selection processes acting on self-replicating entities. The thesis focuses on the capacity of the generalised account of selection to describe language change as a process of this kind. In biology, the mechanisms of evolution are seen to form populations of genetically related organisms over time. 
One of the central questions explored in this thesis is whether selection theory makes it possible to picture languages as forming populations of a similar kind, and what such a perspective can offer to the understanding of language in general. In historical linguistics, the comparative method and other, complementary methods have traditionally been used to study the development of languages from a common ancestral language. Computational, quantitative methods have not become widely used as part of the core methodology of historical linguistics. After the fading of the limited popularity the lexicostatistical method had enjoyed since the 1950s, only in recent years have the computational methods of phylogenetic inference used in evolutionary biology also been applied to the study of early language history. This thesis compares the possibilities offered by the traditional methodology of historical linguistics and by the new phylogenetic methods. The methods are approached through the ways in which they have been applied to the Indo-European languages, the language family most thoroughly investigated with both traditional and phylogenetic methods. The thesis explores the problems of these applications as well as the optimal form of the linguistic data used in these methods. The mechanisms of biological evolution are seen as parallel to the mechanisms of language change only in a limited sense, but sufficiently so that developing a generalised account of selection is deemed potentially fruitful for understanding language change. 
These similarities are also seen to support the validity of using phylogenetic methods in the study of language history, although the use of linguistic data and the models of language change employed by these methods are seen to await further development.

    Universal typological dependencies should be detectable in the history of language families

    1. Introduction
    We claim that making sense of the typological diversity of languages demands a historical/evolutionary approach. We are pleased that the target paper (Dunn et al. 2011a) has brought discussion of this claim into prominence, and are grateful that leading typologists have taken the time to respond (commentaries denoted by boldface). Unfortunately, though, a number of the commentaries in this issue of LT show significant misunderstandings of our paper. Donohue thinks we set out to show the stability of typological features, but that was not our target at all (although related methods can be used to do that: see, e.g., Greenhill et al. 2010a, Dediu 2011a). Plank seems to think we were arguing against universals of any type, but our target was in fact just the implicational universals of word order that have been the bread and butter of typology. He also seems to think we ignore diachrony, whereas the method in fact introduces diachrony centrally into typological reasoning, thereby potentially revolutionising typology (see Cysouw’s commentary). Levy & Daumé think we were testing for lineage-specificity, whereas that was in fact an outcome (the main finding) of our testing for correlated evolution. Dryer thinks we must account for the distribution of language types around the world, but that was not our aim: our aim was to test the causal connection between linguistic variables by taking the perspective of language evolution (diversification and change). Longobardi & Roberts seem to think we set out to extract family trees from syntactic features, but our goal was in fact to use trees based on lexical cognates and hang reconstructed syntactic states on each node of these trees, thereby reconstructing the processes of language change.

    A quantitative approach to the study of syntactic evolution

    The dissertation covers experiments with quantitative algorithmic procedures for the study of language evolution. In particular, the inquiry applies quantitative methods originally designed within molecular biology and population genetics to a parametric comparative dataset: the goal is to infer hypotheses regarding genealogical relationships among a specific set of languages, accounting for the role of areal convergence in linguistic variation, and to evaluate them in light of the traditional accounts provided by historical linguistics. The first focus is on the comparison between language evolution and biological evolution. The idea is that some important features of language development may also be identified by drawing a parallel with the biological domain. On the whole, this analysis seems to show that language evolution and biological evolution differ considerably in some respects, but that the dissimilarities do not prevent the application of quantitative reconstruction procedures. Next, the most recent generative views on syntactic change are considered, showing that they are fully compatible with the evolutionary account outlined. To this end, basic notions regarding the cognitive-biolinguistic and formal aspects of generative grammar are illustrated and, once the parametric perspective on synchronic language variation is clarified, the discussion turns to extending the parametric approach to the explanation of diachronic phenomena, including genealogical development and contact. The next step is the presentation of diverse methods of comparison adopted in historical linguistics and population genetics, in particular the “Parametric comparison method”: the parallel between the latter and the investigative procedures used in molecular biology paves the way for introducing the relevant quantitative techniques of phylogenetic reconstruction. 
After an overview of the principal datasets used so far in quantitative investigations of language history, the parametric dataset is presented, along with an overview of “traditional” and quantitative proposals regarding the genealogical classification of the languages included in the investigation. The last section of the work illustrates the quantitative analyses carried out. A preliminary character-based and distance-based review of the dataset is followed by a discussion of the choice of phylogenetic methods. Then the first set of phylogenetic reconstructions on the full dataset is presented and discussed in detail. The next focus is on possible strategies to account for homoplasy (i.e. similarity due to chance or borrowing): an empirically based selection of parameters is proposed, along with suggestions on how parameters might be weighted according to their genealogical relevance. Finally, some tentative analyses are introduced concerning the possibility of detecting and accounting for borrowing in phylogenetic trees, the reconstruction of ancestral states, and the mapping of syntactic distances onto the diachronic and diatopic dimensions of variation. On the whole, the quantitative analyses appear to provide good indications of several facts: that phylogenetic techniques are to a large extent effectively applicable to the study of syntactic evolution, that parametric comparison may help shed light on both short- and long-range genealogical relationships, and that traces of genuine genealogical relatedness are likely to be preserved (and to be recoverable despite homoplasy) at the level of “macro-comparison”, like that instantiated in the parametric data.
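
    The abstract does not spell out how distances are computed from the parametric data. A minimal sketch, assuming each language is coded as a string of parameter values "+", "-" or "0" (with "0" marking a parameter that is neutralized or not set) and that the distance is the proportion of differences among parameters set in both languages, is shown below. This identities-versus-differences ratio is an assumption in the spirit of the Parametric comparison method, not necessarily the dissertation's exact formula; the languages L1 to L3 and the function names are hypothetical.

```python
from itertools import combinations


def parametric_distance(p, q):
    """Share of differing values among parameters set (+ or -) in both languages."""
    if len(p) != len(q):
        raise ValueError("parameter vectors must have equal length")
    identities = differences = 0
    for a, b in zip(p, q):
        if a == "0" or b == "0":
            continue  # neutralized/unset in at least one language: not comparable
        if a == b:
            identities += 1
        else:
            differences += 1
    compared = identities + differences
    if compared == 0:
        raise ValueError("no parameter is set in both languages")
    return differences / compared


def distance_matrix(langs):
    """All pairwise distances for a dict {language: parameter string}."""
    return {
        (a, b): parametric_distance(langs[a], langs[b])
        for a, b in combinations(sorted(langs), 2)
    }


# Hypothetical parameter settings for three languages.
langs = {"L1": "++-0+", "L2": "+--0+", "L3": "-0-++"}
for pair, dist in distance_matrix(langs).items():
    print(pair, round(dist, 3))
```

    A matrix like this can then be passed to any distance-based tree-building method; skipping unset parameters keeps languages with many neutralized parameters from looking artificially similar.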