230 research outputs found

    Word order evolves at similar rates in main and subordinate clauses

    In syntactic change, it remains an open issue whether word orders are more conservative or innovative in subordinate clauses compared with main clauses. Using 47 dependency-annotated corpora and Bayesian phylogenetic inference, we explore the evolution of S/V, V/O, and S/O orders across main and subordinate clauses in Indo-European. Our results reveal similar rates of change across clause types, with no evidence for any inherent conservatism of subordinate or main clauses. Our models also support evolutionary biases towards SV, VO, and SO orders, consistent with theories of dependency length minimization that favor verb-medial orders and with theories of a subject preference that favor SO orders. Finally, our results show that while the word order in the proto-language cannot be estimated with any reasonable degree of certainty, the early history of the family was dominated by a moderate preference for SVO orders, with substantial uncertainty between VO and OV orders in both main and subordinate clauses.

    Molecular phylogenetics, evolution of sexual systems and historical biogeography of Darwin's favourite orchids (Catasetinae) and Swan orchids (Cycnoches Lindl.)

    The Orchidaceae are one of the most species-rich and widespread lineages among angiosperms. They have evolved numerous remarkable vegetative and reproductive traits that have allowed them to adapt and diversify into a wide array of environments. More importantly, they have developed several intricate symbiotic relationships with different kinds of organisms (e.g. animals, fungi) that for centuries have attracted the attention of botanists, biologists, amateurs and naturalists. Nevertheless, despite the extensive research done so far on orchid biology and phylogenetics, very little is known about the biotic and environmental variables, or about the evolution of several key traits, that seem to be linked with the successful diversification of this lineage. This dissertation focuses on three puzzling aspects of plant evolutionary biology: the phylogenetic incongruence between nuclear and plastid genomes, the evolution of sexual systems, and lineage migration and isolation through time. To address these topics, I chose as a study group the subtribe Catasetinae, an orchid lineage including ca. 350 species restricted to the Neotropical region. They show a remarkable set of sexual systems, namely protandry and Environmental Sex Determination (ESD), that had never been studied in a phylogenetic context. My dissertation also includes a minor part on taxonomic and floristic work devoted to other representative orchid lineages of the Neotropical flora (i.e. Epidendrum and Lepanthes). Based on plant material collected during field trips, my taxonomic research resulted in the description of several new species and new chorological reports contributing to the Colombian and Costa Rican floras. 
Using a set of nuclear and chloroplast loci obtained from material cultivated at the Botanic Garden Munich and collected during field work in several Latin American countries, I produced a well-supported and so far the most representatively sampled phylogeny of Catasetinae. While gathering plant material, I encountered several complications, such as extreme scarcity of individuals and lengthy bureaucratic processes to obtain collection and research permits, that ultimately limited my taxon sampling. By studying in detail the internal phylogenetic relationships of Catasetinae independently derived from nuclear and plastid loci, I came across several well-supported conflicting phylogenetic positions. Most of the traditional phylogenetic methods developed to address these conflicts aim at the inference of a species tree only. In chapter 5, I explored the utility of co-phylogenetic tools (i.e. PACo and ParaFit) to quantify the conflicts between nuclear and plastid genomes. These tools have been largely employed in host-parasite/endosymbiont studies, hence they have the power to assess the contribution of single Operational Terminal Units (OTUs) to the phylogenetic pattern observed. Using the Catasetinae chloroplast and nuclear datasets and extensive simulation approaches, I demonstrate that PACo successfully detects conflicting OTUs and that its overall performance is better than that of ParaFit. In addition, my research provided strong evidence that the type of input data (i.e. phylograms or cladograms) biases distance-based co-phylogenetic methods. A pipeline was designed to execute the PACo and ParaFit tools in the software R to detect conflicting sequences in either small or big datasets. After inferring a strongly supported phylogeny, and by carrying out in-situ and ex-situ observations plus searches of specialized literature on reproductive biology, I investigated the evolution of sexual systems of Catasetinae. 
I relied on Ancestral State Reconstruction (ASR) approaches and Bayesian statistical frameworks (chapter 6). ASR revealed three independent gains of ESD, once each in the Last Common Ancestor (LCA) of Catasetum, of Cycnoches, and of part of Mormodes, in every case derived from a protandrous ancestor. In contrast, protandry appears to have evolved only once, at the LCA of Catasetum, Clowesia, Cycnoches, Dressleria and Mormodes. The last chapter of this dissertation deals with the impact of the Andean uplift, the most important orographic event in South America, on the evolution of epiphytic lowland Neotropical lineages. As a study group I used Cycnoches (a member of the Catasetinae), which includes ca. 34 species and is distributed in Neotropical lowland wet forests. To this end, I produced the most completely sampled phylogeny of Cycnoches, and relied on Bayesian dating and Ancestral Area Estimation (AAE) approaches. The LCA of Cycnoches lived ca. 6 million years ago (MYA) in the Amazonian region. From this area, it expanded towards Central America and Choco in multiple migrations well after the main Andean mountain building episodes. In addition, stochastic character mapping showed that within-region speciation (i.e. speciation in sympatric lineages) was a key process linked to diversification and range distribution evolution in Cycnoches.

    Formal methods applied to the analysis of phylogenies: Phylogenetic model checking

    Phylogenetic trees are useful abstractions for modeling and characterizing the evolution of a set of species or populations over time. Proposing, verifying, and generalizing hypotheses about an inferred phylogenetic tree play an important role in the study and understanding of evolutionary relationships. Currently, one of the main scientific goals is to extract or discover the biological messages implicit in the phylogeny and its underlying structural properties. For example, integrating genetic information into a phylogeny helps to discover genes conserved across all or part of the tree, to identify covarying positions in the DNA, or to estimate divergence dates between species. Consequently, trees help in understanding the mechanism that governs evolutionary drift. Today, the wide spectrum of heterogeneous methods and tools for analyzing phylogenies obscures and hinders their use, as does the tight coupling between the specification of properties and the algorithms used to evaluate them (mainly ad hoc scripts). This problem is the starting point of this thesis, which analyzes as a solution the possibility of introducing a formal hypothesis-verification framework that, automatically and modularly, checks the truth of such properties, defined in a generic and independent language (in an associated formal logic), on any of the many software tools prepared for the task. The main contribution of the thesis is the proposal of a formal framework for describing, verifying, and manipulating causal relationships between species independently of the code used to evaluate them. 
To this end, we explore the characteristics of model checking techniques, a paradigm in which a specification expressed in temporal logic is verified against a model of the system that represents an implementation at a certain level of detail. Having emerged from computer science, model checking has been applied successfully in industry for system modeling and verification. The specific contributions of the thesis are: A) The identification and interpretation of phylogenetic trees as models of evolution, adapted to the setting of model checking techniques. B) The definition of a temporal logic that captures the usual phylogenetic properties, together with a method for constructing properties. C) The classification of phylogenetic properties, identifying categories of properties according to whether they focus on the structure of the tree, on the sequences, or are hybrid. D) The extension of the logics and models to cover quantitative properties of time, probability, and distance. E) The development of a framework for verifying Boolean, quantitative, and parametric properties. F) The establishment of principles for the symbolic manipulation of phylogenetic objects, e.g., clades. G) The exploitation of existing model checking tools, detecting their problems and shortcomings in the field of phylogeny and proposing improvements. H) The development of ad hoc techniques to reduce complexity on two fronts: distributing computations and data, and using information systems. Points A-F focus on the conceptual contributions of our approach, while points G-H emphasize tools and implementation. The contents of the thesis have been validated by the scientific community through the following publications in international conferences and journals. 
The introduction of model checking as a formal framework for analyzing biological properties (points A-C) led to the publication of our first conference paper [1]. In [2], we develop the verification of phylogenetic hypotheses on an example tree built from the relationships imposed by a set of proteins encoded by human mitochondrial DNA (mtDNA). In that example, we use an automatic, generic model checking tool (point G). The journal article [7] summarizes the basics of the earlier conference papers and extends the application of temporal logics to phylogenetic properties not considered until now. The articles cited here cover the contents presented in Parts I-II of the thesis. The enormous size of the trees and the considerable amount of information associated with the states (e.g., the DNA string) force the introduction of special adaptations in model checking tools to maintain reasonable performance when verifying properties and to alleviate the state explosion problem (points G-H). The conference paper [3] presents the advantages of slicing the DNA associated with the states, partitioning the phylogeny into small subtrees, and distributing them across several machines. In addition, the original idea of sliced model checking is complemented by an external database for storing sequences. The journal article [4] brings together the notions introduced in [3] with the implementation and preliminary results presented in [5]. This topic corresponds to Part III of the thesis. Finally, the thesis reuses extensions of temporal logics with explicit time and probabilities in order to manipulate and query the tree for quantitative information. 
The conference paper [6] illustrates the need to introduce probabilities and discrete time for the phylogenetic analysis of a real phenotype, in this case the distribution ratio of lactose intolerance among various populations located at the leaves of the phylogeny. This corresponds to Chapter 13, which falls within Parts IV-V. Parts IV-V extend the concepts presented in that conference paper to other application domains, such as tree scoring, and to continuous time (points E-F). The introduction of parameters into phylogenetic hypotheses is proposed as future work. References [1] Roberto Blanco, Gregorio de Miguel Casado, José Ignacio Requeno, and José Manuel Colom. Temporal logics for phylogenetic analysis via model checking. In Proceedings IEEE International Workshop on Mining and Management of Biological and Health Data, pages 152-157. IEEE, 2010. [2] José Ignacio Requeno, Roberto Blanco, Gregorio de Miguel Casado, and José Manuel Colom. Phylogenetic analysis using an SMV tool. In Miguel P. Rocha, Juan M. Corchado Rodríguez, Florentino Fdez-Riverola, and Alfonso Valencia, editors, Proceedings 5th International Conference on Practical Applications of Computational Biology and Bioinformatics, volume 93 of Advances in Intelligent and Soft Computing, pages 167-174. Springer, Berlin, 2011. [3] José Ignacio Requeno, Roberto Blanco, Gregorio de Miguel Casado, and José Manuel Colom. Sliced model checking for phylogenetic analysis. In Miguel P. Rocha, Nicholas Luscombe, Florentino Fdez-Riverola, and Juan M. Corchado Rodríguez, editors, Proceedings 6th International Conference on Practical Applications of Computational Biology and Bioinformatics, volume 154 of Advances in Intelligent and Soft Computing, pages 95-103. Springer, Berlin, 2012. [4] José Ignacio Requeno and José Manuel Colom. Model checking software for phylogenetic trees using distribution and database methods. 
Journal of Integrative Bioinformatics, 10(3):229-233, 2013. [5] José Ignacio Requeno and José Manuel Colom. Speeding up phylogenetic model checking. In Mohd Saberi Mohamad, Loris Nanni, Miguel P. Rocha, and Florentino Fdez-Riverola, editors, Proceedings 7th International Conference on Practical Applications of Computational Biology and Bioinformatics, volume 222 of Advances in Intelligent Systems and Computing, pages 119-126. Springer, Berlin, 2013. [6] José Ignacio Requeno and José Manuel Colom. Timed and probabilistic model checking over phylogenetic trees. In Miguel P. Rocha et al., editors, Proceedings 8th International Conference on Practical Applications of Computational Biology and Bioinformatics, Advances in Intelligent and Soft Computing. Springer, Berlin, 2014. [7] José Ignacio Requeno, Gregorio de Miguel Casado, Roberto Blanco, and José Manuel Colom. Temporal logics for phylogenetic analysis via model checking. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 10(4):1058-1070, 2013.
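
The idea of checking a temporal-logic property against a phylogenetic tree can be illustrated with a minimal Python sketch. It treats each tree node as a state labeled with a sequence and evaluates two branching-time operators over the subtree below a node. The `Node` class, the `EF` and `AG` helpers, the `has_base` predicate, and the toy sequences are hypothetical illustrations; they are not taken from the thesis or from any of the tools it uses.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """A state of the model: one tree node labeled with a sequence (toy data)."""
    name: str
    sequence: str
    children: List["Node"] = field(default_factory=list)


Prop = Callable[[Node], bool]


def EF(p: Prop, node: Node) -> bool:
    """'Exists Finally': p holds at this node or at some descendant."""
    return p(node) or any(EF(p, child) for child in node.children)


def AG(p: Prop, node: Node) -> bool:
    """'Always Globally': p holds at this node and at every descendant."""
    return p(node) and all(AG(p, child) for child in node.children)


def has_base(position: int, base: str) -> Prop:
    """Atomic proposition: the sequence carries `base` at `position`."""
    return lambda node: node.sequence[position] == base


# Hypothetical four-leaf tree with made-up sequences.
root = Node("root", "ACGT", [
    Node("n1", "ACGA", [Node("sp1", "ACGA"), Node("sp2", "ACTA")]),
    Node("n2", "ACGT", [Node("sp3", "ACGT"), Node("sp4", "AGGT")]),
])

conserved_first = AG(has_base(0, "A"), root)   # position 0 is 'A' everywhere
some_final_a = EF(has_base(3, "A"), root)      # some lineage ends in 'A'
conserved_last = AG(has_base(3, "T"), root)    # fails in the n1 clade
```

Keeping the property (`has_base`) separate from the evaluation procedure (`EF`, `AG`) mirrors, in a very small way, the decoupling of specification and algorithm that the abstract argues for.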

    Darwinian perspectives on the evolution of human languages

    Human languages evolve by a process of descent with modification in which parent languages give rise to daughter languages over time and in a manner that mimics the evolution of biological species. Descent with modification is just one of many parallels between biological and linguistic evolution that, taken together, offer up a Darwinian perspective on how languages evolve. Combined with statistical methods borrowed from evolutionary biology, this Darwinian perspective has brought new opportunities to the study of the evolution of human languages. These include the statistical inference of phylogenetic trees of languages, the study of how linguistic traits evolve over thousands of years of language change, the reconstruction of ancestral or proto-languages, and the use of language change to date historical events.

    Reconstructing the evolution of Indo-European grammar

    This study uses phylogenetic methods adopted from computational biology in order to reconstruct features of Proto-Indo-European morphosyntax. We estimate the probability of the presence of typological features in Proto-Indo-European on the assumption that these features change according to a stochastic process governed by evolutionary transition rates between them. We compare these probabilities to previous reconstructions of Proto-Indo-European morphosyntax, which use either the comparative-historical method or implicational typology. We find that our reconstruction yields strong support for a canonical model (synthetic, nominative-accusative, head-final) of the protolanguage and low support for any alternative model. Observing the evolutionary dynamics of features in our data set, we conclude that morphological features have slower rates of change, whereas syntactic traits change faster. Additionally, more frequent, unmarked traits in grammatical hierarchies have slower change rates when compared to less frequent, marked ones, which indicates that universal patterns of economy and frequency impact language change within the family. Keywords: Indo-European linguistics, historical linguistics, phylogenetic linguistics, typology, syntactic reconstruction
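
The assumption that a feature changes according to a stochastic process governed by transition rates can be made concrete with a small sketch. The Python below assumes the simplest such process, a two-state continuous-time Markov chain (feature absent = 0, present = 1), and uses its standard closed-form transition probabilities. The function name and the gain and loss rates are hypothetical; the abstract does not state which model or rate values the study uses.

```python
import math
from typing import List


def transition_probs(gain: float, loss: float, t: float) -> List[List[float]]:
    """Transition matrix P(t) of a two-state continuous-time Markov chain.

    P[i][j] is the probability of being in state j after time t, given
    state i at time 0. `gain` is the 0 -> 1 rate, `loss` the 1 -> 0 rate.
    """
    total = gain + loss
    decay = math.exp(-total * t)   # how much of the starting state is "remembered"
    pi_present = gain / total      # long-run probability of state 1
    pi_absent = loss / total       # long-run probability of state 0
    return [
        [pi_absent + pi_present * decay, pi_present * (1.0 - decay)],
        [pi_absent * (1.0 - decay), pi_present + pi_absent * decay],
    ]


# Hypothetical rates: the feature is lost four times faster than it is gained.
p_short = transition_probs(gain=0.2, loss=0.8, t=0.5)
p_long = transition_probs(gain=0.2, loss=0.8, t=50.0)
```

Over a short branch the starting state dominates; over a long branch both rows approach the long-run probabilities, which is why reconstructions of a distant proto-language depend heavily on the estimated rates.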

    Phylogenetic signal in phonotactics

    Phylogenetic methods have broad potential in linguistics beyond tree inference. Here, we show how a phylogenetic approach opens the possibility of gaining historical insights from entirely new kinds of linguistic data – in this instance, statistical phonotactics. We extract phonotactic data from 112 Pama-Nyungan vocabularies and apply tests for phylogenetic signal, quantifying the degree to which the data reflect phylogenetic history. We test three datasets: (1) binary variables recording the presence or absence of biphones (two-segment sequences) in a lexicon, (2) frequencies of transitions between segments, and (3) frequencies of transitions between natural sound classes. Australian languages have been characterized as having a high degree of phonotactic homogeneity. Nevertheless, we detect phylogenetic signal in all datasets. Phylogenetic signal is greater in finer-grained frequency data than in binary data, and greatest in natural-class-based data. These results demonstrate the viability of employing a new source of readily extractable data in historical and comparative linguistics. 1. Introduction 1.1 Motivations 1.2 Phonotactics as a source of historical signal 2. Phylogenetic signal 3. Materials 3.1 Language sample 3.2 Wordlists 3.3 Reference phylogeny 4. Phylogenetic signal in binary phonotactic data 4.1 Results for binary phonotactic data 4.2 Robustness checks 5. Phylogenetic signal in continuous phonotactic data 5.1 Robustness checks 5.2 Forward transitions versus backward transitions 5.3 Normalization of character values 6. Phylogenetic signal in natural-class-based characters 6.1 Natural-class-based characters versus biphones 7. Discussion 7.1 Overall robustness 7.2 Limitations 8. Conclusion
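
The first two data types named in this abstract, binary biphone presence and frequencies of transitions between segments, can be sketched in a few lines of Python. The sketch assumes a wordlist given as lists of segments and computes forward transition frequencies as the share of each biphone among all biphones that start with the same segment. The function names, this particular normalization, and the toy lexicon are assumptions for illustration; the paper's actual extraction procedure may differ.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Biphone = Tuple[str, str]


def biphones(word: List[str]) -> List[Biphone]:
    """All adjacent two-segment sequences in one word."""
    return list(zip(word, word[1:]))


def biphone_presence(lexicon: List[List[str]]) -> Set[Biphone]:
    """Binary data: the set of biphones attested anywhere in the lexicon."""
    return {bp for word in lexicon for bp in biphones(word)}


def forward_transition_freqs(lexicon: List[List[str]]) -> Dict[Biphone, float]:
    """Frequency data: for each biphone (a, b), count(a b) / count(a followed by anything)."""
    pair_counts = Counter(bp for word in lexicon for bp in biphones(word))
    first_counts: Counter = Counter()
    for (first, _), n in pair_counts.items():
        first_counts[first] += n
    return {
        (first, second): n / first_counts[first]
        for (first, second), n in pair_counts.items()
    }


# Hypothetical three-word lexicon, already split into segments.
toy_lexicon = [["k", "a", "t", "a"], ["t", "a", "k"], ["k", "u"]]

present = biphone_presence(toy_lexicon)
freqs = forward_transition_freqs(toy_lexicon)
```

Each language then contributes one vector of such values, and a test for phylogenetic signal asks whether closely related languages have more similar vectors than expected by chance.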

    Phylogenetic signal in phonotactics

    Phylogenetic methods have broad potential in linguistics beyond tree inference. Here, we show how a phylogenetic approach opens the possibility of gaining historical insights from entirely new kinds of linguistic data – in this instance, statistical phonotactics. We extract phonotactic data from 111 Pama-Nyungan vocabularies and apply tests for phylogenetic signal, quantifying the degree to which the data reflect phylogenetic history. We test three datasets: (1) binary variables recording the presence or absence of biphones (two-segment sequences) in a lexicon, (2) frequencies of transitions between segments, and (3) frequencies of transitions between natural sound classes. Australian languages have been characterized as having a high degree of phonotactic homogeneity. Nevertheless, we detect phylogenetic signal in all datasets. Phylogenetic signal is greater in finer-grained frequency data than in binary data, and greatest in natural-class-based data. These results demonstrate the viability of employing a new source of readily extractable data in historical and comparative linguistics. Comment: Main text: 32 pages, 17 figures, 1 table. Supplementary Information: 17 pages, 1 figure. Code and data available at http://doi.org/10.5281/zenodo.3936353. This article is in review but not yet accepted for publication in a journal

    Matrilineal diversity and population history of Norwegians

    Background: While well known for its Viking past, Norway's population history and the influences that have shaped its genetic diversity are less well understood. This is particularly true with respect to its demography, migration patterns, and dialectal regions, despite there being curated historical records for the past several centuries. In this study, we undertook an analysis of mitochondrial DNA (mtDNA) diversity within the country to elaborate this history from a matrilineal genetic perspective. Methods: We aggregated 1174 partial modern Norwegian mtDNA sequences from the published literature and subjected them to detailed statistical and phylogenetic analysis by dialectal regions and localities. We further contextualized the matrilineal ancestry of modern Norwegians with data from Mesolithic, Iron Age, and historic period populations. Results: Modern Norwegian mtDNAs fell into eight West Eurasian (N, HV, JT, I, U, K, X, W), five East Eurasian (A, F, G, N11, Z), and one African (L2) haplogroups. Pairwise analysis of molecular variance (AMOVA) estimates for all Norwegians indicated they were differentiated from each other at 1.68% (p < 0.001). Norwegians within the same dialectal region also showed genetic similarities to each other, although differences between subpopulations within dialectal regions were also observed. In addition, certain mtDNA lineages in modern Norwegians were also found among prehistoric and historic period populations, suggesting some level of genetic continuity over hundreds to many thousands of years. Conclusions: This analysis of mtDNA diversity provides a detailed picture of the genetic variation within Norway in light of its topography, settlement history, and historical migrations over the past several centuries.

    Comparative Reconstruction Probabilistically: The Role of Inventory and Phonotactics

    I introduce a novel quantitative methodology for evaluating manual comparative reconstructions. This method is contingent on the existence of a manual comparative reconstruction and, unlike previous quantitative methods, cannot give a result contradictory to the reconstruction. The primary goal for this framework is to reconcile traditional and quantitative methodologies and act as an objective and accessible platform for comparative reconstruction, thereby extending the scope of historical linguistics further into the past. A few theoretical corollaries of the framework are also presented. It is shown that the likelihood that a reconstruction is spurious is related to some of the phonological properties of the descendant language. This likelihood is inversely correlated with mean word-length and segmental inventory size. Additionally, most active phonological processes and cooccurrence restrictions in the language – such as phonotactic constraints, prosodic effects, segment harmony, and neutralization – serve to increase the likelihood that a reconstruction to that language is spurious.