1,356 research outputs found

    Circumstances in which parsimony but not compatibility will be provably misleading

    Full text link
    Phylogenetic methods typically rely on an appropriate model of how data evolved in order to infer an accurate phylogenetic tree. For molecular data, standard statistical methods have provided an effective strategy for extracting phylogenetic information from aligned sequence data when each site (character) is subject to a common process. However, for other types of data (e.g. morphological data), characters can be too ambiguous, homoplastic or saturated to develop models that are effective at capturing the underlying process of change. To address this, we examine the properties of a classic but neglected method for inferring splits in an underlying tree, namely, maximum compatibility. By adopting a simple and extreme model in which each character either fits perfectly on some tree, or is entirely random (but it is not known which class any character belongs to) we are able to derive exact and explicit formulae regarding the performance of maximum compatibility. We show that this method is able to identify a set of non-trivial homoplasy-free characters, when the number nn of taxa is large, even when the number of random characters is large. By contrast, we show that a method that makes more uniform use of all the data --- maximum parsimony --- can provably estimate trees in which {\em none} of the original homoplasy-free characters support splits.Comment: 37 pages, 2 figure

    Algorithms for weighted multidimensional search and perfect phylogeny

    Get PDF
    This dissertation is a collection of papers from two independent areas: convex optimization problems in R[superscript]d and the construction of evolutionary trees;The paper on convex optimization problems in R[superscript]d gives improved algorithms for solving the Lagrangian duals of problems that have both of the following properties. First, in absence of the bad constraints, the problems can be solved in strongly polynomial time by combinatorial algorithms. Second, the number of bad constraints is fixed. As part of our solution to these problems, we extend Cole\u27s circuit simulation approach and develop a weighted version of Megiddo\u27s multidimensional search technique;The papers on evolutionary tree construction deal with the perfect phylogeny problem, where species are specified by a set of characters and each character can occur in a species in one of a fixed number of states. This problem is known to be NP-complete. The dissertation contains the following results on the perfect phylogeny problem: (1) A linear time algorithm when all the characters have two states. (2) A polynomial time algorithm when the number of character states is fixed. (3) A polynomial time algorithm when the number of characters is fixed

    Detecting adaptive evolution in phylogenetic comparative analysis using the Ornstein-Uhlenbeck model

    Full text link
    Phylogenetic comparative analysis is an approach to inferring evolutionary process from a combination of phylogenetic and phenotypic data. The last few years have seen increasingly sophisticated models employed in the evaluation of more and more detailed evolutionary hypotheses, including adaptive hypotheses with multiple selective optima and hypotheses with rate variation within and across lineages. The statistical performance of these sophisticated models has received relatively little systematic attention, however. We conducted an extensive simulation study to quantify the statistical properties of a class of models toward the simpler end of the spectrum that model phenotypic evolution using Ornstein-Uhlenbeck processes. We focused on identifying where, how, and why these methods break down so that users can apply them with greater understanding of their strengths and weaknesses. Our analysis identifies three key determinants of performance: a discriminability ratio, a signal-to-noise ratio, and the number of taxa sampled. Interestingly, we find that model-selection power can be high even in regions that were previously thought to be difficult, such as when tree size is small. On the other hand, we find that model parameters are in many circumstances difficult to estimate accurately, indicating a relative paucity of information in the data relative to these parameters. Nevertheless, we note that accurate model selection is often possible when parameters are only weakly identified. Our results have implications for more sophisticated methods inasmuch as the latter are generalizations of the case we study.Comment: 38 pages, in press at Systematic Biolog

    Finding Optimal Triangulations Parameterized by Edge Clique Cover

    Get PDF
    Peer reviewe

    Phylogenetic signal in phonotactics

    Full text link
    Phylogenetic methods have broad potential in linguistics beyond tree inference. Here, we show how a phylogenetic approach opens the possibility of gaining historical insights from entirely new kinds of linguistic data--in this instance, statistical phonotactics. We extract phonotactic data from 111 Pama-Nyungan vocabularies and apply tests for phylogenetic signal, quantifying the degree to which the data reflect phylogenetic history. We test three datasets: (1) binary variables recording the presence or absence of biphones (two-segment sequences) in a lexicon (2) frequencies of transitions between segments, and (3) frequencies of transitions between natural sound classes. Australian languages have been characterized as having a high degree of phonotactic homogeneity. Nevertheless, we detect phylogenetic signal in all datasets. Phylogenetic signal is greater in finer-grained frequency data than in binary data, and greatest in natural-class-based data. These results demonstrate the viability of employing a new source of readily extractable data in historical and comparative linguistics.Comment: Main text: 32 pages, 17 figures, 1 table. Supplementary Information: 17 pages, 1 figure. Code and data available at http://doi.org/10.5281/zenodo.3936353. This article is in review but not yet accepted for publication in a journa

    Phylogenetic signal in phonotactics

    Get PDF
    Phylogenetic methods have broad potential in linguistics beyond tree inference. Here, we show how a phylogenetic approach opens the possibility of gaining historical insights from entirely new kinds of linguistic data – in this instance, statistical phonotactics. We extract phonotactic data from 112 Pama-Nyungan vocabularies and apply tests for phylogenetic signal, quantifying the degree to which the data reflect phylogenetic history. We test three datasets: (1) binary variables recording the presence or absence of biphones (two-segment sequences) in a lexicon (2) frequencies of transitions between segments, and (3) frequencies of transitions between natural sound classes. Australian languages have been characterized as having a high degree of phonotactic homogeneity. Nevertheless, we detect phylogenetic signal in all datasets. Phylogenetic signal is greater in finer-grained frequency data than in binary data, and greatest in natural-class-based data. These results demonstrate the viability of employing a new source of readily extractable data in historical and comparative linguistics.1. Introduction 1.1 Motivations 1.2 Phonotactics as a source of historical signal 2. Phylogenetic signal 3. Materials 3.1 Language sample 3.2 Wordlists 3.3 Reference phylogeny 4. Phylogenetic signal in binary phonotactic data 4.1 Results for binary phonotactic data 4.2 Robustness checks 5. Phylogenetic signal in continuous phonotactic data 5.1 Robustness checks 5.2 Forward transitions versus backward transitions 5.3 Normalization of character values 6. Phylogenetic signal in natural-class-based characters 6.1 Natural-class-based characters versus biphones 7. Discussion 7.1 Overall robustness 7.2 Limitations 8. Conclusio

    Tree Stochastic Processes

    Get PDF
    Stochastic processes play a vital role in understanding the development of many natural and computational systems over time. In this thesis, we will study two settings where stochastic processes on trees play a significant role. The first setting is in the reconstruction of evolutionary trees from biological sequence data. Most previous work done in this area has assumed that different positions in a sequence evolve independently. This independence however is a strong assumption that has been shown to possibly cause inaccuracies in the reconstructed trees \cite{schoniger1994stochastic,tillier1995neighbor}. In our work, we provide a first step toward realizing the effects of dependency in such situations by creating a model in which two positions may evolve dependently. For two characters with transition matrices M1M_1 and M2M_2, their joint transition matrix is the tensor product M1⊗M2M_1 \otimes M_2. Our dependence model modifies the joint transition matrix by adding an `error matrix,\u27 a matrix with rows summing to 0. We show when such dependence can be detected. The second setting concerns computing in the presence of faults. In pushing the limits of computing hardware, there is tradeoff between the reliability of components and their cost (e.g. \cite{kadric2014energy}). We first examine a method of identifying faulty gates in a read-once formula when our access is limited to providing an input and reading its output. We show that determining \emph{whether} a fault exists can always be done, and that locating these faults can be done efficiently as long as the read-once formula satisfies a certain balance condition. Finally for a fixed topology, we provide a dynamic program which allows us to optimize how to allocate resources to individual gates so as to optimize the reliability of the whole system under a known input product distribution

    The pleasures and perils of Darwinizing culture (with phylogenies)

    No full text
    • …
    corecore