7 research outputs found

    Inference of parsimonious species phylogenies from multi-locus data

    The main focus of this dissertation is the inference of species phylogenies, i.e., the evolutionary histories of species. Species phylogenies allow us to gain insights into the mechanisms of evolution and to hypothesize past evolutionary events. They also find applications in medicine, for example in understanding antibiotic resistance in bacteria. The reconstruction of species phylogenies is, therefore, of both biological and practical importance. In the traditional method for inferring species trees from genetic data, we sequence a single locus in species genomes, reconstruct a gene tree, and report it as the species tree. Biologists have long acknowledged that a gene tree can differ from the species tree, which implies that this traditional method might infer the wrong species tree. Moreover, reticulate events such as horizontal gene transfer and hybridization mean that the evolution of species is no longer tree-like. The availability of multi-locus data provides excellent opportunities to resolve these long-standing problems. In this dissertation, we present parsimony-based algorithms for reconciling species/gene tree incongruence that is assumed to be due solely to lineage sorting. We also describe a unified framework for detecting hybridization despite lineage sorting. To address the first problem, species/gene tree incongruence caused by lineage sorting, we present three algorithms. In Chapter 3, we present an algorithm based on an integer linear programming (ILP) formulation that infers the species tree's topology and divergence times from multiple gene trees. In Chapter 4, we describe two methods that infer the species tree by minimizing deep coalescences (MDC), a criterion introduced by Maddison in 1997. The first method is also based on an ILP formulation, but it eliminates the phase of the Chapter 3 algorithm that enumerates candidate species trees. The second method further eliminates the dependence on external ILP solvers by employing dynamic programming. We ran these methods on both biological and simulated data, and the experimental results demonstrate their high accuracy and speed in species tree inference, which makes them suitable for analyzing multi-locus data. The second problem this dissertation deals with is the detection of reticulation (e.g., horizontal gene transfer, hybridization) despite lineage sorting. The phylogeny-based approach compares the evolutionary histories of different genomic regions and tests them for incongruence that would indicate hybridization. However, since species tree and gene tree incongruence can also be due to lineage sorting, phylogeny-based hybridization methods might overestimate the amount of hybridization. We present in this dissertation a framework that can handle both hybridization and lineage sorting simultaneously. In this framework, we extend the MDC criterion to phylogenetic networks and use it to propose a heuristic for detecting hybridization despite lineage sorting. Empirical results on a simulated data set and a yeast data set show promising performance and suggest several directions for future research.

    Consistency Properties of Species Tree Inference by Minimizing Deep Coalescences

    Methods for inferring species trees from sets of gene trees need to account for the possibility of discordance among the gene trees. Assuming that discordance is caused by incomplete lineage sorting, species tree estimates can be obtained by finding those species trees that minimize the number of "deep" coalescence events required for a given collection of gene trees. Efficient algorithms now exist for applying the minimizing-deep-coalescence (MDC) criterion, and simulation experiments have demonstrated its promising performance. However, simulation results have also shown that the MDC criterion is not guaranteed to infer the correct species tree. In this article, we investigate the consistency of the MDC criterion. Using the multispecies coalescent model, we show that there are indeed anomaly zones for the MDC criterion for asymmetric four-taxon species tree topologies, and for all species tree topologies with five or more taxa. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/90434/1/cmb-2E2010-2E0102.pd
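The MDC criterion described in this abstract can be illustrated with a minimal Python sketch. It is not the algorithm from either work listed here. It assumes rooted trees written as nested tuples of species names, one sampled allele per species, and gene trees on the same leaf set as the species tree. The function names (`leaf_set`, `lineages_in`, `extra_lineages`, `mdc_score`) and the toy data are hypothetical. For each species-tree cluster, the sketch counts the maximal gene-tree clades contained in that cluster; every lineage beyond the first is counted as one extra lineage.

```python
def leaf_set(tree):
    """Return the frozenset of leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clusters(tree):
    """Yield the leaf set of every node (leaves included) of the tree."""
    yield leaf_set(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clusters(child)


def lineages_in(gene_tree, cluster):
    """Count maximal gene-tree clades whose leaves all lie inside `cluster`."""
    if leaf_set(gene_tree) <= cluster:
        return 1
    if isinstance(gene_tree, str):
        return 0
    return sum(lineages_in(child, cluster) for child in gene_tree)


def extra_lineages(species_tree, gene_tree):
    """Sum, over species-tree clusters, the lineages beyond the first."""
    return sum(lineages_in(gene_tree, c) - 1 for c in clusters(species_tree))


def mdc_score(species_tree, gene_trees):
    """Total extra lineages needed to fit all gene trees in the species tree."""
    return sum(extra_lineages(species_tree, g) for g in gene_trees)


# Hypothetical toy data: three candidate species trees, three gene trees.
candidates = [(("A", "B"), "C"), (("A", "C"), "B"), (("B", "C"), "A")]
gene_trees = [(("A", "B"), "C"), (("A", "B"), "C"), (("A", "C"), "B")]

# Exhaustive search over the candidates; feasible only for tiny examples.
best = min(candidates, key=lambda s: mdc_score(s, gene_trees))
```

Enumerating every candidate topology, as done here, scales poorly with the number of taxa, which is presumably why the works above turn to ILP and dynamic programming instead.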

    PhyloNet: a software package for analyzing and reconstructing reticulate evolutionary relationships

    Background: Phylogenies, i.e., the evolutionary histories of groups of taxa, play a major role in representing the interrelationships among biological entities. Many software tools for reconstructing and evaluating such phylogenies have been proposed, almost all of which assume the underlying evolutionary history to be a tree. While trees give a satisfactory first-order approximation for many families of organisms, other families exhibit evolutionary mechanisms that cannot be represented by trees. Processes such as horizontal gene transfer (HGT), hybrid speciation, and interspecific recombination, collectively referred to as reticulate evolutionary events, result in networks, rather than trees, of relationships. Various software tools have been recently developed to analyze reticulate evolutionary relationships, which include SplitsTree4, LatTrans, EEEP, HorizStory, and T-REX. Results: In this paper, we report on the PhyloNet software package, which is a suite of tools for analyzing reticulate evolutionary relationships, or evolutionary networks, which are rooted, directed, acyclic graphs, leaf-labeled by a set of taxa. These tools can be classified into four categories: (1) evolutionary network representation: reading/writing evolutionary networks in a newly devised compact form; (2) evolutionary network characterization: analyzing evolutionary networks in terms of three basic building blocks – trees, clusters, and tripartitions; (3) evolutionary network comparison: comparing two evolutionary networks in terms of topological dissimilarities, as well as fitness to sequence evolution under a maximum parsimony criterion; and (4) evolutionary network reconstruction: reconstructing an evolutionary network from a species tree and a set of gene trees. Conclusion: The software package, PhyloNet, offers an array of utilities to allow for efficient and accurate analysis of evolutionary networks. The software package will help significantly in analyzing large data sets, as well as in studying the performance of evolutionary network reconstruction methods. Further, the software package supports the proposed eNewick format for compact representation of evolutionary networks, a feature that allows for efficient interoperability of evolutionary network software tools. Currently, all utilities in PhyloNet are invoked on the command line.
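To make the "trees" and "clusters" building blocks concrete, the following Python sketch treats an evolutionary network as a rooted DAG and lists the clusters of each tree obtained by keeping exactly one parent for every reticulation node. It is an illustration under stated assumptions, not PhyloNet's implementation or API. The adjacency-dictionary representation, the function names, and the toy network (in which `H` is a reticulation node with parents `X` and `Y`) are all hypothetical.

```python
from itertools import product


def leaves_below(node, children, kept):
    """Leaf labels reachable from `node` using only the kept parent edges."""
    kids = children.get(node, [])
    if not kids:
        return frozenset([node])
    # A reticulation child counts only under the parent chosen for it.
    live = [k for k in kids if kept.get(k, node) == node]
    return frozenset().union(*(leaves_below(k, children, kept) for k in live))


def displayed_tree_clusters(children):
    """Return one set of non-empty clusters per choice of reticulation parents."""
    parents = {}
    for node, kids in children.items():
        for kid in kids:
            parents.setdefault(kid, []).append(node)
    reticulations = sorted(n for n, ps in parents.items() if len(ps) > 1)
    nodes = set(children) | set(parents)

    results = []
    for choice in product(*(parents[r] for r in reticulations)):
        kept = dict(zip(reticulations, choice))
        found = {leaves_below(n, children, kept) for n in nodes}
        found.discard(frozenset())  # nodes left without leaves in this tree
        results.append(found)
    return results


# Hypothetical toy network: parent -> children; H has two parents (X and Y).
network = {"R": ["X", "Y"], "X": ["A", "H"], "Y": ["H", "C"], "H": ["B"]}
trees = displayed_tree_clusters(network)
```

With this toy network, the first displayed tree groups `A` with `B` and the second groups `B` with `C`, which is the kind of conflicting signal a single tree cannot represent.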