2,186 research outputs found

    An efficiently computed lower bound on the number of recombinations in phylogenetic networks: Theory and empirical study

    Get PDF
    AbstractPhylogenetic networks are models of sequence evolution that go beyond trees, allowing biological operations that are not tree-like. One of the most important biological operations is recombination between two sequences. An established problem [J. Hein, Reconstructing evolution of sequences subject to recombination using parsimony, Math. Biosci. 98 (1990) 185–200; J. Hein, A heuristic method to reconstruct the history of sequences subject to recombination, J. Molecular Evoluation 36 (1993) 396–405; Y. Song, J. Hein, Parsimonious reconstruction of sequence evolution and haplotype blocks: finding the minimum number of recombination events, in: Proceedings of 2003 Workshop on Algorithms in Bioinformatics, Berlin, Germany, 2003, Lecture Notes in Computer Science, Springer, Berlin; Y. Song, J. Hein, On the minimum number of recombination events in the evolutionary history of DNA sequences, J. Math. Biol. 48 (2003) 160–186; L. Wang, K. Zhang, L. Zhang, Perfect phylogenetic networks with recombination, J. Comput. Biol. 8 (2001) 69–78; S.R. Myers, R.C. Griffiths, Bounds on the minimum number of recombination events in a sample history, Genetics 163 (2003) 375–394; V. Bafna, V. Bansal, Improved recombination lower bounds for haplotype data, in: Proceedings of RECOMB, 2005; Y. Song, Y. Wu, D. Gusfield, Efficient computation of close lower and upper bounds on the minimum number of needed recombinations in the evolution of biological sequences, Bioinformatics 21 (2005) i413–i422. Bioinformatics (Suppl. 1), Proceedings of ISMB, 2005, D. Gusfield, S. Eddhu, C. Langley, Optimal, efficient reconstruction of phylogenetic networks with constrained recombination, J. Bioinform. Comput. Biol. 2(1) (2004) 173–213; D. Gusfield, Optimal, efficient reconstruction of root-unknown phylogenetic networks with constrained and structured recombination, J. Comput. Systems Sci. 70 (2005) 381–398] is to find a phylogenetic network that derives an input set of sequences, minimizing the number of recombinations used. No efficient, general algorithm is known for this problem. Several papers consider the problem of computing a lower bound on the number of recombinations needed. In this paper we establish a new, efficiently computed lower bound. This result is useful in methods to estimate the number of needed recombinations, and also to prove the optimality of algorithms for constructing phylogenetic networks under certain conditions [D. Gusfield, S. Eddhu, C. Langley, Optimal, efficient reconstruction of phylogenetic networks with constrained recombination, J. Bioinform. Comput. Biol. 2(1) (2004) 173–213; D. Gusfield, Optimal, efficient reconstruction of root-unknown phylogenetic networks with constrained and structured recombination, J. Comput. Systems Sci. 70 (2005) 381–398; D. Gusfield, Optimal, efficient reconstruction of root-unknown phylogenetic networks with constrained recombination, Technical Report, Department of Computer Science, University of California, Davis, CA, 2004]. The lower bound is based on a structural, combinatorial insight, using only the site conflicts and incompatibilities, and hence it is fundamental and applicable to many biological phenomena other than recombination, for example, when gene conversions or recurrent or back mutations or cross-species hybridizations cause the phylogenetic history to deviate from a tree structure. In addition to establishing the bound, we examine its use in more complex lower bound methods, and compare the bounds obtained to those obtained by other established lower bound methods

    Inference of Ancestral Recombination Graphs through Topological Data Analysis

    Get PDF
    The recent explosion of genomic data has underscored the need for interpretable and comprehensive analyses that can capture complex phylogenetic relationships within and across species. Recombination, reassortment and horizontal gene transfer constitute examples of pervasive biological phenomena that cannot be captured by tree-like representations. Starting from hundreds of genomes, we are interested in the reconstruction of potential evolutionary histories leading to the observed data. Ancestral recombination graphs represent potential histories that explicitly accommodate recombination and mutation events across orthologous genomes. However, they are computationally costly to reconstruct, usually being infeasible for more than few tens of genomes. Recently, Topological Data Analysis (TDA) methods have been proposed as robust and scalable methods that can capture the genetic scale and frequency of recombination. We build upon previous TDA developments for detecting and quantifying recombination, and present a novel framework that can be applied to hundreds of genomes and can be interpreted in terms of minimal histories of mutation and recombination events, quantifying the scales and identifying the genomic locations of recombinations. We implement this framework in a software package, called TARGet, and apply it to several examples, including small migration between different populations, human recombination, and horizontal evolution in finches inhabiting the Gal\'apagos Islands.Comment: 33 pages, 12 figures. The accompanying software, instructions and example files used in the manuscript can be obtained from https://github.com/RabadanLab/TARGe

    The inference of gene trees with species trees

    Get PDF
    Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can co-exist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice-versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. In this article we review the various models that have been used to describe the relationship between gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree-species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a better basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree-species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution.Comment: Review article in relation to the "Mathematical and Computational Evolutionary Biology" conference, Montpellier, 201

    When two trees go to war

    Get PDF
    Rooted phylogenetic networks are often constructed by combining trees, clusters, triplets or characters into a single network that in some well-defined sense simultaneously represents them all. We review these four models and investigate how they are related. In general, the model chosen influences the minimum number of reticulation events required. However, when one obtains the input data from two binary trees, we show that the minimum number of reticulations is independent of the model. The number of reticulations necessary to represent the trees, triplets, clusters (in the softwired sense) and characters (with unrestricted multiple crossover recombination) are all equal. Furthermore, we show that these results also hold when not the number of reticulations but the level of the constructed network is minimised. We use these unification results to settle several complexity questions that have been open in the field for some time. We also give explicit examples to show that already for data obtained from three binary trees the models begin to diverge

    Algorithms for Visualizing Phylogenetic Networks

    Full text link
    We study the problem of visualizing phylogenetic networks, which are extensions of the Tree of Life in biology. We use a space filling visualization method, called DAGmaps, in order to obtain clear visualizations using limited space. In this paper, we restrict our attention to galled trees and galled networks and present linear time algorithms for visualizing them as DAGmaps.Comment: Appears in the Proceedings of the 24th International Symposium on Graph Drawing and Network Visualization (GD 2016
    • …
    corecore