    Counting Coalescent Histories

    Given a species tree and a gene tree, a valid coalescent history is a list of the branches of the species tree on which coalescences in the gene tree take place. I develop a recursion for the number of valid coalescent histories that exist for an arbitrary gene tree/species tree pair, when one gene lineage is studied per species. The result is obtained by defining a concept of m-extended coalescent histories, enumerating and counting these histories, and taking the special case of m = 1. Because a sum over valid coalescent histories appears in a formula for the probability that a random gene tree evolving along the branches of a fixed species tree has a specified labeled topology, the enumeration of valid coalescent histories can considerably reduce the effort required for evaluating this formula. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/63175/1/cmb.2006.0109.pd
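To make the m-extended idea concrete, here is a minimal Python sketch restricted to the simplest case, where the gene tree has the same labeled topology as the species tree. It is not the paper's general recursion for arbitrary pairs. It assumes that if the root coalescence sits on the k-th of m pieces of the extended root branch, each child subtree then sees a root branch with k + 1 pieces. The nested-tuple tree encoding and the names `count_histories` and `caterpillar` are illustrative choices, not taken from the source.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_histories(tree, m=1):
    """Count m-extended coalescent histories for a matching gene/species tree.

    `tree` is either a leaf label (str) or a pair (left, right) of subtrees.
    Assumption: the gene tree topology equals the species tree topology.
    """
    if not isinstance(tree, tuple):
        return 1  # a single lineage has nothing to coalesce
    left, right = tree
    # The root coalescence sits on piece k of the m pieces above the root;
    # each child then has its own branch plus k pieces available (k + 1).
    return sum(
        count_histories(left, k + 1) * count_histories(right, k + 1)
        for k in range(1, m + 1)
    )


def caterpillar(n):
    """Build an n-leaf caterpillar tree as nested tuples (hypothetical helper)."""
    tree = "t1"
    for i in range(2, n + 1):
        tree = (tree, f"t{i}")
    return tree


if __name__ == "__main__":
    for n in range(2, 7):
        print(n, count_histories(caterpillar(n)))
```

For matching caterpillar trees with 2 to 6 taxa, this sketch yields 1, 2, 5, 14, 42, the Catalan numbers, and it yields 4 for the balanced four-taxon tree `(("a", "b"), ("c", "d"))`.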

    Coalescent histories for lodgepole species trees

    Coalescent histories are combinatorial structures that describe for a given gene tree and species tree the possible lists of branches of the species tree on which the gene tree coalescences take place. Properties of the number of coalescent histories for gene trees and species trees affect a variety of probabilistic calculations in mathematical phylogenetics. Exact and asymptotic evaluations of the number of coalescent histories, however, are known only in a limited number of cases. Here we introduce a particular family of species trees, the lodgepole species trees $(\lambda_n)_{n\geq 0}$, in which tree $\lambda_n$ has $m=2n+1$ taxa. We determine the number of coalescent histories for the lodgepole species trees, in the case that the gene tree matches the species tree, showing that this number grows with $m!!$ in the number of taxa $m$. This computation demonstrates the existence of tree families in which the growth in the number of coalescent histories is faster than exponential. Further, it provides a substantial improvement on the lower bound for the ratio of the largest number of matching coalescent histories to the smallest number of matching coalescent histories for trees with $m$ taxa, increasing a previous bound of $(\sqrt{\pi}/32)\,[(5m-12)/(4m-6)]\, m\sqrt{m}$ to $[\sqrt{m-1}/(4\sqrt{e})]^{m}$. We discuss the implications of our enumerative results for phylogenetic computations.
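To get a feel for how far apart the two lower bounds quoted above are, the following Python sketch evaluates both on a log10 scale, which avoids floating-point overflow for large m. The function names are illustrative and the formulas are transcribed directly from the abstract; nothing here reproduces the paper's derivation.

```python
import math


def log10_previous_bound(m):
    """log10 of (sqrt(pi)/32) * [(5m-12)/(4m-6)] * m * sqrt(m), for m >= 3."""
    value = math.sqrt(math.pi) / 32 * (5 * m - 12) / (4 * m - 6) * m * math.sqrt(m)
    return math.log10(value)


def log10_lodgepole_bound(m):
    """log10 of [sqrt(m-1) / (4*sqrt(e))]**m, for m >= 2."""
    base = math.sqrt(m - 1) / (4 * math.sqrt(math.e))
    return m * math.log10(base)


if __name__ == "__main__":
    for m in (45, 100, 1000):
        print(m, round(log10_previous_bound(m), 2), round(log10_lodgepole_bound(m), 2))
```

The base of the new bound exceeds 1 only once $\sqrt{m-1} > 4\sqrt{e}$, that is, from m = 45 onward, so the improvement is an asymptotic statement. For large m, the new bound grows faster than any fixed exponential, while the previous one grows only polynomially.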

    Importance sampling for Lambda-coalescents in the infinitely many sites model

    We present and discuss new importance sampling schemes for the approximate computation of the sample probability of observed genetic types in the infinitely many sites model from population genetics. More specifically, we extend the 'classical framework', where genealogies are assumed to be governed by Kingman's coalescent, to the more general class of Lambda-coalescents and develop further Hobolth et al.'s (2008) idea of deriving importance sampling schemes based on 'compressed genetrees'. The resulting schemes extend earlier work by Griffiths and Tavaré (1994), Stephens and Donnelly (2000), Birkner and Blath (2008) and Hobolth et al. (2008). We conclude with a performance comparison of classical and new schemes for Beta- and Kingman coalescents. Comment: (38 pages, 40 figures)

    Inference of Ancestral Recombination Graphs through Topological Data Analysis

    The recent explosion of genomic data has underscored the need for interpretable and comprehensive analyses that can capture complex phylogenetic relationships within and across species. Recombination, reassortment and horizontal gene transfer constitute examples of pervasive biological phenomena that cannot be captured by tree-like representations. Starting from hundreds of genomes, we are interested in the reconstruction of potential evolutionary histories leading to the observed data. Ancestral recombination graphs represent potential histories that explicitly accommodate recombination and mutation events across orthologous genomes. However, they are computationally costly to reconstruct, and reconstruction is usually infeasible for more than a few tens of genomes. Recently, Topological Data Analysis (TDA) methods have been proposed as robust and scalable methods that can capture the genetic scale and frequency of recombination. We build upon previous TDA developments for detecting and quantifying recombination, and present a novel framework that can be applied to hundreds of genomes and can be interpreted in terms of minimal histories of mutation and recombination events, quantifying the scales and identifying the genomic locations of recombinations. We implement this framework in a software package, called TARGet, and apply it to several examples, including small migration between different populations, human recombination, and horizontal evolution in finches inhabiting the Galápagos Islands. Comment: 33 pages, 12 figures. The accompanying software, instructions and example files used in the manuscript can be obtained from https://github.com/RabadanLab/TARGe

    A polynomial time algorithm for calculating the probability of a ranked gene tree given a species tree

    In this paper, we provide a polynomial time algorithm to calculate the probability of a ranked gene tree topology for a given species tree, where a ranked tree topology is a tree topology with the internal vertices being ordered. The probability of a gene tree topology can thus be calculated in polynomial time if the number of orderings of the internal vertices is a polynomial number. However, the complexity of calculating the probability of a gene tree topology with an exponential number of rankings for a given species tree remains unknown.
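The number of rankings of a topology is what decides whether summing ranked-tree probabilities stays cheap. As a hedged illustration, the Python sketch below counts the orderings of the internal vertices of a rooted binary tree with the standard formula (number of internal vertices)! divided by the product, over internal vertices, of the number of internal vertices in the subtree rooted there. The nested-tuple encoding and the name `count_rankings` are assumptions made for this example and are not taken from the paper.

```python
from math import factorial, prod


def count_rankings(tree):
    """Count orderings of internal vertices consistent with ancestry.

    `tree` is a leaf label (str) or a pair (left, right) of subtrees.
    Ties between vertices are not allowed in a ranking.
    """
    subtree_sizes = []

    def visit(node):
        # Returns the number of internal vertices in the subtree at `node`.
        if not isinstance(node, tuple):
            return 0
        size = 1 + visit(node[0]) + visit(node[1])
        subtree_sizes.append(size)
        return size

    total_internal = visit(tree)
    return factorial(total_internal) // prod(subtree_sizes)


if __name__ == "__main__":
    caterpillar5 = (((("a", "b"), "c"), "d"), "e")
    balanced8 = (
        ((("a", "b"), ("c", "d"))),
        ((("e", "f"), ("g", "h"))),
    )
    print(count_rankings(caterpillar5), count_rankings(balanced8))
```

A caterpillar topology has a single ranking, whereas the balanced eight-taxon topology has 80. Balanced shapes are the ones for which the number of rankings grows quickly with the number of taxa.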

    Enumeration of coalescent histories for caterpillar species trees and $p$-pseudocaterpillar gene trees

    For a fixed set $X$ containing $n$ taxon labels, an ordered pair consisting of a gene tree topology $G$ and a species tree $S$ bijectively labeled with the labels of $X$ possesses a set of coalescent histories: mappings from the set of internal nodes of $G$ to the set of edges of $S$ describing possible lists of edges in $S$ on which the coalescences in $G$ take place. Enumerations of coalescent histories for gene trees and species trees have produced suggestive results regarding the pairs $(G,S)$ that, for a fixed $n$, have the largest number of coalescent histories. We define a class of 2-cherry binary tree topologies that we term $p$-pseudocaterpillars, examining coalescent histories for non-matching pairs $(G,S)$, in the case in which $S$ has a caterpillar shape and $G$ has a $p$-pseudocaterpillar shape. Using a construction that associates coalescent histories for $(G,S)$ with a class of "roadblocked" monotonic paths, we identify the $p$-pseudocaterpillar labeled gene tree topology that, for a fixed caterpillar labeled species tree topology, gives rise to the largest number of coalescent histories. The shape that maximizes the number of coalescent histories places the "second" cherry of the $p$-pseudocaterpillar equidistantly from the root of the "first" cherry and from the tree root. A symmetry in the numbers of coalescent histories for $p$-pseudocaterpillar gene trees and caterpillar species trees is seen to exist around the maximizing value of the parameter $p$. The results provide insight into the factors that influence the number of coalescent histories possible for a given gene tree and species tree.
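The monotonic-path view can be sketched with a small dynamic program. The Python example below counts paths from (0, 0) to (n, n) that use unit steps right or up, never rise above the diagonal, and avoid a set of forbidden lattice points. With no forbidden points the count is the n-th Catalan number, which is the known number of coalescent histories for a matching caterpillar pair with n + 1 taxa. The abstract does not say where the roadblocks lie for a given $p$-pseudocaterpillar, so the `roadblocks` argument is a hypothetical placeholder rather than the paper's construction.

```python
def count_monotonic_paths(n, roadblocks=frozenset()):
    """Count right/up lattice paths (0,0) -> (n,n) with y <= x throughout.

    `roadblocks` is a hypothetical set of (x, y) points that paths may not
    visit; the positions used in the paper are not given in the abstract.
    """
    # paths[x][y] holds the number of admissible paths ending at (x, y).
    paths = [[0] * (n + 1) for _ in range(n + 1)]
    for x in range(n + 1):
        for y in range(x + 1):  # stay on or below the diagonal
            if (x, y) in roadblocks:
                continue  # blocked points keep a count of zero
            if x == 0 and y == 0:
                paths[x][y] = 1
                continue
            from_left = paths[x - 1][y] if x > 0 else 0
            from_below = paths[x][y - 1] if y > 0 else 0
            paths[x][y] = from_left + from_below
    return paths[n][n]


if __name__ == "__main__":
    print([count_monotonic_paths(n) for n in range(6)])
    print(count_monotonic_paths(2, roadblocks={(1, 1)}))
```

Blocking a single point removes exactly the paths that pass through it. For n = 2, forbidding (1, 1) leaves one of the two unrestricted paths.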

    The Time Machine: A Simulation Approach for Stochastic Trees

    In the following paper we consider a simulation technique for stochastic trees. One of the most important areas in computational genetics is the calculation and subsequent maximization of the likelihood function associated with such models. This typically consists of using importance sampling (IS) and sequential Monte Carlo (SMC) techniques. The approach proceeds by simulating the tree, backward in time from observed data, to a most recent common ancestor (MRCA). However, in many cases, the computational time and variance of estimators are too high to make standard approaches useful. In this paper we propose to stop the simulation, subsequently yielding biased estimates of the likelihood surface. The bias is investigated from a theoretical point of view. Results from simulation studies are also given to investigate the balance between loss of accuracy, saving in computing time and variance reduction. Comment: 22 Pages, 5 Figures