10 research outputs found

    When two trees go to war

    Get PDF
    Rooted phylogenetic networks are often constructed by combining trees, clusters, triplets or characters into a single network that in some well-defined sense simultaneously represents them all. We review these four models and investigate how they are related. In general, the model chosen influences the minimum number of reticulation events required. However, when one obtains the input data from two binary trees, we show that the minimum number of reticulations is independent of the model. The number of reticulations necessary to represent the trees, triplets, clusters (in the softwired sense) and characters (with unrestricted multiple crossover recombination) are all equal. Furthermore, we show that these results also hold when not the number of reticulations but the level of the constructed network is minimised. We use these unification results to settle several complexity questions that have been open in the field for some time. We also give explicit examples to show that already for data obtained from three binary trees the models begin to diverge

    Bridging trees for posterior inference on Ancestral Recombination Graphs

    Get PDF
    We present a new Markov chain Monte Carlo algorithm, implemented in software Arbores, for inferring the history of a sample of DNA sequences. Our principal innovation is a bridging procedure, previously applied only for simple stochastic processes, in which the local computations within a bridge can proceed independently of the rest of the DNA sequence, facilitating large-scale parallelisation.Comment: 23 pages, 9 figures, accepted for publication in Proceedings of the Royal Society

    Constructing the Simplest Possible Phylogenetic Network from Triplets

    Get PDF
    A phylogenetic network is a directed acyclic graph that visualises an evolutionary history containing so-called reticulations such as recombinations, hybridisations or lateral gene transfers. Here we consider the construction of a simplest possible phylogenetic network consistent with an input set T, where T contains at least one phylogenetic tree on three leaves (a triplet) for each combination of three taxa. To quantify the complexity of a network we consider both the total number of reticulations and the number of reticulations per biconnected component, called the level of the network. We give polynomial-time algorithms for constructing a level-1 respectively a level-2 network that contains a minimum number of reticulations and is consistent with T (if such a network exists). In addition, we show that if T is precisely equal to the set of triplets consistent with some network, then we can construct such a network with smallest possible level in time O(|T|^{k+1}), if k is a fixed upper bound on the level of the network

    Recombinational Landscape and Population Genomics of Caenorhabditis elegans

    Get PDF
    Recombination rate and linkage disequilibrium, the latter a function of population genomic processes, are the critical parameters for mapping by linkage and association, and their patterns in Caenorhabditis elegans are poorly understood. We performed high-density SNP genotyping on a large panel of recombinant inbred advanced intercross lines (RIAILs) of C. elegans to characterize the landscape of recombination and, on a panel of wild strains, to characterize population genomic patterns. We confirmed that C. elegans autosomes exhibit discrete domains of nearly constant recombination rate, and we show, for the first time, that the pattern holds for the X chromosome as well. The terminal domains of each chromosome, spanning about 7% of the genome, exhibit effectively no recombination. The RIAILs exhibit a 5.3-fold expansion of the genetic map. With median marker spacing of 61 kb, they are a powerful resource for mapping quantitative trait loci in C. elegans. Among 125 wild isolates, we identified only 41 distinct haplotypes. The patterns of genotypic similarity suggest that some presumed wild strains are laboratory contaminants. The Hawaiian strain, CB4856, exhibits genetic isolation from the remainder of the global population, whose members exhibit ample evidence of intercrossing and recombining. The population effective recombination rate, estimated from the pattern of linkage disequilibrium, is correlated with the estimated meiotic recombination rate, but its magnitude implies that the effective rate of outcrossing is extremely low, corroborating reports of selection against recombinant genotypes. Despite the low population, effective recombination rate and extensive linkage disequilibrium among chromosomes, which are techniques that account for background levels of genomic similarity, permit association mapping in wild C. elegans strains

    A general and efficient representation of ancestral recombination graphs

    Get PDF
    As a result of recombination, adjacent nucleotides can have different paths of genetic inheritance and therefore the genealogical trees for a sample of DNA sequences vary along the genome. The structure capturing the details of these intricately interwoven paths of inheritance is referred to as an ancestral recombination graph (ARG). Classical formalisms have focused on mapping coalescence and recombination events to the nodes in an ARG. However, this approach is out of step with some modern developments, which do not represent genetic inheritance in terms of these events or explicitly infer them. We present a simple formalism that defines an ARG in terms of specific genomes and their intervals of genetic inheritance, and show how it generalizes these classical treatments and encompasses the outputs of recent methods. We discuss nuances arising from this more general structure, and argue that it forms an appropriate basis for a software standard in this rapidly growing field.</p

    Genome-wide inference of ancestral recombination graphs

    Get PDF
    The complex correlation structure of a collection of orthologous DNA sequences is uniquely captured by the "ancestral recombination graph" (ARG), a complete record of coalescence and recombination events in the history of the sample. However, existing methods for ARG inference are computationally intensive, highly approximate, or limited to small numbers of sequences, and, as a consequence, explicit ARG inference is rarely used in applied population genomics. Here, we introduce a new algorithm for ARG inference that is efficient enough to apply to dozens of complete mammalian genomes. The key idea of our approach is to sample an ARG of n chromosomes conditional on an ARG of n-1 chromosomes, an operation we call "threading." Using techniques based on hidden Markov models, we can perform this threading operation exactly, up to the assumptions of the sequentially Markov coalescent and a discretization of time. An extension allows for threading of subtrees instead of individual sequences. Repeated application of these threading operations results in highly efficient Markov chain Monte Carlo samplers for ARGs. We have implemented these methods in a computer program called ARGweaver. Experiments with simulated data indicate that ARGweaver converges rapidly to the true posterior distribution and is effective in recovering various features of the ARG for dozens of sequences generated under realistic parameters for human populations. In applications of ARGweaver to 54 human genome sequences from Complete Genomics, we find clear signatures of natural selection, including regions of unusually ancient ancestry associated with balancing selection and reductions in allele age in sites under directional selection. Preliminary results also indicate that our methods can be used to gain insight into complex features of human population structure, even with a noninformative prior distribution.Comment: 88 pages, 7 main figures, 22 supplementary figures. This version contains a substantially expanded genomic data analysi
    corecore