
    Coalescent histories for lodgepole species trees

    Coalescent histories are combinatorial structures that describe for a given gene tree and species tree the possible lists of branches of the species tree on which the gene tree coalescences take place. Properties of the number of coalescent histories for gene trees and species trees affect a variety of probabilistic calculations in mathematical phylogenetics. Exact and asymptotic evaluations of the number of coalescent histories, however, are known only in a limited number of cases. Here we introduce a particular family of species trees, the lodgepole species trees $(\lambda_n)_{n\geq 0}$, in which tree $\lambda_n$ has $m=2n+1$ taxa. We determine the number of coalescent histories for the lodgepole species trees, in the case that the gene tree matches the species tree, showing that this number grows with $m!!$ in the number of taxa $m$. This computation demonstrates the existence of tree families in which the growth in the number of coalescent histories is faster than exponential. Further, it provides a substantial improvement on the lower bound for the ratio of the largest number of matching coalescent histories to the smallest number of matching coalescent histories for trees with $m$ taxa, increasing a previous bound of $(\sqrt{\pi}/32)[(5m-12)/(4m-6)] m \sqrt{m}$ to $[\sqrt{m-1}/(4\sqrt{e})]^{m}$. We discuss the implications of our enumerative results for phylogenetic computations.
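
To make the growth claims in this abstract concrete, the following minimal Python sketch evaluates the double factorial $m!!$ and the two lower-bound expressions quoted above for a few odd values of $m$. The function names are illustrative assumptions, not taken from the paper, and the sketch only evaluates the expressions as written in the abstract.

```python
import math


def double_factorial(m: int) -> int:
    """Return m!! = m * (m - 2) * (m - 4) * ... down to 1 or 2."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def previous_bound(m: int) -> float:
    """Earlier lower bound: (sqrt(pi)/32) * [(5m-12)/(4m-6)] * m * sqrt(m)."""
    return (math.sqrt(math.pi) / 32) * ((5 * m - 12) / (4 * m - 6)) * m * math.sqrt(m)


def lodgepole_bound(m: int) -> float:
    """Bound quoted for the lodgepole family: [sqrt(m-1) / (4*sqrt(e))]^m."""
    return (math.sqrt(m - 1) / (4 * math.sqrt(math.e))) ** m


if __name__ == "__main__":
    # Odd values only, since the lodgepole trees have m = 2n + 1 taxa.
    for m in (9, 45, 101, 201):
        print(m, double_factorial(m), previous_bound(m), lodgepole_bound(m))
```

Evaluated this way, the newer expression is smaller than the older one for small $m$ and overtakes it only once $m$ is large, which fits a statement about growth rate rather than about every individual $m$.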

    A lattice structure for ancestral configurations arising from the relationship between gene trees and species trees

    To a given gene tree topology $G$ and species tree topology $S$ with leaves labeled bijectively from a fixed set $X$, one can associate a set of ancestral configurations, each of which encodes a set of gene lineages that can be found at a given node of a species tree. We introduce a lattice structure on ancestral configurations, studying the directed graphs that provide graphical representations of lattices of ancestral configurations. For a matching gene tree topology and species tree topology, we present a method for defining the digraph of ancestral configurations from the tree topology by using iterated cartesian products of graphs. We show that a specific set of paths on the digraph of ancestral configurations is in bijection with the set of labeled histories -- a well-known phylogenetic object that enumerates possible temporal orderings of the coalescences of a tree. For each of a series of tree families, we obtain closed-form expressions for the number of labeled histories by using this bijection to count paths on associated digraphs. Finally, we prove that our lattice construction extends to nonmatching tree pairs, and we use it to characterize pairs $(G,S)$ having the maximal number of ancestral configurations for a fixed $G$. We discuss how the construction provides new methods for performing enumerations of combinatorial aspects of gene and species trees. Comment: 20 pages, 15 figures. This version contains reference updates, first author name update, minor changes to the tex

    Enumeration of coalescent histories for caterpillar species trees and $p$-pseudocaterpillar gene trees

    For a fixed set $X$ containing $n$ taxon labels, an ordered pair consisting of a gene tree topology $G$ and a species tree $S$ bijectively labeled with the labels of $X$ possesses a set of coalescent histories -- mappings from the set of internal nodes of $G$ to the set of edges of $S$ describing possible lists of edges in $S$ on which the coalescences in $G$ take place. Enumerations of coalescent histories for gene trees and species trees have produced suggestive results regarding the pairs $(G,S)$ that, for a fixed $n$, have the largest number of coalescent histories. We define a class of 2-cherry binary tree topologies that we term $p$-pseudocaterpillars, examining coalescent histories for non-matching pairs $(G,S)$, in the case in which $S$ has a caterpillar shape and $G$ has a $p$-pseudocaterpillar shape. Using a construction that associates coalescent histories for $(G,S)$ with a class of "roadblocked" monotonic paths, we identify the $p$-pseudocaterpillar labeled gene tree topology that, for a fixed caterpillar labeled species tree topology, gives rise to the largest number of coalescent histories. The shape that maximizes the number of coalescent histories places the "second" cherry of the $p$-pseudocaterpillar equidistantly from the root of the "first" cherry and from the tree root. A symmetry in the numbers of coalescent histories for $p$-pseudocaterpillar gene trees and caterpillar species trees is seen to exist around the maximizing value of the parameter $p$. The results provide insight into the factors that influence the number of coalescent histories possible for a given gene tree and species tree.
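
As a baseline for the monotonic-path construction mentioned in this abstract, the sketch below counts monotonic lattice paths from $(0,0)$ to $(k,k)$ that never rise above the diagonal. It relies on a known result from the earlier literature, not stated in the abstract: for a matching caterpillar gene tree and species tree on $n$ taxa, the number of coalescent histories is the Catalan number $C_{n-1}$, which is also the number of such paths with $k = n-1$. The abstract does not say where the "roadblocks" for $p$-pseudocaterpillars are placed, so they are not modeled here, and the function name is a hypothetical one chosen for illustration.

```python
def count_subdiagonal_paths(k: int) -> int:
    """Count monotonic paths from (0, 0) to (k, k), using unit steps right
    or up, that never rise above the diagonal y = x."""
    # paths[x][y] holds the number of valid paths ending at (x, y), for y <= x.
    paths = [[0] * (k + 1) for _ in range(k + 1)]
    paths[0][0] = 1
    for x in range(k + 1):
        for y in range(x + 1):
            if x == 0 and y == 0:
                continue
            # A step right arrives from (x - 1, y), valid only if y <= x - 1.
            from_left = paths[x - 1][y] if x - 1 >= y else 0
            # A step up arrives from (x, y - 1), which is always below the diagonal.
            from_below = paths[x][y - 1] if y >= 1 else 0
            paths[x][y] = from_left + from_below
    return paths[k][k]


if __name__ == "__main__":
    # Matching caterpillar case: n taxa correspond to k = n - 1.
    for n in range(2, 8):
        print(n, count_subdiagonal_paths(n - 1))
```

A roadblocked variant would presumably zero out selected grid cells before the dynamic program runs, but which cells to block depends on details given only in the full paper.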

    Organellar inheritance in the green lineage: insights from Ostreococcus tauri

    Along the green lineage (Chlorophyta and Streptophyta), mitochondria and chloroplasts are mainly uniparentally transmitted and their evolution is thus clonal. The mode of organellar inheritance in their ancestor is less certain. The inability to make clear phylogenetic inference is partly due to a lack of information for deep branching organisms in this lineage. Here, we investigate organellar evolution in the early branching green alga Ostreococcus tauri using population genomics data from the complete mitochondrial and chloroplast genomes. The haplotype structure is consistent with clonal evolution in mitochondria, while we find evidence for recombination in the chloroplast genome. The number of recombination events in the genealogy of the chloroplast suggests that recombination, and thus biparental inheritance, is not rare. Consistent with the evidence of recombination, we find that the ratio of the number of nonsynonymous to synonymous polymorphisms per site is lower in the chloroplast than in the mitochondrial genome. We also find evidence for the segregation of two selfish genetic elements in the chloroplast. These results shed light on the role of recombination and the evolutionary history of organellar inheritance in the green lineage.

    Similarity thresholds used in DNA sequence assembly from short reads can reduce the comparability of population histories across species

    Comparing inferences among datasets generated using short read sequencing may provide insight into the concerted impacts of divergence, gene flow and selection across organisms, but comparisons are complicated by biases introduced during dataset assembly. Sequence similarity thresholds allow the de novo assembly of short reads into clusters of alleles representing different loci, but the resulting datasets are sensitive to both the similarity threshold used and to the variation naturally present in the organism under study. Thresholds that require high sequence similarity among reads for assembly (stringent thresholds) as well as highly variable species may result in datasets in which divergent alleles are lost or divided into separate loci (‘over-splitting’), whereas liberal thresholds increase the risk of paralogous loci being combined into a single locus (‘under-splitting’). Comparisons among datasets or species are therefore potentially biased if different similarity thresholds are applied or if the species differ in levels of within-lineage genetic variation. We examine the impact of a range of similarity thresholds on assembly of empirical short read datasets from populations of four different non-model bird lineages (species or species pairs) with different levels of genetic divergence. We find that, in all species, stringent similarity thresholds result in fewer alleles per locus than more liberal thresholds, which appears to be the result of high levels of over-splitting. The frequency of putative under-splitting, conversely, is low at all thresholds. Inferred genetic distances between individuals, gene tree depths, and estimates of the ancestral mutation-scaled effective population size (θ) differ depending upon the similarity threshold applied. Relative differences in inferences across species differ even when the same threshold is applied, but may be dramatically different when datasets assembled under different thresholds are compared. These differences not only complicate comparisons across species, but also preclude the application of standard mutation rates for parameter calibration. We suggest some best practices for assembling short read data to maximize comparability, such as using more liberal thresholds and examining the impact of different thresholds on each dataset.

    Importance of incomplete lineage sorting and introgression in the origin of shared genetic variation between two closely related pines with overlapping distributions

    Genetic variation shared between closely related species may be due to retention of ancestral polymorphisms because of incomplete lineage sorting (ILS) and/or introgression following secondary contact. It is challenging to distinguish ILS and introgression because they generate similar patterns of shared genetic diversity, but this is nonetheless essential for inferring accurately the history of species with overlapping distributions. To address this issue, we sequenced 33 independent intron loci across the genome of two closely related pine species (Pinus massoniana Lamb. and Pinus hwangshanensis Hsia) from Southeast China. Population structure analyses revealed that the species showed slightly more admixture in parapatric populations than in allopatric populations. Levels of interspecific differentiation were lower in parapatry than in allopatry. Approximate Bayesian computation suggested that the most likely speciation scenario explaining this pattern was a long period of isolation followed by a secondary contact. Ecological niche modeling suggested that a gradual range expansion of P. hwangshanensis during the Pleistocene climatic oscillations could have been the cause of the overlap. Our study therefore suggests that secondary introgression, rather than ILS, explains most of the shared nuclear genomic variation between these two species and demonstrates the complementarity of population genetics and ecological niche modeling in understanding gene flow history. Finally, we discuss the importance of contrasting results from markers with different dynamics of migration, namely nuclear, chloroplast and mitochondrial DNA.

    The roles of history, geography, and environment in shaping landscape genetic variation and its applied significance

    The decline and loss of species and genetic diversity as a result of anthropogenic change is occurring at an unprecedented rate, reshaping biodiversity and restructuring ecosystems. Population genetic variation is shaped by evolutionary processes and in turn determines the evolutionary potential of natural populations. Facilitated by recent improvements in DNA sequencing technologies, population genomic analyses can resolve patterns of genetic differentiation and evolutionary history, characterize the effects of evolutionary processes on genome variation, and facilitate an understanding of how response to environmental variation may underlie local adaptation. Such analyses can inform conservation and restoration by establishing baseline patterns of genetic variation across the landscape, recognizing evolutionarily significant units, sourcing propagules for restoration, and predicting species response to changing environmental conditions. Here, I applied high throughput DNA sequencing approaches to characterize the historical, spatial, and environmental factors shaping genetic variation in several systems of conservation and restoration significance. First, I investigated hierarchical genetic structure and evolutionary history of Hucho taimen (taimen, the world’s largest salmonid), listed as vulnerable by the International Union for Conservation of Nature (IUCN), across multiple river basins in Russia and Mongolia. Second, I characterized patterns of emergent population genetic structure of nonnative Oncorhynchus mykiss (rainbow trout) in the Lake Tahoe basin to inform reintroduction of the native Lahontan cutthroat trout (Oncorhynchus clarkii henshawi), which is listed under the U.S. Endangered Species Act. Rainbow trout have been widely introduced across the globe and stocked into Lake Tahoe for >50 years, and an understanding of their population genetic structure may help inform strategies for successful native species reintroduction. Finally, I quantified spatial genetic structure, identified environmental variables potentially involved in local adaptation, and predicted variation in maladaptation under projected climate change across the range of Pinus muricata, a closed-cone pine occurring in a small number of isolated and disjunct stands along the coast of California, and also listed as vulnerable by the IUCN. Collectively, my research highlights the wide utility of population genomic analyses for taxa of conservation and restoration significance.

    LIVING ON THE EDGE: A COMPARATIVE PHYLOGEOGRAPHIC STUDY OF REFUGIAL AND INSULAR FRAGMENTATION

    Pleistocene glacial-interglacial cycles resulted in population isolation that led to inter- and intraspecific genetic divergence in many North American species. The magnitude of isolation also influenced species response to these climatic changes and set the stage for contemporary gene flow. We can refine our understanding of species response to historical climate change by identifying regions of ice-free persistence and refugia during glacial maxima, and geographic locations and genetic dynamics of post-glacial secondary contact. This dissertation examines the role of glacial cover, geographic barriers, habitat fragmentation as a result of changes in sea level, and insularity on the contemporary genetic structure of three widespread, co-distributed, and ecologically distinct small mammals across western North America, with emphasis on the Pacific Northwest. Previous work on long-tailed voles (Microtus longicaudus), northwestern deer mice (Peromyscus keeni), and dusky shrews (Sorex monticolus) was used to formulate hypotheses of geographic distribution of genetic variation, timing of divergence, and regions of glacial persistence. This dissertation uses multilocus genetic data and historical climatic conditions to address these hypotheses. I identify regions of glacial persistence, the effects of historical sea levels on island connectivity, and regions of post-glacial secondary contact of divergent lineages within M. longicaudus, P. keeni and S. monticolus. Additionally, I assess levels of endemism for the islands of Southeast Alaska. The collective findings of this dissertation improve our understanding of effects of historical range fragmentation and insularity on contemporary genetic diversity.