Coalescent histories for lodgepole species trees
Coalescent histories are combinatorial structures that describe, for a given
gene tree and species tree, the possible lists of branches of the species tree
on which the gene tree coalescences take place. Properties of the number of
coalescent histories for gene trees and species trees affect a variety of
probabilistic calculations in mathematical phylogenetics. Exact and asymptotic
evaluations of the number of coalescent histories, however, are known only in a
limited number of cases. Here we introduce a particular family of species
trees, the lodgepole species trees. We determine the number of coalescent
histories for the lodgepole species trees in the case that the gene tree
matches the species tree, and we characterize how this number grows with the
number of taxa. This computation demonstrates the existence of tree families in
which the growth in the number of coalescent histories is faster than
exponential. Further, it substantially improves the previous lower bound on the
ratio of the largest number of matching coalescent histories to the smallest
number of matching coalescent histories among trees with a given number of
taxa. We discuss the implications of our enumerative results for phylogenetic
computations.
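A minimal sketch, not the authors' derivation for lodgepole trees: for a matching gene tree and species tree, each gene tree coalescence can be placed on the edge directly above the corresponding species tree node or on any edge above it, provided that it lies on or below the edge chosen for its parent coalescence. Counting these placements gives a short recursion. The nested-tuple encoding and the names `count_matching_histories`, `caterpillar` and `catalan` are illustrative assumptions; the check uses the known result that a matching caterpillar pair with n taxa has the (n-1)th Catalan number of coalescent histories.

```python
from functools import lru_cache
from math import comb

# Illustrative encoding (assumption): a leaf is a string label and an
# internal node is a 2-tuple (left, right). The gene tree equals the species tree.


def count_matching_histories(tree):
    """Count coalescent histories for a matching gene tree/species tree pair."""

    @lru_cache(maxsize=None)
    def ways(node, j):
        # j = number of species-tree edges available to this node's coalescence:
        # the edge directly above the node plus the edges up to its parent's choice.
        if isinstance(node, str):
            return 1  # a leaf has no coalescence to place
        left, right = node
        total = 0
        for i in range(j):
            # Placing this coalescence i edges above its own edge leaves each
            # child with i + 2 options (its own edge plus i + 1 edges above it).
            total += ways(left, i + 2) * ways(right, i + 2)
        return total

    # The root coalescence can only occur on the root edge.
    return ways(tree, 1)


def caterpillar(n):
    """Caterpillar tree on labels t1..tn, e.g. (((t1, t2), t3), t4) for n = 4."""
    tree = ("t1", "t2")
    for k in range(3, n + 1):
        tree = (tree, f"t{k}")
    return tree


def catalan(k):
    return comb(2 * k, k) // (k + 1)


for n in range(2, 8):
    # each line shows n, the recursive count, and the Catalan number C(n-1)
    print(n, count_matching_histories(caterpillar(n)), catalan(n - 1))
```

For the balanced four-taxon tree `(("a", "b"), ("c", "d"))` the same function returns 4, fewer than the 5 histories of the four-taxon caterpillar.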
A lattice structure for ancestral configurations arising from the relationship between gene trees and species trees
To a given gene tree topology and species tree topology with the same number of
leaves, labeled bijectively from a fixed label set, one can associate a set of
ancestral configurations, each of which encodes a set of gene lineages that can
be found at a given node of a species tree. We introduce a lattice structure on
ancestral configurations, studying the directed graphs that provide graphical
representations of lattices of ancestral configurations. For a matching gene
tree topology and species tree topology, we present a method for defining the
digraph of ancestral configurations from the tree topology by using iterated
Cartesian products of graphs. We show that a specific set of paths on the
digraph of ancestral configurations is in bijection with the set of labeled
histories, a well-known phylogenetic object that enumerates possible temporal
orderings of the coalescences of a tree. For each of a series of tree families,
we obtain closed-form expressions for the number of labeled histories by using
this bijection to count paths on associated digraphs. Finally, we prove that
our lattice construction extends to nonmatching tree pairs, and we use it to
characterize the pairs that have the maximal number of ancestral
configurations for a fixed number of leaves. We discuss how the construction
provides new methods for enumerating combinatorial aspects of gene trees and
species trees.
Comment: 20 pages, 15 figures. This version contains reference updates, a first
author name update, and minor changes to the text.
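The abstract counts labeled histories through paths in a digraph of ancestral configurations. As an independent cross-check (a standard counting argument, not the paper's lattice construction), labeled histories of a rooted binary tree can also be counted directly: the coalescence orders of the two subtrees below the root interleave freely, and the total agrees with the closed form (n-1)! divided by the product of (n_v - 1) over internal nodes v, where n_v is the number of leaves below v. The nested-tuple encoding and the function names are assumptions for illustration.

```python
from math import comb, factorial

# Illustrative encoding (assumption): leaf = string, internal node = 2-tuple.


def histories_and_size(tree):
    """Return (labeled histories, number of internal nodes) for a rooted binary tree."""
    if isinstance(tree, str):
        return 1, 0
    h_left, k_left = histories_and_size(tree[0])
    h_right, k_right = histories_and_size(tree[1])
    # Orderings inside each subtree are independent; the ways to interleave
    # the two sequences of coalescences are counted by a binomial coefficient.
    return h_left * h_right * comb(k_left + k_right, k_left), k_left + k_right + 1


def labeled_histories(tree):
    return histories_and_size(tree)[0]


def labeled_histories_closed_form(tree):
    """(n - 1)! divided by the product of (leaves below v) - 1 over internal nodes v."""
    denominator = 1

    def leaves(t):
        nonlocal denominator
        if isinstance(t, str):
            return 1
        n_v = leaves(t[0]) + leaves(t[1])
        denominator *= n_v - 1
        return n_v

    n = leaves(tree)
    return factorial(n - 1) // denominator


balanced8 = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))
print(labeled_histories(balanced8), labeled_histories_closed_form(balanced8))  # prints 80 80
```

A caterpillar has exactly one labeled history, since its coalescences are totally ordered by ancestry; balanced shapes admit many more.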
Enumeration of coalescent histories for caterpillar species trees and p-pseudocaterpillar gene trees
For a fixed set of taxon labels, an ordered pair consisting of a gene tree
topology and a species tree, both bijectively labeled with those labels,
possesses a set of coalescent histories: mappings from the set of internal
nodes of the gene tree to the set of edges of the species tree, describing
possible lists of edges in the species tree on which the coalescences in the
gene tree take place. Enumerations of coalescent histories for gene trees and
species trees have produced suggestive results regarding the gene tree/species
tree pairs that, for a fixed number of taxa, have the largest number of
coalescent histories. We define a class of 2-cherry binary tree topologies that
we term p-pseudocaterpillars, examining coalescent histories for non-matching
pairs in which the species tree has a caterpillar shape and the gene tree has a
p-pseudocaterpillar shape. Using a construction that associates coalescent
histories for such pairs with a class of "roadblocked" monotonic paths, we
identify the p-pseudocaterpillar labeled gene tree topology that, for a fixed
caterpillar labeled species tree topology, gives rise to the largest number of
coalescent histories. The shape that maximizes the number of coalescent
histories places the "second" cherry of the p-pseudocaterpillar equidistantly
from the root of the "first" cherry and from the tree root. A symmetry in the
numbers of coalescent histories for p-pseudocaterpillar gene trees and
caterpillar species trees is seen to exist around the maximizing value of the
parameter p. The results provide insight into the factors that influence the
number of coalescent histories possible for a given gene tree and species tree.
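The definition of coalescent histories given above can be turned into a direct count. This sketch does not reproduce the paper's roadblocked-path construction or its p-pseudocaterpillar results; it enumerates histories for any gene tree/species tree pair by recursion, under the standard rules that each gene tree coalescence lies on an edge at or above the species tree MRCA of its labels and on or below the edge chosen for its parent. Edges are identified by the label set of the species tree node directly below them; this encoding and all function names are assumptions for illustration.

```python
from functools import lru_cache

# Illustrative encoding (assumption): a leaf is a string label and an internal
# node is a 2-tuple (left, right). Both trees carry the same set of labels.


def leaf_set(t):
    if isinstance(t, str):
        return frozenset([t])
    return leaf_set(t[0]) | leaf_set(t[1])


def species_edges(t):
    """Label sets of internal species-tree nodes; each names the edge directly above that node."""
    if isinstance(t, str):
        return []
    return [leaf_set(t)] + species_edges(t[0]) + species_edges(t[1])


def count_histories(gene_tree, species_tree):
    edges = species_edges(species_tree)

    @lru_cache(maxsize=None)
    def count(g, parent_edge):
        if isinstance(g, str):
            return 1  # leaves carry no coalescence
        labels = leaf_set(g)
        total = 0
        for e in edges:
            # e must lie at or above the species-tree MRCA of g's labels
            # (e contains all of them) and at or below the parent's edge.
            if labels <= e and (parent_edge is None or e <= parent_edge):
                total += count(g[0], e) * count(g[1], e)
        return total

    return count(gene_tree, None)


species = ((("A", "B"), "C"), "D")  # caterpillar species tree
gene = (("A", "B"), ("C", "D"))      # non-matching gene tree
print(count_histories(gene, species))  # prints 3
```

With the caterpillar as both gene tree and species tree the function returns 5, so the non-matching balanced gene tree has fewer histories here. Label-set containment stands in for ancestry because, with bijective labeling, each species tree node has a distinct cluster.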
Organellar inheritance in the green lineage: insights from Ostreococcus tauri
Along the green lineage (Chlorophyta and Streptophyta), mitochondria and chloroplasts are mainly uniparentally transmitted, and their evolution is thus clonal. The mode of organellar inheritance in the ancestor of this lineage is less certain. The difficulty of drawing clear phylogenetic inferences is partly due to a lack of information on deep-branching organisms in this lineage. Here, we investigate organellar evolution in the early-branching green alga Ostreococcus tauri using population genomics data from the complete mitochondrial and chloroplast genomes. The haplotype structure is consistent with clonal evolution in mitochondria, whereas we find evidence for recombination in the chloroplast genome. The number of recombination events in the genealogy of the chloroplast suggests that recombination, and thus biparental inheritance, is not rare. Consistent with the evidence for recombination, we find that the ratio of the number of nonsynonymous to synonymous polymorphisms per site is lower in the chloroplast genome than in the mitochondrial genome. We also find evidence for the segregation of two selfish genetic elements in the chloroplast. These results shed light on the role of recombination and on the evolutionary history of organellar inheritance in the green lineage.
Similarity thresholds used in DNA sequence assembly from short reads can reduce the comparability of population histories across species
Comparing inferences among datasets generated using short read sequencing may provide insight into the concerted impacts of divergence, gene flow and selection across organisms, but comparisons are complicated by biases introduced during dataset assembly. Sequence similarity thresholds allow the de novo assembly of short reads into clusters of alleles representing different loci, but the resulting datasets are sensitive both to the similarity threshold used and to the variation naturally present in the organism under study. Thresholds that require high sequence similarity among reads for assembly (stringent thresholds), as well as high levels of variation within a species, may result in datasets in which divergent alleles are lost or divided into separate loci (‘over-splitting’), whereas liberal thresholds increase the risk of paralogous loci being combined into a single locus (‘under-splitting’). Comparisons among datasets or species are therefore potentially biased if different similarity thresholds are applied or if the species differ in levels of within-lineage genetic variation. We examine the impact of a range of similarity thresholds on assembly of empirical short read datasets from populations of four different non-model bird lineages (species or species pairs) with different levels of genetic divergence. We find that, in all species, stringent similarity thresholds result in fewer alleles per locus than more liberal thresholds, which appears to be the result of high levels of over-splitting. The frequency of putative under-splitting, conversely, is low at all thresholds. Inferred genetic distances between individuals, gene tree depths, and estimates of the ancestral mutation-scaled effective population size (θ) differ depending upon the similarity threshold applied. Relative differences in inferences across species vary even when the same threshold is applied, but they may be dramatically larger when datasets assembled under different thresholds are compared.
These differences not only complicate comparisons across species, but also preclude the application of standard mutation rates for parameter calibration. We suggest some best practices for assembling short read data to maximize comparability, such as using more liberal thresholds and examining the impact of different thresholds on each dataset.
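A toy illustration of over-splitting, assuming a simple greedy clustering rule (each read joins the first cluster whose seed sequence meets the identity threshold) and ungapped, equal-length reads. This is not the assembly pipeline used in the study, real assemblers use their own alignment and clustering procedures, and the sequences below are made up. Two alleles of one locus that differ at 2 of 20 sites (90% identity) stay together at a liberal 0.85 threshold but are split into two putative loci at a stringent 0.95 threshold.

```python
def identity(a, b):
    """Fraction of identical sites; this toy version assumes ungapped, equal-length reads."""
    if len(a) != len(b):
        raise ValueError("toy identity requires equal-length reads")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(reads, threshold):
    """Assign each read to the first cluster whose seed meets the threshold, else start a new cluster."""
    seeds, clusters = [], []
    for read in reads:
        for i, seed in enumerate(seeds):
            if identity(read, seed) >= threshold:
                clusters[i].append(read)
                break
        else:
            # no existing seed is similar enough: this read seeds a new putative locus
            seeds.append(read)
            clusters.append([read])
    return clusters


allele1 = "ACGTACGTACGTACGTACGT"
allele2 = "T" + allele1[1:10] + "A" + allele1[11:]  # differs from allele1 at 2 of 20 sites
reads = [allele1, allele2, allele1]

for threshold in (0.85, 0.95):
    print(threshold, len(greedy_cluster(reads, threshold)))
```

At 0.85 all three reads form one cluster; at 0.95 the second allele seeds its own cluster, the single-locus analogue of the over-splitting described above.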
Importance of incomplete lineage sorting and introgression in the origin of shared genetic variation between two closely related pines with overlapping distributions
Genetic variation shared between closely related species may be due to retention of ancestral polymorphisms because of incomplete lineage sorting (ILS) and/or introgression following secondary contact. It is challenging to distinguish ILS from introgression because they generate similar patterns of shared genetic diversity, but doing so is nonetheless essential for accurately inferring the history of species with overlapping distributions. To address this issue, we sequenced 33 independent intron loci across the genomes of two closely related pine species (Pinus massoniana Lamb. and Pinus hwangshanensis Hsia) from Southeast China. Population structure analyses revealed that the species showed slightly more admixture in parapatric populations than in allopatric populations. Levels of interspecific differentiation were lower in parapatry than in allopatry. Approximate Bayesian computation suggested that the most likely speciation scenario explaining this pattern was a long period of isolation followed by secondary contact. Ecological niche modeling suggested that a gradual range expansion of P. hwangshanensis during the Pleistocene climatic oscillations could have caused the overlap. Our study therefore suggests that secondary introgression, rather than ILS, explains most of the shared nuclear genomic variation between these two species, and it demonstrates the complementarity of population genetics and ecological niche modeling in understanding gene flow history. Finally, we discuss the importance of contrasting results from markers with different dynamics of migration, namely nuclear, chloroplast and mitochondrial DNA.
The roles of history, geography, and environment in shaping landscape genetic variation and its applied significance
The decline and loss of species and genetic diversity as a result of anthropogenic change is occurring at an unprecedented rate, reshaping biodiversity and restructuring ecosystems. Population genetic variation is shaped by evolutionary processes and in turn determines the evolutionary potential of natural populations. Facilitated by recent improvements in DNA sequencing technologies, population genomic analyses can resolve patterns of genetic differentiation and evolutionary history, characterize the effects of evolutionary processes on genome variation, and facilitate an understanding of how responses to environmental variation may underlie local adaptation. Such analyses can inform conservation and restoration by establishing baseline patterns of genetic variation across the landscape, recognizing evolutionarily significant units, sourcing propagules for restoration, and predicting species' responses to changing environmental conditions. Here, I applied high-throughput DNA sequencing approaches to characterize the historical, spatial, and environmental factors shaping genetic variation in several systems of conservation and restoration significance. First, I investigated the hierarchical genetic structure and evolutionary history of Hucho taimen (taimen, the world’s largest salmonid), listed as vulnerable by the International Union for Conservation of Nature (IUCN), across multiple river basins in Russia and Mongolia. Second, I characterized patterns of emergent population genetic structure of nonnative Oncorhynchus mykiss (rainbow trout) in the Lake Tahoe basin to inform reintroduction of the native Lahontan cutthroat trout (Oncorhynchus clarkii henshawi), which is listed under the U.S. Endangered Species Act. Rainbow trout have been widely introduced across the globe and have been stocked into Lake Tahoe for more than 50 years, and an understanding of their population genetic structure may help inform strategies for successful reintroduction of native species.
Finally, I quantified spatial genetic structure, identified environmental variables potentially involved in local adaptation, and predicted variation in maladaptation under projected climate change across the range of Pinus muricata, a closed-cone pine that occurs in a small number of isolated and disjunct stands along the coast of California and is also listed as vulnerable by the IUCN. Collectively, my research highlights the wide utility of population genomic analyses for taxa of conservation and restoration significance.
LIVING ON THE EDGE: A COMPARATIVE PHYLOGEOGRAPHIC STUDY OF REFUGIAL AND INSULAR FRAGMENTATION
Pleistocene glacial-interglacial cycles resulted in population isolation that led to inter- and intraspecific genetic divergence in many North American species. The magnitude of isolation also influenced species' responses to these climatic changes and set the stage for contemporary gene flow. We can refine our understanding of species' responses to historical climate change by identifying regions of ice-free persistence and refugia during glacial maxima, as well as the geographic locations and genetic dynamics of post-glacial secondary contact. This dissertation examines the roles of glacial cover, geographic barriers, habitat fragmentation resulting from changes in sea level, and insularity in shaping the contemporary genetic structure of three widespread, co-distributed, and ecologically distinct small mammals across western North America, with emphasis on the Pacific Northwest. Previous work on long-tailed voles (Microtus longicaudus), northwestern deer mice (Peromyscus keeni), and dusky shrews (Sorex monticolus) was used to formulate hypotheses about the geographic distribution of genetic variation, the timing of divergence, and regions of glacial persistence. This dissertation uses multilocus genetic data and historical climatic conditions to address these hypotheses. I identify regions of glacial persistence, the effects of historical sea levels on island connectivity, and regions of post-glacial secondary contact between divergent lineages within M. longicaudus, P. keeni and S. monticolus. Additionally, I assess levels of endemism on the islands of Southeast Alaska. The collective findings of this dissertation improve our understanding of the effects of historical range fragmentation and insularity on contemporary genetic diversity.
- …