68 research outputs found

    Coalescent histories for lodgepole species trees

    Coalescent histories are combinatorial structures that describe, for a given gene tree and species tree, the possible lists of branches of the species tree on which the gene tree coalescences take place. Properties of the number of coalescent histories for gene trees and species trees affect a variety of probabilistic calculations in mathematical phylogenetics. Exact and asymptotic evaluations of the number of coalescent histories, however, are known only in a limited number of cases. Here we introduce a particular family of species trees, the lodgepole species trees $(\lambda_n)_{n\geq 0}$, in which tree $\lambda_n$ has $m=2n+1$ taxa. We determine the number of coalescent histories for the lodgepole species trees, in the case that the gene tree matches the species tree, showing that this number grows with $m!!$ in the number of taxa $m$. This computation demonstrates the existence of tree families in which the growth in the number of coalescent histories is faster than exponential. Further, it provides a substantial improvement on the lower bound for the ratio of the largest number of matching coalescent histories to the smallest number of matching coalescent histories for trees with $m$ taxa, increasing a previous bound of $(\sqrt{\pi}/32)[(5m-12)/(4m-6)]\, m\sqrt{m}$ to $[\sqrt{m-1}/(4\sqrt{e})]^{m}$. We discuss the implications of our enumerative results for phylogenetic computations.
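
The growth rates quoted in this abstract can be tabulated numerically. The following is a minimal sketch, not code from the paper: the function names are illustrative assumptions, and the script only evaluates $m!!$ and the two ratio bounds exactly as written in the abstract. It does not compute coalescent histories themselves.

```python
import math


def double_factorial(m: int) -> int:
    """Return m!! = m * (m - 2) * (m - 4) * ... down to 1 or 2."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def previous_ratio_bound(m: int) -> float:
    """Earlier lower bound quoted in the abstract:
    (sqrt(pi) / 32) * [(5m - 12) / (4m - 6)] * m * sqrt(m)."""
    return (math.sqrt(math.pi) / 32) * ((5 * m - 12) / (4 * m - 6)) * m * math.sqrt(m)


def lodgepole_ratio_bound(m: int) -> float:
    """Lower bound quoted in the abstract for the lodgepole family:
    [sqrt(m - 1) / (4 * sqrt(e))]^m."""
    return (math.sqrt(m - 1) / (4 * math.sqrt(math.e))) ** m


if __name__ == "__main__":
    # Lodgepole trees have an odd number of taxa, m = 2n + 1.
    for n in (10, 25, 50, 100):
        m = 2 * n + 1
        print(
            f"m={m:4d}  m!!/2^m={double_factorial(m) / 2 ** m:.3e}  "
            f"previous={previous_ratio_bound(m):.3e}  "
            f"lodgepole={lodgepole_ratio_bound(m):.3e}"
        )
```

The new bound has base $\sqrt{m-1}/(4\sqrt{e})$, which is below 1 for small $m$, so the improvement is asymptotic: the polynomial bound is larger for small trees and is overtaken once $m$ is large enough.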

    Enumeration of coalescent histories for caterpillar species trees and $p$-pseudocaterpillar gene trees

    For a fixed set $X$ containing $n$ taxon labels, an ordered pair consisting of a gene tree topology $G$ and a species tree $S$ bijectively labeled with the labels of $X$ possesses a set of coalescent histories -- mappings from the set of internal nodes of $G$ to the set of edges of $S$ describing possible lists of edges in $S$ on which the coalescences in $G$ take place. Enumerations of coalescent histories for gene trees and species trees have produced suggestive results regarding the pairs $(G,S)$ that, for a fixed $n$, have the largest number of coalescent histories. We define a class of 2-cherry binary tree topologies that we term $p$-pseudocaterpillars, examining coalescent histories for non-matching pairs $(G,S)$, in the case in which $S$ has a caterpillar shape and $G$ has a $p$-pseudocaterpillar shape. Using a construction that associates coalescent histories for $(G,S)$ with a class of "roadblocked" monotonic paths, we identify the $p$-pseudocaterpillar labeled gene tree topology that, for a fixed caterpillar labeled species tree topology, gives rise to the largest number of coalescent histories. The shape that maximizes the number of coalescent histories places the "second" cherry of the $p$-pseudocaterpillar equidistantly from the root of the "first" cherry and from the tree root. A symmetry in the numbers of coalescent histories for $p$-pseudocaterpillar gene trees and caterpillar species trees is seen to exist around the maximizing value of the parameter $p$. The results provide insight into the factors that influence the number of coalescent histories possible for a given gene tree and species tree.

    A lattice structure for ancestral configurations arising from the relationship between gene trees and species trees

    To a given gene tree topology $G$ and species tree topology $S$ with leaves labeled bijectively from a fixed set $X$, one can associate a set of ancestral configurations, each of which encodes a set of gene lineages that can be found at a given node of a species tree. We introduce a lattice structure on ancestral configurations, studying the directed graphs that provide graphical representations of lattices of ancestral configurations. For a matching gene tree topology and species tree topology, we present a method for defining the digraph of ancestral configurations from the tree topology by using iterated cartesian products of graphs. We show that a specific set of paths on the digraph of ancestral configurations is in bijection with the set of labeled histories -- a well-known phylogenetic object that enumerates possible temporal orderings of the coalescences of a tree. For each of a series of tree families, we obtain closed-form expressions for the number of labeled histories by using this bijection to count paths on associated digraphs. Finally, we prove that our lattice construction extends to nonmatching tree pairs, and we use it to characterize pairs $(G,S)$ having the maximal number of ancestral configurations for a fixed $G$. We discuss how the construction provides new methods for performing enumerations of combinatorial aspects of gene and species trees. Comment: 20 pages, 15 figures. This version contains reference updates, first author name update, minor changes to the tex
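
For readers unfamiliar with labeled histories, the count for a small tree is easy to check by hand or by machine. The sketch below does not implement the digraph path-counting bijection described in this abstract. It instead uses the standard product formula, in which a rooted binary tree with $k$ internal nodes has $k! / \prod_v d(v)$ labeled histories, where $d(v)$ is the number of internal nodes in the subtree rooted at internal node $v$. The nested-tuple tree encoding and the function names are assumptions made for illustration.

```python
from math import factorial


def count_internal(tree) -> int:
    """Number of internal nodes in a rooted binary tree.

    A leaf is any non-tuple value; an internal node is a 2-tuple (left, right).
    """
    if not isinstance(tree, tuple):
        return 0
    left, right = tree
    return 1 + count_internal(left) + count_internal(right)


def labeled_histories(tree) -> int:
    """Count temporal orderings of the coalescences (internal nodes) of `tree`.

    Uses k! / prod_v d(v), where k is the total number of internal nodes and
    d(v) is the number of internal nodes in the subtree rooted at v.
    """
    denominator = 1
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):
            denominator *= count_internal(node)
            stack.extend(node)
    return factorial(count_internal(tree)) // denominator


if __name__ == "__main__":
    caterpillar = (((("a", "b"), "c"), "d"), "e")
    balanced = (("a", "b"), ("c", "d"))
    print(labeled_histories(caterpillar))  # a caterpillar has one ordering
    print(labeled_histories(balanced))     # the two cherries can coalesce in either order
```

A caterpillar admits a single ordering because each coalescence must precede the next one up the spine, whereas the balanced four-leaf tree admits two because its cherries are unordered relative to each other.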

    Anomaly zones for uniformly sampled gene trees under the gene duplication and loss model

    Recently, there has been interest in extending long-known results about the multispecies coalescent (MSC) tree to other models of gene trees. Results about the gene duplication and loss (GDL) model have mathematical proofs, including species tree identifiability, estimability, and sample complexity of popular algorithms like ASTRAL. Here, this work is continued by characterizing the anomaly zones of uniformly sampled gene trees. The anomaly zone for species trees is the set of parameters where some discordant gene tree occurs with the maximal probability. The detection of anomalous gene trees is an important problem in phylogenomics, as their presence renders otherwise effective estimation methods positively misleading. Under the multispecies coalescent, anomaly zones are known to exist for rooted species trees with as few as four species. The gene duplication and loss process is a generalization of the generalized linear birth-death process to the rooted species tree, where each edge is treated as a single timeline with exponential-rate duplication and loss. The methods and results come from a detailed probabilistic analysis of trajectories observed from this stochastic process. It is shown that anomaly zones do not exist for rooted GDL balanced trees on four species, but they may exist for rooted caterpillar trees, as with the MSC. Comment: 26 pages, 1 figure

    Resolving the Systematics of Acronictinae (Lepidoptera, Noctuidae), the Evolution of Larval Defenses, and Tracking the Gain/Loss of Complex Courtship Structures in Noctuidae

    Moths and caterpillars of the noctuid genus Acronicta Ochsenheimer, 1816, widely known as dagger moths, have captured the imagination of taxonomists for centuries. Morphologically enigmatic adults and highly variable larvae prompted A. R. Grote to proclaim, "There would seem to be no genus which offers a more interesting field to the biologist for exploration" (1895). Without known synapomorphies for Acronicta, or the subfamily Acronictinae, their circumscriptions have changed over time. This dissertation delves into the taxonomic history of these taxa, setting the stage for a worldwide phylogenetic analysis of Acronictinae. The diversity of larval forms is considered in a tri-trophic framework, quantifying bottom-up (host plant) and top-down (predator) effects through measures of diet breadth, morphology, and behavior, all in a phylogenetic context. Adult courtship structures, present in some acronictine species, are scored across the family Noctuidae, to aid in the study of the evolution of complex morphological traits.

    Topology of genealogical trees - theory and application

    Identifying loci that underwent recent selective sweeps remains a difficult task, since traces are typically obscured by other evolutionary and demographic factors, such as genetic drift or population bottleneck events. To detect candidate loci of selective sweeps, we take an approach that considers the genealogical relationships among individuals and the topological properties of the inferred coalescent tree. Selective sweeps can produce highly unbalanced coalescent tree topologies in regions close to a selective sweep site. Building on a previously known test statistic called T3, which detects bias in the balance of binary genealogical trees, we derive a new test statistic based on a log-likelihood approach, which we call the LR_T3-test. We present the results of genome-wide screens of the LR_T3-test applied to the 26 populations of the phase 3 data set of the human 1000 Genomes Project. Furthermore, we present a measure of topological linkage disequilibrium (tLD), which is based on clustering individuals with respect to their position in the genealogy rather than clustering alleles and haplotypes. We demonstrate its application to the previously processed human data.

    Supertree-like methods for genome-scale species tree estimation

    A critical step in many biological studies is the estimation of evolutionary trees (phylogenies) from genomic data. Of particular interest is the species tree, which illustrates how a set of species evolved from a common ancestor. While species trees were previously estimated from a few regions of the genome (genes), it is now widely recognized that biological processes can cause the evolutionary histories of individual genes to differ from each other and from the species tree. This heterogeneity across the genome is phylogenetic signal that can be leveraged to estimate species evolution with greater accuracy. Hence, species tree estimation is expected to be greatly aided by current large-scale sequencing efforts, including the 5000 Insect Genomes Project, the 10000 Plant Genomes Project, the (~60000) Vertebrate Genomes Project, and the Earth BioGenome Project, which aims to assemble genomes (or at least genome-scale data) for 1.5 million eukaryotic species in the next ten years. To analyze these forthcoming datasets, species tree estimation methods must scale to thousands of species and tens of thousands of genes; however, many of the current leading methods, which are heuristics for NP-hard optimization problems, can be prohibitively expensive on datasets of this size. In this dissertation, we argue that new methods are needed to enable scalable and statistically rigorous species tree estimation pipelines; we then seek to address this challenge through the introduction of three supertree-like methods: NJMerge, TreeMerge, and FastMulRFS. For these methods, we present theoretical results (worst-case running time analyses and proofs of statistical consistency) as well as empirical results on simulated datasets (and a fungal dataset for FastMulRFS). Overall, these methods enable statistically consistent species tree estimation pipelines that achieve comparable accuracy to the dominant optimization-based approaches while dramatically reducing running time.

    Insects as a model to puzzle out mechanisms of lineage diversification in the Indomalayan / Australasian archipelago


    A global phylogeny of butterflies reveals their evolutionary history, ancestral hosts and biogeographic origins

    Butterflies are a diverse and charismatic insect group that are thought to have evolved with plants and dispersed throughout the world in response to key geological events. However, these hypotheses have not been extensively tested because a comprehensive phylogenetic framework and datasets for butterfly larval hosts and global distributions are lacking. We sequenced 391 genes from nearly 2,300 butterfly species, sampled from 90 countries and 28 specimen collections, to reconstruct a new phylogenomic tree of butterflies representing 92% of all genera. Our phylogeny has strong support for nearly all nodes and demonstrates that at least 36 butterfly tribes require reclassification. Divergence time analyses imply an origin approximately 100 million years ago for butterflies and indicate that all but one family were present before the K/Pg extinction event. We aggregated larval host datasets and global distribution records and found that butterflies are likely to have first fed on Fabaceae and originated in what is now the Americas. Soon after the Cretaceous Thermal Maximum, butterflies crossed Beringia and diversified in the Palaeotropics. Our results also reveal that most butterfly species are specialists that feed on only one larval host plant family. However, generalist butterflies that consume two or more plant families usually feed on closely related plants.

    The genome of the truffle-parasite Tolypocladium ophioglossoides and the evolution of antifungal peptaibiotics

    Background: Two major mycoparasitic lineages, the family Hypocreaceae and the genus Tolypocladium, exist within the fungal order Hypocreales. Peptaibiotics are a group of secondary metabolites almost exclusively described from Trichoderma species of Hypocreaceae. Peptaibiotics are produced by nonribosomal peptide synthetases (NRPSs) and have antibiotic and antifungal activities. Tolypocladium species are mainly truffle parasites, but a few species are insect pathogens. Results: The draft genome sequence of the truffle parasite Tolypocladium ophioglossoides was generated and numerous secondary metabolite clusters were discovered, many of which have no known putative product. However, three large peptaibiotic gene clusters were identified using phylogenetic analyses. Peptaibiotic genes are absent from the predominantly plant and insect pathogenic lineages of Hypocreales, and are therefore exclusive to the largely mycoparasitic lineages. Using NRPS adenylation domain phylogenies and reconciliation of the domain tree with the organismal phylogeny, it is demonstrated that the distribution of these domains is likely not the product of horizontal gene transfer between mycoparasitic lineages, but represents independent losses in insect pathogenic lineages. Peptaibiotic genes are less conserved between species of Tolypocladium and are the product of complex patterns of lineage sorting and module duplication. In contrast, these genes are more conserved within the genus Trichoderma and consistent with diversification through speciation. Conclusions: Peptaibiotic NRPS genes are restricted to mycoparasitic lineages of Hypocreales, based on current sampling. Phylogenomics and comparative genomics can provide insights into the evolution of secondary metabolite genes, their distribution across a broader range of taxa, and their possible function related to host specificity. http://deepblue.lib.umich.edu/bitstream/2027.42/112062/1/12864_2015_Article_1777.pd