68 research outputs found
Coalescent histories for lodgepole species trees
Coalescent histories are combinatorial structures that describe for a given
gene tree and species tree the possible lists of branches of the species tree
on which the gene tree coalescences take place. Properties of the number of
coalescent histories for gene trees and species trees affect a variety of
probabilistic calculations in mathematical phylogenetics. Exact and asymptotic
evaluations of the number of coalescent histories, however, are known only in a
limited number of cases. Here we introduce a particular family of species
trees, the lodgepole species trees, in which
tree has taxa. We determine the number of coalescent
histories for the lodgepole species trees, in the case that the gene tree
matches the species tree, showing that this number grows with in the
number of taxa . This computation demonstrates the existence of tree
families in which the growth in the number of coalescent histories is faster
than exponential. Further, it provides a substantial improvement on the lower
bound for the ratio of the largest number of matching coalescent histories to
the smallest number of matching coalescent histories for trees with taxa,
increasing a previous bound of
to . We discuss the implications of our
enumerative results for phylogenetic computations.
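For the matching case discussed in this abstract, small counts can be checked with a short recursion. The sketch below is illustrative and is not taken from the paper. It assumes rooted binary tree shapes encoded as nested tuples (a hypothetical encoding), and it counts histories in which the root of a subtree may coalesce on any of `k` branches: the branch directly above it or one of `k - 1` more ancestral branches. The names `extended_histories`, `matching_coalescent_histories`, and `caterpillar` are invented for this example. For caterpillar shapes the counts should reproduce the Catalan numbers, which is the known result for matching caterpillar trees.

```python
from functools import lru_cache
from math import comb

# Hypothetical encoding: a leaf is the empty tuple, an internal node is (left, right).
LEAF = ()


@lru_cache(maxsize=None)
def extended_histories(tree, k):
    """Count histories for a matching subtree whose root has k allowed branches."""
    if tree == LEAF:
        return 1
    left, right = tree
    total = 0
    # If the root coalesces on the d-th allowed branch, each child root can use
    # its own branch plus the d branches up to and including that one.
    for d in range(1, k + 1):
        total += extended_histories(left, d + 1) * extended_histories(right, d + 1)
    return total


def matching_coalescent_histories(tree):
    """Number of coalescent histories when the gene tree matches the species tree."""
    # The root of the whole tree can only coalesce on the branch above the root.
    return extended_histories(tree, 1)


def caterpillar(n):
    """Caterpillar shape with n leaves (n >= 1)."""
    tree = LEAF
    for _ in range(n - 1):
        tree = (tree, LEAF)
    return tree


def catalan(m):
    """m-th Catalan number."""
    return comb(2 * m, m) // (m + 1)


if __name__ == "__main__":
    for n in range(2, 9):
        print(n, matching_coalescent_histories(caterpillar(n)), catalan(n - 1))
```

The same function accepts any binary shape, for example `((LEAF, LEAF), (LEAF, LEAF))` for the balanced four-taxon tree, so it can be used to compare small tree families by hand before consulting closed-form results.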
Enumeration of coalescent histories for caterpillar species trees and -pseudocaterpillar gene trees
For a fixed set containing taxon labels, an ordered pair consisting
of a gene tree topology and a species tree bijectively labeled with the
labels of possesses a set of coalescent histories -- mappings from the set
of internal nodes of to the set of edges of describing possible lists
of edges in on which the coalescences in take place. Enumerations of
coalescent histories for gene trees and species trees have produced suggestive
results regarding the pairs that, for a fixed , have the largest
number of coalescent histories. We define a class of 2-cherry binary tree
topologies that we term -pseudocaterpillars, examining coalescent histories
for non-matching pairs , in the case in which has a caterpillar
shape and has a -pseudocaterpillar shape. Using a construction that
associates coalescent histories for with a class of "roadblocked"
monotonic paths, we identify the -pseudocaterpillar labeled gene tree
topology that, for a fixed caterpillar labeled species tree topology, gives
rise to the largest number of coalescent histories. The shape that maximizes
the number of coalescent histories places the "second" cherry of the
-pseudocaterpillar equidistantly from the root of the "first" cherry and
from the tree root. A symmetry in the numbers of coalescent histories for
-pseudocaterpillar gene trees and caterpillar species trees is seen to exist
around the maximizing value of the parameter . The results provide insight
into the factors that influence the number of coalescent histories possible for
a given gene tree and species tree.
A lattice structure for ancestral configurations arising from the relationship between gene trees and species trees
To a given gene tree topology and species tree topology with leaves
labeled bijectively from a fixed set , one can associate a set of ancestral
configurations, each of which encodes a set of gene lineages that can be found
at a given node of a species tree. We introduce a lattice structure on
ancestral configurations, studying the directed graphs that provide graphical
representations of lattices of ancestral configurations. For a matching gene
tree topology and species tree topology, we present a method for defining the
digraph of ancestral configurations from the tree topology by using iterated
cartesian products of graphs. We show that a specific set of paths on the
digraph of ancestral configurations is in bijection with the set of labeled
histories -- a well-known phylogenetic object that enumerates possible temporal
orderings of the coalescences of a tree. For each of a series of tree families,
we obtain closed-form expressions for the number of labeled histories by using
this bijection to count paths on associated digraphs. Finally, we prove that
our lattice construction extends to nonmatching tree pairs, and we use it to
characterize pairs having the maximal number of ancestral
configurations for a fixed . We discuss how the construction provides new
methods for performing enumerations of combinatorial aspects of gene and
species trees.
Comment: 20 pages, 15 figures. This version contains reference updates, first
author name update, minor changes to the tex
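As a point of comparison for the labeled-history counts mentioned in this abstract, the sketch below uses the standard product formula rather than the lattice-path bijection the authors describe. For a rooted binary labeled topology with n leaves, the number of labeled histories is (n - 1)! divided by the product, over internal nodes, of the number of internal nodes in the subtree rooted at each one. The nested-tuple encoding and the function names are assumptions made for this example.

```python
from math import factorial

# Hypothetical encoding: a leaf is the empty tuple, an internal node is (left, right).
LEAF = ()


def _internal_count_and_product(tree):
    """Return (number of internal nodes, product of subtree internal-node counts)."""
    if tree == LEAF:
        return 0, 1
    left, right = tree
    n_left, p_left = _internal_count_and_product(left)
    n_right, p_right = _internal_count_and_product(right)
    n_here = n_left + n_right + 1
    return n_here, p_left * p_right * n_here


def labeled_histories(tree):
    """Number of temporal orderings of the coalescences of a binary tree."""
    n_internal, product = _internal_count_and_product(tree)
    # A binary tree with n leaves has n - 1 internal nodes, so this is (n - 1)!.
    return factorial(n_internal) // product


if __name__ == "__main__":
    cherry = (LEAF, LEAF)
    balanced4 = (cherry, cherry)
    caterpillar4 = (((LEAF, LEAF), LEAF), LEAF)
    print(labeled_histories(caterpillar4), labeled_histories(balanced4))
```

A caterpillar has a single labeled history because its coalescences are totally ordered, while the balanced four-taxon tree has two, since either cherry can coalesce first.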
Anomaly zones for uniformly sampled gene trees under the gene duplication and loss model
Recently, there has been interest in extending long-known results about the
multispecies coalescent tree to other models of gene trees. Results about the
gene duplication and loss model have mathematical proofs, including species
tree identifiability, estimability, and sample complexity of popular algorithms
like ASTRAL. Here, this work is continued by characterizing the anomaly zones
of uniformly sampled gene trees. The anomaly zone for species trees is the set
of parameters where some discordant gene tree occurs with the maximal
probability. The detection of anomalous gene trees is an important problem in
phylogenomics, as their presence renders otherwise effective estimation methods
positively misleading. Under the multispecies coalescent (MSC), anomaly zones are
known to exist for rooted species trees with as few as four species.
The gene duplication and loss (GDL) process is a generalization of the generalized
linear birth-death process to the rooted species tree, where each edge is
treated as a single timeline with exponential-rate duplication and loss. The
methods and results come from a detailed probabilistic analysis of trajectories
observed from this stochastic process. It is shown that anomaly zones do not
exist for rooted GDL balanced trees on four species, but they may exist for
rooted caterpillar trees, as with the MSC.
Comment: 26 pages, 1 figure
Resolving the Systematics of Acronictinae (Lepidoptera, Noctuidae), the Evolution of Larval Defenses, and Tracking the Gain/Loss of Complex Courtship Structures in Noctuidae
Moths and caterpillars of the noctuid genus Acronicta Ochsenheimer, 1816, widely known as dagger moths, have captured the imagination of taxonomists for centuries. Morphologically enigmatic adults and highly variable larvae prompted A. R. Grote to proclaim in 1895, "There would seem to be no genus which offers a more interesting field to the biologist for exploration." Without known synapomorphies for Acronicta, or the subfamily Acronictinae, their circumscriptions have changed over time. This dissertation delves into the taxonomic history of these taxa, setting the stage for a worldwide phylogenetic analysis of Acronictinae. The diversity of larval forms is considered in a tri-trophic framework, quantifying bottom-up (host plant) and top-down (predator) effects through measures of diet breadth, morphology, and behavior, all in a phylogenetic context. Adult courtship structures, present in some acronictine species, are scored across the family Noctuidae to aid in the study of the evolution of complex morphological traits.
Topology of genealogical trees - theory and application
Identifying loci that underwent recent selective sweeps remains a difficult task, since traces are typically obscured by other evolutionary and demographic factors, such as genetic drift or population bottleneck events. To detect candidate loci of selective sweeps, we take an approach that considers the genealogical relationships among individuals and the topological properties of the inferred coalescent tree. Selective sweeps can produce highly unbalanced coalescent tree topologies in regions close to a selective sweep site. Building on a previously known test statistic called T3, which detects bias in the balance of binary genealogical trees, we derive a new test statistic based on a log-likelihood approach, which we call the LR_T3-test.
We present the results of genome-wide screens of the LR_T3-test applied to the 26 populations of the phase 3 data set of the human 1000 Genomes Project.
Furthermore, we present a measure of topological linkage disequilibrium (tLD), which is based on clustering individuals with respect to their position in the genealogy rather than clustering alleles and haplotypes. We demonstrate its application to the previously processed human data.
Supertree-like methods for genome-scale species tree estimation
A critical step in many biological studies is the estimation of evolutionary trees (phylogenies) from genomic data. Of particular interest is the species tree, which illustrates how a set of species evolved from a common ancestor. While species trees were previously estimated from a few regions of the genome (genes), it is now widely recognized that biological processes can cause the evolutionary histories of individual genes to differ from each other and from the species tree. This heterogeneity across the genome is phylogenetic signal that can be leveraged to estimate species evolution with greater accuracy. Hence, species tree estimation is expected to be greatly aided by current large-scale sequencing efforts, including the 5000 Insect Genomes Project, the 10000 Plant Genomes Project, the (~60000) Vertebrate Genomes Project, and the Earth BioGenome Project, which aims to assemble genomes (or at least genome-scale data) for 1.5 million eukaryotic species in the next ten years. To analyze these forthcoming datasets, species tree estimation methods must scale to thousands of species and tens of thousands of genes; however, many of the current leading methods, which are heuristics for NP-hard optimization problems, can be prohibitively expensive on datasets of this size. In this dissertation, we argue that new methods are needed to enable scalable and statistically rigorous species tree estimation pipelines; we then seek to address this challenge through the introduction of three supertree-like methods: NJMerge, TreeMerge, and FastMulRFS. For these methods, we present theoretical results (worst-case running time analyses and proofs of statistical consistency) as well as empirical results on simulated datasets (and a fungal dataset for FastMulRFS). Overall, these methods enable statistically consistent species tree estimation pipelines that achieve comparable accuracy to the dominant optimization-based approaches while dramatically reducing running time
A global phylogeny of butterflies reveals their evolutionary history, ancestral hosts and biogeographic origins
Butterflies are a diverse and charismatic insect group that are thought to have evolved with plants and dispersed throughout the world in response to key geological events. However, these hypotheses have not been extensively tested because a comprehensive phylogenetic framework and datasets for butterfly larval hosts and global distributions are lacking. We sequenced 391 genes from nearly 2,300 butterfly species, sampled from 90 countries and 28 specimen collections, to reconstruct a new phylogenomic tree of butterflies representing 92% of all genera. Our phylogeny has strong support for nearly all nodes and demonstrates that at least 36 butterfly tribes require reclassification. Divergence time analyses imply an origin approximately 100 million years ago for butterflies and indicate that all but one family were present before the K/Pg extinction event. We aggregated larval host datasets and global distribution records and found that butterflies are likely to have first fed on Fabaceae and originated in what is now the Americas. Soon after the Cretaceous Thermal Maximum, butterflies crossed Beringia and diversified in the Palaeotropics. Our results also reveal that most butterfly species are specialists that feed on only one larval host plant family. However, generalist butterflies that consume two or more plant families usually feed on closely related plants.
The genome of the truffle-parasite Tolypocladium ophioglossoides and the evolution of antifungal peptaibiotics
Abstract
Background
Two major mycoparasitic lineages, the family Hypocreaceae and the genus Tolypocladium, exist within the fungal order, Hypocreales. Peptaibiotics are a group of secondary metabolites almost exclusively described from Trichoderma species of Hypocreaceae. Peptaibiotics are produced by nonribosomal peptide synthetases (NRPSs) and have antibiotic and antifungal activities. Tolypocladium species are mainly truffle parasites, but a few species are insect pathogens.
Results
The draft genome sequence of the truffle parasite Tolypocladium ophioglossoides was generated and numerous secondary metabolite clusters were discovered, many of which have no known putative product. However, three large peptaibiotic gene clusters were identified using phylogenetic analyses. Peptaibiotic genes are absent from the predominantly plant and insect pathogenic lineages of Hypocreales, and are therefore exclusive to the largely mycoparasitic lineages. Using NRPS adenylation domain phylogenies and reconciliation of the domain tree with the organismal phylogeny, it is demonstrated that the distribution of these domains is likely not the product of horizontal gene transfer between mycoparasitic lineages, but represents independent losses in insect pathogenic lineages. Peptaibiotic genes are less conserved between species of Tolypocladium and are the product of complex patterns of lineage sorting and module duplication. In contrast, these genes are more conserved within the genus Trichoderma and consistent with diversification through speciation.
Conclusions
Peptaibiotic NRPS genes are restricted to mycoparasitic lineages of Hypocreales, based on current sampling. Phylogenomics and comparative genomics can provide insights into the evolution of secondary metabolite genes, their distribution across a broader range of taxa, and their possible function related to host specificity.
http://deepblue.lib.umich.edu/bitstream/2027.42/112062/1/12864_2015_Article_1777.pd