
    Inference of Many-Taxon Phylogenies

    Phylogenetic trees are tree topologies that represent the evolutionary history of a set of organisms. In this thesis, we address computational challenges related to the analysis of large-scale datasets with Maximum Likelihood based phylogenetic inference. We have approached this using different strategies: reduction of memory requirements, reduction of running time, and reduction of man-hours

    Maximum Likelihood Analyses of 3,490 rbcL Sequences: Scalability of Comprehensive Inference versus Group-Specific Taxon Sampling

    The constant accumulation of sequence data poses new computational and methodological challenges for phylogenetic inference, since multiple sequence alignments grow in both the horizontal (number of base pairs, phylogenomic alignments) and the vertical (number of taxa) dimension. Setting aside the ongoing controversy about appropriate models, partitioning schemes, and assembly methods for phylogenomic alignments, and the high computational cost of inference from them, for many organismic groups a sufficient number of taxa is often available from only one or a few genes (e.g., rbcL, matK, rDNA). In this paper we address the scalability of Maximum-Likelihood-based phylogeny reconstruction with respect to the number of taxa, using several large nested single-gene rbcL alignments comprising 400 up to 3,491 taxa as examples. To test the effect of taxon sampling, we employ an appropriately adapted taxon jackknifing approach. In contrast to standard jackknifing, this taxon subsampling procedure is not conducted entirely at random, but is based on drawing subsamples from empirical taxon groups, which can either be user-defined or determined using taxonomic information from databases. Our results indicate that, despite an unfavorable ratio of the number of sequences to the number of base pairs, i.e., many relatively short sequences, Maximum Likelihood tree searches and bootstrap analyses scale well on single-gene rbcL alignments with a dense taxon sampling of up to several thousand sequences. Moreover, the newly implemented taxon subsampling procedure can be beneficial for inferring higher-level relationships and interpreting bootstrap support from comprehensive analyses
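
The group-based taxon subsampling described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `group_jackknife`, the `fraction` and `min_per_group` parameters, and the rule of keeping a fixed proportion of each group are all assumptions made for illustration. The only idea taken from the abstract is that subsamples are drawn from predefined taxon groups rather than from the pooled taxon list.

```python
import random
from collections import defaultdict


def group_jackknife(taxon_to_group, fraction=0.5, min_per_group=1, seed=None):
    """Draw a taxon subsample group by group (hypothetical helper).

    taxon_to_group: dict mapping taxon name -> group label
                    (user-defined or taken from a taxonomy database).
    fraction:       assumed proportion of each group to keep.
    min_per_group:  assumed lower bound so that no group disappears.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)

    # Collect the members of every group.
    groups = defaultdict(list)
    for taxon, group in taxon_to_group.items():
        groups[group].append(taxon)

    kept = []
    for group in sorted(groups):
        members = sorted(groups[group])  # sorted for reproducibility
        k = max(min_per_group, round(len(members) * fraction))
        k = min(k, len(members))  # never ask for more taxa than exist
        kept.extend(rng.sample(members, k))
    return sorted(kept)


if __name__ == "__main__":
    # Toy example with made-up taxon and group labels.
    toy = {"t1": "A", "t2": "A", "t3": "A", "t4": "A",
           "t5": "B", "t6": "B", "t7": "C"}
    print(group_jackknife(toy, fraction=0.5, seed=1))
```

Unlike a purely random jackknife over all taxa, every group stays represented in each replicate, so the subsampled alignment still covers the higher-level relationships of interest.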

    Multiple Independent Origins of Apicomplexan-Like Parasites

    The apicomplexans are a group of obligate animal pathogens that include Plasmodium (malaria), Toxoplasma (toxoplasmosis), and Cryptosporidium (cryptosporidiosis) [1]. They are an extremely diverse and speciose group but are nevertheless united by a distinctive suite of cytoskeletal and secretory structures related to infection, called the apical complex, which is used to recognize and gain entry into animal host cells. The apicomplexans are also known to have evolved from free-living photosynthetic ancestors and retain a relict plastid (the apicoplast), which is non-photosynthetic but houses a number of other essential metabolic pathways [2]. Their closest relatives include a mix of both photosynthetic algae (chromerids) and non-photosynthetic microbial predators (colpodellids) [3]. Genomic analyses of these free-living relatives have revealed a great deal about how the alga-parasite transition may have taken place, as well as the origins of parasitism more generally [4]. Here, we show that, despite the surprisingly complex origin of apicomplexans from algae, this transition actually occurred at least three times independently. Using single-cell genomics and transcriptomics from diverse uncultivated parasites, we find that two genera previously classified within the Apicomplexa, Piridium and Platyproteum, form separately branching lineages in phylogenomic analyses. Both retain cryptic plastids with genomic and metabolic features convergent with apicomplexans. These findings suggest a predilection in this lineage for both the convergent loss of photosynthesis and transition to parasitism, resulting in multiple lineages of superficially similar animal parasites

    A MOSAIC of methods: Improving ortholog detection through integration of algorithmic diversity

    Ortholog detection (OD) is a critical step for comparative genomic analysis of protein-coding sequences. In this paper, we begin with a comprehensive comparison of four popular, methodologically diverse OD methods: MultiParanoid, Blat, Multiz, and OMA. In head-to-head comparisons, these methods are shown to significantly outperform one another 12-30% of the time. This high complementarity motivates the presentation of the first tool for integrating methodologically diverse OD methods. We term this program MOSAIC, or Multiple Orthologous Sequence Analysis and Integration by Cluster optimization. Relative to component and competing methods, we demonstrate that MOSAIC more than quintuples the number of alignments for which all species are present, while simultaneously maintaining or improving functional-, phylogenetic-, and sequence identity-based measures of ortholog quality. Further, we demonstrate that this improvement in alignment quality yields 40-280% more confidently aligned sites. Combined, these factors translate to higher estimated levels of overall conservation, while at the same time allowing for the detection of up to 180% more positively selected sites. MOSAIC is available as a Python package. MOSAIC alignments, source code, and full documentation are available at http://pythonhosted.org/bio-MOSAIC

    Inference and reconstruction of the heimdallarchaeial ancestry of eukaryotes

    In the ongoing debates about eukaryogenesis (the series of evolutionary events leading to the emergence of the eukaryotic cell from prokaryotic ancestors), members of the Asgard archaea play a key part as the closest archaeal relatives of eukaryotes [1]. However, the nature and phylogenetic identity of the last common ancestor of Asgard archaea and eukaryotes remain unresolved [2–4]. Here we analyse distinct phylogenetic marker datasets of an expanded genomic sampling of Asgard archaea and evaluate competing evolutionary scenarios using state-of-the-art phylogenomic approaches. We find that eukaryotes are placed, with high confidence, as a well-nested clade within Asgard archaea and as a sister lineage to Hodarchaeales, a newly proposed order within Heimdallarchaeia. Using sophisticated gene tree and species tree reconciliation approaches, we show that, analogous to the evolution of eukaryotic genomes, genome evolution in Asgard archaea involved significantly more gene duplication and fewer gene loss events compared with other archaea. Finally, we infer that the last common ancestor of Asgard archaea was probably a thermophilic chemolithotroph and that the lineage from which eukaryotes evolved adapted to mesophilic conditions and acquired the genetic potential to support a heterotrophic lifestyle. Our work provides key insights into the prokaryote-to-eukaryote transition and a platform for better understanding the emergence of cellular complexity in eukaryotic cells