
    tRNA functional signatures classify plastids as late-branching cyanobacteria.

    Background: Eukaryotes acquired oxygenic photosynthesis through endosymbiosis of the cyanobacterial progenitor of plastid organelles. Despite recent advances in the phylogenomics of Cyanobacteria, the phylogenetic root of plastids remains controversial. Although a single endosymbiotic origin of plastids is broadly supported, recent phylogenomic studies disagree on whether plastids branch early or late within Cyanobacteria. One underlying cause may be poor fit of evolutionary models to complex phylogenomic data. Results: Using posterior predictive analysis, we show that recently applied evolutionary models fit three phylogenomic datasets curated from cyanobacterial and plastid genomes poorly, because of heterogeneity both in substitution processes across sites and in composition across lineages. To circumvent these sources of bias, we developed CYANO-MLP, a machine learning algorithm that consistently and accurately phylogenetically classifies ("phyloclassifies") cyanobacterial genomes to their clade of origin based on bioinformatically predicted, function-informative features of tRNA gene complements. Classification with CYANO-MLP is accurate and robust to deletion of clades, unbalanced sampling, and compositional heterogeneity in the input tRNA data. CYANO-MLP consistently classifies plastid genomes into a late-branching cyanobacterial sub-clade containing single-celled, starch-producing, nitrogen-fixing ecotypes, consistent with metabolic and gene transfer data. Conclusions: Phylogenomic data of cyanobacteria and plastids exhibit both site-process heterogeneity and compositional heterogeneity across lineages. These features of the data require careful modeling to avoid bias in phylogenomic estimation. Furthermore, we show that amino acid recoding strategies may be insufficient to mitigate bias from compositional heterogeneity. However, combining our novel tRNA-specific strategy with machine learning in CYANO-MLP appears robust to these sources of bias, with high accuracy in phyloclassification of cyanobacterial genomes. CYANO-MLP consistently classifies plastids as late-branching Cyanobacteria, consistent with independent evidence from signature-based approaches and some previous phylogenetic studies.
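    A posterior predictive check compares a test statistic computed on the real alignment with the distribution of that statistic across datasets simulated under the fitted model; a small tail probability signals poor fit. The abstract does not name the statistics used, so the sketch below assumes one common choice for compositional heterogeneity: a chi-square sum of each taxon's deviation from the pooled state frequencies. The function names are hypothetical, and the simulated replicate statistics are assumed to come from a separate simulation step.

```python
from collections import Counter


def composition_chi2(alignment: dict[str, str], alphabet: str) -> float:
    """Chi-square statistic comparing each taxon's state counts with the
    counts expected from frequencies pooled over all taxa.
    Characters outside `alphabet` (e.g. gaps) are ignored."""
    counts = {taxon: Counter(c for c in seq if c in alphabet)
              for taxon, seq in alignment.items()}
    pooled = Counter()
    for taxon_counts in counts.values():
        pooled.update(taxon_counts)
    total = sum(pooled.values())
    stat = 0.0
    for taxon_counts in counts.values():
        n_taxon = sum(taxon_counts.values())
        for state in alphabet:
            expected = n_taxon * pooled[state] / total
            if expected > 0:  # skip states absent from the whole alignment
                stat += (taxon_counts[state] - expected) ** 2 / expected
    return stat


def posterior_predictive_p(observed: float, simulated: list[float]) -> float:
    """Fraction of model-simulated replicates whose statistic is at least
    as large as the observed one (a one-sided tail probability)."""
    return sum(s >= observed for s in simulated) / len(simulated)
```

    A perfectly homogeneous alignment scores 0, and larger values indicate stronger across-lineage compositional differences than the pooled model allows.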

    SATCHMO-JS: a webserver for simultaneous protein multiple sequence alignment and phylogenetic tree construction.

    We present the jump-start simultaneous alignment and tree construction using hidden Markov models (SATCHMO-JS) web server for simultaneous estimation of protein multiple sequence alignments (MSAs) and phylogenetic trees. The server takes as input a set of sequences in FASTA format and outputs a phylogenetic tree and an MSA; both can be viewed online or downloaded from the website. SATCHMO-JS extends the SATCHMO algorithm and uses a divide-and-conquer strategy to jump-start SATCHMO at a higher point in the phylogenetic tree, reducing the computational complexity of the progressive all-versus-all HMM-HMM scoring and alignment. Results on 983 structurally aligned pairs from the PREFAB benchmark show that SATCHMO-JS provides a statistically significant improvement in alignment accuracy over MUSCLE, Multiple Alignment using Fast Fourier Transform (MAFFT), ClustalW and the original SATCHMO algorithm. The SATCHMO-JS webserver is available at http://phylogenomics.berkeley.edu/satchmo-js. The datasets used in these experiments are available for download at http://phylogenomics.berkeley.edu/satchmo-js/supplementary/
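    Alignment accuracy on PREFAB-style benchmarks is commonly scored by how many residue pairs aligned in the structural reference alignment are also aligned in the test alignment. The abstract does not define its accuracy measure, so the sketch below assumes this reference-pair recovery score (often called the Q score). For an MSA, the test alignment is first projected onto the two reference sequences by keeping only their rows; columns gapped in both rows contribute nothing. The function names are illustrative.

```python
def aligned_pairs(row_a: str, row_b: str, gap: str = "-") -> set[tuple[int, int]]:
    """Return the (i, j) residue-index pairs that share a column in a
    pairwise alignment, where i and j are 0-based ungapped positions."""
    pairs = set()
    i = j = 0
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def q_score(test: tuple[str, str], reference: tuple[str, str]) -> float:
    """Fraction of reference-aligned residue pairs reproduced by the test alignment."""
    ref = aligned_pairs(*reference)
    found = aligned_pairs(*test)
    return len(ref & found) / len(ref)
```

    For example, `q_score(("ACG-T", "ACTGT"), ("AC-GT", "ACTGT"))` is 0.75: three of the four reference pairs are recovered, and the misplaced gap costs one pair.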

    PhyloFacts: an online structural phylogenomic encyclopedia for protein functional and structural classification

    The Berkeley Phylogenomics Group presents PhyloFacts, a structural phylogenomic encyclopedia containing almost 10,000 'books' for protein families and domains, with pre-calculated structural, functional and evolutionary analyses. PhyloFacts enables biologists to avoid the systematic errors associated with function prediction by homology through the integration of a variety of experimental data and bioinformatics methods in an evolutionary framework. Users can submit sequences for classification to families and functional subfamilies. PhyloFacts is available as a worldwide web resource from

    FlowerPower: clustering proteins into domain architecture classes for phylogenomic inference of protein function

    BACKGROUND: Function prediction by transferring the annotation of the top database hit in a homology search has been shown to be prone to systematic error. Phylogenomic analysis reduces these errors by inferring protein function within the evolutionary context of the entire family. However, the accuracy of function prediction for multi-domain proteins depends on all family members sharing the same overall domain structure. By contrast, most common homolog detection methods are optimized for retrieving local homologs and do not address this requirement. RESULTS: We present FlowerPower, a novel clustering algorithm designed to identify global homologs as a precursor to structural phylogenomic analysis. Like PSI-BLAST, FlowerPower clusters sequences iteratively. However, rather than using a single HMM or profile to expand the cluster, FlowerPower identifies subfamilies using the SCI-PHY algorithm and then selects and aligns new homologs using subfamily hidden Markov models. FlowerPower outperforms BLAST, PSI-BLAST and the UCSC SAM-Target 2K methods at discriminating between proteins in the same domain architecture class and those with different overall domain structures. CONCLUSION: Structural phylogenomic analysis enables biologists to avoid the systematic errors associated with annotation transfer; clustering sequences that share the same domain architecture is a critical first step in this process. FlowerPower consistently identifies homologous sequences with the same domain architecture as the query. AVAILABILITY: FlowerPower is available as a webserver at
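    The abstract does not state FlowerPower's acceptance criteria, so the following is only a hypothetical illustration of the distinction between global and local homologs. One simple proxy for "same overall domain architecture" is to require that an alignment cover most of both the query and the hit. A hit that matches a single domain of a much longer protein then fails the test. The `Hit` fields and the 0.7 threshold are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical summary of one pairwise alignment (1-based, inclusive coordinates)."""
    name: str
    query_len: int
    hit_len: int
    query_start: int
    query_end: int
    hit_start: int
    hit_end: int


def is_global_homolog(h: Hit, min_coverage: float = 0.7) -> bool:
    """Accept a hit only if the alignment spans most of BOTH sequences,
    a rough stand-in for sharing the query's domain architecture."""
    query_cov = (h.query_end - h.query_start + 1) / h.query_len
    hit_cov = (h.hit_end - h.hit_start + 1) / h.hit_len
    return query_cov >= min_coverage and hit_cov >= min_coverage
```

    Checking coverage on both sides is what separates this filter from an ordinary E-value cutoff, which accepts local matches as readily as global ones.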

    Evolutionary relationships in Panicoid grasses based on plastome phylogenomics (Panicoideae; Poaceae)

    Background: Panicoideae are the second-largest subfamily in Poaceae (grass family), with 212 genera and approximately 3316 species. Previous studies have begun to reveal relationships within the subfamily, but largely lack resolution and/or robust support for certain tribal and subtribal groups. This study aims to resolve these relationships and to characterize a putative mitochondrial insert in one lineage. Results: Thirty-five newly sequenced Panicoideae plastomes were combined with plastomes of 37 other species (15 Panicoideae and 22 outgroups) in a phylogenomic analysis. We obtained a robust Panicoideae topology that is largely congruent with previous studies but conflicts with some previously reported subtribal relationships. A mitochondrial DNA (mtDNA) to plastid DNA (ptDNA) transfer was discovered in the Paspalum lineage. Conclusions: The phylogenomic analysis returned a topology that largely supports previous studies. Five previously recognized subtribes appear non-monophyletic in this topology. Additionally, evidence for mtDNA-to-ptDNA transfer was identified in both Paspalum fimbriatum and P. dilatatum, suggesting a single rare event in a common progenitor. Finally, the framework from this study can guide larger whole-plastome sampling to resolve relationships in Cyperochloeae, Steyermarkochloeae, Gynerieae, and other incertae sedis taxa that are weakly supported or unresolved.
    Authors and affiliations: Burke, Sean V. (Northern Illinois University, United States); Wysocki, William P. (Northern Illinois University, United States); Zuloaga, Fernando Omar (Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Botánica Darwinion; Academia Nacional de Ciencias Exactas, Físicas y Naturales, Instituto de Botánica Darwinion; Argentina); Craine, Joseph M. (Jonah Ventures, United States); Pires, J. Chris (University of Missouri, United States); Edger, Patrick P. (Michigan State University, United States); Mayfield Jones, Dustin (Donald Danforth Plant Science Center, United States); Clark, Lynn G. (Iowa State University, United States); Kelchner, Scot A. (University of Idaho, United States); Duvall, Melvin R. (Northern Illinois University, United States)
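    On a rooted tree (here rooted with the outgroups), a group is monophyletic when its members are exactly the leaves of some subtree. The sketch below checks this for trees written as nested Python tuples with taxon names as leaves. This representation is an assumption for illustration; a real workflow would parse a Newick file with a phylogenetics library.

```python
def leaves(node) -> frozenset:
    """Leaf names below a node; a leaf is a plain string."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def clades(node) -> list:
    """Leaf sets of every subtree (clade) in a rooted tree, including single leaves."""
    result = [leaves(node)]
    if not isinstance(node, str):
        for child in node:
            result.extend(clades(child))
    return result


def is_monophyletic(tree, taxa) -> bool:
    """True if `taxa` is exactly the leaf set of some clade of `tree`."""
    target = frozenset(taxa)
    return any(clade == target for clade in clades(tree))
```

    With `tree = ((("A", "B"), "C"), ("D", "E"))`, the group {A, B} is monophyletic, but {A, C} is not, because the smallest clade containing A and C also contains B.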

    Insights into the Ecological Roles and Evolution of Methyl-Coenzyme M Reductase-Containing Hot Spring Archaea

    Several recent studies have reported genes for the key enzyme of archaeal methane/alkane metabolism, methyl-coenzyme M reductase (Mcr), in metagenome-assembled genomes (MAGs) divergent from existing archaeal lineages. Here, we study mcr-containing archaeal MAGs from several hot springs, which further expand the known diversity of archaeal organisms performing methane/alkane metabolism. Notably, we identify a MAG basal to the phylum Thaumarchaeota that contains mcr genes but lacks genes for ammonia oxidation and aerobic metabolism. Together, our phylogenetic analyses and ancestral state reconstructions suggest mostly vertical evolution of mcrABG genes among methanogens and methanotrophs, along with frequent horizontal transfer of mcr genes between alkanotrophs. Optimal growth temperature predictions for all mcr-containing archaeal MAGs/genomes suggest a hydrothermal origin for these microorganisms. These results also suggest that methane/alkane oxidation or methanogenesis at high temperature likely existed in a common archaeal ancestor.
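    The abstract does not say which ancestral state reconstruction method was used. As a simple stand-in (a swapped-in technique, not the authors' method), Fitch parsimony reconstructs a presence/absence character such as "has mcr genes" on a rooted binary tree. Its bottom-up pass returns the candidate root states and the minimum number of gains and losses the tree requires. Trees are assumed to be nested 2-tuples with taxon names as leaves.

```python
def fitch(node, states: dict[str, str]) -> tuple[set, int]:
    """Bottom-up Fitch pass on a rooted binary tree.
    Returns (candidate states at this node, minimum changes in the subtree)."""
    if isinstance(node, str):
        return {states[node]}, 0
    left, right = node  # binary tree assumed
    set_l, cost_l = fitch(left, states)
    set_r, cost_r = fitch(right, states)
    common = set_l & set_r
    if common:
        # Children agree on at least one state: no extra change needed.
        return common, cost_l + cost_r
    # Disjoint child sets: take the union and count one change.
    return set_l | set_r, cost_l + cost_r + 1
```

    For example, on `((("a", "b"), "c"), "d")` with mcr present in a and b only, `fitch` returns `({"absent"}, 1)`: a single gain on the branch leading to the (a, b) clade explains the data. Under a model that favors horizontal transfer, a probabilistic reconstruction could give a different answer.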