    Global ecotypes in the ubiquitous marine clade SAR86

    SAR86 is an abundant and ubiquitous heterotroph in the surface ocean that plays a central role in the function of marine ecosystems. We hypothesized that despite its ubiquity, different SAR86 subgroups may be endemic to specific ocean regions and functionally specialized for unique marine environments. However, the global biogeographical distributions of SAR86 genes, and the manner in which these distributions correlate with marine environments, have not been investigated. We quantified SAR86 gene content across globally distributed metagenomic samples and modeled these gene distributions as a function of 51 environmental variables. We identified five distinct clusters of genes within the SAR86 pangenome, each with a unique geographic distribution associated with specific environmental characteristics. Gene clusters are characterized by the strong taxonomic enrichment of distinct SAR86 genomes and partial assemblies, as well as differential enrichment of certain functional groups, suggesting differing functional and ecological roles of SAR86 ecotypes. We then leveraged our models and high-resolution, remote sensing-derived environmental data to predict the distributions of SAR86 gene clusters across the world’s oceans, creating global maps of SAR86 ecotype distributions. Our results reveal that SAR86 exhibits previously unknown, complex biogeography, and provide a framework for exploring geographic distributions of genetic diversity from other microbial clades

    Species-level functional profiling of metagenomes and metatranscriptomes.

    Functional profiles of microbial communities are typically generated using comprehensive metagenomic or metatranscriptomic sequence read searches, which are time-consuming, prone to spurious mapping, and often limited to community-level quantification. We developed HUMAnN2, a tiered search strategy that enables fast, accurate, and species-resolved functional profiling of host-associated and environmental communities. HUMAnN2 identifies a community's known species, aligns reads to their pangenomes, performs translated search on unclassified reads, and finally quantifies gene families and pathways. Relative to pure translated search, HUMAnN2 is faster and produces more accurate gene family profiles. We applied HUMAnN2 to study clinal variation in marine metabolism, ecological contribution patterns among human microbiome pathways, variation in species' genomic versus transcriptional contributions, and strain profiling. Further, we introduce 'contributional diversity' to explain patterns of ecological assembly across different microbial community types

    iPHoP: An integrated machine learning framework to maximize host prediction for metagenome-derived viruses of archaea and bacteria

    The extraordinary diversity of viruses infecting bacteria and archaea is now primarily studied through metagenomics. While metagenomes enable high-throughput exploration of the viral sequence space, metagenome-derived sequences lack key information compared to isolated viruses, in particular host association. Different computational approaches are available to predict the host(s) of uncultivated viruses based on their genome sequences, but thus far individual approaches are limited either in precision or in recall, i.e., for a number of viruses they yield erroneous predictions or no prediction at all. Here, we describe iPHoP, a two-step framework that integrates multiple methods to reliably predict host taxonomy at the genus rank for a broad range of viruses infecting bacteria and archaea, while retaining a low false discovery rate. Based on a large dataset of metagenome-derived virus genomes from the IMG/VR database, we illustrate how iPHoP can provide extensive host prediction and guide further characterization of uncultivated viruses

    Toward Accurate and Quantitative Comparative Metagenomics

    Shotgun metagenomics and computational analysis are used to compare the taxonomic and functional profiles of microbial communities. Leveraging this approach to understand roles of microbes in human biology and other environments requires quantitative data summaries whose values are comparable across samples and studies. Comparability is currently hampered by the use of abundance statistics that do not estimate a meaningful parameter of the microbial community and biases introduced by experimental protocols and data-cleaning approaches. Addressing these challenges, along with improving study design, data access, metadata standardization, and analysis tools, will enable accurate comparative metagenomics. We envision a future in which microbiome studies are replicable and new metagenomes are easily and rapidly integrated with existing data. Only then can the potential of metagenomics for predictive ecological modeling, well-powered association studies, and effective microbiome medicine be fully realized

    Selection of Appropriate Metagenome Taxonomic Classifiers for Ancient Microbiome Research

    Metagenomics enables the study of complex microbial communities from myriad sources, including the remains of oral and gut microbiota preserved in archaeological dental calculus and paleofeces, respectively. While accurate taxonomic assignment is essential to this process, DNA damage, characteristic to ancient samples (e.g. reduction in fragment size), may reduce the accuracy of read taxonomic assignment. Using a set of in silico-generated metagenomic datasets we investigated how the addition of ancient DNA (aDNA) damage patterns influences microbial taxonomic assignment by five widely-used profilers: QIIME/UCLUST, MetaPhlAn2, MIDAS, CLARK-S, and MALT (BLAST-X-mode). In silico-generated datasets were designed to mimic dental plaque, consisting of 40, 100, and 200 microbial species/strains, both with and without simulated aDNA damage patterns. Following taxonomic assignment, the profiles were evaluated for species presence/absence, relative abundance, alpha-diversity, beta-diversity, and specific taxonomic assignment biases. UniFrac metrics indicated that both MIDAS and MetaPhlAn2 provided the most accurate community structure reconstruction. QIIME/UCLUST, CLARK-S, and MALT had the highest number of inaccurate taxonomic assignments; however, filtering out species present at <0.1% abundance greatly increased the accuracy of CLARK-S and MALT. All programs except CLARK-S failed to detect some species from the input file that were in their databases. Ancient DNA damage resulted in minimal differences in species detection and relative abundance between simulated ancient and modern datasets for most programs. In conclusion, taxonomic profiling biases are program-specific rather than damage-dependent, and the choice of taxonomic classification program to use should be tailored to the research question

    A genomic catalog of Earth’s microbiomes

    The reconstruction of bacterial and archaeal genomes from shotgun metagenomes has enabled insights into the ecology and evolution of environmental and host-associated microbiomes. Here we applied this approach to >10,000 metagenomes collected from diverse habitats covering all of Earth’s continents and oceans, including metagenomes from human and animal hosts, engineered environments, and natural and agricultural soils, to capture extant microbial, metabolic and functional potential. This comprehensive catalog includes 52,515 metagenome-assembled genomes representing 12,556 novel candidate species-level operational taxonomic units spanning 135 phyla. The catalog expands the known phylogenetic diversity of bacteria and archaea by 44% and is broadly available for streamlined comparative analyses, interactive exploration, metabolic modeling and bulk download. We demonstrate the utility of this collection for understanding secondary-metabolite biosynthetic potential and for resolving thousands of new host linkages to uncultivated viruses. This resource underscores the value of genome-centric approaches for revealing genomic properties of uncultivated microorganisms that affect ecosystem processes

    Exploring neighborhoods in large metagenome assembly graphs reveals hidden sequence diversity

    Genomes computationally inferred from large metagenomic data sets are often incomplete and may be missing functionally important content and strain variation. We introduce an information retrieval system for large metagenomic data sets that exploits the sparsity of DNA assembly graphs to efficiently extract subgraphs surrounding an inferred genome. We apply this system to recover missing content from genome bins and show that substantial genomic sequence variation is present in a real metagenome. Our software implementation is available at https://github.com/spacegraphcats/spacegraphcats under the 3-Clause BSD License

    IMG/PR: a database of plasmids from genomes and metagenomes with rich annotations and metadata.

    Plasmids are mobile genetic elements found in many clades of Archaea and Bacteria. They drive horizontal gene transfer, impacting ecological and evolutionary processes within microbial communities, and hold substantial importance in human health and biotechnology. To support plasmid research and provide scientists with data of an unprecedented diversity of plasmid sequences, we introduce the IMG/PR database, a new resource encompassing 699 973 plasmid sequences derived from genomes, metagenomes and metatranscriptomes. IMG/PR is the first database to provide data on plasmids that were systematically identified from diverse microbiome samples. IMG/PR plasmids are associated with rich metadata that includes geographical and ecosystem information, host taxonomy, similarity to other plasmids, functional annotation, and presence of genes involved in conjugation and antibiotic resistance. The database offers diverse methods for exploring its extensive plasmid collection, enabling users to navigate plasmids through metadata-centric queries, plasmid comparisons and BLAST searches. The web interface for IMG/PR is accessible at https://img.jgi.doe.gov/pr. Plasmid metadata and sequences can be downloaded from https://genome.jgi.doe.gov/portal/IMG_PR