Recovering complete and draft population genomes from metagenome datasets.

Author: Gilbert Jack A
Sangwan Naseer
Xia Fangfang
Publication venue: eScholarship, University of California
Publication date: 01/03/2016

Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost on application to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improve the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins, i.e., bins containing sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on the genome-wide evolution
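The abstract's point about covarying contig coverage can be illustrated with a small sketch. It is not taken from the paper or from any particular binning tool: the contig names, coverage values, the greedy grouping rule, and the 0.95 threshold are all assumptions chosen for illustration. Contigs whose coverage rises and falls together across samples are placed in the same bin.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length coverage profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no covariation signal
    return cov / (sx * sy)

def bin_by_coverage(profiles, threshold=0.95):
    """Greedy grouping: a contig joins the first bin whose seed contig
    (the first member) has a correlated coverage profile."""
    bins = []
    for name, profile in profiles.items():
        for members in bins:
            if pearson(profiles[members[0]], profile) >= threshold:
                members.append(name)
                break
        else:
            bins.append([name])
    return bins

# Hypothetical mean coverage of four contigs in four samples.
coverage = {
    "contig_1": [10.0, 40.0, 5.0, 20.0],
    "contig_2": [11.0, 42.0, 6.0, 19.0],
    "contig_3": [30.0, 3.0, 25.0, 8.0],
    "contig_4": [28.0, 4.0, 27.0, 9.0],
}
bins = bin_by_coverage(coverage)
print(bins)
```

With a single sample every profile has one value and no correlation can be computed, which is one way to see why multiple datasets help.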

Woods Hole Open Access Server

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

The Parallelism Motifs of Genomic Data Analysis

Author: Awan Muaaz
Azad Ariful
Brock Benjamin
Buluc Aydin
Egan Rob
Ekanayake Saliya
Ellis Marquita
Georganas Evangelos
Guidi Giulia
Hofmeyr Steven
Oliker Leonid
Selvitopi Oguz
Teodoropol Cristina
Yelick Katherine
Publication venue: 'The Royal Society'
Publication date: 20/01/2020

Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some of these genomic data analysis problems require large scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high end parallel systems today and place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high performance genomics analysis, including alignment, profiling, clustering, and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or motifs that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing
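The two patterns the abstract singles out, sorting and hashing, can be shown on one small task. The sketch below counts k-mers in two ways; the reads and the function names are hypothetical, and the code is serial, so it only illustrates the patterns rather than the parallel implementations the paper discusses.

```python
from collections import Counter
from itertools import groupby

def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def count_by_hashing(reads, k):
    """Hashing motif: update a hash table once per k-mer occurrence."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return dict(counts)

def count_by_sorting(reads, k):
    """Sorting motif: sort all k-mers, then count runs of equal neighbours."""
    all_kmers = sorted(km for read in reads for km in kmers(read, k))
    return {km: sum(1 for _ in run) for km, run in groupby(all_kmers)}

reads = ["ACGTAC", "GTACGT"]  # hypothetical reads
hashed = count_by_hashing(reads, 3)
sorted_counts = count_by_sorting(reads, 3)
print(hashed == sorted_counts)
```

Both routes give the same table. They differ in access pattern: hashing makes scattered updates to a shared structure, while sorting moves data in bulk and then scans it sequentially.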

arXiv.org e-Print Archive

eScholarship - University of California

Libra: scalable k-mer-based tool for massive all-vs-all metagenome comparisons

Author: Bomhoff Matthew
Choi Illyoung
Hartman John H
Hurwitz Bonnie L
Ponsero Alise J
Youens-Clark Ken
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/02/2019

Background: Shotgun metagenomics provides powerful insights into microbial community biodiversity and function. Yet, inferences from metagenomic studies are often limited by dataset size and complexity and are restricted by the availability and completeness of existing databases. De novo comparative metagenomics enables the comparison of metagenomes based on their total genetic content. Results: We developed a tool called Libra that performs an all-vs-all comparison of metagenomes for precise clustering based on their k-mer content. Libra uses a scalable Hadoop framework for massive metagenome comparisons, Cosine Similarity for calculating the distance using sequence composition and abundance while normalizing for sequencing depth, and a web-based implementation in iMicrobe (http://imicrobe.us) that uses the CyVerse advanced cyberinfrastructure to promote broad use of the tool by the scientific community. Conclusions: A comparison of Libra to equivalent tools using both simulated and real metagenomic datasets, ranging from 80 million to 4.2 billion reads, reveals that methods commonly implemented to reduce compute time for large datasets, such as data reduction, read count normalization, and presence/absence distance metrics, greatly diminish the resolution of large-scale comparative analyses. In contrast, Libra uses all of the reads to calculate k-mer abundance in a Hadoop architecture that can scale to any size dataset to enable global-scale analyses and link microbial signatures to biological processes. Funding: National Science Foundation [1640775].
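The abstract names cosine similarity over k-mer abundance as the comparison measure. The following is a minimal single-machine sketch of that idea, not Libra's Hadoop implementation: the reads, the choice of k = 4, and the function names are assumptions, and any k-mer weighting Libra applies is omitted.

```python
from collections import Counter
from math import sqrt

def kmer_profile(reads, k):
    """Abundance of every k-mer across all reads of one sample."""
    profile = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            profile[read[i:i + k]] += 1
    return profile

def cosine_similarity(p, q):
    """Cosine of the angle between two k-mer abundance vectors."""
    dot = sum(count * q[kmer] for kmer, count in p.items() if kmer in q)
    norm_p = sqrt(sum(c * c for c in p.values()))
    norm_q = sqrt(sum(c * c for c in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)

# Hypothetical samples: sample_a_deep has the same composition as
# sample_a but three times the sequencing depth.
sample_a = ["ACGTACGT", "TTGACGTA"]
sample_a_deep = sample_a * 3
sample_b = ["GGGCCCGG", "CCCGGGCC"]

sim_same = cosine_similarity(kmer_profile(sample_a, 4), kmer_profile(sample_a_deep, 4))
sim_diff = cosine_similarity(kmer_profile(sample_a, 4), kmer_profile(sample_b, 4))
print(sim_same, sim_diff)
```

Because cosine similarity depends only on the direction of the abundance vector, tripling the depth leaves the score at approximately 1. This is the sense in which the measure normalizes for sequencing depth while still using abundance rather than presence/absence.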

The University of Arizona

Accurate Genome Relative Abundance Estimation Based on Shotgun Metagenomic Reads

Author: A Brady
AC McHardy
B Beszteri
B Langmead
DB Rusch
DC Richter
DH Huson
DR Kelley
EJ Biers
Emmanuel Dias-Neto
FE Angly
Fengzhu Sun
GL Rosen
GW Tyson
H Li
H Teeling
J Peterson
J Qin
Jacob A. Cram
JC Venter
Jed A. Fuhrman
JL Morgan
JS Liu
K Kurokawa
K Liolios
K Mavromatis
KE Nelson
Li C. Xia
M Monzoorul Haque
NN Diaz
PA Vaishampayan
PJ Turnbaugh
R Sandberg
R Stepanauskas
RJ Case
RM Engeman
S Chatterji
SF Altschul
SR Gill
T Woyke
Ting Chen
VM Markowitz
Y Chen
YW Wu
Publication venue: Public Library of Science
Publication date: 06/12/2011

Accurate estimation of microbial community composition based on metagenomic sequencing data is fundamental for subsequent metagenomics analysis. Prevalent estimation methods are mainly based on directly summarizing alignment results or their variants, and often result in biased and/or unstable estimates. We have developed a unified probabilistic framework (named GRAMMy) by explicitly modeling read assignment ambiguities, genome size biases and read distributions along the genomes. A maximum likelihood method is employed to compute Genome Relative Abundance of microbial communities using the Mixture Model theory (GRAMMy). GRAMMy has been demonstrated to give estimates that are accurate and robust across both simulated and real read benchmark datasets. We applied GRAMMy to a collection of 34 metagenomic read sets from four metagenomics projects and identified 99 frequent species (minimally 0.5% abundant in at least 50% of the datasets) in the human gut samples. Our results show substantial improvements over previous studies, such as adjusting the over-estimated abundance for Bacteroides species for human gut samples, by providing a new reference-based strategy for metagenomic sample comparisons. GRAMMy can be used flexibly with many read assignment tools (mapping, alignment or composition-based) even with low-sensitivity mapping results from huge short-read datasets. It will be increasingly useful as an accurate and robust tool for abundance estimation with the growing size of read sets and the expanding database of reference genomes
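The combination described in the abstract (a mixture model fitted by maximum likelihood, ambiguous read assignments, and a genome size correction) can be sketched with a generic expectation-maximization loop. This is a simplified illustration under stated assumptions, not GRAMMy's code: the per-read likelihoods, the genome lengths, and the function names are hypothetical, and the length correction shown (dividing each mixing proportion by genome length and renormalizing) is one common way to turn read shares into genome abundances.

```python
def em_mixing(read_likelihoods, n_iter=200):
    """Estimate mixing proportions pi by EM.

    read_likelihoods[i][g] is an assumed probability of read i given
    genome g; a read matching several genomes has several nonzero entries.
    """
    n_genomes = len(read_likelihoods[0])
    pi = [1.0 / n_genomes] * n_genomes
    for _ in range(n_iter):
        totals = [0.0] * n_genomes
        for lik in read_likelihoods:
            weighted = [pi[g] * lik[g] for g in range(n_genomes)]
            z = sum(weighted)
            if z == 0:
                continue  # read is compatible with no genome
            for g in range(n_genomes):
                totals[g] += weighted[g] / z  # E-step: fractional assignment
        s = sum(totals)
        if s == 0:
            break
        pi = [t / s for t in totals]  # M-step: renormalize read shares
    return pi

def relative_abundance(pi, genome_lengths):
    """Correct read shares for genome size, then renormalize."""
    per_base = [p / length for p, length in zip(pi, genome_lengths)]
    s = sum(per_base)
    return [x / s for x in per_base]

# Hypothetical data: 60 reads unique to genome A, 20 unique to genome B,
# and 20 that match both equally well.
reads = [[1.0, 0.0]] * 60 + [[0.0, 1.0]] * 20 + [[1.0, 1.0]] * 20
pi = em_mixing(reads)                                  # approximately [0.75, 0.25]
abundance = relative_abundance(pi, [2_000_000, 1_000_000])  # approximately [0.6, 0.4]
print(pi, abundance)
```

The ambiguous reads are split in proportion to the current estimates instead of being discarded or counted twice, and the longer genome's read share is discounted because a longer genome yields more reads per cell.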

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Unsupervised discovery of microbial population structure within metagenomes using nucleotide base composition

Author: Halgamuge Saman K.
Saeed Isaam
Tang Sen-Lin
Publication venue: Oxford University Press
Publication date: 29/11/2018

An approach to infer the unknown microbial population structure within a metagenome is to cluster nucleotide sequences based on common patterns in base composition, otherwise referred to as binning. When functional roles are assigned to the identified populations, a deeper understanding of microbial communities can be attained, more so than gene-centric approaches that explore overall functionality. In this study, we propose an unsupervised, model-based binning method with two clustering tiers, which uses a novel transformation of the oligonucleotide frequency-derived error gradient and GC content to generate coarse groups at the first tier of clustering; and tetranucleotide frequency to refine these groups at the secondary clustering tier. The proposed method has a demonstrated improvement over PhyloPythia, S-GSOM, TACOA and TaxSOM on all three benchmarks that were used for evaluation in this study. The proposed method is then applied to a pyrosequenced metagenomic library of mud volcano sediment sampled in southwestern Taiwan, with the inferred population structure validated against complementary sequencing of 16S ribosomal RNA marker genes. Finally, the proposed method was further validated against four publicly available metagenomes, including a highly complex Antarctic whale-fall bone sample, which was previously assumed to be too complex for binning prior to functional analysis
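Two of the composition features named in the abstract, GC content and tetranucleotide frequency, are simple to compute. The sketch below shows only those two features on a hypothetical contig; it does not attempt the oligonucleotide frequency-derived error gradient transformation or the two-tier clustering, which the abstract does not specify in enough detail to reproduce.

```python
from itertools import product

# All 256 possible tetranucleotides in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]

def gc_content(seq):
    """Fraction of unambiguous bases that are G or C."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0

def tetranucleotide_frequency(seq):
    """Relative frequency of each tetranucleotide, as a 256-element vector."""
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # skips windows containing N or other ambiguity codes
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[t] / total if total else 0.0 for t in TETRAMERS]

contig = "ATGCGCGCTTAA"  # hypothetical contig
gc = gc_content(contig)
tnf = tetranucleotide_frequency(contig)
print(gc, len(tnf))
```

A clustering step would then operate on these vectors, one per contig or read.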

PubMed Central

The Australian National University

Integrative analysis of the microbiome and metabolome of the human intestinal mucosal surface reveals exquisite inter-relationships

Author: Borneman James
Braun Jonathan
Fornace Albert J
Goudarzi Maryam
Graeber Thomas G
Horvath Steve
Huttenhower Curtis
McGovern Dermot PB
McHardy Ian H
Ruegger Paul M
Schwager Emma
Sonnenburg Justin L
Tong Maomeng
Weger John R
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 05/06/2013

Background: Consistent compositional shifts in the gut microbiota are observed in IBD and other chronic intestinal disorders and may contribute to pathogenesis. The identities of microbial biomolecular mechanisms and metabolic products responsible for disease phenotypes remain to be determined, as do the means by which such microbial functions may be therapeutically modified. Results: The composition of the microbiota and metabolites in gut microbiome samples in 47 subjects were determined. Samples were obtained by endoscopic mucosal lavage from the cecum and sigmoid colon regions, and each sample was sequenced using the 16S rRNA gene V4 region (Illumina-HiSeq 2000 platform) and assessed by UPLC mass spectroscopy. Spearman correlations were used to identify widespread, statistically significant microbial-metabolite relationships. Metagenomes for identified microbial OTUs were imputed using PICRUSt, and KEGG metabolic pathway modules for imputed genes were assigned using HUMAnN. The resulting metabolic pathway abundances were mostly concordant with metabolite data. Analysis of the metabolome-driven distribution of OTU phylogeny and function revealed clusters of clades that were both metabolically and metagenomically similar. Conclusions: The results suggest that microbes are syntropic with mucosal metabolome composition and therefore may be the source of and/or dependent upon gut epithelial metabolites. The consistent relationship between inferred metagenomic function and assayed metabolites suggests that metagenomic composition is predictive to a reasonable degree of microbial community metabolite pools. The finding that certain metabolites strongly correlate with microbial community structure raises the possibility of targeting metabolites for monitoring and/or therapeutically manipulating microbial community function in IBD and other chronic diseases
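The Spearman correlation used to relate microbes and metabolites is the Pearson correlation of ranks. A minimal sketch follows; the abundance and metabolite values are invented for illustration, and the study's significance testing and multiple-comparison handling are not shown.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0

# Hypothetical OTU relative abundance and metabolite intensity per sample.
otu_abundance = [0.01, 0.05, 0.20, 0.40, 0.80]
metabolite_level = [1.0, 2.0, 10.0, 50.0, 400.0]
rho = spearman(otu_abundance, metabolite_level)
print(rho)
```

Because only ranks enter the calculation, the strongly nonlinear but monotonic relationship in this example still scores approximately 1, which suits abundance data spanning orders of magnitude.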

Harvard University - DASH

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

HumGut: a comprehensive human gut prokaryotic genomes collection filtered by metagenome data

Author: Hegge Finn Terje
Hiseni Pranvera
Rudi Knut
Snipen Lars
Wilson Robert Charles
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2021

Background: A major bottleneck in the use of metagenome sequencing for human gut microbiome studies has been the lack of a comprehensive genome collection to be used as a reference database. Several recent efforts have been made to re-construct genomes from human gut metagenome data, resulting in a huge increase in the number of relevant genomes. In this work, we aimed to create a collection of the most prevalent healthy human gut prokaryotic genomes, to be used as a reference database, including both MAGs from the human gut and ordinary RefSeq genomes. Results: We screened > 5,700 healthy human gut metagenomes for the containment of > 490,000 publicly available prokaryotic genomes sourced from RefSeq and the recently announced UHGG collection. This resulted in a pool of > 381,000 genomes that were subsequently scored and ranked based on their prevalence in the healthy human metagenomes. The genomes were then clustered at a 97.5% sequence identity resolution, and cluster representatives (30,691 in total) were retained to comprise the HumGut collection. Using the Kraken2 software for classification, we find superior performance in the assignment of metagenomic reads, classifying on average 94.5% of the reads in a metagenome, as opposed to 86% with UHGG and 44% when using the standard Kraken2 database. A coarser HumGut collection, consisting of genomes dereplicated at 95% sequence identity, similar to UHGG, classified 88.25% of the reads. HumGut, half the size of the standard Kraken2 database and directly comparable to the UHGG size, outperforms them both. Conclusions: The HumGut collection contains > 30,000 genomes clustered at a 97.5% sequence identity resolution and ranked by human gut prevalence. We demonstrate how metagenomes from IBD patients map equally well to this collection, indicating this reference is relevant also for studies well outside the metagenome reference set used to obtain HumGut. All data and metadata, as well as helpful code, are available at http://arken.nmbu.no/~larssn/humgut/.

Brage NMBU

Brage INN

PubMed Central

NORA - Norwegian Open Research Archives

Satellite remote sensing data can be used to model marine microbial metabolite turnover

Author: A Ditchfield
A Toseland
A-B Martin-Cuadrado
AJ Southward
Anton F Post
B Pfeil
BB Jørgensen
Dawn Field
EL Barrett
FO Glöckner
J Ladau
J Yu
JA Fuhrman
JA Gilbert
Jack A Gilbert
JG Caporaso
KA Kilpatrick
KJ Popendorf
KL Carder
M Hügler
M Schmidt
MGI Langille
MJ Follows
N Fierer
NA Kamennaya
Nicole Scott
NM Scott
OU Mason
PE Larsen
Peter E Larsen
RD Graetz
RJW Brewin
RK Thauer
Rob Knight
S Archer
SC Doney
SM Gibbons
TJ Smyth
VA Smith
W Paul Bissett
X Wang
Yuki Hamada
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 29/07/2014

Sampling ecosystems, even at a local scale, at the temporal and spatial resolution necessary to capture natural variability in microbial communities is prohibitively expensive. We extrapolated marine surface microbial community structure and metabolic potential from 72 16S rRNA amplicon and 8 metagenomic observations using remotely sensed environmental parameters to create a system-scale model of marine microbial metabolism for 5904 grid cells (49 km²) in the Western English Channel, across 3 years of weekly averages. Thirteen environmental variables predicted the relative abundance of 24 bacterial Orders and 1715 unique enzyme-encoding genes that encode turnover of 2893 metabolites. The genes' predicted relative abundance was highly correlated (Pearson Correlation 0.72, P-value < 10⁻⁶) with their observed relative abundance in sequenced metagenomes. Predictions of the relative turnover (synthesis or consumption) of CO2 were significantly correlated with observed surface CO2 fugacity. The spatial and temporal variation in the predicted relative abundances of genes coding for cyanase, carbon monoxide and malate dehydrogenase were investigated along with the predicted inter-annual variation in relative consumption or production of ~3000 metabolites forming six significant temporal clusters. These spatiotemporal distributions could possibly be explained by the co-occurrence of anaerobic and aerobic metabolisms associated with localized plankton blooms or sediment resuspension, which facilitate the presence of anaerobic micro-niches. This predictive model provides a general framework for focusing future sampling and experimental design to relate biogeochemical turnover to microbial ecology

Crossref

Woods Hole Open Access Server

PubMed Central

eScholarship - University of California

NERC Open Research Archive

Dispersal strategies shape persistence and evolution of human gut bacteria

Author: Bahram Mohammad
Bork Peer
Ferretti Pamela
Frioux Clemence
Gossmann Toni
Hildebrand Falk
Kuhn Michael
Myers Pernille Neve
Nielsen Henrik Bjørn
Ozkurt Ezgi
Publication venue
Publication date: 01/01/2021

Human gut bacterial strains can co-exist with their hosts for decades, but little is known about how these microbes persist and disperse, and evolve thereby. Here, we examined these processes in 5,278 adult and infant fecal metagenomes, longitudinally sampled in individuals and families. Our analyses revealed that a subset of gut species is extremely persistent in individuals, families, and geographic regions, represented often by locally successful strains of the phylum Bacteroidota. These "tenacious" bacteria show high levels of genetic adaptation to the human host but a high probability of loss upon antibiotic interventions. By contrast, heredipersistent bacteria, notably Firmicutes, often rely on dispersal strategies with weak phylogeographic patterns but strong family transmissions, likely related to sporulation. These analyses describe how different dispersal strategies can lead to the long-term persistence of human gut microbes with implications for gut flora modulations

Epsilon Open Archive

INRIA a CCSD electronic archive server

PubMed Central

Publications at Bielefeld University

MDC Repository

Online Research Database In Technology

Oskar Bordeaux

Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3

Author: Beghini Francesco
McIver Lauren J
Blanco-Míguez Aitor
Dubois Leonard
Asnicar Francesco
Maharjan Sagun
Mailyan Ana
Manghi Paolo
Scholz Matthias
Thomas Andrew Maltez
Valles-Colomer Mireia
Weingart George
Zhang Yancong
Zolfo Moreno
Huttenhower Curtis
Franzosa Eric A.
Segata Nicola
Publication venue: 'eLife Sciences Publications, Ltd'
Publication date: 05/05/2021

Culture-independent analyses of microbial communities have progressed dramatically in the last decade, particularly due to advances in methods for biological profiling via shotgun metagenomics. Opportunities for improvement continue to accelerate, with greater access to multi-omics, microbial reference genomes, and strain-level diversity. To leverage these, we present bioBakery 3, a set of integrated, improved methods for taxonomic, strain-level, functional, and phylogenetic profiling of metagenomes newly developed to build on the largest set of reference sequences now available. Compared to current alternatives, MetaPhlAn 3 increases the accuracy of taxonomic profiling, and HUMAnN 3 improves that of functional potential and activity. These methods detected novel disease-microbiome links in applications to CRC (1262 metagenomes) and IBD (1635 metagenomes and 817 metatranscriptomes). Strain-level profiling of an additional 4077 metagenomes with StrainPhlAn 3 and PanPhlAn 3 unraveled the phylogenetic and functional structure of the common gut microbe Ruminococcus bromii, previously described by only 15 isolate genomes. With open-source implementations and cloud-deployable reproducible workflows, the bioBakery 3 platform can help researchers deepen the resolution, scale, and accuracy of multi-omic profiling for microbial community studies.

Archivio istituzionale della ricerca - Fondazione Edmund Mach