iPHoP: An integrated machine learning framework to maximize host prediction for metagenome-derived viruses of archaea and bacteria

Author: Camargo Antonio Pedro
Coutinho Felipe H
Dabdoub Shareef M
Dutilh Bas E
Nayfach Stephen
Roux Simon
Tritt Andrew
Publication date: 01/04/2023

The extraordinary diversity of viruses infecting bacteria and archaea is now primarily studied through metagenomics. While metagenomes enable high-throughput exploration of the viral sequence space, metagenome-derived sequences lack key information compared to isolated viruses, in particular host association. Different computational approaches are available to predict the host(s) of uncultivated viruses based on their genome sequences, but thus far individual approaches are limited either in precision or in recall, i.e., for a number of viruses they yield erroneous predictions or no prediction at all. Here, we describe iPHoP, a two-step framework that integrates multiple methods to reliably predict host taxonomy at the genus rank for a broad range of viruses infecting bacteria and archaea, while retaining a low false discovery rate. Based on a large dataset of metagenome-derived virus genomes from the IMG/VR database, we illustrate how iPHoP can provide extensive host prediction and guide further characterization of uncultivated viruses.

Utrecht University Repository

Automated and Accurate Estimation of Gene Family Abundance from Shotgun Metagenomes

Author: Bradley Patrick H.
Eisen Jonathan A.
Laurent Timothy J.
Nayfach Stephen
Pollard Katherine S.
Sharpton Thomas J.
Williams Alex
Wyman Stacia K.
Publication venue: Public Library of Science (PLoS)

Shotgun metagenomic DNA sequencing is a widely applicable tool for characterizing the functions that are encoded by microbial communities. Several bioinformatic tools can be used to functionally annotate metagenomes, allowing researchers to draw inferences about the functional potential of the community and to identify putative functional biomarkers. However, little is known about how decisions made during annotation affect the reliability of the results. Here, we use statistical simulations to rigorously assess how to optimize annotation accuracy and speed, given parameters of the input data like read length and library size. We identify best practices in metagenome annotation and use them to guide the development of the Shotgun Metagenome Annotation Pipeline (ShotMAP). ShotMAP is an analytically flexible, end-to-end annotation pipeline that can be implemented either on a local computer or a cloud compute cluster. We use ShotMAP to assess how different annotation databases impact the interpretation of how marine metagenome and metatranscriptome functional capacity changes across seasons. We also apply ShotMAP to data obtained from a clinical microbiome investigation of inflammatory bowel disease. This analysis finds that gut microbiota collected from Crohn’s disease patients are functionally distinct from gut microbiota collected from either ulcerative colitis patients or healthy controls, with differential abundance of metabolic pathways related to host-microbiome interactions that may serve as putative biomarkers of disease.

Data Availability Statement: The Gilbert et al. L4 metagenomes and metatranscriptomes are available from the MG-RAST database (project number 109, http://metagenomics.anl.gov/metagenomics.cgi?page=MetagenomeProject&project=109), the Qin et al. MetaHIT inflammatory bowel disease metagenomes are available in the EBI (accession ERA000116), and the Nielsen et al. MGS inflammatory bowel disease metagenomes are available in the EBI (accession ERP002061).

ScholarsArchive@OSU

Author Correction: Cryptic inoviruses revealed as pervasive in bacteria and archaea across Earth's biomes.

Author: Bondy-Denomy Joseph
Borges Adair L
Cheng Jan-Fang
Daly Rebecca A
Eloe-Fadrosh Emiley A
Ivanova Natalia N
Krupovic Mart
Kyrpides Nikos C
Matheus Carnevali Paula B
Nayfach Stephen
Roux Simon
Schulz Frederik
Sharrar Allison
Visel Axel
Woyke Tanja
Wrighton Kelly C
Publication venue: eScholarship, University of California
Publication date: 01/03/2020

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

eScholarship - University of California

Toward Accurate and Quantitative Comparative Metagenomics

Author: Nayfach Stephen
Pollard Katherine S.
Publication venue: Elsevier BV
Publication date: 01/08/2016

Shotgun metagenomics and computational analysis are used to compare the taxonomic and functional profiles of microbial communities. Leveraging this approach to understand roles of microbes in human biology and other environments requires quantitative data summaries whose values are comparable across samples and studies. Comparability is currently hampered by the use of abundance statistics that do not estimate a meaningful parameter of the microbial community and biases introduced by experimental protocols and data-cleaning approaches. Addressing these challenges, along with improving study design, data access, metadata standardization, and analysis tools, will enable accurate comparative metagenomics. We envision a future in which microbiome studies are replicable and new metagenomes are easily and rapidly integrated with existing data. Only then can the potential of metagenomics for predictive ecological modeling, well-powered association studies, and effective microbiome medicine be fully realized.

Crossref

PubMed Central

eScholarship - University of California

IMG/PR: a database of plasmids from genomes and metagenomes with rich annotations and metadata.

Author: Baltrus David A
Call Lee
Camargo Antonio Pedro
Castañeda-Barba Salvador
Chen I-Min A
Chu Ken
de la Cruz Fernando
Eloe-Fadrosh Emiley A
Funnell Barbara E
Hall James PJ
Huntemann Marcel
Ivanova Natalia N
Kyrpides Nikos C
Mukherjee Supratim
Mukhopadhyay Aindrila
Nayfach Stephen
Palaniappan Krishnaveni
Ratner Anna
Reddy Tbk
Rocha Eduardo PC
Roux Simon
Stalder Thibault
Top Eva
Woyke Tanja
Publication venue: Oxford University Press (OUP)
Publication date: 01/11/2023

Plasmids are mobile genetic elements found in many clades of Archaea and Bacteria. They drive horizontal gene transfer, impacting ecological and evolutionary processes within microbial communities, and hold substantial importance in human health and biotechnology. To support plasmid research and provide scientists with data of an unprecedented diversity of plasmid sequences, we introduce the IMG/PR database, a new resource encompassing 699 973 plasmid sequences derived from genomes, metagenomes and metatranscriptomes. IMG/PR is the first database to provide data on plasmids that were systematically identified from diverse microbiome samples. IMG/PR plasmids are associated with rich metadata that includes geographical and ecosystem information, host taxonomy, similarity to other plasmids, functional annotation, and the presence of genes involved in conjugation and antibiotic resistance. The database offers diverse methods for exploring its extensive plasmid collection, enabling users to navigate plasmids through metadata-centric queries, plasmid comparisons and BLAST searches. The web interface for IMG/PR is accessible at https://img.jgi.doe.gov/pr. Plasmid metadata and sequences can be downloaded from https://genome.jgi.doe.gov/portal/IMG_PR.

University of Liverpool Repository

eScholarship - University of California

Digital.CSIC

Unraveling the functional dark matter through global metagenomics

Author: Acinas Silvia G.
Azad Ariful
Baker David
Baltoumas Fotis A.
Buluç Aydin
Call Lee
Camargo Antonio Pedro
Chen I. Min
Iliopoulos Ioannis
Ivanova Natalia N.
Karatzas Evangelos
Konstantinidis Konstantinos T.
Kyrpides Nikos C.
Liu Sirui
Nayfach Stephen
Novel Metagenome Protein Families Consortium
Ouzounis Christos
Ovchinnikov Sergey
Pavlopoulos Georgios A.
Pett-Ridge Jennifer
Páez-Espino A. David
Roux Simon
Selvitopi Oguz
Tiedje James M.
Visel Axel
Publication venue: Nature Publishing Group
Publication date: 01/10/2023

30 pages, 4 figures, 1 table, supplementary information https://doi.org/10.1038/s41586-023-06583-7

Data availability: All of the analysed datasets along with their corresponding sequences are available from the IMG system (http://img.jgi.doe.gov/). A list of the datasets used in this study is provided in Supplementary Data 8. All data from the protein clusters, including sequences, multiple alignments, HMM profiles, 3D structure models, and taxonomic and ecosystem annotation, are available through NMPFamsDB, publicly accessible at www.nmpfamsdb.org. The 3D models are also available at ModelArchive under accession code ma-nmpfamsdb.

Code availability: Sequence analysis was performed using Tantan (https://gitlab.com/mcfrith/tantan), BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi), LAST (https://gitlab.com/mcfrith/last), HMMER (http://hmmer.org/) and HH-suite3 (https://github.com/soedinglab/hh-suite). Clustering was performed using HipMCL (https://bitbucket.org/azadcse/hipmcl/src/master/). Additional taxonomic annotation was performed using Whokaryote (https://github.com/LottePronk/whokaryote), EukRep (https://github.com/patrickwest/EukRep), DeepVirFinder (https://github.com/jessieren/DeepVirFinder) and MMseqs2 (https://github.com/soedinglab/MMseqs2). 3D modelling was performed using AlphaFold2 (https://github.com/deepmind/alphafold) and TrRosetta2 (https://github.com/RosettaCommons/trRosetta2). Structural alignments were performed using TMalign (https://zhanggroup.org/TM-align/) and MMalign (https://zhanggroup.org/MM-align/). All custom scripts used for the generation and analysis of the data are available at Zenodo (https://doi.org/10.5281/zenodo.8097349).

Metagenomes encode an enormous diversity of proteins, reflecting a multiplicity of functions and activities. Exploration of this vast sequence space has been limited to a comparative analysis against reference microbial genomes and protein families derived from those genomes. Here, to examine the scale of yet untapped functional diversity beyond what is currently possible through the lens of reference genomes, we develop a computational approach to generate reference-free protein families from the sequence space in metagenomes. We analyse 26,931 metagenomes and identify 1.17 billion protein sequences longer than 35 amino acids with no similarity to any sequences from 102,491 reference genomes or the Pfam database. Using massively parallel graph-based clustering, we group these proteins into 106,198 novel sequence clusters with more than 100 members, doubling the number of protein families obtained from the reference genomes clustered using the same approach. We annotate these families on the basis of their taxonomic, habitat, geographical and gene neighbourhood distributions and, where sufficient sequence diversity is available, predict protein three-dimensional models, revealing novel structures. Overall, our results uncover an enormously diverse functional space, highlighting the importance of further exploring the microbial functional dark matter.

With the institutional support of the ‘Severo Ochoa Centre of Excellence’ accreditation (CEX2019-000928-S).

Peer reviewed

eScholarship - University of California

Digital.CSIC