24 research outputs found

proGenomes: a resource for consistent functional and taxonomic annotations of prokaryotic genomes

Author: Bork P.
Forslund K.
Huerta-Cepas J.
Letunic I.
Li S.S.
Mende D.R.
Sunagawa S.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 24/10/2016

The availability of microbial genomes has opened many new avenues of research within microbiology. This has been driven primarily by comparative genomics approaches, which rely on accurate and consistent characterization of genomic sequences. It is nevertheless difficult to obtain consistent taxonomic and integrated functional annotations for defined prokaryotic clades. Thus, we developed proGenomes, a resource that provides user-friendly access to currently 25 038 high-quality genomes whose sequences and consistent annotations can be retrieved individually or by taxonomic clade. These genomes are assigned to 5306 consistent and accurate taxonomic species clusters based on previously established methodology. proGenomes also contains functional information for almost 80 million protein-coding genes, including a comprehensive set of general annotations and more focused annotations for carbohydrate-active enzymes and antibiotic resistance genes. Additionally, broad habitat information is provided for many genomes. All genomes and associated information can be downloaded by user-selected clade or multiple habitat-specific sets of representative genomes. We expect that the availability of high-quality genomes with comprehensive functional annotations will promote advances in clinical microbial genomics, functional evolution and other subfields of microbiology. proGenomes is available at http://progenomes.embl.de

Repository for Publications and Research Data

PubMed Central

UNSWorks

MDC Repository

Online-Publikations-Server der Universität Würzburg

Disentangling the impact of environmental and phylogenetic constraints on prokaryotic within-species diversity

Author: Bork P.
Hildebrand F.
Huerta-Cepas J.
Li S.S.
Luetge M.
Maistrenko O.M.
Mende D.R.
Pedro Coelho L.
Rodrigues J.F.M.
Schmidt T.S.B.
Sunagawa S.
von Mering C.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

Microbial organisms inhabit virtually all environments and encompass a vast biological diversity. The pangenome concept aims to facilitate an understanding of diversity within defined phylogenetic groups. Hence, pangenomes are increasingly used to characterize the strain diversity of prokaryotic species. To understand the interdependence of pangenome features (such as the number of core and accessory genes) and to study the impact of environmental and phylogenetic constraints on the evolution of conspecific strains, we computed pangenomes for 155 phylogenetically diverse species (from ten phyla) using 7,000 high-quality genomes to each of which the respective habitats were assigned. Species habitat ubiquity was associated with several pangenome features. In particular, core-genome size was more important for ubiquity than accessory genome size. In general, environmental preferences had a stronger impact on pangenome evolution than phylogenetic inertia. Environmental preferences explained up to 49% of the variance for pangenome features, compared with 18% by phylogenetic inertia. This observation was robust when the dataset was extended to 10,100 species (59 phyla). The importance of environmental preferences was further accentuated by convergent evolution of pangenome features in a given habitat type across different phylogenetic clades. For example, the soil environment promotes expansion of pangenome size, while host-associated habitats lead to its reduction. Taken together, we explored the global principles of pangenome evolution, quantified the influence of habitat and phylogenetic inertia on the evolution of pangenomes, and identified criteria governing species ubiquity and habitat specificity.

Repository for Publications and Research Data

Crossref

MDC Repository

eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses

Author: Bork P.
Cook H.
Forslund S.K.
Heller D.
Hernández-Plaza A.
Huerta-Cepas J.
Jensen L.J.
Letunic I.
Mende D.R.
Rattei T.
Szklarczyk D.
von Mering C.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 08/01/2019

eggNOG is a public database of orthology relationships, gene evolutionary histories and functional annotations. Here, we present version 5.0, featuring a major update of the underlying genome sets, which have been expanded to 4445 representative bacteria and 168 archaea derived from 25 038 genomes, as well as 477 eukaryotic organisms and 2502 viral proteomes that were selected for diversity and filtered by genome quality. In total, 4.4M orthologous groups (OGs) distributed across 379 taxonomic levels were computed together with their associated sequence alignments, phylogenies, HMM models and functional descriptors. Precomputed evolutionary analysis provides fine-grained resolution of duplication/speciation events within each OG. Our benchmarks show that, despite doubling the amount of genomes, the quality of orthology assignments and functional annotations (80% coverage) has persisted without significant changes across this update. Finally, we improved eggNOG online services for fast functional annotation and orthology prediction of custom genomics or metagenomics datasets. All precomputed data are publicly available for downloading or via API queries at http://eggnog.embl.de

MDC Repository

GUNC: detection of chimerism and contamination in prokaryotic genomes

Author: Bork P.
Coelho L.P.
Fullam A.
Khedkar S.
Mende D.R.
Orakov A.
Schmidt T.S.B.
Szklarczyk D.
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 16/12/2020

Genomes are critical units in microbiology, yet ascertaining quality in prokaryotic genomes remains a formidable challenge. We present GUNC (the Genome UNClutterer), a tool that accurately detects and quantifies genome chimerism based on the lineage homogeneity of individual contigs using a genome’s full complement of genes. GUNC complements existing approaches by targeting previously underdetected types of contamination: we conservatively estimate that 5.7% of genomes in GenBank, 5.2% in RefSeq, and 15-30% of pre-filtered ‘high quality’ metagenome-assembled genomes in recent studies are undetected chimeras. GUNC provides a fast and robust tool to substantially improve prokaryotic genome quality. Source code (GPLv3+): https://github.com/grp-bork/gun

Directory of Open Access Journals

PubMed Central

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

MDC Repository

proGenomes2: an improved database for accurate and consistent habitat, taxonomic and functional annotations of prokaryotic genomes

Author: Bork P.
Coelho L.P.
Forslund S.K.
Hernández-Plaza A.
Huerta-Cepas J.
Letunic I.
Maistrenko O.M.
Mende D.R.
Milanese A.
Orakov A.N.
Paoli L.
Schmidt T.S.B.
Sunagawa S.
Zeller G.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 08/01/2020

Microbiology depends on the availability of annotated microbial genomes for many applications. Comparative genomics approaches have been a major advance, but consistent and accurate annotations of genomes can be hard to obtain. In addition, newer concepts such as the pan-genome concept are still being implemented to help answer biological questions. Hence, we present proGenomes2, which provides 87 920 high-quality genomes in a user-friendly and interactive manner. Genome sequences and annotations can be retrieved individually or by taxonomic clade. Every genome in the database has been assigned to a species cluster and most genomes could be accurately assigned to one or multiple habitats. In addition, general functional annotations and specific annotations of antibiotic resistance genes and single nucleotide variants are provided. In short, proGenomes2 provides threefold more genomes, enhanced habitat annotations, updated taxonomic and functional annotation and improved linkage to the NCBI BioSample database. The database is available at http://progenomes.embl.de/

MDC Repository

Cultivation-independent genomes greatly expand taxonomic-profiling capabilities of mOTUs across various environments

Author: Bork P.
Clayssen Q.
Karcher N.
Keller M.I.
Mende D.R.
Milanese A.
Paoli L.
Ruscheweyh H.J.
Sunagawa S.
Wirbel J.
Zeller G.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 05/12/2022

BACKGROUND: Taxonomic profiling is a fundamental task in microbiome research that aims to detect and quantify the relative abundance of microorganisms in biological samples. Available methods using shotgun metagenomic data generally depend on the deposition of sequenced and taxonomically annotated genomes, usually from cultures of isolated strains, in reference databases (reference genomes). However, the majority of microorganisms have not been cultured yet. Thus, a substantial fraction of microbial community members remains unaccounted for during taxonomic profiling, particularly in samples from underexplored environments. To address this issue, we developed the mOTU profiler, a tool that enables reference genome-independent species-level profiling of metagenomes. As such, it supports the identification and quantification of both "known" and "unknown" species based on a set of select marker genes. RESULTS: We present mOTUs3, a command line tool that enables the profiling of metagenomes for >33,000 species-level operational taxonomic units. To achieve this, we leveraged the reconstruction of >600,000 draft genomes, most of which are metagenome-assembled genomes (MAGs), from diverse microbiomes, including soil, freshwater systems, and the gastrointestinal tract of ruminants and other animals, which we found to be underrepresented by reference genomes. Overall, two thirds of all species-level taxa lacked a reference genome. The cumulative relative abundance of these newly included taxa was low in well-studied microbiomes, such as the human body sites (6-11%). By contrast, they accounted for substantial proportions (ocean, freshwater, soil: 43-63%) or even the majority (pig, fish, cattle: 60-80%) of the relative abundance across diverse non-human-associated microbiomes. Using community-developed benchmarks and datasets, we found mOTUs3 to be more accurate than other methods and to be more congruent with 16S rRNA gene-based methods for taxonomic profiling. 
Furthermore, we demonstrate that mOTUs3 increases the resolution of well-known microbial groups into species-level taxa and helps identify new differentially abundant taxa in comparative metagenomic studies. CONCLUSIONS: We developed mOTUs3 to enable accurate species-level profiling of metagenomes. Compared to other methods, it provides a more comprehensive view of prokaryotic community diversity, in particular for currently underexplored microbiomes. To facilitate comparative analyses by the research community, it is released with >11,000 precomputed profiles for publicly available metagenomes and is freely available at: https://github.com/motu-tool/mOTUs

PubMed Central

MDC Repository

Potential of fecal microbiota for early-stage detection of colorectal cancer

Author: Amiot A.
Benes V.
Boehm J.
Bork P.
Brunetti F.
Costea P.I.
Habermann N.
Hercog R.
Kloor M.
Koch M.
Kultima J.R.
Luciani A.
Mende D.R.
Schneider M.A.
Schrotz-King P.
Sobhani I.
Sunagawa S.
Tap J.
Tournigand C.
Tran Van Nhieu J.
Ulrich C.M.
Voigt A.Y.
von Knebel Doeberitz M.
Yamada T.
Zeller G.
Zimmermann J.
Publication venue: 'EMBO'
Publication date: 01/11/2014

Several bacterial species have been implicated in the development of colorectal carcinoma (CRC), but CRC-associated changes of fecal microbiota and their potential for cancer screening remain to be explored. Here, we used metagenomic sequencing of fecal samples to identify taxonomic markers that distinguished CRC patients from tumor-free controls in a study population of 156 participants. Accuracy of metagenomic CRC detection was similar to the standard fecal occult blood test (FOBT) and when both approaches were combined, sensitivity improved > 45% relative to the FOBT, while maintaining its specificity. Accuracy of metagenomic CRC detection did not differ significantly between early- and late-stage cancer and could be validated in independent patient and control populations (N = 335) from different countries. CRC-associated changes in the fecal microbiome at least partially reflected microbial community composition at the tumor itself, indicating that observed gene pool differences may reveal tumor-related host-microbe interactions. Indeed, we deduced a metabolic shift from fiber degradation in controls to utilization of host carbohydrates and amino acids in CRC patients, accompanied by an increase of lipopolysaccharide metabolism.

Crossref

Directory of Open Access Journals

PubMed Central

MDC Repository

HAL - UPEC / UPEM

A distinct lineage of giant viruses brings a rhodopsin photosystem to unicellular marine predators

Author: Bachy C.
Choi C.J.
DeLong E.F.
Hehenberger E.
Hosaka T.
Irwin N.A.T.
Iwasaki W.
Keeling P.J.
Kimura-Someya T.
Kojima K.
Kurihara R.
Leonard G.
Malmstrom R.R.
Mende D.R.
Nakajima Y.
Needham D.M.
Olson D.K.
Poirier C.
Richards T.A.
Santoro A.E.
Shirouzu M.
Sudek S.
Sudo Y.
Wilken S.
Worden A.Z.
Yoshizawa S.
Yung C.-M.
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 08/10/2019

International Migration, Integration and Social Cohesion online publications

SPIRE: a Searchable, Planetary-scale mIcrobiome REsource

Author: Bork P.
Duan Y.
Ferretti P.
Finn R.D.
Fullam A.
Kuhn M.
Letunic I.
Maistrenko O.M.
Mende D.R.
Orakov A.
Pedro Coelho L.
Ruscheweyh H.J.
Schmidt T.S.B.
Sunagawa S.
Van Rossum T.
Publication venue: Oxford University Press
Publication date: 28/10/2023

Meta'omic data on microbial diversity and function accrue exponentially in public repositories, but derived information is often siloed according to data type, study or sampled microbial environment. Here we present SPIRE, a Searchable Planetary-scale mIcrobiome REsource that integrates various consistently processed metagenome-derived microbial data modalities across habitats, geography and phylogeny. SPIRE encompasses 99 146 metagenomic samples from 739 studies covering a wide array of microbial environments and augmented with manually-curated contextual data. Across a total metagenomic assembly of 16 Tbp, SPIRE comprises 35 billion predicted protein sequences and 1.16 million newly constructed metagenome-assembled genomes (MAGs) of medium or high quality. Beyond mapping to the high-quality genome reference provided by proGenomes3 (http://progenomes.embl.de), these novel MAGs form 92 134 novel species-level clusters, the majority of which are unclassified at species level using current tools. SPIRE enables taxonomic profiling of these species clusters via an updated, custom mOTUs database (https://motu-tool.org/) and includes several layers of functional annotation, as well as crosslinks to several (micro-)biological databases. The resource is accessible, searchable and browsable via http://spire.embl.de

MDC Repository

proGenomes3: approaching one million accurately and consistently annotated high-quality prokaryotic genomes

Author: Bork P.
Ducarmon Q.R.
Fullam A.
Huerta-Cepas J.
Karcher N.
Khedkar S.
Kuhn M.
Larralde M.
Letunic I.
Maistrenko O.M.
Malfertheiner L.
Mende D.R.
Milanese A.
Rodrigues J.F.M.
Sanchis-López C.
Schmidt T.S.B.
Schudoma C.
Sunagawa S.
Szklarczyk D.
von Mering C.
Zeller G.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 21/11/2022
Field of study

The interpretation of genomic, transcriptomic and other microbial 'omics data is highly dependent on the availability of well-annotated genomes. As the number of publicly available microbial genomes continues to increase exponentially, the need for quality control and consistent annotation is becoming critical. We present proGenomes3, a database of 907 388 high-quality genomes containing 4 billion genes that passed stringent criteria and have been consistently annotated using multiple functional and taxonomic databases including mobile genetic elements and biosynthetic gene clusters. proGenomes3 encompasses 41 171 species-level clusters, defined based on universal single copy marker genes, for which pan-genomes and contextual habitat annotations are provided. The database is available at http://progenomes.embl.de/

Repository for Publications and Research Data

Digital.CSIC

MDC Repository