5 research outputs found

MIBiG 3.0 : a community-driven effort to annotate experimentally validated biosynthetic gene clusters

Author: Aguilar César
Al-Salihi Suhad A.A.
Alanjary Mohammad
Aleti Gajender
Augustijn Hannah E.
Avalon Nicole E.
Avelar-Rivas J. Abraham
Avitia-Domínguez Luis A.
Balaya Rex Devasahayam Arokia
Barona-Gómez Francisco
Bernaldo-Agüero Jordan
Bielinski Vincent A.
Biermann Friederike
Blin Kai
Booth Thomas J.
Carrion Bravo Victor J.
Castelo-Branco Raquel
Chagas Fernanda O.
Chevrette Marc G.
Collemare Jérôme
Cruz-Morales Pablo
Du Chao
Duncan Katherine R.
Egbert Susan
Gavriilidou Athina
Gayrard Damien
Gutiérrez-García Karina
Haslinger Kristina
Helfrich Eric J.N.
Jati Afif P.
Kalkreuter Edward
Kalyvas Nikolaos
Kang Kyo B.
Kautsar Satria
Kim Wonyong
Kunjapur Aditya M.
Lee Sanghoon
Li Yong-Xin
Lin Geng-Min
Linington Roger G.
Loureiro Catarina
Louwen Joris J.R.
Louwen Nico L.L.
Lund George
Medema Marnix H.
Meijer David
Navarro-Muñoz Jorge C.
Parra Jonathan
Philmus Benjamin
Pourmohsenin Bita
Pronk Lotte J.U.
Recchia Michael J.J.
Rego Adriana
Reitz Zachary L.
Robinson Serina
Rosas-Becerra L. Rodrigo
Roxborough Eve T.
Schorn Michelle A.
Scobie Darren J.
Selem-Mojica Nelly
Singh Kumar Saurabh
Sokolova Nika
Tang Xiaoyu
Terlouw Barbara R.
Tørring Thomas
Udwary Daniel
van der Hooft Justin J.J.
van Santen Jeffrey A.
Vigneshwari Aruna
Vind Kristiina
Vromans Sophie P.J.M.
Waschulin Valentin
Weber Tilmann
Williams Sam E.
Winter Jaclyn M.
Witte Thomas E.
Xie Huali
Yang Dong
Yu Jingwei
Zaroubi Liana
Zdouc Mitja
Zhong Zheng
Publication date: 18/11/2022

With an ever-increasing amount of (meta)genomic data being deposited in sequence databases, (meta)genome mining for natural product biosynthetic pathways occupies a critical role in the discovery of novel pharmaceutical drugs, crop protection agents and biomaterials. The genes that encode these pathways are often organised into biosynthetic gene clusters (BGCs). In 2015, we defined the Minimum Information about a Biosynthetic Gene cluster (MIBiG): a standardised data format that describes the minimally required information to uniquely characterise a BGC. We simultaneously constructed an accompanying online database of BGCs, which has since been widely used by the community as a reference dataset for BGCs and was expanded to 2021 entries in 2019 (MIBiG 2.0). Here, we describe MIBiG 3.0, a database update comprising large-scale validation and re-annotation of existing entries and 661 new entries. Particular attention was paid to the annotation of compound structures and biological activities, as well as protein domain selectivities. Together, these new features keep the database up-to-date, and will provide new opportunities for the scientific community to use its freely available data, e.g. for the training of new machine learning models to predict sequence-structure-function relationships for diverse natural products. MIBiG 3.0 is accessible online at https://mibig.secondarymetabolites.org/

University of Strathclyde Institutional Repository

ZENODO

eScholarship - University of California

Warwick Research Archives Portal Repository

Digital.CSIC

Online Research Database In Technology

Explore Bristol Research

Comprehensive large-scale integrative analysis of omics data to accelerate specialized metabolite discovery

Author: Louwen Joris J.R.
van der Hooft Justin J.J.
Publication venue: American Society for Microbiology
Publication date: 01/01/2021

Microbial specialized metabolites are key mediators in host-microbiome interactions. Most of the chemical space produced by the microbiome currently remains unexplored and uncharacterized. This situation calls for new and improved methods to exploit the growing publicly available genomic and metabolomic data sets and connect the outcomes to structural and functional knowledge inferred from transcriptomics and proteomics experiments. Here, we first describe currently available approaches that support the comprehensive mining of metabolomics and genomics data. Next, we provide our vision on how to move forward toward the automated linking of omics data of specialized metabolites to their structures, biosynthesis pathways, producers, and functions.

Enhanced correlation-based linking of biosynthetic gene clusters to their metabolic products through chemical class matching

Author: Louwen Joris J.R.
Medema Marnix H.
van der Hooft Justin J.J.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2023

It is well-known that the microbiome produces a myriad of specialised metabolites with diverse functions. To better characterise their structures and identify their producers in complex samples, integrative genome and metabolome mining is becoming increasingly popular. Metabologenomic co-occurrence-based correlation scoring methods facilitate the linking of metabolite mass fragmentation spectra (MS/MS) to their cognate biosynthetic gene clusters (BGCs) based on shared absence/presence patterns of metabolites and BGCs in paired omics datasets of multiple strains. Recently, these methods have been made more readily accessible through the NPLinker platform. However, co-occurrence-based approaches usually result in too many candidate links to manually validate. To address this issue, we introduce a generic feature-based correlation method that matches chemical compound classes between BGCs and MS/MS spectra. Results: To automatically reduce the long lists of potential BGC-MS/MS spectrum links, we match natural product (NP) ontologies previously independently developed for genomics and metabolomics and developed NPClassScore: an empirical class matching score that we also implemented in the NPLinker platform. By applying NPClassScore on three paired omics datasets totalling 189 bacterial strains, we show that the number of links is reduced by on average 63% as compared to using a co-occurrence-based strategy alone. We further demonstrate that 96% of experimentally validated links in these datasets are retained and prioritised when using NPClassScore. Conclusion: The matching genome-metabolome class ontologies provide a starting point for selecting plausible candidates for BGCs and MS/MS spectra based on matching chemical compound class ontologies. NPClassScore expedites genome/metabolome data integration, as relevant BGC-metabolite links are prioritised, and researchers are faced with substantially fewer proposed BGC-MS/MS links to manually inspect. 
We anticipate that our addition to the NPLinker platform will aid integrative omics mining workflows in discovering novel NPs and understanding complex metabolic interactions in the microbiome.
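
The idea in this abstract, scoring BGC-spectrum pairs by shared presence/absence across strains and then pruning pairs whose chemical classes do not match, can be sketched as follows. This is a minimal illustration and not the NPLinker or NPClassScore implementation: the agreement fraction stands in for the co-occurrence correlation score, a simple class-overlap test stands in for the empirical class matching score, and all identifiers, thresholds and data layouts are hypothetical.

```python
from itertools import product


def cooccurrence_score(bgc_strains, spec_strains, all_strains):
    """Fraction of strains in which the BGC and the spectrum are both present or both absent.

    Assumption: a plain agreement fraction, used here only as a stand-in for a
    real co-occurrence correlation score.
    """
    agree = sum((s in bgc_strains) == (s in spec_strains) for s in all_strains)
    return agree / len(all_strains)


def candidate_links(bgcs, spectra, all_strains, min_score=0.75, require_class_match=True):
    """Return (bgc_id, spectrum_id, score) tuples, best score first.

    Assumption: each BGC and spectrum is a dict with a "strains" set and a
    "classes" set of predicted chemical classes (hypothetical layout).
    """
    links = []
    for (bgc_id, bgc), (spec_id, spec) in product(bgcs.items(), spectra.items()):
        score = cooccurrence_score(bgc["strains"], spec["strains"], all_strains)
        if score < min_score:
            continue
        # Class filter: keep the link only if at least one predicted class is shared.
        if require_class_match and not (bgc["classes"] & spec["classes"]):
            continue
        links.append((bgc_id, spec_id, score))
    return sorted(links, key=lambda link: (-link[2], link[0], link[1]))


# Toy data: two BGCs with identical strain patterns but different classes.
strains = {"s1", "s2", "s3", "s4"}
bgcs = {
    "bgc_A": {"strains": {"s1", "s2"}, "classes": {"polyketide"}},
    "bgc_B": {"strains": {"s1", "s2"}, "classes": {"peptide"}},
}
spectra = {"spec_1": {"strains": {"s1", "s2"}, "classes": {"polyketide"}}}

print(candidate_links(bgcs, spectra, strains, require_class_match=False))
print(candidate_links(bgcs, spectra, strains, require_class_match=True))
```

In the toy data both BGCs co-occur perfectly with the spectrum, so co-occurrence alone cannot separate them; the class filter removes the pair whose predicted classes disagree, which is the kind of reduction in candidate links the abstract reports.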

PubMed Central

iPRESTO: Automated discovery of biosynthetic sub-clusters linked to specific natural product substructures

Author: Kautsar Satria A.
Louwen Joris J.R.
Medema Marnix H.
van der Burg Sven
van der Hooft Justin J.J.
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2023

Microbial specialised metabolism is full of valuable natural products that are applied clinically, agriculturally, and industrially. The genes that encode their biosynthesis are often physically clustered on the genome in biosynthetic gene clusters (BGCs). Many BGCs consist of multiple groups of co-evolving genes called sub-clusters that are responsible for the biosynthesis of a specific chemical moiety in a natural product. Sub-clusters therefore provide an important link between the structures of a natural product and its BGC, which can be leveraged for predicting natural product structures from sequence, as well as for linking chemical structures and metabolomics-derived mass features to BGCs. While some initial computational methodologies have been devised for sub-cluster detection, current approaches are not scalable, have only been run on small and outdated datasets, or produce an impractically large number of possible sub-clusters to mine through. Here, we constructed a scalable method for unsupervised sub-cluster detection, called iPRESTO, based on topic modelling and statistical analysis of co-occurrence patterns of enzyme-coding protein families. iPRESTO was used to mine sub-clusters across 150,000 prokaryotic BGCs from antiSMASH-DB. After annotating a fraction of the resulting sub-cluster families, we could predict a substructure for 16% of the antiSMASH-DB BGCs. Additionally, our method was able to confirm 83% of the experimentally characterised sub-clusters in MIBiG reference BGCs. Based on iPRESTO-detected sub-clusters, we could correctly identify the BGCs for xenorhabdin and salbostatin biosynthesis (which had not yet been annotated in BGC databases), as well as propose a candidate BGC for akashin biosynthesis. Additionally, we show for a collection of 145 actinobacteria how substructures can aid in linking BGCs to molecules by correlating iPRESTO-detected sub-clusters to MS/MS-derived Mass2Motifs substructure patterns. 
This work paves the way for deeper functional and structural annotation of microbial BGCs by improved linking of orphan molecules to their cognate gene clusters, thus facilitating accelerated natural product discovery.
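
The statistical side of the method described above, finding protein families that co-occur across BGCs more often than expected, can be illustrated with a small sketch. It covers only pairwise co-occurrence with a simple observed/expected ratio; it does not perform topic modelling and is not the iPRESTO code. The family identifiers, thresholds and the enrichment measure are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurring_pairs(bgcs, min_support=2, min_lift=1.5):
    """Find protein-family pairs that co-occur in BGCs more often than expected.

    bgcs: list of lists of protein-family identifiers, one list per BGC.
    Assumption: "lift" (observed count / count expected under independence) is
    used as a simple enrichment measure for this sketch.
    """
    n = len(bgcs)
    single = Counter()
    pair = Counter()
    for families in bgcs:
        fams = sorted(set(families))  # count each family once per BGC
        single.update(fams)
        pair.update(combinations(fams, 2))

    results = []
    for (a, b), observed in pair.items():
        if observed < min_support:
            continue
        expected = single[a] * single[b] / n
        lift = observed / expected
        if lift >= min_lift:
            results.append((a, b, observed, lift))
    return sorted(results, key=lambda r: (-r[3], r[0], r[1]))


# Toy BGCs described by hypothetical protein-family labels.
toy_bgcs = [
    ["PF_A", "PF_B", "PF_X"],
    ["PF_A", "PF_B"],
    ["PF_X", "PF_Y"],
    ["PF_X"],
]
print(cooccurring_pairs(toy_bgcs))
```

Pairs that pass both thresholds are candidates for belonging to the same sub-cluster; a topic model, as named in the abstract, generalises this from pairs to larger groups of families.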

NPOmix: A machine learning classifier to connect mass spectrometry fragmentation data to biosynthetic gene clusters

Author: Aron Allegra T.
Bandeira Nuno
Bauermeister Anelize
Brejnrod Asker
da Silva Ricardo
Dorrestein Pieter C.
Fiore Marli F.
Gerwick Lena
Gerwick William H.
Glukhov Evgenia
Gomes Paulo Wender P.
Gurevich Alexey
Kim Hyun Woo
Leão Tiago F.
Louwen Joris J.R.
Reher Raphael
van der Hooft Justin J.J.
Wang Mingxun
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2022

Microbial specialized metabolites are an important source of, and inspiration for, many pharmaceuticals and biotechnological products, and they play key roles in ecological processes. Untargeted metabolomics using liquid chromatography coupled with tandem mass spectrometry is an efficient technique to access metabolites from fractions and even environmental crude extracts. Nevertheless, metabolomics is limited in predicting structures or bioactivities for cryptic metabolites. Efficiently linking the biosynthetic potential inferred from (meta)genomics to the specialized metabolome would accelerate drug discovery programs by allowing metabolomics to make use of genetic predictions. Here, we present a k-nearest neighbor classifier to systematically connect mass spectrometry fragmentation spectra to their corresponding biosynthetic gene clusters (independent of their chemical class). Our new pattern-based genome mining pipeline links biosynthetic genes to the metabolites they encode, as detected via mass spectrometry from bacterial cultures or environmental microbiomes. Using paired datasets that include validated gene-mass spectral links from the Paired Omics Data Platform, we demonstrate this approach by automatically linking 18 previously known mass spectra (17 for which the biosynthesis gene clusters can be found in the MIBiG database, plus palmyramide A) to their corresponding previously experimentally validated biosynthetic genes (e.g., via nuclear magnetic resonance or genetic engineering). We illustrate, with a computational example that users can reproduce, how to use our Natural Products Mixed Omics (NPOmix) tool for siderophore mining. We conclude that NPOmix minimizes the need for culturing (it worked well on microbiomes) and facilitates specialized metabolite prioritization based on integrative omics mining.
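
A k-nearest neighbor classifier of the kind named in this abstract can be sketched in a few lines. The abstract does not specify the features or the distance measure, so the sketch below makes its own assumptions: each BGC and each query spectrum is represented by the set of strains in which it is observed, similarity is the Jaccard index, and the labels are hypothetical gene cluster family names. It is not the NPOmix implementation.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_predict(query, training, k=3):
    """Predict a label for `query` by majority vote among its k most similar training items.

    training: list of (strain_set, label) pairs.
    Assumption: strain-presence sets and Jaccard similarity are illustrative
    choices, not taken from the abstract.
    """
    ranked = sorted(training, key=lambda item: -jaccard(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set: BGC strain patterns labelled with hypothetical family names.
training = [
    ({"s1", "s2", "s3"}, "gcf_1"),
    ({"s1", "s2"}, "gcf_1"),
    ({"s4", "s5"}, "gcf_2"),
    ({"s4"}, "gcf_2"),
]

# A query spectrum observed in strains s1-s3 is assigned to the family whose
# BGCs share that strain pattern.
print(knn_predict({"s1", "s2", "s3"}, training))
```

Because the vote depends only on the presence pattern, this kind of classifier does not need to know the chemical class of the metabolite, which matches the "independent of their chemical class" claim in the abstract.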

eScholarship - University of California