18 research outputs found

    A Graphical Model for Fusing Diverse Microbiome Data

    This paper develops a Bayesian graphical model for fusing disparate types of count data. The motivating application is the study of bacterial communities from diverse high dimensional features, in this case transcripts, collected from different treatments. In such datasets, there are no explicit correspondences between the communities and each corresponds to different factors, making data fusion challenging. We introduce a flexible multinomial-Gaussian generative model for jointly modeling such count data. This latent variable model jointly characterizes the observed data through a common multivariate Gaussian latent space that parameterizes the set of multinomial probabilities of the transcriptome counts. The covariance matrix of the latent variables induces a covariance matrix of co-dependencies between all the transcripts, effectively fusing multiple data sources. We present a computationally scalable variational Expectation-Maximization (EM) algorithm for inferring the latent variables and the parameters of the model. The inferred latent variables provide a common dimensionality reduction for visualizing the data and the inferred parameters provide a predictive posterior distribution. In addition to simulation studies that demonstrate the variational EM procedure, we apply our model to a bacterial microbiome dataset.
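
    The generative idea in this abstract (a multivariate Gaussian latent vector that parameterizes multinomial probabilities over counts) can be illustrated with a minimal sketch. This is not the authors' implementation: the softmax link, the function name `sample_counts`, and the example covariance are all assumptions made for illustration, and the variational EM inference step is not shown.

```python
import numpy as np

def sample_counts(mean, cov, n_samples, depth, seed=0):
    """Draw count vectors whose multinomial probabilities come from a Gaussian latent.

    Hypothetical helper: the softmax link below is an assumption, since the
    abstract does not state which link function the model uses.
    """
    rng = np.random.default_rng(seed)
    # Latent Gaussian vectors, one per sample: shape (n_samples, n_features).
    z = rng.multivariate_normal(mean, cov, size=n_samples)
    # Softmax link maps each latent vector to a probability vector.
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    # One multinomial draw of `depth` total counts per sample.
    counts = np.stack([rng.multinomial(depth, row) for row in p])
    return counts, p

# Illustrative covariance: features 0 and 1 co-vary positively,
# features 2 and 3 negatively (values chosen arbitrarily).
mean = np.zeros(4)
cov = np.array([[1.0, 0.8, 0.0, 0.0],
                [0.8, 1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, -0.5],
                [0.0, 0.0, -0.5, 1.0]])
counts, probs = sample_counts(mean, cov, n_samples=5, depth=1000)
```

    In this sketch the off-diagonal entries of `cov` are what couple the features: latent dimensions that co-vary tend to raise or lower the corresponding multinomial probabilities together.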

    antiSMASH 4.0—improvements in chemistry prediction and gene cluster boundary identification

    Many antibiotics, chemotherapeutics, crop protection agents and food preservatives originate from molecules produced by bacteria, fungi or plants. In recent years, genome mining methodologies have been widely adopted to identify and characterize the biosynthetic gene clusters encoding the production of such compounds. Since 2011, the ‘antibiotics and secondary metabolite analysis shell—antiSMASH’ has assisted researchers in efficiently performing this, both as a web server and a standalone tool. Here, we present the thoroughly updated antiSMASH version 4, which adds several novel features, including prediction of gene cluster boundaries using the ClusterFinder method or the newly integrated CASSIS algorithm, improved substrate specificity prediction for non-ribosomal peptide synthetase adenylation domains based on the new SANDPUMA algorithm, improved predictions for terpene and ribosomally synthesized and post-translationally modified peptides cluster products, reporting of sequence similarity to proteins encoded in experimentally characterized gene clusters on a per-protein basis and a domain-level alignment tool for comparative analysis of trans-AT polyketide synthase assembly line architectures. Additionally, several usability features have been updated and improved. Together, these improvements make antiSMASH up-to-date with the latest developments in natural product research and will further facilitate computational genome mining for the discovery of novel bioactive molecules

    MIBiG 3.0: a community-driven effort to annotate experimentally validated biosynthetic gene clusters

    With an ever-increasing amount of (meta)genomic data being deposited in sequence databases, (meta)genome mining for natural product biosynthetic pathways occupies a critical role in the discovery of novel pharmaceutical drugs, crop protection agents and biomaterials. The genes that encode these pathways are often organised into biosynthetic gene clusters (BGCs). In 2015, we defined the Minimum Information about a Biosynthetic Gene cluster (MIBiG): a standardised data format that describes the minimally required information to uniquely characterise a BGC. We simultaneously constructed an accompanying online database of BGCs, which has since been widely used by the community as a reference dataset for BGCs and was expanded to 2021 entries in 2019 (MIBiG 2.0). Here, we describe MIBiG 3.0, a database update comprising large-scale validation and re-annotation of existing entries and 661 new entries. Particular attention was paid to the annotation of compound structures and biological activities, as well as protein domain selectivities. Together, these new features keep the database up-to-date, and will provide new opportunities for the scientific community to use its freely available data, e.g. for the training of new machine learning models to predict sequence-structure-function relationships for diverse natural products. MIBiG 3.0 is accessible online at https://mibig.secondarymetabolites.org/

    On the evolution of natural product biosynthesis

    Natural products are the raw material for drug discovery programmes. Bioactive natural products are used extensively in medicine and agriculture and have found utility as antibiotics, immunosuppressives, anti-cancer drugs and anthelminthics. Remarkably, the natural roles of these molecules and the mechanisms that drive their evolution are relatively poorly understood. The exponential increase in genome and chemical data in recent years, coupled with technical advances in bioinformatics and genetics, has enabled progress to be made in understanding the evolution of biosynthetic gene clusters and the products of their enzymatic machinery. Here we discuss the diversity of natural products, incorporating the mechanisms that govern evolution of metabolic pathways and how this can be applied to biosynthetic gene clusters. We build on the nomenclature of natural products in terms of primary, integrated, secondary and specialised metabolism and place this within an ecology-evolutionary-developmental biology framework. We believe this eco-evo-devo framework will help to clarify the nature and use of the term specialised metabolites in the future.

    SANDPUMA: ensemble predictions of nonribosomal peptide chemistry reveal biosynthetic diversity across Actinobacteria

    Nonribosomally synthesized peptides (NRPs) are natural products with widespread applications in medicine and biotechnology. Many algorithms have been developed to predict the substrate specificities of nonribosomal peptide synthetase adenylation (A) domains from DNA sequences, which enables prioritization and dereplication, and integration with other data types in discovery efforts. However, insufficient training data and a lack of clarity regarding prediction quality have impeded optimal use. Here, we introduce prediCAT, a new phylogenetics-inspired algorithm, which quantitatively estimates the degree of predictability of each A-domain. We then systematically benchmarked all algorithms on a newly gathered, independent test set of 434 A-domain sequences, showing that active-site-motif-based algorithms outperform whole-domain-based methods. Subsequently, we developed SANDPUMA, a powerful ensemble algorithm, based on newly trained versions of all high-performing algorithms, which significantly outperforms individual methods. Finally, we deployed SANDPUMA in a systematic investigation of 7635 Actinobacteria genomes, suggesting that NRP chemical diversity is much higher than previously estimated. SANDPUMA has been integrated into the widely used antiSMASH biosynthetic gene cluster analysis pipeline and is also available as an open-source, standalone tool

    Biosynthesis and function of 7-deazaguanine derivatives in bacteria and phages

    Deazaguanine modifications play multifaceted roles in the molecular biology of DNA and tRNA, shaping diverse yet essential biological processes, including the nuanced fine-tuning of translation efficiency and the intricate modulation of codon-anticodon interactions. Beyond their roles in translation, deazaguanine modifications contribute to cellular stress resistance, self-nonself discrimination mechanisms, and host evasion defenses, directly modulating the adaptability of living organisms. Deazaguanine moieties extend beyond nucleic acid modifications, manifesting in the structural diversity of biologically active natural products. Their roles in fundamental cellular processes and their presence in biologically active natural products underscore their versatility and pivotal contributions to the intricate web of molecular interactions within living organisms. Here, we discuss the current understanding of the biosynthesis and multifaceted functions of deazaguanines, shedding light on their diverse and dynamic roles in the molecular landscape of life

    Evolution of combinatorial diversity in trans-acyltransferase polyketide synthase assembly lines across bacteria

    Trans-acyltransferase polyketide synthases (trans-AT PKSs) are bacterial multimodular enzymes that biosynthesize diverse pharmaceutically and ecologically important polyketides. A notable feature of this natural product class is the existence of chemical hybrids that combine core moieties from different polyketide structures. To understand the prevalence, biosynthetic basis, and evolutionary patterns of this phenomenon, we developed transPACT, a phylogenomic algorithm to automate global classification of trans-AT PKS modules across bacteria and applied it to 1782 trans-AT PKS gene clusters. These analyses reveal widespread exchange patterns suggesting recombination of extended PKS module series as an important mechanism for metabolic diversification in this natural product class. For three plant-associated bacteria, i.e., the root colonizer Gynuella sunshinyii and the pathogens Xanthomonas cannabis and Pseudomonas syringae, we demonstrate the utility of this computational approach for uncovering cryptic relationships between polyketides, accelerating polyketide mining from fragmented genome sequences, and discovering polyketide variants with conserved moieties of interest. As natural combinatorial hybrids are rare among the more commonly studied cis-AT PKSs, this study paves the way towards evolutionarily informed, rational PKS engineering to produce chimeric trans-AT PKS-derived polyketides.

    Bacillimidazoles A–F, Imidazolium-Containing Compounds Isolated from a Marine Bacillus

    Chemical investigations of a marine sponge-associated Bacillus revealed six new imidazolium-containing compounds, bacillimidazoles A–F (1–6). Previous reports of related imidazolium-containing natural products are rare. Initially unveiled by timsTOF (trapped ion mobility spectrometry) MS data, extensive HRMS and 1D and 2D NMR analyses enabled the structural elucidation of 1–6. In addition, a plausible biosynthetic pathway to bacillimidazoles is proposed based on isotopic labeling experiments and invokes the highly reactive glycolytic adduct 2,3-butanedione. Combined, the results of structure elucidation efforts, isotopic labeling studies and bioinformatics suggest that 1–6 result from a fascinating intersection of primary and secondary metabolic pathways in Bacillus sp. WMMC1349. Antimicrobial assays revealed that, of 1–6, only compound six displayed discernible antibacterial activity, despite the close structural similarities shared by all six natural products