HypoRiPPAtlas as an Atlas of hypothetical natural products for mass spectrometry database search
Recent analyses of public microbial genomes have found over a million biosynthetic gene clusters, the natural products of the majority of which remain unknown. Additionally, GNPS harbors billions of mass spectra of natural products without known structures and biosynthetic genes. We bridge the gap between large-scale genome mining and mass spectral datasets for natural product discovery by developing HypoRiPPAtlas, an Atlas of hypothetical natural product structures, which is ready-to-use for in silico database search of tandem mass spectra. HypoRiPPAtlas is constructed by mining genomes using seq2ripp, a machine-learning tool for the prediction of ribosomally synthesized and post-translationally modified peptides (RiPPs). In HypoRiPPAtlas, we identify RiPPs in microbes and plants. HypoRiPPAtlas could be extended to other natural product classes in the future by implementing corresponding biosynthetic logic. This study paves the way for large-scale explorations of biosynthetic pathways and chemical structures of microbial and plant RiPP classes.
Integrating genomics and metabolomics for scalable non-ribosomal peptide discovery.
Non-Ribosomal Peptides (NRPs) represent a biomedically important class of natural products that include a multitude of antibiotics and other clinically used drugs. NRPs are not directly encoded in the genome but are instead produced by metabolic pathways encoded by biosynthetic gene clusters (BGCs). Since the existing genome mining tools predict many putative NRPs synthesized by a given BGC, it remains unclear which of these putative NRPs are correct and how to identify post-assembly modifications of amino acids in these NRPs in a blind mode, without knowing which modifications exist in the sample. To address this challenge, here we report NRPminer, a modification-tolerant tool for NRP discovery from large (meta)genomic and mass spectrometry datasets. We show that NRPminer is able to identify many NRPs from different environments, including four previously unreported NRP families from soil-associated microbes and NRPs from human microbiota. Furthermore, in this work we demonstrate the anti-parasitic activities and the structure of two of these NRP families using direct bioactivity screening and nuclear magnetic resonance spectrometry, illustrating the power of NRPminer for discovering bioactive NRPs.
American Gut: An Open Platform For Citizen Science Microbiome Research
Copyright © 2018 McDonald et al. Although much work has linked the human microbiome to specific phenotypes and lifestyle variables, data from different projects have been challenging to integrate and the extent of microbial and molecular diversity in human stool remains unknown. Using standardized protocols from the Earth Microbiome Project and sample contributions from over 10,000 citizen-scientists, together with an open research network, we compare human microbiome specimens primarily from the United States, United Kingdom, and Australia to one another and to environmental samples. Our results show an unexpected range of beta-diversity in human stool microbiomes compared to environmental samples; demonstrate the utility of procedures for removing the effects of overgrowth during room-temperature shipping for revealing phenotype correlations; uncover new molecules and kinds of molecular communities in the human stool metabolome; and examine emergent associations among the microbiome, metabolome, and the diversity of plants that are consumed (rather than relying on reductive categorical variables such as veganism, which have little or no explanatory power). We also demonstrate the utility of the living data resource and cross-cohort comparison to confirm existing associations between the microbiome and psychiatric illness and to reveal the extent of microbiome change within one individual during surgery, providing a paradigm for open microbiome research and education. IMPORTANCE We show that a citizen science, self-selected cohort shipping samples through the mail at room temperature recaptures many known microbiome results from clinically collected cohorts and reveals new ones. 
Of particular interest is integrating n = 1 study data with the population data, showing that the extent of microbiome change after events such as surgery can exceed differences between distinct environmental biomes, and the effect of diverse plants in the diet, which we confirm with untargeted metabolomics on hundreds of samples.
American Gut: an Open Platform for Citizen Science Microbiome Research
McDonald D, Hyde E, Debelius JW, et al. American Gut: an Open Platform for Citizen Science Microbiome Research. mSystems. 2018;3(3):e00031-18
Scalable Computational Methods for Discovering Novel Natural Products
Today, as the world is stricken by the proliferation of novel infectious pathogens, we are faced with the urgent need for new anti-infective therapeutic agents. Natural products, also known as specialized metabolites, are chemical compounds produced by living organisms and have served as an excellent source for drug discovery. Many clinically used small molecules, including various antimicrobial, anticancer, antiviral, and immunosuppressant drugs, are either natural products or are inspired by them. Traditionally, natural products were discovered mostly through slow and laborious experiments that often led to rediscovering previously known compounds. Over the past decade, advancements in short/long-read (meta)genomics and tandem mass spectrometry (MS/MS) technologies provided an unprecedented resource for large-scale natural product discovery. In accordance with these advancements, scalable bioinformatics algorithms are required to leverage this massive data and enable analyses of natural products across thousands of samples. In this dissertation, I present several scalable computational methods for discovering novel natural products using the (MS/MS-based) metabolomics and/or (meta)genomics data. In the first chapter, I present CycloNovo, the first algorithm for scalable de novo sequencing of MS/MS data to discover cyclic and branch cyclic peptides (referred to as cyclopeptides). Cyclopeptides constitute a diverse and biomedically important class of natural products. CycloNovo employs de Bruijn graphs, the workhorse of DNA sequencing algorithms, for efficient cyclopeptide sequencing and revealed a wealth of novel cyclopeptides, including a large hidden cyclopeptidome in the human gut. In the following chapters, I discuss bioinformatics methods for discovering Non-Ribosomal Peptides (NRPs) that include a multitude of antibiotics and other clinically used drugs. NRPs are produced by metabolic pathways partially encoded by Biosynthetic Gene Clusters (BGCs).
In the second chapter, I present NRPminer, a modification-tolerant and scalable algorithm for NRP discovery by integrating (meta)genomic and MS/MS data. NRPminer identified many novel NRPs from different origins, including novel NRPs produced by soil-associated microbes and human microbiota. Finally, I discuss the problem of identifying NRP-producing BGCs in the human gut microbiome, and I show that long-read metagenomic assemblies can be used to reveal many BGCs that synthesize previously unknown NRPs in the human gut microbiome.
Computational aspects of DNA self-assembly systems at temperature 1
In this thesis, we investigate the computational power of some variants of Winfree's abstract Tile Assembly Model (aTAM) at temperature 1 [43]. Although the aTAM at temperatures higher than 1 has been proved to be Turing universal, i.e. it can simulate an arbitrary Turing machine [43], the computational power of the aTAM at temperature 1 is still an open question. It is known that some modifications of the aTAM are indeed Turing universal at temperature 1 [11, 30]. In this thesis, we first show that two variants of the aTAM, namely the Staged Tile Assembly Model and the Step-wise Tile Assembly Model at temperature 1, are also Turing universal. Next, we discuss the computational power of self-assembly with triangular tiles and hexagonal tiles, respectively. We prove that these models can simulate arbitrary systems under the aTAM and vice versa, and consequently, they have the same computational power as the aTAM.
metaFlye: scalable long-read metagenome assembly using repeat graphs.
Long-read sequencing technologies have substantially improved the assemblies of many isolate bacterial genomes as compared to fragmented short-read assemblies. However, assembling complex metagenomic datasets remains difficult even for state-of-the-art long-read assemblers. Here we present metaFlye, which addresses important long-read metagenomic assembly challenges, such as uneven bacterial composition and intra-species heterogeneity. First, we benchmarked metaFlye using simulated and mock bacterial communities and show that it consistently produces assemblies with better completeness and contiguity than state-of-the-art long-read assemblers. Second, we performed long-read sequencing of the sheep microbiome and applied metaFlye to reconstruct 63 complete or nearly complete bacterial genomes within single contigs. Finally, we show that long-read assembly of human microbiomes enables the discovery of full-length biosynthetic gene clusters that encode biomedically important natural products.
RNA-seq data and transcriptome assembly results.
Sequences were generated using 75 bp paired-end reads.
Pathway analysis for liver transcripts from R. catesbeiana (CAT) and X. laevis (LAE).
Top 25 impacted pathways after TH treatment for R. catesbeiana ranked by the highest proportion of overall observed genes. The pathway names are indicated in the center of the figure with the total number of genes known in each IGA pathway indicated. The asterisk indicates those pathways that are found in the top 25 list of X. laevis. The colour-coded bar plots illustrate the percentage of the total number of gene transcripts in a pathway that are downregulated (blue), non-responsive (yellow), upregulated (red) or not observed in the experiment (gray) relative to the control condition. Differentially expressed transcripts were determined using a p-value threshold of 5%.