4 research outputs found

    MolDiscovery: learning mass spectrometry fragmentation of small molecules

    Large numbers of mass spectra have been collected from diverse samples, and identifying small molecules from these spectra requires database searches, which remains challenging. Here, the authors report molDiscovery, a mass spectral database search method that uses an algorithm to generate mass spectrometry fragmentations and learns a probabilistic model to match small molecules with their mass spectra.

    Nerpa: A Tool for Discovering Biosynthetic Gene Clusters of Bacterial Nonribosomal Peptides

    Microbial natural products are a major source of bioactive compounds for drug discovery. Among these molecules, nonribosomal peptides (NRPs) represent a diverse class of natural products that include antibiotics, immunosuppressants, and anticancer agents. Recent breakthroughs in natural product discovery have revealed the chemical structure of several thousand NRPs. However, biosynthetic gene clusters (BGCs) encoding them are known only for a few hundred compounds. Here, we developed Nerpa, a computational method for the high-throughput discovery of novel BGCs responsible for producing known NRPs. After searching 13,399 representative bacterial genomes from the RefSeq repository against 8,368 known NRPs, Nerpa linked 117 BGCs to their products. We further experimentally validated the predicted BGC of ngercheumicin from Photobacterium galatheae via mass spectrometry. Nerpa supports searching new genomes against thousands of known NRP structures, and novel molecular structures against tens of thousands of bacterial genomes. The availability of these tools can enhance our understanding of NRP synthesis and the function of their biosynthetic enzymes.

    TeachOpenCADD goes Deep Learning: Open-source Teaching Platform Exploring Molecular DL Applications

    TeachOpenCADD is a free online platform that offers solutions to common computer-aided drug design (CADD) tasks using Python programming and open-source data and packages. The material is presented through interactive Jupyter notebooks, accommodating users from various backgrounds and programming levels. Due to the tremendous impact of deep learning (DL) methods in drug design, the TeachOpenCADD platform has been expanded to include an introduction to molecular DL tasks. This edition provides an overview of DL and its application in drug design, highlighting the usage of diverse molecular representations in this field. The platform introduces various neural network architectures, including graph neural networks (GNNs), equivariant graph neural networks (EGNNs), and recurrent neural networks (RNNs). It demonstrates how to use these architectures for developing predictive models for molecular property and activity prediction, exemplified by the Quantum Machine 9 (QM9), ChEMBL, and Kinase Inhibitor BioActivity (KiBA) data sets. The DL edition covers methods for evaluating the performance of neural networks using uncertainty estimation. Furthermore, it introduces an application of GNNs for protein-ligand interaction predictions, incorporating protein structure and ligand information. The TeachOpenCADD platform is continuously updated with new content and is open to contributions, bug reports, and questions from the community through its GitHub repository (https://github.com/volkamerlab/teachopencadd). It can be used for self-study, classroom instruction, and research applications, accommodating users from beginners to advanced levels.

    ABC-HuMi: the Atlas of Biosynthetic Gene Clusters in the Human Microbiome

    The human microbiome has emerged as a rich source of diverse and bioactive natural products, harboring immense potential for therapeutic applications. To facilitate systematic exploration and analysis of its biosynthetic landscape, we present ABC-HuMi: the Atlas of Biosynthetic Gene Clusters (BGCs) in the Human Microbiome. ABC-HuMi integrates data from major human microbiome sequence databases and provides an expansive repository of BGCs compared to the limited coverage offered by existing resources. Employing state-of-the-art BGC prediction and analysis tools, our database ensures accurate annotation and enhanced prediction capabilities. ABC-HuMi empowers researchers with advanced browsing, filtering, and search functionality, enabling efficient exploration of the resource. At present, ABC-HuMi boasts a catalog of 19 218 representative BGCs derived from the human gut, oral, skin, respiratory and urogenital systems. By capturing the intricate biosynthetic potential across diverse human body sites, our database fosters profound insights into the molecular repertoire encoded within the human microbiome and offers a comprehensive resource for the discovery and characterization of novel bioactive compounds. The database is freely accessible at https://www.ccb.uni-saarland.de/abc_humi/