142 research outputs found

    Protein-Ligand Scoring with Convolutional Neural Networks

    Computational approaches to drug discovery can reduce the time and cost associated with experimental assays and enable the screening of novel chemotypes. Structure-based drug design methods rely on scoring functions to rank and predict binding affinities and poses. The ever-expanding amount of protein-ligand binding and structural data enables the use of deep machine learning techniques for protein-ligand scoring. We describe convolutional neural network (CNN) scoring functions that take as input a comprehensive 3D representation of a protein-ligand interaction. A CNN scoring function automatically learns the key features of protein-ligand interactions that correlate with binding. We train and optimize our CNN scoring functions to discriminate between correct and incorrect binding poses and between known binders and non-binders. We find that our CNN scoring function outperforms the AutoDock Vina scoring function when ranking poses both for pose prediction and virtual screening.

    Accelerating Inference in Molecular Diffusion Models with Latent Representations of Protein Structure

    Diffusion generative models have emerged as a powerful framework for addressing problems in structural biology and structure-based drug design. These models operate directly on 3D molecular structures. Due to the unfavorable scaling of graph neural networks (GNNs) with graph size as well as the relatively slow inference speeds inherent to diffusion models, many existing molecular diffusion models rely on coarse-grained representations of protein structure to make training and inference feasible. However, such coarse-grained representations discard essential information for modeling molecular interactions and impair the quality of generated structures. In this work, we present a novel GNN-based architecture for learning latent representations of molecular structure. When trained end-to-end with a diffusion model for de novo ligand design, our model achieves comparable performance to one with an all-atom protein representation while exhibiting a 3-fold reduction in inference time. Comment: This paper appeared as a spotlight paper at the NeurIPS 2023 Generative AI and Biology Workshop.

    Computing free energies with PyBrella

    Calculations of the rates of dissociation between small molecules and proteins have numerous applications, including assisting rapid discovery and testing of novel drugs. Free energy calculations consider the enthalpy and entropy of the full protein-ligand-water system and so have the potential to be more accurate than faster, single-point calculations. In this study, methods are explored to predict the binding affinity of various molecules to proteins by molecular dynamics and umbrella sampling. An attempt was made to determine the potential of mean force (PMF) for the molecule, which was compared to its known binding capability. Factors including simulation resources, amounts of sampling, force strength parameters, and correlation between predicted energy and actual rate constants were considered in order to evaluate the umbrella sampling methods. Limitations in the simulation environment, such as the scaling of the PMF for sampling, biases in the SMD trajectory, and variations between ligands, were also investigated in the hope of creating a more comprehensive approach for predicting the target-molecule interaction.

    A pyrazolopyran derivative preferentially inhibits the activity of human cytosolic hydroxymethyltransferase and induces cell death in lung cancer cells

    Serine hydroxymethyltransferase (SHMT) is a central enzyme in the metabolic reprogramming of cancer cells, providing activated one-carbon units in the serine-glycine one-carbon metabolism. Previous studies demonstrated that the cytoplasmic isoform of SHMT (SHMT1) plays a relevant role in lung cancer. SHMT1 is overexpressed in lung cancer patients and NSCLC cell lines. Moreover, SHMT1 is required to maintain DNA integrity. Its depletion in lung cancer cell lines causes cell cycle arrest and uracil accumulation and ultimately leads to apoptosis. We found that a pyrazolopyran compound, namely 2.12, preferentially inhibits SHMT1 compared to its mitochondrial counterpart SHMT2. Computational and crystallographic approaches suggest binding at the active site of SHMT1 and a competitive inhibition mechanism. A radioisotopic activity assay shows that inhibition of SHMT by 2.12 also occurs in living cells. Moreover, administration of 2.12 in A549 and H1299 lung cancer cell lines causes apoptosis at an LD50 of 34 μM, and rescue experiments underlined selectivity towards SHMT1. These data not only further highlight the relevance of the cytoplasmic isoform SHMT1 in lung cancer but, more importantly, demonstrate that, at least in vitro, it is possible to find selective inhibitors against one specific isoform of SHMT, a key target in metabolic reprogramming of many cancer types.

    Improvements to the APBS biomolecular solvation software suite

    The Adaptive Poisson-Boltzmann Solver (APBS) software was developed to solve the equations of continuum electrostatics for large biomolecular assemblages and has had an impact on the study of a broad range of chemical, biological, and biomedical applications. APBS addresses three key technology challenges for understanding solvation and electrostatics in biomedical applications: accurate and efficient models for biomolecular solvation and electrostatics, robust and scalable software for applying those theories to biomolecular systems, and mechanisms for sharing and analyzing biomolecular electrostatics data in the scientific community. To address new research applications and advancing computational capabilities, we have continually updated APBS and its suite of accompanying software since its release in 2001. In this manuscript, we discuss the models and capabilities that have recently been implemented within the APBS software package, including a Poisson-Boltzmann analytical and a semi-analytical solver, an optimized boundary element solver, a geometry-based geometric flow solvation model, a graph-theory-based algorithm for determining pKa values, and an improved web-based visualization tool for viewing electrostatics.

    Hidden bias in the DUD-E dataset leads to misleading performance of deep learning in structure-based virtual screening

    Recently, much effort has been invested in using convolutional neural network (CNN) models trained on 3D structural images of protein-ligand complexes to distinguish binding from non-binding ligands for virtual screening. However, the dearth of reliable protein-ligand X-ray structures and binding affinity data has required the use of constructed datasets for the training and evaluation of CNN molecular recognition models. Here, we outline various sources of bias in one such widely used dataset, the Directory of Useful Decoys: Enhanced (DUD-E). We have constructed and performed tests to investigate whether CNN models developed using DUD-E are properly learning the underlying physics of molecular recognition, as intended, or are instead learning biases inherent in the dataset itself. We find that superior enrichment efficiency in CNN models can be attributed to the analogue and decoy bias hidden in the DUD-E dataset rather than successful generalization of the pattern of protein-ligand interactions. Comparing additional deep learning models trained on PDBbind datasets, we found that their enrichment performances using DUD-E are not superior to the performance of the docking program AutoDock Vina. Together, these results suggest that biases that could be present in constructed datasets should be thoroughly evaluated before applying them to machine-learning-based methodology development.