Protein-Ligand Scoring with Convolutional Neural Networks
Computational approaches to drug discovery can reduce the time and cost
associated with experimental assays and enable the screening of novel
chemotypes. Structure-based drug design methods rely on scoring functions to
rank and predict binding affinities and poses. The ever-expanding amount of
protein-ligand binding and structural data enables the use of deep machine
learning techniques for protein-ligand scoring.
We describe convolutional neural network (CNN) scoring functions that take as
input a comprehensive 3D representation of a protein-ligand interaction. A CNN
scoring function automatically learns the key features of protein-ligand
interactions that correlate with binding. We train and optimize our CNN scoring
functions to discriminate between correct and incorrect binding poses and known
binders and non-binders. We find that our CNN scoring function outperforms the
AutoDock Vina scoring function when ranking poses both for pose prediction and
virtual screening.
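The "3D representation of a protein-ligand interaction" mentioned in the abstract above can be pictured as a voxel grid with one channel per atom type. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `voxelize`, the atom-type labels, the box size, the resolution, and the simple per-voxel counting scheme are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical atom record: (atom type label, x, y, z), coordinates in angstroms.
Atom = Tuple[str, float, float, float]
Voxel = Tuple[int, int, int]


def voxelize(
    atoms: List[Atom],
    center: Tuple[float, float, float],
    box_size: float = 8.0,    # assumed edge length of the cubic box
    resolution: float = 1.0,  # assumed voxel edge length
) -> Tuple[int, Dict[str, Dict[Voxel, int]]]:
    """Count atoms per voxel, keeping one sparse channel per atom type.

    Returns the number of voxels per axis and a mapping
    atom type -> {voxel index -> atom count}. Atoms outside the box are skipped.
    """
    n = int(round(box_size / resolution))
    half = box_size / 2.0
    grid: Dict[str, Dict[Voxel, int]] = {}
    for atom_type, x, y, z in atoms:
        # Shift coordinates so the box spans [0, box_size) on each axis.
        idx = tuple(
            int((coord - c + half) // resolution)
            for coord, c in zip((x, y, z), center)
        )
        if all(0 <= i < n for i in idx):
            channel = grid.setdefault(atom_type, {})
            channel[idx] = channel.get(idx, 0) + 1
    return n, grid


if __name__ == "__main__":
    toy_atoms = [("C", 0.2, 0.2, 0.2), ("O", -3.9, 0.0, 3.9), ("C", 10.0, 0.0, 0.0)]
    size, channels = voxelize(toy_atoms, center=(0.0, 0.0, 0.0))
    print(size, channels)
```

A dense tensor of shape (channels, n, n, n) built from such counts is the kind of input a 3D CNN could consume; a smoother atom density in place of raw counts is a common alternative.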
Accelerating Inference in Molecular Diffusion Models with Latent Representations of Protein Structure
Diffusion generative models have emerged as a powerful framework for
addressing problems in structural biology and structure-based drug design.
These models operate directly on 3D molecular structures. Due to the
unfavorable scaling of graph neural networks (GNNs) with graph size as well as
the relatively slow inference speeds inherent to diffusion models, many
existing molecular diffusion models rely on coarse-grained representations of
protein structure to make training and inference feasible. However, such
coarse-grained representations discard essential information for modeling
molecular interactions and impair the quality of generated structures. In this
work, we present a novel GNN-based architecture for learning latent
representations of molecular structure. When trained end-to-end with a
diffusion model for de novo ligand design, our model achieves comparable
performance to one with an all-atom protein representation while exhibiting a
3-fold reduction in inference time. Comment: This paper appeared as a spotlight paper at the NeurIPS 2023
Generative AI and Biology Workshop
Computing free energies with PyBrella
Calculations of the rates of dissociation between small molecules and proteins have numerous applications, including assisting rapid discovery and testing of novel drugs. Free energy calculations consider the enthalpy and entropy of the full protein-ligand-water system and so have the potential to be more accurate than faster, single-point calculations. In this study, methods are explored to predict the binding affinity of various molecules to proteins by molecular dynamics and umbrella sampling. An attempt was made to determine the potential of mean force (PMF) for the molecule, which was compared to its known binding capability. Factors including simulation resources, amounts of sampling, force strength parameters, and correlation between predicted energy and actual rate constants were considered in order to evaluate the umbrella sampling methods. Limitations in the simulation environment, such as the scaling of the PMF for sampling, biases in the SMD trajectory, and variations between ligands, were also investigated in the hope of creating a more comprehensive approach for predicting the target-molecule interaction.
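To make the umbrella sampling idea in the abstract above concrete, here is a minimal sketch of how the harmonic bias can be removed from a single sampling window to estimate a PMF along a reaction coordinate, using F(x) = -kT ln P_biased(x) - w(x) + const with w(x) = k (x - x0)^2 / 2. This is not PyBrella's code or API: the function names, the bin width, and the value of kT (about 0.593 kcal/mol near 298 K) are assumptions for illustration. Combining many windows into one PMF normally requires a method such as WHAM or MBAR, which is not shown here.

```python
import math
from typing import Dict, List

KT_KCAL_PER_MOL = 0.593  # assumed k_B * T near 298 K, in kcal/mol


def harmonic_bias(x: float, x0: float, k: float) -> float:
    """Umbrella restraint energy w(x) = 0.5 * k * (x - x0)^2."""
    return 0.5 * k * (x - x0) ** 2


def unbiased_pmf(
    samples: List[float],
    x0: float,
    k: float,
    bin_width: float,
    kT: float = KT_KCAL_PER_MOL,
) -> Dict[float, float]:
    """Estimate the PMF from ONE biased window, shifted so its minimum is zero.

    samples: reaction-coordinate values recorded under the bias.
    Returns a mapping bin center -> free energy (same units as kT).
    """
    if not samples:
        raise ValueError("need at least one sample")
    counts: Dict[int, int] = {}
    for x in samples:
        b = math.floor(x / bin_width)
        counts[b] = counts.get(b, 0) + 1
    total = len(samples)
    pmf: Dict[float, float] = {}
    for b, c in counts.items():
        center = (b + 0.5) * bin_width
        p_biased = c / total
        # Subtract the bias energy to undo the restraint's effect.
        pmf[center] = -kT * math.log(p_biased) - harmonic_bias(center, x0, k)
    lowest = min(pmf.values())
    return {x: f - lowest for x, f in sorted(pmf.items())}


if __name__ == "__main__":
    toy_samples = [0.25, 0.25, 0.75, 0.75]  # made-up data, not simulation output
    print(unbiased_pmf(toy_samples, x0=0.5, k=10.0, bin_width=0.5))
```

Only bins that were actually visited appear in the result, which is one reason the amount of sampling per window matters.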
A pyrazolopyran derivative preferentially inhibits the activity of human cytosolic hydroxymethyltransferase and induces cell death in lung cancer cells
Serine hydroxymethyltransferase (SHMT) is a central enzyme in the metabolic reprogramming of cancer cells, providing activated one-carbon units in the serine-glycine one-carbon metabolism. Previous studies demonstrated that the cytoplasmic isoform of SHMT (SHMT1) plays a relevant role in lung cancer. SHMT1 is overexpressed in lung cancer patients and NSCLC cell lines. Moreover, SHMT1 is required to maintain DNA integrity. Its depletion in lung cancer cell lines causes cell cycle arrest and uracil accumulation and ultimately leads to apoptosis. We found that a pyrazolopyran compound, namely 2.12, preferentially inhibits SHMT1 compared to the mitochondrial counterpart SHMT2. Computational and crystallographic approaches suggest binding at the active site of SHMT1 and a competitive inhibition mechanism. A radioisotopic activity assay shows that inhibition of SHMT by 2.12 also occurs in living cells. Moreover, administration of 2.12 in A549 and H1299 lung cancer cell lines causes apoptosis with an LD50 of 34 μM, and rescue experiments underlined selectivity towards SHMT1. These data not only further highlight the relevance of the cytoplasmic isoform SHMT1 in lung cancer but, more importantly, demonstrate that, at least in vitro, it is possible to find selective inhibitors against one specific isoform of SHMT, a key target in metabolic reprogramming of many cancer types.
Improvements to the APBS biomolecular solvation software suite
The Adaptive Poisson-Boltzmann Solver (APBS) software was developed to solve
the equations of continuum electrostatics for large biomolecular assemblages
and has had an impact on the study of a broad range of chemical, biological,
and biomedical applications. APBS addresses three key technology challenges for
understanding solvation and electrostatics in biomedical applications: accurate
and efficient models for biomolecular solvation and electrostatics, robust and
scalable software for applying those theories to biomolecular systems, and
mechanisms for sharing and analyzing biomolecular electrostatics data in the
scientific community. To address new research applications and advancing
computational capabilities, we have continually updated APBS and its suite of
accompanying software since its release in 2001. In this manuscript, we discuss
the models and capabilities that have recently been implemented within the APBS
software package including: a Poisson-Boltzmann analytical and a
semi-analytical solver, an optimized boundary element solver, a geometry-based
geometric flow solvation model, a graph-theory-based algorithm for determining
pKa values, and an improved web-based visualization tool for viewing
electrostatics.
Hidden bias in the DUD-E dataset leads to misleading performance of deep learning in structure-based virtual screening
Recently, much effort has been invested in using convolutional neural network (CNN) models trained on 3D structural images of protein-ligand complexes to distinguish binding from non-binding ligands for virtual screening. However, the dearth of reliable protein-ligand X-ray structures and binding affinity data has required the use of constructed datasets for the training and evaluation of CNN molecular recognition models. Here, we outline various sources of bias in one such widely used dataset, the Directory of Useful Decoys: Enhanced (DUD-E). We have constructed and performed tests to investigate whether CNN models developed using DUD-E are properly learning the underlying physics of molecular recognition, as intended, or are instead learning biases inherent in the dataset itself. We find that superior enrichment efficiency in CNN models can be attributed to the analogue and decoy bias hidden in the DUD-E dataset rather than successful generalization of the pattern of protein-ligand interactions. Comparing additional deep learning models trained on PDBbind datasets, we found that their enrichment performances using DUD-E are not superior to the performance of the docking program AutoDock Vina. Together, these results suggest that biases that could be present in constructed datasets should be thoroughly evaluated before applying them to machine learning based methodology development.
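The "enrichment" discussed in the abstract above is commonly quantified with an enrichment factor: the fraction of actives in the top-ranked slice of a screen divided by the fraction of actives in the whole library. The sketch below illustrates that general definition and is not taken from the paper; the function name, the convention that higher scores rank first, and the rounding of the slice size are assumptions.

```python
import math
from typing import List, Tuple


def enrichment_factor(scored: List[Tuple[float, bool]], fraction: float = 0.01) -> float:
    """Enrichment factor of the top `fraction` of a ranked screen.

    scored: (score, is_active) pairs; higher scores are assumed to rank first.
    A value of 1.0 corresponds to random ranking on average.
    """
    if not scored:
        raise ValueError("no compounds supplied")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    n_top = max(1, math.ceil(fraction * len(ranked)))
    total_actives = sum(1 for _, active in ranked if active)
    if total_actives == 0:
        raise ValueError("no actives in the library")
    top_actives = sum(1 for _, active in ranked[:n_top] if active)
    return (top_actives / n_top) / (total_actives / len(ranked))


if __name__ == "__main__":
    # Made-up toy library: 2 actives and 8 decoys.
    toy = [(0.9, True), (0.8, True)] + [(0.1 * i, False) for i in range(8)]
    print(enrichment_factor(toy, fraction=0.2))
```

Because the metric depends only on the ranking, a model that separates actives from decoys through dataset artifacts rather than interaction patterns can still score well, which is the concern the abstract raises.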
- …