From Static to Dynamic Structures: Improving Binding Affinity Prediction with a Graph-Based Deep Learning Model
Accurate prediction of protein-ligand binding affinities is an essential
challenge in structure-based drug design. Despite recent advances in
data-driven methods for affinity prediction, their accuracy is still limited,
partly because they rely only on static crystal structures, whereas
actual binding affinities are generally described by the thermodynamic
ensembles of proteins and ligands. One effective way to approximate such a
thermodynamic ensemble is to use molecular dynamics (MD) simulation. Here, we
curated an MD dataset containing 3,218 different protein-ligand complexes and
developed Dynaformer, a graph-based deep learning model.
Dynaformer was able to accurately predict the binding affinities by learning
the geometric characteristics of the protein-ligand interactions from the MD
trajectories. In silico experiments demonstrated that our model exhibits
state-of-the-art scoring and ranking power on the CASF-2016 benchmark dataset,
outperforming previously reported methods. Moreover, we performed virtual
screening against heat shock protein 90 (HSP90) using Dynaformer, identified
20 candidates, and experimentally validated their binding affinities. We
demonstrated that our approach is more efficient: it identified 12 hit
compounds (two in the submicromolar range), including several newly
discovered scaffolds. We anticipate that this new synergy between large-scale MD
datasets and deep learning models will provide a new route toward accelerating
the early drug discovery process.
Comment: totally reorganized the text and figures
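For readers unfamiliar with the "scoring and ranking power" mentioned in the Dynaformer abstract, the sketch below shows one common way such metrics are computed. It assumes scoring power is the Pearson correlation between predicted and experimental affinities over all complexes, and ranking power is the Spearman rank correlation averaged over per-target clusters, as is usual for CASF-2016. The function names, target labels, and affinity values are hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def scoring_power(predicted, experimental):
    """Assumed definition: Pearson R over all complexes pooled together."""
    return pearson(predicted, experimental)


def ranking_power(clusters):
    """Assumed definition: mean per-target Spearman correlation."""
    rhos = [spearman(pred, exp) for pred, exp in clusters.values()]
    return sum(rhos) / len(rhos)


# Hypothetical data: target -> (predicted pK, experimental pK).
clusters = {
    "target_A": ([6.1, 7.4, 5.0, 8.2, 6.9], [5.8, 7.9, 5.2, 8.0, 6.5]),
    "target_B": ([4.2, 5.5, 6.3, 7.1, 5.0], [4.0, 6.1, 5.9, 7.5, 5.2]),
}
all_pred = [p for pred, _ in clusters.values() for p in pred]
all_exp = [e for _, exp in clusters.values() for e in exp]

print("scoring power (Pearson R):", scoring_power(all_pred, all_exp))
print("ranking power (mean Spearman):", ranking_power(clusters))
```

Averaging the rank correlation per target, rather than pooling all ligands, keeps a model from scoring well on ranking simply because some targets bind more tightly than others on average.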
Towards Predicting Equilibrium Distributions for Molecular Systems with Deep Learning
Advances in deep learning have greatly improved structure prediction of
molecules. However, many macroscopic observations that are important for
real-world applications are not functions of a single molecular structure, but
rather determined from the equilibrium distribution of structures. Traditional
methods for obtaining these distributions, such as molecular dynamics
simulation, are computationally expensive and often intractable. In this paper,
we introduce a novel deep learning framework, called Distributional Graphormer
(DiG), in an attempt to predict the equilibrium distribution of molecular
systems. Inspired by the annealing process in thermodynamics, DiG employs deep
neural networks to transform a simple distribution towards the equilibrium
distribution, conditioned on a descriptor of a molecular system, such as a
chemical graph or a protein sequence. This framework enables efficient
generation of diverse conformations and provides estimations of state
densities. We demonstrate the performance of DiG on several molecular tasks,
including protein conformation sampling, ligand structure sampling,
catalyst-adsorbate sampling, and property-guided structure generation. DiG
presents a significant advancement in methodology for statistically
understanding molecular systems, opening up new research opportunities in
molecular science.
Comment: 80 pages, 11 figures
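The DiG abstract's point that macroscopic observations depend on the equilibrium distribution rather than on a single structure can be illustrated with a Boltzmann-weighted average over a few discrete conformations. This is a minimal sketch of that general statistical-mechanics idea, not of DiG itself: the energies, observable values, and function names are hypothetical, and the conformations are assumed to be enumerable, which is rarely true for real molecular systems.

```python
from math import exp

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def boltzmann_weights(energies, temperature=300.0):
    """Equilibrium probabilities p_i = exp(-E_i / kT) / Z for discrete states."""
    beta = 1.0 / (K_B * temperature)
    e_min = min(energies)  # shift by the minimum for numerical stability
    unnormalized = [exp(-beta * (e - e_min)) for e in energies]
    partition = sum(unnormalized)
    return [w / partition for w in unnormalized]


def ensemble_average(values, energies, temperature=300.0):
    """Expectation of an observable under the Boltzmann distribution."""
    weights = boltzmann_weights(energies, temperature)
    return sum(w * v for w, v in zip(weights, values))


# Hypothetical conformations: energies in kcal/mol and some observable
# (for example, a distance in angstroms) measured on each conformation.
energies = [0.0, 0.5, 2.0]
observable = [1.0, 3.0, 10.0]

single_structure_value = observable[energies.index(min(energies))]
average_value = ensemble_average(observable, energies)

print("value at lowest-energy structure:", single_structure_value)
print("Boltzmann-weighted ensemble average:", average_value)
```

Because the higher-energy conformations still carry non-zero probability at 300 K, the ensemble average differs from the value at the lowest-energy structure alone, which is the gap that distribution-level methods aim to close.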
A knowledge-guided pre-training framework for improving molecular representation learning
Learning effective molecular feature representations to facilitate molecular property prediction is of great significance for drug discovery. Recently, there has been a surge of interest in pre-training graph neural networks (GNNs) via self-supervised learning techniques to overcome the challenge of data scarcity in molecular property prediction. However, current self-supervised learning-based methods suffer from two main obstacles: the lack of a well-defined self-supervised learning strategy and the limited capacity of GNNs. Here, we propose Knowledge-guided Pre-training of Graph Transformer (KPGT), a self-supervised learning framework that alleviates these issues and provides generalizable and robust molecular representations. The KPGT framework integrates a graph transformer specifically designed for molecular graphs and a knowledge-guided pre-training strategy to fully capture both structural and semantic knowledge of molecules. Through extensive computational tests on 63 datasets, KPGT exhibits superior performance in predicting molecular properties across various domains. Moreover, the practical applicability of KPGT in drug discovery has been validated by identifying potential inhibitors of two antitumor targets: hematopoietic progenitor kinase 1 (HPK1) and fibroblast growth factor receptor 1 (FGFR1). Overall, KPGT can provide a powerful and useful tool for advancing the artificial intelligence (AI)-aided drug discovery process.