2,188 research outputs found
A Systematic Survey in Geometric Deep Learning for Structure-based Drug Design
Structure-based drug design (SBDD), which utilizes the three-dimensional
geometry of proteins to identify potential drug candidates, is becoming
increasingly vital in drug discovery. However, traditional methods based on
physicochemical modeling and experts' domain knowledge are time-consuming and
laborious. The recent advancements in geometric deep learning, which integrates
and processes 3D geometric data, coupled with the availability of accurate
protein 3D structure predictions from tools like AlphaFold, have significantly
propelled progress in structure-based drug design. In this paper, we
systematically review the recent progress of geometric deep learning for
structure-based drug design. We start with a brief discussion of the mainstream
tasks in structure-based drug design, commonly used 3D protein representations
and representative predictive/generative models. Then we delve into detailed
reviews for each task (binding site prediction, binding pose generation,
de novo molecule generation, linker design, and binding affinity
prediction), including the problem setup, representative methods, datasets, and
evaluation metrics. Finally, we conclude this survey with the current
challenges and highlight potential opportunities of geometric deep learning for
structure-based drug design.
Comment: 14 pages
TopologyNet: Topology based deep convolutional neural networks for biomolecular property predictions
Although deep learning approaches have had tremendous success in image, video
and audio processing, computer vision, and speech recognition, their
applications to three-dimensional (3D) biomolecular structural data sets have
been hindered by the entangled geometric complexity and biological complexity.
We introduce topology, i.e., element specific persistent homology (ESPH), to
untangle geometric complexity and biological complexity. ESPH represents 3D
complex geometry by one-dimensional (1D) topological invariants and retains
crucial biological information via a multichannel image representation. It is
able to reveal hidden structure-function relationships in biomolecules. We
further integrate ESPH and convolutional neural networks to construct a
multichannel topological neural network (TopologyNet) for the predictions of
protein-ligand binding affinities and protein stability changes upon mutation.
To overcome the limitations to deep learning arising from small and noisy
training sets, we present a multitask topological convolutional neural network
(MT-TCNN). We demonstrate that the present TopologyNet architectures outperform
other state-of-the-art methods in the predictions of protein-ligand binding
affinities, globular protein mutation impacts, and membrane protein mutation
impacts.
Comment: 20 pages, 8 figures, 5 tables
Machine learning-guided directed evolution for protein engineering
Machine learning (ML)-guided directed evolution is a new paradigm for
biological design that enables optimization of complex functions. ML methods
use data to predict how sequence maps to function without requiring a detailed
model of the underlying physics or biological pathways. To demonstrate
ML-guided directed evolution, we introduce the steps required to build ML
sequence-function models and use them to guide engineering, making
recommendations at each stage. This review covers basic concepts relevant to
using ML for protein engineering as well as the current literature and
applications of this new engineering paradigm. ML methods accelerate directed
evolution by learning from information contained in all measured variants and
using that information to select sequences that are likely to be improved. We
then provide two case studies that demonstrate the ML-guided directed evolution
process. We also look to future opportunities where ML will enable discovery of
new protein functions and uncover the relationship between protein sequence and
function.
Comment: Made significant revisions to focus on aspects most relevant to
applying machine learning to speed up directed evolution
Improved prediction of ligand-protein binding affinities by meta-modeling
The accurate screening of candidate drug ligands against target proteins
through computational approaches is of prime interest to drug development
efforts, as filtering potential candidates would save time and expenses for
finding drugs. Such virtual screening depends in part on methods to predict the
binding affinity between ligands and proteins. Given many computational models
for binding affinity prediction with varying results across targets, we herein
develop a meta-modeling framework by integrating published empirical
structure-based docking and sequence-based deep learning models. In building
this framework, we evaluate many combinations of individual models, training
databases, and linear and nonlinear meta-modeling approaches. We show that many
of our meta-models significantly improve affinity predictions over individual
base models. Our best meta-models achieve comparable performance to
state-of-the-art exclusively structure-based deep learning tools. Overall, we
demonstrate that diverse modeling approaches can be ensembled together to gain
substantial improvement in binding affinity prediction while allowing control
over input features such as physicochemical properties or molecular
descriptors.
Comment: 61 pages, 3 main tables, 6 main figures, 6 supplementary figures, and
supporting information. For 8 supplementary tables and code, see
https://github.com/Lee1701/Lee2023
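Illustrative sketch (not taken from the paper): the abstract above describes combining the outputs of several base affinity predictors with linear and nonlinear meta-models. The simplest linear case can be sketched as ordinary least-squares stacking. The function names and the toy scores below are hypothetical, and the choice of plain least squares is an assumption; in practice the meta-model would normally be fitted on base-model predictions for held-out complexes.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Move the row with the largest pivot into place for stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def fit_linear_meta_model(base_predictions, targets):
    """Fit weights [intercept, w1, w2, ...] via the normal equations.

    base_predictions: one tuple per complex, one entry per base model.
    targets: measured affinities for the same complexes.
    """
    rows = [[1.0] + list(p) for p in base_predictions]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(weights, base_prediction):
    """Combine one complex's base-model scores into a meta-prediction."""
    return weights[0] + sum(w * p for w, p in zip(weights[1:], base_prediction))


# Toy data (hypothetical): two base models scoring four complexes.
base_scores = [(6.0, 8.0), (5.0, 4.0), (7.0, 9.0), (4.0, 6.0)]
measured = [8.0, 5.5, 9.0, 6.0]
weights = fit_linear_meta_model(base_scores, measured)
print(weights, predict(weights, (5.5, 7.0)))
```

A nonlinear meta-model would replace the least-squares fit with another regressor trained on the same stacked inputs.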
DrugCLIP: Contrastive Protein-Molecule Representation Learning for Virtual Screening
Virtual screening, which identifies potential drugs from vast compound
databases to bind with a particular protein pocket, is a critical step in
AI-assisted drug discovery. Traditional docking methods are highly
time-consuming, and can only work with a restricted search library in real-life
applications. Recent supervised learning approaches using scoring functions for
binding-affinity prediction, although promising, have not yet surpassed docking
methods due to their strong dependency on limited data with reliable
binding-affinity labels. In this paper, we propose a novel contrastive learning
framework, DrugCLIP, by reformulating virtual screening as a dense retrieval
task and employing contrastive learning to align representations of binding
protein pockets and molecules from a large quantity of pairwise data without
explicit binding-affinity scores. We also introduce a biological-knowledge
inspired data augmentation strategy to learn better protein-molecule
representations. Extensive experiments show that DrugCLIP significantly
outperforms traditional docking and supervised learning methods on diverse
virtual screening benchmarks with highly reduced computation time, especially
in zero-shot setting.
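Illustrative sketch (not taken from the paper): the abstract above frames virtual screening as dense retrieval over pocket and molecule embeddings aligned by contrastive learning. The abstract does not give the loss, so the code below assumes a generic CLIP-style symmetric InfoNCE objective. The temperature of 0.07, the function names, and the toy vectors, which stand in for the outputs of learned encoders, are all assumptions.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def similarity_logits(pockets, molecules, temperature=0.07):
    """Cosine similarity of every pocket with every molecule, over temperature."""
    p = [l2_normalize(v) for v in pockets]
    m = [l2_normalize(v) for v in molecules]
    return [[sum(a * b for a, b in zip(pi, mj)) / temperature for mj in m] for pi in p]


def diagonal_cross_entropy(logits):
    """Mean of -log softmax(row)[i]: row i should score its own partner highest."""
    total = 0.0
    for i, row in enumerate(logits):
        mx = max(row)  # subtract the max for numerical stability
        log_z = mx + math.log(sum(math.exp(x - mx) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(pockets, molecules, temperature=0.07):
    """Average of pocket-to-molecule and molecule-to-pocket losses.

    Assumes pockets[i] and molecules[i] form a binding pair and that every
    other pairing in the batch acts as a negative.
    """
    logits = similarity_logits(pockets, molecules, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))


def rank_molecules(pocket, library):
    """Retrieval step: library indices sorted by cosine similarity, best first."""
    q = l2_normalize(pocket)
    scores = [sum(a * b for a, b in zip(q, l2_normalize(m))) for m in library]
    return sorted(range(len(library)), key=lambda i: scores[i], reverse=True)


# Toy embeddings (hypothetical) standing in for encoder outputs.
pockets = [[1.0, 0.0], [0.0, 1.0]]
molecules = [[1.0, 0.0], [0.0, 1.0]]
print(symmetric_contrastive_loss(pockets, molecules))
print(rank_molecules([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]))
```

Because screening reduces to a similarity lookup, molecule embeddings can be computed once and reused across pockets, which is consistent with the reduced computation time reported in the abstract.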
HAC-Net: A Hybrid Attention-Based Convolutional Neural Network for Highly Accurate Protein-Ligand Binding Affinity Prediction
Applying deep learning concepts from image detection and graph theory has
greatly advanced protein-ligand binding affinity prediction, a challenge with
enormous ramifications for both drug discovery and protein engineering. We
build upon these advances by designing a novel deep learning architecture
consisting of a 3-dimensional convolutional neural network utilizing
channel-wise attention and two graph convolutional networks utilizing
attention-based aggregation of node features. HAC-Net (Hybrid Attention-Based
Convolutional Neural Network) obtains state-of-the-art results on the PDBbind
v.2016 core set, the most widely recognized benchmark in the field. We
extensively assess the generalizability of our model using multiple train-test
splits, each of which maximizes differences between either protein structures,
protein sequences, or ligand extended-connectivity fingerprints of complexes in
the training and test sets. Furthermore, we perform 10-fold cross-validation
with a similarity cutoff between SMILES strings of ligands in the training and
test sets, and also evaluate the performance of HAC-Net on lower-quality data.
We envision that this model can be extended to a broad range of supervised
learning problems related to structure-based biomolecular property prediction.
All of our software is available as open source at
https://github.com/gregory-kyro/HAC-Net/, and the HACNet Python package is
available through PyPI.
Drug Target Interaction Prediction Using Machine Learning Techniques – A Review
Drug discovery is a key process, given the rising and ubiquitous demand for medication to stay in good health throughout one's life. Drugs are small molecules that inhibit or activate the function of a protein, offering patients a host of therapeutic benefits. Drug design is the inventive process of finding new medication based on protein targets. Identifying new drugs takes considerable time and money, and computer-aided drug design helps cut both. Drug design requires a drug target, which is a protein, and a drug compound, between which an interaction is established. Interaction, in this context, refers to the process of discovering protein binding sites, which are protein pockets that bind with drugs. Pockets are regions on a protein macromolecule that bind to drug molecules. Researchers have been working to determine new drug-target interactions (DTIs), that is, to predict whether or not a given drug molecule will bind to a target. Machine learning (ML) techniques help establish the interaction between drugs and their targets, using computer-aided drug design. This paper aims to explore ML techniques for DTI prediction more thoroughly and to support future research. Qualitative and quantitative analyses of ML techniques show that several have been applied to predict DTIs, employing a range of classifiers. Though DTI prediction improves with negative drug-target pairs (DTPs), the lack of true negative DTPs has led to the use of a particular dataset of drugs and targets. Using dynamic DTPs improves DTI prediction. Little attention has so far been paid to developing new classifiers for DTI classification, and there is, unquestionably, a need for better ones.