Atomic Convolutional Networks for Predicting Protein-Ligand Binding Affinity
Empirical scoring functions based on either molecular force fields or
cheminformatics descriptors are widely used, in conjunction with molecular
docking, during the early stages of drug discovery to predict potency and
binding affinity of a drug-like molecule to a given target. These models
require expert-level knowledge of physical chemistry and biology to be encoded
as hand-tuned parameters or features rather than allowing the underlying model
to select features in a data-driven procedure. Here, we develop a general
3-dimensional spatial convolution operation for learning atomic-level chemical
interactions directly from atomic coordinates and demonstrate its application
to structure-based bioactivity prediction. The atomic convolutional neural
network is trained to predict the experimentally determined binding affinity of
a protein-ligand complex by direct calculation of the energy associated with
the complex, protein, and ligand given the crystal structure of the binding
pose. Non-covalent interactions present in the complex that are absent in the
protein-ligand sub-structures are identified and the model learns the
interaction strength associated with these features. We test our model by
predicting the binding free energy of a subset of protein-ligand complexes
found in the PDBBind dataset and compare with state-of-the-art cheminformatics
and machine learning-based approaches. We find that all methods achieve
experimental accuracy and that atomic convolutional networks either outperform
or perform competitively with the cheminformatics based methods. Unlike all
previous protein-ligand prediction systems, atomic convolutional networks are
end-to-end and fully-differentiable. They represent a new data-driven,
physics-based deep learning model paradigm that offers a strong foundation for
future improvements in structure-based bioactivity prediction.
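The energy decomposition described in this abstract (energy of the complex minus the energies of the separate protein and ligand) can be sketched as follows. This is a minimal illustration, not the authors' network: `pair_energy` is a hypothetical Lennard-Jones-style stand-in for the learned atomic convolution, and the function names and parameters are assumptions made for the example.

```python
import math

def pair_energy(r, eps=1.0, sigma=1.0):
    # Hypothetical toy pairwise potential (Lennard-Jones form).
    # In the paper this role is played by a learned model, not a fixed formula.
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)

def total_energy(atoms):
    # Sum the pair energy over all unique atom pairs in one structure.
    energy = 0.0
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            energy += pair_energy(math.dist(atoms[i], atoms[j]))
    return energy

def binding_energy(protein, ligand):
    # E(complex) - E(protein) - E(ligand).
    # Intramolecular terms cancel, so only protein-ligand cross terms remain.
    complex_atoms = protein + ligand
    return total_energy(complex_atoms) - total_energy(protein) - total_energy(ligand)
```

With a pairwise potential, the subtraction leaves only the interactions that are present in the complex but absent from the separate sub-structures, which matches the description in the abstract.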
3D Deep Learning for Biological Function Prediction from Physical Fields
Predicting the biological function of molecules, be it proteins or drug-like
compounds, from their atomic structure is an important and long-standing
problem. Function is dictated by structure, since molecules interact with each
other through spatial interactions, both via steric complementarity and via
intermolecular forces. Thus, the electron density
field and electrostatic potential field of a molecule contain the "raw
fingerprint" of how this molecule can fit to binding partners. In this paper,
we show that deep learning can predict biological function of molecules
directly from their raw 3D approximated electron density and electrostatic
potential fields. Protein function based on EC numbers is predicted from the
approximated electron density field. In another experiment, the activity of
small molecules is predicted with quality comparable to state-of-the-art
descriptor-based methods. We propose several alternative computational models
for the GPU with different memory and runtime requirements for different sizes
of molecules and of databases. We also propose application-specific
multi-channel data representations. With future improvements of training
datasets and neural network settings in combination with complementary
information sources (sequence, genomic context, expression level), deep
learning can be expected to show its generalization power and revolutionize the
field of molecular function prediction.
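One way to picture an "approximated electron density field" is a sum of Gaussians, one per atom, sampled on a regular 3D grid. The abstract does not state which approximation the authors use, so the Gaussian form, the `weight` field, and all parameter names below are assumptions for illustration only.

```python
import math

def density_grid(atoms, n=8, spacing=1.0, width=1.0):
    """Sample a sum of isotropic Gaussians (one per atom) on an n x n x n grid.

    atoms: list of (x, y, z, weight) tuples; weight is a hypothetical
    per-atom scale such as an electron count. The grid origin is (0, 0, 0).
    """
    grid = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                point = (i * spacing, j * spacing, k * spacing)
                # Each atom contributes a Gaussian centered on its position.
                grid[i][j][k] = sum(
                    w * math.exp(-math.dist(point, (x, y, z)) ** 2 / (2.0 * width ** 2))
                    for x, y, z, w in atoms
                )
    return grid
```

A grid like this can serve as one input channel for a 3D convolutional network; an electrostatic potential grid would be a second channel in a multi-channel representation.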
DeepDTA: Deep Drug-Target Binding Affinity Prediction
The identification of novel drug-target (DT) interactions is a substantial
part of the drug discovery process. Most of the computational methods that have
been proposed to predict DT interactions have focused on binary classification,
where the goal is to determine whether a DT pair interacts or not. However,
protein-ligand interactions assume a continuum of binding strength values, also
called binding affinity, and predicting this value remains a challenge.
The increase in the affinity data available in DT knowledge-bases allows the
use of advanced learning techniques such as deep learning architectures in the
prediction of binding affinities. In this study, we propose a deep-learning
based model that uses only sequence information of both targets and drugs to
predict DT interaction binding affinities. The few studies that focus on DT
binding affinity prediction use either 3D structures of protein-ligand
complexes or 2D features of compounds. One novel approach used in this work is
the modeling of protein sequences and compound 1D representations with
convolutional neural networks (CNNs). The results show that the proposed deep
learning based model that uses the 1D representations of targets and drugs is
an effective approach for drug target binding affinity prediction. The model in
which high-level representations of a drug and a target are constructed via
CNNs achieved the best Concordance Index (CI) performance in one of our larger
benchmark data sets, outperforming the KronRLS algorithm and SimBoost, a
state-of-the-art method for DT binding affinity prediction.
Comment: extended versio
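The Concordance Index (CI) mentioned in this abstract measures how often a model ranks pairs of affinities in the correct order. The sketch below uses the common pairwise definition, with tied predictions counted as half-correct; the function name is an assumption, and the exact variant used in the paper may differ.

```python
def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order.

    Pairs with equal true affinities are skipped; tied predictions score 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            if y_true[i] == y_true[j]:
                continue  # not comparable
            comparable += 1
            diff_true = y_true[i] - y_true[j]
            diff_pred = y_pred[i] - y_pred[j]
            if diff_pred == 0:
                concordant += 0.5
            elif (diff_true > 0) == (diff_pred > 0):
                concordant += 1.0
    return concordant / comparable if comparable else 0.0
```

A CI of 1.0 means every comparable pair is ranked correctly, and 0.5 corresponds to random ordering. This quadratic-time loop is fine for small examples; large benchmarks usually need a faster implementation.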
Capsule Networks for Protein Structure Classification and Prediction
Capsule Networks have great potential to tackle problems in structural
biology because of their attention to hierarchical relationships. This paper
describes the implementation and application of a Capsule Network architecture
to the classification of RAS protein family structures on GPU-based
computational resources. The proposed Capsule Network trained on 2D and 3D
structural encodings can successfully classify HRAS and KRAS structures. The
Capsule Network can also classify a protein-based dataset derived from a
PSI-BLAST search on sequences of KRAS and HRAS mutations. Our results show an
accuracy improvement compared to traditional convolutional networks, while
improving interpretability through visualization of activation vectors.
Automated discovery of GPCR bioactive ligands
While G-protein coupled receptors (GPCRs) constitute the largest class of
membrane proteins, structures and endogenous ligands of a large portion of
GPCRs remain unknown. Due to the involvement of GPCRs in various signaling
pathways and physiological roles, the identification of endogenous ligands and
the design of novel drugs are of high interest to the research and medical
communities. Along with highlighting the recent advances in structure-based
ligand discovery, including docking and molecular dynamics, this article
focuses on the latest advances for automating the discovery of bioactive
ligands using machine learning. Machine learning is centered around the
development and applications of algorithms that can learn from data
automatically. Such an approach offers immense opportunities for bioactivity
prediction as well as quantitative structure-activity relationship studies.
This review describes the most recent and successful applications of machine
learning for bioactive ligand discovery, concluding with an outlook on deep
learning methods that are capable of automatically extracting salient
information from structural data as a promising future direction for rapid and
efficient bioactive ligand discovery.
Ligand Pose Optimization with Atomic Grid-Based Convolutional Neural Networks
Docking is an important tool in computational drug discovery that aims to
predict the binding pose of a ligand to a target protein through a combination
of pose scoring and optimization. A scoring function that is differentiable
with respect to atom positions can be used for both scoring and gradient-based
optimization of poses for docking. Using a differentiable grid-based atomic
representation as input, we demonstrate that a scoring function learned by
training a convolutional neural network (CNN) to identify binding poses can
also be applied to pose optimization. We also show that an iteratively-trained
CNN that includes poses optimized by the first CNN in its training set performs
even better at optimizing randomly initialized poses than either the first CNN
scoring function or AutoDock Vina.
Comment: 10 page
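The idea that a scoring function differentiable in atom positions can drive pose optimization can be sketched with a toy example. Here a hypothetical quadratic score (mean squared distance to a pocket center) stands in for the learned CNN, and only a rigid translation is optimized; the names `score`, `score_gradient`, and `optimize_translation` are assumptions for illustration.

```python
def score(pose, pocket):
    # Hypothetical differentiable score: mean squared distance of the
    # ligand atoms to the pocket center (lower is better).
    return sum(
        sum((a - c) ** 2 for a, c in zip(atom, pocket)) for atom in pose
    ) / len(pose)

def score_gradient(pose, pocket):
    # Analytic gradient of the score with respect to each atom coordinate.
    n = len(pose)
    return [tuple(2.0 * (a - c) / n for a, c in zip(atom, pocket)) for atom in pose]

def optimize_translation(pose, pocket, lr=0.1, steps=200):
    # Gradient descent on a rigid translation of the whole ligand.
    for _ in range(steps):
        grad = score_gradient(pose, pocket)
        # Chain rule: the gradient for a translation is the sum of atomic gradients.
        shift = tuple(sum(g[d] for g in grad) for d in range(3))
        pose = [tuple(a - lr * s for a, s in zip(atom, shift)) for atom in pose]
    return pose
```

Because every atom moves by the same shift, the ligand's internal geometry is preserved while its score improves. A full docking setup would also propagate the atomic gradients to rotations and torsions.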
WideDTA: prediction of drug-target binding affinity
Motivation: Prediction of the interaction affinity between proteins and
compounds is a major challenge in the drug discovery process. WideDTA is a
deep-learning based prediction model that employs chemical and biological
textual sequence information to predict binding affinity.
Results: WideDTA uses four text-based information sources, namely the protein
sequence, ligand SMILES, protein domains and motifs, and maximum common
substructure words, to predict binding affinity. WideDTA outperformed DeepDTA,
one of the state-of-the-art deep learning methods for drug-target binding
affinity prediction, on the KIBA dataset with statistical significance. This
indicates that the word-based sequence representation adopted by WideDTA is a
promising alternative to the character-based sequence representation approach
in deep learning models for binding affinity prediction, such as the one used
in DeepDTA. In addition, the results showed that, given the protein sequence
and ligand SMILES, the inclusion of protein domain and motif information as
well as ligand maximum common substructure words does not provide additional
useful information for the deep learning model. Interestingly, however, using
only domain and motif information to represent proteins achieved similar
performance to using the full protein sequence, suggesting that important
binding-relevant information is contained within the protein motifs and
domains.
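The contrast between character-based and word-based sequence representations can be illustrated with a small tokenizer. The abstract does not say how words are formed, so the overlapping fixed-length words and the default `k=3` below are assumptions, as are the function names.

```python
def char_tokens(seq):
    # Character-based representation: one token per sequence character.
    return list(seq)

def word_tokens(seq, k=3):
    # Word-based representation, assumed here to be overlapping
    # k-character words taken with a sliding window of step 1.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]
```

For a protein fragment such as "MKTAY", the character view has five tokens while the word view has three overlapping three-letter words, so the word vocabulary is larger but each token carries more local context.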
PADME: A Deep Learning-based Framework for Drug-Target Interaction Prediction
In silico drug-target interaction (DTI) prediction is an important and
challenging problem in biomedical research with a huge potential benefit to the
pharmaceutical industry and patients. Most existing methods for DTI prediction,
including deep learning models, have binary endpoints, which may
oversimplify the problem, and they are typically unable to
handle cold-target problems, i.e., problems involving target proteins that never
appeared in the training set. To address this, we developed PADME (Protein And
Drug Molecule interaction prEdiction), a framework based on Deep Neural
Networks, to predict real-valued interaction strength between compounds and
proteins without requiring feature engineering. PADME takes both compound and
protein information as inputs, so it is capable of solving cold-target (and
cold-drug) problems. To our knowledge, we are the first to combine Molecular
Graph Convolution (MGC) for compound featurization with protein descriptors for
DTI prediction. We used multiple cross-validation split schemes and evaluation
metrics to measure the performance of PADME on multiple datasets, including the
ToxCast dataset, and PADME consistently dominates baseline methods. The results
of a case study, which predicts the binding affinity between various compounds
and androgen receptor (AR), suggest PADME's potential in drug development. The
scalability of PADME is another advantage in the age of Big Data.
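A cold-target evaluation, in which test targets never appear in the training set, can be set up by splitting on target identity rather than on individual records. The sketch below is one simple way to do this; the record layout `(drug, target, affinity)` and the function name are assumptions, and the paper's own split schemes may differ.

```python
import random

def cold_target_split(pairs, test_fraction=0.2, seed=0):
    """Split (drug, target, affinity) records so test targets are unseen in training."""
    targets = sorted({target for _, target, _ in pairs})
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(targets)
    n_test = max(1, int(len(targets) * test_fraction))
    test_targets = set(targets[:n_test])
    train = [p for p in pairs if p[1] not in test_targets]
    test = [p for p in pairs if p[1] in test_targets]
    return train, test
```

The same pattern gives a cold-drug split if the grouping key is the drug identifier instead of the target.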
SMILES2Vec: An Interpretable General-Purpose Deep Neural Network for Predicting Chemical Properties
Chemical databases store information in text representations, and the SMILES
format is a universal standard used in many cheminformatics software packages. Encoded
in each SMILES string is structural information that can be used to predict
complex chemical properties. In this work, we develop SMILES2vec, a deep RNN
that automatically learns features from SMILES to predict chemical properties,
without the need for additional explicit feature engineering. Using Bayesian
optimization methods to tune the network architecture, we show that an
optimized SMILES2vec model can serve as a general-purpose neural network for
predicting distinct chemical properties including toxicity, activity,
solubility and solvation energy, while also outperforming contemporary MLP
neural networks that use engineered features. Furthermore, we demonstrate
proof-of-concept of interpretability by developing an explanation mask that
localizes on the most important characters used in making a prediction. When
tested on the solubility dataset, it identified specific parts of a chemical
that are consistent with established first-principles knowledge with an accuracy
of 88%. Our work demonstrates that neural networks can learn technically
accurate chemical concepts and provide state-of-the-art accuracy, making
interpretable deep neural networks a useful tool of relevance to the chemical
industry.
Comment: Submitted to SIGKDD 201
Using Rule-Based Labels for Weak Supervised Learning: A ChemNet for Transferable Chemical Property Prediction
With access to large datasets, deep neural networks (DNN) have achieved
human-level accuracy in image and speech recognition tasks. However, in
chemistry, data is inherently small and fragmented. In this work, we develop an
approach of using rule-based knowledge for training ChemNet, a transferable and
generalizable deep neural network for chemical property prediction that learns
in a weakly supervised manner from large unlabeled chemical databases. When
coupled with transfer learning approaches to predict chemical properties on
other, smaller datasets that it was not originally trained on, we show that
ChemNet's accuracy outperforms contemporary DNN models that were trained using
conventional supervised learning. Furthermore, we demonstrate that the ChemNet
pre-training approach is equally effective on both CNN (Chemception) and RNN
(SMILES2vec) models, indicating that this approach is network architecture
agnostic and is effective across multiple data modalities. Our results indicate
that a pre-trained ChemNet that incorporates chemistry domain knowledge enables the
development of generalizable neural networks for more accurate prediction of
novel chemical properties.
Comment: Submitted to SIGKDD 201