Latent Molecular Optimization for Targeted Therapeutic Design
We devise an approach for targeted molecular design, a problem of interest in
computational drug discovery: given a target protein site, we wish to generate
a chemical with both high binding affinity to the target and satisfactory
pharmacological properties. This problem is made difficult by the enormity and
discreteness of the space of potential therapeutics, as well as the
graph-structured nature of biomolecular surface sites. Using a dataset of
protein-ligand complexes, we surmount these issues by extracting a signature of
the target site with a graph convolutional network and by encoding the discrete
chemical into a continuous latent vector space. The latter embedding permits
gradient-based optimization in molecular space, which we perform using learned
differentiable models of binding affinity and other pharmacological properties.
We show that our approach is able to efficiently optimize these multiple
objectives and discover new molecules with potentially useful binding
properties, validated via docking methods.
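The idea of gradient-based optimization in a continuous latent space can be sketched with a toy example. This is only an illustration under stated assumptions: the two surrogate functions below (`affinity` and `drug_likeness`) are hypothetical quadratic stand-ins for learned differentiable models, the trade-off `weight` is invented, and no encoder, decoder, or protein-site signature is included.

```python
# Toy sketch: gradient ascent on a latent vector z against two hypothetical
# differentiable surrogates. Real learned models would replace these.

def affinity(z):
    # Hypothetical binding-affinity surrogate, maximal at z = (1, 2).
    return -((z[0] - 1.0) ** 2 + (z[1] - 2.0) ** 2)

def affinity_grad(z):
    # Analytic gradient of the affinity surrogate.
    return [-2.0 * (z[0] - 1.0), -2.0 * (z[1] - 2.0)]

def drug_likeness(z):
    # Hypothetical pharmacological-property surrogate, maximal at z = (0, 0).
    return -(z[0] ** 2 + z[1] ** 2)

def drug_likeness_grad(z):
    # Analytic gradient of the property surrogate.
    return [-2.0 * z[0], -2.0 * z[1]]

def optimize_latent(z0, weight=0.5, lr=0.1, steps=200):
    """Maximize affinity(z) + weight * drug_likeness(z) by gradient ascent."""
    z = list(z0)
    for _ in range(steps):
        ga = affinity_grad(z)
        gp = drug_likeness_grad(z)
        z = [zi + lr * (a + weight * p) for zi, a, p in zip(z, ga, gp)]
    return z

z_opt = optimize_latent([3.0, -1.0])
print(z_opt, affinity(z_opt), drug_likeness(z_opt))
```

In a full pipeline the optimized latent vector would then be decoded back into a discrete molecule; that step is outside this sketch.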
Interpretable Deep Learning in Drug Discovery
Without any means of interpretation, neural networks that predict molecular
properties and bioactivities are merely black boxes. We will unravel these
black boxes and will demonstrate approaches to understand the learned
representations which are hidden inside these models. We show how single
neurons can be interpreted as classifiers which determine the presence or
absence of pharmacophore- or toxicophore-like structures, thereby generating
new insights and relevant knowledge for chemistry, pharmacology and
biochemistry. We further discuss how these novel pharmacophores/toxicophores
can be determined from the network by identifying the most relevant components
of a compound for the prediction of the network. Additionally, we propose a
method which can be used to extract new pharmacophores from a model and will
show that these extracted structures are consistent with literature findings.
We envision that having access to such interpretable knowledge is a crucial aid
in the development and design of new pharmaceutically active molecules, and
helps to investigate and understand failures and successes of current methods.
Comment: Code available at
https://github.com/bioinf-jku/interpretable_ml_drug_discover
Modeling polypharmacy side effects with graph convolutional networks
The use of drug combinations, termed polypharmacy, is common to treat
patients with complex diseases and co-existing conditions. However, a major
consequence of polypharmacy is a much higher risk of adverse side effects for
the patient. Polypharmacy side effects emerge because of drug-drug
interactions, in which activity of one drug may change if taken with another
drug. The knowledge of drug interactions is limited because these complex
relationships are rare, and are usually not observed in relatively small
clinical testing. Discovering polypharmacy side effects thus remains an
important challenge with significant implications for patient mortality. Here,
we present Decagon, an approach for modeling polypharmacy side effects. The
approach constructs a multimodal graph of protein-protein interactions,
drug-protein target interactions, and the polypharmacy side effects, which are
represented as drug-drug interactions, where each side effect is an edge of a
different type. Decagon is developed specifically to handle such multimodal
graphs with a large number of edge types. Our approach develops a new graph
convolutional neural network for multirelational link prediction in multimodal
networks. Decagon predicts the exact side effect, if any, through which a given
drug combination manifests clinically. Decagon accurately predicts polypharmacy
side effects, outperforming baselines by up to 69%. We find that it
automatically learns representations of side effects indicative of
co-occurrence of polypharmacy in patients. Furthermore, Decagon models side
effects with a strong molecular basis particularly well, while on
predominantly non-molecular side effects it achieves good performance because
of effective sharing of model parameters across edge types. Decagon creates
opportunities to use large pharmacogenomic and patient data to flag and
prioritize side effects for follow-up analysis.
Comment: Presented at ISMB 201
Atomic Convolutional Networks for Predicting Protein-Ligand Binding Affinity
Empirical scoring functions based on either molecular force fields or
cheminformatics descriptors are widely used, in conjunction with molecular
docking, during the early stages of drug discovery to predict potency and
binding affinity of a drug-like molecule to a given target. These models
require expert-level knowledge of physical chemistry and biology to be encoded
as hand-tuned parameters or features rather than allowing the underlying model
to select features in a data-driven procedure. Here, we develop a general
3-dimensional spatial convolution operation for learning atomic-level chemical
interactions directly from atomic coordinates and demonstrate its application
to structure-based bioactivity prediction. The atomic convolutional neural
network is trained to predict the experimentally determined binding affinity of
a protein-ligand complex by direct calculation of the energy associated with
the complex, protein, and ligand given the crystal structure of the binding
pose. Non-covalent interactions present in the complex that are absent in the
protein-ligand sub-structures are identified and the model learns the
interaction strength associated with these features. We test our model by
predicting the binding free energy of a subset of protein-ligand complexes
found in the PDBBind dataset and compare with state-of-the-art cheminformatics
and machine learning-based approaches. We find that all methods achieve
experimental accuracy and that atomic convolutional networks either outperform
or perform competitively with the cheminformatics based methods. Unlike all
previous protein-ligand prediction systems, atomic convolutional networks are
end-to-end and fully-differentiable. They represent a new data-driven,
physics-based deep learning model paradigm that offers a strong foundation for
future improvements in structure-based bioactivity prediction.
Synergy Effect between Convolutional Neural Networks and the Multiplicity of SMILES for Improvement of Molecular Prediction
In our study, we demonstrate the synergy effect between convolutional neural
networks and the multiplicity of SMILES. The model we propose, the so-called
Convolutional Neural Fingerprint (CNF) model, reaches the accuracy of
traditional descriptors such as Dragon (Mauri et al. [22]), RDKit (Landrum
[18]), CDK2 (Willighagen et al. [43]) and PyDescriptor (Masand and Rastija
[20]). Moreover, the CNF model generally performs better than highly fine-tuned
traditional descriptors, especially on small data sets, which is of great
interest for the chemical field where data sets are generally small due to
experimental costs, the availability of molecules or accessibility to private
databases. We evaluate the CNF model along with SMILES augmentation during both
training and testing. To the best of our knowledge, this is the first time that
such a methodology is presented. We show that using the multiplicity of SMILES
during training acts as a regulariser and therefore avoids overfitting and can
be seen as ensemble learning when considered for testing.
Comment: 18 pages, 7 figures, 4 table
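The view of test-time SMILES multiplicity as ensemble learning can be illustrated with a minimal sketch. Everything here is an assumption for illustration: `toy_predict` is a hypothetical stand-in for a trained string-based model (it is not the CNF model), and the equivalent SMILES strings for ethanol are written by hand rather than enumerated by a cheminformatics toolkit.

```python
from statistics import mean

# Three equivalent SMILES strings for ethanol, written by hand for this sketch.
ETHANOL_SMILES = ["CCO", "OCC", "C(O)C"]

def toy_predict(smiles):
    # Hypothetical stand-in for a trained model. Its output depends on the
    # string form, which is exactly what averaging is meant to smooth out.
    return 0.1 * smiles.index("O") + 1.0

def ensemble_predict(smiles_variants, predict):
    # Average the per-string predictions over all variants of one molecule.
    return mean(predict(s) for s in smiles_variants)

per_string = [toy_predict(s) for s in ETHANOL_SMILES]
averaged = ensemble_predict(ETHANOL_SMILES, toy_predict)
print(per_string, averaged)
```

The averaged value lies between the individual per-string predictions, so a single unlucky string form has less influence on the final output.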
Three-Dimensionally Embedded Graph Convolutional Network (3DGCN) for Molecule Interpretation
We present a three-dimensional graph convolutional network (3DGCN), which
predicts molecular properties and biochemical activities based on a 3D molecular
graph. In the 3DGCN, graph convolution is unified with learning operations on
the vector to handle the spatial information from molecular topology. The 3DGCN
model exhibits significantly higher performance on various tasks compared with
other deep-learning models, and has the ability of generalizing a given
conformer to targeted features regardless of its rotations in the 3D space.
More significantly, our model also can distinguish the 3D rotations of a
molecule and predict the target value, depending upon the rotation degree, in
the protein-ligand docking problem, when trained with orientation-dependent
datasets. The rotation distinguishability of 3DGCN, along with rotation
equivariance, provides a key milestone in the implementation of
three-dimensionality to the field of deep-learning chemistry that solves
challenging biochemical problems.
Comment: 39 pages, 14 figures, 5 table
Uncertainty quantification of molecular property prediction using Bayesian neural network models
In chemistry, deep neural network models have been increasingly utilized in a
variety of applications such as molecular property predictions, novel molecule
designs, and planning chemical reactions. Despite the rapid increase in the use
of state-of-the-art models and algorithms, deep neural network models often
produce poor predictions in real applications because model performance is
highly dependent on the quality of training data. In the field of molecular
analysis, data are mostly obtained from either complicated chemical experiments
or approximate mathematical equations, so the quality of the data may be
questionable. In this paper, we quantify prediction uncertainties using
Bayesian neural networks in molecular property predictions. We estimate both
model-driven and data-driven uncertainties, demonstrating the usefulness of
uncertainty quantification as both a quality checker and a confidence indicator
in three experiments. Our results show that uncertainty
quantification is necessary for more reliable molecular applications and
Bayesian neural network models can be a practical approach.
Comment: Workshop on "Machine Learning for Molecules and Materials", NIPS
2018. arXiv admin note: substantial text overlap with arXiv:1903.0837
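One common way to separate model-driven (epistemic) from data-driven (aleatoric) uncertainty is sketched below. It assumes the network outputs a mean and a variance on each stochastic forward pass; whether this matches the paper's exact estimator is an assumption, and the sample values are invented for illustration.

```python
from statistics import mean, pvariance

def decompose_uncertainty(samples):
    """Split predictive uncertainty into two parts.

    samples: list of (predicted_mean, predicted_variance) pairs, one pair
    per stochastic forward pass of a hypothetical Bayesian network.
    """
    means = [m for m, _ in samples]
    variances = [v for _, v in samples]
    prediction = mean(means)
    epistemic = pvariance(means)  # spread of the means across sampled models
    aleatoric = mean(variances)   # average noise level the model predicts
    return prediction, epistemic, aleatoric

# Invented outputs of three forward passes for a single molecule.
samples = [(1.0, 0.2), (1.2, 0.4), (0.8, 0.3)]
prediction, epistemic, aleatoric = decompose_uncertainty(samples)
print(prediction, epistemic, aleatoric)
```

Under this decomposition, a large epistemic term suggests the input is unlike the training data, while a large aleatoric term suggests noisy labels for similar inputs.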
Constrained Bayesian Optimization for Automatic Chemical Design
Automatic Chemical Design is a framework for generating novel molecules with
optimized properties. The original scheme, featuring Bayesian optimization over
the latent space of a variational autoencoder, suffers from the pathology that
it tends to produce invalid molecular structures. First, we demonstrate
empirically that this pathology arises when the Bayesian optimization scheme
queries latent points far away from the data on which the variational
autoencoder has been trained. Secondly, by reformulating the search procedure
as a constrained Bayesian optimization problem, we show that the effects of
this pathology can be mitigated, yielding marked improvements in the validity
of the generated molecules. We posit that constrained Bayesian optimization is
a good approach for solving this class of training set mismatch in many
generative tasks involving Bayesian optimization over the latent space of a
variational autoencoder.
Comment: Previous versions accepted to the NIPS 2017 Workshop on Bayesian
Optimization (BayesOpt 2017) and the NIPS 2017 Workshop on Machine Learning
for Molecules and Material
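A standard way to fold a constraint into Bayesian optimization is to weight expected improvement by the probability that the constraint holds. The sketch below shows that generic formulation; the abstract does not state the acquisition function used, so treating it as expected improvement times a validity probability is an assumption, and all numeric inputs are invented.

```python
import math

def normal_pdf(x):
    # Standard normal density.
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    # Standard normal cumulative distribution via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def expected_improvement(mu, sigma, best):
    """Expected improvement over `best` for a maximization problem."""
    if sigma <= 0.0:
        return max(mu - best, 0.0)
    u = (mu - best) / sigma
    return (mu - best) * normal_cdf(u) + sigma * normal_pdf(u)

def constrained_ei(mu, sigma, best, p_valid):
    """Weight expected improvement by the probability of a valid decode."""
    return expected_improvement(mu, sigma, best) * p_valid

# Hypothetical candidates: one far from the training data (promising but
# unlikely to decode to a valid molecule) and one close to it.
far = constrained_ei(mu=2.0, sigma=1.0, best=1.0, p_valid=0.05)
near = constrained_ei(mu=1.2, sigma=0.5, best=1.0, p_valid=0.9)
print(far, near)
```

With these invented numbers the nearby candidate scores higher, which is the intended effect: latent points unlikely to decode into valid molecules are down-weighted even when their predicted objective is attractive.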
PADME: A Deep Learning-based Framework for Drug-Target Interaction Prediction
In silico drug-target interaction (DTI) prediction is an important and
challenging problem in biomedical research with a huge potential benefit to the
pharmaceutical industry and patients. Most existing methods for DTI prediction
including deep learning models generally have binary endpoints, which could be
an oversimplification of the problem, and those methods are typically unable to
handle cold-target problems, i.e., problems involving target proteins that never
appeared in the training set. To address this, we developed PADME (Protein And
Drug Molecule interaction prEdiction), a framework based on Deep Neural
Networks, to predict real-valued interaction strength between compounds and
proteins without requiring feature engineering. PADME takes both compound and
protein information as inputs, so it is capable of solving cold-target (and
cold-drug) problems. To our knowledge, we are the first to combine Molecular
Graph Convolution (MGC) for compound featurization with protein descriptors for
DTI prediction. We used multiple cross-validation split schemes and evaluation
metrics to measure the performance of PADME on multiple datasets, including the
ToxCast dataset, and PADME consistently dominates baseline methods. The results
of a case study, which predicts the binding affinity between various compounds
and androgen receptor (AR), suggest PADME's potential in drug development. The
scalability of PADME is another advantage in the age of Big Data.
Drug-Drug Adverse Effect Prediction with Graph Co-Attention
Complex or co-existing diseases are commonly treated using drug combinations,
which can lead to higher risk of adverse side effects. The detection of
polypharmacy side effects is usually done in Phase IV clinical trials, but
there are still plenty which remain undiscovered when the drugs are put on the
market. Such accidents have been affecting an increasing proportion of the
population (15% in the US now) and it is thus of high interest to be able to
predict the potential side effects as early as possible. Systematic
combinatorial screening of possible drug-drug interactions (DDI) is challenging
and expensive. However, the recent significant increases in data availability
from pharmaceutical research and development efforts offer a novel paradigm for
recovering relevant insights for DDI prediction. Accordingly, several recent
approaches focus on curating massive DDI datasets (with millions of examples)
and training machine learning models on them. Here we propose a neural network
architecture able to set state-of-the-art results on this task---using the type
of the side-effect and the molecular structure of the drugs alone---by
leveraging a co-attentional mechanism. In particular, we show the importance of
integrating joint information from the drug pairs early on when learning each
drug's representation.
Comment: 8 pages, 5 figure