1,391 research outputs found
Latent Molecular Optimization for Targeted Therapeutic Design
We devise an approach for targeted molecular design, a problem of interest in
computational drug discovery: given a target protein site, we wish to generate
a chemical with both high binding affinity to the target and satisfactory
pharmacological properties. This problem is made difficult by the enormity and
discreteness of the space of potential therapeutics, as well as the
graph-structured nature of biomolecular surface sites. Using a dataset of
protein-ligand complexes, we surmount these issues by extracting a signature of
the target site with a graph convolutional network and by encoding the discrete
chemical into a continuous latent vector space. The latter embedding permits
gradient-based optimization in molecular space, which we perform using learned
differentiable models of binding affinity and other pharmacological properties.
We show that our approach is able to efficiently optimize these multiple
objectives and discover new molecules with potentially useful binding
properties, validated via docking methods.
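The gradient-based optimization in latent space described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the quadratic `surrogate_score` stands in for the learned differentiable property models, the three-dimensional vector stands in for an encoded molecule, and `target`, `step`, and `n_steps` are hypothetical values. It is not the authors' model.

```python
import numpy as np

def surrogate_score(z, target):
    """Toy differentiable objective (assumption): higher is better, peaks at `target`."""
    return -np.sum((z - target) ** 2)

def surrogate_grad(z, target):
    """Analytic gradient of surrogate_score with respect to z."""
    return -2.0 * (z - target)

def optimize_latent(z0, target, step=0.1, n_steps=100):
    """Plain gradient ascent on the surrogate score in latent space."""
    z = np.asarray(z0, dtype=float).copy()
    for _ in range(n_steps):
        z += step * surrogate_grad(z, target)
    return z

rng = np.random.default_rng(0)
target = np.array([1.0, -2.0, 0.5])  # hypothetical optimum of the surrogate
z_start = rng.normal(size=3)         # stands in for an encoded molecule
z_opt = optimize_latent(z_start, target)
```

In a real pipeline the optimized vector would be passed through a decoder to recover a discrete molecule; that step is omitted here.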
Graph Convolutional Neural Networks for Polymers Property Prediction
A fast and accurate predictive tool for polymer properties is in high demand and
will pave the way to iterative inverse design. In this work, we apply graph
convolutional neural networks (GCNN) to predict the dielectric constant and
energy bandgap of polymers. Using density functional theory (DFT) calculated
properties as the ground truth, GCNN can achieve remarkable agreement with DFT
results. Moreover, we show that GCNN outperforms other machine learning
algorithms. Our work proves that GCNN relies only on morphological data of
polymers and removes the requirement for complicated hand-crafted descriptors,
while still offering accuracy in fast predictions.
Comment: Accepted for NIPS 2018 Workshop on Machine Learning for Molecules and
Material
Feature Assisted bi-directional LSTM Model for Protein-Protein Interaction Identification from Biomedical Texts
Knowledge about protein-protein interactions is essential for understanding
biological processes such as metabolic pathways, DNA replication, and
transcription. However, a majority of the existing Protein-Protein
Interaction (PPI) systems are dependent primarily on the scientific literature,
which is not yet accessible as a structured database. Thus, efficient
information extraction systems are required for identifying PPI information
from the large collection of biomedical texts. Most of the existing systems
model the PPI extraction task as a classification problem and rely on
handcrafted feature sets that include domain-dependent features. In this paper,
we present a novel method based on deep bidirectional long short-term memory
(B-LSTM) technique that exploits word sequences and dependency path related
information to identify PPI information from text. This model leverages joint
modeling of proteins and relations in a single unified framework, which we name
the Shortest Dependency Path B-LSTM (sdpLSTM) model. We perform experiments on
two popular benchmark PPI datasets, AiMed and BioInfer. The evaluation
yields F1-scores of 86.45% and 77.35% on AiMed and BioInfer,
respectively. Comparisons with the existing systems show that our proposed
approach attains state-of-the-art performance.
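The shortest dependency path that gives sdpLSTM its name can be illustrated with a breadth-first search over a dependency graph. This is a minimal sketch under stated assumptions: the sentence, the hand-written `edges` list, and the placeholder tokens `PROT1` and `PROT2` are invented for illustration, and the abstract does not say how the authors extract the path.

```python
from collections import deque

def shortest_dependency_path(edges, source, target):
    """Breadth-first search over a dependency graph treated as undirected."""
    graph = {}
    for head, dependent in edges:
        graph.setdefault(head, []).append(dependent)
        graph.setdefault(dependent, []).append(head)
    if source not in graph or target not in graph:
        return None
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the parent links to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None

# Hand-written (head, dependent) edges for the made-up sentence
# "PROT1 strongly interacts with PROT2 in yeast".
edges = [
    ("interacts", "PROT1"),
    ("interacts", "strongly"),
    ("interacts", "PROT2"),
    ("PROT2", "with"),
    ("interacts", "yeast"),
    ("yeast", "in"),
]
path = shortest_dependency_path(edges, "PROT1", "PROT2")
```

Here `path` keeps only the tokens linking the two proteins and drops modifiers such as "strongly" and "in yeast", which is the kind of compact input a sequence model could then consume.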
Deep Learning Based Regression and Multi-class Models for Acute Oral Toxicity Prediction with Automatic Chemical Feature Extraction
For quantitative structure-property relationship (QSPR) studies in
chemoinformatics, it is important to obtain an interpretable relationship between
chemical properties and chemical features. However, the predictive power and
interpretability of QSPR models are usually two different objectives that are
difficult to achieve simultaneously. A deep learning architecture using
molecular graph encoding convolutional neural networks (MGE-CNN) provided a
universal strategy to construct interpretable QSPR models with high predictive
power. Instead of using application-specific preset molecular descriptors or
fingerprints, the models can be resolved using raw and pertinent features
without manual intervention or selection. In this study, we developed acute
oral toxicity (AOT) models of compounds using the MGE-CNN architecture as a
case study. Three types of high-level predictive models were constructed for
AOT evaluation: a regression model (deepAOT-R), a multi-class classification
model (deepAOT-C), and a multi-task model (deepAOT-CR). These models
substantially outperformed previously reported models. On the two external
datasets, containing 1673 (test set I) and 375 (test set II) compounds, the R2 and mean
absolute error (MAE) of deepAOT-R on test set I were 0.864 and 0.195, and
the prediction accuracy of deepAOT-C was 95.5% and 96.3% on test sets I and
II, respectively. The prediction accuracy of deepAOT-CR on the two external
test sets was 95.0% and 94.1%, while its R2 and MAE on test set I were 0.861
and 0.204, respectively.
Comment: 36 pages, 4 figure
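For readers unfamiliar with the two regression metrics quoted in this abstract, the following sketch shows how they are commonly computed. It assumes that R2 denotes the coefficient of determination (the abstract does not say which definition was used), and the four data points are made up for illustration, not taken from the paper.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between predictions and targets."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred))

# Made-up values for illustration only.
y_true = [2.0, 3.0, 4.0, 5.0]
y_pred = [2.5, 3.0, 3.5, 5.0]
r2 = r2_score(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
```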
Simulating Execution Time of Tensor Programs using Graph Neural Networks
Optimizing the execution time of a tensor program, e.g., a convolution,
involves finding its optimal configuration. Searching the configuration space
exhaustively is typically infeasible in practice. In line with recent research
using TVM, we propose to learn a surrogate model to overcome this issue. The
model is trained on an acyclic graph called an abstract syntax tree, and
utilizes a graph convolutional network to exploit structure in the graph. We
claim that learnable graph-based data processing is a strong competitor to
heuristic-based feature extraction. We present a new dataset of graphs
corresponding to configurations and their execution time for various tensor
programs. We provide baselines for a runtime prediction task.
Comment: All authors contributed equally. Accepted as a workshop paper at
Representation Learning on Graphs and Manifolds @ ICLR 2019. Fixed values in
Table
Molecular Inverse-Design Platform for Material Industries
The discovery of new materials has been an essential force behind
discontinuous improvements in the performance of industrial products. However,
the vast combinatorial design space of material structures exceeds human
experts' capacity to explore it fully, thereby hampering material development.
In this paper, we present a web platform for the materials industry built on an
AI-driven molecular inverse-design system, which rapidly and automatically
designs new and diverse molecular structures. Unlike existing
inverse-design solutions, this system combines substructure-based
feature encoding with molecular graph generation algorithms, giving the user a
high-speed, interpretable, and customizable design process. Also, a
hierarchical data structure and user-oriented UI provide a flexible and
intuitive workflow. The system is deployed on IBM's and our client's cloud
servers and has been used by 5 partner companies. To illustrate actual
industrial use cases, we exhibit the inverse design of sugar and dye molecules
carried out by experimental chemists in those client companies.
Compared with a typical human chemist's performance, molecular design
was more than 10 times faster, and a greatly increased variety was
observed in the inverse-designed molecules without loss of chemical realism.
Comment: 9 pages, 7 figures, Accepted to KDD 202
Machine learning and AI-based approaches for bioactive ligand discovery and GPCR-ligand recognition
In the last decade, machine learning and artificial intelligence applications
have received a significant boost in performance and attention in both academic
research and industry. The success behind most of the recent state-of-the-art
methods can be attributed to the latest developments in deep learning. When
applied to various scientific domains that are concerned with the processing of
non-tabular data, for example, image or text, deep learning has been shown to
outperform not only conventional machine learning but also highly specialized
tools developed by domain experts. This review aims to summarize AI-based
research for GPCR bioactive ligand discovery with a particular focus on the
most recent achievements and research trends. To make this article accessible
to a broad audience of computational scientists, we provide instructive
explanations of the underlying methodology, including overviews of the most
commonly used deep learning architectures and feature representations of
molecular data. We highlight the latest AI-based research that has led to the
successful discovery of GPCR bioactive ligands. However, an equal focus of this
review is on the discussion of machine learning-based technology that has been
applied to ligand discovery in general and has the potential to pave the way
for successful GPCR bioactive ligand discovery in the future. This review
concludes with a brief outlook highlighting the recent research trends in deep
learning, such as active learning and semi-supervised learning, which have
great potential for advancing bioactive ligand discovery.
Comment: 2nd submission fixed the mis-formatted quotation characters (i.e.,
\^a
Generating equilibrium molecules with deep neural networks
Discovery of atomistic systems with desirable properties is a major challenge
in chemistry and material science. Here we introduce a novel, autoregressive,
convolutional deep neural network architecture that generates molecular
equilibrium structures by sequentially placing atoms in three-dimensional
space. The model estimates the joint probability over molecular configurations
with tractable conditional probabilities which only depend on distances between
atoms and their nuclear charges. It combines concepts from state-of-the-art
atomistic neural networks with auto-regressive generative models for images and
speech. We demonstrate that the architecture is capable of generating molecules
close to equilibrium for constitutional isomers of COH
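The autoregressive factorization mentioned in this abstract can be written generically with the chain rule of probability. The notation is an assumption made for illustration: x_i denotes the i-th placed atom (its position together with its nuclear charge) and n is the number of atoms. The abstract states only that each conditional depends on interatomic distances and nuclear charges, not the exact parameterization.

```latex
p(x_1, \dots, x_n) = \prod_{i=1}^{n} p\left(x_i \mid x_1, \dots, x_{i-1}\right)
```

Sampling then proceeds one atom at a time, drawing each x_i from its conditional given the atoms already placed.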
Application of generative autoencoder in de novo molecular design
A major challenge in computational chemistry is the generation of novel
molecular structures with desirable pharmacological and physicochemical
properties. In this work, we investigate the potential use of autoencoders, a
deep learning methodology, for de novo molecular design. Various generative
autoencoders were used to map molecular structures into a continuous latent
space and vice versa, and their performance as structure generators was assessed.
Our results show that the latent space preserves the chemical similarity
principle and thus can be used for the generation of analogue structures.
Furthermore, the latent space created by the autoencoders was searched
systematically to generate novel compounds with predicted activity against
dopamine receptor type 2, and compounds similar to known active compounds not
included in the training set were identified.
Leveraging binding-site structure for drug discovery with point-cloud methods
Computational drug discovery strategies can be broadly placed in two
categories: ligand-based methods, which identify novel molecules by similarity
to known ligands, and structure-based methods, which predict molecules with
high affinity for a given 3D structure (e.g. a protein). However, ligand-based
methods do not leverage information about the binding site, and structure-based
approaches rely on the knowledge of a finite set of ligands binding the target.
In this work, we introduce TarLig, a novel approach that aims to bridge the gap
between ligand-based and structure-based approaches. We use the 3D structure of the
binding site as input to a model which predicts the ligand preferences of the
binding site. The resulting predictions could then offer promising seeds and
constraints in the chemical space search, based on the binding site structure.
TarLig outperforms standard models by introducing a data-alignment and
augmentation technique. The recent popularity of Volumetric 3DCNN pipelines in
structural bioinformatics suggests that this extra step could help a wide range
of methods to improve their results with minimal modifications.