1,391 research outputs found

Latent Molecular Optimization for Targeted Therapeutic Design

Author: Aumentado-Armstrong Tristan
Publication date: 05/09/2018

We devise an approach for targeted molecular design, a problem of interest in computational drug discovery: given a target protein site, we wish to generate a chemical with both high binding affinity to the target and satisfactory pharmacological properties. This problem is made difficult by the enormity and discreteness of the space of potential therapeutics, as well as the graph-structured nature of biomolecular surface sites. Using a dataset of protein-ligand complexes, we surmount these issues by extracting a signature of the target site with a graph convolutional network and by encoding the discrete chemical into a continuous latent vector space. The latter embedding permits gradient-based optimization in molecular space, which we perform using learned differentiable models of binding affinity and other pharmacological properties. We show that our approach is able to efficiently optimize these multiple objectives and discover new molecules with potentially useful binding properties, validated via docking methods.
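
The gradient-based optimization in a continuous latent space that this abstract describes can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the quadratic `surrogate_score` merely stands in for the learned differentiable models of affinity and other properties, and the names `surrogate_grad` and `optimize_latent` are hypothetical, not taken from the paper.

```python
import numpy as np

def surrogate_score(z, target):
    """Toy differentiable objective (assumed): maximal when z equals target."""
    return -np.sum((z - target) ** 2)

def surrogate_grad(z, target):
    """Analytic gradient of surrogate_score with respect to z."""
    return -2.0 * (z - target)

def optimize_latent(z0, target, step=0.1, n_steps=100):
    """Plain gradient ascent on a latent vector, starting from z0."""
    z = np.array(z0, dtype=float)
    target = np.array(target, dtype=float)
    for _ in range(n_steps):
        # Move the latent point uphill on the surrogate objective.
        z = z + step * surrogate_grad(z, target)
    return z
```

In a real pipeline the optimized latent vector would be passed through a decoder to recover a discrete molecule; that step is omitted here.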

arXiv.org e-Print Archive

Graph Convolutional Neural Networks for Polymers Property Prediction

Author: Chandrasekhar Vijay Ramaseshan
Hippalgaonkar Kedar
Kumar Jatin Nitin
Savitha Ramasamy
Zeng Minggang
Zeng Zeng
Publication date: 15/11/2018

A fast and accurate predictive tool for polymer properties is in demand and will pave the way to iterative inverse design. In this work, we apply graph convolutional neural networks (GCNN) to predict the dielectric constant and energy bandgap of polymers. Using properties calculated with density functional theory (DFT) as the ground truth, GCNN can achieve remarkable agreement with DFT results. Moreover, we show that GCNN outperforms other machine learning algorithms. Our work shows that GCNN relies only on morphological data of polymers and removes the requirement for complicated hand-crafted descriptors, while still offering fast and accurate predictions.

Comment: Accepted for NIPS 2018 Workshop on Machine Learning for Molecules and Materials
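
To give a sense of what a graph convolution computes, here is a minimal sketch of one layer using the widely used symmetric-normalization propagation rule, ReLU(D^-1/2 (A + I) D^-1/2 H W). This is an assumption for illustration only: the abstract does not specify the paper's architecture, and the name `gcn_layer` is hypothetical.

```python
import numpy as np

def gcn_layer(adjacency, features, weights):
    """One graph-convolution step: normalize, aggregate neighbors, project, ReLU."""
    # Add self-loops so each node keeps its own features.
    a_hat = adjacency + np.eye(adjacency.shape[0])
    # Symmetric degree normalization D^-1/2 (A + I) D^-1/2.
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # Aggregate neighbor features, apply the weights, then ReLU.
    return np.maximum(a_norm @ features @ weights, 0.0)
```

For a molecule or polymer repeat unit, `adjacency` would encode bonds between atoms and `features` would hold per-atom descriptors, so no hand-crafted global descriptor is needed.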

arXiv.org e-Print Archive

Feature Assisted bi-directional LSTM Model for Protein-Protein Interaction Identification from Biomedical Texts

Author: Bhattacharyya Pushpak
Ekbal Asif
Kumar Ankit
Saha Sriparna
Yadav Shweta
Publication date: 05/07/2018

Knowledge about protein-protein interactions is essential for understanding biological processes such as metabolic pathways, DNA replication, and transcription. However, a majority of the existing Protein-Protein Interaction (PPI) systems depend primarily on the scientific literature, which is not yet accessible as a structured database. Thus, efficient information extraction systems are required for identifying PPI information from the large collection of biomedical texts. Most of the existing systems model the PPI extraction task as a classification problem and are tailored to handcrafted feature sets, including domain-dependent features. In this paper, we present a novel method based on a deep bidirectional long short-term memory (B-LSTM) technique that exploits word sequences and dependency-path-related information to identify PPI information from text. This model leverages joint modeling of proteins and relations in a single unified framework, which we name the Shortest Dependency Path B-LSTM (sdpLSTM) model. We perform experiments on two popular benchmark PPI datasets, namely AiMed and BioInfer. The evaluation shows F1-scores of 86.45% and 77.35% on AiMed and BioInfer, respectively. Comparisons with the existing systems show that our proposed approach attains state-of-the-art performance.
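
For readers unfamiliar with the metric reported above, the F1-score is the harmonic mean of precision and recall. The sketch below assumes binary labels (1 for an interacting pair, 0 otherwise); the function name and the example labels are hypothetical and are not the paper's evaluation code or data.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels, where 1 marks the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        return 0.0  # no correct positive predictions
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```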

arXiv.org e-Print Archive

Deep Learning Based Regression and Multi-class Models for Acute Oral Toxicity Prediction with Automatic Chemical Feature Extraction

Author: Lai Luhua
Pei Jianfeng
Xu Youjun
Publication date: 04/05/2017

For quantitative structure-property relationship (QSPR) studies in chemoinformatics, it is important to obtain interpretable relationships between chemical properties and chemical features. However, the predictive power and interpretability of QSPR models are usually two different objectives that are difficult to achieve simultaneously. A deep learning architecture using molecular graph encoding convolutional neural networks (MGE-CNN) provided a universal strategy to construct interpretable QSPR models with high predictive power. Instead of using application-specific preset molecular descriptors or fingerprints, the models can be resolved using raw and pertinent features without manual intervention or selection. In this study, we developed acute oral toxicity (AOT) models of compounds using the MGE-CNN architecture as a case study. Three types of high-level predictive models were constructed for AOT evaluation: a regression model (deepAOT-R), a multi-classification model (deepAOT-C) and a multi-task model (deepAOT-CR). These models substantially outperformed previously reported models. For the two external datasets containing 1673 (test set I) and 375 (test set II) compounds, the R² and mean absolute error (MAE) of deepAOT-R on test set I were 0.864 and 0.195, and the prediction accuracy of deepAOT-C was 95.5% and 96.3% on test sets I and II, respectively. The prediction accuracy of deepAOT-CR on the two external sets is 95.0% and 94.1%, while its R² and MAE are 0.861 and 0.204 on test set I, respectively.

Comment: 36 pages, 4 figures
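
The two regression metrics quoted above can be computed as in the following sketch. It assumes R² denotes the coefficient of determination, which the abstract does not state explicitly (some QSPR studies report the squared correlation coefficient instead); the function names and example values are hypothetical.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return 1.0 - ss_res / ss_tot
```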

arXiv.org e-Print Archive

Simulating Execution Time of Tensor Programs using Graph Neural Networks

Author: Lepert Romain
Tomczak Jakub M.
Wiggers Auke
Publication date: 27/11/2019

Optimizing the execution time of a tensor program, e.g., a convolution, involves finding its optimal configuration. Searching the configuration space exhaustively is typically infeasible in practice. In line with recent research using TVM, we propose to learn a surrogate model to overcome this issue. The model is trained on an acyclic graph called an abstract syntax tree, and utilizes a graph convolutional network to exploit structure in the graph. We claim that learnable graph-based data processing is a strong competitor to heuristic-based feature extraction. We present a new dataset of graphs corresponding to configurations and their execution time for various tensor programs. We provide baselines for a runtime prediction task.

Comment: All authors contributed equally. Accepted as a workshop paper at Representation Learning on Graphs and Manifolds @ ICLR 2019. Fixed values in Table

arXiv.org e-Print Archive

Molecular Inverse-Design Platform for Material Industries

Author: Bocanett Wolf
Cheng Yenwei
Fujita Akihiro
Hama Toshiyuki
Hino Katsuhiko
Hirose Shuichi
Hongo Takumi
Hsu Hsiang-Han
Kogoh Makoto
Nakano Daiju
Nakashika Hideaki
Orii Yasumitsu
Pitera Jed W.
Piunova Victoria A.
Sanders Daniel P.
Takeda Seiji
Toda Hiroki
Tsuchiya Yuta
Yano Kentaro
Zubarev Dmitry
Publication date: 16/05/2020

The discovery of new materials has been the essential force that brings discontinuous improvements to the performance of industrial products. However, the vast combinatorial design space of material structures exceeds human experts' capability to explore it fully, thereby hampering material development. In this paper, we present a material-industry-oriented web platform for an AI-driven molecular inverse-design system, which automatically designs brand new molecular structures rapidly and diversely. Unlike existing inverse-design solutions, this system combines substructure-based feature encoding with molecular graph generation algorithms, giving the user a high-speed, interpretable, and customizable design process. In addition, a hierarchical data structure and user-oriented UI provide a flexible and intuitive workflow. The system is deployed on IBM's and our client's cloud servers and has been used by 5 partner companies. To illustrate actual industrial use cases, we exhibit the inverse design of sugar and dye molecules, which was carried out by experimental chemists in those client companies. Compared to the typical performance of a human chemist, the molecular design speed was accelerated more than 10 times, and greatly increased variety was observed in the inverse-designed molecules without loss of chemical realism.

Comment: 9 pages, 7 figures, Accepted to KDD 2020

arXiv.org e-Print Archive

Machine learning and AI-based approaches for bioactive ligand discovery and GPCR-ligand recognition

Author: Kaufman Benjamin
Raschka Sebastian
Publication venue: 'Elsevier BV'
Publication date: 06/06/2020

In the last decade, machine learning and artificial intelligence applications have received a significant boost in performance and attention in both academic research and industry. The success behind most of the recent state-of-the-art methods can be attributed to the latest developments in deep learning. When applied to various scientific domains that are concerned with the processing of non-tabular data, for example, image or text, deep learning has been shown to outperform not only conventional machine learning but also highly specialized tools developed by domain experts. This review aims to summarize AI-based research for GPCR bioactive ligand discovery with a particular focus on the most recent achievements and research trends. To make this article accessible to a broad audience of computational scientists, we provide instructive explanations of the underlying methodology, including overviews of the most commonly used deep learning architectures and feature representations of molecular data. We highlight the latest AI-based research that has led to the successful discovery of GPCR bioactive ligands. However, an equal focus of this review is on the discussion of machine learning-based technology that has been applied to ligand discovery in general and has the potential to pave the way for successful GPCR bioactive ligand discovery in the future. This review concludes with a brief outlook highlighting the recent research trends in deep learning, such as active learning and semi-supervised learning, which have great potential for advancing bioactive ligand discovery.

Comment: 2nd submission fixed the mis-formatted quotation characters (i.e., \^a

arXiv.org e-Print Archive

Generating equilibrium molecules with deep neural networks

Author: Gastegger Michael
Gebauer Niklas W. A.
Schütt Kristof T.
Publication date: 26/10/2018

Discovery of atomistic systems with desirable properties is a major challenge in chemistry and material science. Here we introduce a novel, autoregressive, convolutional deep neural network architecture that generates molecular equilibrium structures by sequentially placing atoms in three-dimensional space. The model estimates the joint probability over molecular configurations with tractable conditional probabilities which only depend on distances between atoms and their nuclear charges. It combines concepts from state-of-the-art atomistic neural networks with auto-regressive generative models for images and speech. We demonstrate that the architecture is capable of generating molecules close to equilibrium for constitutional isomers of C$_7$O$_2$H$_{10}$.
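
The autoregressive factorization this abstract refers to can be written with the chain rule of probability. The notation below is an assumption introduced for illustration (atom positions $\mathbf{r}_i$, nuclear charges $Z_i$, $n$ atoms) and is not taken from the paper:

```latex
p(\mathbf{r}_1, \dots, \mathbf{r}_n, Z_1, \dots, Z_n)
  = \prod_{i=1}^{n} p\left(\mathbf{r}_i, Z_i \mid \mathbf{r}_1, \dots, \mathbf{r}_{i-1},\, Z_1, \dots, Z_{i-1}\right)
```

Each factor is the conditional distribution for placing the $i$-th atom given the atoms already placed. According to the abstract, these conditionals depend only on the distances between atoms and on their nuclear charges.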

arXiv.org e-Print Archive

Application of generative autoencoder in de novo molecular design

Author: Bajorath Jürgen
Blaschke Thomas
Chen Hongming
Engkvist Ola
Olivecrona Marcus
Publication date: 21/11/2017

A major challenge in computational chemistry is the generation of novel molecular structures with desirable pharmacological and physicochemical properties. In this work, we investigate the potential use of autoencoders, a deep learning methodology, for de novo molecular design. Various generative autoencoders were used to map molecular structures into a continuous latent space and vice versa, and their performance as structure generators was assessed. Our results show that the latent space preserves the chemical similarity principle and thus can be used for the generation of analogue structures. Furthermore, the latent space created by the autoencoders was searched systematically to generate novel compounds with predicted activity against dopamine receptor type 2, and compounds similar to known active compounds not included in the training set were identified.

arXiv.org e-Print Archive

Leveraging binding-site structure for drug discovery with point-cloud methods

Author: Mallet Vincent
Moitessier Nicolas
Oliver Carlos G.
Waldispuhl Jerome
Publication date: 28/05/2019

Computational drug discovery strategies can be broadly placed in two categories: ligand-based methods, which identify novel molecules by similarity with known ligands, and structure-based methods, which predict molecules with high affinity to a given 3D structure (e.g. a protein). However, ligand-based methods do not leverage information about the binding site, and structure-based approaches rely on the knowledge of a finite set of ligands binding the target. In this work, we introduce TarLig, a novel approach that aims to bridge the gap between ligand-based and structure-based approaches. We use the 3D structure of the binding site as input to a model which predicts the ligand preferences of the binding site. The resulting predictions could then offer promising seeds and constraints in the chemical space search, based on the binding site structure. TarLig outperforms standard models by introducing a data-alignment and augmentation technique. The recent popularity of volumetric 3DCNN pipelines in structural bioinformatics suggests that this extra step could help a wide range of methods to improve their results with minimal modifications.

arXiv.org e-Print Archive