1,391 research outputs found

    Latent Molecular Optimization for Targeted Therapeutic Design

    Full text link
    We devise an approach for targeted molecular design, a problem of interest in computational drug discovery: given a target protein site, we wish to generate a chemical with both high binding affinity to the target and satisfactory pharmacological properties. This problem is made difficult by the enormity and discreteness of the space of potential therapeutics, as well as the graph-structured nature of biomolecular surface sites. Using a dataset of protein-ligand complexes, we surmount these issues by extracting a signature of the target site with a graph convolutional network and by encoding the discrete chemical into a continuous latent vector space. The latter embedding permits gradient-based optimization in molecular space, which we perform using learned differentiable models of binding affinity and other pharmacological properties. We show that our approach is able to efficiently optimize these multiple objectives and discover new molecules with potentially useful binding properties, validated via docking methods

    Graph Convolutional Neural Networks for Polymers Property Prediction

    Full text link
    A fast and accurate predictive tool for polymer properties is demanding and will pave the way to iterative inverse design. In this work, we apply graph convolutional neural networks (GCNN) to predict the dielectric constant and energy bandgap of polymers. Using density functional theory (DFT) calculated properties as the ground truth, GCNN can achieve remarkable agreement with DFT results. Moreover, we show that GCNN outperforms other machine learning algorithms. Our work proves that GCNN relies only on morphological data of polymers and removes the requirement for complicated hand-crafted descriptors, while still offering accuracy in fast predictions.Comment: Accepted for NIPS 2018 Workshop on Machine Learning for Molecules and Material

    Feature Assisted bi-directional LSTM Model for Protein-Protein Interaction Identification from Biomedical Texts

    Full text link
    Knowledge about protein-protein interactions is essential in understanding the biological processes such as metabolic pathways, DNA replication, and transcription etc. However, a majority of the existing Protein-Protein Interaction (PPI) systems are dependent primarily on the scientific literature, which is yet not accessible as a structured database. Thus, efficient information extraction systems are required for identifying PPI information from the large collection of biomedical texts. Most of the existing systems model the PPI extraction task as a classification problem and are tailored to the handcrafted feature set including domain dependent features. In this paper, we present a novel method based on deep bidirectional long short-term memory (B-LSTM) technique that exploits word sequences and dependency path related information to identify PPI information from text. This model leverages joint modeling of proteins and relations in a single unified framework, which we name as Shortest Dependency Path B-LSTM (sdpLSTM) model. We perform experiments on two popular benchmark PPI datasets, namely AiMed & BioInfer. The evaluation shows the F1-score values of 86.45% and 77.35% on AiMed and BioInfer, respectively. Comparisons with the existing systems show that our proposed approach attains state-of-the-art performance

    Deep Learning Based Regression and Multi-class Models for Acute Oral Toxicity Prediction with Automatic Chemical Feature Extraction

    Full text link
    For quantitative structure-property relationship (QSPR) studies in chemoinformatics, it is important to get interpretable relationship between chemical properties and chemical features. However, the predictive power and interpretability of QSPR models are usually two different objectives that are difficult to achieve simultaneously. A deep learning architecture using molecular graph encoding convolutional neural networks (MGE-CNN) provided a universal strategy to construct interpretable QSPR models with high predictive power. Instead of using application-specific preset molecular descriptors or fingerprints, the models can be resolved using raw and pertinent features without manual intervention or selection. In this study, we developed acute oral toxicity (AOT) models of compounds using the MGE-CNN architecture as a case study. Three types of high-level predictive models: regression model (deepAOT-R), multi-classification model (deepAOT-C) and multi-task model (deepAOT-CR) for AOT evaluation were constructed. These models highly outperformed previously reported models. For the two external datasets containing 1673 (test set I) and 375 (test set II) compounds, the R2 and mean absolute error (MAE) of deepAOT-R on the test set I were 0.864 and 0.195, and the prediction accuracy of deepAOT-C was 95.5% and 96.3% on the test set I and II, respectively. The two external prediction accuracy of deepAOT-CR is 95.0% and 94.1%, while the R2 and MAE are 0.861 and 0.204 for test set I, respectively.Comment: 36 pages, 4 figure

    Simulating Execution Time of Tensor Programs using Graph Neural Networks

    Full text link
    Optimizing the execution time of tensor program, e.g., a convolution, involves finding its optimal configuration. Searching the configuration space exhaustively is typically infeasible in practice. In line with recent research using TVM, we propose to learn a surrogate model to overcome this issue. The model is trained on an acyclic graph called an abstract syntax tree, and utilizes a graph convolutional network to exploit structure in the graph. We claim that a learnable graph-based data processing is a strong competitor to heuristic-based feature extraction. We present a new dataset of graphs corresponding to configurations and their execution time for various tensor programs. We provide baselines for a runtime prediction task.Comment: All authors contributed equally. Accepted as a workshop paper at Representation Learning on Graphs and Manifolds @ ICLR 2019. Fixed values in Table

    Molecular Inverse-Design Platform for Material Industries

    Full text link
    The discovery of new materials has been the essential force which brings a discontinuous improvement to industrial products' performance. However, the extra-vast combinatorial design space of material structures exceeds human experts' capability to explore all, thereby hampering material development. In this paper, we present a material industry-oriented web platform of an AI-driven molecular inverse-design system, which automatically designs brand new molecular structures rapidly and diversely. Different from existing inverse-design solutions, in this system, the combination of substructure-based feature encoding and molecular graph generation algorithms allows a user to gain high-speed, interpretable, and customizable design process. Also, a hierarchical data structure and user-oriented UI provide a flexible and intuitive workflow. The system is deployed on IBM's and our client's cloud servers and has been used by 5 partner companies. To illustrate actual industrial use cases, we exhibit inverse-design of sugar and dye molecules, that were carried out by experimental chemists in those client companies. Compared to general human chemist's standard performance, the molecular design speed was accelerated more than 10 times, and greatly increased variety was observed in the inverse-designed molecules without loss of chemical realism.Comment: 9 pages, 7 figures, Accepted to KDD 202

    Machine learning and AI-based approaches for bioactive ligand discovery and GPCR-ligand recognition

    Full text link
    In the last decade, machine learning and artificial intelligence applications have received a significant boost in performance and attention in both academic research and industry. The success behind most of the recent state-of-the-art methods can be attributed to the latest developments in deep learning. When applied to various scientific domains that are concerned with the processing of non-tabular data, for example, image or text, deep learning has been shown to outperform not only conventional machine learning but also highly specialized tools developed by domain experts. This review aims to summarize AI-based research for GPCR bioactive ligand discovery with a particular focus on the most recent achievements and research trends. To make this article accessible to a broad audience of computational scientists, we provide instructive explanations of the underlying methodology, including overviews of the most commonly used deep learning architectures and feature representations of molecular data. We highlight the latest AI-based research that has led to the successful discovery of GPCR bioactive ligands. However, an equal focus of this review is on the discussion of machine learning-based technology that has been applied to ligand discovery in general and has the potential to pave the way for successful GPCR bioactive ligand discovery in the future. This review concludes with a brief outlook highlighting the recent research trends in deep learning, such as active learning and semi-supervised learning, which have great potential for advancing bioactive ligand discovery.Comment: 2nd submission fixed the mis-formatted quotation characters (i.e., \^a

    Generating equilibrium molecules with deep neural networks

    Full text link
    Discovery of atomistic systems with desirable properties is a major challenge in chemistry and material science. Here we introduce a novel, autoregressive, convolutional deep neural network architecture that generates molecular equilibrium structures by sequentially placing atoms in three-dimensional space. The model estimates the joint probability over molecular configurations with tractable conditional probabilities which only depend on distances between atoms and their nuclear charges. It combines concepts from state-of-the-art atomistic neural networks with auto-regressive generative models for images and speech. We demonstrate that the architecture is capable of generating molecules close to equilibrium for constitutional isomers of C7_7O2_2H10_{10}

    Application of generative autoencoder in de novo molecular design

    Full text link
    A major challenge in computational chemistry is the generation of novel molecular structures with desirable pharmacological and physiochemical properties. In this work, we investigate the potential use of autoencoder, a deep learning methodology, for de novo molecular design. Various generative autoencoders were used to map molecule structures into a continuous latent space and vice versa and their performance as structure generator was assessed. Our results show that the latent space preserves chemical similarity principle and thus can be used for the generation of analogue structures. Furthermore, the latent space created by autoencoders were searched systematically to generate novel compounds with predicted activity against dopamine receptor type 2 and compounds similar to known active compounds not included in the training set were identified

    Leveraging binding-site structure for drug discovery with point-cloud methods

    Full text link
    Computational drug discovery strategies can be broadly placed in two categories: ligand-based methods which identify novel molecules by similarity with known ligands, and structure-based methods which predict molecules with high-affinity to a given 3D structure (e.g. a protein). However, ligand-based methods do not leverage information about the binding site, and structure-based approaches rely on the knowledge of a finite set of ligands binding the target. In this work, we introduce TarLig, a novel approach that aims to bridge the gap between ligand and structure-based approaches. We use the 3D structure of the binding site as input to a model which predicts the ligand preferences of the binding site. The resulting predictions could then offer promising seeds and constraints in the chemical space search, based on the binding site structure. TarLig outperforms standard models by introducing a data-alignment and augmentation technique. The recent popularity of Volumetric 3DCNN pipelines in structural bioinformatics suggests that this extra step could help a wide range of methods to improve their results with minimal modifications
    • …
    corecore