29,345 research outputs found
Curiosity in exploring chemical spaces: Intrinsic rewards for deep molecular reinforcement learning
Computer-aided design of molecules has the potential to disrupt drug and materials discovery. Machine learning, and deep learning in particular, are areas in which the field has been developing rapidly. Reinforcement learning is a particularly promising approach because it allows molecules to be designed without prior knowledge. However, the search space is vast, so reinforcement learning agents need to explore it efficiently. In this study, we propose an algorithm to aid efficient exploration, inspired by a concept known in the literature as curiosity. We show on three benchmarks that a curious agent finds better-performing molecules. This indicates an exciting new research direction for reinforcement learning agents that explore chemical space out of their own motivation, which could eventually lead to unexpected new molecules that no human has considered so far.
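To make the idea of a curiosity bonus concrete, here is a minimal sketch of one common form of intrinsic reward, a count-based novelty bonus. The abstract does not specify which curiosity formulation the authors use, so the class name `CountBasedCuriosity`, the weight `beta`, and the use of SMILES strings as molecule identifiers are all assumptions made for illustration.

```python
import math
from collections import Counter


class CountBasedCuriosity:
    """Adds an exploration bonus that shrinks each time a molecule is revisited."""

    def __init__(self, beta=1.0):
        self.beta = beta         # weight of the intrinsic reward (assumed hyperparameter)
        self.visits = Counter()  # visit counts keyed by a molecule identifier

    def reward(self, molecule_id, extrinsic_reward):
        # Record the visit, then add a bonus proportional to 1 / sqrt(visit count).
        self.visits[molecule_id] += 1
        bonus = self.beta / math.sqrt(self.visits[molecule_id])
        return extrinsic_reward + bonus


curiosity = CountBasedCuriosity(beta=0.5)
first = curiosity.reward("CCO", extrinsic_reward=0.2)       # first visit: full bonus
second = curiosity.reward("CCO", extrinsic_reward=0.2)      # repeat visit: smaller bonus
novel = curiosity.reward("c1ccccc1", extrinsic_reward=0.2)  # unseen molecule: full bonus
```

With the same extrinsic reward, a molecule the agent has not yet generated scores higher than one it has already visited, which is what pushes the agent toward unexplored regions of the search space.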
Machine Learning Model for Repurposing Drugs to Target Viral Diseases
In light of recent events such as the Covid-19 pandemic, it is increasingly important to develop strategies to combat viral diseases. Owing to technological advancements, computer-aided drug design and machine learning (ML)-based hit identification strategies have gained popularity. Applying these techniques to identify novel scaffolds and/or repurpose existing therapeutics for viral diseases is a promising approach. To improve existing classification models for antiviral applications, this thesis aimed to improve the selection of non-binding data within these models. We created a classification model using molecular fingerprints to compare the performance of machine learning predictions when the model is trained on randomly selected versus rationally selected non-binding datasets. Our analyses revealed that machine learning predictions can be improved using a rational selection approach. We then used this approach to train three machine learning models, based on XGBoost, Random Forest, and Support Vector Machine, to predict potential inhibitors of the SARS-CoV-2 main protease (Mpro) enzyme. Probability-ranked hits from the combined model were further analyzed using classical structure-based methods. The binding modes and affinities of the hits were determined using AutoDock Vina and MM-GBSA calculations based on molecular dynamics simulations. The top hits identified by this multi-step screening approach are potential candidates that show higher affinity and stability than existing non-covalent Mpro inhibitors. Thus, our approach and the model could be useful for screening large ligand libraries.
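The probability ranking from a combined model can be sketched as follows. The abstract does not say how the three models' outputs are combined, so averaging the predicted probabilities is an assumption here, and the function name `rank_by_consensus`, the compound identifiers, and the probability values are hypothetical illustration data, not results from the thesis.

```python
from statistics import mean


def rank_by_consensus(probabilities):
    """Rank compounds by the mean predicted binding probability across models.

    probabilities maps a compound ID to one probability per model.
    Returns (compound ID, mean probability) pairs, highest first.
    """
    scored = [(compound_id, mean(ps)) for compound_id, ps in probabilities.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical per-model probabilities (e.g. one value each from three classifiers).
predictions = {
    "cmpd_A": [0.9, 0.8, 0.7],
    "cmpd_B": [0.4, 0.6, 0.5],
    "cmpd_C": [0.95, 0.9, 0.85],
}
ranked = rank_by_consensus(predictions)
```

In a pipeline like the one described, only the top of such a ranking would be passed on to the more expensive docking and molecular dynamics steps.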
Quantitative surface field analysis: learning causal models to predict ligand binding affinity and pose.
We introduce the QuanSA method for inducing physically meaningful field-based models of ligand binding pockets based on structure-activity data alone. The method is closely related to the QMOD approach, substituting a learned scoring field for a pocket constructed of molecular fragments. The problem of mutual ligand alignment is addressed in a general way, and optimal model parameters and ligand poses are identified through multiple-instance machine learning. We provide algorithmic details along with performance results on sixteen structure-activity data sets covering many pharmaceutically relevant targets. In particular, we show how models initially induced from small data sets can, by extrapolation, identify potent new ligands with novel underlying scaffolds with very high specificity. Further, we show that combining predictions from QuanSA models with those from physics-based simulation approaches is synergistic. QuanSA predictions yield binding affinities, explicit estimates of ligand strain, associated ligand pose families, and estimates of structural novelty and confidence. The method is applicable to fine-grained lead optimization as well as to the identification of potent new leads.
TeachOpenCADD: a teaching platform for computer-aided drug design using open source packages and data
Owing to the increase in freely available software and data for cheminformatics and structural bioinformatics, research in computer-aided drug design (CADD) is increasingly built on modular, reproducible, and easy-to-share pipelines. While documentation for such tools is available, there are only a few freely accessible examples that teach the underlying concepts with a focus on CADD, especially for users new to the field. Here, we present TeachOpenCADD, a teaching platform developed by students for students, using open source compound and protein data as well as basic and CADD-related Python packages. We provide interactive Jupyter notebooks for central CADD topics, integrating theoretical background and practical code. TeachOpenCADD is freely available on GitHub: https://github.com/volkamerlab/TeachOpenCAD
Retrosynthetic reaction prediction using neural sequence-to-sequence models
We describe a fully data-driven model that learns to perform a retrosynthetic
reaction prediction task, which is treated as a sequence-to-sequence mapping
problem. The end-to-end trained model has an encoder-decoder architecture
consisting of two recurrent neural networks, an approach that has previously
shown great success in other sequence-to-sequence prediction tasks such as
machine translation. The model is trained on 50,000 experimental reaction
examples from the United States patent literature, which span 10 broad reaction
types that are commonly used by medicinal chemists. We find that our model
performs comparably with a rule-based expert system baseline model, and also
overcomes certain limitations associated with rule-based expert systems and
with any machine learning approach that contains a rule-based expert system
component. Our model provides an important first step towards solving the
challenging problem of computational retrosynthetic analysis.
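
To illustrate what treating retrosynthesis as a sequence-to-sequence mapping means in practice, the sketch below turns a product and its reactants, written as SMILES strings, into the source and target token sequences an encoder-decoder model would consume. The abstract does not describe the input representation or tokenization, so the SMILES encoding, the `SMILES_TOKEN` pattern, and the esterification example are assumptions chosen for illustration.

```python
import re

# Assumed token pattern: bracket atoms, two-letter halogens, two-digit ring
# closures, then any other single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into tokens; joining the tokens restores the input."""
    return SMILES_TOKEN.findall(smiles)


# Illustrative pair: ethyl acetate (product) and acetic acid + ethanol (reactants).
product = "CC(=O)OCC"
reactants = "CC(=O)O.CCO"

source_tokens = tokenize(product)    # encoder input
target_tokens = tokenize(reactants)  # decoder output to be predicted
halide_tokens = tokenize("CC(=O)Cl")
```

The encoder reads `source_tokens` and the decoder is trained to emit `target_tokens` one token at a time, in the same way a translation model maps a sentence in one language to a sentence in another.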