Search CORE

51 research outputs found

How often do follow-on activities occur - trends seen in a patent database for GPCRs

Author: Muresan Sorel
Tyrchan Christian
Publication venue: BioMed Central
Publication date: 01/01/2012

Springer - Publisher Connector

PubMed Central

Generating Focussed Molecule Libraries for Drug Discovery with Recurrent Neural Networks

Author: Kogej Thierry
Segler Marwin H. S.
Tyrchan Christian
Waller Mark P.
Publication date: 05/01/2017

In de novo drug design, computational strategies are used to generate novel molecules with good affinity to the desired biological target. In this work, we show that recurrent neural networks can be trained as generative models for molecular structures, similar to statistical language models in natural language processing. We demonstrate that the properties of the generated molecules correlate very well with the properties of the molecules used to train the model. In order to enrich libraries with molecules active towards a given biological target, we propose to fine-tune the model with small sets of molecules, which are known to be active against that target. Against Staphylococcus aureus, the model reproduced 14% of 6051 hold-out test molecules that medicinal chemists designed, whereas against Plasmodium falciparum (Malaria) it reproduced 28% of 1240 test molecules. When coupled with a scoring function, our model can perform the complete de novo drug design cycle to generate large sets of novel molecules for drug discovery. Comment: 17 pages, 17 figures

arXiv.org e-Print Archive

Directory of Open Access Journals

FigShare

Comparing manual and automated extraction of chemical entities from documents

Author: Christian Tyrchan
Sorel Muresan
T Engel
Publication venue: BioMed Central
Publication date: 01/01/2010

Crossref

Springer - Publisher Connector

PubMed Central

Utilizing Reinforcement Learning for de novo Drug Design

Author: Chehreghani Morteza Haghir
Engkvist Ola
Svensson Hampus Gummesson
Tyrchan Christian
Publication date: 30/01/2024

Deep learning-based approaches for generating novel drug molecules with specific properties have gained a lot of interest in the last few years. Recent studies have demonstrated promising performance for string-based generation of novel molecules utilizing reinforcement learning. In this paper, we develop a unified framework for using reinforcement learning for de novo drug design, wherein we systematically study various on- and off-policy reinforcement learning algorithms and replay buffers to learn an RNN-based policy to generate novel molecules predicted to be active against the dopamine receptor DRD2. Our findings suggest that it is advantageous to use at least both top-scoring and low-scoring molecules for updating the policy when structural diversity is essential. Using all generated molecules at an iteration seems to enhance performance stability for on-policy algorithms. In addition, when replaying high, intermediate, and low-scoring molecules, off-policy algorithms display the potential of improving the structural diversity and number of active molecules generated, but possibly at the cost of a longer exploration phase. Our work provides an open-source framework enabling researchers to investigate various reinforcement learning methods for de novo drug design

arXiv.org e-Print Archive

Transformer-based molecular optimization beyond matched molecular pairs

Author: Bjerrum Esben Jannik
Czechtizky Werngard
Engkvist Ola
He Jiazhen
Nittinger Eva
Patronov Atanas
Tyrchan Christian
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2022

Molecular optimization aims to improve the drug profile of a starting molecule. It is a fundamental problem in drug discovery but challenging due to (i) the requirement of simultaneous optimization of multiple properties and (ii) the large chemical space to explore. Recently, deep learning methods have been proposed to solve this task by mimicking the chemist's intuition in terms of matched molecular pairs (MMPs). Although MMPs are a strategy widely used by medicinal chemists, they offer limited capability in terms of exploring the space of structural modifications and therefore do not cover the complete space of solutions. Often more general transformations beyond the nature of MMPs are feasible and/or necessary, e.g. simultaneous modifications of the starting molecule at different places including the core scaffold. This study aims to provide a general methodology that offers more general structural modifications beyond MMPs. In particular, the same Transformer architecture is trained on different datasets. These datasets consist of a set of molecular pairs which reflect different types of transformations. Beyond MMP transformation, datasets reflecting general structural changes are constructed from ChEMBL based on two approaches: Tanimoto similarity (allows for multiple modifications) and scaffold matching (allows for multiple modifications but keeps the scaffold constant), respectively. We investigate how the model behavior can be altered by tailoring the dataset while using the same model architecture. Our results show that the models trained on differently prepared datasets transform a given starting molecule in a way that reflects the nature of the dataset used for training the model. These models could complement each other and unlock the capability for the chemists to pursue different options for improving a starting molecule.

PubMed Central

Chalmers Research
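
The abstract of the record above mentions building molecular pairs by Tanimoto similarity. As a minimal sketch of what that measure computes, the Python below treats each fingerprint as a set of "on" bit positions. The function name `tanimoto`, the example bit sets, and the choice to return 0.0 for two empty fingerprints are assumptions made for illustration; the paper's own fingerprint type and thresholds are not stated in this listing.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        # Convention chosen for this sketch: two empty fingerprints score 0.0.
        return 0.0
    shared = len(a & b)
    # Shared bits divided by the size of the union of both bit sets.
    return shared / (len(a) + len(b) - shared)


# Hypothetical fingerprints: bit positions set for two molecules.
fp_start = {1, 4, 7, 9}
fp_candidate = {1, 4, 8, 9, 12}
similarity = tanimoto(fp_start, fp_candidate)
```

In practice the bit sets would come from a cheminformatics toolkit; plain sets are used here only to keep the arithmetic visible.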

Autonomous Drug Design with Multi-Armed Bandits

Author: Bjerrum Esben Jannik
Chehreghani Morteza Haghir
Engkvist Ola
Svensson Hampus Gummesson
Tyrchan Christian
Publication date: 01/01/2022

Recent developments in artificial intelligence and automation support a new drug design paradigm: autonomous drug design. Under this paradigm, generative models can provide suggestions on thousands of molecules with specific properties, and automated laboratories can potentially make, test and analyze molecules with minimal human supervision. However, since still only a limited number of molecules can be synthesized and tested, an obvious challenge is how to efficiently select among provided suggestions in a closed-loop system. We formulate this task as a stochastic multi-armed bandit problem with multiple plays, volatile arms and similarity information. To solve this task, we adapt previous work on multi-armed bandits to this setting, and compare our solution with random sampling, greedy selection and decaying-epsilon-greedy selection strategies. According to our simulation results, our approach has the potential to perform better exploration and exploitation of the chemical space for autonomous drug design

arXiv.org e-Print Archive

Chalmers Research
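
The abstract of the record above names decaying-epsilon-greedy selection as one of the baselines it compares against. The sketch below shows that baseline in its simplest textbook form: one arm per round, a fixed set of arms, and simulated Bernoulli rewards. It is not the paper's method, which handles multiple plays, volatile arms and similarity information. The function name, the parameters `eps0` and `decay`, and the simulated reward model are all assumptions made for illustration.

```python
import random


def decaying_epsilon_greedy(true_means, rounds, eps0=1.0, decay=0.99, seed=0):
    """Run a single-play decaying-epsilon-greedy bandit on simulated arms.

    true_means: hypothetical success probability of each arm (simulation only).
    Returns (pull counts per arm, estimated mean reward per arm).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    eps = eps0
    for _ in range(rounds):
        if rng.random() < eps:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda i: estimates[i])
        # Simulated Bernoulli reward for the chosen arm.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        # Shrink the exploration rate after every round.
        eps *= decay
    return counts, estimates


counts, estimates = decaying_epsilon_greedy([0.2, 0.5, 0.8], rounds=500)
```

The exploration rate starts high and shrinks geometrically, so early rounds sample broadly and later rounds mostly exploit the best current estimate.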

Randomized SMILES strings improve the quality of molecular generative models

Author: Arús-Pous Josep
Bjerrum Esben Jannik
Chen Hongming
Engkvist Ola
Johansson Simon Viet
Prykhodko Oleksii
Reymond Jean-Louis
Tyrchan Christian
Publication venue: Springer
Publication date: 01/01/2019

Recurrent Neural Networks (RNNs) trained with a set of molecules represented as unique (canonical) SMILES strings have shown the capacity to create large chemical spaces of valid and meaningful structures. Herein we perform an extensive benchmark on models trained with subsets of GDB-13 of different sizes (1 million, 10,000 and 1000), with different SMILES variants (canonical, randomized and DeepSMILES), with two different recurrent cell types (LSTM and GRU) and with different hyperparameter combinations. To guide the benchmarks, new metrics were developed that define how well a model has generalized the training set. The generated chemical space is evaluated with respect to its uniformity, closedness and completeness. Results show that models that use LSTM cells trained with 1 million randomized SMILES, a non-unique molecular string representation, are able to generalize to larger chemical spaces than the other approaches and they represent more accurately the target chemical space. Specifically, a model was trained with randomized SMILES that was able to generate almost all molecules from GDB-13 with a quasi-uniform probability. Models trained with smaller samples show an even bigger improvement when trained with randomized SMILES models. Additionally, models were trained on molecules obtained from ChEMBL and illustrate again that training with randomized SMILES leads to models having a better representation of the drug-like chemical space. Namely, the model trained with randomized SMILES was able to generate at least double the amount of unique molecules with the same distribution of properties compared to one trained with canonical SMILES.

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Bern Open Repository and Information System (BORIS)

Annotated chemical patent corpus: A gold standard for text mining

Author: Akhondi S.A. (Saber)
Boppana K. (Kiran)
Jagarlapudi S.A.R.P. (Sarma A. R. P.)
Klenner A.G. (Alexander G.)
Kors J.A. (Jan)
Lowe D. (Daniel)
Manchala A.K. (Anil K.)
Muresan C. (Cornelia)
Sayle R. (Roger)
Tyrchan C. (Christian)
Zimmermann M. (Marc)
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2014

Exploring the chemical and biological space covered by patent applications is crucial in early-stage medicinal chemistry activities. Patent analysis can provide understanding of compound prior art, novelty checking, validation of biological assays, and identification of new starting points for chemical exploration. Extracting chemical and biological entities from patents through manual extraction by expert curators can take a substantial amount of time and resources. Text mining methods can help to ease this process. To validate the performance of such methods, a manually annotated patent corpus is essential. In this study we have produced a large gold standard chemical patent corpus. We developed annotation guidelines and selected 200 full patents from the World Intellectual Property Organization, United States Patent and Trademark Office, and European Patent Office. The patents were pre-annotated automatically and made available to four independent annotator groups each consisting of two to ten annotators. The annotators marked chemicals in different subclasses, diseases, t

Crossref

Directory of Open Access Journals

Fraunhofer-ePrints

PubMed Central

EUR Research Repository

Erasmus University Digital Repository

Design and synthesis of soluble and cell-permeable PI3Kδ inhibitors for long-acting inhaled administration

Author: Björhall Karin
Bonn Britta K.
Carlsson Johan
Chen Yunhua
Eriksson Anders
Fredlund Linda
Hao Hai'e
Holden Neil S.
Karabelas Kostas
Lindmark Helena
Liu Feifei
Pemberton Nils
Perry Matthew W. D.
Petersen Jens
Rodrigo Blomqvist Sandra
Smith Reed W.
Svensson Tor
Terstiege Ina
Tyrchan Christian
Yang Wenzhen
Zhao Shuchun
Öster Linda
Publication venue: American Chemical Society (ACS)
Publication date: 18/05/2017

PI3Kδ is a lipid kinase that is believed to be important in the migration and activation of cells of the immune system. Inhibition is hypothesised to provide a powerful yet selective immunomodulatory effect that may be beneficial for the treatment of conditions such as asthma or rheumatoid arthritis. In this work we describe the identification of inhibitors based on a thiazolopyridone core structure and their subsequent optimisation for inhalation. The initially identified compound (13) had good potency and isoform selectivity but was not suitable for inhalation. Addition of basic substituents to a region of the molecule pointing to solvent was tolerated (enzyme inhibition pIC50 >9) and by careful manipulation of the pKa and lipophilicity we were able to discover compounds (20b, 20f) with good lung retention and cell potency that could be taken forward to in-vivo studies where significant target engagement could be demonstrated

University of Lincoln Institutional Repository

FigShare

Matched Molecular Pair Analysis in Short: Algorithms, Applications and Limitations

Author: Christian Tyrchan
Emma Evertsson
Publication venue: Elsevier BV
Publication date: 13/12/2016

Molecular matched pair (MMP) analysis has been used for more than 40 years within molecular design and is still an important tool to analyse potency data and other compound properties. The methods used to find matched pairs range from manual inspection, through supervised methods to unsupervised methods, which are able to find previously unknown molecular pairs. Recent publications demonstrate the value of automatic MMP analysis of publicly available bioactivity databases. The MMP concept has its limitations, but because of its easy-to-use and intuitive nature, it will remain one of the most important tools in the toolbox of many drug designers.

Directory of Open Access Journals

PubMed Central