11 research outputs found

    Chemically Aware Model Builder (camb): an R package for property and bioactivity modelling of small molecules.

    BACKGROUND: In silico predictive models have proved to be valuable for the optimisation of compound potency, selectivity and safety profiles in the drug discovery process. RESULTS: camb is an R package that provides an environment for the rapid generation of quantitative Structure-Property and Structure-Activity models for small molecules (including QSAR, QSPR, QSAM, PCM) and is aimed at both advanced and beginner R users. camb's capabilities include the standardisation of chemical structure representation, computation of 905 one-dimensional and 14 fingerprint type descriptors for small molecules, 8 types of amino acid descriptors, 13 whole protein sequence descriptors, filtering methods for feature selection, generation of predictive models (using an interface to the R package caret), as well as the creation of model ensembles (using techniques from the R package caretEnsemble). Results can be visualised through high-quality, customisable plots (R package ggplot2). CONCLUSIONS: Overall, camb constitutes an open-source framework to perform the following steps: (1) compound standardisation, (2) molecular and protein descriptor calculation, (3) descriptor pre-processing and model training, visualisation and validation, and (4) bioactivity/property prediction for new molecules. camb aims to speed model generation, in order to provide reproducibility and tests of robustness. QSPR and proteochemometric case studies are included which demonstrate camb's application. Graphical abstract: From compounds and data to models: a complete model building workflow in one package.

    Field-based Proteochemometric Models Derived from 3D Protein Structures : A Novel Approach to Visualize Affinity and Selectivity Features

    Designing drugs that are selective is crucial in pharmaceutical research to avoid unwanted side effects. To decipher selectivity of drug targets, computational approaches that utilize the sequence and structural information of the protein binding pockets are frequently exploited. In addition to methods that rely only on protein information, quantitative approaches such as proteochemometrics (PCM) use the combination of protein and ligand descriptions to derive quantitative relationships with binding affinity. PCM aims to explain cross-interactions between the different proteins and ligands, hence facilitating our understanding of selectivity. The main goal of this dissertation is to develop and apply field-based PCM to improve the understanding of relevant molecular interactions through visual illustrations. Field-based description that depends on the 3D structural information of proteins enhances visual interpretability of PCM models relative to the frequently used sequence-based descriptors for proteins. In these field-based PCM studies, knowledge-based fields that explain polarity and lipophilicity of the binding pockets and WaterMap-derived fields that elucidate the positions and energetics of water molecules are used together with the various 2D / 3D ligand descriptors to investigate the selectivity profiles of kinases and serine proteases. Field-based PCM is first applied to protein kinases, for which designing selective inhibitors has always been a challenge, owing to their highly similar ATP binding pockets. Our studies show that the method could be successfully applied to pinpoint the regions influencing the binding affinity and selectivity of kinases. As an extension of the initial studies conducted on a set of 50 kinases and 80 inhibitors, field-based PCM was used to build classification models on a large dataset (95 kinases and 1572 inhibitors) to distinguish active from inactive ligands. 
The prediction of the bioactivities of external test set compounds or kinases with accuracies over 80% (Matthews correlation coefficient, MCC: ~0.50) and area under the ROC curve (AUC) above 0.8, together with the visual inspection of the regions promoting activity, demonstrates the ability of field-based PCM to generate both predictive and visually interpretable models. Further, the application of this method to serine proteases provides an overview of the sub-pocket specificities, which is crucial for inhibitor design. Additionally, alignment-independent Zernike descriptors derived from fields were used in PCM models to study the influence of protein superimpositions on field comparisons and subsequent PCM modelling.
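
The abstract above reports both accuracy and the Matthews correlation coefficient (MCC) for its classification models. The two can diverge on imbalanced data, which is presumably why both are quoted. The following is a minimal Python sketch of the standard definitions; the confusion-matrix counts are hypothetical and are not taken from the thesis.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, chosen only for illustration.
balanced = dict(tp=40, tn=40, fp=10, fn=10)      # equal numbers of actives and inactives
imbalanced = dict(tp=10, tn=150, fp=20, fn=20)   # few actives, many inactives

for name, counts in (("balanced", balanced), ("imbalanced", imbalanced)):
    print(name, round(accuracy(**counts), 3), round(mcc(**counts), 3))
```

Both hypothetical sets have an accuracy of 0.8, yet the MCC of the imbalanced set is much lower, because MCC penalises a classifier that does well mainly on the majority class.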

    Kekulescope: Prediction of cancer cell line sensitivity and compound potency using convolutional neural networks trained on compound images

    The application of convolutional neural networks (ConvNets) to harness high-content screening images or 2D compound representations is gaining increasing attention in drug discovery. However, existing applications often require large data sets for training, or sophisticated pretraining schemes. Here, we show using 33 IC50 data sets from ChEMBL 23 that the in vitro activity of compounds on cancer cell lines and protein targets can be accurately predicted on a continuous scale from their Kekule structure representations alone by extending existing architectures, which were pretrained on unrelated image data sets. We show that the predictive power of the generated models is comparable to that of Random Forest (RF) models and fully-connected Deep Neural Networks trained on circular (Morgan) fingerprints. Notably, including additional fully-connected layers further increases the predictive power of the ConvNets by up to 10%. Analysis of the predictions generated by RF models and ConvNets shows that simply averaging the output of the RF models and ConvNets yields, for multiple data sets, significantly lower errors in prediction than those obtained with either model alone, although the effect size is small. This indicates that the features extracted by the convolutional layers of the ConvNets provide predictive signal complementary to Morgan fingerprints. Lastly, we show that multi-task ConvNets trained on compound images make it possible to model COX isoform selectivity on a continuous scale with errors in prediction comparable to the uncertainty of the data. Overall, in this work we present a set of ConvNet architectures for the prediction of compound activity from their Kekule structure representations with state-of-the-art performance, which require no generation of compound descriptors or use of sophisticated image processing techniques.
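
The abstract above describes combining two models by averaging their outputs. The following is a minimal Python sketch of that idea: an element-wise mean of two prediction vectors and an RMSE helper to compare errors. All function names and numbers are hypothetical illustrations, not the authors' code or data.

```python
import math
from typing import List, Sequence


def average_predictions(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Element-wise mean of two equally long prediction vectors."""
    if len(a) != len(b):
        raise ValueError("prediction vectors must have the same length")
    return [(x + y) / 2.0 for x, y in zip(a, b)]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between observed and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical pIC50 values and model outputs, for illustration only.
observed = [5.0, 6.0, 7.0, 8.0]
rf_pred = [5.4, 5.7, 7.5, 7.6]    # stand-in for Random Forest predictions
cnn_pred = [4.8, 6.4, 6.8, 8.3]   # stand-in for ConvNet predictions

ensemble = average_predictions(rf_pred, cnn_pred)
for name, pred in (("RF", rf_pred), ("ConvNet", cnn_pred), ("average", ensemble)):
    print(name, round(rmse(observed, pred), 3))
```

Averaging helps most when the two models make partly uncorrelated errors; whether it lowers the error on a given data set has to be checked empirically, as the abstract does.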

    Développement de nouvelles approches protéo-chimiométriques appliquées à l'étude des interactions et de la sélectivité des inhibiteurs de kinases

    The human kinome contains 518 proteins. They share a common mechanism of protein phosphorylation known to play an important role in cellular signaling pathways. Impaired kinase function is recognized to be involved in severe diseases like cancer. Due to the high structural similarity between protein kinases, development of potent and selective kinase inhibitors is a challenging task. A lack of selectivity in kinase inhibitors may lead to potentially harmful side effects. In this thesis, we first developed new selectivity metrics to determine inhibitor selectivity directly from biological inhibition data. Compared to existing metrics, the new selectivity scores can be applied to diverse inhibition data types. Second, we developed a proteometric approach in order to understand why some protein kinases are never inhibited by Type II inhibitors. The statistical model built for this purpose allowed us to identify several discriminant residues, some of which correspond to residues of interest already described experimentally. Third, using a new 3D protein kinase descriptor, we developed and validated novel proteo-chemometric approaches to study and discover new kinase inhibitors.

    IN SILICO METHODS FOR DRUG DESIGN AND DISCOVERY

    Computer-aided drug design (CADD) methodologies are playing an ever-increasing role in drug discovery and are critical in the cost-effective identification of promising drug candidates. These computational methods are relevant in limiting the use of animal models in pharmacological research, for aiding the rational design of novel and safe drug candidates, and for repositioning marketed drugs, supporting medicinal chemists and pharmacologists during the drug discovery trajectory. Within this field of research, we launched a Research Topic in Frontiers in Chemistry in March 2019 entitled “In silico Methods for Drug Design and Discovery,” which involved two sections of the journal: Medicinal and Pharmaceutical Chemistry and Theoretical and Computational Chemistry. For the reasons mentioned, this Research Topic attracted the attention of scientists and received a large number of submitted manuscripts. Among them, 27 Original Research articles, five Review articles, and two Perspective articles have been published within the Research Topic. The Original Research articles cover most of the topics in CADD, reporting advanced in silico methods in drug discovery, while the Review articles offer a point of view on some computer-driven techniques applied to drug research. Finally, the Perspective articles provide a vision of specific computational approaches with an outlook on the modern era of CADD.

    Machine Learning Applications for Drug Repurposing

    The cost of bringing a drug to market is astounding and the failure rate is intimidating. Drug discovery has been of limited success under the conventional reductionist one-drug-one-gene-one-disease paradigm, where a single disease-associated gene is identified and a molecular binder to the specific target is subsequently designed. Under this simplistic paradigm of drug discovery, a drug molecule is assumed to interact only with the intended on-target. However, small-molecule drugs often interact with multiple targets, and those off-target interactions are not considered under the conventional paradigm. As a result, drug-induced side effects and adverse reactions are often neglected until a very late stage of drug discovery, where the discovery of drug-induced side effects and potential drug resistance can decrease the value of the drug and even completely invalidate its use. Thus, a new paradigm in drug discovery is needed. Structural systems pharmacology is a new paradigm in drug discovery in which drug activities are studied with data-driven, large-scale models that take structures and drugs into consideration. Structural systems pharmacology will model, on a genome scale, the energetic and dynamic modifications of protein targets by drug molecules as well as the subsequent collective effects of drug-target interactions on the phenotypic drug responses. To date, however, few experimental and computational methods can determine genome-wide protein-ligand interaction networks and the clinical outcomes mediated by them. As a result, the majority of proteins have not been charted for their small-molecule ligands, and we have a limited understanding of drug actions. To address the challenge, this dissertation seeks to develop and experimentally validate innovative computational methods to infer genome-wide protein-ligand interactions and multi-scale drug-phenotype associations, including drug-induced side effects. 
The hypothesis is that the integration of data-driven bioinformatics tools with structure-and-mechanism-based molecular modeling methods will lead to an optimal tool for accurately predicting drug actions and drug-associated phenotypic responses, such as side effects. This dissertation starts by reviewing the current status of computational drug discovery for complex diseases in Chapter 1. In Chapter 2, we present REMAP, a one-class collaborative filtering method to predict off-target interactions from a protein-ligand interaction network. In our later work, REMAP was integrated with structural genomics and statistical machine learning methods to design a dual-indication polypharmacological anticancer therapy. In Chapter 3, we extend REMAP, the core method in Chapter 2, into a multi-ranked collaborative filtering algorithm, WINTF, and present relevant mathematical justifications. Chapter 4 is an application of WINTF to repurpose an FDA-approved drug, diazoxide, as a potential treatment for triple-negative breast cancer, a deadly subtype of breast cancer. In Chapter 5, we present a multilayer extension of REMAP, applied to predict drug-induced side effects and the associated biological pathways. In Chapter 6, we close this dissertation by presenting a deep learning application to learn biochemical features from protein sequence representation using a natural language processing method.

    Drug Repurposing

    This book focuses on various aspects and applications of drug repurposing, the understanding of which is important for treating diseases. Due to the high costs and time associated with the new drug discovery process, the inclination toward drug repurposing is increasing for common as well as rare diseases. A major focus of this book is understanding the role of drug repurposing to develop drugs for infectious diseases, including antiviral, antibacterial and anticancer drugs, as well as immunotherapeutics.