50 research outputs found

    Co-regularised support vector regression

    We consider a semi-supervised learning scenario for regression, where only a few labelled examples, many unlabelled instances and different data representations (multiple views) are available. For this setting, we extend support vector regression with a co-regularisation term and obtain co-regularised support vector regression (CoSVR). In addition to labelled data, co-regularisation includes information from unlabelled examples by ensuring that models trained on different views make similar predictions. Ligand affinity prediction is an important real-world problem that fits into this scenario: characterising the strength of protein-ligand bonds is a crucial step in drug discovery and design. We introduce variants of the base CoSVR algorithm and discuss their theoretical and computational properties. For the CoSVR function class we provide a theoretical bound on the Rademacher complexity. Finally, we demonstrate the usefulness of CoSVR for the affinity prediction task and evaluate its performance empirically on different protein-ligand datasets. We show that CoSVR outperforms co-regularised least squares regression as well as existing state-of-the-art approaches for affinity prediction.
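The abstract does not spell out the CoSVR objective. As a rough illustration of the idea only, here is a minimal sketch under stated assumptions: two views, linear models, an ε-insensitive loss on the labelled data for each view, and a squared disagreement penalty between the two views' predictions on unlabelled data, all minimised by plain subgradient descent. The function name `cosvr_linear` and the parameters `lam`, `mu` and `eps` are hypothetical; the published method (kernels, its own optimisation and variants) is not reproduced here.

```python
import numpy as np


def cosvr_linear(XL1, XL2, y, XU1, XU2, lam=0.01, mu=1.0, eps=0.1,
                 lr=0.05, n_iter=3000):
    """Hypothetical linear two-view co-regularised SVR (illustrative sketch).

    With f_v(x) = w_v . x + b_v for views v = 1, 2, minimises
        sum_v [ lam/2 ||w_v||^2 + mean_i max(0, |y_i - f_v(x_i^v)| - eps) ]
        + mu/2 * mean_j (f_1(z_j^1) - f_2(z_j^2))^2
    where x are labelled rows (XL1, XL2) and z are unlabelled rows (XU1, XU2).
    """
    w1, w2 = np.zeros(XL1.shape[1]), np.zeros(XL2.shape[1])
    b1 = b2 = 0.0
    n_l, n_u = len(y), XU1.shape[0]
    for t in range(n_iter):
        step = lr / np.sqrt(1.0 + t / 100.0)  # decaying step size
        # Subgradient of the eps-insensitive loss w.r.t. each labelled prediction.
        r1 = y - (XL1 @ w1 + b1)
        r2 = y - (XL2 @ w2 + b2)
        g1 = -np.sign(r1) * (np.abs(r1) > eps)
        g2 = -np.sign(r2) * (np.abs(r2) > eps)
        # Disagreement between the two views on the unlabelled examples.
        d = (XU1 @ w1 + b1) - (XU2 @ w2 + b2)
        gw1 = lam * w1 + XL1.T @ g1 / n_l + mu * XU1.T @ d / n_u
        gw2 = lam * w2 + XL2.T @ g2 / n_l - mu * XU2.T @ d / n_u
        gb1 = g1.mean() + mu * d.mean()
        gb2 = g2.mean() - mu * d.mean()
        w1, b1 = w1 - step * gw1, b1 - step * gb1
        w2, b2 = w2 - step * gw2, b2 - step * gb2
    return (w1, b1), (w2, b2)
```

Averaging the two views' predictions is one simple way to use the fitted pair. Setting `mu=0` decouples the views into two independent linear SVRs, which makes the contribution of the co-regularisation term easy to compare on a given dataset.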

    Computational exploration of molecular receptive fields in the olfactory bulb reveals a glomerulus-centric chemical map

    © The Author(s) 2020. This article is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. Progress in olfactory research is currently hampered by incomplete knowledge of the chemical receptive ranges of primary receptors. Moreover, the chemical logic underlying the arrangement of computational units in the olfactory bulb has not yet been resolved. We undertook a large-scale approach to characterising the molecular receptive ranges (MRRs) of glomeruli in the dorsal olfactory bulb (dOB) innervated by the MOR18-2 olfactory receptor (also known as Olfr78; human ortholog OR51E2). Guided by an iterative approach that combined biological screening and machine learning, we selected 214 odorants to characterise the responses of MOR18-2 and its neighbouring glomeruli. We found that a combination of conventional physico-chemical and vibrational molecular descriptors performed best in predicting glomerular responses using nonlinear support vector regression. We also discovered several previously unknown odorants that activate MOR18-2 glomeruli, and obtained detailed MRRs of MOR18-2 glomeruli and their neighbours. Our results confirm earlier findings that demonstrated tunotopy, that is, glomeruli with similar tuning curves tend to be located in spatial proximity in the dOB. In addition, our results indicate chemotopy, that is, a tendency for glomeruli with similar physico-chemical MRR descriptions to be located in spatial proximity. Together, these findings suggest the existence of a partial chemical map underlying glomerular arrangement in the dOB. Our methodology, which combines machine learning and physiological measurements, lights the way towards future high-throughput studies to deorphanise receptors and characterise structure-activity relationships in olfaction. Peer reviewed.
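The abstract reports chemotopy but does not say how chemical and spatial similarity were compared. One standard way to test whether glomeruli that are close in descriptor space also sit close together in the dOB is a Mantel test: correlate the pairwise entries of a chemical-distance matrix with those of a spatial-distance matrix, and obtain a p-value by permuting the glomerulus labels of one matrix. This is a generic sketch, not the authors' analysis; `mantel` and its arguments are hypothetical names.

```python
import numpy as np


def mantel(D1, D2, n_perm=999, seed=0):
    """Pearson Mantel statistic between two square distance matrices, with a
    one-sided permutation p-value (alternative: positive association)."""
    n = D1.shape[0]
    iu = np.triu_indices(n, k=1)          # each unordered pair once
    a = D1[iu]
    r_obs = np.corrcoef(a, D2[iu])[0, 1]
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(n_perm):
        p = rng.permutation(n)            # relabel glomeruli in matrix 2
        r = np.corrcoef(a, D2[np.ix_(p, p)][iu])[0, 1]
        count += int(r >= r_obs)
    return r_obs, (count + 1) / (n_perm + 1)
```

Here `D1` could hold distances between glomerular positions and `D2` distances between their MRR descriptor vectors; a positive statistic with a small p-value is the pattern one would expect under chemotopy.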

    Exact and efficient top-K inference for multi-target prediction by querying separable linear relational models

    Many complex multi-target prediction problems that concern large target spaces are characterised by a need for efficient prediction strategies that avoid computing predictions for all targets explicitly. Examples of such problems emerge in several subfields of machine learning, such as collaborative filtering, multi-label classification, dyadic prediction and biological network inference. In this article we analyse efficient and exact algorithms for computing the top-K predictions in these problem settings, using a general class of models that we refer to as separable linear relational models. We show how to use these inference algorithms, which are modifications of well-known information retrieval methods, in a variety of machine learning settings. Furthermore, we study the possibility of scoring items incompletely while still retaining exact top-K retrieval. Experimental results in several application domains reveal that the so-called threshold algorithm is very scalable, often performing many orders of magnitude more efficiently than the naive approach.
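A minimal sketch of Fagin's threshold algorithm for exact top-K retrieval, assuming (as a simplification) that each prediction reduces to an inner product score(i) = u · v_i between a query representation u and a target representation v_i. Each feature dimension gets a list of targets sorted from most to least favourable for the query (descending when the query weight is non-negative, ascending otherwise). Scanning the lists in parallel yields an upper bound τ on the score of any target not yet seen, so the scan can stop once the K-th best score found reaches τ. `threshold_topk` is a hypothetical name, and the article's variants (including incomplete scoring) are not reproduced.

```python
import heapq

import numpy as np


def threshold_topk(u, V, K):
    """Exact top-K rows of V by score u . V[i], via the threshold algorithm.

    Returns ([(score, index), ...] sorted best-first, number of rows scored).
    """
    n, d = V.shape
    # Per dimension, order items from most to least favourable for this query.
    orders = [np.argsort(-V[:, k]) if u[k] >= 0 else np.argsort(V[:, k])
              for k in range(d)]
    heap = []      # min-heap of (score, item) holding the current top-K
    seen = set()
    n_scored = 0
    for depth in range(n):
        for k in range(d):                 # one sorted access per list
            i = int(orders[k][depth])
            if i not in seen:
                seen.add(i)
                s = float(u @ V[i])        # random access: full score
                n_scored += 1
                if len(heap) < K:
                    heapq.heappush(heap, (s, i))
                elif s > heap[0][0]:
                    heapq.heapreplace(heap, (s, i))
        # No unseen item can score more than this bound.
        tau = sum(u[k] * V[orders[k][depth], k] for k in range(d))
        if len(heap) == K and heap[0][0] >= tau:
            break
    return sorted(heap, reverse=True), n_scored
```

`n_scored` counts how many targets were fully scored; comparing it with the total number of targets shows the saving over the naive approach on a particular dataset.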

    Predicting and Testing Helix-Mimetic Inhibitors of the p53-Mdm2 Interaction

    Aberrant protein-protein interactions (PPIs) are found in many disease states. Consequently, there is a need for PPI inhibitors for use as research tools and pharmaceutical lead compounds. Computational methods could greatly assist the search for new PPI inhibitors. Oligobenzamides are novel PPI inhibitors which can theoretically be produced to display any sequence of side chains. Understanding the nature of oligobenzamide binding is important for identifying the most efficient strategy for predicting oligobenzamide inhibitors. The prediction of oligobenzamide affinities using thermodynamic integration and implicit solvent methods is described. Affinities of oligobenzamides for Mdm2 predicted using implicit solvent methods showed a moderate correlation with measured affinities. Analysis of variance of the MM-PBSA results revealed that it is not necessary to run simulations for every member of a large combinatorial library in order to predict their relative affinities, because within a particular binding site the degree of interaction between the side chains is small. However, it could be useful to separate molecules based on their predicted binding pose, because oligobenzamides can bind to Mdm2 in many different ways depending on the choice of side chains. This insight will be valuable for future attempts to predict oligobenzamide affinities. The ¹H-¹⁵N HSQC NMR spectrum peaks of ¹⁵N-labelled Mdm2 L33E were assigned to facilitate the future validation of binding poses. NMR showed that an oligoamide binds in the correct place. However, NMR testing also revealed that oligobenzamides can aggregate in aqueous solution despite being soluble. A novel FRET-based method was also developed that can be used to test potential inhibitors with low solubility and high absorbance during their development.
It was adapted to a microwell-plate format to facilitate future high-throughput screening, and an assay involving Cherry-labelled Mdm2 was tested that could be developed into an in vivo assay in the future.
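The analysis-of-variance argument can be made concrete with a toy two-position library. If the predicted binding free energy of a compound with side chain a at one position and b at another is close to additive, G[a, b] ≈ μ + α_a + β_b, then the interaction share of the variance is small, and one reference row plus one reference column (n_a + n_b - 1 simulations instead of n_a × n_b) predicts the remaining combinations. This sketch assumes a full two-position grid with one value per cell, so interaction and noise cannot be separated; the function names are hypothetical, and this is an illustration of the reasoning, not the MM-PBSA protocol itself.

```python
import numpy as np


def two_way_anova(G):
    """Split a full grid G[a, b] (side chain a x side chain b) into an additive
    fit and a non-additive residual; return the fit and the residual's share
    of the total sum of squares."""
    mu = G.mean()
    alpha = G.mean(axis=1) - mu            # main effect of position-1 side chain
    beta = G.mean(axis=0) - mu             # main effect of position-2 side chain
    additive = mu + alpha[:, None] + beta[None, :]
    resid = G - additive                   # interaction (plus noise)
    return additive, (resid ** 2).sum() / ((G - mu) ** 2).sum()


def predict_from_reference(col0, row0):
    """Predict every cell assuming additivity, from G[:, 0] and G[0, :] only:
    G[a, b] ~ G[a, 0] + G[0, b] - G[0, 0]."""
    col0, row0 = np.asarray(col0, float), np.asarray(row0, float)
    return col0[:, None] + row0[None, :] - col0[0]
```

A small interaction share from `two_way_anova` is the situation in which the reduced design of `predict_from_reference` is justified; a large share signals that side chains interact and that combinations need to be simulated explicitly.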

    Artificial Intelligence in Oncology Drug Discovery and Development

    There exists a profound conflict at the heart of oncology drug development. The efficiency of the drug development process is falling, leading to higher costs per approved drug, while personalised medicine is at the same time limiting the target market of each new medicine. Even as the global economic burden of cancer increases, the current paradigm in drug development is unsustainable. In this book, we discuss the development of machine learning techniques for improving the efficiency of oncology drug development and delivering cost-effective precision treatment. We consider how to structure data for drug repurposing and target identification, how to improve clinical trials, and how patients may view artificial intelligence.

    The role of common genetic variants for predicting the modulation of cardiovascular outcomes

    Attrition is a major issue in the drug development process, with 79% of clinical failures due to safety and efficacy concerns. Genetic research can provide supporting evidence of a clear causal relationship between a drug target and a disease, or reveal unintended effects through associations with unrelated phenotypes, informing on potential drug safety. However, owing to the underlying genetic architecture, it is often unclear which gene or variant within the loci identified through genetic analyses is driving the association. Thanks to recent advances in CRISPR-Cas9 gene editing, it is now relatively easy to perform whole-gene knock-out studies and single-base edits to validate genetic findings about the most likely causal variant and gene. Combining genetic approaches with functional studies can provide supporting evidence of the therapeutic profile and potential effects of drug therapies, and improve our overall understanding of biological pathways and disease mechanisms. The primary aim of this thesis is to provide genetic data to support the ongoing clinical development of hypoxia-inducible factor (HIF)-prolyl hydroxylase inhibitors (PHIs) for treating anaemia of chronic kidney disease (CKD). Genome-wide association studies (GWAS) were used to identify genetic variants lying within or near genes encoding the drug target (the prolyl hydroxylase [PHD] enzymes). These variants were used in Mendelian randomisation analysis and phenome-wide association studies to genetically mirror the pharmaceutical effects of PHIs and investigate cardiovascular safety. Functional studies were employed to validate a genetic variant for use as a proxy and to better understand the downstream causal pathways and biological mechanisms of the drug target.
In summary, this thesis demonstrates that combining genetic analyses with functional validation studies is a powerful approach to validate GWAS results and further characterise therapeutic effects. This PhD project identified relevant genetic markers to genetically proxy the therapeutic modulation of biomarker levels through PHD inhibition, which could inform further research using patient-level clinical data from Phase III trials.
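The abstract names Mendelian randomisation but not the estimator. A common first-line choice, shown here as an assumption rather than the thesis's documented method, combines per-variant Wald ratios (β_Y / β_X) with the fixed-effect inverse-variance-weighted (IVW) estimator using first-order weights. The inputs are GWAS summary statistics: β_X, the variant-exposure effects; β_Y, the variant-outcome effects; and σ_Y, the standard errors of β_Y. The function names are hypothetical.

```python
import numpy as np


def wald_ratio(beta_x, beta_y):
    """Per-variant causal estimate: variant-outcome effect / variant-exposure effect."""
    return np.asarray(beta_y, dtype=float) / np.asarray(beta_x, dtype=float)


def ivw(beta_x, beta_y, se_y):
    """Fixed-effect inverse-variance-weighted (IVW) MR estimate.

    With first-order weights w_j = beta_x_j^2 / se_y_j^2 the estimate equals
    sum(beta_x * beta_y / se_y^2) / sum(beta_x^2 / se_y^2).
    """
    beta_x = np.asarray(beta_x, dtype=float)
    se_y = np.asarray(se_y, dtype=float)
    w = beta_x ** 2 / se_y ** 2
    estimate = np.sum(w * wald_ratio(beta_x, beta_y)) / np.sum(w)
    se = 1.0 / np.sqrt(np.sum(w))
    return estimate, se
```

The fixed-effect IVW estimate is only valid if every variant affects the outcome solely through the exposure, which is one reason functional validation of a proxy variant, as described above, matters.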

    Development and Evaluation of ADME Models Using Proprietary and Opensource Data

    Absorption, Distribution, Metabolism and Elimination (ADME) properties are important factors in the drug discovery pipeline. Literature ADME data are often collected in large chemical databases such as ChEMBL, which could help improve the prediction of ADME properties. Pharmaceutical companies build ADME quantitative structure-property relationship (QSPR) models using proprietary data, so literature data might be a valuable additional source for developing predictive models. The aim of this study was to investigate whether merging literature and proprietary data could improve the predictive ability of proprietary models and enlarge their applicability domain (AD). ADME predictive models for Caco-2 (A to B) permeability and LogD7.4 were built with data extracted from Evotec and the ChEMBL database. Predictive models were developed for each property using three different training sets: proprietary compounds (Evotec models), literature compounds (ChEMBL models) and a merged set of proprietary and literature compounds (Evotec+ChEMBL models). Random Forest (RF), Partial Least Squares (PLS) and Support Vector Regression (SVR) were used to develop the models. Model performance was evaluated on two types of test set: a diverse test set (20% of the available compounds, selected at random) and a temporal test set (data published after the models were built). The descriptors used were physicochemical descriptors, structural Molecular Access System (MACCS) descriptors and Partial Equalisation of Orbital Electronegativity – van der Waals Surface Area (PEOE-VSA) descriptors. The AD of the models was evaluated with four distance-to-model metrics: kNN with Euclidean distance, kNN with Manhattan distance, leverage and Mahalanobis distance. The ability of an existing Evotec Caco-2 permeability model to assess literature compounds (extracted from ChEMBL) was evaluated.
The literature test set was predicted with a higher RMSE than the internal compounds. Additionally, a number of literature compounds were found to be outside the AD of the Evotec model, highlighting an area for improvement in the proprietary Evotec models. Furthermore, the effect of including literature data in the existing Caco-2 permeability and LogD7.4 Evotec proprietary models was evaluated. RF was the best-performing method for the Caco-2 permeability models, and SVR for the LogD7.4 models. In addition, the leverage method proved to be the most appropriate for evaluating the models' AD. The permeability model built by merging literature and proprietary data (Evotec+ChEMBL model) predicted a literature temporal test set with an RMSE of 0.68, while the Evotec model showed an RMSE of 0.74. On the Evotec temporal test set the two models performed similarly, and the AD of the mixed model (incorporating both literature and proprietary data) was enlarged: 86.15% of the compounds in the proprietary temporal test set were within the AD of the Evotec+ChEMBL model, compared with 76.50% within the AD of the Evotec model. Similarly, the LogD7.4 Evotec+ChEMBL model predicted a literature temporal test set with an RMSE of 0.77, while the Evotec model showed an RMSE of 0.83. On the Evotec temporal test set the two models again performed similarly, but the AD of the mixed model was enlarged: 94.86% of the compounds in the proprietary temporal test set were within the AD of the Evotec+ChEMBL model, compared with 88.49% for the Evotec model. This study demonstrated that including public ADME data in proprietary models improved their performance and, at the same time, enlarged their AD.
The methodology presented here will be applied by Evotec computational scientists to rebuild the Caco-2 and LogD7.4 Evotec proprietary models using literature data, as discussed in this thesis.
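The distance-to-model metrics are named but not defined. Below is a minimal sketch of two of them, assuming common QSAR conventions that the thesis may not follow: leverage h_i = x_i^T (X^T X)^-1 x_i on descriptors augmented with an intercept column, with the warning threshold h* = 3(p + 1)/n (p descriptors, n training compounds); and a kNN criterion that compares a compound's mean distance to its k nearest training compounds against the q-th percentile of the same quantity over the training set. `leverage_in_domain`, `knn_in_domain`, `k=5` and `q=95` are hypothetical names and defaults.

```python
import numpy as np


def leverage_in_domain(X_train, X_test):
    """Leverage AD: inside when h_i <= h* = 3 (p + 1) / n (assumed convention)."""
    n, p = X_train.shape
    A = np.hstack([np.ones((n, 1)), X_train])             # intercept column
    B = np.hstack([np.ones((X_test.shape[0], 1)), X_test])
    G = np.linalg.pinv(A.T @ A)        # pinv tolerates collinear descriptors
    h = np.einsum("ij,jk,ik->i", B, G, B)                  # x_i^T G x_i per row
    return h <= 3.0 * (p + 1) / n, h


def knn_in_domain(X_train, X_test, k=5, q=95, metric="euclidean"):
    """kNN AD: mean distance to the k nearest training compounds, compared with
    the q-th percentile of that statistic over the training set (self excluded)."""
    if metric not in ("euclidean", "manhattan"):
        raise ValueError("metric must be 'euclidean' or 'manhattan'")
    order = 2 if metric == "euclidean" else 1              # L2 or L1 norm

    def mean_knn(X, exclude_self):
        D = np.linalg.norm(X[:, None, :] - X_train[None, :, :], ord=order, axis=-1)
        D = np.sort(D, axis=1)
        start = 1 if exclude_self else 0                   # skip zero self-distance
        return D[:, start:start + k].mean(axis=1)

    cutoff = np.percentile(mean_knn(X_train, True), q)
    d = mean_knn(X_test, False)
    return d <= cutoff, d
```

With either function, the share of a test set inside the AD, the kind of coverage figure quoted above, is `inside.mean() * 100`.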