79 research outputs found

    A constructive approach for discovering new drug leads: Using a kernel methodology for the inverse-QSAR problem

    Background: The inverse-QSAR problem seeks to find a new molecular descriptor from which one can recover the structure of a molecule that possesses a desired activity or property. Surprisingly, very few papers provide solutions to this problem. It is difficult because the molecular descriptors involved in the inverse-QSAR algorithm must adequately address the forward QSAR problem for a given biological activity if the subsequent recovery phase is to be meaningful. In addition, one should be able to construct a feasible molecule from such a descriptor. The difficulty of recovering the molecule from its descriptor is the major limitation of most inverse-QSAR methods. Results: In this paper, we describe the reversibility of our previously reported descriptor, the vector space model molecular descriptor (VSMMD), which is based on a vector space model and is suitable for kernel studies in QSAR modeling. Our inverse-QSAR approach can be described in five steps: (1) generate the VSMMD for the compounds in the training set; (2) map the VSMMD in the input space to the kernel feature space using an appropriate kernel function; (3) design or generate a new point in the kernel feature space using a kernel feature space algorithm; (4) map the feature space point back to the input space of descriptors using a pre-image approximation algorithm; (5) build the molecular structure template using our VSMMD molecule recovery algorithm. Conclusion: The empirical results reported in this paper show that our strategy of using kernel methodology for an inverse quantitative structure-activity relationship is sufficiently powerful to find a meaningful solution for practical problems.
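    Steps (2) to (4) of the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not name the kernel or the pre-image algorithm, so a Gaussian kernel and the classical fixed-point pre-image iteration for Gaussian kernels (Mika et al.) are swapped in here, and the descriptor vectors are toy numbers, not real VSMMDs. The names `gaussian_kernel` and `preimage_fixed_point` are hypothetical.

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def preimage_fixed_point(x_train, gamma, sigma=1.0, n_iter=100, tol=1e-9):
    """Approximate a descriptor z whose feature-space image is close to
    sum_i gamma[i] * phi(x_train[i]).

    Assumption: gamma holds non-negative weights with a positive sum
    (a convex combination of training images)."""
    z = gamma @ x_train / np.sum(gamma)  # start from the weighted mean
    for _ in range(n_iter):
        # Weight each training descriptor by gamma_i * k(z, x_i).
        w = gamma * gaussian_kernel(z[None, :], x_train, sigma)[0]
        total = np.sum(w)
        if abs(total) < 1e-12:  # z drifted too far from all training points
            break
        z_new = w @ x_train / total
        if np.linalg.norm(z_new - z) < tol:
            return z_new
        z = z_new
    return z

# Toy descriptors (assumed values, for illustration only).
x_train = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
# Step (3): a new feature-space point halfway between the images of x0 and x1.
gamma = np.array([0.5, 0.5, 0.0])
# Step (4): map that point back to descriptor space.
z = preimage_fixed_point(x_train, gamma, sigma=1.0)
```

    In this toy case the two weighted points are symmetric about the starting value, so the recovered descriptor stays at their midpoint. Steps (1) and (5), generating the VSMMD and rebuilding a molecular template from it, are specific to the paper and are not attempted here.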

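    Research on chemical structure generation using inverse analysis of quantitative structure-property relationship / quantitative structure-activity relationship models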

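    Degree type: Doctorate (course-based). Examination committee: (chair) 船津 公人, Professor, University of Tokyo; 酒井 康行, Professor, University of Tokyo; 杉山 弘和, Associate Professor, University of Tokyo; 伊藤 大知, Associate Professor, University of Tokyo; 奧野 恭史, Project Professor, Kyoto University; Gisbert Schneider, Professor, Swiss Federal Institute of Technology. University of Tokyo (東京大学)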

    Building predictive unbound brain-to-plasma concentration ratio (Kp,uu,brain) models

    The blood-brain barrier (BBB) is a dynamic membrane that evolved primarily to protect the brain from exposure to harmful xenobiotics. The distribution of synthesized drugs across the BBB is a vital parameter in drug discovery projects involving a central nervous system (CNS) target, since the molecules must be capable of crossing this major hurdle. In contrast, peripherally acting drugs have to be designed to minimize brain exposure, which could otherwise result in undue side effects. It is thus important to establish the BBB permeability of molecules early in the drug discovery pipeline. Previously, most in-silico attempts to predict brain exposure relied on the total drug distribution between the blood plasma and the brain. However, it is now understood that the unbound brain-to-plasma concentration ratio (Kp,uu,brain) is the parameter that precisely indicates the BBB availability of compounds. Kp,uu,brain describes the free concentration of the drug molecule in the brain, which, according to the free drug hypothesis, is the parameter that causes the relevant pharmacological response at the target site. The current work revisits a model built in 2011 and uploaded to an in-house server, and checks its performance on the data collected since then. This gave a satisfying result, showing the stability of the model. The old dataset was then extended with the temporal dataset in order to update the model. This is important for maintaining a substantial chemical space and so ensuring good predictability on unknown data. Using other methods and descriptors not used in the previous study, a further improvement in model performance was achieved. 
Attempts were also made to interpret the model by identifying its most influential descriptors. Popular science summary: Predictive model for unbound brain-to-plasma concentration ratio. The blood-brain barrier (BBB) is a dynamic interface that evolved to protect the brain from exposure to toxic xenobiotics and to maintain homeostasis. Distribution of drugs across the BBB is critical for any drug discovery project. A drug designed for a target in the brain has to pass through the BBB in sufficient concentrations to elicit the desired therapeutic effect. On the other hand, a drug designed for a non-CNS target should be kept away from the brain to avoid fatal side effects. The unbound brain-to-plasma concentration ratio, Kp,uu,brain, is a parameter that describes the distribution of a molecule across the BBB. It represents the free drug concentration in the brain, which is the fraction that elicits the pharmacological effect on the CNS. The experimental measurement of this parameter is time-consuming and laborious. Computational prediction of such properties thus proves to be of great utility in reducing the time and resources spent, by aiding in the early elimination of compounds possessing undesirable qualities. This helps in reducing late-stage compound attrition (failure rate), which has always been a major problem for the pharmaceutical industry. Quantitative Structure-Activity Relationship (QSAR) is an approach that attempts to establish a meaningful relationship between the chemical structure of a molecule and its chemical/biological activity. Once established, this relationship can be used to predict the activity of a new compound based on its chemical structure. In a typical QSAR experiment, the chemical structures are often represented in terms of numerical values called molecular descriptors. The thesis work used machine learning algorithms (Support Vector Machine and Random Forest) to define the structure-activity relationship. 
A predictive model for estimating the unbound brain-to-plasma concentration ratio (Kp,uu,brain) was developed based on a training set of in-house compounds and was mounted in an in-house program (C-lab) in 2011 for routine use. The thesis project involved validating the existing model and updating it by extending the dataset with the data collected since 2011. Different combinations of machine learning algorithms, modeling approaches and molecular descriptors (calculated numerical values representing chemical structures) were used to build the models. Further, by combining the predictions from these models, consensus models were built and validated. Two-class classification models were also evaluated, based on categorizing compounds as BBB positive (crosses the BBB) or BBB negative (does not cross the BBB). The validation of the old model using a temporal test set (Kp,uu,brain data collected since 2011) gave a promising result, showing stability and good predictive power. However, it is very important to keep the chemical space updated, which is the purpose of updating the model. The new model (a consensus model with five components) shows a significant improvement in predictive power along with an improvement in classification performance. This model will be uploaded to C-lab and will be accessible for use within AstraZeneca. Advisors: Hongming Chen, Ola Engkvist (Computational Chemistry, AstraZeneca R&D Mölndal) Master's Degree Project 60 credits in Bioinformatics (2014) Department of Biology, Lund Universit

    New developments on the cheminformatics open workflow environment CDK-Taverna

    Background: The computational processing and analysis of small molecules is at the heart of cheminformatics and structural bioinformatics and their application in e.g. metabolomics or drug discovery. Pipelining or workflow tools allow for the Lego™-like, graphical assembly of I/O modules and algorithms into a complex workflow which can be easily deployed, modified and tested without the hassle of implementing it in a monolithic application. The CDK-Taverna project aims at building a free open-source cheminformatics pipelining solution through the combination of different open-source projects such as Taverna, the Chemistry Development Kit (CDK) or the Waikato Environment for Knowledge Analysis (WEKA). A first integrated version 1.0 of CDK-Taverna was recently released to the public. Results: The CDK-Taverna project was migrated to the most up-to-date versions of its foundational software libraries with a complete re-engineering of its worker architecture (version 2.0). 64-bit computing and multi-core usage by parallel threads are now supported to allow for fast in-memory processing and analysis of large sets of molecules. Earlier deficiencies such as workarounds for iterative data reading have been removed. The combinatorial chemistry related reaction enumeration features are considerably enhanced. Additional functionality for calculating a natural product likeness score for small molecules is implemented to identify possible drug candidates. Finally, the data analysis capabilities are extended with new workers that provide access to the open-source WEKA library for clustering and machine learning as well as training and test set partitioning. The new features are outlined with usage scenarios. Conclusions: CDK-Taverna 2.0, as an open-source cheminformatics workflow solution, has matured to become a freely available and increasingly powerful tool for the biosciences. 
The combination of the new CDK-Taverna worker family with the already available workflows developed by a lively Taverna community and published on myexperiment.org enables molecular scientists to quickly calculate, process and analyse molecular data as typically found in e.g. today's systems biology scenarios.

    Bioclipse-R: integrating management and visualization of life science data with statistical analysis

    SUMMARY: Bioclipse, a graphical workbench for the life sciences, provides functionality for managing and visualizing life science data. We introduce Bioclipse-R, which integrates Bioclipse and the statistical programming language R. The synergy between Bioclipse and R is demonstrated by the construction of a decision support system for anticancer drug screening and mutagenicity prediction, which shows how Bioclipse-R can be used to perform complex tasks from within a single software system. Availability and implementation: Bioclipse-R is implemented as a set of Java plug-ins for Bioclipse based on the R-package rj. Source code and binary packages are available from https://github.com/bioclipse and http://www.bioclipse.net/bioclipse-r, respectively. CONTACT: [email protected] SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online

    Semi-Supervised Sparse Coding

    Sparse coding approximates a data sample as a sparse linear combination of some basic codewords and uses the sparse codes as new representations. In this paper, we investigate learning discriminative sparse codes by sparse coding in a semi-supervised manner, where only a few training samples are labeled. By using the manifold structure spanned by the data set of both labeled and unlabeled samples and the constraints provided by the labels of the labeled samples, we learn the variable class labels for all the samples. Furthermore, to improve the discriminative ability of the learned sparse codes, we assume that the class labels can be predicted from the sparse codes directly using a linear classifier. By solving for the codebook, sparse codes, class labels and classifier parameters simultaneously in a unified objective function, we develop a semi-supervised sparse coding algorithm. Experiments on two real-world pattern recognition problems demonstrate the advantage of the proposed methods over supervised sparse coding methods on partially labeled data sets.
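    The basic sparse coding step mentioned at the start of this abstract can be illustrated with a small sketch. It assumes a fixed codebook and an L1-penalised least-squares objective solved by ISTA (iterative soft-thresholding); it is plain, unsupervised sparse coding of a single sample, not the semi-supervised algorithm the paper proposes. The function names and the value of `lam` are hypothetical choices for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink each entry of v towards zero by t (proximal map of the L1 norm)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(x, codebook, lam=0.1, n_iter=200):
    """Minimise 0.5 * ||x - D s||^2 + lam * ||s||_1 over s using ISTA.

    codebook (D) has one codeword per column; x is a single data sample."""
    d = np.asarray(codebook, dtype=float)
    x = np.asarray(x, dtype=float)
    # Step size 1/L, where L is the squared spectral norm of D.
    step = 1.0 / np.linalg.norm(d, 2) ** 2
    s = np.zeros(d.shape[1])
    for _ in range(n_iter):
        grad = d.T @ (d @ s - x)  # gradient of the quadratic term
        s = soft_threshold(s - step * grad, step * lam)
    return s

# Toy example: 4-dimensional sample, 6 random unit-norm codewords.
rng = np.random.default_rng(0)
D = rng.normal(size=(4, 6))
D /= np.linalg.norm(D, axis=0)
x = 2.0 * D[:, 1] - 1.0 * D[:, 4]  # sample built from two codewords
s = sparse_code(x, D, lam=0.05)
```

    The resulting vector `s` is the new representation of `x`; larger values of `lam` drive more of its entries to exactly zero.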

    Kernel Methods in Computer-Aided Constructive Drug Design

    A drug is typically a small molecule that interacts with the binding site of some target protein. Drug design involves the optimization of this interaction so that the drug effectively binds with the target protein while not binding with other proteins (an event that could produce dangerous side effects). Computational drug design involves the geometric modeling of drug molecules, with the goal of generating similar molecules that will be more effective drug candidates. It is necessary that algorithms incorporate strategies to measure molecular similarity by comparing molecular descriptors that may involve dozens to hundreds of attributes. We use kernel-based methods to define these measures of similarity. Kernels are general functions that can be used to formulate similarity comparisons. The overall goal of this thesis is to develop effective and efficient computational methods that rely on transparent mathematical descriptors of molecules, with applications to affinity prediction, detection of multiple binding modes, and generation of new drug leads. While in this thesis we derive computational strategies for the discovery of new drug leads, our approach differs from the traditional ligand-based approach. We have developed novel procedures to calculate inverse mappings and subsequently recover the structure of a potential drug lead. The contributions of this thesis are the following: 1. We propose a vector space model molecular descriptor (VSMMD) based on a vector space model that is suitable for kernel studies in QSAR modeling. Our experiments have provided convincing comparative empirical evidence that our descriptor formulation in conjunction with kernel-based regression algorithms can provide sufficient discrimination to predict various biological activities of a molecule with reasonable accuracy. 2. We present a new component selection algorithm, KACS (Kernel Alignment Component Selection), based on kernel alignment for a QSAR study. 
Kernel alignment has been developed as a measure of similarity between two kernel functions. In our algorithm, we refine kernel alignment as an evaluation tool, using recursive component elimination to eventually select the most important components for classification. We have demonstrated empirically and proven theoretically that our algorithm works well for finding the most important components in different QSAR data sets. 3. We extend the VSMMD in conjunction with a kernel-based clustering algorithm to the prediction of multiple binding modes, a challenging area of research that has previously been studied by means of time-consuming docking simulations. The results reported in this study provide strong empirical evidence that our strategy has enough resolving power to distinguish multiple binding modes through the use of a standard k-means algorithm. 4. We develop a set of reverse engineering strategies for QSAR modeling based on our VSMMD. These strategies include: (a) The use of a kernel feature space algorithm to design or modify descriptor image points in a feature space. (b) The deployment of a pre-image algorithm to map the newly defined descriptor image points in the feature space back to the input space of the descriptors. (c) The design of a probabilistic strategy to convert new descriptors to meaningful chemical graph templates. The most important aspect of these contributions is the presentation of strategies that actually generate the structure of a new drug candidate. While the training set is still used to generate a new image point in the feature space, the reverse engineering strategies just described allow us to develop a new drug candidate that is independent of issues related to probability distribution constraints placed on test set molecules.
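    The kernel alignment measure referred to in contribution 2 can be illustrated with a short sketch. It uses the standard empirical definition (the Frobenius inner product of two kernel matrices, normalised by their Frobenius norms) and scores each descriptor component by aligning its one-component linear kernel with the label kernel y yᵀ. This is not the KACS algorithm itself, whose recursive elimination procedure is not described in the abstract; `rank_components` and the toy data are hypothetical.

```python
import numpy as np

def kernel_alignment(k1, k2):
    """Empirical alignment <K1, K2>_F / sqrt(<K1, K1>_F * <K2, K2>_F)."""
    k1 = np.asarray(k1, dtype=float)
    k2 = np.asarray(k2, dtype=float)
    return np.sum(k1 * k2) / np.sqrt(np.sum(k1 * k1) * np.sum(k2 * k2))

def rank_components(x, y):
    """Score every column of x by the alignment between its linear kernel
    and the target kernel y y^T (labels y are assumed to be +1 / -1)."""
    target = np.outer(y, y)
    scores = []
    for j in range(x.shape[1]):
        col = x[:, [j]]           # keep the column two-dimensional
        scores.append(kernel_alignment(col @ col.T, target))
    return np.array(scores)

# Toy data: component 0 reproduces the labels, component 1 is orthogonal to them.
y = np.array([1.0, 1.0, -1.0, -1.0])
x = np.column_stack([y, [1.0, -1.0, 1.0, -1.0]])
scores = rank_components(x, y)
```

    Here the first component scores an alignment of 1 and the second scores 0, so a selection procedure built on this measure would keep the first and discard the second.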

    Molecular Similarity and Xenobiotic Metabolism

    MetaPrint2D, a new software tool implementing a data-mining approach for predicting sites of xenobiotic metabolism, has been developed. The algorithm is based on a statistical analysis of the occurrences of atom-centred circular fingerprints in both substrates and metabolites. This approach has undergone extensive evaluation and been shown to be of comparable accuracy to current best-in-class tools, but is able to make much faster predictions, for the first time enabling chemists to explore the effects of structural modifications on a compound's metabolism in a highly responsive and interactive manner. MetaPrint2D is able to assign a confidence score to the predictions it generates, based on the availability of relevant data and the degree to which a compound is modelled by the algorithm. In the course of the evaluation of MetaPrint2D, a novel metric for assessing the performance of site of metabolism predictions has been introduced. This overcomes the bias introduced by molecule size and the number of sites of metabolism inherent to the most commonly reported metrics used to evaluate site of metabolism predictions. This data mining approach to site of metabolism prediction has been augmented by a set of reaction type definitions to produce MetaPrint2D-React, enabling prediction of the types of transformations a compound is likely to undergo and the metabolites that are formed. This approach has been evaluated against both historical data and metabolic schemes reported in a number of recently published studies. Results suggest that the ability of this method to predict metabolic transformations is highly dependent on the relevance of the training set data to the query compounds. MetaPrint2D has been released as an open source software library, and both MetaPrint2D and MetaPrint2D-React are available for chemists to use through the Unilever Centre for Molecular Science Informatics website. ---- Boehringer-Ingelhie

    Study of the structure and SAR/QSAR properties of selected molecules with therapeutic aims

    Recently, a series of carbazole derivatives containing chalcone analogues (CDCAs) were synthesized as potent anticancer agents and apoptosis inducers. These compounds target the inhibition of topoisomerase II and present cytotoxic activities. After comparison with experiment, we validated the use of B3LYP, a density functional theory-based approach, to describe the structure and molecular properties of the carbazole subunit and the CDCA compounds of interest. Then, we derived relationships between the chemical descriptors and the activity of these carbazole derivatives using multi-parameter optimization and quantitative structure-activity relationship (QSAR) approaches. For the QSAR studies, we used multiple linear regression and artificial neural network statistical modelling. Our predicted activities are in good agreement with the experimental ones. We found that the most important parameter influencing the activity of the considered compounds is the octanol-water partition coefficient, highlighting the importance of flexibility as a key molecular parameter to favor cell membrane crossing and enhance the action of these CDCAs against topoisomerase II. Our results provide useful guidelines for designing new orally active CDCA medicaments for cytotoxic inhibition.