126 research outputs found

    Methods for the Analysis of Matched Molecular Pairs and Chemical Space Representations

    Compound optimization is a complex process in which different properties are optimized to increase the biological activity and therapeutic effects of a molecule. Frequently, the structure of a molecule is modified in order to improve its property values. Computational analysis of the effects of structural modifications on property values is therefore of great importance for the drug discovery process. It is also essential to analyze chemical space, i.e., the set of all chemically feasible molecules, in order to find subsets of molecules that display favorable property values. This thesis aims to expand the computational repertoire for analyzing the effect of structural alterations and for visualizing chemical space. Matched molecular pairs are defined as pairs of compounds that share a large common substructure and differ only by a small chemical transformation. They have frequently been used to study property changes caused by structural modifications. These analyses are expanded in this thesis by studying the effect of chemical transformations on ionization state and ligand efficiency, both of which are important measures in drug design. Additionally, novel matched molecular pairs based on retrosynthetic rules are developed to increase their utility for the prospective use of chemical transformations in compound optimization. Further, new methods based on matched molecular pairs are described to obtain preliminary structure-activity relationship (SAR) information for screening hit compounds and to predict the potency change caused by a chemical transformation. Visualizations of chemical space are introduced to aid compound optimization efforts. First, principal component plots are used to rationalize a matched molecular pair-based multi-objective compound optimization procedure. Then, star coordinate and parallel coordinate plots are introduced to analyze drug-like subspaces, where compounds with favorable property values can be found. Finally, a novel network-based visualization of high-dimensional property space is developed. In conclusion, the applications developed in this thesis expand the methodological spectrum of computer-aided compound optimization.
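    To make the matched molecular pair definition concrete, the Python sketch below groups compounds that share a core and records each pairwise transformation together with its potency and ligand efficiency changes. It is a minimal illustration under stated assumptions, not the method developed in the thesis: it assumes each compound has already been cut into one core and one substituent (real workflows derive these fragments with a cheminformatics toolkit), it uses the common approximation LE = 1.37 * pIC50 / heavy atom count, and all compound names, fragments, potencies and atom counts are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Approximate 2.303 * R * T in kcal/mol near 300 K, as used in the common
# estimate LE = 1.37 * pIC50 / heavy atom count (an assumption of this sketch).
LE_FACTOR = 1.37


def ligand_efficiency(pic50, heavy_atoms):
    """Approximate ligand efficiency in kcal/mol per heavy atom."""
    return LE_FACTOR * pic50 / heavy_atoms


def matched_pairs(records):
    """records: (name, core, substituent, pIC50, heavy_atoms) tuples in which
    each compound is already split into a core and a single substituent.
    Compounds that share a core but differ in substituent form a pair."""
    by_core = defaultdict(list)
    for name, core, subst, pic50, heavy in records:
        by_core[core].append((name, subst, pic50, heavy))
    pairs = []
    for members in by_core.values():
        for a, b in combinations(members, 2):
            if a[1] == b[1]:
                continue  # same substituent: no transformation to record
            pairs.append({
                "pair": (a[0], b[0]),
                "transformation": f"{a[1]}>>{b[1]}",
                "delta_pIC50": b[2] - a[2],
                "delta_LE": ligand_efficiency(b[2], b[3])
                - ligand_efficiency(a[2], a[3]),
            })
    return pairs


# Hypothetical compounds: fragments, potencies and heavy-atom counts are invented.
records = [
    ("cpd1", "Oc1ccc([*:1])cc1", "[*:1][H]", 5.0, 7),
    ("cpd2", "Oc1ccc([*:1])cc1", "[*:1]Cl", 6.0, 8),
    ("cpd3", "Oc1ccc([*:1])cc1", "[*:1]C", 5.5, 8),
    ("cpd4", "Nc1ccncc1[*:1]", "[*:1]Cl", 7.0, 8),
]

for p in matched_pairs(records):
    print(p["pair"], p["transformation"],
          round(p["delta_pIC50"], 2), round(p["delta_LE"], 3))
```

    Here cpd4 has no partner because no other compound shares its core, so only the three phenol-series pairs are reported. Ligand efficiency is tracked alongside potency because a substituent can raise potency while lowering potency per heavy atom.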

    Turbocharging Matched Molecular Pair Analysis: Optimizing the Identification and Analysis of Pairs

    We applied the two most commonly used methods for automatic matched pair identification, determined their optimum settings, and discovered that the two methods are synergistic. We advocate a turbocharging approach to matched pair analysis in which a first round provides the settings for a second round. The first round is a conservative categorical approach based on an analogy with coin flips: heads corresponds to an increase in a measured property, tails to a decrease, and a biased coin to a structural change that reliably causes a change in that property. The second round uses the magnitude of the change in properties. Increased chemical specificity allows reliable knowledge to be extracted from smaller sets of pairs, and an assay-specific upper limit can be placed on the number of pairs required before adequate sampling of variability has been achieved.
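    The coin-flip analogy amounts to asking whether the pairs that share one transformation show more increases than a fair coin would produce. The Python sketch below illustrates this with an exact two-sided binomial sign test; it is a minimal illustration, not the authors' implementation. The choice of test, the dropping of zero-change pairs, the threshold `alpha`, and the function names are all assumptions made for this example.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test: sum the probabilities of all outcomes
    that are no more likely than the observed count k out of n flips."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so outcomes tied with the observed one count.
    return sum(q for q in probs if q <= observed * (1 + 1e-7))


def classify_transformation(deltas, alpha=0.05):
    """deltas: property changes (after - before) for all pairs sharing one
    transformation. Increases are 'heads', decreases 'tails'; pairs with
    zero change are dropped, as in a sign test (an assumption here)."""
    ups = sum(1 for d in deltas if d > 0)
    downs = sum(1 for d in deltas if d < 0)
    n = ups + downs
    if n == 0:
        return "no data", 1.0
    pval = binomial_two_sided_p(ups, n)
    if pval >= alpha:
        return "fair coin (no reliable effect)", pval
    return ("biased: increase" if ups > downs else "biased: decrease"), pval


# Hypothetical property changes for two transformations.
print(classify_transformation([0.3, 0.5, 0.2, 0.4, 0.1, 0.6, 0.3, 0.2, 0.4, 0.5]))
print(classify_transformation([0.3, 0.5, 0.2, 0.4, 0.1, 0.6, -0.3, -0.2, -0.4, -0.5]))
```

    Ten increases out of ten pairs is unlikely under a fair coin, so the first transformation is flagged as biased. Six increases and four decreases is entirely consistent with chance. Only the direction of each change enters this round, which is what makes it conservative. Magnitudes would be examined in a second round.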

    Distance Measures in Bioinformatics

    Many bioinformatics applications rely on the computation of similarities between objects. Distance and similarity measures applied to vectors of characteristics are essential to problems such as classification, clustering and information retrieval. This study explores the usefulness of distance and similarity measures in several bioinformatics applications, which fall into two categories: (1) estimation of the adverse reaction severity of unknown pharmaceutical treatments, based on the severity of known treatments, in order to provide guidance for testing the unknown treatments in clinical trials; and (2) classification of cancer tissue types and estimation of cancer stages, based on high-dimensional microarray data, in order to support clinical decision-making. To address the first category, we studied several clustering and classification approaches for binary severity estimation of Cytokine Release Syndrome (CRS). We developed a Severity Estimation using Distance Metric Learning (SE-DML) approach to obtain graded severity estimates. With binary estimation, we were able to identify the treatments that caused the most severe response and then built prediction models for CRS. Using the SE-DML approach, we evaluated four known data sets and showed that SE-DML outperformed other widely used methods on these data sets. For the second category, we presented Kernelized Information-Theoretic Metric Learning (KITML) algorithms that optimize distance metrics and effectively handle high-dimensional data. The metric learned by KITML is used to improve the performance of k-nearest neighbor classification for cancer tissue microarray data. We evaluated our approach on fourteen (14) cancer microarray data sets and compared our results with other state-of-the-art approaches, achieving the best overall performance for the classification task. In addition, we tested the KITML algorithm in estimating the severity stages of cancer samples, with accurate results.
    Ph.D., Electrical Engineering -- Drexel University, 201
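    Metric learning methods such as KITML change which training samples count as "nearest" in k-nearest neighbor classification. The Python sketch below shows k-NN with a Mahalanobis-type distance d(x, y) = sqrt((x - y)^T M (x - y)). It does not learn M and is not KITML: the matrix M, the two-feature toy data, and the function names are hypothetical. The example only shows how reweighting a noisy feature can change a prediction.

```python
import math
from collections import Counter


def mahalanobis_dist(x, y, M):
    """d_M(x, y) = sqrt((x - y)^T M (x - y)); M should be positive
    semidefinite. With M = identity this is the Euclidean distance."""
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    m_diff = [sum(M[i][j] * diff[j] for j in range(n)) for i in range(n)]
    return math.sqrt(max(0.0, sum(d * m for d, m in zip(diff, m_diff))))


def knn_predict(X_train, y_train, x, k, M):
    """Majority vote among the k training samples closest to x under d_M."""
    ranked = sorted(zip(X_train, y_train),
                    key=lambda item: mahalanobis_dist(x, item[0], M))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical data: feature 0 separates the classes, feature 1 is noise.
X_train = [[0.0, 0.0], [0.1, 9.0], [1.0, 3.0], [0.9, 6.0]]
y_train = ["A", "A", "B", "B"]
query = [0.05, 2.5]

identity = [[1.0, 0.0], [0.0, 1.0]]
downweight_noise = [[1.0, 0.0], [0.0, 1e-4]]  # stands in for a learned metric

print(knn_predict(X_train, y_train, query, 3, identity))
print(knn_predict(X_train, y_train, query, 3, downweight_noise))
```

    Under the Euclidean metric, the large-scale noise feature dominates the distances and the query is assigned to class B. Once that feature is down-weighted, the query falls with class A, whose first feature it matches. A metric learning algorithm would estimate such a weighting from labeled data rather than fixing it by hand.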

    Evaluation of the availability and applicability of computational approaches in the safety assessment of nanomaterials: Final report of the Nanocomput project

    This is the final report of the Nanocomput project, the main aims of which were to review the current status of computational methods that are potentially useful for predicting the properties of engineered nanomaterials (NMs), and to assess their applicability in order to provide advice on the use of these approaches for the purposes of the REACH regulation. Since computational methods cover a broad range of models and tools, emphasis was placed on Quantitative Structure-Property Relationship (QSPR) and Quantitative Structure-Activity Relationship (QSAR) models and their potential role in predicting NM properties. In addition, the status of a diverse array of compartment-based mathematical models was assessed. These models comprised toxicokinetic (TK), toxicodynamic (TD), in vitro and in vivo dosimetry, and environmental fate models. Finally, based on systematic reviews of the scientific literature and the outputs of EU-funded research projects, recommendations for further research and development were made. The Nanocomput project was carried out by the European Commission's Joint Research Centre (JRC) for the Directorate-General for Internal Market, Industry, Entrepreneurship and SMEs (DG GROW) under the terms of an Administrative Arrangement between JRC and DG GROW. The project lasted 39 months, from January 2014 to March 2017, and was supported by a steering group with representatives from DG GROW, DG Environment and the European Chemicals Agency (ECHA).
    JRC.F.3-Chemicals Safety and Alternative Method

    Molecular Similarity and Xenobiotic Metabolism

    MetaPrint2D, a new software tool implementing a data-mining approach for predicting sites of xenobiotic metabolism, has been developed. The algorithm is based on a statistical analysis of the occurrences of atom-centred circular fingerprints in both substrates and metabolites. This approach has undergone extensive evaluation and has been shown to be of comparable accuracy to current best-in-class tools while making much faster predictions, for the first time enabling chemists to explore the effects of structural modifications on a compound's metabolism in a highly responsive and interactive manner. MetaPrint2D is able to assign a confidence score to the predictions it generates, based on the availability of relevant data and the degree to which a compound is modelled by the algorithm. In the course of the evaluation of MetaPrint2D, a novel metric for assessing the performance of site of metabolism predictions has been introduced. This metric overcomes the bias, inherent in the most commonly reported metrics used to evaluate site of metabolism predictions, that is introduced by molecule size and the number of sites of metabolism. This data-mining approach to site of metabolism prediction has been augmented by a set of reaction type definitions to produce MetaPrint2D-React, enabling prediction of the types of transformations a compound is likely to undergo and the metabolites that are formed. This approach has been evaluated against both historical data and metabolic schemes reported in a number of recently published studies. Results suggest that the ability of this method to predict metabolic transformations is highly dependent on the relevance of the training set data to the query compounds. MetaPrint2D has been released as an open source software library, and both MetaPrint2D and MetaPrint2D-React are available for chemists to use through the Unilever Centre for Molecular Science Informatics website.
    Boehringer-Ingelhie
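    The statistical idea of counting how often an atom environment occurs in substrates versus how often it is a site of metabolism can be sketched as a simple occurrence ratio. The Python code below is a simplified illustration under stated assumptions, not MetaPrint2D's algorithm or API. It assumes atom environments have already been encoded as strings (a real tool would compute atom-centred circular fingerprints with a cheminformatics toolkit). The environment labels, the training data, and the function names are hypothetical. The count returned with each ratio stands in, crudely, for data availability.

```python
from collections import Counter


def train_occurrence_ratios(training):
    """training: molecules, each a list of (atom_environment, is_site) tuples.
    Returns {environment: (ratio, count)} where ratio is the fraction of
    occurrences at which that environment was a site of metabolism."""
    seen = Counter()
    sites = Counter()
    for molecule in training:
        for env, is_site in molecule:
            seen[env] += 1
            if is_site:
                sites[env] += 1
    return {env: (sites[env] / seen[env], seen[env]) for env in seen}


def score_atoms(query_envs, ratios):
    """Score each query atom; (None, 0) marks an environment never seen in
    training, i.e. no data on which to base a prediction."""
    return [ratios.get(env, (None, 0)) for env in query_envs]


# Hypothetical training data with made-up environment labels.
training = [
    [("O-CH3", True), ("c:c", False), ("c:c", False)],
    [("O-CH3", False), ("N-CH3", True), ("c:c", False)],
]
ratios = train_occurrence_ratios(training)
print(score_atoms(["N-CH3", "O-CH3", "S-H"], ratios))
```

    Reporting the occurrence count next to each ratio separates a well-supported score from one based on a single observation. An unseen environment gets no score at all, which mirrors the abstract's point that predictive ability depends on how relevant the training data are to the query.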

    Uncertainty estimation for QSAR models using machine learning methods
