13 research outputs found

    Supplementary_File_13.zip

    No full text
    Supplemental files 13-15 for the "Computer-Aided Discovery of Novel Ebola Virus Inhibitors" manuscript. These files contain all QSAR models from Chembench, HiT QSAR, and GUSAR

    Phantom PAINS: Problems with the Utility of Alerts for Pan‑Assay INterference CompoundS

    The use of substructural alerts to identify Pan-Assay INterference compoundS (PAINS) has become a common component of the triage process in biological screening campaigns. These alerts, however, were originally derived from a proprietary library tested in just six assays measuring protein–protein interaction (PPI) inhibition using the AlphaScreen detection technology only; moreover, 68% (328 out of the 480 alerts) were derived from four or fewer compounds. In an effort to assess the reliability of these alerts as indicators of pan-assay interference, we performed a large-scale analysis of the impact of PAINS alerts on compound promiscuity in bioassays using publicly available data in PubChem. We found that the majority (97%) of all compounds containing PAINS alerts were actually infrequent hitters in AlphaScreen assays measuring PPI inhibition. We also found that the presence of PAINS alerts, contrary to expectations, did not reflect any heightened assay activity trends across all assays in PubChem including AlphaScreen, luciferase, beta-lactamase, or fluorescence-based assays. In addition, 109 PAINS alerts were present in 3570 extensively assayed, but consistently inactive compounds called Dark Chemical Matter. Finally, we observed that 87 small molecule FDA-approved drugs contained PAINS alerts and profiled their bioassay activity. Based on this detailed analysis of PAINS alerts in nonproprietary compound libraries, we caution against the blind use of PAINS filters to detect and triage compounds with possible PAINS liabilities and recommend that such conclusions should be drawn only by conducting orthogonal experiments.

    CHEMINFORMATICS: AN INTRODUCTION

    No full text
    Cheminformatics is an interdisciplinary field between chemistry and informatics, which has evolved considerably since its inception in the 1960s. Initially, the cheminformatics community dealt primarily with practical and technical aspects of chemical structure representation, manipulation, and processing, while modern research explores a new role: the exploration and interpretation of large chemical databases and the discovery of new compounds with desired activity and safety profiles. Despite the recent release of several hallmark reviews addressing methods and application of cheminformatics written in Portuguese, so far there are no scientific articles presenting cheminformatics research to the Brazilian scientific community. To address this gap, we aim to introduce the field of cheminformatics to both students and researchers in a simple and didactic way by narrating important historical facts and contextualizing information within the scope of various applications.

    Does Rational Selection of Training and Test Sets Improve the Outcome of QSAR Modeling?

    No full text
    Prior to using a quantitative structure activity relationship (QSAR) model for external predictions, its predictive power should be established and validated. In the absence of a true external data set, the best way to validate the predictive ability of a model is to perform its statistical external validation. In statistical external validation, the overall data set is divided into training and test sets. Commonly, this splitting is performed using random division. Rational splitting methods can divide data sets into training and test sets in an intelligent fashion. The purpose of this study was to determine whether rational division methods lead to more predictive models compared to random division. A special data splitting procedure was used to facilitate the comparison between random and rational division methods. For each toxicity end point, the overall data set was divided into a modeling set (80% of the overall set) and an external evaluation set (20% of the overall set) using random division. The modeling set was then subdivided into a training set (80% of the modeling set) and a test set (20% of the modeling set) using rational division methods and by using random division. The Kennard-Stone, minimal test set dissimilarity, and sphere exclusion algorithms were used as the rational division methods. The hierarchical clustering, random forest, and k-nearest neighbor (kNN) methods were used to develop QSAR models based on the training sets. For kNN QSAR, multiple training and test sets were generated, and multiple QSAR models were built. The results of this study indicate that models based on rational division methods generate better statistical results for the test sets than models based on random division, but the predictive power of both types of models is comparable.
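The splitting procedure described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' code: it assumes compounds are plain numeric descriptor vectors compared by Euclidean distance, and the function names `kennard_stone` and `nested_split` are hypothetical. Only the Kennard-Stone variant of the rational division step is shown.

```python
import math
import random


def kennard_stone(points, n_train):
    """Return indices of n_train points picked by the Kennard-Stone rule.

    Assumes n_train >= 2 and Euclidean distance between descriptor vectors.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Seed the selection with the two most distant points.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: dist[ij[0]][ij[1]],
    )
    selected = [i0, j0]
    remaining = set(range(n)) - {i0, j0}
    while len(selected) < n_train:
        # Add the candidate whose nearest selected point is farthest away.
        nxt = max(sorted(remaining),
                  key=lambda c: min(dist[c][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected


def nested_split(points, seed=0):
    """Random 80/20 modeling/external split, then Kennard-Stone 80/20
    training/test split inside the modeling set (illustrative only)."""
    rng = random.Random(seed)
    idx = list(range(len(points)))
    rng.shuffle(idx)
    n_ext = round(0.2 * len(idx))
    external, modeling = idx[:n_ext], idx[n_ext:]
    n_train = round(0.8 * len(modeling))
    picked = kennard_stone([points[i] for i in modeling], n_train)
    train = [modeling[k] for k in picked]
    train_set = set(train)
    test = [i for i in modeling if i not in train_set]
    return train, test, external
```

Swapping `kennard_stone` for a random draw inside `nested_split` would give the random-division baseline that the abstract compares against, with the external set left untouched in both cases.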

    Pred-Skin: A Fast and Reliable Web Application to Assess Skin Sensitization Effect of Chemicals

    No full text
    Chemically induced skin sensitization is a complex immunological disease with a profound impact on quality of life and working ability. Despite some progress in developing alternative methods for assessing the skin sensitization potential of chemical substances, there is no in vitro test that correlates well with human data. Computational QSAR models provide a rapid screening approach and contribute valuable information for the assessment of chemical toxicity. We describe the development of a freely accessible web-based and mobile application for the identification of potential skin sensitizers. The application is based on previously developed binary QSAR models of skin sensitization potential from human (109 compounds) and murine local lymph node assay (LLNA, 515 compounds) data with good external correct classification rate (0.70–0.81 and 0.72–0.84, respectively). We also included a multiclass skin sensitization potency model based on LLNA data (accuracy ranging between 0.73 and 0.76). When a user evaluates a compound in the web app, the outputs are (i) binary predictions of human and murine skin sensitization potential; (ii) multiclass prediction of murine skin sensitization; and (iii) probability maps illustrating the predicted contribution of chemical fragments. The app is the first tool available that incorporates quantitative structure–activity relationship (QSAR) models based on human data as well as multiclass models for LLNA. The Pred-Skin web app version 1.0 is freely available for the web, iOS, and Android (in development) at the LabMol web portal (http://labmol.com.br/predskin/), in the Apple Store, and on Google Play, respectively. We will continuously update the app as new skin sensitization data and respective models become available

    Chembench: A Publicly Accessible, Integrated Cheminformatics Portal

    The enormous increase in the amount of publicly available chemical genomics data and the growing emphasis on data sharing and open science mandate that cheminformaticians also make their models publicly available for broad use by the scientific community. Chembench is one of the first publicly accessible, integrated cheminformatics Web portals. It has been extensively used by researchers from different fields for curation, visualization, analysis, and modeling of chemogenomics data. Since its launch in 2008, Chembench has been accessed more than 1 million times by more than 5000 users from a total of 98 countries. We report on the recent updates and improvements that increase the simplicity of use, computational efficiency, accuracy, and accessibility of a broad range of tools and services for computer-assisted drug design and computational toxicology available on Chembench. Chembench remains freely accessible at https://chembench.mml.unc.ed

    Chemotext: A Publicly Available Web Server for Mining Drug–Target–Disease Relationships in PubMed

    No full text
    Elucidation of the mechanistic relationships between drugs, their targets, and diseases is at the core of modern drug discovery research. Thousands of studies relevant to the drug–target–disease (DTD) triangle have been published and annotated in the Medline/PubMed database. Mining this database affords rapid identification of all published studies that confirm connections between vertices of this triangle or enable new inferences of such connections. To this end, we describe the development of Chemotext, a publicly available Web server that mines the entire compendium of published literature in PubMed annotated by Medline Subject Heading (MeSH) terms. The goal of Chemotext is to identify all known DTD relationships and infer missing links between vertices of the DTD triangle. As a proof-of-concept, we show that Chemotext could be instrumental in generating new drug repurposing hypotheses or annotating clinical outcomes pathways for known drugs. The Chemotext Web server is freely available at http://chemotext.mml.unc.edu

    Multi-Descriptor Read Across (MuDRA): A Simple and Transparent Approach for Developing Accurate Quantitative Structure–Activity Relationship Models

    No full text
    Multiple approaches to quantitative structure–activity relationship (QSAR) modeling using various statistical or machine learning techniques and different types of chemical descriptors have been developed over the years. Oftentimes models are used in consensus to make more accurate predictions at the expense of model interpretation. We propose a simple, fast, and reliable method termed Multi-Descriptor Read Across (MuDRA) for developing both accurate and interpretable models. The method is conceptually related to the well-known kNN approach but uses different types of chemical descriptors simultaneously for similarity assessment. To benchmark the new method, we have built MuDRA models for six different end points (Ames mutagenicity, aquatic toxicity, hepatotoxicity, hERG liability, skin sensitization, and endocrine disruption) and compared the results with those generated with conventional consensus QSAR modeling. We find that models built with MuDRA show consistently high external accuracy similar to that of conventional QSAR models. However, MuDRA models excel in terms of transparency, interpretability, and computational efficiency. We posit that due to its methodological simplicity and reliable predictive accuracy, MuDRA provides a powerful alternative to much more complex consensus QSAR modeling. MuDRA is implemented and freely available at the Chembench web portal (https://chembench.mml.unc.edu/mudra).
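The idea of a kNN-style read-across over several descriptor types at once can be illustrated with a small sketch. This is not the published MuDRA implementation: the set-based fingerprints, the Tanimoto similarity, the similarity-weighted averaging, and the name `multi_descriptor_read_across` are all assumptions made for illustration.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of feature identifiers."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def multi_descriptor_read_across(query, training, k=1):
    """Predict an activity from nearest neighbors found per descriptor type.

    query:    dict mapping descriptor name -> set of features (assumed format)
    training: list of (descriptor dict, activity) pairs
    Returns a similarity-weighted mean of neighbor activities, or None when
    no neighbor shares any feature with the query.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for name, q_feats in query.items():
        # Rank training compounds in this descriptor space only.
        scored = sorted(
            ((tanimoto(q_feats, desc[name]), act) for desc, act in training),
            key=lambda pair: pair[0],
            reverse=True,
        )
        # Pool the k closest neighbors from every descriptor space.
        for sim, act in scored[:k]:
            weighted_sum += sim * act
            weight_total += sim
    return weighted_sum / weight_total if weight_total else None
```

Because each prediction traces back to a handful of named neighbors per descriptor type, a user can inspect which training compounds drove the result, which is the kind of transparency the abstract emphasizes.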

    QSAR Modeling and Prediction of Drug–Drug Interactions

    No full text
    Severe adverse drug reactions (ADRs) are the fourth leading cause of fatality in the U.S. with more than 100 000 deaths per year. As up to 30% of all ADRs are believed to be caused by drug–drug interactions (DDIs), typically mediated by cytochrome P450s, possibilities to predict DDIs from existing knowledge are important. We collected data from public sources on 1485, 2628, 4371, and 27 966 possible DDIs mediated by four cytochrome P450 isoforms 1A2, 2C9, 2D6, and 3A4 for 55, 73, 94, and 237 drugs, respectively. For each of these data sets, we developed and validated QSAR models for the prediction of DDIs. As a unique feature of our approach, the interacting drug pairs were represented as binary chemical mixtures in a 1:1 ratio. We used two types of chemical descriptors: quantitative neighborhoods of atoms (QNA) and simplex descriptors. Radial basis functions with self-consistent regression (RBF-SCR) and random forest (RF) were utilized to build QSAR models predicting the likelihood of DDIs for any pair of drug molecules. Our models showed balanced accuracy of 72–79% for the external test sets with a coverage of 81.36–100% when a conservative threshold for the model’s applicability domain was applied. We generated virtually all possible binary combinations of marketed drugs and employed our models to identify drug pairs predicted to be instances of DDI. More than 4500 of these predicted DDIs that were not found in our training sets were confirmed by data from the DrugBank database
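The two figures reported for the external test sets, balanced accuracy and coverage, can be computed together as in the sketch below. It assumes binary labels (1 = interaction, 0 = no interaction) and a per-prediction flag saying whether the drug pair falls inside the model's applicability domain; the function name and the input layout are hypothetical and not taken from the paper.

```python
def balanced_accuracy_with_coverage(y_true, y_pred, in_domain):
    """Return (balanced accuracy, coverage) for binary predictions.

    Only predictions flagged as inside the applicability domain are scored;
    coverage is the fraction of all test items that were scored.
    """
    kept = [(t, p) for t, p, d in zip(y_true, y_pred, in_domain) if d]
    coverage = len(kept) / len(y_true)
    tp = sum(1 for t, p in kept if t == 1 and p == 1)
    fn = sum(1 for t, p in kept if t == 1 and p == 0)
    tn = sum(1 for t, p in kept if t == 0 and p == 0)
    fp = sum(1 for t, p in kept if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present inside the domain")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Balanced accuracy is the mean of sensitivity and specificity.
    return (sensitivity + specificity) / 2, coverage
```

A stricter applicability domain threshold typically lowers coverage, so the two numbers are best read together.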