572 research outputs found

    Quantitative assessment of the expanding complementarity between public and commercial databases of bioactive compounds

    Background: Since 2004 public cheminformatic databases and their collective functionality for exploring relationships between compounds, protein sequences, literature and assay data have advanced dramatically. In parallel, commercial sources that extract and curate such relationships from journals and patents have also been expanding. This work updates a previous comparative study of databases chosen because of their bioactive content, availability of downloads and facility to select informative subsets. Results: Where they could be calculated, extracted compounds-per-journal article were in the range of 12 to 19 but compound-per-protein counts increased with document numbers. Chemical structure filtration to facilitate standardised comparisons typically reduced source counts by between 5% and 30%. The pair-wise overlaps between 23 databases and subsets were determined, as well as changes between 2006 and 2008. While all compound sets have increased, PubChem has doubled to 14.2 million. The 2008 comparison matrix shows not only overlap but also unique content across all sources. Many of the detailed differences could be attributed to individual strategies for data selection and extraction. While there was a big increase in patent-derived structures entering PubChem since 2006, GVKBIO contains over 0.8 million unique structures from this source. Venn diagrams showed extensive overlap between compounds extracted by independent expert curation from journals by GVKBIO, WOMBAT (both commercial) and BindingDB (public) but each included unique content. In contrast, the approved drug collections from GVKBIO, MDDR (commercial) and DrugBank (public) showed surprisingly low overlap. Aggregating all commercial sources established that while 1 million compounds overlapped with PubChem, 1.2 million did not. Conclusion: On the basis of chemical structure content per se, public sources have covered an increasing proportion of commercial databases over the last two years. However, commercial products included in this study provide links between compounds and information from patents and journals at a larger scale than current public efforts. They also continue to capture a significant proportion of unique content. Our results thus demonstrate not only an encouraging overall expansion of data-supported bioactive chemical space but also that both commercial and public sources are complementary for its exploration.

    Analysis of in vitro bioactivity data extracted from drug discovery literature and patents: Ranking 1654 human protein targets by assayed compounds and molecular scaffolds

    Background: Since the classic Hopkins and Groom druggable genome review in 2002, there have been a number of publications updating both the hypothetical and successful human drug target statistics. However, listings of research targets that define the area between these two extremes are sparse because of the challenges of collating published information at the necessary scale. We have addressed this by interrogating databases, populated by expert curation, of bioactivity data extracted from patents and journal papers over the last 30 years. Results: From a subset of just over 27,000 documents we have extracted a set of compound-to-target relationships for biochemical in vitro binding-type assay data for 1,736 human proteins and 1,654 gene identifiers. These are linked to 1,671,951 compound records derived from 823,179 unique chemical structures. The distribution showed a compounds-per-target average of 964 with a maximum of 42,869 (Factor Xa). The list includes non-targets, failed targets and cross-screening targets. The top-278 most actively pursued targets cover 90% of the compounds. We further investigated target ranking by determining the number of molecular frameworks and scaffolds. These were compared to the compound counts as alternative measures of chemical diversity on a per-target basis. Conclusions: The compounds-per-protein listing generated in this work (provided as a supplementary file) represents the major proportion of the human drug target landscape defined by published data. We supplemented the simple ranking by the number of compounds assayed with additional rankings by molecular topology. These showed significant differences and provide complementary assessments of chemical tractability.

    Examples of SAR-centric patent mining using open resources


    The IUPHAR/BPS Guide to PHARMACOLOGY in 2016: towards curated quantitative interactions between 1300 protein targets and 6000 ligands

    The IUPHAR/BPS Guide to PHARMACOLOGY (GtoPdb, http://www.guidetopharmacology.org) provides expert-curated molecular interactions between successful and potential drugs and their targets in the human genome. Developed by the International Union of Basic and Clinical Pharmacology (IUPHAR) and the British Pharmacological Society (BPS), this resource, and its earlier incarnation as IUPHAR-DB, is described in our 2014 publication. This update incorporates changes over the intervening seven database releases. The unique model of content capture is based on established and new target class subcommittees collaborating with in-house curators. Most information comes from journal articles, but we now also index kinase cross-screening panels. Targets are specified by UniProtKB IDs. Small molecules are defined by PubChem Compound Identifiers (CIDs); ligand capture also includes peptides and clinical antibodies. We have extended the capture of ligands and targets linked via published quantitative binding data (e.g. Ki, IC50 or Kd). The resulting pharmacological relationship network now defines a data-supported druggable genome encompassing 7% of human proteins. The database also provides an expanded substrate for the biennially published compendium, the Concise Guide to PHARMACOLOGY. This article covers content increase, entity analysis, revised curation strategies, new website features and expanded download options

    Chemoinformatic Expedition of the Chemical Space of Fungal Products

    Aim: Fungi are valuable resources for bioactive secondary metabolites. However, the chemical space of fungal secondary metabolites has been studied only on a limited basis. Herein, we report a comprehensive chemoinformatic analysis of a unique set of 207 fungal metabolites isolated and characterized in a USA National Cancer Institute funded drug discovery project. Results: Comparison of the molecular complexity of the 207 fungal metabolites with approved anticancer and nonanticancer drugs, compounds in clinical studies, general screening compounds and molecules Generally Recognized as Safe revealed that fungal metabolites have a high degree of complexity. Molecular fingerprints showed that fungal metabolites are as structurally diverse as other natural products and have, in general, drug-like physicochemical properties. Conclusion: Fungal products represent promising candidates to expand the medicinally relevant chemical space. This work is a significant expansion of an analysis reported years ago for a smaller set of compounds (less than half of the ones included in the present work) from filamentous fungi using different structural properties.

    VB-MK-LMF: Fusion of drugs, targets and interactions using Variational Bayesian Multiple Kernel Logistic Matrix Factorization

    Background: Computational fusion approaches to drug-target interaction (DTI) prediction, capable of utilizing multiple sources of background knowledge, were reported to achieve superior predictive performance in multiple studies. Other studies showed that specificities of the DTI task, such as weighting the observations and focusing the side information, are also vital for reaching top performance. Method: We present Variational Bayesian Multiple Kernel Logistic Matrix Factorization (VB-MK-LMF), which unifies the advantages of (1) multiple kernel learning, (2) weighted observations, (3) graph Laplacian regularization, and (4) explicit modeling of probabilities of binary drug-target interactions. Results: VB-MK-LMF achieves significantly better predictive performance in standard benchmarks compared to state-of-the-art methods, which can be traced back to multiple factors. The systematic evaluation of the effect of multiple kernels confirms their benefits, but also highlights the limitations of linear kernel combinations, already recognized in other fields. The analysis of the effect of prior kernels using varying sample sizes sheds light on the balance of data and knowledge in DTI tasks and on the rate at which the effect of priors vanishes. This also shows the existence of "small sample size" regions where using side information offers significant gains. Alongside favorable predictive performance, a notable property of MF methods is that they provide a unified space for drugs and targets using latent representations. Compared to earlier studies, the dimensionality of this space proved to be surprisingly low, which makes the latent representations constructed by VB-MK-LMF especially well-suited for visual analytics. The probabilistic nature of the predictions allows the calculation of the expected values of hits in functionally relevant sets, which we demonstrate by predicting drug promiscuity. The variational Bayesian approximation is also implemented for general purpose graphics processing units, yielding significantly improved computational time. Conclusion: In standard benchmarks, VB-MK-LMF shows significantly improved predictive performance in a wide range of settings. Beyond these benchmarks, another contribution of our work is highlighting and providing estimates for further pharmaceutically relevant quantities, such as promiscuity, druggability and total number of interactions. Availability: Data and code are available at http://bioinformatics.mit.bme.hu

    Field-based Proteochemometric Models Derived from 3D Protein Structures : A Novel Approach to Visualize Affinity and Selectivity Features

    Designing drugs that are selective is crucial in pharmaceutical research to avoid unwanted side effects. To decipher selectivity of drug targets, computational approaches that utilize the sequence and structural information of the protein binding pockets are frequently exploited. In addition to methods that rely only on protein information, quantitative approaches such as proteochemometrics (PCM) use the combination of protein and ligand descriptions to derive quantitative relationships with binding affinity. PCM aims to explain cross-interactions between the different proteins and ligands, hence facilitating our understanding of selectivity. The main goal of this dissertation is to develop and apply field-based PCM to improve the understanding of relevant molecular interactions through visual illustrations. Field-based description that depends on the 3D structural information of proteins enhances visual interpretability of PCM models relative to the frequently used sequence-based descriptors for proteins. In these field-based PCM studies, knowledge-based fields that explain polarity and lipophilicity of the binding pockets and WaterMap-derived fields that elucidate the positions and energetics of water molecules are used together with the various 2D / 3D ligand descriptors to investigate the selectivity profiles of kinases and serine proteases. Field-based PCM is first applied to protein kinases, for which designing selective inhibitors has always been a challenge, owing to their highly similar ATP binding pockets. Our studies show that the method could be successfully applied to pinpoint the regions influencing the binding affinity and selectivity of kinases. As an extension of the initial studies conducted on a set of 50 kinases and 80 inhibitors, field-based PCM was used to build classification models on a large dataset (95 kinases and 1572 inhibitors) to distinguish active from inactive ligands. 
The prediction of the bioactivities of external test set compounds or kinases with accuracies over 80% (Matthews correlation coefficient, MCC: ~0.50) and area under the ROC curve (AUC) above 0.8 together with the visual inspection of the regions promoting activity demonstrates the ability of field-based PCM to generate both predictive and visually interpretable models. Further, the application of this method to serine proteases provides an overview of the sub-pocket specificities, which is crucial for inhibitor design. Additionally, alignment-independent Zernike descriptors derived from fields were used in PCM models to study the influence of protein superimpositions on field comparisons and subsequent PCM modelling.

    QSAR-Based Virtual Screening: Advances and Applications in Drug Discovery

    Virtual screening (VS) has emerged in drug discovery as a powerful computational approach to screen large libraries of small molecules for new hits with desired properties that can then be tested experimentally. As with other computational approaches, the intention of VS is not to replace in vitro or in vivo assays, but to speed up the discovery process, to reduce the number of candidates to be tested experimentally, and to rationalize their choice. Moreover, VS has become very popular in pharmaceutical companies and academic organizations because it saves time, cost, resources, and labor. Among the VS approaches, quantitative structure–activity relationship (QSAR) analysis is the most powerful method due to its high and fast throughput and good hit rate. As the first step of QSAR model development, relevant chemogenomics data are collected from databases and the literature. Then, chemical descriptors are calculated on different levels of representation of molecular structure, ranging from 1D to nD, and correlated with the biological property using machine learning techniques. Once developed and validated, QSAR models are applied to predict the biological property of novel compounds. Although the experimental testing of computational hits is not an inherent part of QSAR methodology, it is highly desired and should be performed as an ultimate validation of developed models. In this mini-review, we summarize and critically analyze the recent trends of QSAR-based VS in drug discovery and demonstrate successful applications in identifying prospective compounds with desired properties. Moreover, we provide some recommendations about the best practices for QSAR-based VS along with the future perspectives of this approach.

    The re-emergence of natural products for drug discovery in the genomics era

    Natural products have been a rich source of compounds for drug discovery. However, their use has diminished in the past two decades, in part because of technical barriers to screening natural products in high-throughput assays against molecular targets. Here, we review strategies for natural product screening that harness the recent technical advances that have reduced these barriers. We also assess the use of genomic and metabolomic approaches to augment traditional methods of studying natural products, and highlight recent examples of natural products in antimicrobial drug discovery and as inhibitors of protein-protein interactions. The growing appreciation of functional assays and phenotypic screens may further contribute to a revival of interest in natural products for drug discovery

    Objective, Quantitative, Data-Driven Assessment of Chemical Probes.

    Chemical probes are essential tools for understanding biological systems and for target validation, yet selecting probes for biomedical research is rarely based on objective assessment of all potential compounds. Here, we describe the Probe Miner: Chemical Probes Objective Assessment resource, capitalizing on the plethora of public medicinal chemistry data to empower quantitative, objective, data-driven evaluation of chemical probes. We assess >1.8 million compounds for their suitability as chemical tools against 2,220 human targets and dissect the biases and limitations encountered. Probe Miner represents a valuable resource to aid the identification of potential chemical probes, particularly when used alongside expert curation