614 research outputs found

    A survey of across-target bioactivity results of small molecules in PubChem

    This work provides an analysis of across-target bioactivity results in the screening data deposited in PubChem. Two alternative approaches for grouping related targets are used to examine a compound's across-target bioactivity. This analysis identifies compounds that are selectively active against groups of protein targets that are identical or similar in sequence. This analysis also identifies compounds that are bioactive across unrelated targets. Statistical distributions of compounds' across-target selectivity provide a survey to evaluate the target specificity of compounds by deriving and analyzing bioactivity profiles across a wide range of biological targets for tested small molecules in PubChem. This work enables one to select target-specific inhibitors, identify promiscuous compounds, and better understand the biological mechanisms of target-small molecule interactions.

    Probabilistic Random Forest improves bioactivity predictions close to the classification threshold by taking into account experimental uncertainty.

    Measurements of protein-ligand interactions have reproducibility limits due to experimental errors. Any model based on such assays will consequently have such unavoidable errors influencing its performance, which should ideally be factored into modelling and output predictions, such as the actual standard deviation of experimental measurements (σ) or the associated comparability of activity values between the aggregated heterogeneous activity units (i.e., Ki versus IC50 values) during dataset assimilation. However, experimental errors are usually a neglected aspect of model generation. In order to improve upon the current state-of-the-art, we herein present a novel approach toward predicting protein-ligand interactions using a Probabilistic Random Forest (PRF) classifier. The PRF algorithm was applied toward in silico protein target prediction across ~550 tasks from ChEMBL and PubChem. Predictions were evaluated by taking into account various scenarios of experimental standard deviations in both training and test sets, and performance was assessed using fivefold stratified shuffled splits for validation. The largest benefit of incorporating the experimental deviation in PRF was observed for data points close to the binary threshold boundary, where such information was not considered in any way in the original RF algorithm. For example, in cases when σ ranged between 0.4 and 0.6 log units and when ideal probability estimates were between 0.4 and 0.6, the PRF outperformed RF with a median absolute error margin of ~17%. In comparison, the baseline RF outperformed PRF for cases with high confidence to belong to the active class (far from the binary decision threshold), although the RF models gave errors smaller than the experimental uncertainty, which could indicate that they were overtrained and/or over-confident.
    Finally, the PRF models trained with putative inactives showed decreased performance compared to PRF models without putative inactives; this could be because putative inactives were not assigned an experimental pXC50 value and were therefore considered inactives with a low uncertainty (which in practice might not be true). In conclusion, PRF can be useful for target prediction models, in particular for data where class boundaries overlap with the measurement uncertainty and where a substantial part of the training data is located close to the classification threshold.
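
    The abstract does not spell out how an "ideal probability estimate" is obtained from a measurement and its σ. One plausible reading, shown here purely as an assumption, is to treat the measured pXC50 as normally distributed around the true value and take the probability mass above the activity threshold. The function name, the threshold of 5.0 and σ = 0.5 are illustrative choices, not values from the paper.

```python
import math


def activity_probability(pxc50: float, threshold: float, sigma: float) -> float:
    """Probability that the true activity lies at or above `threshold`.

    Assumption (not taken from the abstract): measurement error is
    Gaussian with standard deviation `sigma`, so the answer is the
    normal CDF evaluated at (pxc50 - threshold) / sigma.
    """
    if sigma <= 0:
        # No uncertainty: fall back to a hard binary label.
        return 1.0 if pxc50 >= threshold else 0.0
    z = (pxc50 - threshold) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


if __name__ == "__main__":
    # Values near the threshold get soft labels close to 0.5;
    # values far from it approach the hard labels 0 or 1.
    for value in (4.0, 4.9, 5.0, 5.1, 6.0):
        p = activity_probability(value, threshold=5.0, sigma=0.5)
        print(f"pXC50={value:.1f} -> P(active)={p:.3f}")
```

    Under this assumption, compounds measured within about one σ of the threshold receive labels well away from 0 and 1, which matches the region where the abstract reports the largest benefit for PRF.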

    BindingDB in 2015: A public database for medicinal chemistry, computational chemistry and systems pharmacology.

    BindingDB, www.bindingdb.org, is a publicly accessible database of experimental protein-small molecule interaction data. Its collection of over a million data entries derives primarily from scientific articles and, increasingly, US patents. BindingDB provides many ways to browse and search for data of interest, including an advanced search tool, which can cross searches of multiple query types, including text, chemical structure, protein sequence and numerical affinities. The PDB and PubMed provide links to data in BindingDB, and vice versa; and BindingDB provides links to pathway information, the ZINC catalog of available compounds, and other resources. The BindingDB website offers specialized tools that take advantage of its large data collection, including ones to generate hypotheses for the protein targets bound by a bioactive compound, and for the compounds bound by a new protein of known sequence; and virtual compound screening by maximal chemical similarity, binary kernel discrimination, and support vector machine methods. Specialized data sets are also available, such as binding data for hundreds of congeneric series of ligands, drawn from BindingDB and organized for use in validating drug design methods. BindingDB offers several forms of programmatic access, and comes with extensive background material and documentation. Here, we provide the first update of BindingDB since 2007, focusing on new and unique features and highlighting directions of importance to the field as a whole

    Artificial intelligence, machine learning, and drug repurposing in cancer

    Introduction: Drug repurposing provides a cost-effective strategy to re-use approved drugs for new medical indications. Several machine learning (ML) and artificial intelligence (AI) approaches have been developed for systematic identification of drug repurposing leads based on big data resources, hence further accelerating and de-risking the drug development process by computational means. Areas covered: The authors focus on supervised ML and AI methods that make use of publicly available databases and information resources. While most of the example applications are in the field of anticancer drug therapies, the methods and resources reviewed are widely applicable also to other indications including COVID-19 treatment. A particular emphasis is placed on the use of comprehensive target activity profiles that enable a systematic repurposing process by extending the target profile of drugs to include potent off-targets with therapeutic potential for a new indication. Expert opinion: The scarcity of clinical patient data and the current focus on genetic aberrations as primary drug targets may limit the performance of anticancer drug repurposing approaches that rely solely on genomics-based information. Functional testing of cancer patient cells exposed to a large number of targeted therapies and their combinations provides an additional source of repurposing information for tissue-aware AI approaches. Peer reviewed.

    C-SPADE : a web-tool for interactive analysis and visualization of drug screening experiments through compound-specific bioactivity dendrograms

    The advent of the polypharmacology paradigm in drug discovery calls for novel chemoinformatic tools for analyzing compounds' multi-targeting activities. Such tools should provide an intuitive representation of the chemical space through capturing and visualizing underlying patterns of compound similarities linked to their polypharmacological effects. Most of the existing compound-centric chemoinformatics tools lack interactive options and user interfaces that are critical for the real-time needs of chemical biologists carrying out compound screening experiments. Toward that end, we introduce C-SPADE, an open-source exploratory web-tool for interactive analysis and visualization of drug profiling assays (biochemical, cell-based or cell-free) using compound-centric similarity clustering. C-SPADE allows the users to visually map the chemical diversity of a screening panel, explore investigational compounds in terms of their similarity to the screening panel, perform polypharmacological analyses and guide drug-target interaction predictions. C-SPADE requires only the raw drug profiling data as input, and it automatically retrieves the structural information and constructs the compound clusters in real-time, thereby reducing the time required for manual analysis in drug development or repurposing applications. The web-tool provides a customizable visual workspace that can either be downloaded as a figure or Newick tree file or shared as a hyperlink with other users. C-SPADE is freely available at http://cspade.fimm.fi/. Peer reviewed.

    NETWORK INFERENCE DRIVEN DRUG DISCOVERY

    The application of rational drug design principles in the era of network pharmacology requires the investigation of drug-target and target-target interactions in order to design new drugs. The presented research was aimed at developing novel computational methods that enable the efficient analysis of complex biomedical data and promote hypothesis generation in the context of translational research. The three chapters of the Dissertation relate to various segments of the drug discovery and development process. The first chapter introduces the integrated predictive drug discovery platform “SmartGraph”. The novel collaborative-filtering based algorithm “Target Based Recommender (TBR)” was developed in the framework of this project and was validated on a set of 28,270 experimentally determined bioactivity data points involving 1,882 compounds and 869 targets. The TBR is integrated into the SmartGraph platform. The graphical interface of SmartGraph enables data analysis and hypothesis generation even for investigators without substantial bioinformatics knowledge. The platform can be utilized in the context of target identification, drug-target prediction and drug repurposing. The second chapter of the Dissertation introduces an information theory inspired dynamic network model and the novel “Luminosity Diffusion (LD)” algorithm. The model can be utilized to prioritize protein targets for drug discovery purposes on the basis of available information and the importance of the targets. The importance of targets is accounted for in the information flow simulation process and is derived merely from network topology. The LD algorithm was validated on 8,010 relations of 794 proteins extracted from the Target Central Resource Database developed in the framework of the “Illuminating the Druggable Genome” project. The last chapter discusses a fundamental problem pertaining to the generation of similarity networks of molecules and their clustering. The network generation process relies on the selection of a similarity threshold. The presented work introduces a network topology based systematic solution for selecting this threshold so that the likelihood of a reasonable clustering can be increased. Furthermore, the work proposes a solution for generating so-called “pseudo-reference clustering” for large molecular data sets for performance evaluation purposes. The results of this chapter are applicable in the lead identification and development processes.
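
    To illustrate why the similarity threshold matters for network generation, the sketch below builds a thresholded similarity network over toy fingerprints and counts its connected components at several thresholds. It is not the dissertation's selection method, which the abstract does not describe; the Tanimoto measure, the set-based fingerprints and all names are assumptions chosen for illustration.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def component_count(fingerprints: list, threshold: float) -> int:
    """Connected components of the network whose edges join pairs
    with similarity >= threshold (tracked with a small union-find)."""
    parent = list(range(len(fingerprints)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(fingerprints)), 2):
        if tanimoto(fingerprints[i], fingerprints[j]) >= threshold:
            parent[find(i)] = find(j)
    return len({find(i) for i in range(len(fingerprints))})


# Hypothetical toy fingerprints, not real molecules.
TOY = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 5}),
    frozenset({1, 2, 6, 7}),
    frozenset({8, 9, 10}),
    frozenset({8, 9, 11}),
]

if __name__ == "__main__":
    for t in (0.3, 0.5, 0.9):
        print(f"threshold={t}: {component_count(TOY, t)} components")
```

    A low threshold merges everything loosely related into a few large components, while a high one leaves mostly singletons; a topology-based criterion such as the one the abstract mentions would have to pick a value between these extremes.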

    STITCH 2: an interaction network database for small molecules and proteins

    Over the last years, the publicly available knowledge on interactions between small molecules and proteins has been steadily increasing. To create a network of interactions, STITCH aims to integrate the data dispersed over the literature and various databases of biological pathways, drug–target relationships and binding affinities. In STITCH 2, the number of relevant interactions is increased by incorporation of BindingDB, PharmGKB and the Comparative Toxicogenomics Database. The resulting network can be explored interactively or used as the basis for large-scale analyses. To facilitate links to other chemical databases, we adopt InChIKeys that allow identification of chemicals with a short, checksum-like string. STITCH 2.0 connects proteins from 630 organisms to over 74 000 different chemicals, including 2200 drugs. STITCH can be accessed at http://stitch.embl.de/
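
    As a small illustration of the "short, checksum-like string" mentioned above, the sketch below checks whether a string has the layout of a standard InChIKey: 14 uppercase letters, a hyphen, 10 uppercase letters, a hyphen, and one final letter. It only tests the shape and cannot confirm that the key belongs to any real compound. How STITCH itself handles the keys is not described in the abstract, so the helper names and the linking-by-first-block idea are assumptions.

```python
import re

# Layout of a standard InChIKey: 14 letters, 10 letters, 1 letter,
# separated by hyphens (27 characters in total).
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def looks_like_inchikey(text: str) -> bool:
    """True if `text` has the InChIKey layout (a format check only)."""
    return INCHIKEY_PATTERN.fullmatch(text) is not None


def connectivity_block(key: str) -> str:
    """First 14-letter block, which hashes the molecular skeleton.

    Grouping records by this block is one hypothetical way to link
    entries that differ only in stereochemistry or isotopes.
    """
    if not looks_like_inchikey(key):
        raise ValueError(f"not an InChIKey-shaped string: {key!r}")
    return key.split("-")[0]


if __name__ == "__main__":
    example = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"  # a string in InChIKey layout
    print(looks_like_inchikey(example))
    print(connectivity_block(example))
```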

    Challenges of connecting chemistry to pharmacology: perspectives from curating the IUPHAR/BPS Guide to PHARMACOLOGY

    Connecting chemistry to pharmacology (c2p) has been an objective of GtoPdb and its precursor IUPHAR-DB since 2003. This has been achieved by populating our database with expert-curated relationships between documents, assays, quantitative results, chemical structures, their locations within the documents and the protein targets in the assays (D-A-R-C-P). A wide range of challenges associated with this are described in this perspective, using illustrative examples from GtoPdb entries. Our selection process begins with judgements of pharmacological relevance and scientific quality. Even though we have a stringent focus for our small-data extraction, we note that assessing the quality of papers has become more difficult over the last 15 years. We discuss ambiguity issues with the resolution of authors’ descriptions of A-R-C-P entities to standardised identifiers. We also describe developments that have made this somewhat easier over the same period, both in the publication ecosystem and in enhancements of our internal processes over recent years. This perspective concludes with a look at challenges for the future, including the wider capture of mechanistic nuances and possible impacts of text mining on automated entity extraction.