116 research outputs found

    Affinity chromatography in dynamic combinatorial libraries: one-pot amplification and isolation of a strongly binding receptor

    We report the one-pot amplification and isolation of a nanomolar receptor in a multibuilding block aqueous dynamic combinatorial library (DCL) using a polymer-bound template. By appropriate choice of a poly(N,N-dimethylacrylamide)-based support, unselective ion-exchange type behaviour between the cationic guest and the oppositely charged polyanionic hosts was overcome, such that the selective molecular recognition arising in aqueous solution reactions is also manifest in the analogous templated solid-phase DCL syntheses. The ability of a polymer-bound template to identify and isolate a synthetic receptor via dynamic combinatorial chemistry was not compromised by the large size of the library, which consists of well over 140 theoretical members, demonstrating the practical advantages of a polymer-supported DCL methodology.

    Discovering patterns in drug-protein interactions based on their fingerprints

    Background: Discovering interesting patterns in drug-protein interaction data at the molecular level can reveal hidden relationships among drugs and proteins and can therefore be of paramount importance for applications such as drug design. To discover such patterns, we propose a computational approach to analyze the molecular data of drugs and proteins that are known to interact with each other. Specifically, we propose a data mining technique called Drug-Protein Interaction Analysis (D-PIA) to determine whether there are any commonalities in the fingerprints of the substructures of interacting drug and protein molecules and, if so, whether any patterns can be generalized from them. Method: Given a database of drug-protein interactions, D-PIA performs its tasks in several steps. First, for each drug in the database, the fingerprints of its molecular substructures are obtained. Second, for each protein in the database, the fingerprints of its protein domains are obtained. Third, based on known interactions between drugs and proteins, an interdependency measure between the fingerprint of each drug substructure and each protein domain is computed. Fourth, based on the interdependency measure, drug substructures and protein domains that are significantly interdependent are identified. Fifth, the existence of an interaction between a previously unknown drug-protein pair is predicted based on their constituent substructures that are significantly interdependent. Results: To evaluate the effectiveness of D-PIA, we tested it with real drug-protein interaction data including enzymes, ion channels, and protein-coupled receptors. Experimental results show that there are indeed patterns to be discovered in the interdependency relationships between drug substructures and protein domains of interacting drugs and proteins. Based on these relationships, a testing set of drug-protein data was used to see whether D-PIA can correctly predict the existence of interactions between drug-protein pairs. The results show that prediction accuracy can be high: the area under the ROC curve (AUC) reaches as high as 75%, which the authors take as evidence of the effectiveness of the classifier. Conclusions: D-PIA performs its tasks based on the fingerprints of drug and protein molecules without requiring any 3D information about their structures, and is therefore very fast to compute. The experimental results on real drug-protein interaction data show that it can be useful for predicting previously unknown drug-protein as well as protein-ligand interactions. It can also be used to tackle problems such as ligand specificity, which is related directly and indirectly to drug design and discovery.
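The five D-PIA steps listed in the abstract above can be illustrated with a minimal sketch. The abstract does not define its interdependency measure, so the code below substitutes pointwise mutual information (PMI) as a stand-in; the function names, the toy substructure and domain labels, and the "sum of positive PMI" scoring rule are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter
from itertools import product
from math import log2


def fit_pmi(interactions, drug_subs, protein_doms):
    """Estimate PMI for each (substructure, domain) pair over known interactions.

    PMI is an assumed stand-in for the paper's unspecified interdependency measure.
    """
    n = len(interactions)
    sub_count, dom_count, pair_count = Counter(), Counter(), Counter()
    for drug, protein in interactions:
        subs, doms = drug_subs[drug], protein_doms[protein]
        sub_count.update(subs)                   # interactions containing each substructure
        dom_count.update(doms)                   # interactions containing each domain
        pair_count.update(product(subs, doms))   # co-occurrences within one interaction
    return {
        (s, d): log2(c * n / (sub_count[s] * dom_count[d]))
        for (s, d), c in pair_count.items()
    }


def score_pair(pmi, subs, doms):
    """Score an unseen drug-protein pair by summing its positive PMI values."""
    return sum(max(pmi.get((s, d), 0.0), 0.0) for s, d in product(subs, doms))


# Hypothetical toy data: sets of substructure and domain labels.
drug_subs = {"drugA": {"ring", "amine"}, "drugB": {"ring"}, "drugC": {"sulfonyl"}}
protein_doms = {"prot1": {"kinase"}, "prot2": {"kinase", "SH2"}, "prot3": {"channel"}}
interactions = [("drugA", "prot1"), ("drugB", "prot2"), ("drugC", "prot3")]

pmi = fit_pmi(interactions, drug_subs, protein_doms)
print(score_pair(pmi, {"sulfonyl"}, {"channel"}))  # pair seen together in training
print(score_pair(pmi, {"sulfonyl"}, {"kinase"}))   # pair never seen together
```

A real pipeline would also need a significance test for step four and a threshold on the score for step five; both are left out here because the abstract does not specify them.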

    Learning with multiple pairwise kernels for drug bioactivity prediction

    Motivation: Many inference problems in bioinformatics, including drug bioactivity prediction, can be formulated as pairwise learning problems, in which one is interested in making predictions for pairs of objects, e.g. drugs and their targets. Kernel-based approaches have emerged as powerful tools for solving problems of that kind, and multiple kernel learning (MKL) in particular offers promising benefits, as it enables integrating various types of complex biomedical information sources in the form of kernels, along with learning their importance for the prediction task. However, the immense size of pairwise kernel spaces remains a major bottleneck, making the existing MKL algorithms computationally infeasible even for a small number of input pairs. Results: We introduce pairwiseMKL, the first method for time- and memory-efficient learning with multiple pairwise kernels. pairwiseMKL first determines the mixture weights of the input pairwise kernels, and then learns the pairwise prediction function. Both steps are performed efficiently without explicit computation of the massive pairwise matrices, therefore making the method applicable to solving large pairwise learning problems. We demonstrate the performance of pairwiseMKL in two related tasks of quantitative drug bioactivity prediction using up to 167 995 bioactivity measurements and 3120 pairwise kernels: (i) prediction of anticancer efficacy of drug compounds across a large panel of cancer cell lines; and (ii) prediction of target profiles of anticancer compounds across their kinome-wide target spaces. We show that pairwiseMKL provides accurate predictions using sparse solutions in terms of selected kernels, and therefore it also automatically identifies data sources relevant for the prediction problem.
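To make concrete why the explicit pairwise matrix can be avoided, here is a small sketch of a standard Kronecker-product identity: if the pairwise kernel is the Kronecker product of a drug kernel and a target kernel, multiplying it by a coefficient vector equals two small matrix products. This is a generic linear-algebra fact, not a reproduction of the pairwiseMKL algorithm, and the function names and toy kernel values are assumptions.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def kron_matvec(k_drug, k_target, coeffs):
    """Apply kron(k_drug, k_target) to coeffs without building the big matrix.

    coeffs is an n_drug x n_target matrix (the row-major reshaping of the
    coefficient vector); the result has the same shape.
    """
    return matmul(matmul(k_drug, coeffs), transpose(k_target))


def kron_matvec_explicit(k_drug, k_target, coeffs):
    """Reference version that sums over every entry of the pairwise kernel."""
    n_d, n_t = len(k_drug), len(k_target)
    return [
        [
            sum(
                k_drug[i][k] * k_target[j][l] * coeffs[k][l]
                for k in range(n_d)
                for l in range(n_t)
            )
            for j in range(n_t)
        ]
        for i in range(n_d)
    ]


# Hypothetical toy kernels: 2 drugs, 3 targets.
k_drug = [[1.0, 0.5], [0.5, 1.0]]
k_target = [[1.0, 0.2, 0.0], [0.2, 1.0, 0.3], [0.0, 0.3, 1.0]]
coeffs = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]

fast = kron_matvec(k_drug, k_target, coeffs)
slow = kron_matvec_explicit(k_drug, k_target, coeffs)
print(fast)
print(slow)
```

The shortcut stores only the two small kernels, whereas the explicit pairwise matrix would have (n_drug * n_target) squared entries.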

    A comparison of machine learning algorithms for chemical toxicity classification using a simulated multi-scale data model

    Background: Bioactivity profiling using high-throughput in vitro assays can reduce the cost and time required for toxicological screening of environmental chemicals and can also reduce the need for animal testing. Several public efforts are aimed at discovering patterns or classifiers in high-dimensional bioactivity space that predict tissue, organ or whole animal toxicological endpoints. Supervised machine learning is a powerful approach to discover combinatorial relationships in complex in vitro/in vivo datasets. We present a novel model to simulate complex chemical-toxicology data sets and use this model to evaluate the relative performance of different machine learning (ML) methods. Results: The classification performance of Artificial Neural Networks (ANN), K-Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), Naïve Bayes (NB), Recursive Partitioning and Regression Trees (RPART), and Support Vector Machines (SVM) in the presence and absence of filter-based feature selection was analyzed using K-way cross-validation testing and independent validation on simulated in vitro assay data sets with varying levels of model complexity, number of irrelevant features and measurement noise. While the prediction accuracy of all ML methods decreased as non-causal (irrelevant) features were added, some ML methods performed better than others. In the limit of using a large number of features, ANN and SVM were always in the top performing set of methods while RPART and KNN (k = 5) were always in the poorest performing set. The addition of measurement noise and irrelevant features decreased the classification accuracy of all ML methods, with LDA suffering the greatest performance degradation. LDA performance is especially sensitive to the use of feature selection. Filter-based feature selection generally improved performance, most strikingly for LDA. Conclusion: We have developed a novel simulation model to evaluate machine learning methods for the analysis of data sets in which in vitro bioassay data are used to predict in vivo chemical toxicology. From our analysis, we can recommend several ML methods, most notably SVM and ANN, as good candidates for use in real-world applications in this area.
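The evaluation setup described above (simulated assay data with irrelevant features, scored by K-way cross-validation) can be sketched in a few lines. Everything here is an assumption for illustration: the one-causal-feature Gaussian simulator, the plain-Python k-nearest-neighbours classifier, and all function names; the paper's own simulation model is more elaborate and is not reproduced.

```python
import random


def simulate(n, n_noise, seed=0):
    """Toy data: one causal feature shifted by class, plus n_noise irrelevant ones."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        causal = rng.gauss(3.0 * y, 1.0)
        xs.append([causal] + [rng.gauss(0.0, 1.0) for _ in range(n_noise)])
        ys.append(y)
    return xs, ys


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_x, train_y, x, k=5):
    """Majority vote among the k training points nearest to x (squared Euclidean)."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )[:k]
    return 1 if 2 * sum(train_y[i] for i in order) > len(order) else 0


def cross_val_accuracy(xs, ys, predict, k=5):
    """Fraction of points classified correctly when each fold is held out once."""
    correct = 0
    for fold in k_fold_indices(len(xs), k):
        held = set(fold)
        tr_x = [x for i, x in enumerate(xs) if i not in held]
        tr_y = [y for i, y in enumerate(ys) if i not in held]
        correct += sum(predict(tr_x, tr_y, xs[i]) == ys[i] for i in fold)
    return correct / len(xs)


for n_noise in (0, 20):
    xs, ys = simulate(100, n_noise)
    print(n_noise, cross_val_accuracy(xs, ys, knn_predict))
```

Swapping `knn_predict` for another function with the same `(train_x, train_y, x)` signature allows a like-for-like comparison of classifiers on identical folds.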

    Computational Approaches for Drug-Induced Liver Injury (DILI) Prediction: State of the Art and Challenges

    Drug-induced liver injury (DILI) is one of the prevailing causes of fulminant hepatic failure. It is estimated that three idiosyncratic drug reactions out of four result in liver transplantation or death. Additionally, DILI is the most common reason for withdrawal of an approved drug from the market. Therefore, the development of methods for the early identification of hepatotoxic drug candidates is of crucial importance. This review focuses on the current state of cheminformatics strategies being applied for the early in silico prediction of DILI. Herein, we discuss key issues associated with DILI modelling in terms of data size, imbalance and quality, the complexity of mechanisms, and the different levels of hepatotoxicity to model, ranging from general hepatotoxicity to the molecular initiating events of DILI.

    Using High-Throughput Screening Data To Discriminate Compounds with Single-Target Effects from Those with Side Effects

    The most desirable compound leads from high-throughput assays are those with novel biological activities resulting from their action on a single biological target. Valuable resources can be wasted on compound leads with significant 'side effects' on additional biological targets; therefore, technical refinements to identify compounds that primarily have effects resulting from a single target are needed. This study explores the use of multiple assays of a chemical library and a statistic based on entropy to identify lead compound classes that have patterns of assay activity resulting primarily from small molecule action on a single target. This statistic, called the coincidence score, discriminates with 88% accuracy compound classes known to act primarily on a single target from compound classes with significant side effects on nonhomologous targets. Furthermore, a significant number of the compound classes predicted to have primarily single-target effects contain known bioactive compounds. We also show that a compound's known biological target or mechanism of action can often be suggested by its pattern of activities in multiple assays.