59 research outputs found

    Link Mining for Kernel-based Compound-Protein Interaction Predictions Using a Chemogenomics Approach

    Virtual screening (VS) is widely used during computational drug discovery to reduce costs. Chemogenomics-based virtual screening (CGBVS) can be used to predict new compound-protein interactions (CPIs) from known CPI network data using several methods, including machine learning and data mining. Although CGBVS facilitates highly efficient and accurate CPI prediction, it has poor performance for prediction of new compounds for which CPIs are unknown. The pairwise kernel method (PKM) is a state-of-the-art CGBVS method and shows high accuracy for prediction of new compounds. In this study, on the basis of link mining, we improved the PKM by combining link indicator kernel (LIK) and chemical similarity and evaluated the accuracy of these methods. The proposed method obtained an average area under the precision-recall curve (AUPR) value of 0.562, which was higher than that achieved by the conventional Gaussian interaction profile (GIP) method (0.425), and the calculation time was only increased by a few percent
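For orientation, the Gaussian interaction profile (GIP) baseline and the general idea of a pairwise kernel over compound-protein pairs can be sketched as below. This is a minimal illustration under stated assumptions, not the method proposed in the abstract: the link indicator kernel is not reproduced here, the bandwidth normalization is one commonly used GIP convention, and the function names `gip_kernel` and `pairwise_kernel` and the toy matrix `Y` are hypothetical.

```python
import numpy as np


def gip_kernel(profiles, gamma_scale=1.0):
    """Gaussian kernel over rows of a binary interaction matrix.

    Each row is the interaction profile of one entity. The bandwidth is
    normalized by the mean squared profile norm (an assumed convention).
    """
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = np.sum(profiles ** 2, axis=1)
    # Squared Euclidean distance between every pair of profiles.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    mean_sq_norm = sq_norms.mean()
    gamma = gamma_scale / mean_sq_norm if mean_sq_norm > 0 else gamma_scale
    return np.exp(-gamma * sq_dists)


def pairwise_kernel(k_compound, k_protein):
    """Kronecker-product kernel over (compound, protein) pairs.

    Pair (compound i, protein j) maps to row i * n_proteins + j.
    """
    return np.kron(k_compound, k_protein)


# Hypothetical toy interaction matrix: 4 compounds x 3 proteins.
Y = np.array([[1, 0, 1],
              [1, 0, 0],
              [0, 1, 0],
              [0, 1, 1]])

k_c = gip_kernel(Y)      # compound-compound similarity, 4 x 4
k_p = gip_kernel(Y.T)    # protein-protein similarity, 3 x 3
k_pair = pairwise_kernel(k_c, k_p)  # pair-pair similarity, 12 x 12
```

In a sketch like this, `k_pair` could be passed to any kernel classifier that accepts a precomputed Gram matrix. Because a compound with no known interactions has an all-zero profile, an interaction-profile kernel alone carries little information about it, which is one reason to combine such kernels with chemical similarity.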

    Drug Target Interaction Prediction Using Machine Learning Techniques – A Review

    Drug discovery is a key process, given the rising and ubiquitous demand for medication to stay in good shape right through the course of one’s life. Drugs are small molecules that inhibit or activate the function of a protein, offering patients a host of therapeutic benefits. Drug design is the inventive process of finding new medication, based on targets or proteins. Identifying new drugs is a process that involves time and money. This is where computer-aided drug design helps cut time and costs. Drug design needs a drug target, which is a protein, and a drug compound, between which an interaction is established. Interaction, in this context, refers to the process of discovering protein binding sites, which are protein pockets that bind with drugs. Pockets are regions on a protein macromolecule that bind to drug molecules. Researchers have been at work trying to determine new Drug Target Interactions (DTI), that is, to predict whether or not a given drug molecule will bind to a target. Machine learning (ML) techniques help establish the interaction between drugs and their targets, using computer-aided drug design. This paper aims to explore ML techniques for DTI prediction more thoroughly and to boost future research. Qualitative and quantitative analyses of ML techniques show that several have been applied to predict DTIs, employing a range of classifiers. Though DTI prediction improves with negative drug target pairs (DTP), the lack of true negative DTPs has led to the use of a particular dataset of drugs and targets. Using dynamic DTPs improves DTI prediction. Little attention has so far been paid to developing a new classifier for DTI classification, and there is, unquestionably, a need for better ones.

    VB-MK-LMF: Fusion of drugs, targets and interactions using Variational Bayesian Multiple Kernel Logistic Matrix Factorization

    Background: Computational fusion approaches to drug-target interaction (DTI) prediction, capable of utilizing multiple sources of background knowledge, were reported to achieve superior predictive performance in multiple studies. Other studies showed that specificities of the DTI task, such as weighting the observations and focusing the side information, are also vital for reaching top performance. Method: We present Variational Bayesian Multiple Kernel Logistic Matrix Factorization (VB-MK-LMF), which unifies the advantages of (1) multiple kernel learning, (2) weighted observations, (3) graph Laplacian regularization, and (4) explicit modeling of probabilities of binary drug-target interactions. Results: VB-MK-LMF achieves significantly better predictive performance in standard benchmarks compared to state-of-the-art methods, which can be traced back to multiple factors. The systematic evaluation of the effect of multiple kernels confirms their benefits, but also highlights the limitations of linear kernel combinations, already recognized in other fields. The analysis of the effect of prior kernels using varying sample sizes sheds light on the balance of data and knowledge in DTI tasks and on the rate at which the effect of priors vanishes. This also shows the existence of "small sample size" regions where using side information offers significant gains. Alongside favorable predictive performance, a notable property of MF methods is that they provide a unified space for drugs and targets using latent representations. Compared to earlier studies, the dimensionality of this space proved to be surprisingly low, which makes the latent representations constructed by VB-MK-LMF especially well-suited for visual analytics. The probabilistic nature of the predictions allows the calculation of the expected values of hits in functionally relevant sets, which we demonstrate by predicting drug promiscuity. The variational Bayesian approximation is also implemented for general purpose graphics processing units, yielding significantly improved computational time. Conclusion: In standard benchmarks, VB-MK-LMF shows significantly improved predictive performance in a wide range of settings. Beyond these benchmarks, another contribution of our work is highlighting and providing estimates for further pharmaceutically relevant quantities, such as promiscuity, druggability and total number of interactions. Availability: Data and code are available at http://bioinformatics.mit.bme.hu
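The step from latent factors to interaction probabilities, and from probabilities to an expected number of hits, can be illustrated with a small sketch. It assumes plain point estimates of the latent factors rather than the variational posterior described in the abstract, and it omits kernels, observation weights and Laplacian regularization; the names `interaction_probabilities`, `expected_hits`, `U` and `V` are hypothetical and are not taken from the VB-MK-LMF code.

```python
import numpy as np


def interaction_probabilities(drug_factors, target_factors):
    """Logistic matrix factorization: P[i, j] = sigmoid(u_i . v_j)."""
    scores = drug_factors @ target_factors.T
    return 1.0 / (1.0 + np.exp(-scores))


def expected_hits(probabilities, target_indices):
    """Expected number of interactions per drug within a set of targets.

    By linearity of expectation this is the sum of the Bernoulli
    probabilities over the chosen targets.
    """
    return probabilities[:, target_indices].sum(axis=1)


# Hypothetical random latent factors: 5 drugs, 4 targets, 2 latent dimensions.
rng = np.random.default_rng(0)
U = rng.normal(size=(5, 2))
V = rng.normal(size=(4, 2))

P = interaction_probabilities(U, V)
# A simple promiscuity estimate: expected hits over all targets.
promiscuity = expected_hits(P, [0, 1, 2, 3])
```

Because drugs and targets share one low-dimensional latent space, the rows of `U` and `V` in a sketch like this can also be plotted together, which is the property the abstract highlights for visual analytics.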

    Machine Learning Methodologies for Interpretable Compound Activity Predictions

    Machine learning (ML) models have gained attention for mining the pharmaceutical data that are currently generated at unprecedented rates and for potentially accelerating the discovery of new drugs. The advent of deep learning (DL) has also raised expectations in pharmaceutical research. A central task in drug discovery is the initial search for compounds with desired biological activity. ML algorithms are able to find patterns in compound structures that are related to bioactivity, the so-called structure-activity relationships (SARs). ML-based predictions can complement biological testing to prioritize further experiments. Moreover, insights into model decisions are highly desired for further validation and identification of activity-relevant substructures. However, the interpretation of complex ML models remains essentially prohibitive. This thesis focuses on ML-based predictions of compound activity against multiple biological targets. Single-target and multi-target models are generated for relevant tasks including the prediction of profiling matrices from screening data and the discrimination between weak and strong inhibitors for more than a hundred kinases. Moreover, the relative performance of distinct modeling strategies is systematically analyzed under varying training conditions, and practical guidelines are reported. Since explainable model decisions are a clear requirement for the utility of ML bioactivity models in pharmaceutical research, methods for the interpretation and intuitive visualization of activity predictions from any ML or DL model are introduced. Taken together, this dissertation presents contributions that advance the application and rationalization of ML models for biological activity and SAR predictions.

    Multi-Target Prediction: A Unifying View on Problems and Methods

    Multi-target prediction (MTP) is concerned with the simultaneous prediction of multiple target variables of diverse type. Due to its enormous application potential, it has developed into an active and rapidly expanding research field that combines several subfields of machine learning, including multivariate regression, multi-label classification, multi-task learning, dyadic prediction, zero-shot learning, network inference, and matrix completion. In this paper, we present a unifying view on MTP problems and methods. First, we formally discuss commonalities and differences between existing MTP problems. To this end, we introduce a general framework that covers the above subfields as special cases. As a second contribution, we provide a structured overview of MTP methods. This is accomplished by identifying a number of key properties, which distinguish such methods and determine their suitability for different types of problems. Finally, we also discuss a few challenges for future research

    Genome-wide Protein-chemical Interaction Prediction

    The analysis of protein-chemical reactions on a large scale is critical to understanding the complex interrelated mechanisms that govern biological life at the cellular level. Chemical proteomics is a new research area aimed at genome-wide screening of such chemical-protein interactions. Traditional approaches to such screening involve in vivo or in vitro experimentation, which, while becoming faster with the application of high-throughput screening technologies, remains costly and time-consuming compared to in silico methods. Early in silico methods are dependent on knowing 3D protein structures (docking) or knowing binding information for many chemicals (ligand-based approaches). Typical machine learning approaches follow a global classification approach where a single predictive model is trained for an entire data set, but such an approach is unlikely to generalize well to the protein-chemical interaction space considering its diversity and heterogeneous distribution. In response to the global approach, work on local models has recently emerged to improve generalization across the interaction space by training a series of independent models, each localized to predict a single interaction. This work examines current approaches to genome-wide protein-chemical interaction prediction and explores new computational methods based on modifications to the boosting framework for ensemble learning. The methods are described and compared to several competing classification methods. Genome-wide chemical-protein interaction data sets are acquired from publicly available resources, and a series of experimental studies are performed in order to compare the performance of each method under a variety of conditions.

    Artificial intelligence, machine learning, and drug repurposing in cancer

    Introduction: Drug repurposing provides a cost-effective strategy to re-use approved drugs for new medical indications. Several machine learning (ML) and artificial intelligence (AI) approaches have been developed for systematic identification of drug repurposing leads based on big data resources, hence further accelerating and de-risking the drug development process by computational means. Areas covered: The authors focus on supervised ML and AI methods that make use of publicly available databases and information resources. While most of the example applications are in the field of anticancer drug therapies, the methods and resources reviewed are widely applicable also to other indications including COVID-19 treatment. A particular emphasis is placed on the use of comprehensive target activity profiles that enable a systematic repurposing process by extending the target profile of drugs to include potent off-targets with therapeutic potential for a new indication. Expert opinion: The scarcity of clinical patient data and the current focus on genetic aberrations as primary drug targets may limit the performance of anticancer drug repurposing approaches that rely solely on genomics-based information. Functional testing of cancer patient cells exposed to a large number of targeted therapies and their combinations provides an additional source of repurposing information for tissue-aware AI approaches.

    Computational Methods for Structure-Activity Relationship Analysis and Activity Prediction

    Structure-activity relationship (SAR) analysis of small bioactive compounds is a key task in medicinal chemistry. Traditionally, SARs were established on a case-by-case basis. However, with the arrival of high-throughput screening (HTS) and synthesis techniques, a surge in the size and structural heterogeneity of compound data is seen and the use of computational methods to analyse SARs has become imperative and valuable. In recent years, graphical methods have gained prominence for analysing SARs. The choice of molecular representation and the method of assessing similarities affects the outcome of the SAR analysis. Thus, alternative methods providing distinct points of view of SARs are required. In this thesis, a novel graphical representation utilizing the canonical scaffold-skeleton definition to explore meaningful global and local SAR patterns in compound data is introduced. Furthermore, efforts have been made to go beyond descriptive SAR analysis offered by the graphical methods. SAR features inferred from descriptive methods are utilized for compound activity predictions. In this context, a data structure called SAR matrix (SARM), which is reminiscent of conventional R-group tables, is utilized. SARMs suggest many virtual compounds that represent as of yet unexplored chemical space. These virtual compounds are candidates for further exploration but are too many to prioritize simply on the basis of visual inspection. Conceptually different approaches to enable systematic compound prediction and prioritization are introduced. Much emphasis is put on evolving the predictive ability for prospective compound design. Going beyond SAR analysis, the SARM method has also been adapted to navigate multi-target spaces primarily for analysing compound promiscuity patterns. Thus, the original SARM methodology has been further developed for a variety of medicinal chemistry and chemogenomics applications

    Predicting the mechanism of phospholipidosis.

    The mechanism of phospholipidosis is still not well understood. Numerous different mechanisms have been proposed, varying from direct inhibition of the breakdown of phospholipids to the binding of a drug compound to the phospholipid, preventing breakdown. We have used a probabilistic method, the Parzen-Rosenblatt Window approach, to build a model from the ChEMBL dataset which can predict from a compound's structure both its primary pharmaceutical target and other targets with which it forms off-target, usually weaker, interactions. Using a small dataset of 182 phospholipidosis-inducing and non-inducing compounds, we predict their off-target activity against targets which could relate to phospholipidosis as a side-effect of a drug. We link these targets to specific mechanisms of inducing this lysosomal build-up of phospholipids in cells. Thus, we show that the induction of phospholipidosis is likely to occur by separate mechanisms when triggered by different cationic amphiphilic drugs. We find that both inhibition of phospholipase activity and enhanced cholesterol biosynthesis are likely to be important mechanisms. Furthermore, we provide evidence suggesting four specific protein targets. Sphingomyelin phosphodiesterase, phospholipase A2 and lysosomal phospholipase A1 are shown to be likely targets for the induction of phospholipidosis by inhibition of phospholipase activity, while lanosterol synthase is predicted to be associated with phospholipidosis being induced by enhanced cholesterol biosynthesis. This analysis provides the impetus for further experimental tests of these hypotheses.
    Rights: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief, you may: copy, distribute, and display the work; make derivative works; or make commercial use of the work, under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.
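The Parzen-Rosenblatt window idea named in the phospholipidosis abstract can be sketched in a few lines. This is a generic illustration only: it assumes a Gaussian window over plain numeric feature vectors, a single shared bandwidth, and a made-up toy dataset; the descriptors, kernel and bandwidth used in the actual study are not stated in the abstract, and `parzen_class_scores` is a hypothetical name.

```python
import numpy as np


def parzen_class_scores(query, train_x, train_y, bandwidth=1.0):
    """Posterior-like class scores from a Gaussian Parzen window estimate.

    With one shared bandwidth, class prior times class-conditional density
    is proportional to the summed kernel weight of that class's training
    points, so the scores are those sums normalized to 1.
    """
    query = np.asarray(query, dtype=float)
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y)
    sq_dists = np.sum((train_x - query) ** 2, axis=1)
    weights = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    classes = np.unique(train_y)
    raw = np.array([weights[train_y == c].sum() for c in classes])
    total = raw.sum()
    if total > 0:
        probs = raw / total
    else:
        # All weights underflowed to zero: fall back to a uniform score.
        probs = np.full(len(classes), 1.0 / len(classes))
    return dict(zip(classes.tolist(), probs.tolist()))


# Hypothetical toy data: two "targets" with two training compounds each.
train_x = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
train_y = ["target_A", "target_A", "target_B", "target_B"]

scores = parzen_class_scores([0.2, 0.3], train_x, train_y)
```

Ranking the classes by score gives a most likely target first and weaker candidates after it, which mirrors the primary-target and off-target reading described in the abstract.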