Search CORE

1,729 research outputs found

Link Mining for Kernel-based Compound-Protein Interaction Predictions Using a Chemogenomics Approach

Author: A Lavecchia
ACA Nascimento
C-C Chang
D Rogers
H Ding
L Jacob
M Bouchard
M Gonen
M Hattori
MN Drwal
S Daminelli
T Laarhoven van
T Laarhoven van
TF Smith
Y Liu
Y Yamanishi
Publication venue
Publication date: 29/06/2017
Field of study

Virtual screening (VS) is widely used during computational drug discovery to reduce costs. Chemogenomics-based virtual screening (CGBVS) can be used to predict new compound-protein interactions (CPIs) from known CPI network data using several methods, including machine learning and data mining. Although CGBVS facilitates highly efficient and accurate CPI prediction, it has poor performance for prediction of new compounds for which CPIs are unknown. The pairwise kernel method (PKM) is a state-of-the-art CGBVS method and shows high accuracy for prediction of new compounds. In this study, on the basis of link mining, we improved the PKM by combining link indicator kernel (LIK) and chemical similarity and evaluated the accuracy of these methods. The proposed method obtained an average area under the precision-recall curve (AUPR) value of 0.562, which was higher than that achieved by the conventional Gaussian interaction profile (GIP) method (0.425), and the calculation time was only increased by a few percent

arXiv.org e-Print Archive

Crossref

A Comparative Study of Pairwise Learning Methods based on Kernel Ridge Regression

Author: Airola Antti
De Baets Bernard
Pahikkala Tapio
Stock Michiel
Waegeman Willem
Publication venue
Publication date: 01/01/2018
Field of study

Many machine learning problems can be formulated as predicting labels for a pair of objects. Problems of that kind are often referred to as pairwise learning, dyadic prediction or network inference problems. During the last decade kernel methods have played a dominant role in pairwise learning. They still obtain a state-of-the-art predictive performance, but a theoretical analysis of their behavior has been underexplored in the machine learning literature. In this work we review and unify existing kernel-based algorithms that are commonly used in different pairwise learning settings, ranging from matrix filtering to zero-shot learning. To this end, we focus on closed-form efficient instantiations of Kronecker kernel ridge regression. We show that independent task kernel ridge regression, two-step kernel ridge regression and a linear matrix filter arise naturally as a special case of Kronecker kernel ridge regression, implying that all these methods implicitly minimize a squared loss. In addition, we analyze universality, consistency and spectral filtering properties. Our theoretical results provide valuable insights in assessing the advantages and limitations of existing pairwise learning methods.Comment: arXiv admin note: text overlap with arXiv:1606.0427

arXiv.org e-Print Archive

Ghent University Academic Bibliography

Computational-experimental approach to drug-target interaction mapping : A case study on kinase inhibitors

Author: Airola Antti
Aittokallio Tero
Cichonska Anna
Pahikkala Tapio
Parri Elina
Ravikumar Balaguru
Rousu Juho
Timonen Sanna
Wennerberg Krister
Publication venue
Publication date: 01/08/2017
Field of study

Peer reviewe

Directory of Open Access Journals

Aaltodoc Publication Archive

Helsingin yliopiston digitaalinen arkisto

FigShare

VB-MK-LMF: Fusion of drugs, targets and interactions using Variational Bayesian Multiple Kernel Logistic Matrix Factorization

Author: Antal Péter
Bolgár Bence
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017
Field of study

Background Computational fusion approaches to drug-target interaction (DTI) prediction, capable of utilizing multiple sources of background knowledge, were reported to achieve superior predictive performance in multiple studies. Other studies showed that specificities of the DTI task, such as weighting the observations and focusing the side information are also vital for reaching top performance. Method We present Variational Bayesian Multiple Kernel Logistic Matrix Factorization (VB-MK-LMF), which unifies the advantages of (1) multiple kernel learning, (2) weighted observations, (3) graph Laplacian regularization, and (4) explicit modeling of probabilities of binary drug-target interactions. Results VB-MK-LMF achieves significantly better predictive performance in standard benchmarks compared to state-of-the-art methods, which can be traced back to multiple factors. The systematic evaluation of the effect of multiple kernels confirm their benefits, but also highlights the limitations of linear kernel combinations, already recognized in other fields. The analysis of the effect of prior kernels using varying sample sizes sheds light on the balance of data and knowledge in DTI tasks and on the rate at which the effect of priors vanishes. This also shows the existence of ``small sample size'' regions where using side information offers significant gains. Alongside favorable predictive performance, a notable property of MF methods is that they provide a unified space for drugs and targets using latent representations. Compared to earlier studies, the dimensionality of this space proved to be surprisingly low, which makes the latent representations constructed by VB-ML-LMF especially well-suited for visual analytics. The probabilistic nature of the predictions allows the calculation of the expected values of hits in functionally relevant sets, which we demonstrate by predicting drug promiscuity. The variational Bayesian approximation is also implemented for general purpose graphics processing units yielding significantly improved computational time. Conclusion In standard benchmarks, VB-MK-LMF shows significantly improved predictive performance in a wide range of settings. Beyond these benchmarks, another contribution of our work is highlighting and providing estimates for further pharmaceutically relevant quantities, such as promiscuity, druggability and total number of interactions. Availability Data and code are available at http://bioinformatics.mit.bme.hu

Directory of Open Access Journals

Repository of the Academy's Library

Supplemental Information 2: Code and data

Author: Blom
Blom
Boutet
Carlsson
Chang
Chang
Conforti
Diella
Dou
Eisenhaber
Fan
Gao
Gao
Gupta
Gönen
Hornbeck
Hortin
Huang
Huang
Ischiropoulos
Jia
Lee
Li
Li
Li
Liu
Mann
Matthews
Miller
Minguez
Monigatti
Mukherjee
Nascimento
Pan
Peng
Qiu
Ubersax
Van Laarhoven
Vapnik
Walsh
Wang
Wang
Wong
Xie
Xu
Xu
Xu
Xu
Xue
Xue
Publication venue: 'PeerJ'
Publication date
Field of study

Crossref

Algebraic shortcuts for leave-one-out cross-validation in supervised network inference

Author: Airola Antti
De Baets Bernard
Pahikkala Tapio
Stock Michiel
Waegeman Willem
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2020
Field of study

Supervised machine learning techniques have traditionally been very successful at reconstructing biological networks, such as protein-ligand interaction, protein-protein interaction and gene regulatory networks. Many supervised techniques for network prediction use linear models on a possibly nonlinear pairwise feature representation of edges. Recently, much emphasis has been placed on the correct evaluation of such supervised models. It is vital to distinguish between using a model to either predict new interactions in a given network or to predict interactions for a new vertex not present in the original network. This distinction matters because (i) the performance might dramatically differ between the prediction settings and (ii) tuning the model hyperparameters to obtain the best possible model depends on the setting of interest. Specific cross-validation schemes need to be used to assess the performance in such different prediction settings. In this work we discuss a state-of-the-art kernel-based network inference technique called two-step kernel ridge regression. We show that this regression model can be trained efficiently, with a time complexity scaling with the number of vertices rather than the number of edges. Furthermore, this framework leads to a series of cross-validation shortcuts that allow one to rapidly estimate the model performance for any relevant network prediction setting. This allows computational biologists to fully assess the capabilities of their models

Ghent University Academic Bibliography