Algebraic shortcuts for leave-one-out cross-validation in supervised network inference
Supervised machine learning techniques have traditionally been very successful at reconstructing biological networks, such as protein-ligand interaction, protein-protein interaction and gene regulatory networks. Many supervised techniques for network prediction use linear models on a possibly nonlinear pairwise feature representation of edges. Recently, much emphasis has been placed on the correct evaluation of such supervised models. It is vital to distinguish between using a model to predict new interactions in a given network and using it to predict interactions for a new vertex not present in the original network. This distinction matters because (i) the performance might differ dramatically between the prediction settings and (ii) tuning the model hyperparameters to obtain the best possible model depends on the setting of interest. Specific cross-validation schemes need to be used to assess the performance in such different prediction settings. In this work we discuss a state-of-the-art kernel-based network inference technique called two-step kernel ridge regression. We show that this regression model can be trained efficiently, with a time complexity scaling with the number of vertices rather than the number of edges. Furthermore, this framework leads to a series of cross-validation shortcuts that allow one to rapidly estimate the model performance for any relevant network prediction setting. This allows computational biologists to fully assess the capabilities of their models.
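The idea of a leave-one-out shortcut can be illustrated with a minimal sketch. It is not taken from the paper: it assumes a bipartite setting with a label matrix `Y` (rows and columns are the two vertex types), a row kernel `K`, a column kernel `G`, and regularization parameters `lam_u` and `lam_v`, and it applies the standard leave-one-out identity for kernel ridge regression to the setting where an entire row vertex is held out. All function and variable names are hypothetical. Only matrices of vertex-by-vertex size are formed, never an edge-by-edge kernel.

```python
import numpy as np


def hat_matrix(kernel, lam):
    """Ridge hat matrix H = (kernel + lam * I)^{-1} kernel for a symmetric PSD kernel."""
    n = kernel.shape[0]
    return np.linalg.solve(kernel + lam * np.eye(n), kernel)


def two_step_krr_fitted(K, G, Y, lam_u, lam_v):
    """Fitted values F = H_k Y H_g of a two-step kernel ridge regression (sketch)."""
    H_k = hat_matrix(K, lam_u)  # smoothing over row vertices
    H_g = hat_matrix(G, lam_v)  # smoothing over column vertices
    return H_k @ Y @ H_g, H_k, H_g


def loo_new_row_vertex(K, G, Y, lam_u, lam_v):
    """Leave-one-row-out predictions without retraining.

    Row i of the result is the prediction for row vertex i when all of its
    labels are withheld, using the classical ridge identity
    f_loo_i = (f_i - H_ii * z_i) / (1 - H_ii) with z = Y H_g.
    """
    F, H_k, H_g = two_step_krr_fitted(K, G, Y, lam_u, lam_v)
    Z = Y @ H_g                      # column-step output, computed row by row
    d = np.diag(H_k)[:, None]        # leverage of each row vertex
    return (F - d * Z) / (1.0 - d)
```

With `lam_u > 0` the leverages are strictly below one, so the division is well defined. Analogous shortcuts for other prediction settings (held-out columns, held-out single pairs) are what the abstract refers to, but they are not reproduced here.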
Metric learning pairwise kernel for graph inference
Much recent work in bioinformatics has focused on the inference of various
types of biological networks, representing gene regulation, metabolic
processes, protein-protein interactions, etc. A common setting involves
inferring network edges in a supervised fashion from a set of high-confidence
edges, possibly characterized by multiple, heterogeneous data sets (protein
sequence, gene expression, etc.). Here, we distinguish between two modes of
inference in this setting: direct inference based upon similarities between
nodes joined by an edge, and indirect inference based upon similarities between
one pair of nodes and another pair of nodes. We propose a supervised approach
for the direct case by translating it into a distance metric learning problem.
A relaxation of the resulting convex optimization problem leads to the support
vector machine (SVM) algorithm with a particular kernel for pairs, which we
call the metric learning pairwise kernel (MLPK). We demonstrate, using several
real biological networks, that this direct approach often improves upon the
state-of-the-art SVM for indirect inference with the tensor product pairwise
kernel.
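The two pairwise kernels named above can be sketched as follows. The formulas are the commonly cited definitions, stated here as an assumption since the abstract does not give them: for a base kernel K on nodes, the tensor product pairwise kernel (TPPK) on pairs (a, b) and (c, d) is K(a,c)K(b,d) + K(a,d)K(b,c), and the metric learning pairwise kernel (MLPK) is (K(a,c) - K(a,d) - K(b,c) + K(b,d))^2. The function names and the use of a precomputed Gram matrix indexed by node are illustrative choices.

```python
import numpy as np


def tppk(K, a, b, c, d):
    """Tensor product pairwise kernel between node pairs (a, b) and (c, d).

    K is a precomputed node-level Gram matrix; a, b, c, d are node indices.
    """
    return K[a, c] * K[b, d] + K[a, d] * K[b, c]


def mlpk(K, a, b, c, d):
    """Metric learning pairwise kernel between node pairs (a, b) and (c, d).

    With a linear base kernel this equals ((x_a - x_b) . (x_c - x_d)) ** 2,
    i.e. it compares the difference vectors of the two pairs.
    """
    return (K[a, c] - K[a, d] - K[b, c] + K[b, d]) ** 2
```

Both functions are invariant to swapping the two nodes within a pair, which is what makes them suitable for undirected edges.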
Classifying pairs with trees for supervised biological network inference
Networks are ubiquitous in biology and computational approaches have been
largely investigated for their inference. In particular, supervised machine
learning methods can be used to complete a partially known network by
integrating various measurements. Two main supervised frameworks have been
proposed: the local approach, which trains a separate model for each network
node, and the global approach, which trains a single model over pairs of nodes.
Here, we systematically investigate, theoretically and empirically, the
exploitation of tree-based ensemble methods in the context of these two
approaches for biological network inference. We first formalize the problem of
network inference as classification of pairs, unifying in the process
homogeneous and bipartite graphs and discussing two main sampling schemes. We
then present the global and the local approaches, extending the latter for the
prediction of interactions between two unseen network nodes, and discuss their
specializations to tree-based ensemble methods, highlighting their
interpretability and drawing links with clustering techniques. Extensive
computational experiments are carried out with these methods on various
biological networks that clearly highlight that these methods are competitive
with existing methods.
Comment: 22 pages