
    Algebraic shortcuts for leave-one-out cross-validation in supervised network inference

    Supervised machine learning techniques have traditionally been very successful at reconstructing biological networks, such as protein-ligand interaction, protein-protein interaction and gene regulatory networks. Many supervised techniques for network prediction use linear models on a possibly nonlinear pairwise feature representation of edges. Recently, much emphasis has been placed on the correct evaluation of such supervised models. It is vital to distinguish between using a model to either predict new interactions in a given network or to predict interactions for a new vertex not present in the original network. This distinction matters because (i) the performance might dramatically differ between the prediction settings and (ii) tuning the model hyperparameters to obtain the best possible model depends on the setting of interest. Specific cross-validation schemes need to be used to assess the performance in such different prediction settings. In this work we discuss a state-of-the-art kernel-based network inference technique called two-step kernel ridge regression. We show that this regression model can be trained efficiently, with a time complexity scaling with the number of vertices rather than the number of edges. Furthermore, this framework leads to a series of cross-validation shortcuts that allow one to rapidly estimate the model performance for any relevant network prediction setting. This allows computational biologists to fully assess the capabilities of their models. The machine learning techniques with the algebraic shortcuts are implemented in the RLScore software package: https://github.com/aatapa/RLScore
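    The flavor of such an algebraic shortcut can be illustrated with ordinary (single-kernel) kernel ridge regression, where the hat matrix H = K(K + λI)⁻¹ yields every leave-one-out prediction without refitting. This is a simplified sketch of the general idea, not the two-step model or the RLScore implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
X = rng.normal(size=(n, 3))
K = X @ X.T                                  # linear kernel matrix
y = rng.normal(size=n)
lam = 1.0

H = K @ np.linalg.inv(K + lam * np.eye(n))   # hat matrix of the smoother
f = H @ y                                    # fitted values on the full data

# Algebraic LOO shortcut: f_(-i) = (f_i - H_ii * y_i) / (1 - H_ii)
h = np.diag(H)
loo_shortcut = (f - h * y) / (1 - h)

# Brute force for comparison: refit with each point held out
loo_brute = np.empty(n)
for i in range(n):
    idx = np.delete(np.arange(n), i)
    alpha = np.linalg.solve(K[np.ix_(idx, idx)] + lam * np.eye(n - 1), y[idx])
    loo_brute[i] = K[i, idx] @ alpha

assert np.allclose(loo_shortcut, loo_brute)  # identical, with no refitting
```

    The shortcut turns n model refits into a handful of matrix operations, which is what makes exhaustive cross-validation over all relevant prediction settings affordable.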

    Cold-start problems in data-driven prediction of drug-drug interaction effects

    Combining drugs, a phenomenon often referred to as polypharmacy, can induce additional adverse effects. The identification of adverse combinations is a key task in pharmacovigilance. In this context, in silico approaches based on machine learning are promising, as they can learn from a limited number of combinations to make predictions for all of them. In this work, we identify various subtasks in predicting effects caused by drug–drug interaction. Predicting drug–drug interaction effects for drugs that already exist is very different from predicting outcomes for newly developed drugs, commonly called a cold-start problem. We propose suitable validation schemes for the different subtasks that emerge. These validation schemes are critical to correctly assess the performance. We develop a new model that obtains an AUC-ROC of 0.843 for the hardest cold-start task up to an AUC-ROC of 0.957 for the easiest one on the benchmark dataset of Zitnik et al. Finally, we illustrate how our predictions can be used to improve post-market surveillance systems or to detect drug–drug interaction effects earlier during drug development.
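    The distinction between the validation subtasks can be made concrete with a toy split generator. The sketch below, using made-up drug indices, contrasts a warm-start split (random pairs held out, every drug seen in training) with a cold-start split (all pairs involving a "new" drug held out); it is illustrative only, not the paper's evaluation code:

```python
import numpy as np

rng = np.random.default_rng(1)
n_drugs = 6  # toy number of drugs
pairs = [(i, j) for i in range(n_drugs) for j in range(i + 1, n_drugs)]

# Warm start: hold out random pairs; every drug still occurs in training
held_out = rng.choice(len(pairs), size=3, replace=False)
warm_test = [pairs[k] for k in held_out]
warm_train = [p for k, p in enumerate(pairs) if k not in held_out]

# Cold start: hold out *all* pairs involving the new drug
new_drug = 0
cold_test = [p for p in pairs if new_drug in p]
cold_train = [p for p in pairs if new_drug not in p]

# The cold-start training set never sees the new drug at all
assert all(new_drug not in p for p in cold_train)
```

    Scoring a model on `warm_test` after training on `cold_train`-style data (or vice versa) conflates the two settings, which is exactly the evaluation mistake the tailored validation schemes guard against.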

    A Comparative Study of Pairwise Learning Methods based on Kernel Ridge Regression

    Many machine learning problems can be formulated as predicting labels for a pair of objects. Problems of that kind are often referred to as pairwise learning, dyadic prediction or network inference problems. During the last decade, kernel methods have played a dominant role in pairwise learning. They still obtain a state-of-the-art predictive performance, but a theoretical analysis of their behavior has been underexplored in the machine learning literature. In this work we review and unify existing kernel-based algorithms that are commonly used in different pairwise learning settings, ranging from matrix filtering to zero-shot learning. To this end, we focus on closed-form efficient instantiations of Kronecker kernel ridge regression. We show that independent task kernel ridge regression, two-step kernel ridge regression and a linear matrix filter arise naturally as special cases of Kronecker kernel ridge regression, implying that all these methods implicitly minimize a squared loss. In addition, we analyze universality, consistency and spectral filtering properties. Our theoretical results provide valuable insights into assessing the advantages and limitations of existing pairwise learning methods.

    Supervised learning methods to predict species interactions based

    Species interaction networks - pollination networks, host-phage networks, food webs and the like - are key tools to study community ecosystems. Biologists enjoy working with networks as they provide a sound mathematical description of a system and come equipped with a large toolkit to analyze various properties such as stability, diversity or dynamics. Species interaction networks can be obtained experimentally or by field observations. Modern techniques such as DNA barcoding and camera traps, coupled with large databases, contribute further to the popularity of networks in ecology. In practice, however, a collected network rarely contains all in situ interactions, as this would require an infeasibly large sampling effort. Species distributions are also subject to changes, for example due to climate change, which leads to new potential interactions. It is of great importance to be able to predict such interactions, for example to anticipate the effect of exotic species in an ecosystem. In our work, we study how to use supervised machine learning tools to predict new species interactions. Based on an observed network, we learn a function that takes as inputs the description of two species (e.g. traits, phylogenetic similarity or a morphological description) and predicts whether these two species are likely to interact or not. This framework for pairwise learning is based on kernels; similar methods have been highly successful for predicting molecular networks and for recommender systems, as used by companies such as Netflix and Amazon. We have shown that these methods can detect missing interactions in many different types of species interaction networks. A large focus of our work is on how the accuracy of these models can be estimated realistically. Our methods are available in an R package called xnet, making them easy to use for ecology researchers.
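    The pairwise learning setup described here can be sketched with synthetic trait data: each candidate species pair is encoded by the outer product of the two trait vectors and scored with a ridge model. All names and numbers below are made up for illustration; this is not the xnet implementation:

```python
import numpy as np

rng = np.random.default_rng(3)
n_species, d = 8, 3
traits = rng.normal(size=(n_species, d))     # made-up trait vectors

# Hypothetical observed network: species interact when their traits align
A = (traits @ traits.T > 0.5).astype(float)

# Pairwise feature map: outer product of the two species' trait vectors
feats = np.array([np.outer(traits[i], traits[j]).ravel()
                  for i in range(n_species) for j in range(n_species)])
labels = A.ravel()

# Ridge regression on pairwise features (regularization chosen arbitrarily)
w = np.linalg.solve(feats.T @ feats + 0.1 * np.eye(d * d), feats.T @ labels)
scores = feats @ w   # higher score = more likely to interact
```

    New candidate pairs, including pairs involving a species absent from the observed network, can be scored the same way from their trait descriptions, which is what makes the approach useful for anticipating interactions with exotic species.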

    The Hyperdimensional Transform for Distributional Modelling, Regression and Classification

    Hyperdimensional computing (HDC) is an increasingly popular computing paradigm with immense potential for future intelligent applications. Although the main ideas already took form in the 1990s, HDC recently gained significant attention, especially in the field of machine learning and data science. Next to efficiency, interoperability and explainability, HDC offers attractive properties for generalization as it can be seen as an attempt to combine connectionist ideas from neural networks with symbolic aspects. In recent work, we introduced the hyperdimensional transform, revealing deep theoretical foundations for representing functions and distributions as high-dimensional holographic vectors. Here, we present the power of the hyperdimensional transform to a broad data science audience. We use the hyperdimensional transform as a theoretical basis and provide insight into state-of-the-art HDC approaches for machine learning. We show how existing algorithms can be modified and how this transform can lead to a novel, well-founded toolbox. Next to the standard regression and classification tasks of machine learning, our discussion includes various aspects of statistical modelling, such as representation, learning and deconvolving distributions, sampling, Bayesian inference, and uncertainty estimation.
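    The basic HDC operations that such work builds on can be sketched in a few lines: random bipolar hypervectors, superposition ("bundling"), and elementwise binding. A minimal illustration of these primitives, not the hyperdimensional transform itself:

```python
import numpy as np

rng = np.random.default_rng(4)
D = 10000  # hypervector dimensionality

def hv():
    """A random bipolar hypervector; independent draws are quasi-orthogonal."""
    return rng.choice([-1, 1], size=D)

def sim(x, y):
    """Normalized inner product in [-1, 1]."""
    return (x @ y) / D

a, b = hv(), hv()
bundle = a + b   # bundling (superposition): stays similar to its parts
bound = a * b    # binding: dissimilar to both inputs, but invertible

assert sim(bundle, a) > 0.3          # the bundle still "contains" a
assert abs(sim(bound, a)) < 0.1      # the bound vector looks like noise w.r.t. a
assert np.array_equal(a * bound, b)  # unbinding with a recovers b exactly
```

    The holographic character referred to in the abstract comes from exactly this behavior: information is spread across all D coordinates, so components can be superposed, bound, and later recovered by similarity search.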
