netReg: network-regularized linear models for biological association studies

Abstract

Abstract Summary Modelling biological associations or dependencies using linear regression is often complicated when the analyzed data-sets are high-dimensional and less observations than variables are available (n ≪ p). For genomic data-sets penalized regression methods have been applied settling this issue. Recently proposed regression models utilize prior knowledge on dependencies, e.g. in the form of graphs, arguing that this information will lead to more reliable estimates for regression coefficients. However, none of the proposed models for multivariate genomic response variables have been implemented as a computationally efficient, freely available library. In this paper we propose netReg, a package for graph-penalized regression models that use large networks and thousands of variables. netReg incorporates a priori generated biological graph information into linear models yielding sparse or smooth solutions for regression coefficients. Availability and implementation netReg is implemented as both R-package and C ++ commandline tool. The main computations are done in C ++, where we use Armadillo for fast matrix calculations and Dlib for optimization. The R package is freely available on Bioconductorhttps://bioconductor.org/packages/netReg. The command line tool can be installed using the conda channel Bioconda. Installation details, issue reports, development versions, documentation and tutorials for the R and C ++ versions and the R package vignette can be found on GitHub https://dirmeier.github.io/netReg/. The GitHub page also contains code for benchmarking and example datasets used in this paper

    Similar works