Solving Kernel Ridge Regression with Gradient-Based Optimization Methods
Kernel ridge regression, KRR, is a non-linear generalization of linear ridge
regression. Here, we introduce an equivalent formulation of the KRR objective
function, which opens the door both to using penalties other than the ridge
penalty and to studying kernel ridge regression from the perspective of
gradient descent. Using a continuous-time perspective, we derive a closed-form
solution, kernel gradient flow, KGF, with regularization through early
stopping, which allows us to theoretically bound the differences between KGF
and KRR. We generalize KRR by replacing the ridge penalty with the ℓ1 and ℓ∞
penalties and utilize the fact that, analogously to the
similarities between KGF and KRR, the solutions obtained when using these
penalties are very similar to those obtained from forward stagewise regression
(also known as coordinate descent) and sign gradient descent, respectively, in
combination with early stopping. Thus, the need for computationally heavy
proximal gradient descent algorithms can be alleviated. We show theoretically and empirically how
these penalties, and corresponding gradient-based optimization algorithms,
produce signal-driven and robust regression solutions, respectively. We also
investigate kernel gradient descent where the kernel is allowed to change
during training, and theoretically address the effects this has on
generalization. Based on our findings, we propose an update scheme for the
bandwidth of translation-invariant kernels, where we let the bandwidth
decrease to zero during training, thus circumventing the need for
hyper-parameter selection. We demonstrate on real and synthetic data how
decreasing the bandwidth during training outperforms using a constant
bandwidth, selected by cross-validation and marginal likelihood maximization.
We also show that, using a decreasing bandwidth, we are able to achieve both
zero training error and double descent behavior.
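The correspondence between KRR and early-stopped gradient descent can be sketched numerically. The snippet below is a toy illustration, not the paper's exact derivation: it fits Gaussian-kernel KRR in closed form and compares it to gradient descent regularized only by early stopping, using the heuristic correspondence λ ≈ 1/t between the ridge penalty and the stopping time. The toy data and all variable names are assumptions for illustration.

```python
import numpy as np

# Toy sketch (not the paper's exact formulation): Gaussian-kernel KRR in
# closed form versus gradient descent regularized by early stopping, with
# the heuristic correspondence lambda ~ 1/t between penalty and stop time.

def gaussian_kernel(X, Z, bandwidth):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(40, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(40)
K = gaussian_kernel(X, X, bandwidth=0.5)

# Closed-form KRR: alpha = (K + lambda*I)^{-1} y.
lam = 0.1
f_krr = K @ np.linalg.solve(K + lam * np.eye(len(y)), y)

# Gradient descent in function space (f = K alpha): update alpha by the
# residual, and stop at "time" n_steps * step ~ 1 / lambda.
step = 1.0 / np.linalg.eigvalsh(K).max()
n_steps = int(round((1.0 / lam) / step))
alpha = np.zeros_like(y)
for _ in range(n_steps):
    alpha += step * (y - K @ alpha)
f_gd = K @ alpha
```

With these settings the two fitted vectors `f_krr` and `f_gd` are close, consistent with the early-stopping/ridge analogy the abstract describes.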
Non-linear, Sparse Dimensionality Reduction via Path Lasso Penalized Autoencoders
High-dimensional data sets are often analyzed and explored via the
construction of a latent low-dimensional space which enables convenient
visualization and efficient predictive modeling or clustering. For complex data
structures, linear dimensionality reduction techniques like PCA may not be
sufficiently flexible to enable low-dimensional representation. Non-linear
dimension reduction techniques, like kernel PCA and autoencoders, suffer from
loss of interpretability, since each latent variable depends on all input
dimensions. To address this limitation, we here present path lasso penalized
autoencoders. This structured regularization enhances interpretability by
penalizing each path through the encoder from an input to a latent variable,
thus restricting how many input variables are represented in each latent
dimension. Our algorithm uses a group lasso penalty and non-negative matrix
factorization to construct a sparse, non-linear latent representation. We
compare the path lasso regularized autoencoder to PCA, sparse PCA, autoencoders
and sparse autoencoders on real and simulated data sets. We show that the
algorithm exhibits much lower reconstruction errors than sparse PCA and
parameter-wise lasso regularized autoencoders for low-dimensional
representations. Moreover, path lasso representations provide a more accurate
reconstruction match, i.e., better preserved relative distances between objects
in the original and reconstructed spaces.
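To make the notion of a "path" concrete, here is a small, hypothetical numpy sketch of penalizing input-to-latent paths in a two-layer encoder. It is not the paper's algorithm (which combines a group lasso penalty with non-negative matrix factorization); taking a path's strength as the product of absolute weights along it, summed over all paths between an input and a latent unit, is an illustrative assumption.

```python
import numpy as np

# Hypothetical illustration of a path-style penalty for a two-layer encoder
# (input -> hidden -> latent); NOT the paper's exact objective. With path
# strength defined as the product of absolute weights along the path, the
# total strength of all paths from input i to latent j is entry (j, i) of
# |W2| @ |W1|.

rng = np.random.default_rng(1)
W1 = rng.standard_normal((5, 8))  # hidden x input weights
W2 = rng.standard_normal((3, 5))  # latent x hidden weights

path_strength = np.abs(W2) @ np.abs(W1)  # latent x input
penalty = path_strength.sum()

# Driving path_strength[j, i] to zero means input i no longer influences
# latent dimension j, which is the sparsity pattern the abstract describes.
```

Zeroing all of input i's outgoing weights in `W1` makes column i of `path_strength` vanish, i.e., that input is excluded from every latent dimension.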
A meta-data based method for DNA microarray imputation
BACKGROUND: DNA microarray experiments are conducted in logical sets, such as time course profiling after a treatment is applied to the samples, or comparisons of the samples under two or more conditions. Due to cost and design constraints of spotted cDNA microarray experiments, each logical set commonly includes only a small number of replicates per condition. Despite the vast improvement of microarray technology in recent years, missing values are prevalent. Intuitively, imputation of missing values is best done using many replicates within the same logical set. In practice, there are few replicates, and thus reliable imputation within logical sets is difficult. However, it is in the case of few replicates that the presence of missing values, and how they are imputed, can have the most profound impact on the outcome of downstream analyses (e.g. significance analysis and clustering). This study explores the feasibility of imputation across logical sets, using the vast amount of publicly available microarray data to improve imputation reliability in the small sample size setting.
RESULTS: We download all cDNA microarray data of Saccharomyces cerevisiae, Arabidopsis thaliana, and Caenorhabditis elegans from the Stanford Microarray Database. Through cross-validation and simulation, we find that, for all three species, our proposed imputation using data from public databases is far superior to imputation within a logical set, sometimes to an astonishing degree. Furthermore, the imputation root mean square error for significant genes is generally much lower than that of non-significant ones.
CONCLUSION: Since downstream analysis of significant genes, such as clustering and network analysis, can be very sensitive to small perturbations of estimated gene effects, it is highly recommended that researchers apply reliable data imputation prior to further analysis. Our method can also be applied to cDNA microarray experiments from other species, provided good reference data are available.
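For intuition, the sketch below implements a standard KNNimpute-style scheme, filling each missing entry from the most similar genes in the same matrix. The study's contribution, pooling external reference data from public databases, is not reproduced here; the function name and the toy matrix are assumptions.

```python
import numpy as np

# Minimal KNNimpute-style sketch (rows = genes, columns = arrays). This is
# the classic within-set baseline, not the paper's meta-data method that
# additionally pools public reference data.

def knn_impute(X, k=1):
    """Fill each NaN with the mean of the k most similar genes' values
    in that array, with similarity measured on the shared observed columns."""
    X = np.asarray(X, dtype=float).copy()
    missing = np.isnan(X)
    for g, a in zip(*np.where(missing)):
        obs = ~missing[g]  # columns observed for gene g
        dists = []
        for h in range(X.shape[0]):
            if h == g or missing[h, a]:
                continue  # candidate gene must be observed in array a
            shared = obs & ~missing[h]
            if not shared.any():
                continue
            dists.append((np.mean((X[g, shared] - X[h, shared]) ** 2), h))
        dists.sort()
        X[g, a] = np.mean([X[h, a] for _, h in dists[:k]])
    return X

X = np.array([[1.0, 2.0, 3.0],
              [1.0, 2.0, np.nan],
              [10.0, 20.0, 30.0],
              [10.0, 20.0, 31.0]])
X_imp = knn_impute(X, k=1)
# The NaN is filled with 3.0, the value of the (identical) first gene.
```

The paper's point is that when rows like gene 2's duplicate are scarce within a logical set, borrowing them from a large public reference pool makes this kind of imputation far more reliable.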
Network modeling of the transcriptional effects of copy number aberrations in glioblastoma
DNA copy number aberrations (CNAs) are a characteristic feature of cancer genomes. In this work, Rebecka Jörnsten, Sven Nelander and colleagues combine network modeling and experimental methods to analyze the systems-level effects of CNAs in glioblastoma.
Clustering and classification based on the L1 data depth
Clustering and classification are important tasks for the analysis of microarray gene expression data. Classification of tissue samples can be a valuable diagnostic tool for diseases such as cancer. Clustering samples or experiments may lead to the discovery of subclasses of diseases. Clustering genes can help identify groups of genes that respond similarly to a set of experimental conditions. We also need validation tools for clustering and classification. Here, we focus on the identification of outliers: units that may have been misallocated, or mislabeled, or are not representative of the classes or clusters. We present two new methods, DDclust and DDclass, for clustering and classification. These non-parametric methods are based on the intuitively simple concept of data depth. We apply the methods to several gene expression and simulated data sets. We also discuss a convenient visualization and validation tool: the relative data depth plot.
Keywords: clustering, classification, data depth, relative data depth
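The L1 depth underlying these methods has a short closed form: the depth of a point x with respect to a sample is one minus the norm of the average unit vector pointing from x to the sample points, so the deepest point is the spatial (L1) median. A minimal sketch (not the authors' DDclust/DDclass code):

```python
import numpy as np

# L1 data depth sketch: depth(x) = 1 - || average unit vector from x to the
# sample points ||. Points deep inside the cloud get depth near 1; far-away
# points get depth near 0. Sample points coinciding with x are taken to
# contribute a zero vector to the average (a simplifying convention here).

def l1_depth(x, X):
    diffs = X - x
    norms = np.linalg.norm(diffs, axis=1)
    nonzero = norms > 0
    units = diffs[nonzero] / norms[nonzero, None]
    avg = units.sum(axis=0) / len(X)
    return 1.0 - np.linalg.norm(avg)
```

For a sample placed symmetrically around the origin, the unit vectors cancel and the origin attains the maximal depth of 1, while an outlier's depth is close to 0; this is the contrast DDclust and DDclass exploit to flag misallocated units.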
Bandwidth Selection for Gaussian Kernel Ridge Regression via Jacobian Control
Most machine learning methods require tuning of hyper-parameters. For kernel
ridge regression with the Gaussian kernel, the hyper-parameter is the
bandwidth. The bandwidth specifies the length-scale of the kernel and has to be
carefully selected in order to obtain a model with good generalization. The
default methods for bandwidth selection are cross-validation and marginal
likelihood maximization, which often yield good results, albeit at high
computational costs. Furthermore, the estimates provided by these methods tend
to have very high variance, especially when training data are scarce. Inspired
by Jacobian regularization, we formulate an approximate expression for how the
derivatives of the functions inferred by kernel ridge regression with the
Gaussian kernel depend on the kernel bandwidth. We then use this expression to
propose a closed-form, computationally feather-light, bandwidth selection
heuristic based on controlling the Jacobian. In addition, the Jacobian
expression illuminates how the bandwidth selection is a trade-off between the
smoothness of the inferred function, and the conditioning of the training data
kernel matrix. We show on real and synthetic data that, compared to
cross-validation and marginal likelihood maximization, our method is
considerably faster and considerably more stable in terms of bandwidth
selection.
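The smoothness-versus-conditioning trade-off mentioned above can be made concrete. The sketch below is not the paper's Jacobian heuristic; it merely measures, for a few candidate bandwidths on a toy problem (all values assumed for illustration), the condition number of the kernel matrix and the mean squared slope of the fitted function.

```python
import numpy as np

# Illustration of the trade-off named in the abstract (not the paper's
# Jacobian-control heuristic): as the Gaussian bandwidth grows, the fitted
# function gets smoother, but the kernel matrix becomes worse conditioned.

def gaussian_kernel(X, Z, bw):
    d2 = (X[:, None] - Z[None, :]) ** 2
    return np.exp(-d2 / (2 * bw ** 2))

X = np.linspace(-1, 1, 20)
y = np.sin(3 * X)
grid = np.linspace(-1, 1, 400)
lam = 1e-4

conds, roughness = [], []
for bw in (0.02, 0.2, 0.8):
    K = gaussian_kernel(X, X, bw)
    conds.append(np.linalg.cond(K))
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    f = gaussian_kernel(grid, X, bw) @ alpha
    # mean squared slope of the fit, via finite differences on the grid
    roughness.append(np.mean((np.diff(f) / np.diff(grid)) ** 2))
```

On this toy data the condition numbers increase with the bandwidth while the roughness of the fit decreases, which is exactly the tension a bandwidth selection rule has to balance.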