
    Linear Regression with Random Projections

    We investigate a method for regression that makes use of a randomly generated subspace G_P (of finite dimension P) of a given large (possibly infinite-dimensional) function space F, for example L_2([0,1]^d). G_P is defined as the span of P random features that are linear combinations of basis functions of F weighted by random Gaussian i.i.d. coefficients. We give practical motivation for the use of this approach, detail the link that this random-projection method shares with the theory of RKHS and Gaussian objects, prove, in both deterministic and random design, approximation error bounds when searching for the best regression function in G_P rather than in F, and derive excess risk bounds for a specific regression algorithm (least-squares regression in G_P). This paper stresses the motivation to study such methods; the analysis is therefore kept simple for the sake of exposition and leaves room for future developments.
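
    A minimal sketch of this construction, assuming a truncated cosine basis of L_2([0,1]) and a synthetic regression problem (the basis, the number of features P, and the data are illustrative choices, not the authors' setup): build P random features as i.i.d. Gaussian combinations of the basis functions and run least squares in the resulting subspace G_P.

```python
import numpy as np

rng = np.random.default_rng(0)

# Truncated cosine basis of L2([0,1]) (illustrative choice; any countable basis works).
def basis(x, F=200):
    k = np.arange(F)
    return np.cos(np.pi * np.outer(x, k))          # shape (n, F)

# P random features: i.i.d. Gaussian linear combinations of the basis functions.
P, F = 30, 200
A = rng.standard_normal((F, P)) / np.sqrt(F)       # random mixing coefficients

def random_features(x):
    return basis(x, F) @ A                          # shape (n, P), spans G_P

# Synthetic regression data on [0,1].
n = 500
x = rng.uniform(0, 1, n)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(n)

# Ordinary least squares in G_P (search over the random subspace, not all of F).
Phi = random_features(x)
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)

x_test = np.linspace(0, 1, 200)
y_hat = random_features(x_test) @ theta            # estimated regression function in G_P
```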

    Linear regression with random projections

    We consider ordinary (non-penalized) least-squares regression where the regression function is chosen in a randomly generated subspace G_P ⊂ S of finite dimension P, where S is a function space of infinite dimension, e.g. L_2([0,1]^d). G_P is defined as the span of P random features that are linear combinations of the basis functions of S weighted by random Gaussian i.i.d. coefficients. We characterize the so-called kernel space K ⊂ S of the resulting Gaussian process and derive approximation error bounds of order O(||f||^2_K log(P)/P) for functions f ∈ K approximated in G_P. We apply this result to derive excess risk bounds for the least-squares estimate in various spaces. For illustration, we consider regression using so-called scrambled wavelets (i.e. random linear combinations of wavelets of L_2([0,1]^d)) and derive an excess risk rate O(||f*||_K (log N)/sqrt(N)), which is arbitrarily close to the minimax optimal rate (up to a logarithmic factor) for target functions f* in K = H^s([0,1]^d), a Sobolev space of smoothness order s > d/2. We describe an efficient implementation using lazy expansions with numerical complexity Õ(2dN^{3/2} log N + N^2), where d is the dimension of the input data and N is the number of data points.
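
    For the scrambled-wavelet illustration above, a minimal sketch assuming the one-dimensional Haar system on [0,1] truncated at resolution level J (the paper works with general multi-dimensional wavelets and lazy expansions, which are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(1)

def haar(j, k, x):
    """Haar mother wavelet at scale j, shift k, on [0,1]."""
    t = (2.0 ** j) * x - k
    return (2.0 ** (j / 2)) * (((0 <= t) & (t < 0.5)).astype(float)
                               - ((0.5 <= t) & (t < 1.0)).astype(float))

def wavelet_matrix(x, J=6):
    """Evaluate the truncated Haar system (scaling function plus wavelets up to level J)."""
    cols = [np.ones_like(x)]                        # scaling (father) function
    for j in range(J):
        for k in range(2 ** j):
            cols.append(haar(j, k, x))
    return np.column_stack(cols)                    # shape (n, F) with F = 2**J

# Scrambled wavelets: P random Gaussian combinations of the wavelet basis functions.
n, P, J = 400, 50, 6
x = rng.uniform(0, 1, n)
y = np.abs(x - 0.4) + 0.05 * rng.standard_normal(n)

W = wavelet_matrix(x, J)                            # (n, F)
A = rng.standard_normal((W.shape[1], P))            # random mixing coefficients
Phi = W @ A                                         # scrambled-wavelet features spanning G_P

theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)     # least squares in G_P
y_hat = Phi @ theta                                 # fitted values on the training points
```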

    Random Projections For Large-Scale Regression

    Fitting linear regression models can be computationally very expensive in large-scale data analysis tasks if the sample size and the number of variables are very large. Random projections are extensively used as a dimension-reduction tool in machine learning and statistics. We discuss the applications of random projections in linear regression problems, developed to decrease computational costs, and give an overview of the theoretical guarantees on the generalization error. It can be shown that the combination of random projections with least-squares regression leads to similar recovery as ridge regression and principal component regression. We also discuss possible improvements when averaging over multiple random projections, an approach that lends itself easily to parallel implementation.
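
    A hedged sketch of the kind of procedure surveyed above: project the p covariates down to m random directions, solve ordinary least squares in the projected space, and average predictions over several independent projections. The Gaussian projection, the dimensions, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

n, p, m, n_proj = 2000, 500, 50, 10                 # sample size, variables, projected dim, repeats

# Synthetic data: y depends on a low-dimensional linear signal plus noise.
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:10] = 1.0
y = X @ beta + rng.standard_normal(n)

def projected_ols_prediction(X, y, X_new, m, rng):
    """OLS after projecting the p covariates down to m random directions."""
    R = rng.standard_normal((X.shape[1], m)) / np.sqrt(m)   # random projection matrix
    gamma, *_ = np.linalg.lstsq(X @ R, y, rcond=None)       # least squares in the m-dim space
    return X_new @ R @ gamma

# Averaging over several independent projections reduces the extra variance
# introduced by any single projection.
X_new = rng.standard_normal((100, p))
preds = np.mean([projected_ols_prediction(X, y, X_new, m, rng) for _ in range(n_proj)], axis=0)
```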

    Random projections for Bayesian regression

    This article deals with random projections applied as a data reduction technique for Bayesian regression analysis. We show sufficient conditions under which the entire d-dimensional distribution is approximately preserved under random projections by reducing the number of data points from n to k ∈ O(poly(d/ε)) in the case n ≫ d. Under mild assumptions, we prove that evaluating a Gaussian likelihood function based on the projected data instead of the original data yields a (1+O(ε))-approximation in terms of the ℓ_2 Wasserstein distance. Our main result shows that the posterior distribution of Bayesian linear regression is approximated up to a small error depending on only an ε-fraction of its defining parameters. This holds when using arbitrary Gaussian priors or the degenerate case of uniform distributions over R^d for β. Our empirical evaluations involve different simulated settings of Bayesian linear regression. Our experiments underline that the proposed method is able to recover the regression model up to small error while considerably reducing the total running time.
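
    A minimal sketch of the data-reduction idea, assuming a dense Gaussian sketch of the rows of the data and a conjugate Gaussian prior (the paper analyses structured ε-subspace embeddings; the sketch matrix, prior, and noise variance here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

n, d, k = 20_000, 10, 400                           # many observations, few parameters, sketch size

X = rng.standard_normal((n, d))
beta_true = rng.standard_normal(d)
sigma2 = 1.0
y = X @ beta_true + np.sqrt(sigma2) * rng.standard_normal(n)

# Reduce the data from n rows to k rows with a random projection applied to X and y.
# (A dense Gaussian sketch is used here for simplicity; structured embeddings
# avoid the cost of forming and applying a dense k x n matrix.)
S = rng.standard_normal((k, n)) / np.sqrt(k)
SX, Sy = S @ X, S @ y

# Conjugate Gaussian posterior for beta with prior N(0, tau2 * I), computed
# from the sketched data instead of the full data.
tau2 = 10.0
precision = SX.T @ SX / sigma2 + np.eye(d) / tau2
cov_post = np.linalg.inv(precision)
mean_post = cov_post @ (SX.T @ Sy) / sigma2
```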

    High-dimensional analysis of double descent for linear regression with random projections

    We consider linear regression problems with a varying number of random projections, where we provably exhibit a double descent curve for a fixed prediction problem, with a high-dimensional analysis based on random matrix theory. We first consider the ridge regression estimator and review earlier results using classical notions from non-parametric statistics, namely degrees of freedom, also known as effective dimensionality. We then compute asymptotic equivalents of the generalization performance (in terms of squared bias and variance) of the minimum-norm least-squares fit with random projections, providing simple expressions for the double descent phenomenon.
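
    A minimal numerical sketch of this setting, assuming Gaussian data and Gaussian random projections of the features (all constants are illustrative): sweep the number of projections m, compute the minimum-norm least-squares fit via the pseudoinverse, and record the test error, which typically peaks near the interpolation threshold m = n.

```python
import numpy as np

rng = np.random.default_rng(4)

n, d = 100, 300                                     # sample size and ambient feature dimension
X = rng.standard_normal((n, d)) / np.sqrt(d)
theta_true = rng.standard_normal(d)
y = X @ theta_true + 0.5 * rng.standard_normal(n)

X_test = rng.standard_normal((2000, d)) / np.sqrt(d)
y_test = X_test @ theta_true

test_error = {}
for m in [10, 25, 50, 75, 90, 100, 110, 150, 250, 400]:    # number of random projections
    S = rng.standard_normal((d, m)) / np.sqrt(m)            # random projection of the features
    # Minimum-norm least-squares fit in the projected space (pinv covers both the
    # under- and over-parameterized regimes).
    w = np.linalg.pinv(X @ S) @ y
    test_error[m] = np.mean((X_test @ S @ w - y_test) ** 2)

# The error typically rises as m approaches n (the interpolation threshold) and then
# decreases again as m grows further: the double descent curve.
```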

    Compressed Regression

    Recent research has studied the role of sparsity in high-dimensional regression and signal reconstruction, establishing theoretical limits for recovering sparse models from sparse data. This line of work shows that ℓ_1-regularized least-squares regression can accurately estimate a sparse linear model from n noisy examples in p dimensions, even if p is much larger than n. In this paper we study a variant of this problem where the original n input variables are compressed by a random linear transformation to m ≪ n examples in p dimensions, and establish conditions under which a sparse linear model can be successfully recovered from the compressed data. A primary motivation for this compression procedure is to anonymize the data and preserve privacy by revealing little information about the original data. We characterize the number of random projections that are required for ℓ_1-regularized compressed regression to identify the nonzero coefficients in the true model with probability approaching one, a property called "sparsistence." In addition, we show that ℓ_1-regularized compressed regression asymptotically predicts as well as an oracle linear model, a property called "persistence." Finally, we characterize the privacy properties of the compression procedure in information-theoretic terms, establishing upper bounds on the mutual information between the compressed and uncompressed data that decay to zero.
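
    A hedged sketch of the compression step described above, assuming a Gaussian compression matrix and using scikit-learn's Lasso as a stand-in ℓ_1 solver (dimensions, regularization strength, and data are illustrative, not the paper's construction):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(5)

n, p, m = 1000, 200, 300                            # examples, dimensions, compressed examples

X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 2.0                                      # sparse true model
y = X @ beta + 0.5 * rng.standard_normal(n)

# Compress the n examples to m rows with a random linear transformation.
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
Z, w = Phi @ X, Phi @ y                             # only (Z, w) would be released

# l1-regularized regression on the compressed data.
lasso = Lasso(alpha=0.1).fit(Z, w)
recovered_support = np.flatnonzero(lasso.coef_)     # ideally the first 5 coordinates
```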

    A Computationally Efficient Projection-Based Approach for Spatial Generalized Linear Mixed Models

    Inference for spatial generalized linear mixed models (SGLMMs) for high-dimensional non-Gaussian spatial data is computationally intensive. The computational challenge is due to the high-dimensional random effects and because Markov chain Monte Carlo (MCMC) algorithms for these models tend to be slow mixing. Moreover, spatial confounding inflates the variance of fixed-effect (regression coefficient) estimates. Our approach addresses both the computational and confounding issues by replacing the high-dimensional spatial random effects with a reduced-dimensional representation based on random projections. Standard MCMC algorithms mix well, and the reduced-dimensional setting speeds up computation per iteration. We show, via simulated examples, that Bayesian inference for this reduced-dimensional approach works well in terms of both inference and prediction; our methods also compare favorably to existing "reduced-rank" approaches. We also apply our methods to two real-world data examples, one on bird count data and the other on classifying rock types.
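
    A minimal sketch of the dimension-reduction step only, assuming an exponential spatial covariance and a randomized range finder (a random projection of the covariance) to build the reduced basis; the MCMC for the SGLMM itself is not shown, and all modelling choices below are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(6)

# Spatial locations and an exponential covariance for the latent spatial process.
n, rank = 2000, 50
coords = rng.uniform(0, 1, (n, 2))
dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
C = np.exp(-dists / 0.2)                            # n x n spatial covariance

# Random projection of the covariance: approximate its leading eigenvectors
# without a full eigendecomposition.
Omega = rng.standard_normal((n, rank + 10))         # random test matrix (slight oversampling)
Q, _ = np.linalg.qr(C @ Omega)                      # orthonormal basis for the range of C
B = Q.T @ C @ Q                                     # small (rank+10) x (rank+10) problem
evals, evecs = np.linalg.eigh(B)
idx = np.argsort(evals)[::-1][:rank]
U = Q @ evecs[:, idx]                               # reduced-dimensional spatial basis, n x rank

# In the SGLMM, the n-dimensional spatial random effect would be replaced by U @ delta,
# where delta is a rank-dimensional vector sampled inside the MCMC algorithm.
```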