Compressed Regression
Recent research has studied the role of sparsity in high dimensional
regression and signal reconstruction, establishing theoretical limits for
recovering sparse models from sparse data. This line of work shows that
$\ell_1$-regularized least squares regression can accurately estimate a sparse
linear model from $n$ noisy examples in $p$ dimensions, even if $p$ is much
larger than $n$. In this paper we study a variant of this problem where the
original $n$ input variables are compressed by a random linear transformation
to $m \ll n$ examples in $p$ dimensions, and establish conditions under which a
sparse linear model can be successfully recovered from the compressed data. A
primary motivation for this compression procedure is to anonymize the data and
preserve privacy by revealing little information about the original data. We
characterize the number of random projections that are required for
$\ell_1$-regularized compressed regression to identify the nonzero coefficients
in the true model with probability approaching one, a property called
"sparsistence." In addition, we show that $\ell_1$-regularized compressed
regression asymptotically predicts as well as an oracle linear model, a
property called "persistence." Finally, we characterize the privacy
properties of the compression procedure in information-theoretic terms,
establishing upper bounds on the mutual information between the compressed and
uncompressed data that decay to zero. Comment: 59 pages, 5 figures. Submitted for review.
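The compression step this abstract describes amounts to multiplying both the design matrix and the response by a random matrix before running the lasso. The sketch below is a minimal illustration of that idea (not the authors' implementation); the dimensions, noise level, and regularization weight alpha are illustrative assumptions.

```python
# Minimal sketch of compressed regression: compress the n rows of (X, y) with a
# random Gaussian projection, then run l1-regularized least squares on the m
# compressed rows. Sizes and alpha are illustrative, not from the paper.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, m, s = 200, 500, 80, 5                     # samples, dimension, projections, sparsity

X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 2.0                                   # sparse true model
y = X @ beta + 0.5 * rng.standard_normal(n)

Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # random linear compression of the rows
X_c, y_c = Phi @ X, Phi @ y                      # only the compressed data would be released

lasso = Lasso(alpha=0.1).fit(X_c, y_c)
support = np.flatnonzero(lasso.coef_)
print("estimated support:", support)             # ideally recovers indices 0..s-1
```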
Bayesian Compressed Regression
As an alternative to variable selection or shrinkage in high dimensional
regression, we propose to randomly compress the predictors prior to analysis.
This dramatically reduces storage and computational bottlenecks, performing
well when the predictors can be projected to a low dimensional linear subspace
with minimal loss of information about the response. As opposed to existing
Bayesian dimensionality reduction approaches, the exact posterior distribution
conditional on the compressed data is available analytically, speeding up
computation by many orders of magnitude while also bypassing robustness issues
due to convergence and mixing problems with MCMC. Model averaging is used to
reduce sensitivity to the random projection matrix, while accommodating
uncertainty in the subspace dimension. Strong theoretical support is provided
for the approach by showing near parametric convergence rates for the
predictive density in the large p small n asymptotic paradigm. Practical
performance relative to competitors is illustrated in simulations and real data
applications. Comment: 29 pages, 4 figures.
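A minimal sketch of the idea (not the paper's exact model): compress the predictors with a random matrix, use a conjugate Gaussian prior so the posterior is available in closed form, and average predictions over several random projections. The subspace dimension k, prior scale tau2, noise variance sigma2, and number of projections below are illustrative assumptions.

```python
# Bayesian regression on randomly compressed predictors, with model averaging
# over projections. A sketch under assumed hyperparameters, not the paper's code.
import numpy as np

rng = np.random.default_rng(1)
n, p, k, n_proj = 100, 1000, 20, 10
sigma2, tau2 = 1.0, 10.0                         # assumed noise and prior variances

X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[:3] = 1.5
y = X @ beta + np.sqrt(sigma2) * rng.standard_normal(n)
x_new = rng.standard_normal(p)                   # single test point

preds = []
for _ in range(n_proj):
    Phi = rng.standard_normal((p, k)) / np.sqrt(k)   # compress the p predictors to k
    Z = X @ Phi                                      # n x k compressed design
    # Conjugate Gaussian posterior for the coefficients in the compressed space.
    A = Z.T @ Z / sigma2 + np.eye(k) / tau2
    post_mean = np.linalg.solve(A, Z.T @ y / sigma2)
    preds.append((x_new @ Phi) @ post_mean)

print("model-averaged prediction:", float(np.mean(preds)))
```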
A Method for Compressing Parameters in Bayesian Models with Application to Logistic Sequence Prediction Models
Bayesian classification and regression with high order interactions is
largely infeasible because Markov chain Monte Carlo (MCMC) would need to be
applied with a great many parameters, whose number increases rapidly with the
order. In this paper we show how to make it feasible by effectively reducing
the number of parameters, exploiting the fact that many interactions have the
same values for all training cases. Our method uses a single "compressed"
parameter to represent the sum of all parameters associated with a set of
patterns that have the same value for all training cases. Using symmetric
stable distributions as the priors of the original parameters, we can easily
find the priors of these compressed parameters. We therefore need to deal only
with a much smaller number of compressed parameters when training the model
with MCMC. The number of compressed parameters may have converged before
considering the highest possible order. After training the model, we can split
these compressed parameters into the original ones as needed to make
predictions for test cases. We show in detail how to compress parameters for
logistic sequence prediction models. Experiments on both simulated and real
data demonstrate that a huge number of parameters can indeed be reduced by our
compression method. Comment: 29 pages.
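The compression trick can be illustrated in a few lines: interaction patterns whose indicator values agree on every training case contribute to the likelihood only through the sum of their coefficients, so each such group needs just one compressed parameter during MCMC. The toy binary sequences and order-3 patterns below are illustrative assumptions, not the paper's setup.

```python
# Count how many compressed parameters remain after grouping interaction
# patterns that are indistinguishable on the training cases. Illustrative only.
import numpy as np
from itertools import combinations, product

rng = np.random.default_rng(2)
n_cases, seq_len = 8, 5
data = rng.integers(0, 2, size=(n_cases, seq_len))        # binary training sequences

# Indicator vector of every order-3 pattern (a value triple at a position triple).
patterns, columns = [], []
for pos in combinations(range(seq_len), 3):
    for vals in product((0, 1), repeat=3):
        patterns.append((pos, vals))
        columns.append(np.all(data[:, pos] == vals, axis=1))

indicators = np.array(columns, dtype=int)                  # one row per pattern

# Patterns with identical indicator rows share one compressed parameter,
# since their coefficients enter the model only through their sum.
_, group_ids = np.unique(indicators, axis=0, return_inverse=True)
print("original parameters:  ", len(patterns))             # 10 triples x 8 values = 80
print("compressed parameters:", group_ids.max() + 1)
```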
Estimates on compressed neural networks regression
When the neural element number $n$ of a neural network is larger than the sample size $m$, the overfitting problem arises since there are more parameters than actual data (more variables than constraints). In order to overcome the overfitting problem, we propose to reduce the number of neural elements by using a compressed projection $A$ which does not need to satisfy the condition of the Restricted Isometry Property (RIP). By applying probability inequalities and approximation properties of the feedforward neural networks (FNNs), we prove that solving the FNNs regression learning algorithm in the compressed domain instead of the original domain reduces the sample error at the price of an increased (but controlled) approximation error, where the covering number theory is used to estimate the excess error, and an upper bound of the excess error is given.
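A minimal sketch under stated assumptions (not the paper's algorithm): generate $n$ random sigmoidal neural elements, compress them to d features with a random projection A (no RIP condition is checked), and solve the regression by least squares in the compressed domain. The sizes and the random sigmoid features are illustrative.

```python
# Regression with randomly compressed neural elements: more hidden units than
# samples, compressed to d < m features before fitting. Illustrative sketch.
import numpy as np

rng = np.random.default_rng(3)
m, n_units, d = 50, 500, 20                      # sample size, neural elements, compressed size

x = rng.uniform(-1, 1, size=(m, 1))
y = np.sin(3 * x[:, 0]) + 0.1 * rng.standard_normal(m)

W = rng.standard_normal((1, n_units))            # random inner weights of the FNN
b = rng.standard_normal(n_units)
H = 1.0 / (1.0 + np.exp(-(x @ W + b)))           # m x n_units hidden-layer features

A = rng.standard_normal((n_units, d)) / np.sqrt(d)   # compressed projection
coef, *_ = np.linalg.lstsq(H @ A, y, rcond=None)     # solve in the compressed domain

pred = (H @ A) @ coef
print("training RMSE:", float(np.sqrt(np.mean((pred - y) ** 2))))
```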