
    Exponential Screening and optimal rates of sparse estimation

    In high-dimensional linear regression, the goal pursued here is to estimate an unknown regression function using linear combinations of a suitable set of covariates. One of the key assumptions for the success of any statistical procedure in this setup is that the linear combination is sparse in some sense, for example, that it involves only a few covariates. We consider a general, not necessarily linear, regression with Gaussian noise and study a related question: finding a linear combination of approximating functions which is at the same time sparse and has small mean squared error (MSE). We introduce a new estimation procedure, called Exponential Screening, that shows remarkable adaptation properties. It adapts to the linear combination that optimally balances MSE and sparsity, whether the latter is measured in terms of the number of non-zero entries in the combination (the $\ell_0$ norm) or in terms of the global weight of the combination (the $\ell_1$ norm). The power of this adaptation result is illustrated by showing that Exponential Screening solves optimally and simultaneously all the problems of aggregation in Gaussian regression that have been discussed in the literature. Moreover, we show that the performance of the Exponential Screening estimator cannot be improved in a minimax sense, even if the optimal sparsity is known in advance. The theoretical and numerical superiority of Exponential Screening compared to state-of-the-art sparse procedures is also discussed.
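    A minimal sketch of the exponential-weighting idea behind this kind of procedure: least-squares fits over all small supports are combined with weights that trade off residual fit against support size. The temperature $4\sigma^2$, the size penalty, and the brute-force enumeration over supports below are illustrative assumptions, not the paper's exact construction, and are feasible only for a small dictionary.

```python
import itertools
import numpy as np

def exp_screening_sketch(X, y, sigma2, max_support=3):
    """Aggregate least-squares fits over small supports with exponential weights (illustrative)."""
    n, M = X.shape
    log_weights, fits = [], []
    for k in range(max_support + 1):
        for support in itertools.combinations(range(M), k):
            theta = np.zeros(M)
            if support:
                cols = X[:, list(support)]
                coef, *_ = np.linalg.lstsq(cols, y, rcond=None)
                theta[list(support)] = coef
            resid = y - X @ theta
            # exponential weight: data fit penalized by support size
            # (temperature 4*sigma2 and the k*log(M) penalty are assumed choices)
            log_weights.append(-np.sum(resid**2) / (4 * sigma2) - k * np.log(max(M, 2)))
            fits.append(theta)
    w = np.exp(np.array(log_weights) - max(log_weights))
    w /= w.sum()
    # the aggregate is a convex combination of the per-support estimators
    return w @ np.array(fits)

# usage: theta_hat = exp_screening_sketch(X, y, sigma2=1.0)
```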

    Discussion of ``2004 IMS Medallion Lecture: Local Rademacher complexities and oracle inequalities in risk minimization'' by V. Koltchinskii

    Discussion of ``2004 IMS Medallion Lecture: Local Rademacher complexities and oracle inequalities in risk minimization'' by V. Koltchinskii [arXiv:0708.0083]. Comment: Published at http://dx.doi.org/10.1214/009053606000001064 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    Linear and convex aggregation of density estimators

    We study the problem of linear and convex aggregation of $M$ estimators of a density with respect to the mean squared risk. We provide procedures for linear and convex aggregation and we prove oracle inequalities for their risks. We also obtain lower bounds showing that these procedures are rate optimal in a minimax sense. As an example, we apply the general results to aggregation of multivariate kernel density estimators with different bandwidths. We show that linear and convex aggregates mimic the kernel oracles in an asymptotically exact sense for a large class of kernels, including the Gaussian, Silverman and Pinsker kernels. We prove that, for the Pinsker kernel, the proposed aggregates are sharp asymptotically minimax simultaneously over a large scale of Sobolev classes of densities. Finally, we provide simulations demonstrating the performance of the convex aggregation procedure. Comment: 22 pages.
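    A minimal sketch of the convex-aggregation step for one-dimensional Gaussian kernel density estimators with different bandwidths, assuming a simple sample split and an exponentiated-gradient minimization of an empirical L2 risk over the simplex; both the split and the solver are illustrative assumptions, and the paper's procedure may differ.

```python
import numpy as np

def gauss_kde(train, h):
    """Return a 1-D Gaussian kernel density estimator with bandwidth h."""
    def p_hat(x):
        z = (np.atleast_1d(x)[:, None] - train[None, :]) / h
        return np.exp(-0.5 * z**2).sum(axis=1) / (len(train) * h * np.sqrt(2 * np.pi))
    return p_hat

def convex_aggregate(sample, bandwidths, grid, n_iter=500, lr=0.5):
    half = len(sample) // 2
    train, valid = sample[:half], sample[half:]
    estimators = [gauss_kde(train, h) for h in bandwidths]
    P_grid = np.array([p(grid) for p in estimators])    # values on an evenly spaced grid
    P_valid = np.array([p(valid) for p in estimators])  # values at held-out points
    dx = grid[1] - grid[0]
    lam = np.full(len(bandwidths), 1.0 / len(bandwidths))
    for _ in range(n_iter):
        # gradient of the empirical L2 risk:
        # integral of (sum_j lam_j p_j)^2 minus (2/n) sum over held-out points of sum_j lam_j p_j
        grad = 2 * (P_grid @ ((lam @ P_grid) * dx)) - 2 * P_valid.mean(axis=1)
        lam *= np.exp(-lr * grad)   # exponentiated-gradient step
        lam /= lam.sum()            # stay on the simplex
    return lam, lambda x: lam @ np.array([p(x) for p in estimators])
```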

    Sparse Regression Learning by Aggregation and Langevin Monte-Carlo

    We consider the problem of regression learning for deterministic design and independent random errors. We start by proving a sharp PAC-Bayesian type bound for the exponentially weighted aggregate (EWA) under the expected squared empirical loss. For a broad class of noise distributions the presented bound is valid whenever the temperature parameter $\beta$ of the EWA is larger than or equal to $4\sigma^2$, where $\sigma^2$ is the noise variance. A remarkable feature of this result is that it is valid even for unbounded regression functions, and the choice of the temperature parameter depends exclusively on the noise level. Next, we apply this general bound to the problem of aggregating the elements of a finite-dimensional linear space spanned by a dictionary of functions $\phi_1,\ldots,\phi_M$. We allow $M$ to be much larger than the sample size $n$, but we assume that the true regression function can be well approximated by a sparse linear combination of the functions $\phi_j$. Under this sparsity scenario, we propose an EWA with a heavy-tailed prior and we show that it satisfies a sparsity oracle inequality with leading constant one. Finally, we propose several Langevin Monte-Carlo algorithms to approximately compute such an EWA when the number $M$ of aggregated functions can be large. We discuss in some detail the convergence of these algorithms and present numerical experiments that confirm our theoretical findings. Comment: Short version published in COLT 200
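    A minimal sketch of an unadjusted Langevin algorithm for approximating such an exponentially weighted aggregate: iterates follow the gradient of the log of the exponential weights plus Gaussian noise, and the EWA is approximated by averaging post-burn-in iterates. The Student-type heavy-tailed prior, the step size, and the burn-in below are illustrative assumptions, not the exact algorithms analysed in the paper.

```python
import numpy as np

def ewa_langevin(Phi, y, beta, tau=0.1, step=1e-4, n_steps=20000, burn_in=5000, seed=0):
    """Approximate the EWA (posterior mean) by unadjusted Langevin dynamics (illustrative)."""
    rng = np.random.default_rng(seed)
    M = Phi.shape[1]
    theta = np.zeros(M)
    running_sum = np.zeros(M)
    for t in range(n_steps):
        resid = y - Phi @ theta
        # gradient of log of: exp(-||y - Phi theta||^2 / beta) * prod_j (tau^2 + theta_j^2)^(-2)
        grad = (2.0 / beta) * (Phi.T @ resid) - 4.0 * theta / (tau**2 + theta**2)
        theta = theta + 0.5 * step * grad + np.sqrt(step) * rng.standard_normal(M)
        if t >= burn_in:
            running_sum += theta
    # the EWA is approximated by the average of the post-burn-in iterates
    return running_sum / (n_steps - burn_in)

# usage: theta_hat = ewa_langevin(Phi, y, beta=4 * sigma2)
```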

    Estimation of matrices with row sparsity

    An increasing number of applications are concerned with recovering a sparse matrix from noisy observations. In this paper, we consider the setting where each row of the unknown matrix is sparse. We establish minimax optimal rates of convergence for estimating matrices with row sparsity. A major focus of the present paper is on the derivation of lower bounds.

    Estimation of high-dimensional low-rank matrices

    Suppose that we observe entries or, more generally, linear combinations of entries of an unknown $m\times T$ matrix $A$ corrupted by noise. We are particularly interested in the high-dimensional setting where the number $mT$ of unknown entries can be much larger than the sample size $N$. Motivated by several applications, we consider estimation of the matrix $A$ under the assumption that it has small rank. This can be viewed as a dimension reduction or sparsity assumption. In order to shrink toward a low-rank representation, we investigate penalized least squares estimators with a Schatten-$p$ quasi-norm penalty term, $p\leq 1$. We study these estimators under two possible assumptions: a modified version of the restricted isometry condition and a uniform bound on the ratio "empirical norm induced by the sampling operator / Frobenius norm." The main results are stated as nonasymptotic upper bounds on the prediction risk and on the Schatten-$q$ risk of the estimators, where $q\in[p,2]$. The rates that we obtain for the prediction risk are of the form $rm/N$ (for $m=T$), up to logarithmic factors, where $r$ is the rank of $A$. The particular examples of multi-task learning and matrix completion are worked out in detail. The proofs are based on tools from the theory of empirical processes. As a by-product, we derive bounds for the $k$th entropy numbers of the quasi-convex Schatten class embeddings $S_p^M\hookrightarrow S_2^M$, $p<1$, which are of independent interest. Comment: Published at http://dx.doi.org/10.1214/10-AOS860 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
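    A minimal sketch of the $p=1$ (nuclear-norm) instance of such Schatten-penalized least squares, solved by proximal gradient with singular-value soft-thresholding; the matrix-completion-style mask, step size, and penalty level are illustrative assumptions rather than the paper's exact setting.

```python
import numpy as np

def svd_soft_threshold(A, level):
    """Soft-threshold the singular values of A at the given level."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - level, 0.0)) @ Vt

def nuclear_norm_ls(Y, mask, lam, step=1.0, n_iter=200):
    """Minimize 0.5 * ||mask * (A - Y)||_F^2 + lam * ||A||_* by proximal gradient."""
    A = np.zeros_like(Y)
    for _ in range(n_iter):
        grad = mask * (A - Y)                          # gradient of the quadratic data-fit term
        A = svd_soft_threshold(A - step * grad, step * lam)
    return A

# usage: A_hat = nuclear_norm_ls(Y_observed, mask, lam=0.1)
```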