Exponential Screening and optimal rates of sparse estimation
In high-dimensional linear regression, the goal pursued here is to estimate
an unknown regression function using linear combinations of a suitable set of
covariates. A key assumption for the success of any statistical
procedure in this setup is that the linear combination is sparse in
some sense, for example, that it involves only a few covariates. We consider a
general, not necessarily linear, regression with Gaussian noise and study a
related question that is to find a linear combination of approximating
functions, which is at the same time sparse and has small mean squared error
(MSE). We introduce a new estimation procedure, called Exponential Screening,
that shows remarkable adaptation properties. It adapts to the linear
combination that optimally balances MSE and sparsity, whether the latter is
measured in terms of the number of non-zero entries in the combination
($\ell_0$ norm) or in terms of the global weight of the combination ($\ell_1$
norm). The power of this adaptation result is illustrated by showing that
Exponential Screening solves optimally and simultaneously all the problems of
aggregation in Gaussian regression that have been discussed in the literature.
Moreover, we show that the performance of the Exponential Screening estimator
cannot be improved in a minimax sense, even if the optimal sparsity is known in
advance. The theoretical and numerical superiority of Exponential Screening
compared to state-of-the-art sparse procedures is also discussed.
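The abstract does not spell out the estimator, but Exponential Screening can be summarized as averaging least-squares fits over candidate supports with exponential weights that favor small supports. Below is a minimal Python sketch for a toy-sized dictionary; the temperature $4\sigma^2$ follows the usual exponential-weighting convention, and the simplified prior $(2M)^{-|S|}$ is an assumption of this sketch, not the paper's exact choice.

```python
import itertools

import numpy as np
from scipy.special import logsumexp

def exponential_screening(X, y, sigma2):
    """Exhaustive exponential-weights aggregate over all supports.
    Feasible only for a toy-sized dictionary (2^M least-squares fits)."""
    n, M = X.shape
    fits, log_w = [], []
    for size in range(M + 1):
        for S in itertools.combinations(range(M), size):
            theta = np.zeros(M)
            if size:
                cols = list(S)
                # least-squares fit restricted to the support S
                theta[cols] = np.linalg.lstsq(X[:, cols], y, rcond=None)[0]
            resid = np.sum((y - X @ theta) ** 2)
            # simplified sparsity prior (2M)^{-|S|} -- an assumption,
            # not the exact prior of the paper
            log_w.append(-resid / (4.0 * sigma2) - size * np.log(2.0 * M))
            fits.append(theta)
    w = np.exp(np.asarray(log_w) - logsumexp(log_w))
    return w @ np.stack(fits)  # weighted average of the support-wise fits
```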
Discussion of ``2004 IMS Medallion Lecture: Local Rademacher complexities and oracle inequalities in risk minimization'' by V. Koltchinskii
Discussion of ``2004 IMS Medallion Lecture: Local Rademacher complexities and
oracle inequalities in risk minimization'' by V. Koltchinskii
[arXiv:0708.0083]. Comment: Published at
http://dx.doi.org/10.1214/009053606000001064 in the Annals of Statistics
(http://www.imstat.org/aos/) by the Institute of Mathematical Statistics
(http://www.imstat.org).
Linear and convex aggregation of density estimators
We study the problem of linear and convex aggregation of estimators of a
density with respect to the mean squared risk. We provide procedures for linear
and convex aggregation and we prove oracle inequalities for their risks. We
also obtain lower bounds showing that these procedures are rate optimal in a
minimax sense. As an example, we apply the general results to aggregation of
multivariate kernel density estimators with different bandwidths. We show that
linear and convex aggregates mimic the kernel oracles in an asymptotically
exact sense for a large class of kernels, including the Gaussian, Silverman's
and Pinsker's kernels. We prove that, for Pinsker's kernel, the proposed aggregates
are sharp asymptotically minimax simultaneously over a large scale of Sobolev
classes of densities. Finally, we provide simulations demonstrating performance
of the convex aggregation procedure. Comment: 22 pages.
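As an illustration of the convex aggregation step, here is a minimal numerical sketch in Python (numpy/scipy): the sample is split, one Gaussian KDE per candidate bandwidth is fit on the first half, and convex weights minimizing an empirical L2-risk criterion are chosen on the second half. The sample splitting, the uniform integration grid, and the function names are assumptions of this sketch, not the paper's exact procedure.

```python
import numpy as np
from scipy.optimize import minimize

def gauss_kde(train, points, h):
    """Gaussian kernel density estimate with bandwidth h, evaluated at points."""
    z = (points[:, None] - train[None, :]) / h
    return np.exp(-0.5 * z**2).mean(axis=1) / (h * np.sqrt(2 * np.pi))

def convex_aggregate(x, bandwidths, grid):
    """Convex aggregation of KDEs: minimize an estimate of the L2 risk
    ||sum_j w_j p_j||^2 - 2 E[sum_j w_j p_j(X)] over the simplex."""
    x1, x2 = x[: len(x) // 2], x[len(x) // 2 :]
    P = np.stack([gauss_kde(x1, grid, h) for h in bandwidths])  # on the grid
    B = np.stack([gauss_kde(x1, x2, h) for h in bandwidths])    # at held-out points
    dg = grid[1] - grid[0]                 # uniform grid spacing assumed
    G = P @ P.T * dg                       # Gram matrix of the estimators in L2
    b = B.mean(axis=1)                     # estimate of E p_j(X) from x2
    risk = lambda w: w @ G @ w - 2 * w @ b
    cons = ({"type": "eq", "fun": lambda w: w.sum() - 1},)
    w0 = np.full(len(bandwidths), 1.0 / len(bandwidths))
    res = minimize(risk, w0, bounds=[(0, 1)] * len(bandwidths), constraints=cons)
    return res.x, res.x @ P                # weights and aggregated density on grid
```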
Sparse Regression Learning by Aggregation and Langevin Monte-Carlo
We consider the problem of regression learning for deterministic design and
independent random errors. We start by proving a sharp PAC-Bayesian type bound
for the exponentially weighted aggregate (EWA) under the expected squared
empirical loss. For a broad class of noise distributions the presented bound is
valid whenever the temperature parameter $\beta$ of the EWA is larger than or
equal to $4\sigma^2$, where $\sigma^2$ is the noise variance. A remarkable
feature of this result is that it is valid even for unbounded regression
functions and the choice of the temperature parameter depends exclusively on
the noise level. Next, we apply this general bound to the problem of
aggregating the elements of a finite-dimensional linear space spanned by a
dictionary of functions $\phi_1,\dots,\phi_M$. We allow $M$ to be much larger
than the sample size $n$ but we assume that the true regression function can be
well approximated by a sparse linear combination of functions $\phi_j$. Under
this sparsity scenario, we propose an EWA with a heavy-tailed prior and we show
that it satisfies a sparsity oracle inequality with leading constant one.
Finally, we propose several Langevin Monte-Carlo algorithms to approximately
compute such an EWA when the number $M$ of aggregated functions can be large.
We discuss in some detail the convergence of these algorithms and present
numerical experiments that confirm our theoretical findings. Comment: Short
version published in COLT 2009.
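A minimal sketch of the Langevin approach, assuming the simplest unadjusted Langevin dynamics and a Student-type prior $\propto \prod_j (\tau^2 + \theta_j^2)^{-2}$ as an illustrative stand-in for the paper's heavy-tailed prior; the step size, iteration counts, and $\tau$ are placeholder values.

```python
import numpy as np

def lmc_ewa(Phi, y, beta, tau=0.1, step=1e-4, n_iter=20000, burn=5000, seed=0):
    """Unadjusted Langevin dynamics targeting the EWA pseudo-posterior
    pi(theta) ∝ exp(-||y - Phi theta||^2 / beta) * prior(theta),
    with prior(theta) ∝ prod_j (tau^2 + theta_j^2)^{-2} (an assumption).
    The EWA estimate is approximated by averaging the post-burn-in iterates."""
    rng = np.random.default_rng(seed)
    n, M = Phi.shape
    theta = np.zeros(M)
    avg = np.zeros(M)
    for k in range(n_iter):
        # gradient of the log pseudo-posterior: data term + prior term
        grad = (2.0 / beta) * Phi.T @ (y - Phi @ theta) \
               - 4.0 * theta / (tau**2 + theta**2)
        theta = theta + step * grad + np.sqrt(2 * step) * rng.standard_normal(M)
        if k >= burn:
            avg += (theta - avg) / (k - burn + 1)  # running mean of iterates
    return avg
```

In line with the bound quoted above, the temperature would be set to $\beta \ge 4\sigma^2$.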
Estimation of matrices with row sparsity
An increasing number of applications are concerned with recovering a sparse
matrix from noisy observations. In this paper, we consider the setting where
each row of the unknown matrix is sparse. We establish minimax optimal rates of
convergence for estimating matrices with row sparsity. A major focus in the
present paper is on the derivation of lower bounds.
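As a concrete (hypothetical) example of an estimator adapted to row sparsity, one can apply group soft-thresholding to the rows of a fully observed noisy matrix; this is an illustrative sketch, not the estimator analyzed in the paper.

```python
import numpy as np

def row_soft_threshold(Y, lam):
    """Row-wise group soft-thresholding: shrink each row of the noisy
    observation Y toward zero by lam in Euclidean norm, zeroing any row
    whose norm falls below lam. A natural estimator when each row of the
    unknown matrix is sparse or entirely zero (illustrative sketch)."""
    norms = np.linalg.norm(Y, axis=1, keepdims=True)
    scale = np.clip(1.0 - lam / np.maximum(norms, 1e-12), 0.0, None)
    return scale * Y
```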
Estimation of high-dimensional low-rank matrices
Suppose that we observe entries or, more generally, linear combinations of
entries of an unknown $m\times T$-matrix $A$ corrupted by noise. We are
particularly interested in the high-dimensional setting where the number $mT$
of unknown entries can be much larger than the sample size $n$. Motivated by
several applications, we consider estimation of matrix $A$ under the assumption
that it has small rank. This can be viewed as a dimension reduction or sparsity
assumption. In order to shrink toward a low-rank representation, we investigate
penalized least squares estimators with a Schatten-$p$ quasi-norm penalty term,
$0<p\le 1$. We study these estimators under two possible assumptions---a modified
version of the restricted isometry condition and a uniform bound on the ratio
"empirical norm induced by the sampling operator/Frobenius norm." The main
results are stated as nonasymptotic upper bounds on the prediction risk and on
the Schatten-$q$ risk of the estimators, where $q\in[p,2]$. The rates that we
obtain for the prediction risk are of the form $rm/n$ (for $m=T$), up to
logarithmic factors, where $r$ is the rank of $A$. The particular examples of
multi-task learning and matrix completion are worked out in detail. The proofs
are based on tools from the theory of empirical processes. As a by-product, we
derive bounds for the $k$th entropy numbers of the quasi-convex Schatten class
embeddings $S_p^M \hookrightarrow S_2^M$, $0<p\le 1$, which are of independent
interest. Comment: Published at http://dx.doi.org/10.1214/10-AOS860 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
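For the convex endpoint $p=1$ of the Schatten-$p$ family (the nuclear norm), the fully observed denoising case admits a closed-form penalized least squares solution by soft-thresholding the singular values; a short sketch is below (for $p<1$ the penalty is non-convex and no such closed form exists).

```python
import numpy as np

def singular_value_threshold(Y, lam):
    """Closed-form minimizer of 0.5 * ||Y - A||_F^2 + lam * ||A||_S1
    when every entry of the matrix is observed: soft-threshold the
    singular values of Y. This instantiates the Schatten-p penalty of
    the abstract at p = 1 (the nuclear norm)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vt  # shrunken reconstruction
```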