Discussion of ``2004 IMS Medallion Lecture: Local Rademacher complexities and oracle inequalities in risk minimization'' by V. Koltchinskii [arXiv:0708.0083]
Sparse Regression Learning by Aggregation and Langevin Monte-Carlo
We consider the problem of regression learning for deterministic design and
independent random errors. We start by proving a sharp PAC-Bayesian type bound
for the exponentially weighted aggregate (EWA) under the expected squared
empirical loss. For a broad class of noise distributions the presented bound is
valid whenever the temperature parameter $\beta$ of the EWA is larger than or
equal to $4\sigma^2$, where $\sigma^2$ is the noise variance. A remarkable
feature of this result is that it is valid even for unbounded regression
functions and the choice of the temperature parameter depends exclusively on
the noise level. Next, we apply this general bound to the problem of
aggregating the elements of a finite-dimensional linear space spanned by a
dictionary of functions $\phi_1,\dots,\phi_M$. We allow $M$ to be much larger
than the sample size $n$ but we assume that the true regression function can be
well approximated by a sparse linear combination of functions $\phi_j$. Under
this sparsity scenario, we propose an EWA with a heavy tailed prior and we show
that it satisfies a sparsity oracle inequality with leading constant one.
Finally, we propose several Langevin Monte-Carlo algorithms to approximately
compute such an EWA when the number $M$ of aggregated functions can be large.
We discuss in some detail the convergence of these algorithms and present
numerical experiments that confirm our theoretical findings.
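
To make the last step concrete, the sketch below shows one way to approximate an EWA with a Langevin-type sampler. It is a minimal illustration, not the algorithms analyzed above: the step size, the Student-type heavy-tailed prior of scale `tau`, and the plain unadjusted Langevin update are assumptions made for the example.

```python
import numpy as np

def ewa_via_langevin(Phi, y, beta=1.0, tau=1.0, step=1e-4,
                     n_iter=20000, burn_in=5000, seed=0):
    """Approximate the exponentially weighted aggregate (EWA) by the
    unadjusted Langevin algorithm (ULA).

    Illustrative sketch only.  The target is the pseudo-posterior
        pi(theta) ~ exp(-||y - Phi @ theta||^2 / beta) * prior(theta),
    here with a heavy-tailed prior proportional to (tau^2 + theta_j^2)^{-2}
    in each coordinate.

    Phi : (n, M) matrix of dictionary functions evaluated at the design points.
    y   : (n,) response vector.
    Returns the posterior-mean estimate of theta (the aggregation weights).
    """
    rng = np.random.default_rng(seed)
    n, M = Phi.shape
    theta = np.zeros(M)
    samples = []
    for t in range(n_iter):
        # Gradient of the log-likelihood term -||y - Phi @ theta||^2 / beta.
        resid = y - Phi @ theta
        grad_loglik = (2.0 / beta) * (Phi.T @ resid)
        # Gradient of the log prior -2 * sum_j log(tau^2 + theta_j^2).
        grad_logprior = -4.0 * theta / (tau**2 + theta**2)
        grad = grad_loglik + grad_logprior
        # ULA update: gradient step plus Gaussian noise.
        theta = theta + step * grad + np.sqrt(2.0 * step) * rng.standard_normal(M)
        if t >= burn_in:
            samples.append(theta.copy())
    return np.mean(samples, axis=0)
```
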
Estimation of matrices with row sparsity
An increasing number of applications are concerned with recovering a sparse
matrix from noisy observations. In this paper, we consider the setting where
each row of the unknown matrix is sparse. We establish minimax optimal rates of
convergence for estimating matrices with row sparsity. A major focus in the
present paper is on the derivation of lower bounds.
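
As a simple illustration of the row-sparsity structure (not necessarily an estimator studied in the paper), one can threshold the rows of a noisy observation by their Euclidean norm; rows whose norm falls below the level `lam` are set to zero.

```python
import numpy as np

def row_group_threshold(Y, lam):
    """Row-wise group soft-thresholding of a noisy observation Y = A + noise.

    Illustrative sketch for a row-sparse target matrix A: each row of Y is
    shrunk toward zero, and rows with Euclidean norm below `lam` are zeroed.
    """
    norms = np.linalg.norm(Y, axis=1, keepdims=True)          # (n_rows, 1)
    scale = np.clip(1.0 - lam / np.maximum(norms, 1e-12), 0.0, None)
    return scale * Y
```
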
Estimation of high-dimensional low-rank matrices
Suppose that we observe entries or, more generally, linear combinations of
entries of an unknown $m\times T$-matrix $A$ corrupted by noise. We are
particularly interested in the high-dimensional setting where the number $mT$
of unknown entries can be much larger than the sample size $N$. Motivated by
several applications, we consider estimation of matrix $A$ under the assumption
that it has small rank. This can be viewed as dimension reduction or sparsity
assumption. In order to shrink toward a low-rank representation, we investigate
penalized least squares estimators with a Schatten-$p$ quasi-norm penalty term,
$0<p\le 1$. We study these estimators under two possible assumptions---a modified
version of the restricted isometry condition and a uniform bound on the ratio
"empirical norm induced by the sampling operator/Frobenius norm." The main
results are stated as nonasymptotic upper bounds on the prediction risk and on
the Schatten-$q$ risk of the estimators, where $q\in[p,2]$. The rates that we
obtain for the prediction risk are of the form $rm/N$ (for $m=T$), up to
logarithmic factors, where $r$ is the rank of $A$. The particular examples of
multi-task learning and matrix completion are worked out in detail. The proofs
are based on tools from the theory of empirical processes. As a by-product, we
derive bounds for the $k$th entropy numbers of the quasi-convex Schatten class
embeddings $S_p^M \hookrightarrow S_2^M$, $p<1$, which are of independent
interest.
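
For intuition, the special case $p=1$ with a fully observed matrix admits a closed-form penalized least squares solution by soft-thresholding of singular values. The sketch below shows only this simplified case, not the general sampling operators or the full range $0<p\le 1$ analyzed in the paper.

```python
import numpy as np

def singular_value_threshold(Y, lam):
    """Closed-form solution of
        min_A  ||Y - A||_F^2 + 2 * lam * ||A||_{S_1}
    when the full matrix Y is observed (Schatten-p penalty with p = 1).

    Illustrative special case only: shrink every singular value of Y by lam
    and zero out those that fall below lam.
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_thr = np.maximum(s - lam, 0.0)
    return (U * s_thr) @ Vt
```
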
Fast learning rates for plug-in classifiers under the margin condition
It has been recently shown that, under the margin (or low noise) assumption,
there exist classifiers attaining fast rates of convergence of the excess Bayes
risk, i.e., the rates faster than $n^{-1/2}$. The works on this subject
suggested the following two conjectures: (i) the best achievable fast rate is
of the order $n^{-1}$, and (ii) the plug-in classifiers generally converge
more slowly than the classifiers based on empirical risk minimization. We show that
both conjectures are not correct. In particular, we construct plug-in
classifiers that can achieve not only the fast, but also the {\it super-fast}
rates, i.e., the rates faster than $n^{-1}$. We establish minimax lower bounds
showing that the obtained rates cannot be improved.
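
A plug-in classifier of the kind discussed here estimates the regression function $\eta(x)=\mathrm{P}(Y=1\mid X=x)$ and thresholds the estimate at $1/2$. The sketch below uses a k-nearest-neighbor estimate of $\eta$ purely for illustration; the results in the paper are obtained with local polynomial estimators.

```python
import numpy as np

def plug_in_classifier(X_train, y_train, X_test, k=15):
    """Plug-in classification: estimate eta(x) = P(Y = 1 | X = x)
    nonparametrically and predict 1{eta_hat(x) >= 1/2}.

    Minimal sketch with a k-nearest-neighbor regression step, used here only
    as an interchangeable example of a nonparametric estimator of eta.
    """
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)   # distances to training points
        nn = np.argsort(dists)[:k]                    # indices of the k nearest neighbors
        eta_hat = y_train[nn].mean()                  # local estimate of P(Y = 1 | X = x)
        preds.append(1 if eta_hat >= 0.5 else 0)
    return np.array(preds)
```
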