68 research outputs found
Sharp oracle inequalities for Least Squares estimators in shape restricted regression
The performance of Least Squares (LS) estimators is studied in isotonic, unimodal and convex regression. Our results have the form of sharp oracle inequalities that account for the model misspecification error. In isotonic and unimodal regression, the LS estimator achieves the nonparametric rate $n^{-2/3}$ as well as a parametric rate of order $k/n$ up to logarithmic factors, where $k$ is the number of constant pieces of the true parameter. In univariate convex regression, the LS estimator satisfies an adaptive risk bound of order $q/n$ up to logarithmic factors, where $q$ is the number of affine pieces of the true regression function. This adaptive risk bound holds for any design points. While Guntuboyina and Sen (2013) established that the nonparametric rate of convex regression is of order $n^{-4/5}$ for equispaced design points, we show that the nonparametric rate of convex regression can be as slow as $n^{-2/3}$ for some worst-case design points. This phenomenon can be explained as follows: although convexity brings more structure than unimodality, for some worst-case design points this extra structure is uninformative and the nonparametric rates of unimodal regression and convex regression are both $n^{-2/3}$.
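For orientation, adaptive bounds of this kind are typically stated in the following form (a sketch with constants omitted; $\hat\mu$ denotes the LS estimator over the monotone cone, $\mu$ the true mean vector and $\sigma^2$ the noise variance, notation introduced here only for illustration):

$$ \frac{1}{n}\,\mathbb{E}\|\hat\mu - \mu\|_2^2 \;\lesssim\; \min_{k \ge 1} \left( \min_{u \in \mathcal{S}^{\uparrow}_k} \frac{1}{n}\|u - \mu\|_2^2 \;+\; \frac{\sigma^2 k}{n}\,\log\frac{en}{k} \right), $$

where $\mathcal{S}^{\uparrow}_k$ is the set of nondecreasing sequences with at most $k$ constant pieces: the first term accounts for misspecification and the second gives the parametric rate $k/n$ up to logarithmic factors.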
Concentration of quadratic forms under a Bernstein moment assumption
A concentration result for quadratic forms of independent subgaussian random variables is derived. If the moments of the random variables satisfy a "Bernstein condition", then the variance term of the Hanson-Wright inequality can be improved. The Bernstein condition is satisfied, for instance, by all log-concave subgaussian distributions.

Comment: This short note presents a result that initially appeared in arXiv:1410.0346v1 (see Assumption 3.3). The result was later removed from arXiv:1410.0346 and the published version https://projecteuclid.org/euclid.aos/1519268423 due to space constraints.
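For reference, the classical Hanson-Wright inequality whose variance term the note improves can be stated as follows (standard formulation, notation introduced here only for context). For a random vector $X = (X_1, \dots, X_n)$ with independent, mean-zero components satisfying $\|X_i\|_{\psi_2} \le K$ and a fixed matrix $A$,

$$ \mathbb{P}\left( \left| X^\top A X - \mathbb{E}\, X^\top A X \right| \ge t \right) \;\le\; 2 \exp\left( -c \min\left( \frac{t^2}{K^4 \|A\|_F^2},\; \frac{t}{K^2 \|A\|_{\mathrm{op}}} \right) \right), \qquad t \ge 0, $$

for an absolute constant $c > 0$. A Bernstein-type moment condition on a centered variable $X_i$ typically reads $\mathbb{E}|X_i|^k \le \tfrac{1}{2}\, k!\, \sigma^2 b^{k-2}$ for all integers $k \ge 2$ and some parameters $\sigma, b > 0$; the exact assumption used in the note may differ in its constants.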
Aggregation of supports along the Lasso path
In linear regression with fixed design, we propose two procedures that aggregate a data-driven collection of supports. The collection is a subset of the $2^p$ possible supports, where $p$ is the number of covariates, and both its cardinality and its elements can depend on the data. The procedures satisfy oracle inequalities with no assumption on the design matrix. Then we use these procedures to aggregate the supports that appear on the regularization path of the Lasso in order to construct an estimator that mimics the best Lasso estimator. If the restricted eigenvalue condition on the design matrix is satisfied, then this estimator achieves optimal prediction bounds. Finally, we discuss the computational cost of these procedures.
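A minimal sketch of the general pipeline, not the paper's exact procedures: collect the supports that appear along the Lasso regularization path, refit least squares on each support, and select one with an illustrative BIC-type penalty (the aggregation schemes analyzed in the paper are different; scikit-learn is assumed available).

# Sketch: data-driven collection of supports along the Lasso path.
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(0)
n, p, s = 100, 30, 3
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 1.0
y = X @ beta + rng.standard_normal(n)

# Supports appearing on the Lasso regularization path.
alphas, coefs, _ = lasso_path(X, y, n_alphas=50)
supports = {tuple(np.flatnonzero(coefs[:, j])) for j in range(coefs.shape[1])}
supports.discard(())  # drop the empty support for simplicity

def refit_rss(support):
    """Residual sum of squares of least squares restricted to `support`."""
    Xs = X[:, list(support)]
    coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    return np.sum((y - Xs @ coef) ** 2)

def bic(support):
    # Illustrative BIC-type criterion over the data-driven collection.
    return n * np.log(refit_rss(support) / n) + len(support) * np.log(n)

best = min(supports, key=bic)
print("selected support:", best)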
Optimal exponential bounds for aggregation of density estimators
We consider the problem of model selection type aggregation in the context of density estimation. We first show that empirical risk minimization is sub-optimal for this problem and that it shares this property with the exponential weights aggregate, empirical risk minimization over the convex hull of the dictionary functions, and all selectors. Using a penalty inspired by recent works on the $Q$-aggregation procedure, we derive a sharp oracle inequality in deviation under a simple boundedness assumption and we show that the rate is optimal in a minimax sense. Unlike the procedures based on exponential weights, this estimator is fully adaptive under the uniform prior. In particular, its construction does not rely on the sup-norm of the unknown density. By providing lower bounds with exponential tails, we show that the deviation term appearing in the sharp oracle inequalities cannot be improved.

Comment: Published at http://dx.doi.org/10.3150/15-BEJ742 in the Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).
Optimistic lower bounds for convex regularized least-squares
Minimax lower bounds are pessimistic in nature: for any given estimator, minimax lower bounds yield the existence of a worst-case target vector for which the prediction error of the given estimator is bounded from below. However, minimax lower bounds shed no light on the prediction error of the given estimator for target vectors different from this worst-case vector. A characterization of the prediction error of any convex regularized least-squares estimator is given. This characterization provides both a lower bound and an upper bound on the prediction error. This produces lower bounds that are applicable for any target vector and not only for a single, worst-case one. Finally, these lower and upper bounds on the prediction error are applied to the Lasso in sparse linear regression. We obtain a lower bound involving the compatibility constant for any tuning parameter, matching upper and lower bounds for the universal choice of the tuning parameter, and a lower bound for the Lasso with small tuning parameter.
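For concreteness, by a convex regularized least-squares estimator one typically means (notation introduced here for illustration, not taken from the abstract)

$$ \hat\beta \in \operatorname*{arg\,min}_{b \in \mathbb{R}^p} \left\{ \tfrac{1}{2}\|y - X b\|_2^2 + h(b) \right\}, $$

where $h$ is a convex penalty, e.g. $h(b) = \lambda \|b\|_1$ for the Lasso; the characterization above bounds the prediction error $\|X(\hat\beta - \beta^*)\|_2$ from both sides, where $\beta^*$ is the target vector.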
The cost-free nature of optimally tuning Tikhonov regularizers and other ordered smoothers
We consider the problem of selecting the best estimator among a family of Tikhonov regularized estimators, or, alternatively, of selecting a linear combination of these regularizers that is as good as the best regularizer in the family. Our theory reveals that if the Tikhonov regularizers share the same penalty matrix with different tuning parameters, a convex procedure based on $Q$-aggregation achieves the mean square error of the best estimator, up to a small error term no larger than $C\sigma^2$, where $\sigma^2$ is the noise level and $C>0$ is an absolute constant. Remarkably, the error term does not depend on the penalty matrix or the number of estimators as long as they share the same penalty matrix, i.e., it applies to any grid of tuning parameters, no matter how large the cardinality of the grid is. This reveals the surprising "cost-free" nature of optimally tuning Tikhonov regularizers, in striking contrast with the existing literature on aggregation of estimators where one typically has to pay a cost of $\sigma^2\log M$, where $M$ is the number of estimators in the family. The result holds, more generally, for any family of ordered linear smoothers. This encompasses Ridge regression as well as Principal Component Regression. The result is extended to the problem of tuning Tikhonov regularizers with different penalty matrices.
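For reference, a family of Tikhonov regularized estimators sharing the same penalty matrix can be written as (standard definitions, notation introduced here for illustration)

$$ \hat\beta_\lambda = \operatorname*{arg\,min}_{b} \left\{ \|y - X b\|_2^2 + \lambda\, b^\top P\, b \right\} = (X^\top X + \lambda P)^{-1} X^\top y, \qquad \lambda > 0, $$

where $P$ is a fixed positive semi-definite penalty matrix and the inverse is assumed to exist; Ridge regression corresponds to $P = I$. The family $\{\hat\beta_\lambda : \lambda \in \Lambda\}$ over a grid $\Lambda$ of tuning parameters is the collection being tuned above.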
Second order Stein: SURE for SURE and other applications in high-dimensional inference
Stein's formula states that a random variable of the form $z^\top f(z) - \operatorname{div} f(z)$ is mean-zero for functions $f$ with integrable gradient. Here, $\operatorname{div} f$ is the divergence of the function $f$ and $z$ is a standard normal vector. This paper aims to propose a Second Order Stein formula to characterize the variance of such random variables for all functions $f$ with square integrable gradient, and to demonstrate the usefulness of this formula in various applications.

In the Gaussian sequence model, a consequence of Stein's formula is Stein's Unbiased Risk Estimate (SURE), an unbiased estimate of the mean squared risk for almost any estimator $\hat\mu$ of the unknown mean. A first application of the Second Order Stein formula is an Unbiased Risk Estimate for SURE itself (SURE for SURE): an unbiased estimate providing information about the squared distance between SURE and the squared estimation error of $\hat\mu$. SURE for SURE has a simple form as a function of the data and is applicable to all $\hat\mu$ with square integrable gradient, e.g. the Lasso and the Elastic Net.

In addition to SURE for SURE, the following applications are developed: (1) Upper bounds on the risk of SURE when the estimation target is the mean squared error; (2) Confidence regions based on SURE; (3) Oracle inequalities satisfied by SURE-tuned estimates; (4) An upper bound on the variance of the size of the model selected by the Lasso; (5) Explicit expressions of SURE for SURE for the Lasso and the Elastic-Net; (6) In the linear model, a general semi-parametric scheme to de-bias a differentiable initial estimator for inference of a low-dimensional projection of the unknown regression coefficient vector, with a characterization of the variance after de-biasing; and (7) An accuracy analysis of a Gaussian Monte Carlo scheme to approximate the divergence of functions.
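For background, the first-order identities referenced above are standard and can be written as follows (Gaussian sequence model $y \sim N(\mu, \sigma^2 I_n)$; notation introduced here for illustration). Stein's identity states that, for weakly differentiable $f$ with integrable gradient,

$$ \mathbb{E}\left[ z^\top f(z) - \operatorname{div} f(z) \right] = 0, \qquad z \sim N(0, I_n), \qquad \operatorname{div} f(z) = \sum_{i=1}^n \frac{\partial f_i}{\partial z_i}(z). $$

A consequence is Stein's Unbiased Risk Estimate for an estimator $\hat\mu(y)$:

$$ \mathrm{SURE}(\hat\mu) = \|y - \hat\mu(y)\|_2^2 - n\sigma^2 + 2\sigma^2 \operatorname{div} \hat\mu(y), \qquad \mathbb{E}\,\mathrm{SURE}(\hat\mu) = \mathbb{E}\,\|\hat\mu(y) - \mu\|_2^2. $$

The Second Order Stein formula of the paper characterizes the variance of the random variable in the first display; its exact form is not reproduced here.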
Out-of-sample error estimate for robust M-estimators with convex penalty
A generic out-of-sample error estimate is proposed for robust M-estimators regularized with a convex penalty in high-dimensional linear regression, where $(X, y)$ is observed and the dimensions $p$ and $n$ are of the same order. If $\psi$ is the derivative of the robust data-fitting loss $\rho$, the estimate depends on the observed data only through the residual score $\psi(y - X\hat\beta)$, its image by $X^\top$, and certain derivatives of these quantities with respect to the observations.

The out-of-sample error estimate enjoys a small relative error in a linear model with Gaussian covariates and independent noise, either non-asymptotically when the ratio $p/n$ is bounded, or asymptotically in the high-dimensional asymptotic regime where $p/n$ converges to a positive constant. General differentiable loss functions $\rho$ are allowed provided that $\psi = \rho'$ is 1-Lipschitz. The validity of the out-of-sample error estimate holds either under a strong convexity assumption, or for the $\ell_1$-penalized Huber M-estimator if the number of corrupted observations and the sparsity of the true $\beta$ are bounded from above by a small enough constant fraction of $n$.

For the square loss and in the absence of corruption in the response, the results additionally yield consistent estimates of the noise variance and of the generalization error. This generalizes, to arbitrary convex penalty, estimates that were previously known for the Lasso.

Comment: This version adds simulations for the nuclear norm penalty.
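For context, the $\ell_1$-penalized Huber M-estimator mentioned above takes the form (standard definitions, notation introduced here for illustration)

$$ \hat\beta \in \operatorname*{arg\,min}_{b \in \mathbb{R}^p} \left\{ \sum_{i=1}^n \rho_H(y_i - x_i^\top b) + \lambda \|b\|_1 \right\}, \qquad \rho_H(u) = \begin{cases} u^2/2, & |u| \le \delta, \\ \delta |u| - \delta^2/2, & |u| > \delta, \end{cases} $$

with tuning parameters $\lambda, \delta > 0$. Its score $\psi = \rho_H'$ satisfies $\psi(u) = \max(-\delta, \min(u, \delta))$ and is 1-Lipschitz, as required by the assumptions above.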
Optimal bounds for aggregation of affine estimators
We study the problem of aggregation of estimators when the estimators are not independent of the data used for aggregation and no sample splitting is allowed. If the estimators are deterministic vectors, it is well known that the minimax rate of aggregation is of order $\log M$, where $M$ is the number of estimators to aggregate. It is proved that for affine estimators, the minimax rate of aggregation is unchanged: it is possible to handle the linear dependence between the affine estimators and the data used for aggregation at no extra cost. The minimax rate is not impacted either by the variance of the affine estimators, or any other measure of their statistical complexity. The minimax rate is attained with a penalized procedure over the convex hull of the estimators, for a penalty that is inspired by the $Q$-aggregation procedure. The results follow from the interplay between the penalty, strong convexity and concentration.

Comment: Published at https://projecteuclid.org/euclid.aos/1519268423 in the Annals of Statistics (http://imstat.org/aos/) by the Institute of Mathematical Statistics (http://imstat.org/).
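For context, model selection type aggregation in the fixed design Gaussian model $y = \mu + \xi$, $\xi \sim N(0, \sigma^2 I_n)$ (standard formulation, notation introduced here for illustration) seeks an aggregate $\hat\mu$ built from given estimators $\hat\mu_1, \dots, \hat\mu_M$ such that

$$ \mathbb{E}\,\|\hat\mu - \mu\|_2^2 \;\le\; \min_{j=1,\dots,M} \mathbb{E}\,\|\hat\mu_j - \mu\|_2^2 \;+\; C\,\sigma^2 \log M $$

for a constant $C$. The remainder $\sigma^2 \log M$ is the aggregation cost referred to above, and the result of the paper is that it is not inflated when the $\hat\mu_j$ are affine functions of $y$.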
The noise barrier and the large signal bias of the Lasso and other convex estimators
Convex estimators such as the Lasso, the matrix Lasso and the group Lasso have been studied extensively in the last two decades, demonstrating great success in both theory and practice. Two quantities are introduced, the noise barrier and the large signal bias, that provide insights into the performance of these convex regularized estimators. It is now well understood that the Lasso achieves fast prediction rates, provided that the correlations of the design satisfy some Restricted Eigenvalue or Compatibility condition, and provided that the tuning parameter is large enough. Using the two quantities introduced in the paper, we show that the compatibility condition on the design matrix is actually unavoidable to achieve fast prediction rates with the Lasso. The Lasso must incur a loss due to the correlations of the design matrix, measured in terms of the compatibility constant. This result holds for any design matrix, any active subset of covariates, and any tuning parameter. It is now well known that the Lasso enjoys a dimension reduction property: the prediction error scales with the sparsity $k$ of the target vector, up to logarithmic factors, even if the ambient dimension $p$ is much larger than $n$. Such results require that the tuning parameter is greater than some universal threshold. We characterize sharp phase transitions for the tuning parameter of the Lasso around a critical threshold that depends on the sparsity $k$. If the tuning parameter is equal to or larger than this critical threshold, the Lasso is minimax over $k$-sparse target vectors. If it is equal to or smaller than the critical threshold, the Lasso incurs a loss corresponding to a model whose size is dictated by the tuning parameter rather than by the sparsity, even if the target vector has far fewer nonzero coefficients. Remarkably, the lower bounds obtained in the paper also apply to random, data-driven tuning parameters. The results extend to convex penalties beyond the Lasso.

Comment: This paper supersedes the previous article arXiv:1703.0133
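For reference, the Lasso and the universal tuning parameter referred to above are (standard definitions, notation introduced here for illustration)

$$ \hat\beta \in \operatorname*{arg\,min}_{b \in \mathbb{R}^p} \left\{ \frac{1}{2n}\|y - X b\|_2^2 + \lambda \|b\|_1 \right\}, \qquad \lambda_{\mathrm{univ}} = \sigma\sqrt{\frac{2\log p}{n}}, $$

and the classical compatibility-based prediction bound takes the form $\frac{1}{n}\|X(\hat\beta - \beta^*)\|_2^2 \lesssim \lambda^2 k / \phi^2$ for $k$-sparse $\beta^*$, compatibility constant $\phi$, and $\lambda$ of the order of $\lambda_{\mathrm{univ}}$ or larger. The lower bounds of the paper show that the dependence on $\phi$ and on such a threshold cannot be removed.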
…