Concentration of quadratic forms under a Bernstein moment assumption
A concentration result for quadratic forms of independent subgaussian random
variables is derived. If the moments of the random variables satisfy a
"Bernstein condition", then the variance term of the Hanson-Wright inequality
can be improved. The Bernstein condition is satisfied, for instance, by all
log-concave subgaussian distributions.
Comment: This short note presents a result that initially appeared in
arXiv:1410.0346v1 (see Assumption 3.3). The result was later removed from
arXiv:1410.0346 and the published version
https://projecteuclid.org/euclid.aos/1519268423 due to space constraints.
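The object being controlled here is the quadratic form z^T A z of a subgaussian vector z. As a minimal illustration of that object (not of the paper's improved bound), the following NumPy sketch samples z^T A z for standard normal z, a log-concave subgaussian case, and computes the deviation from the mean trace(A); all names are illustrative:

```python
import numpy as np

def quadratic_form_samples(A, n_draws=2000, seed=0):
    """Monte Carlo samples of the quadratic form z^T A z where z has
    i.i.d. standard normal entries (a log-concave subgaussian law)."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n_draws, A.shape[0]))
    return np.einsum('ij,jk,ik->i', Z, A, Z)

# For z ~ N(0, I) the mean of z^T A z is trace(A) and, for symmetric A,
# its variance is 2 * ||A||_F^2 -- the quantities controlled by the
# Hanson-Wright inequality and its variance term.
n = 50
M = np.random.default_rng(1).standard_normal((n, n))
A = (M + M.T) / 2
samples = quadratic_form_samples(A)
deviation = samples.mean() - np.trace(A)  # small relative to the spread of z^T A z
```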
Sharp oracle inequalities for Least Squares estimators in shape restricted regression
The performance of Least Squares (LS) estimators is studied in isotonic,
unimodal and convex regression. Our results have the form of sharp oracle
inequalities that account for the model misspecification error. In isotonic and
unimodal regression, the LS estimator achieves the nonparametric rate n^{-2/3}
as well as a parametric rate of order k/n up to logarithmic
factors, where k is the number of constant pieces of the true parameter.
In univariate convex regression, the LS estimator satisfies an adaptive risk
bound of order q/n up to logarithmic factors, where q is the number of
affine pieces of the true regression function. This adaptive risk bound holds
for any design points. While Guntuboyina and Sen (2013) established that the
nonparametric rate of convex regression is of order n^{-4/5} for equispaced
design points, we show that the nonparametric rate of convex regression can be
as slow as n^{-2/3} for some worst-case design points. This phenomenon can be
explained as follows: although convexity brings more structure than
unimodality, for some worst-case design points this extra structure is
uninformative and the nonparametric rates of unimodal regression and convex
regression are both n^{-2/3}.
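The isotonic LS estimator studied above can be computed exactly by the Pool Adjacent Violators Algorithm. A minimal NumPy sketch of standard PAVA (not code from the paper):

```python
import numpy as np

def isotonic_ls(y):
    """Least Squares fit over nondecreasing sequences via the
    Pool Adjacent Violators Algorithm (PAVA)."""
    values, weights = [], []
    for v in y:
        values.append(float(v))
        weights.append(1)
        # pool neighboring blocks while monotonicity is violated
        while len(values) > 1 and values[-2] > values[-1]:
            w = weights[-2] + weights[-1]
            pooled = (values[-2] * weights[-2] + values[-1] * weights[-1]) / w
            values[-2:] = [pooled]
            weights[-2:] = [w]
    return np.repeat(values, weights)
```

The returned fit is the Euclidean projection of y onto the monotone cone, so it is piecewise constant with blocks given by the pooled averages.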
Localized Gaussian width of M-convex hulls with applications to Lasso and convex aggregation
Upper and lower bounds are derived for the Gaussian mean width of the
intersection of the convex hull of M points with a Euclidean ball of a given
radius. The upper bound holds for any collection of extreme points bounded in
Euclidean norm. The upper bound and the lower bound match up to a
multiplicative constant whenever the extreme points satisfy a one-sided
Restricted Isometry Property.
This bound is then applied to study the Lasso estimator in fixed-design
regression, the Empirical Risk Minimizer in the anisotropic persistence
problem, and the convex aggregation problem in density estimation.
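For the unlocalized quantity (no intersection with a ball), the Gaussian mean width of a convex hull is easy to estimate by Monte Carlo, since a linear form over a polytope is maximized at an extreme point. A sketch under that simplification; the localized width studied in the paper would additionally require optimizing over the intersection with a Euclidean ball:

```python
import numpy as np

def gaussian_width_hull(X, n_draws=4000, seed=0):
    """Monte Carlo estimate of E sup_{v in conv(x_1,...,x_M)} <g, v>
    for g standard normal, where the rows of X are the M points.
    The supremum over the hull is attained at an extreme point,
    so it equals max_i <g, x_i>."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((n_draws, X.shape[1]))
    return (G @ X.T).max(axis=1).mean()
```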
Aggregation of supports along the Lasso path
In linear regression with fixed design, we propose two procedures that
aggregate a data-driven collection of supports. The collection is a subset of
the 2^p possible supports, and both its cardinality and its elements can
depend on the data. The procedures satisfy oracle inequalities with no
assumption on the design matrix. Then we use these procedures to aggregate the
supports that appear on the regularization path of the Lasso in order to
construct an estimator that mimics the best Lasso estimator. If the restricted
eigenvalue condition on the design matrix is satisfied, then this estimator
achieves optimal prediction bounds. Finally, we discuss the computational cost
of these procedures.
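A hedged sketch of the collection being aggregated: plain coordinate-descent Lasso over a geometric grid of tuning parameters, recording the distinct supports that appear along the path. The grid, thresholds, and iteration counts below are illustrative choices; the aggregation step itself and its oracle guarantees are not reproduced here:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for the Lasso objective
    (1/(2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]   # partial residual
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return b

def supports_along_path(X, y, n_lams=50):
    """Distinct supports along a geometric grid of tuning parameters,
    from lam_max (where the Lasso solution is zero) down to lam_max/100."""
    n = X.shape[0]
    lam_max = np.abs(X.T @ y).max() / n
    supports, seen = [], set()
    for lam in np.geomspace(lam_max, lam_max / 100, n_lams):
        b = lasso_cd(X, y, lam)
        s = tuple(np.flatnonzero(np.abs(b) > 1e-8))
        if s not in seen:
            seen.add(s)
            supports.append(s)
    return supports
```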
Optimal exponential bounds for aggregation of density estimators
We consider the problem of model selection type aggregation in the context of
density estimation. We first show that empirical risk minimization is
sub-optimal for this problem and it shares this property with the exponential
weights aggregate, empirical risk minimization over the convex hull of the
dictionary functions, and all selectors. Using a penalty inspired by recent
works on the Q-aggregation procedure, we derive a sharp oracle inequality in
deviation under a simple boundedness assumption and we show that the rate is
optimal in a minimax sense. Unlike the procedures based on exponential weights,
this estimator is fully adaptive under the uniform prior. In particular, its
construction does not rely on the sup-norm of the unknown density. By providing
lower bounds with exponential tails, we show that the deviation term appearing
in the sharp oracle inequalities cannot be improved.
Comment: Published at http://dx.doi.org/10.3150/15-BEJ742 in the Bernoulli
(http://isi.cbs.nl/bernoulli/) by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).
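To make the exponential weights aggregate concrete (the procedure shown above to be deviation-suboptimal), a tiny sketch: given the total log-likelihood of each dictionary density on the sample, it forms a convex combination with weights exponential in the log-likelihood. The function name and the temperature parameter beta are illustrative:

```python
import numpy as np

def exponential_weights(total_log_liks, beta=1.0):
    """Exponential weights over a dictionary of densities:
    w_j proportional to exp(beta * sum_i log p_j(X_i)).
    Computed stably by subtracting the max before exponentiating."""
    a = beta * np.asarray(total_log_liks, dtype=float)
    a -= a.max()
    w = np.exp(a)
    return w / w.sum()
```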
Optimistic lower bounds for convex regularized least-squares
Minimax lower bounds are pessimistic in nature: for any given estimator,
minimax lower bounds yield the existence of a worst-case target vector
for which the prediction error of the given estimator is
bounded from below. However, minimax lower bounds shed no light on the
prediction error of the given estimator at target vectors other than this
worst case. A characterization of the prediction error of any convex
regularized least-squares estimator is given. This characterization provides
both a lower bound and an upper bound on the prediction error. This produces
lower bounds that are applicable to any target vector and not only to a
single, worst-case one. Finally, these lower and upper bounds on the prediction
error are applied to the Lasso in sparse linear regression. We obtain a lower
bound involving the compatibility constant for any tuning parameter, matching
upper and lower bounds for the universal choice of the tuning parameter, and a
lower bound for the Lasso with small tuning parameter.
The cost-free nature of optimally tuning Tikhonov regularizers and other ordered smoothers
We consider the problem of selecting the best estimator among a family of
Tikhonov regularized estimators, or, alternatively, to select a linear
combination of these regularizers that is as good as the best regularizer in
the family. Our theory reveals that if the Tikhonov regularizers share the same
penalty matrix with different tuning parameters, a convex procedure based on
Q-aggregation achieves the mean square error of the best estimator, up to a
small error term no larger than C sigma^2, where sigma^2 is the noise
level and C is an absolute constant. Remarkably, the error term does not
depend on the penalty matrix or the number of estimators as long as they share
the same penalty matrix, i.e., it applies to any grid of tuning parameters, no
matter how large the cardinality of the grid is. This reveals the surprising
"cost-free" nature of optimally tuning Tikhonov regularizers, in striking
contrast with the existing literature on aggregation of estimators where one
typically has to pay a cost of order sigma^2 log(M), where M is the number of
estimators in the family. The result holds, more generally, for any family of
ordered linear smoothers. This encompasses Ridge regression as well as
Principal Component Regression. The result is extended to the problem of tuning
Tikhonov regularizers with different penalty matrices.
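A sketch of the setting: ridge estimators sharing the identity penalty matrix form an ordered family of linear smoothers, and a data-driven criterion can choose among them. The selector below is Mallows-Cp/SURE, used here only as a simple stand-in; the paper's procedure is a Q-aggregation-based convex combination, not this selector:

```python
import numpy as np

def ridge_family(X, y, lams):
    """Tikhonov-regularized fits sharing the identity penalty matrix,
    over a grid of tuning parameters (an ordered family of smoothers)."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    z = U.T @ y
    fits, dofs = [], []
    for lam in lams:
        shrink = d ** 2 / (d ** 2 + lam)   # ordered linear smoother weights
        fits.append(U @ (shrink * z))
        dofs.append(shrink.sum())           # effective degrees of freedom
    return np.array(fits), np.array(dofs)

def select_by_cp(fits, dofs, y, sigma2):
    """Mallows-Cp / SURE selection over the grid."""
    cp = ((y - fits) ** 2).sum(axis=1) + 2.0 * sigma2 * dofs
    return int(np.argmin(cp))
```

Larger tuning parameters shrink more, so the effective degrees of freedom decrease along the grid; the Cp criterion trades residual fit against this model complexity.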
Second order Stein: SURE for SURE and other applications in high-dimensional inference
Stein's formula states that a random variable of the form z^T f(z) - div f(z) is mean-zero for functions f with integrable gradient. Here,
div f(z) is the divergence of the function f and z is a standard
normal vector. This paper aims to propose a Second Order Stein formula to
characterize the variance of such random variables for all functions f
with square integrable gradient, and to demonstrate the usefulness of this
formula in various applications.
In the Gaussian sequence model, a consequence of Stein's formula is Stein's
Unbiased Risk Estimate (SURE), an unbiased estimate of the mean squared risk
for almost any estimator of the unknown mean. A first application of
the Second Order Stein formula is an Unbiased Risk Estimate for SURE itself
(SURE for SURE): an unbiased estimate providing information about the squared
distance between SURE and the squared estimation error of the estimator. SURE for
SURE has a simple form as a function of the data and is applicable to all
estimators with square integrable gradient, e.g. the Lasso and the Elastic Net.
In addition to SURE for SURE, the following applications are developed: (1)
Upper bounds on the risk of SURE when the estimation target is the mean squared
error; (2) Confidence regions based on SURE; (3) Oracle inequalities satisfied
by SURE-tuned estimates; (4) An upper bound on the variance of the size of the
model selected by the Lasso; (5) Explicit expressions of SURE for SURE for the
Lasso and the Elastic-Net; (6) In the linear model, a general semi-parametric
scheme to de-bias a differentiable initial estimator for inference of a
low-dimensional projection of the unknown coefficient vector, with a characterization of
the variance after de-biasing; and (7) An accuracy analysis of a Gaussian Monte
Carlo scheme to approximate the divergence of functions.
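As a concrete instance of the SURE objects discussed above (the first-order formula, not the Second Order one): for soft thresholding in the Gaussian sequence model the divergence is the number of surviving coordinates, so SURE has a closed form. A minimal sketch:

```python
import numpy as np

def sure_soft_threshold(y, t, sigma2=1.0):
    """SURE for soft thresholding mu_hat(y)_i = sign(y_i)*max(|y_i|-t, 0)
    in the model y = mu + N(0, sigma2*I):
        SURE = -n*sigma2 + ||y - mu_hat||^2 + 2*sigma2*div(mu_hat),
    where the divergence div(mu_hat) = #{i : |y_i| > t}."""
    n = y.size
    resid_sq = np.minimum(np.abs(y), t) ** 2
    div = np.count_nonzero(np.abs(y) > t)
    return -n * sigma2 + resid_sq.sum() + 2.0 * sigma2 * div
```

Sanity checks: at t = 0 the estimator is y itself, whose risk is exactly n*sigma2; for t larger than every |y_i| the estimator is 0 and SURE estimates the squared norm of the unknown mean.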
Sharp oracle bounds for monotone and convex regression through aggregation
We derive oracle inequalities for the problems of isotonic and convex
regression using a combination of the Q-aggregation procedure and sparsity
pattern aggregation. This improves upon the previous results including the
oracle inequalities for the constrained least squares estimator. One of the
improvements is that our oracle inequalities are sharp, i.e., with leading
constant 1. It allows us to obtain bounds for the minimax regret thus
accounting for model misspecification, which was not possible based on the
previous results. Another improvement is that we obtain oracle inequalities
both with high probability and in expectation.
Slope meets Lasso: improved oracle bounds and optimality
We show that two polynomial time methods, a Lasso estimator with adaptively
chosen tuning parameter and a Slope estimator, adaptively achieve the exact
minimax prediction and l_2 estimation rate (s/n) log(p/s) in
high-dimensional linear regression on the class of s-sparse target vectors in
R^p. This is done under the Restricted Eigenvalue (RE) condition for
the Lasso and under a slightly more constraining assumption on the design for
the Slope. The main results have the form of sharp oracle inequalities
accounting for the model misspecification error. The minimax optimal bounds are
also obtained for the l_q estimation errors with 1 <= q <= 2 when the
model is well-specified. The results are non-asymptotic, and hold both in
probability and in expectation. The assumptions that we impose on the design
are satisfied with high probability for a large class of random matrices with
independent and possibly anisotropically distributed rows. We give a
comparative analysis of conditions, under which oracle bounds for the Lasso and
Slope estimators can be obtained. In particular, we show that several known
conditions, such as the RE condition and the sparse eigenvalue condition are
equivalent if the l_2-norms of the regressors are uniformly bounded.
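The Slope estimator penalizes by the sorted l1 norm, and the building block of proximal-gradient solvers for it is the prox of that norm: soft thresholding by the sorted weights followed by an isotonic (pooling) projection, as in Bogdan et al. (2015). A hedged sketch of that prox alone, with the solver loop omitted:

```python
import numpy as np

def prox_sorted_l1(y, lam):
    """Prox of the sorted-l1 penalty J(b) = sum_i lam_i * |b|_(i),
    where lam is nonincreasing and |b|_(i) are the sorted magnitudes.
    Sort |y| decreasingly, subtract lam, project onto nonincreasing
    sequences by PAVA-style pooling, clip at zero, then restore
    the original order and signs."""
    sign = np.sign(y)
    order = np.argsort(-np.abs(y))
    z = np.abs(y)[order] - lam
    vals, wts = [], []
    for v in z:
        vals.append(float(v))
        wts.append(1)
        # pool while the nonincreasing constraint is violated
        while len(vals) > 1 and vals[-2] < vals[-1]:
            w = wts[-2] + wts[-1]
            vals[-2:] = [(vals[-2] * wts[-2] + vals[-1] * wts[-1]) / w]
            wts[-2:] = [w]
    x = np.maximum(np.repeat(vals, wts), 0.0)
    out = np.empty_like(x)
    out[order] = x
    return sign * out
```

With all weights equal, the sorted-l1 norm reduces to the plain l1 norm and this prox reduces to ordinary soft thresholding.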