61 research outputs found
Honest variable selection in linear and logistic regression models via $\ell_1$ and $\ell_1+\ell_2$ penalization
This paper investigates correct variable selection in finite samples via
$\ell_1$ and $\ell_1+\ell_2$ type penalization schemes. The asymptotic
consistency of variable selection immediately follows from this analysis. We
focus on logistic and linear regression models. The following questions are
central to our paper: given a level of confidence $1-\delta$, under which
assumptions on the design matrix, for which strength of the signal and for what
values of the tuning parameters can we identify the true model at the given
level of confidence? Formally, if $\widehat{I}$ is an estimate of the true
variable set $I^{*}$, we study conditions under which
$\mathbb{P}(\widehat{I}=I^{*})\geq 1-\delta$, for a given sample size $n$, number
of parameters $M$ and confidence $1-\delta$. We show that in identifiable
models, both methods can recover coefficients of size $\frac{1}{\sqrt{n}}$, up
to small multiplicative constants and logarithmic factors in $M$ and
$\frac{1}{\delta}$. The advantage of the $\ell_1+\ell_2$ penalization over the
$\ell_1$ is minor for the variable selection problem, for the models we
consider here. Whereas the former estimates are unique, and become more stable
for highly correlated data matrices as one increases the tuning parameter of
the $\ell_2$ part, too large an increase in this parameter value may preclude
variable selection.

Comment: Published at http://dx.doi.org/10.1214/08-EJS287 in the Electronic
Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
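To make the contrast concrete, here is a minimal sketch (not the authors' code; the scikit-learn estimators, the simulated design, and the tuning values C and l1_ratio are all illustrative assumptions) of $\ell_1$ versus $\ell_1+\ell_2$ (elastic net) penalized logistic regression used for variable selection:

```python
# Illustrative only: tuning values are placeholders, not the paper's
# theoretically calibrated choices.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, M = 200, 50                       # sample size and number of parameters
X = rng.standard_normal((n, M))
beta = np.zeros(M)
beta[:5] = 1.0                       # a handful of true signals
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta)))

# Pure l1 penalty: sparse fit, performs variable selection directly.
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)

# l1 + l2 (elastic net): unique and more stable on correlated designs,
# but too much l2 shrinkage can spoil selection, as the abstract warns.
enet = LogisticRegression(penalty="elasticnet", solver="saga",
                          l1_ratio=0.5, C=0.5, max_iter=5000).fit(X, y)

print("l1 selects:  ", np.flatnonzero(l1.coef_))
print("enet selects:", np.flatnonzero(enet.coef_))
```

Lowering l1_ratio (more $\ell_2$ weight) stabilizes the fit under correlation but, pushed too far, shrinks true signals out of the selected set.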
Consistent selection via the Lasso for high dimensional approximating regression models
In this article we investigate consistency of selection in regression models
via the popular Lasso method. Here we depart from the traditional linear
regression assumption and consider approximations of the regression function
$f$ with elements of a given dictionary of $M$ functions. The target for
consistency is the index set of those functions from this dictionary that
realize the most parsimonious approximation to $f$ among all linear
combinations belonging to an $L_2$ ball centered at $f$ and of radius
$r_{n,M}^{2}$. In this framework we show that a consistent estimate of this index
set can be derived via $\ell_1$ penalized least squares, with a data dependent
penalty and with tuning sequence $r_{n,M}>\sqrt{\log(Mn)/n}$, where $n$ is the
sample size. Our results hold for any $1\leq M\leq n^{\gamma}$, for any
$\gamma>0$.

Comment: Published at http://dx.doi.org/10.1214/074921708000000101 in the IMS
Collections (http://www.imstat.org/publications/imscollections.htm) by the
Institute of Mathematical Statistics (http://www.imstat.org).
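As a rough illustration of the tuning rate (a sketch under assumed constants; scikit-learn's Lasso stands in for the paper's data-dependent penalty, and the factor 2.0 is an arbitrary placeholder):

```python
# Run the Lasso with a tuning parameter of the order sqrt(log(M*n)/n),
# the rate named in the abstract.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, M = 500, 100                          # sample size, dictionary size
X = rng.standard_normal((n, M))          # dictionary evaluated at the data
f = 1.5 * X[:, 0] - 1.0 * X[:, 3]        # parsimonious target in the span
y = f + rng.standard_normal(n)

r_nM = np.sqrt(np.log(M * n) / n)        # rate of the tuning sequence
lasso = Lasso(alpha=2.0 * r_nM).fit(X, y)

print("tuning r_nM ~", round(r_nM, 4))
print("selected index set:", np.flatnonzero(lasso.coef_))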
Sparsity oracle inequalities for the Lasso
This paper studies oracle properties of $\ell_1$-penalized least squares in the
nonparametric regression setting with random design. We show that the penalized
least squares estimator satisfies sparsity oracle inequalities, i.e., bounds in
terms of the number of non-zero components of the oracle vector. The results
are valid even when the dimension of the model is (much) larger than the sample
size and the regression matrix is not positive definite. They can be applied to
high-dimensional linear regression, to nonparametric adaptive regression
estimation and to the problem of aggregation of arbitrary estimators.

Comment: Published at http://dx.doi.org/10.1214/07-EJS008 in the Electronic
Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
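Schematically, a sparsity oracle inequality of this kind has the following shape (the notation and constants here are generic assumptions for illustration, not the paper's exact statement):

```latex
% Generic shape of a sparsity oracle inequality: the penalized estimator
% \hat{f} performs nearly as well as the sparsest good approximation.
\[
  \|\hat{f} - f\|^2
  \;\le\;
  C \inf_{\beta}
  \Bigl\{ \|f_\beta - f\|^2 + \frac{\log M}{n}\, M(\beta) \Bigr\},
\]
% where $M(\beta) = \#\{j : \beta_j \neq 0\}$ counts the non-zero
% components of $\beta$ and $f_\beta$ ranges over linear combinations
% of the $M$ dictionary elements.
```

The key feature, as the abstract notes, is that the bound depends on the number of non-zero components of the oracle vector rather than on the full dimension $M$, so it remains meaningful when $M$ far exceeds the sample size.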
Convex Banding of the Covariance Matrix
We introduce a new sparse estimator of the covariance matrix for
high-dimensional models in which the variables have a known ordering. Our
estimator, which is the solution to a convex optimization problem, is
equivalently expressed as an estimator which tapers the sample covariance
matrix by a Toeplitz, sparsely-banded, data-adaptive matrix. As a result of
this adaptivity, the convex banding estimator enjoys theoretical optimality
properties not attained by previous banding or tapered estimators. In
particular, our convex banding estimator is minimax rate adaptive in Frobenius
and operator norms, up to log factors, over commonly-studied classes of
covariance matrices, and over more general classes. Furthermore, it correctly
recovers the bandwidth when the true covariance is exactly banded. Our convex
formulation admits a simple and efficient algorithm. Empirical studies
demonstrate its practical effectiveness and illustrate that our exactly-banded
estimator works well even when the true covariance matrix is only close to a
banded matrix, confirming our theoretical results. Our method compares
favorably with all existing methods, in terms of accuracy and speed. We
illustrate the practical merits of the convex banding estimator by showing that
it can be used to improve the performance of discriminant analysis for
classifying sound recordings.
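For intuition about the banded target, here is a minimal numpy sketch of hard banding of the sample covariance at a fixed bandwidth; the paper's estimator instead selects the bandwidth adaptively by solving a convex program, which this simplified sketch does not attempt:

```python
# Hard banding at a given bandwidth: zero out entries of the sample
# covariance more than `bandwidth` positions off the diagonal.
import numpy as np

def band_covariance(X, bandwidth):
    """Band the sample covariance of X (rows = observations)."""
    S = np.cov(X, rowvar=False)               # sample covariance (p x p)
    p = S.shape[0]
    i, j = np.indices((p, p))
    mask = np.abs(i - j) <= bandwidth         # banded, Toeplitz-patterned mask
    return S * mask

rng = np.random.default_rng(2)
X = rng.standard_normal((300, 20))
Sigma_hat = band_covariance(X, bandwidth=3)
print(Sigma_hat.round(2))
```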
Optimal selection of reduced rank estimators of high-dimensional matrices
We introduce a new criterion, the Rank Selection Criterion (RSC), for
selecting the optimal reduced rank estimator of the coefficient matrix in
multivariate response regression models. The corresponding RSC estimator
minimizes the Frobenius norm of the fit plus a regularization term proportional
to the number of parameters in the reduced rank model. The rank of the RSC
estimator provides a consistent estimator of the rank of the coefficient
matrix; in general, the rank of our estimator is a consistent estimate of the
effective rank, which we define to be the number of singular values of the
target matrix that are appropriately large. The consistency results are valid
not only in the classic asymptotic regime, when $n$, the number of responses,
and $p$, the number of predictors, stay bounded, and $m$, the number of
observations, grows, but also when either, or both, $n$ and $p$ grow, possibly
much faster than $m$. We establish minimax optimal bounds on the mean squared
errors of our estimators. Our finite sample performance bounds for the RSC
estimator show that it achieves the optimal balance between the approximation
error and the penalty term. Furthermore, our procedure has very low
computational complexity, linear in the number of candidate models, making it
particularly appealing for large scale problems. We contrast our estimator with
the nuclear norm penalized least squares (NNP) estimator, which has an
inherently higher computational complexity than RSC, for multivariate
regression models. We show that NNP has estimation properties similar to those
of RSC, albeit under stronger conditions. However, it is not as parsimonious as
RSC. We offer a simple correction of the NNP estimator which leads to
consistent rank estimation.

Comment: Published at http://dx.doi.org/10.1214/11-AOS876 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org) (some typos corrected).
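A minimal sketch of the computation this describes, under simplifying assumptions (the penalty level mu below is an illustrative placeholder, not the paper's calibrated choice): the rank-r least squares fit is the truncated SVD of the projection of Y onto the column space of X, so penalizing the fit by a term proportional to the rank reduces to thresholding singular values.

```python
# Rank selection by penalized reduced rank fitting (simplified sketch).
import numpy as np

def rank_selected_fit(X, Y, mu):
    # Project Y onto the column space of X (the least-squares fitted values).
    Q, _ = np.linalg.qr(X)
    Y_proj = Q @ (Q.T @ Y)
    U, d, Vt = np.linalg.svd(Y_proj, full_matrices=False)
    r = int(np.sum(d**2 > mu))               # keep singular values above threshold
    fit = (U[:, :r] * d[:r]) @ Vt[:r]        # rank-r fitted values
    return r, fit

rng = np.random.default_rng(3)
m, p, n = 200, 15, 10                        # observations, predictors, responses
X = rng.standard_normal((m, p))
A = rng.standard_normal((p, 2)) @ rng.standard_normal((2, n))  # rank-2 coefficients
Y = X @ A + rng.standard_normal(m * n).reshape(m, n)

r, fit = rank_selected_fit(X, Y, mu=2.0 * (np.sqrt(n) + np.sqrt(p)) ** 2)
print("selected rank:", r)                   # recovers the true rank, 2
```

The cost is one projection and one SVD regardless of how many candidate ranks are compared, which reflects the low computational complexity the abstract emphasizes.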
…