61 research outputs found

    Honest variable selection in linear and logistic regression models via $\ell_1$ and $\ell_1+\ell_2$ penalization

    This paper investigates correct variable selection in finite samples via $\ell_1$ and $\ell_1+\ell_2$ type penalization schemes. The asymptotic consistency of variable selection follows immediately from this analysis. We focus on logistic and linear regression models. The following questions are central to our paper: given a confidence level $1-\delta$, under which assumptions on the design matrix, for which signal strengths and for what values of the tuning parameters can we identify the true model at the given level of confidence? Formally, if $\widehat{I}$ is an estimate of the true variable set $I^*$, we study conditions under which $\mathbb{P}(\widehat{I}=I^*)\geq 1-\delta$ for a given sample size $n$, number of parameters $M$ and confidence $1-\delta$. We show that in identifiable models, both methods can recover coefficients of size $\frac{1}{\sqrt{n}}$, up to small multiplicative constants and logarithmic factors in $M$ and $\frac{1}{\delta}$. The advantage of the $\ell_1+\ell_2$ penalization over the $\ell_1$ penalization is minor for the variable selection problem in the models we consider here. Whereas the former estimates are unique and become more stable for highly correlated data matrices as one increases the tuning parameter of the $\ell_2$ part, too large an increase in this parameter value may preclude variable selection.
    Comment: Published at http://dx.doi.org/10.1214/08-EJS287 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org).
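    As a concrete illustration of the selection problem studied above, the following minimal sketch (not the authors' code; the signal strength, tuning constants, and use of scikit-learn are assumptions made only for illustration) fits $\ell_1$- and $\ell_1+\ell_2$-penalized logistic regression on synthetic data and checks whether the estimated support $\widehat{I}$ equals $I^*$:

```python
# Hedged sketch: support recovery for l1 and l1+l2 penalized logistic regression.
# Constants below are illustrative, not the calibrated values from the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, M, s = 400, 60, 5                        # sample size, number of parameters, sparsity
X = rng.standard_normal((n, M))
beta = np.zeros(M)
beta[:s] = 1.0                              # true signal on I_star = {0, ..., s-1}
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta)))

# l1 penalization (Lasso-type logistic regression)
l1 = LogisticRegression(penalty="l1", C=0.5, solver="liblinear").fit(X, y)
# l1 + l2 penalization (elastic-net-type); l1_ratio balances the two penalties
l1l2 = LogisticRegression(penalty="elasticnet", C=0.5, l1_ratio=0.7,
                          solver="saga", max_iter=5000).fit(X, y)

I_star = set(range(s))
for name, model in [("l1", l1), ("l1+l2", l1l2)]:
    I_hat = set(np.flatnonzero(np.abs(model.coef_.ravel()) > 1e-8))
    print(name, "recovered I_star:", I_hat == I_star)
```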

    Consistent selection via the Lasso for high dimensional approximating regression models

    In this article we investigate consistency of selection in regression models via the popular Lasso method. Here we depart from the traditional linear regression assumption and consider approximations of the regression function $f$ with elements of a given dictionary of $M$ functions. The target for consistency is the index set of those functions from this dictionary that realize the most parsimonious approximation to $f$ among all linear combinations belonging to an $L_2$ ball centered at $f$ and of radius $r_{n,M}^2$. In this framework we show that a consistent estimate of this index set can be derived via $\ell_1$ penalized least squares, with a data-dependent penalty and with tuning sequence $r_{n,M}>\sqrt{\log(Mn)/n}$, where $n$ is the sample size. Our results hold for any $1\leq M\leq n^{\gamma}$, for any $\gamma>0$.
    Comment: Published at http://dx.doi.org/10.1214/074921708000000101 in the IMS Collections (http://www.imstat.org/publications/imscollections.htm) by the Institute of Mathematical Statistics (http://www.imstat.org).
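    A minimal sketch of the setting described above (the dictionary choice, noise level, and scikit-learn Lasso parametrization are assumptions for illustration; for simplicity the target $f$ here is exactly a sparse combination of dictionary elements):

```python
# Hedged sketch: select dictionary elements approximating f via l1-penalized
# least squares, with tuning of order sqrt(log(M n) / n) as in the abstract.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, M = 300, 40
x = rng.uniform(0, 1, n)
f = np.cos(np.pi * 3 * x) + 0.5 * np.cos(np.pi * 7 * x)   # sparse in the dictionary below
y = f + 0.3 * rng.standard_normal(n)

# Dictionary of M cosine functions evaluated at the design points
D = np.column_stack([np.cos(np.pi * k * x) for k in range(1, M + 1)])

r_nM = np.sqrt(np.log(M * n) / n)        # tuning sequence from the abstract (constant set to 1)
lasso = Lasso(alpha=r_nM).fit(D, y)
print("selected dictionary indices:", np.flatnonzero(np.abs(lasso.coef_) > 1e-8))
```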

    Sparsity oracle inequalities for the Lasso

    This paper studies oracle properties of $\ell_1$-penalized least squares in a nonparametric regression setting with random design. We show that the penalized least squares estimator satisfies sparsity oracle inequalities, i.e., bounds in terms of the number of non-zero components of the oracle vector. The results are valid even when the dimension of the model is (much) larger than the sample size and the regression matrix is not positive definite. They can be applied to high-dimensional linear regression, to nonparametric adaptive regression estimation and to the problem of aggregation of arbitrary estimators.
    Comment: Published at http://dx.doi.org/10.1214/07-EJS008 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org).
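    For orientation, sparsity oracle inequalities of the kind described above typically take the following schematic form (constants, the exact penalty level, and the conditions on the dictionary are omitted; this is a paraphrase rather than the paper's precise statement): if $\widehat{f}$ is the $\ell_1$-penalized least squares estimator built from a dictionary $f_1,\dots,f_M$, then with high probability
    $$\|\widehat{f}-f\|^2 \;\le\; C_1\,\inf_{\lambda\in\mathbb{R}^M}\Big\{\|f_\lambda-f\|^2 \;+\; C_2\,\frac{M(\lambda)\,\log M}{n}\Big\},$$
    where $f_\lambda=\sum_{j=1}^M\lambda_j f_j$ ranges over linear combinations of the dictionary and $M(\lambda)=\#\{j:\lambda_j\neq 0\}$ is the number of non-zero components of $\lambda$.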

    Convex Banding of the Covariance Matrix

    We introduce a new sparse estimator of the covariance matrix for high-dimensional models in which the variables have a known ordering. Our estimator, the solution to a convex optimization problem, can equivalently be expressed as an estimator that tapers the sample covariance matrix by a Toeplitz, sparsely banded, data-adaptive matrix. As a result of this adaptivity, the convex banding estimator enjoys theoretical optimality properties not attained by previous banded or tapered estimators. In particular, our convex banding estimator is minimax rate adaptive in Frobenius and operator norms, up to log factors, over commonly studied classes of covariance matrices, and over more general classes. Furthermore, it correctly recovers the bandwidth when the true covariance is exactly banded. Our convex formulation admits a simple and efficient algorithm. Empirical studies demonstrate its practical effectiveness and illustrate that our estimator works well even when the true covariance matrix is only close to a banded matrix, confirming our theoretical results. Our method compares favorably with all existing methods in terms of accuracy and speed. We illustrate the practical merits of the convex banding estimator by showing that it can be used to improve the performance of discriminant analysis for classifying sound recordings.
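    To make the banding operation concrete, the sketch below applies a plain fixed-bandwidth banding to the sample covariance matrix; the paper's estimator instead chooses a Toeplitz, data-adaptive taper as the solution of a convex program, so the fixed bandwidth K here is purely an illustrative assumption:

```python
# Hedged sketch: fixed-bandwidth banding of the sample covariance matrix.
# This is the simpler operation that the convex banding estimator refines adaptively.
import numpy as np

def band_covariance(X, K):
    """Sample covariance with entries more than K positions off the diagonal set to zero."""
    S = np.cov(X, rowvar=False)
    idx = np.arange(S.shape[0])
    mask = np.abs(np.subtract.outer(idx, idx)) <= K
    return S * mask

rng = np.random.default_rng(2)
p, n, K = 30, 200, 3
# AR(1)-type truth: entries decay with distance from the diagonal
Sigma = 0.6 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
print("operator-norm error:", np.linalg.norm(band_covariance(X, K) - Sigma, 2))
```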

    Optimal selection of reduced rank estimators of high-dimensional matrices

    We introduce a new criterion, the Rank Selection Criterion (RSC), for selecting the optimal reduced rank estimator of the coefficient matrix in multivariate response regression models. The corresponding RSC estimator minimizes the Frobenius norm of the fit plus a regularization term proportional to the number of parameters in the reduced rank model. The rank of the RSC estimator provides a consistent estimator of the rank of the coefficient matrix; in general, the rank of our estimator is a consistent estimate of the effective rank, which we define to be the number of singular values of the target matrix that are appropriately large. The consistency results are valid not only in the classic asymptotic regime, when $n$, the number of responses, and $p$, the number of predictors, stay bounded and $m$, the number of observations, grows, but also when either, or both, $n$ and $p$ grow, possibly much faster than $m$. We establish minimax optimal bounds on the mean squared errors of our estimators. Our finite sample performance bounds for the RSC estimator show that it achieves the optimal balance between the approximation error and the penalty term. Furthermore, our procedure has very low computational complexity, linear in the number of candidate models, making it particularly appealing for large scale problems. We contrast our estimator with the nuclear norm penalized least squares (NNP) estimator for multivariate regression models, which has an inherently higher computational complexity than RSC. We show that NNP has estimation properties similar to those of RSC, albeit under stronger conditions. However, it is not as parsimonious as RSC. We offer a simple correction of the NNP estimator which leads to consistent rank estimation.
    Comment: Published at http://dx.doi.org/10.1214/11-AOS876 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org); some typos corrected.
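    A rank selection rule of the type described above can be sketched as follows (hedged: the penalty level mu below is an illustrative choice, not the paper's calibrated constant, and the code illustrates the criterion rather than reproducing the authors' implementation): compute the least squares fit, form its best rank-k approximations via the SVD, and keep the rank minimizing squared Frobenius error plus a penalty proportional to k.

```python
# Hedged sketch of a rank selection criterion in the spirit of the RSC described above.
import numpy as np

def reduced_rank_fit(X, Y, k):
    """Best rank-k approximation (in Frobenius norm) of the least-squares fit X @ B_hat."""
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    U, d, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    return (U[:, :k] * d[:k]) @ Vt[:k]

rng = np.random.default_rng(3)
m, p, n, r = 200, 15, 10, 3                 # observations, predictors, responses, true rank
X = rng.standard_normal((m, p))
B = rng.standard_normal((p, r)) @ rng.standard_normal((r, n))   # rank-r coefficient matrix
Y = X @ B + rng.standard_normal((m, n))

mu = 2.0 * (np.sqrt(n) + np.sqrt(p)) ** 2   # illustrative penalty level (unit noise variance)
scores = [np.linalg.norm(Y - reduced_rank_fit(X, Y, k), "fro") ** 2 + mu * k
          for k in range(min(n, p) + 1)]
print("selected rank:", int(np.argmin(scores)))
```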