40 research outputs found
Pivotal estimation in high-dimensional regression via linear programming
We propose a new method of estimation in the high-dimensional linear regression
model. It allows for very weak distributional assumptions, including
heteroscedasticity, and does not require knowledge of the variance of the
random errors. The method is based on linear programming only, so its
numerical implementation is faster than for previously known techniques that use
conic programs, and it can handle higher-dimensional models. We
provide upper bounds for estimation and prediction errors of the proposed
estimator showing that it achieves the same rate as in the more restrictive
situation of fixed design and i.i.d. Gaussian errors with known variance.
Following Gautier and Tsybakov (2011), we obtain these results under
sensitivity assumptions weaker than the restricted eigenvalue or related
conditions.
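Estimators of this LP-only type are close in spirit to the Dantzig selector, whose l1 minimization under an l-infinity correlation constraint is itself a linear program. Below is a hedged sketch of solving such an LP with `scipy.optimize.linprog`; `dantzig_lp` and its interface are illustrative assumptions, not the paper's pivotal estimator (which additionally self-tunes the noise level):

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_lp(X, y, lam):
    """Dantzig-selector-style estimate via a single linear program:
        minimize ||b||_1  subject to  ||X.T @ (y - X @ b)||_inf <= lam.
    Splitting b = u - v with u, v >= 0 makes the objective linear."""
    n, p = X.shape
    G = X.T @ X
    c = np.ones(2 * p)  # objective: sum(u) + sum(v) = ||b||_1
    # Two-sided constraint |X.T @ y - G @ b| <= lam as A_ub @ [u; v] <= b_ub
    A_ub = np.vstack([np.hstack([-G, G]),
                      np.hstack([G, -G])])
    b_ub = np.concatenate([lam - X.T @ y, lam + X.T @ y])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (2 * p))
    u, v = res.x[:p], res.x[p:]
    return u - v
```

Because every piece is linear, the problem scales to dimensions where second-order conic solvers become slow, which is the computational point the abstract makes.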
Regularity Properties for Sparse Regression
Statistical and machine learning theory has developed several conditions
ensuring that popular estimators such as the Lasso or the Dantzig selector
perform well in high-dimensional sparse regression, including the restricted
eigenvalue, compatibility, and sensitivity properties. However, some
of the central aspects of these conditions are not well understood. For
instance, it is unknown whether these conditions can be checked efficiently on any
given data set. This is problematic, because they are at the core of the theory
of sparse regression.
Here we provide a rigorous proof that these conditions are NP-hard to check.
This shows that the conditions are computationally infeasible to verify, and
raises some questions about their practical applications.
However, by taking an average-case perspective instead of the worst-case view
of NP-hardness, we show that a particular condition, sensitivity, has
certain desirable properties. This condition is weaker and more general than
the others. We show that it holds with high probability in models where the
parent population is well behaved, and that it is robust to certain data
processing steps. These results are desirable, as they provide guidance about
when the condition, and more generally the theory of sparse regression, may be
relevant in the analysis of high-dimensional correlated observational data.
Comment: Manuscript shortened and more motivation added. To appear in
Communications in Mathematics and Statistics.
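The combinatorial obstacle behind such hardness results can be seen in a related quantity, the smallest s-sparse eigenvalue, whose naive verification already requires examining every size-s support. A brute-force sketch (illustrative only, not the paper's NP-hardness reduction; `min_sparse_eigenvalue` is an assumed helper name):

```python
import itertools
import numpy as np

def min_sparse_eigenvalue(X, s):
    """Smallest s-sparse eigenvalue of X.T @ X / n by brute force:
    the minimum over all supports S with |S| = s of the smallest
    eigenvalue of X_S.T @ X_S / n.  Enumerating all C(p, s) supports
    is exponential in s, which is the combinatorial core of why
    RE-type conditions are expensive to verify directly."""
    n, p = X.shape
    best = np.inf
    for S in itertools.combinations(range(p), s):
        idx = list(S)
        G = X[:, idx].T @ X[:, idx] / n
        best = min(best, np.linalg.eigvalsh(G)[0])
    return best
```

For p = 1000 and s = 10 this loop already has more than 2.6e23 iterations, so only an average-case analysis, as taken in the abstract above, can make such conditions usable in practice.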
Optimal False Discovery Control of Minimax Estimator
In the analysis of high-dimensional regression models, there are two
important objectives: statistical estimation and variable selection. In the
literature, most work focuses on either optimal estimation, e.g., minimax
error, or optimal selection behavior, e.g., minimax Hamming loss. In this
study, however, we investigate the subtle interplay between estimation accuracy
and selection behavior. Our result shows that an estimator's error rate
critically depends on how well it controls type I errors. Essentially, the
minimax convergence rate of the false discovery rate over all rate-minimax
estimators is polynomial in the true sparsity ratio. This result helps us to
characterize the false positive control of rate-optimal estimators under
different sparsity regimes. More specifically, under near-linear sparsity, the
number of false positives diverges to infinity in the worst case, but the
false discovery rate still converges to 0; under linear sparsity, even the
false discovery rate does not converge to 0 asymptotically.
Conversely, to asymptotically eliminate all false discoveries,
an estimator must be sub-optimal in its convergence rate. This work
offers a rigorous analysis of the incompatibility between
selection consistency and rate-minimaxity observed in the high-dimensional
regression literature.
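The tension between error rate and false discoveries can be illustrated in the sparse normal means model: a threshold at the estimation-friendly level lets many nulls through, while a larger threshold suppresses them at the cost of power. A hedged simulation sketch (`fdp_threshold` is an illustrative helper, not the paper's estimator or regime):

```python
import numpy as np

def fdp_threshold(n=10000, s=100, mu=3.0, t=2.0, seed=0):
    """False discovery proportion of selecting |y_i| > t in a sparse
    normal means model: y = theta + N(0, 1), where theta has s entries
    equal to mu and n - s zeros.  Selected nulls are false discoveries."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(n)
    theta[:s] = mu
    y = theta + rng.standard_normal(n)
    selected = np.abs(y) > t
    false = selected & (theta == 0)
    return false.sum() / max(selected.sum(), 1)
```

With these toy numbers, the moderate threshold t = 2 yields an FDP far above that of t = 4, mirroring the abstract's point that good type I error control and rate-optimal estimation pull in opposite directions.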
Calibrated Nonconvex Penalized Logistic Regression for Ultra-High-Dimensional Data
Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Statistics, College of Natural Sciences, August 2019. Advisor: Yongdai Kim.
In high-dimensional linear regression, penalized regression methods are used for estimation and variable selection simultaneously. The LASSO is a penalized regression method whose solution is easy to compute, but the LASSO solution often fails to satisfy variable selection consistency. Nonconvex penalized regression methods such as the SCAD and the MCP have the oracle property, which includes variable selection consistency. However, directly computing the global solution of a nonconvex penalized regression problem is infeasible. The calibrated CCCP was developed to obtain the oracle estimator as the unique local minimum.
We propose the calibrated CCCP for the logistic model. We prove that the calibrated CCCP for the logistic model produces a consistent solution path that contains the oracle estimator with probability tending to one. Since the loss function of the logistic model is not quadratic, we apply the MLQA-CCCP algorithm to the penalized objective function. Furthermore, we extend the theoretical results to the Huber loss in place of the logistic loss. Numerical experiments support our theoretical results.
1. Introduction
1.1 Overview
1.2 Outline of the thesis
2. Literature review: Penalized regression on high dimensional regression
2.1 Introduction
2.2 LASSO
2.3 Nonconvex penalized regression
2.4 The calibrated CCCP [Wang et al., 2013]
2.5 Review of compatibility condition
2.6 Algorithms for l1 penalized regression
3. The calibrated CCCP for logistic model
3.1 Introduction
3.2 The proposed algorithm
3.3 Assumptions
3.4 Theoretical properties
4. Experiments
4.1 Simulation studies
4.2 Real data analysis
5. Conclusion
Bibliography
Abstract (in Korean)
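The CCCP step underlying the calibrated algorithm linearizes the concave part of the SCAD penalty, so each iteration solves a convex weighted-l1 problem. A minimal sketch of the SCAD penalty and the per-coefficient l1 weight produced by one such linearization (illustrative only; the calibrated variant of Wang et al. (2013) additionally tunes the first-stage penalty level):

```python
import numpy as np

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty (Fan and Li, 2001), applied elementwise to |t|."""
    t = np.abs(t)
    flat = lam**2 * (a + 1) / 2  # constant value for |t| > a*lam
    mid = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))
    return np.where(t <= lam, lam * t, np.where(t <= a * lam, mid, flat))

def cccp_weight(t, lam, a=3.7):
    """Derivative of the SCAD penalty at |t|: the per-coefficient l1
    weight for the next convex subproblem of a CCCP/LLA iteration.
    It vanishes for |t| > a*lam, which removes the LASSO's bias on
    large coefficients."""
    t = np.abs(t)
    return np.where(t <= lam, lam, np.maximum(a * lam - t, 0.0) / (a - 1))
```

Starting from an initial estimate, one would repeatedly solve a weighted-l1 penalized logistic regression with these weights; the thesis's MLQA step further majorizes the non-quadratic logistic loss so each subproblem stays tractable.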
Geometric Inference for General High-Dimensional Linear Inverse Problems
This paper presents a unified geometric framework for the statistical analysis of a general ill-posed linear inverse model which includes as special cases noisy compressed sensing, sign vector recovery, trace regression, orthogonal matrix estimation, and noisy matrix completion. We propose computationally feasible convex programs for statistical inference, including estimation, confidence intervals, and hypothesis testing. A theoretical framework is developed to characterize the local estimation rate of convergence and to provide statistical inference guarantees. Our results build on local conic geometry and duality. The difficulty of statistical inference is captured by the geometric characterization of the local tangent cone through the Gaussian width and the Sudakov estimate.
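The Gaussian width appearing in such rate characterizations is w(T) = E[sup_{t in T} <g, t>] for a standard Gaussian vector g; for a finite point set it is straightforward to estimate by Monte Carlo. A hedged sketch (`gaussian_width` is an illustrative helper, not code from the paper):

```python
import numpy as np

def gaussian_width(points, n_draws=20000, seed=0):
    """Monte Carlo estimate of the Gaussian width of a finite set T:
    w(T) = E[ sup_{t in T} <g, t> ], g ~ N(0, I_d).
    `points` is an (m, d) array whose rows are the elements of T."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((n_draws, points.shape[1]))
    return (G @ points.T).max(axis=1).mean()
```

For the vertices of the cross-polytope (the extreme points of the l1 ball), the width grows like sqrt(2 log d), which is the usual source of logarithmic factors in sparse recovery rates.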