Search CORE

35,546 research outputs found

High-Dimensional Screening Using Multiple Grouping of Variables

Author: Vats Divyanshu
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 02/12/2013
Field of study

Screening is the problem of finding a superset of the set of non-zero entries in an unknown p-dimensional vector \beta* given n noisy observations. Naturally, we want this superset to be as small as possible. We propose a novel framework for screening, which we refer to as Multiple Grouping (MuG), that groups variables, performs variable selection over the groups, and repeats this process multiple number of times to estimate a sequence of sets that contains the non-zero entries in \beta*. Screening is done by taking an intersection of all these estimated sets. The MuG framework can be used in conjunction with any group based variable selection algorithm. In the high-dimensional setting, where p >> n, we show that when MuG is used with the group Lasso estimator, screening can be consistently performed without using any tuning parameter. Our numerical simulations clearly show the merits of using the MuG framework in practice.Comment: This paper will appear in the IEEE Transactions on Signal Processing. See http://www.ima.umn.edu/~dvats/MuGScreening.html for more detail

arXiv.org e-Print Archive

CiteSeerX

Crossref

Recommended from our members

Covariate-assisted ranking and screening for large-scale two-sample inference

Author: Cai T. Tony
Sun Wenguang
Wang Weinan
Publication venue: eScholarship, University of California
Publication date: 01/01/2019
Field of study

Two-sample multiple testing has a wide range of applications. The conventionalpractice first reduces the original observations to a vector of p-values and then chooses a cutoffto adjust for multiplicity. However, this data reduction step could cause significant loss ofinformation and thus lead to suboptimal testing procedures.We introduce a new framework fortwo-sample multiple testing by incorporating a carefully constructed auxiliary variable in inferenceto improve the power. A data-driven multiple-testing procedure is developed by employinga covariate-assisted ranking and screening (CARS) approach that optimally combines the informationfrom both the primary and the auxiliary variables. The proposed CARS procedureis shown to be asymptotically valid and optimal for false discovery rate control. The procedureis implemented in the R package CARS. Numerical results confirm the effectiveness of CARSin false discovery rate control and show that it achieves substantial power gain over existingmethods. CARS is also illustrated through an application to the analysis of a satellite imagingdata set for supernova detection

eScholarship - University of California

Scalable Sparse Cox's Regression for Large-Scale Survival Data via Broken Adaptive Ridge

Author: Kawaguchi Eric S.
Li Gang
Liu Zhenqiu
Suchard Marc A.
Publication venue: 'Wiley'
Publication date: 25/07/2018
Field of study

This paper develops a new scalable sparse Cox regression tool for sparse high-dimensional massive sample size (sHDMSS) survival data. The method is a local

L_0

-penalized Cox regression via repeatedly performing reweighted

L_2

-penalized Cox regression. We show that the resulting estimator enjoys the best of

L_0

- and

L_2

-penalized Cox regressions while overcoming their limitations. Specifically, the estimator is selection consistent, oracle for parameter estimation, and possesses a grouping property for highly correlated covariates. Simulation results suggest that when the sample size is large, the proposed method with pre-specified tuning parameters has a comparable or better performance than some popular penalized regression methods. More importantly, because the method naturally enables adaptation of efficient algorithms for massive

L_2

-penalized optimization and does not require costly data driven tuning parameter selection, it has a significant computational advantage for sHDMSS data, offering an average of 5-fold speedup over its closest competitor in empirical studies

arXiv.org e-Print Archive

eScholarship - University of California

Manifold Elastic Net: A Unified Framework for Sparse Dimension Reduction

Author: A D’aspremont
B Efron
C Fyfe
CHQ Ding
CM Bishop
D Cai
D Tao
D Tao
D Tao
Dacheng Tao
DB Graham
DL Donoho
E Candes
EJ Candes
GH Golub
GM James
H Hotelling
H Zou
H Zou
H Zou
H Zou
HP Kriegel
J Fan
J Fan
J Lv
JB Tenenbaum
M Belkin
M Belkin
PJ Phillips
PN Belhumeur
R Tibshirani
RA Fisher
S Yan
ST Roweis
T Li
T Zhang
Tianyi Zhou
X He
X Li
Xindong Wu
Z Zhang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 26/07/2010
Field of study

It is difficult to find the optimal sparse solution of a manifold learning based dimensionality reduction algorithm. The lasso or the elastic net penalized manifold learning based dimensionality reduction is not directly a lasso penalized least square problem and thus the least angle regression (LARS) (Efron et al. \cite{LARS}), one of the most popular algorithms in sparse learning, cannot be applied. Therefore, most current approaches take indirect ways or have strict settings, which can be inconvenient for applications. In this paper, we proposed the manifold elastic net or MEN for short. MEN incorporates the merits of both the manifold learning based dimensionality reduction and the sparse learning based dimensionality reduction. By using a series of equivalent transformations, we show MEN is equivalent to the lasso penalized least square problem and thus LARS is adopted to obtain the optimal sparse solution of MEN. In particular, MEN has the following advantages for subsequent classification: 1) the local geometry of samples is well preserved for low dimensional data representation, 2) both the margin maximization and the classification error minimization are considered for sparse projection calculation, 3) the projection matrix of MEN improves the parsimony in computation, 4) the elastic net penalty reduces the over-fitting problem, and 5) the projection matrix of MEN can be interpreted psychologically and physiologically. Experimental evidence on face recognition over various popular datasets suggests that MEN is superior to top level dimensionality reduction algorithms.Comment: 33 pages, 12 figure

arXiv.org e-Print Archive

CiteSeerX

Crossref

OPUS - University of Technology Sydney