Penalized Orthogonal-Components Regression for Large p Small n Data
We propose a penalized orthogonal-components regression (POCRE) for large p
small n data. Orthogonal components are sequentially constructed to maximize,
upon standardization, their correlation to the response residuals. A new
penalization framework, implemented via empirical Bayes thresholding, is
presented to effectively identify sparse predictors of each component. POCRE is
computationally efficient owing to its sequential construction of leading
sparse principal components. In addition, such construction offers other
properties such as grouping highly correlated predictors and allowing for
collinear or nearly collinear predictors. With multivariate responses, POCRE
can construct common components and thus build up latent-variable models for
large p small n data.
Comment: 12 pages
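The sequential construction lends itself to a compact sketch. Below is a minimal Python illustration of the idea, in which plain soft-thresholding stands in for the paper's empirical Bayes thresholding, and the deflation and stopping rules are simplified assumptions rather than POCRE's actual specification:

```python
# Minimal sketch of sequential sparse-component construction in the spirit
# of POCRE. Soft-thresholding is an assumed stand-in for empirical Bayes
# thresholding; the deflation and stopping rules are likewise simplified.
import numpy as np

def pocre_sketch(X, y, n_components=5, threshold=0.1):
    X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize predictors
    r = y - y.mean()                           # initial response residuals
    Xd = X.copy()                              # working (deflated) copy of X
    components, coefs = [], []
    for _ in range(n_components):
        w = Xd.T @ r / len(r)                  # correlation-type weights
        w = np.sign(w) * np.maximum(np.abs(w) - threshold, 0.0)  # sparsify
        if not np.any(w):
            break                              # nothing survives thresholding
        t = Xd @ w
        t /= np.linalg.norm(t)                 # standardized component
        b = t @ r                              # regress residuals on component
        r = r - b * t                          # update residuals
        Xd = Xd - np.outer(t, t @ Xd)          # orthogonalize predictors to t
        components.append(w)
        coefs.append(b)
    return np.array(components), np.array(coefs)
```

Each pass builds one standardized component from the sparsified weights, removes its contribution from the residuals, and orthogonalizes the remaining predictors against it, which is what keeps successive components orthogonal.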
Inference for feature selection using the Lasso with high-dimensional data
Penalized regression models such as the Lasso have proved useful for variable
selection in many fields - especially for situations with high-dimensional data
where the number of predictors far exceeds the number of observations. These
methods identify and rank variables of importance but do not generally provide
any inference for the selected variables. Thus, the selected variables might be
the "most important" but need not be significant. We propose a significance
test for the selection found by the Lasso: a procedure that computes p-values
for the features the Lasso chooses. This method
rephrases the null hypothesis and uses a randomization approach which ensures
that the error rate is controlled even for small samples. We demonstrate the
ability of the algorithm to compute p-values of the expected magnitude with
simulated data using a multitude of scenarios that involve various effect
strengths and correlation between predictors. The algorithm is also applied to
a prostate cancer dataset that has been analyzed in recent papers on the
subject. The proposed method is found to provide a powerful way to make
inference for feature selection even for small samples and when the number of
predictors is several orders of magnitude larger than the number of
observations. The algorithm is implemented in the MESS package in R and is
freely available.
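As a rough illustration of the randomization idea, the Python sketch below assesses one Lasso-selected feature by permuting its column and refitting. This is a generic permutation scheme, not the rephrased null hypothesis implemented in the MESS package, and the penalty level `alpha` is an arbitrary placeholder:

```python
# Generic permutation-based significance check for a Lasso-selected feature.
# This shows the flavor of a randomization test; it is not the specific
# procedure implemented in the MESS package.
import numpy as np
from sklearn.linear_model import Lasso

def permutation_pvalue(X, y, feature, alpha=0.1, n_perm=500, seed=0):
    rng = np.random.default_rng(seed)
    observed = abs(Lasso(alpha=alpha).fit(X, y).coef_[feature])
    exceed = 0
    for _ in range(n_perm):
        Xp = X.copy()
        Xp[:, feature] = rng.permutation(Xp[:, feature])  # break association
        exceed += abs(Lasso(alpha=alpha).fit(Xp, y).coef_[feature]) >= observed
    return (1 + exceed) / (1 + n_perm)  # add-one p-value estimate
```

The add-one correction keeps the estimated p-value strictly positive, a standard device for keeping permutation p-values valid in small samples.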
Bayesian inference in high-dimensional linear models using an empirical correlation-adaptive prior
In the context of a high-dimensional linear regression model, we propose the
use of an empirical correlation-adaptive prior that makes use of information in
the observed predictor variable matrix to adaptively address high collinearity,
determining if parameters associated with correlated predictors should be
shrunk together or kept apart. Under suitable conditions, we prove that this
empirical Bayes posterior concentrates around the true sparse parameter at the
optimal rate asymptotically. A simplified version of a shotgun stochastic
search algorithm is employed to implement the variable selection procedure, and
we show, via simulation experiments across different settings and a real-data
application, the favorable performance of the proposed method compared to
existing methods.
Comment: 25 pages, 4 figures, 2 tables
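To give a concrete sense of the search component, here is a simplified shotgun-stochastic-search loop in Python. BIC of an ordinary least-squares fit stands in for the paper's empirical Bayes posterior score, so the correlation-adaptive prior itself is not reproduced:

```python
# Simplified shotgun stochastic search over variable subsets. BIC is an
# assumed stand-in for the paper's empirical Bayes posterior score.
import numpy as np

def bic_score(X, y, subset):
    n = len(y)
    if subset:
        Xs = X[:, sorted(subset)]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        rss = np.sum((y - Xs @ beta) ** 2)
    else:
        rss = np.sum((y - y.mean()) ** 2)
    return -(n * np.log(rss / n) + len(subset) * np.log(n))  # higher is better

def shotgun_search(X, y, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    model = frozenset()
    best, best_score = model, bic_score(X, y, model)
    for _ in range(n_iter):
        # neighborhood: add one variable or delete one variable
        neighbors = [model | {j} for j in range(p) if j not in model]
        neighbors += [model - {j} for j in model]
        scores = np.array([bic_score(X, y, m) for m in neighbors])
        probs = np.exp(scores - scores.max())   # score-weighted proposal
        probs /= probs.sum()
        model = neighbors[rng.choice(len(neighbors), p=probs)]
        score = bic_score(X, y, model)
        if score > best_score:
            best, best_score = model, score
    return sorted(best)
```

Scoring the whole neighborhood at every step is what makes the search "shotgun": many candidate models are evaluated at once and the next state is drawn in proportion to their scores.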