The Lasso Problem and Uniqueness
The lasso is a popular tool for sparse linear regression, especially for
problems in which the number of variables p exceeds the number of observations
n. But when p>n, the lasso criterion is not strictly convex, and hence it may
not have a unique minimum. An important question is: when is the lasso solution
well-defined (unique)? We review results from the literature, which show that
if the predictor variables are drawn from a continuous probability
distribution, then there is a unique lasso solution with probability one,
regardless of the sizes of n and p. We also show that this result extends
easily to penalized minimization problems over a wide range of loss
functions.
A second important question is: how can we deal with the case of
non-uniqueness in lasso solutions? In light of the aforementioned result, this
case really only arises when some of the predictor variables are discrete, or
when some post-processing has been performed on continuous predictor
measurements. Though we certainly cannot claim to provide a complete answer to
such a broad question, we do present progress towards understanding some
aspects of non-uniqueness. First, we extend the LARS algorithm for computing
the lasso solution path to cover the non-unique case, so that this path
algorithm works for any predictor matrix. Next, we derive a simple method for
computing the component-wise uncertainty in lasso solutions of any given
problem instance, based on linear programming. Finally, we review results from
the literature on some of the unifying properties of lasso solutions, and also
point out particular forms of solutions that have distinctive properties.
Comment: 25 pages, 0 figures
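The setting described above, p > n with predictors drawn from a continuous distribution, is easy to exercise numerically. The following is a minimal sketch (problem sizes, the noise level, and the penalty alpha=0.1 are illustrative choices, not taken from the paper) that solves the lasso criterion with scikit-learn on a continuous random design, where the result in the abstract guarantees a unique solution with probability one:

```python
# Hedged sketch: minimize (1/(2n))||y - Xb||^2 + alpha*||b||_1 with p > n.
# Sizes, noise level, and alpha are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 50, 200                       # more variables than observations
X = rng.normal(size=(n, p))          # continuous design -> unique solution w.p. 1
beta_true = np.zeros(p)
beta_true[:5] = 2.0                  # five truly nonzero coefficients
y = X @ beta_true + 0.1 * rng.normal(size=n)

fit = Lasso(alpha=0.1).fit(X, y)
nnz = np.count_nonzero(fit.coef_)
print(nnz)                           # far fewer than p nonzero coefficients
assert 0 < nnz <= min(n, p)          # the unique solution has at most min(n,p) nonzeros
```

With a discrete design (e.g., a dummy-coded X containing duplicated columns), the same criterion can have multiple minimizers, which is the non-unique case the paper's LARS extension and linear-programming method address.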
Variable selection in nonparametric additive models
We consider a nonparametric additive model of a conditional mean function in
which the number of variables and additive components may be larger than the
sample size but the number of nonzero additive components is "small" relative
to the sample size. The statistical problem is to determine which additive
components are nonzero. The additive components are approximated by truncated
series expansions with B-spline bases. With this approximation, the problem of
component selection becomes that of selecting the groups of coefficients in the
expansion. We apply the adaptive group Lasso to select nonzero components,
using the group Lasso to obtain an initial estimator and reduce the dimension
of the problem. We give conditions under which the group Lasso selects a model
whose number of components is comparable with the underlying model, and the
adaptive group Lasso selects the nonzero components correctly with probability
approaching one as the sample size increases and achieves the optimal rate of
convergence. The results of Monte Carlo experiments show that the adaptive
group Lasso procedure works well with samples of moderate size. A data example
is used to illustrate the application of the proposed method.
Comment: Published at http://dx.doi.org/10.1214/09-AOS781 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
A flexible framework for sparse simultaneous component based data integration
Background: High-throughput data are complex, and methods that reveal the structure underlying the data are most useful. Principal component analysis, frequently implemented as a singular value decomposition, is a popular technique in this respect. Nowadays the challenge is often to reveal structure in several sources of information (e.g., transcriptomics, proteomics) that are available for the same biological entities under study. Simultaneous component methods are most promising in this respect. However, the interpretation of the principal and simultaneous components is often daunting because the contributions of each of the biomolecules (transcripts, proteins) have to be taken into account.
Results: We propose a sparse simultaneous component method that makes many of the parameters redundant by shrinking them to zero. It includes principal component analysis, sparse principal component analysis, and ordinary simultaneous component analysis as special cases. Several penalties can be tuned that account in different ways for the block structure present in the integrated data. This yields known penalization approaches such as the lasso, the ridge penalty, the elastic net, the group lasso, the sparse group lasso, and the elitist lasso. In addition, the algorithmic results can easily be transposed to the context of regression. Metabolomics data obtained with two measurement platforms for the same set of Escherichia coli samples are used to illustrate the proposed methodology and the properties of different penalties with respect to sparseness across and within data blocks.
Conclusion: Sparse simultaneous component analysis is a useful method for data integration: first, simultaneous analyses of multiple blocks offer advantages over sequential and separate analyses; second, interpretation of the results is greatly facilitated by their sparseness. The approach is flexible and allows the block structure to be taken into account in different ways. As such, structures can be found that are exclusively tied to one data platform (group lasso approach) as well as structures that involve all data platforms (elitist lasso approach).
Availability: The additional file contains a MATLAB implementation of the sparse simultaneous component method.
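The paper's method itself ships as MATLAB code in the additional file, but its single-block special case, sparse principal component analysis, can be illustrated with scikit-learn. The sketch below is an analogue of that special case only, with illustrative data sizes and penalty, and shows why sparseness helps interpretation: the penalized loadings contain many exact zeros while ordinary loadings are dense.

```python
# Hedged sketch: the single-block special case (sparse PCA) mentioned above,
# via scikit-learn; an analogue, not the paper's MATLAB implementation.
# Data sizes and alpha are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA, SparsePCA

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 30))
X -= X.mean(axis=0)                  # column-center, as PCA assumes

dense = PCA(n_components=3).fit(X)
sparse = SparsePCA(n_components=3, alpha=2.0, random_state=0).fit(X)

# Ordinary loadings are dense; the l1 penalty zeroes many of them,
# which is what makes each component easier to interpret.
print(np.count_nonzero(dense.components_), np.count_nonzero(sparse.components_))
```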
A modified principal component technique based on the LASSO
In many multivariate statistical techniques, a set of linear functions of the original p variables is produced. One of the more difficult aspects of these techniques is the interpretation of the linear functions, as these functions usually have nonzero coefficients on all p variables. A common approach is to effectively ignore (treat as zero) any coefficients less than some threshold value, so that the function becomes simple and the interpretation becomes easier for the users. Such a procedure can be misleading. There are alternatives to principal component analysis which restrict the coefficients to a smaller number of possible values in the derivation of the linear functions, or replace the principal components by "principal variables." This article introduces a new technique, borrowing an idea proposed by Tibshirani in the context of multiple regression, where similar problems arise in interpreting regression equations. This approach is the so-called LASSO, the "least absolute shrinkage and selection operator," in which a bound is introduced on the sum of the absolute values of the coefficients, and in which some coefficients consequently become zero. We explore some of the properties of the new technique, both theoretically and using simulation studies, and apply it to an example.
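The thresholding practice this abstract criticizes is easy to demonstrate. The sketch below (data and the 0.2 cutoff are illustrative assumptions, and this is not the paper's own technique) computes a leading principal component, zeroes its small loadings, and compares the variance captured by the truncated direction; by definition the truncated vector can never explain more variance than the true leading component, and it can point in a noticeably different direction.

```python
# Hedged sketch of the "treat small loadings as zero" practice criticized above.
# The covariance matrix and the 0.2 threshold are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(8, 8))
S = A @ A.T                          # a valid covariance matrix
vals, vecs = np.linalg.eigh(S)
pc1 = vecs[:, -1]                    # leading eigenvector (unit length)

trunc = np.where(np.abs(pc1) < 0.2, 0.0, pc1)   # naive thresholding
trunc /= np.linalg.norm(trunc)                  # renormalize to unit length

# Variance explained by a unit direction v is v^T S v; pc1 maximizes it,
# so the truncated direction necessarily explains no more variance.
print(pc1 @ S @ pc1, trunc @ S @ trunc)
```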