The degrees of freedom of the Lasso for general design matrix
In this paper, we investigate the degrees of freedom (DOF) of l1-penalized
least-squares minimization (also known as the Lasso) for linear regression models.
We give a closed-form expression of the DOF of the Lasso response. Namely,
we show that for any given Lasso regularization parameter and any
observed data belonging to a set of full (Lebesgue) measure, the
cardinality of the support of a particular solution of the Lasso problem is an
unbiased estimator of the degrees of freedom. This is achieved without the need
of uniqueness of the Lasso solution. Thus, our result holds true for both the
underdetermined and the overdetermined case, where the latter was originally
studied in \cite{zou}. We also show, by providing a simple counterexample, that
although the DOF theorem of \cite{zou} is correct, their proof contains a
flaw, since their divergence formula holds on a different set of full measure
than the one they claim. An effective estimator of the number of degrees
of freedom may have several applications, including an objectively guided choice
of the regularization parameter in the Lasso through the SURE (Stein's unbiased
risk estimation) framework. Our theoretical findings are illustrated through
several numerical simulations.
Comment: A short version appeared in SPARS'11, June 2011. Previously entitled
"The degrees of freedom of penalized l1 minimization"
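The abstract's central estimator, the support cardinality of a Lasso solution, plugs directly into SURE for a risk-guided choice of the regularization parameter. Below is a minimal numpy sketch of that idea; the coordinate-descent solver, the function names, and the assumption of a known noise variance `sigma2` are illustrative choices, not the authors' implementation:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent solver for (1/2)||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]                    # partial residual
            rho = X[:, j] @ r
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return b

def sure(X, y, lam, sigma2):
    """SURE risk estimate ||y - Xb||^2 - n*sigma^2 + 2*sigma^2*df,
    with df estimated by the support size of the Lasso solution,
    as in the result described in the abstract."""
    n = len(y)
    b = lasso_cd(X, y, lam)
    df = int((b != 0).sum())                                   # support cardinality
    resid = y - X @ b
    return resid @ resid - n * sigma2 + 2 * sigma2 * df
```

Minimizing `sure` over a grid of `lam` values then gives an objectively guided penalty choice.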
Sparse logistic principal components analysis for binary data
We develop a new principal components analysis (PCA) type dimension reduction
method for binary data. Different from the standard PCA which is defined on the
observed data, the proposed PCA is defined on the logit transform of the
success probabilities of the binary observations. Sparsity is introduced to the
principal component (PC) loading vectors for enhanced interpretability and more
stable extraction of the principal components. Our sparse PCA is formulated as
solving an optimization problem with a criterion function motivated from a
penalized Bernoulli likelihood. A Majorization--Minimization algorithm is
developed to efficiently solve the optimization problem. The effectiveness of
the proposed sparse logistic PCA method is illustrated by application to a
single nucleotide polymorphism data set and a simulation study.
Comment: Published in at http://dx.doi.org/10.1214/10-AOAS327 the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
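The Majorization-Minimization step for a Bernoulli likelihood rests on the logistic loss having curvature at most 1/4: each iteration majorizes the negative log-likelihood by a quadratic, reducing the update to a least-squares (here, low-rank) fit to working data. The sketch below illustrates that bound only; the soft-thresholding of the scaled loadings and all names are simplifying assumptions, not the authors' exact update:

```python
import numpy as np

def mm_working_data(Y, Theta):
    """One MM majorization for a Bernoulli log-likelihood with natural
    parameter Theta: curvature <= 1/4 gives the working data Z."""
    P = 1.0 / (1.0 + np.exp(-Theta))   # success probabilities
    return Theta + 4.0 * (Y - P)       # working data for the quadratic surrogate

def sparse_logistic_pca_step(Y, Theta, k, lam):
    """One illustrative iteration (a sketch, not the paper's exact algorithm):
    majorize, fit a rank-k SVD to the working data, and soft-threshold the
    scaled loadings to induce sparsity."""
    Z = mm_working_data(Y, Theta)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    V = Vt[:k].T * s[:k]                                  # scaled loadings
    V = np.sign(V) * np.maximum(np.abs(V) - lam, 0.0)     # soft-threshold
    return U[:, :k] @ V.T                                 # updated Theta
```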
A Path Algorithm for Constrained Estimation
Many least squares problems involve affine equality and inequality
constraints. Although there is a variety of methods for solving such problems,
most statisticians find constrained estimation challenging. The current paper
proposes a new path following algorithm for quadratic programming based on
exact penalization. Similar penalties arise in regularization in model
selection. Classical penalty methods solve a sequence of unconstrained problems
that put greater and greater stress on meeting the constraints. In the limit as
the penalty constant tends to infinity, one recovers the constrained solution.
In the exact penalty method, squared penalties are replaced by absolute value
penalties, and the solution is recovered for a finite value of the penalty
constant. The exact path following method starts at the unconstrained solution
and follows the solution path as the penalty constant increases. In the
process, the solution path hits, slides along, and exits from the various
constraints. Path following in lasso penalized regression, in contrast, starts
with a large value of the penalty constant and works its way downward. In both
settings, inspection of the entire solution path is revealing. Just as with the
lasso and generalized lasso, it is possible to plot the effective degrees of
freedom along the solution path. For a strictly convex quadratic program, the
exact penalty algorithm can be framed entirely in terms of the sweep operator
of regression analysis. A few well chosen examples illustrate the mechanics and
potential of path following.
Comment: 26 pages, 5 figures
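The contrast between classical squared penalties and exact absolute-value penalties shows up already in a one-dimensional toy program: minimize (x - 2)^2 subject to x <= 1, whose constrained optimum is x = 1. A grid-search sketch (names and grid are illustrative):

```python
import numpy as np

# Toy problem: minimize (x - 2)^2 subject to x <= 1; constrained optimum x = 1.
xs = np.linspace(-1.0, 3.0, 40001)

def argmin_penalized(rho, exact):
    viol = np.maximum(xs - 1.0, 0.0)        # constraint violation, zero when feasible
    pen = viol if exact else viol ** 2      # absolute (exact) vs squared (classical)
    obj = (xs - 2.0) ** 2 + rho * pen
    return xs[np.argmin(obj)]
```

With the squared penalty, rho = 2 still yields the infeasible minimizer x = 4/3, and the constrained solution is only approached as rho grows without bound; with the absolute penalty, the finite value rho = 2 already recovers x = 1 exactly, which is the phenomenon the exact path algorithm exploits.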
One-step estimator paths for concave regularization
The statistics literature of the past 15 years has established many favorable
properties for sparse diminishing-bias regularization: techniques which can
roughly be understood as providing estimation under penalty functions spanning
the range of concavity between the l0 and l1 norms. However, the lasso's
l1-regularized estimation remains the standard tool for industrial `Big
Data' applications because of its minimal computational cost and the presence
of easy-to-apply rules for penalty selection. In response, this article
proposes a simple new algorithm framework that requires no more computation
than a lasso path: the path of one-step estimators (POSE) does penalized
regression estimation on a grid of decreasing penalties, but adapts
coefficient-specific weights to decrease as a function of the coefficient
estimated in the previous path step. This provides sparse diminishing-bias
regularization at no extra cost over the fastest lasso algorithms. Moreover,
our `gamma lasso' implementation of POSE is accompanied by a reliable heuristic
for the fit degrees of freedom, so that standard information criteria can be
applied in penalty selection. We also provide novel results on the distance
between weighted-l1 and l0 penalized predictors; this allows us to build
intuition about POSE and other diminishing-bias regularization schemes. The
methods and results are illustrated in extensive simulations and in application
of logistic regression to evaluating the performance of hockey players.
Comment: Data and code are in the gamlr package for R. Supplemental appendix
is at https://github.com/TaddyLab/pose/raw/master/paper/supplemental.pdf
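On an orthonormal design the POSE recursion can be sketched in closed form, since each weighted-lasso step reduces to soft-thresholding the OLS coefficients. The weight formula below follows the gamma-lasso idea of shrinking the penalty on coefficients that were large in the previous path step, but the exact functional form and all names are assumptions, not guaranteed to match the gamlr implementation:

```python
import numpy as np

def gamma_lasso_weights(beta_prev, gamma):
    """Coefficient-specific weights for the next path step: weights decrease
    in the magnitude of the previous estimate (diminishing bias).
    gamma = 0 recovers the plain lasso (all weights equal to 1)."""
    return 1.0 / (1.0 + gamma * np.abs(beta_prev))

def pose_path_orthonormal(b_ols, lambdas, gamma):
    """POSE on an orthonormal design, where each weighted-lasso step is a
    closed-form soft-thresholding of the OLS coefficients b_ols."""
    b = np.zeros_like(b_ols)
    path = []
    for lam in lambdas:                       # decreasing penalty grid
        w = gamma_lasso_weights(b, gamma)     # weights from the previous step
        b = np.sign(b_ols) * np.maximum(np.abs(b_ols) - lam * w, 0.0)
        path.append(b.copy())
    return path
```

Each step costs no more than a lasso step, which is the point of the one-step construction.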
Biomarker Detection in Association Studies: Modeling SNPs Simultaneously via Logistic ANOVA
In genome-wide association studies, the primary task is to detect biomarkers in the form of Single Nucleotide Polymorphisms (SNPs) that have nontrivial associations with a disease phenotype and some other important clinical/environmental factors. However, the number of SNPs, which is extremely large compared with the sample size, inhibits application of classical methods such as multiple logistic regression. Currently the most commonly used approach is still to analyze one SNP at a time. In this paper, we propose to consider the genotypes of the SNPs simultaneously via a logistic analysis of variance (ANOVA) model, which expresses the logit-transformed mean of SNP genotypes as the summation of the SNP effects, effects of the disease phenotype and/or other clinical variables, and the interaction effects. We use a reduced-rank representation of the interaction-effect matrix for dimensionality reduction, and employ the L1-penalty in a penalized likelihood framework to filter out the SNPs that have no associations. We develop a Majorization-Minimization algorithm for computational implementation. In addition, we propose a modified BIC criterion to select the penalty parameters and determine the rank number. The proposed method is applied to a Multiple Sclerosis data set and simulated data sets, and shows promise in biomarker detection.
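The model's natural-parameter decomposition (row and column main effects plus a reduced-rank interaction) can be written compactly. The sketch below shows only that structural piece, with hypothetical names, and omits the penalized-likelihood fitting entirely:

```python
import numpy as np

def logit_anova_mean(a, b, U, V):
    """Natural-parameter matrix for the logistic ANOVA sketch:
    row (SNP) effects a + column (phenotype/clinical) effects b +
    a rank-r interaction term U V'."""
    return a[:, None] + b[None, :] + U @ V.T
```

The reduced-rank factors U and V keep the interaction matrix estimable when the number of SNPs far exceeds the sample size.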
Regularized Multivariate Regression Models with Skew-t Error Distributions
We consider regularization of the parameters in multivariate linear regression models with errors having a multivariate skew-t distribution. An iterative penalized likelihood procedure is proposed for constructing sparse estimators of both the regression coefficient and inverse scale matrices simultaneously. The sparsity is introduced by penalizing the negative log-likelihood with L1-penalties on the entries of the two matrices. Taking advantage of the hierarchical representation of skew-t distributions, and using the expectation conditional maximization (ECM) algorithm, we reduce the problem to a penalized normal likelihood and develop a procedure to minimize the ensuing objective function. The performance of the method is assessed in a simulation study, and the methodology is illustrated on a real data set with a 24-dimensional response vector.
Sparse modeling of categorial explanatory variables
Shrinking methods in regression analysis are usually designed for metric
predictors. In this article, however, shrinkage methods for categorial
predictors are proposed. As an application we consider data from the Munich
rent standard, where, for example, urban districts are treated as a categorial
predictor. If independent variables are categorial, some modifications to usual
shrinking procedures are necessary. Two L1-penalty based methods for factor
selection and clustering of categories are presented and investigated. The
first approach is designed for nominal scale levels, the second one for ordinal
predictors. Besides applying them to the Munich rent standard, methods are
illustrated and compared in simulation studies.
Comment: Published in at http://dx.doi.org/10.1214/10-AOAS355 the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
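The two penalty types described above can be sketched directly: for a nominal factor, all pairwise absolute differences of category coefficients are penalized (so categories with equal coefficients fuse into clusters), while for an ordinal factor only adjacent categories are compared. A small sketch with hypothetical function names:

```python
import numpy as np
from itertools import combinations

def nominal_fusion_penalty(beta):
    """Penalty for a nominal factor (a sketch): all pairwise absolute
    differences, so categories with equal coefficients merge."""
    return sum(abs(beta[i] - beta[j])
               for i, j in combinations(range(len(beta)), 2))

def ordinal_fusion_penalty(beta):
    """For ordinal predictors, only adjacent categories are compared,
    respecting the ordering of the levels."""
    return float(np.sum(np.abs(np.diff(beta))))
```

Adding either penalty to a regression loss shrinks differences exactly to zero, which is what performs the clustering of categories.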
Robust and sparse factor modelling
Factor construction methods are widely used to summarize a large panel of variables by means of a relatively small number of representative factors. We propose a novel factor construction procedure that enjoys the properties of robustness to outliers and of sparsity, that is, having relatively few nonzero factor loadings. Compared to the traditional factor construction method, we find that this procedure leads to favorable forecasting performance in the presence of outliers and to better interpretable factors. We investigate the performance of the method in a Monte Carlo experiment and in an empirical application to a large macroeconomic data set.
Keywords: Dimension reduction; Forecasting; Outliers; Regularization; Sparsity
The composite absolute penalties family for grouped and hierarchical variable selection
Extracting useful information from high-dimensional data is an important
focus of today's statistical research and practice. Penalized loss function
minimization has been shown to be effective for this task both theoretically
and empirically. With the virtues of both regularization and sparsity, the
l1-penalized squared error minimization method, the Lasso, has been popular in
regression models and beyond. In this paper, we combine different norms,
including the l1, to form an intelligent penalty in order to add side information
to the fitting of a regression or classification model to obtain reasonable
estimates. Specifically, we introduce the Composite Absolute Penalties (CAP)
family, which allows given grouping and hierarchical relationships between the
predictors to be expressed. CAP penalties are built by defining groups and
combining the properties of norm penalties at the across-group and within-group
levels. Grouped selection occurs for nonoverlapping groups. Hierarchical
variable selection is reached by defining groups with particular overlapping
patterns. We propose using the BLASSO and cross-validation to compute CAP
estimates in general. For a subfamily of CAP estimates involving only the
l1 and l∞ norms, we introduce the iCAP algorithm to trace the entire
regularization path for the grouped selection problem. Within this subfamily,
unbiased estimates of the degrees of freedom (df) are derived so that the
regularization parameter is selected without cross-validation. CAP is shown to
improve on the predictive performance of the LASSO in a series of simulated
experiments, including cases with p >> n and possibly mis-specified
groupings. When the complexity of a model is properly calculated, iCAP is seen
to be parsimonious in the experiments.
Comment: Published in at http://dx.doi.org/10.1214/07-AOS584 the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
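For intuition, the penalty used by the iCAP subfamily (l∞ within groups, summed l1-style across groups, so whole groups enter or leave the model together) fits in a few lines. The name `icap_penalty` and the plain list-of-index-lists group encoding are assumptions for illustration:

```python
import numpy as np

def icap_penalty(beta, groups):
    """l1/l_inf member of the CAP family: the l_inf norm within each
    group, summed across (non-overlapping) groups."""
    return sum(np.max(np.abs(beta[g])) for g in groups)
```

Hierarchical selection, as the abstract notes, comes from choosing overlapping groups instead of the disjoint ones shown here.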