89 research outputs found
Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection
A number of variable selection methods have been proposed involving nonconvex
penalty functions. These methods, which include the smoothly clipped absolute
deviation (SCAD) penalty and the minimax concave penalty (MCP), have been
demonstrated to have attractive theoretical properties, but model fitting is
not a straightforward task, and the resulting solutions may be unstable. Here,
we demonstrate the potential of coordinate descent algorithms for fitting these
models, establishing theoretical convergence properties and demonstrating that
they are significantly faster than competing approaches. In addition, we
demonstrate the utility of convexity diagnostics to determine regions of the
parameter space in which the objective function is locally convex, even though
the penalty is not. Our simulation study and data examples indicate that
nonconvex penalties like MCP and SCAD are worthwhile alternatives to the lasso
in many applications. In particular, our numerical results suggest that MCP is
the preferred approach among the three methods.
Comment: Published at http://dx.doi.org/10.1214/10-AOAS388 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
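To make the coordinate descent idea in the abstract above concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes a linear model with squared-error loss and predictors that are centered and scaled so each column has mean square 1, in which case the one-dimensional MCP and SCAD problems have the closed-form solutions used below. All function names, the default gamma, and the stopping rule are illustrative assumptions.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator S(z, lam), the univariate lasso solution."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def mcp_update(z, lam, gamma):
    """Univariate MCP solution for a standardized predictor (requires gamma > 1)."""
    if abs(z) <= gamma * lam:
        return soft_threshold(z, lam) / (1.0 - 1.0 / gamma)
    return z  # beyond gamma * lam the penalty is flat, so no shrinkage


def scad_update(z, lam, gamma):
    """Univariate SCAD solution for a standardized predictor (requires gamma > 2)."""
    if abs(z) <= 2.0 * lam:
        return soft_threshold(z, lam)
    if abs(z) <= gamma * lam:
        shrunk = soft_threshold(z, gamma * lam / (gamma - 1.0))
        return shrunk / (1.0 - 1.0 / (gamma - 1.0))
    return z


def coordinate_descent(X, y, lam, gamma=3.0, update=mcp_update,
                       n_sweeps=100, tol=1e-8):
    """Cyclic coordinate descent over the columns of X (a list of rows).

    Assumes each column of X is centered with mean square 1 (hypothetical
    preprocessing done by the caller).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals for beta = 0
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            # z_j = x_j' r / n + beta_j: the unpenalized univariate solution
            z = sum(X[i][j] * resid[i] for i in range(n)) / n + beta[j]
            new = update(z, lam, gamma)
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]  # keep residuals in sync
                beta[j] = new
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta
```

On an orthogonal toy design, a large coefficient passes through the MCP update unshrunk while a small one is set to zero, which is the behavior that distinguishes these penalties from the lasso.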
Strong rules for nonconvex penalties and their implications for efficient algorithms in high-dimensional regression
We consider approaches for improving the efficiency of algorithms for fitting
nonconvex penalized regression models such as SCAD and MCP in high dimensions.
In particular, we develop rules for discarding variables during cyclic
coordinate descent. This dimension reduction leads to a substantial improvement
in the speed of these algorithms for high-dimensional problems. The rules we
propose here eliminate a substantial fraction of the variables from the
coordinate descent algorithm. Violations are quite rare, especially in the
locally convex region of the solution path, and furthermore, may be easily
detected and corrected by checking the Karush-Kuhn-Tucker conditions. We extend
these rules to generalized linear models, as well as to other nonconvex
penalties such as the $\ell_2$-stabilized Mnet penalty, group MCP, and group
SCAD. We explore three variants of the coordinate descent algorithm that
incorporate these rules and study the efficiency of these algorithms in fitting
models to both simulated data and real data from a genome-wide association
study.
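The screen-then-check workflow described in this abstract can be sketched as follows. This is an illustrative Python sketch only: the discarding threshold used here is the well-known sequential strong rule for the lasso, swapped in as a stand-in because the abstract does not state the nonconvex rules themselves. The KKT check relies on the fact that, for a coefficient held at zero, the lasso, MCP, and SCAD all require the absolute gradient to be at most lambda. Function and argument names are hypothetical.

```python
def strong_rule_keep(grad_prev, lam_prev, lam_new):
    """Indices surviving the lasso sequential strong rule.

    grad_prev[j] is x_j' r / n evaluated at the solution for lam_prev.
    Variable j is discarded when |grad_prev[j]| < 2 * lam_new - lam_prev.
    (Stand-in rule: the paper's rules for MCP/SCAD are not reproduced here.)
    """
    threshold = 2.0 * lam_new - lam_prev
    return [j for j, g in enumerate(grad_prev) if abs(g) >= threshold]


def kkt_violations(grad_new, discarded, lam_new):
    """Discarded variables whose zero coefficient violates the KKT conditions.

    grad_new[j] is x_j' r / n at the solution fitted on the kept variables.
    A coefficient fixed at zero is optimal only if |grad_new[j]| <= lam_new.
    """
    return [j for j in discarded if abs(grad_new[j]) > lam_new]


# Toy usage with made-up gradient values.
grad_prev = [0.9, 0.2, 0.55, -0.7]
keep = strong_rule_keep(grad_prev, lam_prev=1.0, lam_new=0.8)
discarded = [j for j in range(len(grad_prev)) if j not in keep]

# After fitting on `keep`, recompute gradients (made up here) and check KKT.
grad_new = [0.8, 0.1, 0.85, -0.8]
violators = kkt_violations(grad_new, discarded, lam_new=0.8)
```

Any violators would be added back to the active set and the fit repeated, so the screening step trades a small amount of rechecking for far fewer coordinate updates.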
Kernel-based aggregation of marker-level genetic association tests involving copy-number variation
Genetic association tests involving copy-number variants (CNVs) are
complicated by the fact that CNVs span multiple markers at which measurements
are taken. The power of an association test at a single marker is typically
low, and it is desirable to pool information across the markers spanned by the
CNV. However, CNV boundaries are not known in advance, and the best way to
proceed with this pooling is unclear. In this article, we propose a
kernel-based method for aggregation of marker-level tests and explore several
aspects of its implementation. In addition, we explore some of the theoretical
aspects of marker-level test aggregation, proposing a permutation-based
approach that preserves the family-wise error rate of the testing procedure,
while demonstrating that several simpler alternatives fail to do so. The
empirical power of the approach is studied in a number of simulations
constructed from real data involving a pharmacogenomic study of gemcitabine,
and compares favorably with several competing approaches.
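One way to picture the combination of kernel aggregation and permutation testing described in this abstract is the sketch below. It is a simplified illustration under stated assumptions, not the procedure from the article: the marker-level statistic (absolute difference in group means), the Epanechnikov-style weights, and the use of the maximum smoothed statistic as the overall test statistic are all choices made here for illustration. Permuting phenotype labels and recomputing the maximum gives a reference distribution that accounts for testing many correlated markers at once.

```python
import random


def marker_stats(intensities, labels):
    """Per-marker absolute difference in mean intensity between the two groups."""
    n_markers = len(intensities[0])
    cases = [row for row, lab in zip(intensities, labels) if lab == 1]
    controls = [row for row, lab in zip(intensities, labels) if lab == 0]
    stats = []
    for j in range(n_markers):
        mean_case = sum(row[j] for row in cases) / len(cases)
        mean_ctrl = sum(row[j] for row in controls) / len(controls)
        stats.append(abs(mean_case - mean_ctrl))
    return stats


def kernel_aggregate(stats, bandwidth):
    """Kernel-weighted local average of marker-level statistics.

    Markers within `bandwidth` positions get weight 1 - (d / (bandwidth + 1))^2.
    """
    m = len(stats)
    out = []
    for j in range(m):
        num = den = 0.0
        for k in range(max(0, j - bandwidth), min(m, j + bandwidth + 1)):
            w = 1.0 - ((k - j) / (bandwidth + 1.0)) ** 2
            num += w * stats[k]
            den += w
        out.append(num / den)
    return out


def permutation_pvalue(intensities, labels, bandwidth, n_perm=200, seed=0):
    """Permutation p-value for the maximum aggregated statistic."""
    rng = random.Random(seed)
    observed = max(kernel_aggregate(marker_stats(intensities, labels), bandwidth))
    exceed = 0
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any phenotype-marker association
        t = max(kernel_aggregate(marker_stats(intensities, shuffled), bandwidth))
        if t >= observed:
            exceed += 1
    return (1 + exceed) / (n_perm + 1)
```

Because each permutation recomputes the maximum over all markers, the resulting p-value refers to the whole scan rather than to any single marker.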
A Selective Review of Group Selection in High-Dimensional Models
Grouping structures arise naturally in many statistical modeling problems.
Several methods have been proposed for variable selection that respect grouping
structure in variables. Examples include the group LASSO and several concave
group selection methods. In this article, we give a selective review of group
selection concerning methodological developments, theoretical properties and
computational algorithms. We pay particular attention to group selection
methods involving concave penalties. We address both group selection and
bi-level selection methods. We describe several applications of these methods
in nonparametric additive models, semiparametric regression, seemingly
unrelated regressions, genomic data analysis and genome-wide association
studies. We also highlight some issues that require further study.
Comment: Published at http://dx.doi.org/10.1214/12-STS392 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org
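As a small illustration of what selection at the group level means, the following Python sketch shows the group soft-thresholding operator that underlies the group lasso mentioned in this abstract. It assumes the predictors within the group have been orthonormalized, so the group update has a closed form. The function name is illustrative and the sketch is not taken from the review.

```python
import math


def group_soft_threshold(z, lam):
    """Group lasso update for one group under an orthonormalized design.

    z is the group's unpenalized least-squares solution. The whole group is
    set to zero when its Euclidean norm is at most lam; otherwise every
    coefficient is shrunk by the same factor (1 - lam / ||z||).
    """
    norm = math.sqrt(sum(v * v for v in z))
    if norm <= lam:
        return [0.0] * len(z)
    scale = 1.0 - lam / norm
    return [scale * v for v in z]
```

The all-or-nothing behavior is what distinguishes group selection from bi-level selection, where individual coefficients inside a selected group may also be zero.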