Pathwise coordinate optimization
We consider ``one-at-a-time'' coordinate-wise descent algorithms for a class
of convex optimization problems. An algorithm of this kind has been proposed
for the L1-penalized regression (lasso) in the literature, but it seems to
have been largely ignored. Indeed, it seems that coordinate-wise algorithms are
not often used in convex optimization. We show that this algorithm is very
competitive with the well-known LARS (or homotopy) procedure in large lasso
problems, and that it can be applied to related methods such as the garotte and
elastic net. It turns out that coordinate-wise descent does not work in the
``fused lasso,'' however, so we derive a generalized algorithm that yields the
solution in much less time than a standard convex optimizer. Finally, we
generalize the procedure to the two-dimensional fused lasso, and demonstrate
its performance on some image smoothing problems.
Comment: Published in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics
(http://www.imstat.org); DOI: 10.1214/07-AOAS131.
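As a concrete illustration of the "one-at-a-time" scheme this abstract
describes, here is a minimal coordinate descent for the lasso with cyclic
soft-thresholding updates, at a single value of the penalty parameter. It
assumes standardized predictors and squared-error loss; the function names
and defaults are illustrative, not the authors' code.

```python
import numpy as np

def soft_threshold(z, gamma):
    # Closed-form minimizer of 0.5*(b - z)**2 + gamma*|b|.
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, n_sweeps=100):
    # Cyclic coordinate descent for
    #   (1/(2n)) * ||y - X @ beta||^2 + lam * ||beta||_1,
    # assuming each column of X is standardized so that
    # (1/n) * X[:, j] @ X[:, j] == 1.
    n, p = X.shape
    beta = np.zeros(p)
    r = y.copy()  # residual y - X @ beta, maintained incrementally
    for _ in range(n_sweeps):
        for j in range(p):
            z = beta[j] + X[:, j] @ r / n      # univariate least-squares fit
            b_new = soft_threshold(z, lam)     # shrink toward zero
            r += X[:, j] * (beta[j] - b_new)   # cheap residual update
            beta[j] = b_new
    return beta
```

The full pathwise procedure would run this over a decreasing grid of lam
values, warm-starting each solve from the previous solution.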
Best Subset Selection via a Modern Optimization Lens
In the last twenty-five years (1990-2014), algorithmic advances in integer
optimization combined with hardware improvements have resulted in an
astonishing 200 billion factor speedup in solving Mixed Integer Optimization
(MIO) problems. We present a MIO approach for solving the classical best subset
selection problem of choosing k out of p features in linear regression
given n observations. We develop a discrete extension of modern first order
continuous optimization methods to find high quality feasible solutions that we
use as warm starts to a MIO solver that finds provably optimal solutions. The
resulting algorithm (a) provides a solution with a guarantee on its
suboptimality even if we terminate the algorithm early, (b) can accommodate
side constraints on the coefficients of the linear regression and (c) extends
to finding best subset solutions for the least absolute deviation loss
function. Using a wide variety of synthetic and real datasets, we demonstrate
that our approach solves problems with n in the 1000s and p in the 100s in
minutes to provable optimality, and finds near optimal solutions for n in the
100s and p in the 1000s in minutes. We also establish via numerical
experiments that the MIO approach performs better than Lasso and
other popularly used sparse learning procedures, in terms of achieving sparse
solutions with good predictive power.
Comment: This is a revised version (May, 2015) of the first submission in June
2014.
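The "discrete extension of modern first order continuous optimization
methods" can be read as a projected gradient iteration that keeps only the k
largest-magnitude coefficients after each step, with the iterates then
seeding the MIO solver. A minimal sketch under that reading (names and
step-size choice are illustrative, and this standalone loop is only the
warm-start heuristic, not the provably optimal MIO computation):

```python
import numpy as np

def discrete_first_order(X, y, k, n_iter=500):
    # Projected gradient for
    #   min 0.5 * ||y - X @ beta||^2   s.t.   ||beta||_0 <= k,
    # with step size 1/L, where L bounds the gradient's Lipschitz constant.
    L = np.linalg.eigvalsh(X.T @ X).max()
    p = X.shape[1]
    beta = np.zeros(p)
    for _ in range(n_iter):
        b = beta - X.T @ (X @ beta - y) / L   # plain gradient step
        keep = np.argsort(np.abs(b))[-k:]     # indices of the k largest |b_j|
        beta = np.zeros(p)
        beta[keep] = b[keep]                  # hard-threshold to sparsity k
    return beta
```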
CoCoA: A General Framework for Communication-Efficient Distributed Optimization
The scale of modern datasets necessitates the development of efficient
distributed optimization methods for machine learning. We present a
general-purpose framework for distributed computing environments, CoCoA, that
has an efficient communication scheme and is applicable to a wide variety of
problems in machine learning and signal processing. We extend the framework to
cover general non-strongly-convex regularizers, including L1-regularized
problems like lasso, sparse logistic regression, and elastic net
regularization, and show how earlier work can be derived as a special case. We
provide convergence guarantees for the class of convex regularized loss
minimization objectives, leveraging a novel approach in handling
non-strongly-convex regularizers and non-smooth loss functions. The resulting
framework has markedly improved performance over state-of-the-art methods, as
we illustrate with an extensive set of experiments on real distributed
datasets.
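The communication pattern the abstract emphasizes can be sketched as a toy,
single-process simulation: each "machine" computes an update from only its
own data partition, and a single vector per machine is communicated and
aggregated each round. The local solver below is a stand-in gradient step,
not CoCoA's actual local subproblem; all names are illustrative.

```python
import numpy as np

def local_update(X_k, y_k, w, lr=0.1):
    # Hypothetical local solver: one gradient step on this partition's
    # least-squares loss, standing in for an approximate subproblem solve.
    return -lr * X_k.T @ (X_k @ w - y_k) / X_k.shape[0]

def cocoa_style_rounds(parts, n_rounds=50):
    # parts: list of (X_k, y_k) data partitions, one per machine.
    p = parts[0][0].shape[1]
    w = np.zeros(p)
    for _ in range(n_rounds):
        # Local phase: embarrassingly parallel; no raw data leaves a machine.
        deltas = [local_update(X_k, y_k, w) for X_k, y_k in parts]
        # Communication phase: one p-vector per machine, then aggregate.
        w = w + sum(deltas) / len(parts)
    return w
```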
The Influence Function of Penalized Regression Estimators
To perform regression analysis in high dimensions, lasso and ridge estimation
are common choices. However, it has been shown that these methods are not
robust to outliers. Therefore, alternatives such as penalized M-estimation or
the sparse least trimmed squares (LTS) estimator have been proposed. The
robustness
of these regression methods can be measured with the influence function. It
quantifies the effect of infinitesimal perturbations in the data. Furthermore,
it can be used to compute the asymptotic variance and the mean squared error.
In this paper we compute the influence function, the asymptotic variance and
the mean squared error for penalized M-estimators and the sparse LTS estimator.
The asymptotic biasedness of the estimators makes the calculations nonstandard.
We show that only M-estimators with a loss function with a bounded derivative
are robust against regression outliers. In particular, the lasso has an
unbounded influence function.
Comment: Appears in Statistics: A Journal of Theoretical and Applied
Statistics, 2015.
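For reference, the influence function mentioned here is the standard
robust-statistics object: for a functional T at a model distribution F, it
measures the effect of an infinitesimal point-mass contamination delta_z at
a point z,

```latex
\operatorname{IF}(z; T, F)
  = \lim_{\varepsilon \downarrow 0}
    \frac{T\bigl((1-\varepsilon)F + \varepsilon\,\delta_z\bigr) - T(F)}
         {\varepsilon}.
```

Under standard regularity conditions the asymptotic variance is obtained by
integrating its outer product,
$\operatorname{ASV}(T,F) = \int \operatorname{IF}(z;T,F)\,
\operatorname{IF}(z;T,F)^{\top}\, dF(z)$, which is how the variance and mean
squared error computations in the abstract connect to it.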
Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection
A number of variable selection methods have been proposed involving nonconvex
penalty functions. These methods, which include the smoothly clipped absolute
deviation (SCAD) penalty and the minimax concave penalty (MCP), have been
demonstrated to have attractive theoretical properties, but model fitting is
not a straightforward task, and the resulting solutions may be unstable. Here,
we demonstrate the potential of coordinate descent algorithms for fitting these
models, establishing theoretical convergence properties and demonstrating that
they are significantly faster than competing approaches. In addition, we
demonstrate the utility of convexity diagnostics to determine regions of the
parameter space in which the objective function is locally convex, even though
the penalty is not. Our simulation study and data examples indicate that
nonconvex penalties like MCP and SCAD are worthwhile alternatives to the lasso
in many applications. In particular, our numerical results suggest that MCP is
the preferred approach among the three methods.
Comment: Published in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics
(http://www.imstat.org); DOI: 10.1214/10-AOAS388.
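For intuition, the coordinate-wise update under MCP with a standardized
predictor has a well-known closed form, often called firm thresholding: it
shrinks like the lasso near zero but applies no shrinkage to large
coefficients, which is why large effects are estimated with little bias. A
sketch (parameter names are illustrative; the form below requires gamma > 1):

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * max(abs(z) - lam, 0.0)

def mcp_update(z, lam, gamma=3.0):
    # Closed-form coordinate-wise minimizer under the MCP penalty when the
    # predictor is standardized (requires gamma > 1): lasso-like shrinkage
    # for |z| <= gamma*lam, and no shrinkage at all beyond that threshold.
    if abs(z) <= gamma * lam:
        return soft_threshold(z, lam) / (1.0 - 1.0 / gamma)
    return z
```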
Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors
Penalized regression is an attractive framework for variable selection
problems. Often, variables possess a grouping structure, and the relevant
selection problem is that of selecting groups, not individual variables. The
group lasso has been proposed as a way of extending the ideas of the lasso to
the problem of group selection. Nonconvex penalties such as SCAD and MCP have
been proposed and shown to have several advantages over the lasso; these
penalties may also be extended to the group selection problem, giving rise to
group SCAD and group MCP methods. Here, we describe algorithms for fitting
these models stably and efficiently. In addition, we present simulation results
and real data examples comparing and contrasting the statistical properties of
these methods.
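The block-wise analogue of soft-thresholding underlies group descent: the
entire coefficient group is scaled toward zero and set exactly to zero when
its least-squares fit is small, so selection happens at the group level. A
minimal sketch of that single-group update (assuming the group's columns
have been orthonormalized; names are illustrative):

```python
import numpy as np

def group_soft_threshold(z_g, lam):
    # Multivariate soft-thresholding of the group-wise least-squares fit
    # z_g: returns zero when ||z_g|| <= lam (the whole group is dropped),
    # otherwise shrinks the block radially toward zero by lam.
    norm = np.linalg.norm(z_g)
    if norm <= lam:
        return np.zeros_like(z_g)
    return (1.0 - lam / norm) * z_g
```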
A General Family of Penalties for Combining Differing Types of Penalties in Generalized Structured Models
Penalized estimation has become an established tool for regularization and model selection in regression models.
A variety of penalties with specific features are available
and effective algorithms for specific penalties have been proposed.
But not much is available to fit models that call for a combination of different penalties.
When modeling rent data, which will be considered as an example, various types of predictors call for a combination of a Ridge, a grouped Lasso and a Lasso-type penalty within one model.
Algorithms that can deal with such problems are in demand.
We propose to approximate penalties that are (semi-)norms of scalar linear transformations of the coefficient vector in generalized structured models.
The penalty class is general enough to embed the Lasso, the fused Lasso, the Ridge, the smoothly clipped absolute deviation (SCAD) penalty, the elastic net and many more penalties.
The approximation makes it possible to combine all of these penalties within one model.
The computation is based on conventional penalized iteratively re-weighted least squares (PIRLS) algorithms and is hence easy to implement.
Moreover, new penalties can be incorporated quickly.
The approach is also extended to penalties with vector-based arguments, that is, to penalties given by norms of linear transformations of the coefficient vector.
Some illustrative examples and the model for the Munich rent data show promising results.
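One standard way to realize such an approximation inside PIRLS is a local
quadratic expansion of each penalty term around the current iterate, in the
spirit of Fan and Li's local quadratic approximation (whether the paper uses
exactly this form is not stated in the abstract): for a penalty
$p_\lambda(|a^\top\beta|)$ on a scalar linear transformation $a^\top\beta$,

```latex
p_\lambda\bigl(|a^\top\beta|\bigr) \;\approx\;
p_\lambda\bigl(|a^\top\beta^{(t)}|\bigr)
+ \frac{p_\lambda'\bigl(|a^\top\beta^{(t)}|\bigr)}{2\,|a^\top\beta^{(t)}|}
  \Bigl((a^\top\beta)^2 - (a^\top\beta^{(t)})^2\Bigr).
```

Each PIRLS step then only has to solve a ridge-type weighted least-squares
problem, which is why arbitrary combinations of such penalties can share a
single fitting routine.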