481 research outputs found
A General Framework for Fast Stagewise Algorithms
Forward stagewise regression follows a very simple strategy for constructing
a sequence of sparse regression estimates: it starts with all coefficients
equal to zero, and iteratively updates the coefficient (by a small amount
) of the variable that achieves the maximal absolute inner product
with the current residual. This procedure has an interesting connection to the
lasso: under some conditions, it is known that the sequence of forward
stagewise estimates exactly coincides with the lasso path, as the step size
goes to zero. Furthermore, essentially the same equivalence holds
outside of least squares regression, with the minimization of a differentiable
convex loss function subject to an norm constraint (the stagewise
algorithm now updates the coefficient corresponding to the maximal absolute
component of the gradient).
Even when they do not match their -constrained analogues, stagewise
estimates provide a useful approximation, and are computationally appealing.
Their success in sparse modeling motivates the question: can a simple,
effective strategy like forward stagewise be applied more broadly in other
regularization settings, beyond the norm and sparsity? The current
paper is an attempt to do just this. We present a general framework for
stagewise estimation, which yields fast algorithms for problems such as
group-structured learning, matrix completion, image denoising, and more.Comment: 56 pages, 15 figure
Templates for Convex Cone Problems with Applications to Sparse Signal Recovery
This paper develops a general framework for solving a variety of convex cone
problems that frequently arise in signal processing, machine learning,
statistics, and other fields. The approach works as follows: first, determine a
conic formulation of the problem; second, determine its dual; third, apply
smoothing; and fourth, solve using an optimal first-order method. A merit of
this approach is its flexibility: for example, all compressed sensing problems
can be solved via this approach. These include models with objective
functionals such as the total-variation norm, ||Wx||_1 where W is arbitrary, or
a combination thereof. In addition, the paper also introduces a number of
technical contributions such as a novel continuation scheme, a novel approach
for controlling the step size, and some new results showing that the smooth and
unsmoothed problems are sometimes formally equivalent. Combined with our
framework, these lead to novel, stable and computationally efficient
algorithms. For instance, our general implementation is competitive with
state-of-the-art methods for solving intensively studied problems such as the
LASSO. Further, numerical experiments show that one can solve the Dantzig
selector problem, for which no efficient large-scale solvers exist, in a few
hundred iterations. Finally, the paper is accompanied with a software release.
This software is not a single, monolithic solver; rather, it is a suite of
programs and routines designed to serve as building blocks for constructing
complete algorithms.Comment: The TFOCS software is available at http://tfocs.stanford.edu This
version has updated reference
Regularization and Bayesian Learning in Dynamical Systems: Past, Present and Future
Regularization and Bayesian methods for system identification have been
repopularized in the recent years, and proved to be competitive w.r.t.
classical parametric approaches. In this paper we shall make an attempt to
illustrate how the use of regularization in system identification has evolved
over the years, starting from the early contributions both in the Automatic
Control as well as Econometrics and Statistics literature. In particular we
shall discuss some fundamental issues such as compound estimation problems and
exchangeability which play and important role in regularization and Bayesian
approaches, as also illustrated in early publications in Statistics. The
historical and foundational issues will be given more emphasis (and space), at
the expense of the more recent developments which are only briefly discussed.
The main reason for such a choice is that, while the recent literature is
readily available, and surveys have already been published on the subject, in
the author's opinion a clear link with past work had not been completely
clarified.Comment: Plenary Presentation at the IFAC SYSID 2015. Submitted to Annual
Reviews in Contro
Matrix Completion Models with Fixed Basis Coefficients and Rank Regularized Problems with Hard Constraints
Ph.DDOCTOR OF PHILOSOPH
Optimization with Sparsity-Inducing Penalties
Sparse estimation methods are aimed at using or obtaining parsimonious
representations of data or models. They were first dedicated to linear variable
selection but numerous extensions have now emerged such as structured sparsity
or kernel selection. It turns out that many of the related estimation problems
can be cast as convex optimization problems by regularizing the empirical risk
with appropriate non-smooth norms. The goal of this paper is to present from a
general perspective optimization tools and techniques dedicated to such
sparsity-inducing penalties. We cover proximal methods, block-coordinate
descent, reweighted -penalized techniques, working-set and homotopy
methods, as well as non-convex formulations and extensions, and provide an
extensive set of experiments to compare various algorithms from a computational
point of view
Sparse convex optimization methods for machine learning
Diss., Eidgenössische Technische Hochschule ETH Zürich, Nr. 20013, 201
- …