Structured Sparsity: Discrete and Convex approaches
Compressive sensing (CS) exploits sparsity to recover sparse or compressible
signals from dimensionality reducing, non-adaptive sensing mechanisms. Sparsity
is also used to enhance interpretability in machine learning and statistics
applications: while the ambient dimension is vast in modern data analysis
problems, the relevant information therein typically resides in a much lower
dimensional space. However, many solutions in current use do not leverage
the true underlying structure. Recent results in CS extend the simple sparsity
idea to more sophisticated {\em structured} sparsity models, which describe the
interdependency between the nonzero components of a signal, increasing the
interpretability of the results and leading to better recovery
performance. In order to better understand the impact of structured sparsity,
in this chapter we analyze the connections between the discrete models and
their convex relaxations, highlighting their relative advantages. We start with
the general group sparse model and then elaborate on two important special
cases: the dispersive and the hierarchical models. For each, we present the
models in their discrete nature, discuss how to solve the ensuing discrete
problems and then describe convex relaxations. We also consider more general
structures as defined by set functions and present their convex proxies.
Further, we discuss efficient optimization solutions for structured sparsity
problems and illustrate structured sparsity in action via three applications.
Comment: 30 pages, 18 figures
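As a concrete illustration of the convex-relaxation route discussed in this abstract, below is a minimal NumPy sketch of block soft-thresholding, the proximal operator of the non-overlapping group-lasso norm $\sum_g \|x_g\|_2$, which is the standard convex proxy for the group sparse model. The function name, group partition, and threshold are illustrative assumptions, not taken from the chapter.

import numpy as np

def group_soft_threshold(x, groups, threshold):
    """Proximal operator of threshold * sum_g ||x_g||_2 for
    non-overlapping groups: shrinks each group's norm by `threshold`
    and zeroes out any group whose norm falls below it."""
    out = np.zeros_like(x, dtype=float)
    for g in groups:
        norm_g = np.linalg.norm(x[g])
        if norm_g > threshold:
            out[g] = (1.0 - threshold / norm_g) * x[g]
    return out

# Toy example: the weak first group is discarded as a block, while the
# strong second group is kept and shrunk, preserving its support pattern.
x = np.array([0.1, -0.2, 0.15, 3.0, -2.5, 1.0])
groups = [np.arange(0, 3), np.arange(3, 6)]
print(group_soft_threshold(x, groups, threshold=1.0))

This whole-group shrinkage is how the convex relaxation encodes the interdependency between nonzero components: coefficients enter or leave the support together rather than individually.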
FAASTA: A fast solver for total-variation regularization of ill-conditioned problems with application to brain imaging
The total variation (TV) penalty, as many other analysis-sparsity problems,
does not lead to separable factors or a proximal operator with a closed-form
expression, such as soft thresholding for the $\ell_1$ penalty. As a result,
in a variational formulation of an inverse problem or statistical learning
estimation, it leads to challenging non-smooth optimization problems that are
often solved with elaborate single-step first-order methods. When the data-fit
term arises from empirical measurements, as in brain imaging, it is often very
ill-conditioned and without simple structure. In this situation, in proximal
splitting methods, the computation cost of the gradient step can easily dominate
each iteration. Thus it is beneficial to minimize the number of gradient
steps. We present fAASTA, a variant of FISTA, that relies on an internal solver
for the TV proximal operator, and refines its tolerance to balance the
computational cost of the gradient and the proximal steps. We give benchmarks
and illustrations on "brain decoding": recovering brain maps from noisy
measurements to predict observed behavior. The algorithm as well as the
empirical study of convergence speed are valuable for any non-exact
proximal operator, in particular analysis-sparsity problems.
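To make the algorithmic idea concrete, here is a minimal sketch, not the authors' implementation: a FISTA loop whose proximal step is computed inexactly by a dual projected-gradient inner solver for a 1-D TV penalty, with the inner tolerance tightened as the outer iterations progress. The 1-D setting, the function names, and the 1/k^2 tolerance schedule are all assumptions for illustration; fAASTA's actual tolerance-balancing rule is the one described in the paper.

import numpy as np

def tv_prox_1d(z, lam, tol, max_inner=5000):
    """Inexact prox of lam * sum_i |u[i+1] - u[i]| (1-D total variation),
    computed by projected gradient on the dual problem; iterates until
    successive dual variables change by less than `tol`."""
    v = np.zeros(z.size - 1)                        # dual variable, one per jump
    for _ in range(max_inner):
        u = z + np.diff(v, prepend=0, append=0)     # primal point z - D^T v
        v_new = np.clip(v + 0.25 * np.diff(u), -lam, lam)  # step 1/||D||^2
        if np.max(np.abs(v_new - v)) < tol:
            v = v_new
            break
        v = v_new
    return z + np.diff(v, prepend=0, append=0)

def fista_inexact_tv(X, y, lam, n_iter=200):
    """FISTA for 0.5 * ||X w - y||^2 + lam * TV(w) where the TV prox is
    solved inexactly, with an inner tolerance that tightens over the
    outer iterations (illustrative 1/k^2 schedule, an assumption here)."""
    L = np.linalg.norm(X, 2) ** 2                   # Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    z, t = w.copy(), 1.0
    for k in range(1, n_iter + 1):
        grad = X.T @ (X @ z - y)
        w_new = tv_prox_1d(z - grad / L, lam / L, tol=1.0 / k**2)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = w_new + ((t - 1.0) / t_new) * (w_new - w)
        w, t = w_new, t_new
    return w

# Toy usage: recover a piecewise-constant signal from noisy measurements.
rng = np.random.default_rng(0)
X = rng.standard_normal((80, 100))
w_true = np.zeros(100)
w_true[30:60] = 1.0
y = X @ w_true + 0.1 * rng.standard_normal(80)
w_hat = fista_inexact_tv(X, y, lam=1.0)

The point of the adaptive schedule is the trade-off the abstract describes: early outer iterations use a loose (cheap) prox so that expensive gradient evaluations are not wasted on high-precision proximal steps, while later iterations tighten the prox to preserve convergence.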