2,337 research outputs found
Smoothing Proximal Gradient Method for General Structured Sparse Learning
We study the problem of learning high dimensional regression models
regularized by a structured-sparsity-inducing penalty that encodes prior
structural information on either input or output sides. We consider two widely
adopted types of such penalties as our motivating examples: 1) overlapping
group lasso penalty, based on the l1/l2 mixed-norm penalty, and 2) graph-guided
fusion penalty. For both types of penalties, due to their non-separability,
developing an efficient optimization method has remained a challenging problem.
In this paper, we propose a general optimization approach, called smoothing
proximal gradient method, which can solve the structured sparse regression
problems with a smooth convex loss and a wide spectrum of
structured-sparsity-inducing penalties. Our approach is based on a general
smoothing technique of Nesterov. It achieves a convergence rate faster than the
standard first-order method, subgradient method, and is much more scalable than
the most widely used interior-point method. Numerical results are reported to
demonstrate the efficiency and scalability of the proposed method.Comment: arXiv admin note: substantial text overlap with arXiv:1005.471
Continuation of Nesterov's Smoothing for Regression with Structured Sparsity in High-Dimensional Neuroimaging
Predictive models can be used on high-dimensional brain images for diagnosis
of a clinical condition. Spatial regularization through structured sparsity
offers new perspectives in this context and reduces the risk of overfitting the
model while providing interpretable neuroimaging signatures by forcing the
solution to adhere to domain-specific constraints. Total Variation (TV)
enforces spatial smoothness of the solution while segmenting predictive regions
from the background. We consider the problem of minimizing the sum of a smooth
convex loss, a non-smooth convex penalty (whose proximal operator is known) and
a wide range of possible complex, non-smooth convex structured penalties such
as TV or overlapping group Lasso. Existing solvers are either limited in the
functions they can minimize or in their practical capacity to scale to
high-dimensional imaging data. Nesterov's smoothing technique can be used to
minimize a large number of non-smooth convex structured penalties but
reasonable precision requires a small smoothing parameter, which slows down the
convergence speed. To benefit from the versatility of Nesterov's smoothing
technique, we propose a first order continuation algorithm, CONESTA, which
automatically generates a sequence of decreasing smoothing parameters. The
generated sequence maintains the optimal convergence speed towards any globally
desired precision. Our main contributions are: To propose an expression of the
duality gap to probe the current distance to the global optimum in order to
adapt the smoothing parameter and the convergence speed. We provide a
convergence rate, which is an improvement over classical proximal gradient
smoothing methods. We demonstrate on both simulated and high-dimensional
structural neuroimaging data that CONESTA significantly outperforms many
state-of-the-art solvers in regard to convergence speed and precision.Comment: 11 pages, 6 figures, accepted in IEEE TMI, IEEE Transactions on
Medical Imaging 201
Tree-guided group lasso for multi-response regression with structured sparsity, with an application to eQTL mapping
We consider the problem of estimating a sparse multi-response regression
function, with an application to expression quantitative trait locus (eQTL)
mapping, where the goal is to discover genetic variations that influence
gene-expression levels. In particular, we investigate a shrinkage technique
capable of capturing a given hierarchical structure over the responses, such as
a hierarchical clustering tree with leaf nodes for responses and internal nodes
for clusters of related responses at multiple granularity, and we seek to
leverage this structure to recover covariates relevant to each
hierarchically-defined cluster of responses. We propose a tree-guided group
lasso, or tree lasso, for estimating such structured sparsity under
multi-response regression by employing a novel penalty function constructed
from the tree. We describe a systematic weighting scheme for the overlapping
groups in the tree-penalty such that each regression coefficient is penalized
in a balanced manner despite the inhomogeneous multiplicity of group
memberships of the regression coefficients due to overlaps among groups. For
efficient optimization, we employ a smoothing proximal gradient method that was
originally developed for a general class of structured-sparsity-inducing
penalties. Using simulated and yeast data sets, we demonstrate that our method
shows a superior performance in terms of both prediction errors and recovery of
true sparsity patterns, compared to other methods for learning a
multivariate-response regression.Comment: Published in at http://dx.doi.org/10.1214/12-AOAS549 the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
A successive difference-of-convex approximation method for a class of nonconvex nonsmooth optimization problems
We consider a class of nonconvex nonsmooth optimization problems whose
objective is the sum of a smooth function and a finite number of nonnegative
proper closed possibly nonsmooth functions (whose proximal mappings are easy to
compute), some of which are further composed with linear maps. This kind of
problems arises naturally in various applications when different regularizers
are introduced for inducing simultaneous structures in the solutions. Solving
these problems, however, can be challenging because of the coupled nonsmooth
functions: the corresponding proximal mapping can be hard to compute so that
standard first-order methods such as the proximal gradient algorithm cannot be
applied efficiently. In this paper, we propose a successive
difference-of-convex approximation method for solving this kind of problems. In
this algorithm, we approximate the nonsmooth functions by their Moreau
envelopes in each iteration. Making use of the simple observation that Moreau
envelopes of nonnegative proper closed functions are continuous {\em
difference-of-convex} functions, we can then approximately minimize the
approximation function by first-order methods with suitable majorization
techniques. These first-order methods can be implemented efficiently thanks to
the fact that the proximal mapping of {\em each} nonsmooth function is easy to
compute. Under suitable assumptions, we prove that the sequence generated by
our method is bounded and any accumulation point is a stationary point of the
objective. We also discuss how our method can be applied to concrete
applications such as nonconvex fused regularized optimization problems and
simultaneously structured matrix optimization problems, and illustrate the
performance numerically for these two specific applications
Fast Nonsmooth Regularized Risk Minimization with Continuation
In regularized risk minimization, the associated optimization problem becomes
particularly difficult when both the loss and regularizer are nonsmooth.
Existing approaches either have slow or unclear convergence properties, are
restricted to limited problem subclasses, or require careful setting of a
smoothing parameter. In this paper, we propose a continuation algorithm that is
applicable to a large class of nonsmooth regularized risk minimization
problems, can be flexibly used with a number of existing solvers for the
underlying smoothed subproblem, and with convergence results on the whole
algorithm rather than just one of its subproblems. In particular, when
accelerated solvers are used, the proposed algorithm achieves the fastest known
rates of on strongly convex problems, and on general convex
problems. Experiments on nonsmooth classification and regression tasks
demonstrate that the proposed algorithm outperforms the state-of-the-art.Comment: AAAI-201
- …