8,541 research outputs found
The Convergence Guarantees of a Non-convex Approach for Sparse Recovery
In the area of sparse recovery, numerous researches hint that non-convex
penalties might induce better sparsity than convex ones, but up until now those
corresponding non-convex algorithms lack convergence guarantees from the
initial solution to the global optimum. This paper aims to provide performance
guarantees of a non-convex approach for sparse recovery. Specifically, the
concept of weak convexity is incorporated into a class of sparsity-inducing
penalties to characterize the non-convexity. Borrowing the idea of the
projected subgradient method, an algorithm is proposed to solve the non-convex
optimization problem. In addition, a uniform approximate projection is adopted
in the projection step to make this algorithm computationally tractable for
large scale problems. The convergence analysis is provided in the noisy
scenario. It is shown that if the non-convexity of the penalty is below a
threshold (which is in inverse proportion to the distance between the initial
solution and the sparse signal), the recovered solution has recovery error
linear in both the step size and the noise term. Numerical simulations are
implemented to test the performance of the proposed approach and verify the
theoretical analysis.Comment: 33 pages, 7 figure
Non-convex Optimization for Machine Learning
A vast majority of machine learning algorithms train their models and perform
inference by solving optimization problems. In order to capture the learning
and prediction problems accurately, structural constraints such as sparsity or
low rank are frequently imposed or else the objective itself is designed to be
a non-convex function. This is especially true of algorithms that operate in
high-dimensional spaces or that train non-linear models such as tensor models
and deep networks.
The freedom to express the learning problem as a non-convex optimization
problem gives immense modeling power to the algorithm designer, but often such
problems are NP-hard to solve. A popular workaround to this has been to relax
non-convex problems to convex ones and use traditional methods to solve the
(convex) relaxed optimization problems. However this approach may be lossy and
nevertheless presents significant challenges for large scale optimization.
On the other hand, direct approaches to non-convex optimization have met with
resounding success in several domains and remain the methods of choice for the
practitioner, as they frequently outperform relaxation-based techniques -
popular heuristics include projected gradient descent and alternating
minimization. However, these are often poorly understood in terms of their
convergence and other properties.
This monograph presents a selection of recent advances that bridge a
long-standing gap in our understanding of these heuristics. The monograph will
lead the reader through several widely used non-convex optimization techniques,
as well as applications thereof. The goal of this monograph is to both,
introduce the rich literature in this area, as well as equip the reader with
the tools and techniques needed to analyze these simple procedures for
non-convex problems.Comment: The official publication is available from now publishers via
http://dx.doi.org/10.1561/220000005
Low Complexity Regularization of Linear Inverse Problems
Inverse problems and regularization theory is a central theme in contemporary
signal processing, where the goal is to reconstruct an unknown signal from
partial indirect, and possibly noisy, measurements of it. A now standard method
for recovering the unknown signal is to solve a convex optimization problem
that enforces some prior knowledge about its structure. This has proved
efficient in many problems routinely encountered in imaging sciences,
statistics and machine learning. This chapter delivers a review of recent
advances in the field where the regularization prior promotes solutions
conforming to some notion of simplicity/low-complexity. These priors encompass
as popular examples sparsity and group sparsity (to capture the compressibility
of natural signals and images), total variation and analysis sparsity (to
promote piecewise regularity), and low-rank (as natural extension of sparsity
to matrix-valued data). Our aim is to provide a unified treatment of all these
regularizations under a single umbrella, namely the theory of partial
smoothness. This framework is very general and accommodates all low-complexity
regularizers just mentioned, as well as many others. Partial smoothness turns
out to be the canonical way to encode low-dimensional models that can be linear
spaces or more general smooth manifolds. This review is intended to serve as a
one stop shop toward the understanding of the theoretical properties of the
so-regularized solutions. It covers a large spectrum including: (i) recovery
guarantees and stability to noise, both in terms of -stability and
model (manifold) identification; (ii) sensitivity analysis to perturbations of
the parameters involved (in particular the observations), with applications to
unbiased risk estimation ; (iii) convergence properties of the forward-backward
proximal splitting scheme, that is particularly well suited to solve the
corresponding large-scale regularized optimization problem
Randomized Low-Memory Singular Value Projection
Affine rank minimization algorithms typically rely on calculating the
gradient of a data error followed by a singular value decomposition at every
iteration. Because these two steps are expensive, heuristic approximations are
often used to reduce computational burden. To this end, we propose a recovery
scheme that merges the two steps with randomized approximations, and as a
result, operates on space proportional to the degrees of freedom in the
problem. We theoretically establish the estimation guarantees of the algorithm
as a function of approximation tolerance. While the theoretical approximation
requirements are overly pessimistic, we demonstrate that in practice the
algorithm performs well on the quantum tomography recovery problem.Comment: 13 pages. This version has a revised theorem and new numerical
experiment
Learning Sparsely Used Overcomplete Dictionaries via Alternating Minimization
We consider the problem of sparse coding, where each sample consists of a
sparse linear combination of a set of dictionary atoms, and the task is to
learn both the dictionary elements and the mixing coefficients. Alternating
minimization is a popular heuristic for sparse coding, where the dictionary and
the coefficients are estimated in alternate steps, keeping the other fixed.
Typically, the coefficients are estimated via minimization, keeping
the dictionary fixed, and the dictionary is estimated through least squares,
keeping the coefficients fixed. In this paper, we establish local linear
convergence for this variant of alternating minimization and establish that the
basin of attraction for the global optimum (corresponding to the true
dictionary and the coefficients) is \order{1/s^2}, where is the sparsity
level in each sample and the dictionary satisfies RIP. Combined with the recent
results of approximate dictionary estimation, this yields provable guarantees
for exact recovery of both the dictionary elements and the coefficients, when
the dictionary elements are incoherent.Comment: Local linear convergence now holds under RIP and also more general
restricted eigenvalue condition
- …