15,698 research outputs found
Non-convex Optimization for Machine Learning
A vast majority of machine learning algorithms train their models and perform
inference by solving optimization problems. In order to capture the learning
and prediction problems accurately, structural constraints such as sparsity or
low rank are frequently imposed or else the objective itself is designed to be
a non-convex function. This is especially true of algorithms that operate in
high-dimensional spaces or that train non-linear models such as tensor models
and deep networks.
The freedom to express the learning problem as a non-convex optimization
problem gives immense modeling power to the algorithm designer, but often such
problems are NP-hard to solve. A popular workaround to this has been to relax
non-convex problems to convex ones and use traditional methods to solve the
(convex) relaxed optimization problems. However this approach may be lossy and
nevertheless presents significant challenges for large scale optimization.
On the other hand, direct approaches to non-convex optimization have met with
resounding success in several domains and remain the methods of choice for the
practitioner, as they frequently outperform relaxation-based techniques -
popular heuristics include projected gradient descent and alternating
minimization. However, these are often poorly understood in terms of their
convergence and other properties.
This monograph presents a selection of recent advances that bridge a
long-standing gap in our understanding of these heuristics. The monograph will
lead the reader through several widely used non-convex optimization techniques,
as well as applications thereof. The goal of this monograph is to both,
introduce the rich literature in this area, as well as equip the reader with
the tools and techniques needed to analyze these simple procedures for
non-convex problems.Comment: The official publication is available from now publishers via
http://dx.doi.org/10.1561/220000005
Computational Complexity versus Statistical Performance on Sparse Recovery Problems
We show that several classical quantities controlling compressed sensing
performance directly match classical parameters controlling algorithmic
complexity. We first describe linearly convergent restart schemes on
first-order methods solving a broad range of compressed sensing problems, where
sharpness at the optimum controls convergence speed. We show that for sparse
recovery problems, this sharpness can be written as a condition number, given
by the ratio between true signal sparsity and the largest signal size that can
be recovered by the observation matrix. In a similar vein, Renegar's condition
number is a data-driven complexity measure for convex programs, generalizing
classical condition numbers for linear systems. We show that for a broad class
of compressed sensing problems, the worst case value of this algorithmic
complexity measure taken over all signals matches the restricted singular value
of the observation matrix which controls robust recovery performance. Overall,
this means in both cases that, in compressed sensing problems, a single
parameter directly controls both computational complexity and recovery
performance. Numerical experiments illustrate these points using several
classical algorithms.Comment: Final version, to appear in information and Inferenc
Structured Sparsity: Discrete and Convex approaches
Compressive sensing (CS) exploits sparsity to recover sparse or compressible
signals from dimensionality reducing, non-adaptive sensing mechanisms. Sparsity
is also used to enhance interpretability in machine learning and statistics
applications: While the ambient dimension is vast in modern data analysis
problems, the relevant information therein typically resides in a much lower
dimensional space. However, many solutions proposed nowadays do not leverage
the true underlying structure. Recent results in CS extend the simple sparsity
idea to more sophisticated {\em structured} sparsity models, which describe the
interdependency between the nonzero components of a signal, allowing to
increase the interpretability of the results and lead to better recovery
performance. In order to better understand the impact of structured sparsity,
in this chapter we analyze the connections between the discrete models and
their convex relaxations, highlighting their relative advantages. We start with
the general group sparse model and then elaborate on two important special
cases: the dispersive and the hierarchical models. For each, we present the
models in their discrete nature, discuss how to solve the ensuing discrete
problems and then describe convex relaxations. We also consider more general
structures as defined by set functions and present their convex proxies.
Further, we discuss efficient optimization solutions for structured sparsity
problems and illustrate structured sparsity in action via three applications.Comment: 30 pages, 18 figure
Block-Sparse Recovery via Convex Optimization
Given a dictionary that consists of multiple blocks and a signal that lives
in the range space of only a few blocks, we study the problem of finding a
block-sparse representation of the signal, i.e., a representation that uses the
minimum number of blocks. Motivated by signal/image processing and computer
vision applications, such as face recognition, we consider the block-sparse
recovery problem in the case where the number of atoms in each block is
arbitrary, possibly much larger than the dimension of the underlying subspace.
To find a block-sparse representation of a signal, we propose two classes of
non-convex optimization programs, which aim to minimize the number of nonzero
coefficient blocks and the number of nonzero reconstructed vectors from the
blocks, respectively. Since both classes of problems are NP-hard, we propose
convex relaxations and derive conditions under which each class of the convex
programs is equivalent to the original non-convex formulation. Our conditions
depend on the notions of mutual and cumulative subspace coherence of a
dictionary, which are natural generalizations of existing notions of mutual and
cumulative coherence. We evaluate the performance of the proposed convex
programs through simulations as well as real experiments on face recognition.
We show that treating the face recognition problem as a block-sparse recovery
problem improves the state-of-the-art results by 10% with only 25% of the
training data.Comment: IEEE Transactions on Signal Processin
- …