Non-convex Optimization for Machine Learning
A vast majority of machine learning algorithms train their models and perform
inference by solving optimization problems. In order to capture the learning
and prediction problems accurately, structural constraints such as sparsity or
low rank are frequently imposed or else the objective itself is designed to be
a non-convex function. This is especially true of algorithms that operate in
high-dimensional spaces or that train non-linear models such as tensor models
and deep networks.
The freedom to express the learning problem as a non-convex optimization
problem gives immense modeling power to the algorithm designer, but often such
problems are NP-hard to solve. A popular workaround has been to relax
non-convex problems to convex ones, for example replacing a sparsity (L0)
constraint with its convex L1 surrogate, and to solve the relaxed problems
with traditional convex optimization methods. However, this approach may be
lossy and nevertheless presents significant challenges for large-scale
optimization.
On the other hand, direct approaches to non-convex optimization have met with
resounding success in several domains and remain the methods of choice for
practitioners, as they frequently outperform relaxation-based techniques;
popular heuristics include projected gradient descent and alternating
minimization. However, these heuristics are often poorly understood in terms
of their convergence and other properties.
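
To make one of these heuristics concrete, here is a minimal sketch (not taken
from the monograph itself) of projected gradient descent for sparse linear
regression, often called iterative hard thresholding: take a gradient step on
the least-squares loss, then project onto the non-convex set of s-sparse
vectors. The problem sizes, step size, and iteration count below are
illustrative assumptions.

import numpy as np

def hard_threshold(x, s):
    # Projection onto the non-convex set {x : ||x||_0 <= s}:
    # keep the s largest-magnitude entries, zero out the rest.
    z = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-s:]
    z[idx] = x[idx]
    return z

def iht(A, b, s, step=1.0, n_iters=300):
    # Projected gradient descent for min 0.5*||Ax - b||^2 s.t. ||x||_0 <= s.
    # step = 1.0 is reasonable when A^T A is close to the identity on sparse
    # supports (as for the random A below); otherwise it should be tuned.
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        grad = A.T @ (A @ x - b)                 # gradient of the smooth loss
        x = hard_threshold(x - step * grad, s)   # non-convex projection
    return x

# Toy usage: recover a 5-sparse vector from 100 noiseless measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 400)) / np.sqrt(100)
x_true = np.zeros(400)
x_true[rng.choice(400, size=5, replace=False)] = rng.standard_normal(5)
x_hat = iht(A, A @ x_true, s=5)
print(np.linalg.norm(x_hat - x_true))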
This monograph presents a selection of recent advances that bridge a
long-standing gap in our understanding of these heuristics. The monograph will
lead the reader through several widely used non-convex optimization techniques,
as well as applications thereof. The goal of this monograph is both to
introduce the rich literature in this area and to equip the reader with the
tools and techniques needed to analyze these simple procedures for non-convex
problems.
Comment: The official publication is available from now publishers via
http://dx.doi.org/10.1561/220000005
Robust Principal Component Analysis?
This paper is about a curious phenomenon. Suppose we have a data matrix,
which is the superposition of a low-rank component and a sparse component. Can
we recover each component individually? We prove that under some suitable
assumptions, it is possible to recover both the low-rank and the sparse
components exactly by solving a very convenient convex program called Principal
Component Pursuit; among all feasible decompositions, simply minimize a
weighted combination of the nuclear norm and of the L1 norm. This suggests the
possibility of a principled approach to robust principal component analysis
since our methodology and results assert that one can recover the principal
components of a data matrix even though a positive fraction of its entries are
arbitrarily corrupted. This extends to the situation where a fraction of the
entries are missing as well. We discuss an algorithm for solving this
optimization problem, and present applications in the area of video
surveillance, where our methodology allows for the detection of objects in a
cluttered background, and in the area of face recognition, where it offers a
principled way of removing shadows and specularities in images of faces.
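
To illustrate the convex program above, here is a minimal sketch of Principal
Component Pursuit, min ||L||_* + lam*||S||_1 subject to L + S = M, solved by
an alternating-directions method in the spirit of the augmented Lagrangian
algorithm the paper discusses: singular value thresholding handles the nuclear
norm and entrywise soft thresholding handles the L1 norm. The weight
lam = 1/sqrt(max(m, n)) follows the paper's recommendation; the penalty mu and
the stopping rule are common heuristics, not prescriptions from the paper.

import numpy as np

def shrink(X, tau):
    # Entrywise soft thresholding: proximal operator of tau*||.||_1.
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    # Singular value thresholding: proximal operator of tau*||.||_*.
    U, sig, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(sig - tau, 0.0)) @ Vt

def pcp(M, lam=None, mu=None, tol=1e-7, max_iters=500):
    # Principal Component Pursuit: min ||L||_* + lam*||S||_1 s.t. L + S = M.
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam
    mu = m * n / (4.0 * np.abs(M).sum()) if mu is None else mu  # heuristic penalty
    S = np.zeros_like(M)
    Y = np.zeros_like(M)  # dual variable for the constraint L + S = M
    for _ in range(max_iters):
        L = svt(M - S + Y / mu, 1.0 / mu)      # low-rank update
        S = shrink(M - L + Y / mu, lam / mu)   # sparse update
        R = M - L - S                          # primal residual
        Y += mu * R                            # dual ascent step
        if np.linalg.norm(R) <= tol * np.linalg.norm(M):
            break
    return L, S

# Toy usage: a rank-5 matrix plus 5% gross sparse corruptions.
rng = np.random.default_rng(0)
L0 = rng.standard_normal((80, 5)) @ rng.standard_normal((5, 80))
S0 = np.where(rng.random((80, 80)) < 0.05, 10 * rng.standard_normal((80, 80)), 0.0)
L, S = pcp(L0 + S0)
print(np.linalg.norm(L - L0) / np.linalg.norm(L0))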