Convex Optimization: Algorithms and Complexity
This monograph presents the main complexity theorems in convex optimization
and their corresponding algorithms. Starting from the fundamental theory of
black-box optimization, the material progresses towards recent advances in
structural optimization and stochastic optimization. Our presentation of
black-box optimization, strongly influenced by Nesterov's seminal book and
Nemirovski's lecture notes, includes the analysis of cutting plane methods, as
well as (accelerated) gradient descent schemes. We also pay special attention
to non-Euclidean settings (relevant algorithms include Frank-Wolfe, mirror
descent, and dual averaging) and discuss their relevance in machine learning.
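To make the non-Euclidean setting concrete, here is a minimal sketch of entropic mirror descent on the probability simplex (the exponentiated-gradient update). The function names and the gradient-oracle interface are illustrative assumptions, not pseudocode from the monograph:

```python
import numpy as np

def entropic_mirror_descent(grad, x0, eta, n_iters=100):
    """Mirror descent on the probability simplex with the entropy mirror map
    (the multiplicative-weights / exponentiated-gradient update)."""
    x = np.asarray(x0, dtype=float)
    avg = np.zeros_like(x)
    for _ in range(n_iters):
        g = grad(x)               # (sub)gradient oracle at the current iterate
        x = x * np.exp(-eta * g)  # non-Euclidean step taken in the dual space
        x /= x.sum()              # Bregman projection back onto the simplex
        avg += x
    return avg / n_iters          # averaged iterate, standard for mirror descent
```

The entropy mirror map is what makes the update multiplicative rather than additive; this is the standard reason mirror descent adapts to the simplex geometry better than Euclidean projected gradient descent.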
We provide a gentle introduction to structural optimization with FISTA (to
optimize a sum of a smooth and a simple non-smooth term), saddle-point mirror
prox (Nemirovski's alternative to Nesterov's smoothing), and a concise
description of interior point methods. In stochastic optimization we discuss
stochastic gradient descent, mini-batches, random coordinate descent, and
sublinear algorithms. We also briefly touch upon convex relaxation of
combinatorial problems and the use of randomness to round solutions, as well as
random-walk-based methods.
Comment: A previous version of the manuscript was titled "Theory of Convex Optimization for Machine Learning".
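Returning to the FISTA template mentioned above (a smooth term plus a simple non-smooth term), here is a minimal sketch instantiated for the lasso objective 0.5*||Ax - b||^2 + lam*||x||_1. The step-size choice and the function names are assumptions for illustration, not code from the monograph:

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1 (soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def fista_lasso(A, b, lam, n_iters=500):
    """FISTA for min_x 0.5*||A x - b||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the smooth gradient
    x = np.zeros(A.shape[1])
    y = x.copy()
    t = 1.0
    for _ in range(n_iters):
        grad = A.T @ (A @ y - b)   # gradient of the smooth term at the
                                   # extrapolated point y
        x_next = soft_threshold(y - grad / L, lam / L)  # proximal gradient step
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        # Nesterov-style extrapolation (the "acceleration" in FISTA).
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)
        x, t = x_next, t_next
    return x
```

The only structure FISTA needs from the non-smooth term is a cheap proximal operator, which for the l1 norm is the closed-form soft-thresholding above.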
Alternating Randomized Block Coordinate Descent
Block-coordinate descent algorithms and alternating minimization methods are
fundamental optimization algorithms and an important primitive in large-scale
optimization and machine learning. While various block-coordinate-descent-type
methods have been studied extensively, only alternating minimization -- which
applies to the setting of only two blocks -- is known to have convergence time
that scales independently of the least smooth block. A natural question is
then: is the setting of two blocks special?
We show that the answer is "no" as long as the least smooth block can be
optimized exactly -- an assumption that is also needed in the setting of
alternating minimization. We do so by introducing a novel algorithm AR-BCD,
whose convergence time scales independently of the least smooth (possibly
non-smooth) block. The basic algorithm generalizes both alternating
minimization and randomized block coordinate (gradient) descent, and we also
provide its accelerated version -- AAR-BCD. As a special case of AAR-BCD, we
obtain the first nontrivial accelerated alternating minimization algorithm.
Comment: Version 1 appeared in Proc. ICML'18. v1 -> v2: added remarks on how accelerated alternating minimization follows directly from the results that appeared in ICML'18; no new technical results were needed for this.
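For context, here is a minimal sketch of the classical two-block alternating-minimization primitive that AR-BCD generalizes, instantiated for low-rank factorization min_{U,V} ||M - U V^T||_F^2, where each block is minimized exactly while the other is held fixed. This is the textbook baseline, not the paper's AR-BCD or AAR-BCD:

```python
import numpy as np

def alternating_minimization(M, rank, n_iters=50, seed=0):
    """Two-block alternating minimization for min ||M - U V^T||_F^2:
    each block update is an exact least-squares solve."""
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((M.shape[1], rank))
    for _ in range(n_iters):
        # Exact minimization over the U-block, V held fixed.
        U = np.linalg.lstsq(V, M.T, rcond=None)[0].T
        # Exact minimization over the V-block, U held fixed.
        V = np.linalg.lstsq(U, M, rcond=None)[0].T
    return U, V
```

Note the assumption mirrored from the abstract: one block (here both, for simplicity) is optimized exactly rather than by a gradient step, which is exactly the regime where the paper shows convergence can be made independent of the least smooth block.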
Hardness of parameter estimation in graphical models
We consider the problem of learning the canonical parameters specifying an
undirected graphical model (Markov random field) from the mean parameters. For
graphical models representing a minimal exponential family, the canonical
parameters are uniquely determined by the mean parameters, so the problem is
feasible in principle. The goal of this paper is to investigate the
computational feasibility of this statistical task. Our main result shows that
parameter estimation is in general intractable: no algorithm can learn the
canonical parameters of a generic pair-wise binary graphical model from the
mean parameters in time bounded by a polynomial in the number of variables
(unless RP = NP). Indeed, such a result had been believed to be true (see the monograph by Wainwright and Jordan (2008)), but no proof was known.
Our proof gives a polynomial time reduction from approximating the partition
function of the hard-core model, known to be hard, to learning approximate
parameters. Our reduction entails showing that the marginal polytope boundary
has an inherent repulsive property, which validates an optimization procedure
over the polytope that does not use any knowledge of its structure (as required
by the ellipsoid method and others).
Comment: 15 pages. To appear in NIPS 201
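To fix notation, the statistical task can be written as inverting the mean-parameter map of a pairwise binary model. The display below is a standard formulation in the conventions of Wainwright and Jordan, not an equation reproduced from the paper:

```latex
% Pairwise binary Markov random field in exponential-family form:
p_\theta(x) \;=\; \exp\!\Big(\sum_{i} \theta_i x_i
    \;+\; \sum_{(i,j)\in E} \theta_{ij}\, x_i x_j \;-\; A(\theta)\Big),
\qquad x \in \{0,1\}^n,
% with mean parameters given by the gradient of the log-partition function:
\mu_i = \mathbb{E}_\theta[x_i], \qquad
\mu_{ij} = \mathbb{E}_\theta[x_i x_j], \qquad
\mu = \nabla A(\theta).
% For a minimal family this map is injective, so \theta is determined by \mu;
% the paper shows that computing this inverse map is intractable unless RP = NP.
```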