Solving Variational Inequalities with Monotone Operators on Domains Given by Linear Minimization Oracles
The standard algorithms for solving large-scale convex-concave saddle point
problems, or, more generally, variational inequalities with monotone operators,
are proximal-type algorithms which at every iteration need to compute a
prox-mapping, that is, to minimize over the problem's domain the sum of a
linear form and the specific convex distance-generating function underlying the
algorithm in question. Relative computational simplicity of prox-mappings,
which is the standard requirement when implementing proximal algorithms,
clearly implies the availability of a relatively computationally cheap Linear
Minimization Oracle (LMO) able to minimize linear forms over the domain.
There are, however, important situations where a cheap LMO indeed is available,
but where no proximal setup with easy-to-compute prox-mappings is known. This
fact motivates our goal in this paper, which is to develop techniques for
solving variational inequalities with monotone operators on domains given by
Linear Minimization Oracles. The techniques we develop can be viewed as a
substantial extension of the method proposed in [5] for nonsmooth convex
minimization over an LMO-represented domain.
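As an illustration of the gap between the two oracle types, consider the nuclear-norm ball: its LMO only needs the leading singular pair of the gradient matrix, whereas its prox-mapping (here, Euclidean projection) needs a full SVD. The following Python sketch makes this contrast concrete; the helper names lmo_nuclear_ball and prox_nuclear_ball are illustrative and are not taken from the paper.

```python
import numpy as np

def lmo_nuclear_ball(G, radius=1.0):
    """Linear Minimization Oracle for the nuclear-norm ball:
    argmin_{||X||_* <= radius} <G, X>.
    Only the leading singular pair of G is needed, so the call stays cheap
    even for large matrices (approximated here by power iterations)."""
    m, n = G.shape
    v = np.random.randn(n)
    for _ in range(50):              # power iterations on G^T G
        v = G.T @ (G @ v)
        v /= np.linalg.norm(v)
    u = G @ v
    u /= np.linalg.norm(u)
    return -radius * np.outer(u, v)  # minimizer is a rank-one "vertex"

def prox_nuclear_ball(Y, radius=1.0):
    """Euclidean projection onto the nuclear-norm ball: needs a full SVD
    plus a projection of the singular values -- much heavier than the LMO."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    if s.sum() <= radius:
        return Y
    # project singular values onto {s >= 0, sum(s) = radius}
    mu = np.sort(s)[::-1]
    cssv = np.cumsum(mu) - radius
    rho = np.nonzero(mu - cssv / (np.arange(len(mu)) + 1) > 0)[0][-1]
    theta = cssv[rho] / (rho + 1.0)
    return U @ np.diag(np.maximum(s - theta, 0)) @ Vt
```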
Decomposition Techniques for Bilinear Saddle Point Problems and Variational Inequalities with Affine Monotone Operators on Domains Given by Linear Minimization Oracles
The majority of First Order methods for large-scale convex-concave saddle
point problems and variational inequalities with monotone operators are
proximal algorithms which at every iteration need to minimize over the problem's
domain X the sum of a linear form and a strongly convex function. To make such
an algorithm practical, X should be proximal-friendly -- admit a strongly
convex function with easy-to-minimize linear perturbations. As a byproduct, X
admits a computationally cheap Linear Minimization Oracle (LMO) capable of
minimizing linear forms over X. There are, however, important situations where a
cheap LMO indeed is available, but X is not proximal-friendly, which motivates
the search for algorithms based solely on LMOs. For smooth convex minimization,
there exists a classical LMO-based algorithm -- Conditional Gradient. In
contrast, the LMO-based techniques known to us for other problems with convex
structure (nonsmooth convex minimization, convex-concave saddle point problems,
even ones as simple as bilinear, and variational inequalities with monotone
operators, even ones as simple as affine) are quite recent and utilize a common
approach based on Fenchel-type representations of the associated
objectives/vector fields. The goal of this paper is to develop alternative
(and seemingly much simpler) LMO-based decomposition techniques for bilinear
saddle point problems and for variational inequalities with affine monotone
operators.
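The classical LMO-based algorithm mentioned above, Conditional Gradient (Frank-Wolfe), reduces every iteration to a single LMO call. A minimal Python sketch of the generic loop follows; the helper names and the simplex example are illustrative and are not taken from the paper.

```python
import numpy as np

def conditional_gradient(grad, lmo, x0, n_iters=200):
    """Generic Conditional Gradient (Frank-Wolfe) loop: the only access to
    the feasible set X is through the LMO, never through a prox-mapping.
    grad(x): gradient of the smooth objective at x
    lmo(g):  argmin over s in X of <g, s>
    """
    x = x0
    for t in range(n_iters):
        s = lmo(grad(x))              # one Linear Minimization Oracle call
        gamma = 2.0 / (t + 2.0)       # standard open-loop step size
        x = (1 - gamma) * x + gamma * s
    return x

# Example: minimize ||Ax - b||^2 / 2 over the probability simplex,
# whose LMO simply returns a coordinate basis vector (a vertex).
rng = np.random.default_rng(0)
A, b = rng.standard_normal((30, 10)), rng.standard_normal(30)
grad = lambda x: A.T @ (A @ x - b)
lmo = lambda g: np.eye(len(g))[np.argmin(g)]
x_star = conditional_gradient(grad, lmo, np.ones(10) / 10)
```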
Semi-proximal Mirror-Prox for Nonsmooth Composite Minimization
We propose a new first-order optimisation algorithm to solve high-dimensional
non-smooth composite minimisation problems. Typical examples of such problems
have an objective that decomposes into a non-smooth empirical risk part and a
non-smooth regularisation penalty. The proposed algorithm, called Semi-Proximal
Mirror-Prox, leverages the Fenchel-type representation of one part of the
objective while handling the other part of the objective via linear
minimization over the domain. The algorithm stands in contrast with more
classical proximal gradient algorithms with smoothing, which require the
computation of proximal operators at each iteration and can therefore be
impractical for high-dimensional problems. We establish the theoretical
convergence rate of Semi-Proximal Mirror-Prox, which exhibits the optimal
complexity bounds for the number of calls to the linear minimization oracle. We
present promising experimental results showing the promise of the approach in
comparison to competing methods.
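For context, the Mirror-Prox scheme that the method builds on performs an extrapolation step followed by an update step at each iteration. The Python sketch below shows a plain Euclidean extragradient prototype on a bilinear toy problem; it is an illustrative baseline only and does not reproduce the semi-proximal splitting between prox-mappings and LMO calls introduced in the paper.

```python
import numpy as np

def extragradient(F, project, z0, step=0.1, n_iters=500):
    """Euclidean extragradient (Mirror-Prox prototype) for a monotone
    operator F over a set accessed here through a projection."""
    z = z0
    avg = np.zeros_like(z0)
    for _ in range(n_iters):
        w = project(z - step * F(z))        # extrapolation step
        z = project(z - step * F(w))        # update using the leading point
        avg += w
    return avg / n_iters                    # ergodic average is the output

# Example: bilinear saddle point min_x max_y x^T B y on unit Euclidean balls.
rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))

def F(z):
    x, y = z[:5], z[5:]
    return np.concatenate([B @ y, -B.T @ x])     # monotone (skew) operator

def project(z):
    x, y = z[:5], z[5:]
    nx, ny = max(1.0, np.linalg.norm(x)), max(1.0, np.linalg.norm(y))
    return np.concatenate([x / nx, y / ny])

z_hat = extragradient(F, project, np.ones(10))
```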
Frank-Wolfe Algorithms for Saddle Point Problems
We extend the Frank-Wolfe (FW) optimization algorithm to solve constrained
smooth convex-concave saddle point (SP) problems. Remarkably, the method only
requires access to linear minimization oracles. Leveraging recent advances in
FW optimization, we provide the first proof of convergence of an FW-type saddle
point solver over polytopes, thereby partially answering a 30-year-old
conjecture. We also survey other convergence results and highlight gaps in the
theoretical underpinnings of FW-style algorithms. Motivating applications
without known efficient alternatives are explored through structured prediction
with combinatorial penalties as well as games over matching polytopes involving
an exponential number of constraints. Comment: Appears in: Proceedings of the 20th International Conference on
Artificial Intelligence and Statistics (AISTATS 2017). 39 pages.
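One natural way to run an FW-type method on a smooth convex-concave saddle point is to call a separate LMO for each block and move both iterates toward the returned vertices. The Python sketch below illustrates this idea on a small matrix game over probability simplexes; it is an illustrative variant, not necessarily the exact scheme whose convergence over polytopes the paper establishes.

```python
import numpy as np

def saddle_point_fw(grad_x, grad_y, lmo_x, lmo_y, x0, y0, n_iters=300):
    """FW-style updates for a smooth convex-concave saddle point:
    each block only talks to its own Linear Minimization Oracle."""
    x, y = x0, y0
    for t in range(n_iters):
        sx = lmo_x(grad_x(x, y))          # vertex for the descent block x
        sy = lmo_y(-grad_y(x, y))         # vertex for the ascent block y
        gamma = 2.0 / (t + 2.0)
        x = (1 - gamma) * x + gamma * sx
        y = (1 - gamma) * y + gamma * sy
    return x, y

# Example: min_x max_y x^T M y over two probability simplexes (a matrix game).
rng = np.random.default_rng(2)
M = rng.standard_normal((6, 4))
vertex = lambda g: np.eye(len(g))[np.argmin(g)]   # simplex LMO
x_bar, y_bar = saddle_point_fw(
    grad_x=lambda x, y: M @ y, grad_y=lambda x, y: M.T @ x,
    lmo_x=vertex, lmo_y=vertex,
    x0=np.ones(6) / 6, y0=np.ones(4) / 4)
```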
Stochastic Gradient Descent-Ascent: Unified Theory and New Efficient Methods
Stochastic Gradient Descent-Ascent (SGDA) is one of the most prominent
algorithms for solving min-max optimization and variational inequalities
problems (VIP) appearing in various machine learning tasks. The success of the
method led to several advanced extensions of the classical SGDA, including
variants with arbitrary sampling, variance reduction, coordinate randomization,
and distributed variants with compression, which were extensively studied in
the literature, especially during the last few years. In this paper, we propose
a unified convergence analysis that covers a large variety of stochastic
gradient descent-ascent methods, which so far have required different
intuitions, have different applications and have been developed separately in
various communities. A key to our unified framework is a parametric assumption
on the stochastic estimates. Via our general theoretical framework, we either
recover the sharpest known rates for the known special cases or tighten them.
Moreover, to illustrate the flexibility of our approach we develop several new
variants of SGDA such as a new variance-reduced method (L-SVRGDA), new
distributed methods with compression (QSGDA, DIANA-SGDA, VR-DIANA-SGDA), and a
new method with coordinate randomization (SEGA-SGDA). Although variants of the
new methods are known for solving minimization problems, they were never
considered or analyzed for solving min-max problems and VIPs. We also
demonstrate the most important properties of the new methods through extensive
numerical experiments. Comment: 72 pages, 4 figures, 3 tables. Changes in v2: new results were added
(Theorem 2.5 and its corollaries), a few typos were fixed, and more clarifications
were added. Code: https://github.com/hugobb/sgd
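The baseline that all of these variants refine is the plain simultaneous SGDA update: a stochastic descent step in the min variable and a stochastic ascent step in the max variable. The Python sketch below shows this bare-bones loop on a strongly-convex-strongly-concave toy problem; the variance-reduced, compressed, and coordinate-randomized variants analyzed in the paper are not reproduced here.

```python
import numpy as np

def sgda(stoch_grad_x, stoch_grad_y, x0, y0, lr=0.05, n_iters=2000, seed=0):
    """Plain simultaneous Stochastic Gradient Descent-Ascent: descend in x,
    ascend in y, using unbiased stochastic gradient estimates."""
    rng = np.random.default_rng(seed)
    x, y = x0.copy(), y0.copy()
    for _ in range(n_iters):
        gx = stoch_grad_x(x, y, rng)   # unbiased estimate of df/dx
        gy = stoch_grad_y(x, y, rng)   # unbiased estimate of df/dy
        x -= lr * gx                   # descent on the min variable
        y += lr * gy                   # ascent on the max variable
    return x, y

# Example: f(x, y) = 0.5*||x||^2 + x^T y - 0.5*||y||^2 with additive noise;
# the unique saddle point is (0, 0).
noise = lambda rng, d: 0.1 * rng.standard_normal(d)
sgx = lambda x, y, rng: x + y + noise(rng, x.size)
sgy = lambda x, y, rng: x - y + noise(rng, y.size)
x_hat, y_hat = sgda(sgx, sgy, np.ones(3), -np.ones(3))
```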