Inertial Proximal Incremental Aggregated Gradient Method
In this paper, we introduce an inertial version of the Proximal Incremental
Aggregated Gradient method (PIAG) for minimizing the sum of smooth convex
component functions and a possibly nonsmooth convex regularization function.
Theoretically, we show that the inertial Proximal Incremental Aggregated
Gradient (iPIAG) method enjoys global linear convergence under a quadratic
growth condition, which is strictly weaker than strong convexity, provided that
the stepsize does not exceed a certain constant. Moreover, we present two numerical
experiments which demonstrate that iPIAG outperforms the original PIAG. Comment: 17 pages, 3 figures
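To make the iPIAG idea concrete, here is a minimal sketch, not the authors' code. It assumes least-squares components f_i(x) = 0.5 (a_i . x - b_i)^2, an l1 regularizer lam * ||x||_1, a cyclic choice of the refreshed component, and a heavy-ball form of the inertial term, x_{k+1} = prox(x_k - alpha * g_k + beta * (x_k - x_{k-1})), where g_k is the aggregated (partly stale) gradient. The helper names and parameter values are illustrative.

```python
# Hypothetical sketch of an inertial PIAG loop (assumed update form, not the paper's code).
# Problem assumed: minimize sum_i 0.5 * (a_i . x - b_i)^2 + lam * ||x||_1.

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1, applied coordinate-wise."""
    return [max(abs(vj) - t, 0.0) * (1.0 if vj > 0 else -1.0) for vj in v]

def grad_component(a, b, x):
    """Gradient of f_i(x) = 0.5 * (a . x - b)^2."""
    r = sum(aj * xj for aj, xj in zip(a, x)) - b
    return [r * aj for aj in a]

def ipiag(A, b, lam, alpha=0.05, beta=0.3, iters=3000):
    n, d = len(A), len(A[0])
    x = [0.0] * d
    x_prev = list(x)
    # Table of the most recently computed (possibly stale) component gradients.
    table = [grad_component(A[i], b[i], x) for i in range(n)]
    agg = [sum(g[j] for g in table) for j in range(d)]
    for k in range(iters):
        i = k % n  # refresh one component per iteration, cyclically (assumed order)
        new_g = grad_component(A[i], b[i], x)
        agg = [agg[j] + new_g[j] - table[i][j] for j in range(d)]
        table[i] = new_g
        # Aggregated-gradient step plus heavy-ball momentum, then the prox of the regularizer.
        v = [x[j] - alpha * agg[j] + beta * (x[j] - x_prev[j]) for j in range(d)]
        x_prev, x = x, soft_threshold(v, alpha * lam)
    return x

A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 3.0]
print(ipiag(A, b, lam=0.1))  # approximately [0.9667, 1.9667]
```

Setting beta = 0 recovers a plain PIAG loop, which is a convenient baseline when comparing the two.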
Proximal-Like Incremental Aggregated Gradient Method with Linear Convergence under Bregman Distance Growth Conditions
We introduce a unified algorithmic framework, called proximal-like
incremental aggregated gradient (PLIAG) method, for minimizing the sum of a
convex function that consists of additive relatively smooth convex components
and a proper lower semi-continuous convex regularization function, over an
abstract feasible set whose geometry can be captured by using the domain of a
Legendre function. The PLIAG method includes many existing algorithms in the
literature as special cases such as the proximal gradient method, the Bregman
proximal gradient method (also called NoLips algorithm), the incremental
aggregated gradient method, the incremental aggregated proximal method, and the
proximal incremental aggregated gradient method. It also includes some novel
iteration schemes. First, we show that the PLIAG method is globally
sublinearly convergent without requiring a growth condition, which extends the
sublinear convergence result for the proximal gradient algorithm to incremental
aggregated type first order methods. Then by embedding a so-called Bregman
distance growth condition into a descent-type lemma to construct a special
Lyapunov function, we show that the PLIAG method is globally linearly
convergent in terms of both function values and Bregman distances to the
optimal solution set, provided that the step size is not greater than some
positive constant. All convergence results derived in this paper are
established beyond the standard assumptions in the literature (i.e., without
requiring strong convexity or Lipschitz continuity of the gradient of the
smooth part of the objective). When specialized to many existing algorithms,
our results recover or supplement their convergence results under strictly
weaker conditions. Comment: 28 pages
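The Bregman distance generated by a Legendre function h, D_h(x, y) = h(x) - h(y) - <grad h(y), x - y>, is the quantity in which the abstract states linear convergence. The short sketch below (illustrative helper names) evaluates it for two standard kernels: the Euclidean kernel, which gives 0.5 * ||x - y||^2 and corresponds to ordinary proximal-gradient geometry, and the Boltzmann-Shannon entropy on the positive orthant, which gives the generalized Kullback-Leibler divergence.

```python
import math

def bregman(h, grad_h, x, y):
    """D_h(x, y) = h(x) - h(y) - <grad h(y), x - y>."""
    return h(x) - h(y) - sum(g * (xi - yi) for g, xi, yi in zip(grad_h(y), x, y))

# Euclidean kernel h(x) = 0.5 * ||x||^2, giving D_h(x, y) = 0.5 * ||x - y||^2.
def h_euc(x):
    return 0.5 * sum(xi * xi for xi in x)

def grad_h_euc(x):
    return list(x)

# Entropy kernel h(x) = sum_j x_j log x_j (x > 0), giving
# D_h(x, y) = sum_j [x_j log(x_j / y_j) - x_j + y_j].
def h_ent(x):
    return sum(xi * math.log(xi) for xi in x)

def grad_h_ent(x):
    return [math.log(xi) + 1.0 for xi in x]

print(bregman(h_euc, grad_h_euc, [1.0, 2.0], [0.0, 0.0]))  # 2.5
print(bregman(h_ent, grad_h_ent, [2.0, 1.0], [1.0, 1.0]))  # 2*log(2) - 1, about 0.3863
```

Swapping the kernel is what turns the proximal gradient method into the Bregman proximal gradient (NoLips) method mentioned above; the rest of the iteration template stays the same.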
Linear Convergence of Cyclic SAGA
In this work, we present and analyze C-SAGA, a (deterministic) cyclic variant
of SAGA. C-SAGA is an incremental gradient method that minimizes a sum of
differentiable convex functions by cyclically accessing their gradients. Even
though the theory of stochastic algorithms is more mature than that of cyclic
counterparts in general, practitioners often prefer cyclic algorithms. We prove
that C-SAGA converges linearly under the standard assumptions. We then compare its
rate of convergence with those of the full gradient method, (stochastic) SAGA, and
incremental aggregated gradient (IAG), both theoretically and experimentally. Comment: Published in Optimization Letters
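A minimal sketch of a cyclic SAGA loop follows, assuming one-dimensional quadratic components f_i(x) = 0.5 * a_i * (x - c_i)^2 and the objective (1/n) * sum_i f_i. The only change from stochastic SAGA is that the component index follows the fixed order i = k mod n instead of being sampled; the stepsize and data are illustrative choices.

```python
def c_saga(a, c, alpha=0.05, epochs=1000, x0=0.0):
    """Cyclic SAGA on (1/n) * sum_i 0.5 * a_i * (x - c_i)^2 (illustrative sketch)."""
    n = len(a)

    def grad(i, x):
        return a[i] * (x - c[i])

    x = x0
    table = [grad(i, x) for i in range(n)]  # stored gradients, one per component
    avg = sum(table) / n                    # average of the stored gradients
    for k in range(epochs * n):
        i = k % n                           # deterministic cyclic access
        g_new = grad(i, x)
        # SAGA direction: fresh gradient minus stored one, plus the table average.
        x -= alpha * (g_new - table[i] + avg)
        avg += (g_new - table[i]) / n
        table[i] = g_new
    return x

print(c_saga([1.0, 2.0, 3.0], [1.0, -1.0, 2.0]))  # approximately 0.8333 (= 5/6)
```

The minimizer here is sum(a_i c_i) / sum(a_i) = 5/6, so the result can be checked by hand.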
Incremental Aggregated Proximal and Augmented Lagrangian Algorithms
We consider minimization of the sum of a large number of convex functions,
and we propose an incremental aggregated version of the proximal algorithm,
which bears similarity to the incremental aggregated gradient and subgradient
methods that have received a lot of recent attention. Under cost function
differentiability and strong convexity assumptions, we show linear convergence
for a sufficiently small constant stepsize. This result also applies to
distributed asynchronous variants of the method, involving bounded
interprocessor communication delays.
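The incremental aggregated proximal step can be sketched as follows: at iteration k, component i_k enters through its exact proximal term, while every other component enters only through its most recently computed gradient, x_{k+1} = argmin_x { f_{i_k}(x) + sum_{i != i_k} g_i * x + (1 / (2 alpha)) (x - x_k)^2 }. The sketch assumes scalar quadratic components f_i(x) = 0.5 * a_i * (x - c_i)^2 (so the argmin has a closed form), a cyclic order, and an illustrative stepsize.

```python
def incremental_aggregated_prox(a, c, alpha=0.02, iters=3000, x0=0.0):
    """Minimize sum_i 0.5 * a_i * (x - c_i)^2 with an incremental aggregated proximal step."""
    n = len(a)
    x = x0
    table = [a[i] * (x - c[i]) for i in range(n)]  # last known gradient of each component
    total = sum(table)
    for k in range(iters):
        i = k % n                      # cyclic component order (assumed)
        s = total - table[i]           # aggregated, possibly stale gradients of the others
        # Closed-form argmin of f_i(z) + s*z + (z - x)^2 / (2*alpha) for quadratic f_i.
        x = (a[i] * c[i] - s + x / alpha) / (a[i] + 1.0 / alpha)
        g_new = a[i] * (x - c[i])      # gradient of f_i at the new iterate
        total += g_new - table[i]
        table[i] = g_new
    return x

print(incremental_aggregated_prox([1.0, 2.0, 3.0], [1.0, -1.0, 2.0]))  # approximately 0.8333
```

Compared with the incremental aggregated gradient method, only the handling of the current component changes: it is minimized exactly rather than linearized.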
We then consider dual versions of incremental proximal algorithms, which are
incremental augmented Lagrangian methods for separable equality-constrained
optimization problems. Unlike the standard augmented Lagrangian method,
these methods admit decomposition in the minimization of the augmented
Lagrangian and update the multipliers far more frequently. Our incremental
aggregated augmented Lagrangian methods bear similarity to several known
decomposition algorithms, including the alternating direction method of
multipliers (ADMM) and more recent variations. We compare these methods in
terms of their properties, and highlight their potential advantages and
limitations.
We also address the solution of separable inequality-constrained optimization
problems through the use of nonquadratic augmented Lagrangians, such as the
exponential one, and we dually consider a corresponding incremental aggregated
version of the proximal algorithm that uses nonquadratic regularization, such
as an entropy function. We finally propose a closely related linearly
convergent method for minimization of large differentiable sums subject to an
orthant constraint, which may be viewed as an incremental aggregated version of
the mirror descent method.
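For intuition about an incremental aggregated mirror descent step on the positive orthant, here is a minimal sketch that assumes the entropy kernel, which turns the mirror step into the multiplicative update x_j <- x_j * exp(-alpha * G_j), with G the aggregated (partly stale) gradient. The components f_i(x) = 0.5 * ||x - c_i||^2, the cyclic order, and the stepsize are illustrative assumptions; this is not claimed to be the exact method proposed in the paper.

```python
import math

def inc_agg_mirror_descent(C, alpha=0.02, iters=3000):
    """Minimize sum_i 0.5 * ||x - C[i]||^2 over the positive orthant (illustrative sketch)."""
    n, d = len(C), len(C[0])
    x = [1.0] * d  # strictly positive starting point
    table = [[x[j] - C[i][j] for j in range(d)] for i in range(n)]
    agg = [sum(t[j] for t in table) for j in range(d)]
    for k in range(iters):
        i = k % n  # cyclic component order (assumed)
        g_new = [x[j] - C[i][j] for j in range(d)]
        agg = [agg[j] + g_new[j] - table[i][j] for j in range(d)]
        table[i] = g_new
        # Entropic mirror step: the multiplicative update keeps every coordinate positive.
        x = [x[j] * math.exp(-alpha * agg[j]) for j in range(d)]
    return x

C = [[1.0, -1.0], [2.0, -2.0], [3.0, 0.5]]
print(inc_agg_mirror_descent(C))  # first coordinate close to 2.0, second close to 0.0
```

The second coordinate of the mean of the c_i is negative, so the constrained minimizer sits on the boundary there; the multiplicative update approaches that boundary geometrically without ever leaving the orthant.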
Curvature-aided Incremental Aggregated Gradient Method
We propose a new algorithm for finite sum optimization which we call the
curvature-aided incremental aggregated gradient (CIAG) method. Motivated by the
problem of training a classifier for a $d$-dimensional problem, where the number
of training data points is $m$ and $m \gg d \gg 1$, the CIAG method seeks to
accelerate incremental aggregated gradient (IAG) methods using aids from the
curvature (or Hessian) information, while avoiding the evaluation of matrix
inverses required by the incremental Newton (IN) method. Specifically, our idea
is to exploit the incrementally aggregated Hessian matrix to trace the full
gradient vector at every incremental step, therefore achieving an improved
linear convergence rate over the state-of-the-art IAG methods. For strongly
convex problems, the fast linear convergence rate requires the objective
function to be close to quadratic, or the initial point to be close to the optimal
solution. Importantly, we show that running one iteration of the CIAG method
yields the same improvement to the optimality gap as running one iteration of
the full gradient method, while the complexity is $\mathcal{O}(d^2)$ for CIAG and
$\mathcal{O}(md)$ for the full gradient. Overall, the CIAG method strikes a balance between the
high computational complexity of incremental Newton-type methods and the slow IAG
method. Our numerical results support the theoretical findings and show that
the CIAG method often converges in far fewer iterations than IAG, and
requires much shorter running time than IN when the problem dimension is high. Comment: Final version submitted to Allerton Conference 2017 on Oct 8, 201
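The curvature-aided idea can be sketched as follows: each component keeps a Taylor anchor theta_i with its gradient g_i and Hessian H_i, and the full gradient at x is estimated as sum_i [g_i + H_i (x - theta_i)] = b + H x, where b = sum_i (g_i - H_i theta_i) and H = sum_i H_i are updated in O(d^2) when a single anchor is refreshed. This is a minimal sketch assuming l2-regularized logistic components, a cyclic refresh order, and an illustrative stepsize; the helper names are hypothetical.

```python
import math

def grad_hess(a, y, mu, x):
    """Gradient and Hessian of f_i(x) = log(1 + exp(-y * a.x)) + 0.5 * mu * ||x||^2."""
    d = len(x)
    z = y * sum(aj * xj for aj, xj in zip(a, x))
    s = 1.0 / (1.0 + math.exp(-z))  # sigmoid(z)
    g = [-y * (1.0 - s) * a[p] + mu * x[p] for p in range(d)]
    H = [[s * (1.0 - s) * a[p] * a[q] + (mu if p == q else 0.0) for q in range(d)]
         for p in range(d)]
    return g, H

def matvec(M, v):
    return [sum(M[p][q] * v[q] for q in range(len(v))) for p in range(len(M))]

def ciag(A, Y, mu, alpha=0.2, iters=3000):
    m, d = len(A), len(A[0])
    x = [0.0] * d
    H_sum = [[0.0] * d for _ in range(d)]  # sum_i H_i
    b_sum = [0.0] * d                      # sum_i (g_i - H_i theta_i)
    store = [None] * m                     # per-component (g_i - H_i theta_i, H_i)

    def refresh(i):
        # Re-anchor component i's Taylor model at the current x: O(d^2) work.
        nonlocal H_sum, b_sum
        g, Hi = grad_hess(A[i], Y[i], mu, x)
        c = [gp - hp for gp, hp in zip(g, matvec(Hi, x))]
        if store[i] is not None:
            c_old, H_old = store[i]
            b_sum = [b_sum[p] - c_old[p] for p in range(d)]
            H_sum = [[H_sum[p][q] - H_old[p][q] for q in range(d)] for p in range(d)]
        b_sum = [b_sum[p] + c[p] for p in range(d)]
        H_sum = [[H_sum[p][q] + Hi[p][q] for q in range(d)] for p in range(d)]
        store[i] = (c, Hi)

    for i in range(m):
        refresh(i)
    for k in range(iters):
        refresh(k % m)  # cyclic refresh order (assumed)
        # Curvature-aided estimate of the full gradient: b_sum + H_sum x.
        full = [bp + hp for bp, hp in zip(b_sum, matvec(H_sum, x))]
        x = [x[p] - alpha * full[p] for p in range(d)]
    return x

A = [[1.0, 0.5], [-0.5, 1.0], [0.3, -1.2], [1.0, 1.0]]
Y = [1.0, -1.0, 1.0, 1.0]
x_hat = ciag(A, Y, mu=0.1)
print(x_hat)
```

A plain IAG loop would store only g_i and use sum_i g_i directly; the H_i (x - theta_i) correction is what compensates for the staleness of the stored gradients.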
An Inertial Parallel and Asynchronous Fixed-Point Iteration for Convex Optimization
Two characteristics that make convex decomposition algorithms attractive are
simplicity of operations and generation of parallelizable structures. In
principle, these schemes require that all coordinates update at the same time,
i.e., they are synchronous by construction. Introducing asynchronicity in the
updates can resolve several issues that appear in the synchronous case, like
load imbalances in the computations or failing communication links. However,
to the best of our knowledge, no asynchronous versions of commonly known
algorithms have yet been combined with inertial acceleration
techniques. In this work, we propose an inertial asynchronous and parallel
fixed-point iteration from which several new versions of existing convex
optimization algorithms emanate. Departing from the norm that the frequency of
the coordinates' updates should comply with some prior distribution, we propose a
scheme where the only requirement is that the coordinates update within a
bounded interval. We prove convergence of the sequence of iterates generated by
the scheme at a linear rate. One instance of the proposed scheme is implemented
to solve a distributed optimization load sharing problem in a smart grid
setting, and its superiority over the non-accelerated version is
illustrated.
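The following sequential simulation illustrates the two ingredients together: coordinate updates that only need to occur within a bounded interval, and inertia. It assumes the operator T(x) = x - gamma * (Qx - q) (a gradient step on a strongly convex quadratic), updates coordinate j = k mod d at step k, reads a copy of the iterate that is up to max_delay steps old, and adds a per-coordinate momentum term. All of these choices, including the parameter values, are illustrative assumptions rather than the scheme analyzed in the paper.

```python
from collections import deque

def inertial_async_fixed_point(Q, q, gamma=0.2, lam=0.5, beta=0.2,
                               max_delay=2, iters=6000):
    """Simulated inertial, delayed coordinate updates for T(x) = x - gamma*(Qx - q)."""
    d = len(q)
    x = [0.0] * d
    prev = [0.0] * d  # value of each coordinate before its most recent update
    history = deque([list(x)], maxlen=max_delay + 1)  # recent iterates, newest last
    for k in range(iters):
        j = k % d  # every coordinate is updated at least once every d steps
        delay = min(k % (max_delay + 1), len(history) - 1)
        x_hat = history[-1 - delay]  # stale read with bounded delay
        grad_j = sum(Q[j][t] * x_hat[t] for t in range(d)) - q[j]
        t_j = x_hat[j] - gamma * grad_j  # j-th coordinate of T(x_hat)
        # Relaxed fixed-point step plus an inertial term on coordinate j only.
        new_val = x[j] + lam * (t_j - x_hat[j]) + beta * (x[j] - prev[j])
        prev[j], x[j] = x[j], new_val
        history.append(list(x))
    return x

Q = [[2.0, 0.5], [0.5, 1.0]]
q = [1.0, 1.0]
print(inertial_async_fixed_point(Q, q))  # close to [2/7, 6/7]
```

The fixed point of T solves Qx = q, which for this data is (2/7, 6/7); setting beta = 0 gives the non-accelerated variant for comparison.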
POLO: a POLicy-based Optimization library
We present POLO, a C++ library for large-scale parallel optimization
research that emphasizes ease-of-use, flexibility and efficiency in algorithm
design. It uses multiple inheritance and template programming to decompose
algorithms into essential policies and facilitate code reuse. With its clear
separation between algorithm and execution policies, it provides researchers
with a simple and powerful platform for prototyping ideas, evaluating them on
different parallel computing architectures and hardware platforms, and
generating compact and efficient production code. A C-API is included for
customization and data loading in high-level languages. POLO enables users to
move seamlessly from serial to multi-threaded shared-memory and multi-node
distributed-memory executors. We demonstrate how POLO allows users to implement
state-of-the-art asynchronous parallel optimization algorithms in just a few
lines of code and report experiment results from shared and distributed-memory
computing architectures. We provide both POLO and POLO.jl, a wrapper around
POLO written in the Julia language, at https://github.com/pologrp under the
permissive MIT license. Comment: 25 pages, 7 figures
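POLO itself is written in C++ with templates, and its API is not reproduced here. As a loose Python analogy of the policy idea (all class names are hypothetical), the sketch below builds concrete algorithms by combining a step-size policy, a proximal policy, and a generic algorithm skeleton through multiple inheritance.

```python
class ConstantStep:
    """Step-size policy: gamma_k = gamma for all k."""
    gamma = 0.1

    def step(self, k):
        return self.gamma

class NoProx:
    """Proximal policy for a problem without a regularizer."""
    def prox(self, x, gamma):
        return x

class L1Prox:
    """Proximal policy for lam * ||x||_1 (soft-thresholding)."""
    lam = 1.0

    def prox(self, x, gamma):
        t = gamma * self.lam
        return [max(abs(v) - t, 0.0) * (1.0 if v > 0 else -1.0) for v in x]

class GradientLoop:
    """Algorithm skeleton: the update is assembled from the mixed-in policies."""
    def solve(self, grad, x, iters):
        for k in range(iters):
            gamma = self.step(k)
            g = grad(x)
            x = self.prox([xi - gamma * gi for xi, gi in zip(x, g)], gamma)
        return x

# Composing policies by multiple inheritance yields concrete algorithms.
class GradientDescent(ConstantStep, NoProx, GradientLoop):
    pass

class ProximalGradient(ConstantStep, L1Prox, GradientLoop):
    pass

c = [3.0, -0.5]

def grad(x):
    """Gradient of 0.5 * ||x - c||^2."""
    return [xi - ci for xi, ci in zip(x, c)]

print(GradientDescent().solve(grad, [0.0, 0.0], 500))   # close to [3.0, -0.5]
print(ProximalGradient().solve(grad, [0.0, 0.0], 500))  # close to [2.0, 0.0]
```

Swapping one base class changes a single aspect of the algorithm while the loop is reused unchanged, which is the kind of code reuse the abstract attributes to policy-based design; execution policies (serial, multi-threaded, distributed) are not modeled here.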
Achieving Geometric Convergence for Distributed Optimization over Time-Varying Graphs
This paper considers the problem of distributed optimization over
time-varying graphs. For the case of undirected graphs, we introduce a
distributed algorithm, referred to as DIGing, based on a combination of a
distributed inexact gradient method and a gradient tracking technique. The
DIGing algorithm uses doubly stochastic mixing matrices and employs fixed
step-sizes and, yet, drives all the agents' iterates to a global and consensual
minimizer. When the graphs are directed, in which case the implementation of
doubly stochastic mixing matrices is unrealistic, we construct an algorithm
that incorporates the push-sum protocol into the DIGing structure, thus
obtaining Push-DIGing algorithm. The Push-DIGing uses column stochastic
matrices and fixed step-sizes, but it still converges to a global and
consensual minimizer. Under the strong convexity assumption, we prove that the
algorithms converge at R-linear (geometric) rates as long as the step-sizes do
not exceed some upper bounds. We establish explicit estimates for the
convergence rates. When the graph is undirected, these estimates show that DIGing scales
polynomially in the number of agents. We also provide some numerical
experiments to demonstrate the efficacy of the proposed algorithms and to
validate our theoretical findings.
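The DIGing recursion combines inexact gradient steps with gradient tracking: x_{k+1} = W_k x_k - alpha * y_k and y_{k+1} = W_k y_k + grad f(x_{k+1}) - grad f(x_k), with y_0 = grad f(x_0). The sketch below assumes four agents with scalar quadratic objectives f_i(x) = 0.5 * a_i * (x - c_i)^2, a fixed stepsize, and a time-varying graph that alternates between two doubly stochastic mixing matrices (a ring and a pairing); the data and stepsize are illustrative.

```python
def mix(W, v):
    """One round of neighbor averaging: returns W v."""
    return [sum(W[i][j] * v[j] for j in range(len(v))) for i in range(len(W))]

def diging(a, c, Ws, alpha=0.02, iters=3000):
    """DIGing sketch: agent i holds f_i(x) = 0.5 * a_i * (x - c_i)^2."""
    n = len(a)

    def grads(x):
        return [a[i] * (x[i] - c[i]) for i in range(n)]

    x = [0.0] * n
    g = grads(x)
    y = list(g)  # gradient tracker, initialized with the local gradients
    for k in range(iters):
        W = Ws[k % len(Ws)]  # time-varying doubly stochastic mixing matrix
        x_new = [wx - alpha * yi for wx, yi in zip(mix(W, x), y)]
        g_new = grads(x_new)
        # Track the network-average gradient: y <- W y + grad(x_new) - grad(x).
        y = [wy + gn - go for wy, gn, go in zip(mix(W, y), g_new, g)]
        x, g = x_new, g_new
    return x

W_ring = [[0.5, 0.25, 0.0, 0.25],
          [0.25, 0.5, 0.25, 0.0],
          [0.0, 0.25, 0.5, 0.25],
          [0.25, 0.0, 0.25, 0.5]]
W_pairs = [[0.5, 0.5, 0.0, 0.0],
           [0.5, 0.5, 0.0, 0.0],
           [0.0, 0.0, 0.5, 0.5],
           [0.0, 0.0, 0.5, 0.5]]
x = diging([1.0, 2.0, 3.0, 4.0], [1.0, -1.0, 2.0, 0.0], [W_ring, W_pairs])
print(x)  # every agent close to 0.5
```

The global minimizer is sum(a_i c_i) / sum(a_i) = 0.5. Because y tracks the average gradient, a fixed stepsize suffices for exact consensus on it, unlike plain distributed gradient descent.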
An Asynchronous Distributed Framework for Large-scale Learning Based on Parameter Exchanges
In many distributed learning problems, the heterogeneous loading of computing
machines may harm the overall performance of synchronous strategies. In this
paper, we propose an effective asynchronous distributed framework for the
minimization of a sum of smooth functions, where each machine performs
iterations in parallel on its local function and updates a shared parameter
asynchronously. In this way, all machines can continuously work even though
they do not have the latest version of the shared parameter. We prove the
convergence and consistency of this general distributed asynchronous method
for gradient iterations, and then show its efficiency on the matrix factorization
problem for recommender systems and on binary classification. Comment: 16 pages
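A toy sequential simulation of workers updating a shared parameter from stale copies is shown below. It assumes one-dimensional local functions f_w(x) = 0.5 * (x - c_w)^2, a single gradient step per local computation, identical worker latencies with staggered finishing times (so each update is computed from a copy that is a few updates old), and a diminishing step 2 / (t + 10). The framework in the paper may organize local iterations and parameter exchanges differently.

```python
def async_shared_parameter(c, iters=30000):
    """Simulated asynchronous workers; worker w owns f_w(x) = 0.5 * (x - c[w])**2."""
    n = len(c)
    x = 0.0                # shared parameter
    snapshot = [x] * n     # copy of the shared parameter each worker is working from
    for t in range(iters):
        w = t % n                          # worker that finishes its local work at tick t
        step = 2.0 / (t + 10.0)            # diminishing step size (assumed schedule)
        x -= step * (snapshot[w] - c[w])   # push an update computed from a stale copy
        snapshot[w] = x                    # the worker reads the current value and restarts
    return x

print(async_shared_parameter([1.0, 2.0, 6.0]))  # close to 3.0
```

No worker waits for the others, yet the shared parameter still approaches the minimizer of the summed objective (the mean of the c_w, here 3.0).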
Nonasymptotic convergence of stochastic proximal point algorithms for constrained convex optimization
A very popular approach for solving stochastic optimization problems is the
stochastic gradient descent method (SGD). Although the SGD iteration is
computationally cheap and the practical performance of this method may be
satisfactory under certain circumstances, there is recent evidence of its
convergence difficulties and instability for inappropriate parameter choices.
To avoid these drawbacks naturally introduced by the SGD scheme, stochastic
proximal point algorithms have recently been considered in the literature. We
introduce a new variant of the stochastic proximal point method (SPP) for
solving stochastic convex optimization problems subject to (in)finite
intersection of constraints satisfying a linear regularity type condition. For
the newly introduced SPP scheme we prove new nonasymptotic convergence results.
In particular, for convex and Lipschitz continuous objective functions, we
prove nonasymptotic estimates for the rate of convergence in terms of the
expected value function gap of order $\mathcal{O}(1/k^{1/2})$, where $k$ is the
iteration counter. We also derive better nonasymptotic bounds for the rate of
convergence in terms of expected quadratic distance from the iterates to the
optimal solution for smooth strongly convex objective functions, which in the
best case is of order $\mathcal{O}(1/k)$. Since these convergence rates can be
attained by our SPP algorithm only under some natural restrictions on the
stepsize, we also introduce a restarting variant of SPP method that overcomes
these difficulties and derive the corresponding nonasymptotic convergence
rates. Numerical evidence supports the effectiveness of our methods in
real-world problems.
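A stochastic proximal point iteration of this kind can be sketched as a proximal step on a sampled objective term followed by a projection onto a sampled constraint set. The sketch assumes least-squares terms f(x; xi) = 0.5 * (a_xi . x - b_xi)^2 (closed-form prox), halfspace constraints (closed-form projection), uniform sampling, and the decreasing stepsize mu_k = c / (k + c). For this data the constrained minimizer is (0.75, 1.75), which can be checked from the KKT conditions with the constraint x1 + x2 <= 2.5 active.

```python
import random

def prox_least_squares(x, a, b, mu):
    """Closed-form prox of z -> mu * 0.5 * (a.z - b)^2 at the point x."""
    r = sum(ai * xi for ai, xi in zip(a, x)) - b
    scale = mu * r / (1.0 + mu * sum(ai * ai for ai in a))
    return [xi - scale * ai for xi, ai in zip(x, a)]

def project_halfspace(x, g, h):
    """Euclidean projection onto the halfspace {z : g.z <= h}."""
    viol = sum(gi * xi for gi, xi in zip(g, x)) - h
    if viol <= 0.0:
        return x
    nrm2 = sum(gi * gi for gi in g)
    return [xi - viol / nrm2 * gi for xi, gi in zip(x, g)]

def spp(rows, halfspaces, iters=100000, c=4.0, seed=0):
    rng = random.Random(seed)
    x = [0.0] * len(rows[0][0])
    for k in range(iters):
        a, b = rows[rng.randrange(len(rows))]              # sampled objective term
        g, h = halfspaces[rng.randrange(len(halfspaces))]  # sampled constraint
        mu = c / (k + c)                                   # decreasing stepsize (assumed)
        x = project_halfspace(prox_least_squares(x, a, b, mu), g, h)
    return x

rows = [([1.0, 0.0], 1.0), ([0.0, 1.0], 2.0), ([1.0, 1.0], 3.0)]
halfspaces = [([1.0, 1.0], 2.5), ([-1.0, 0.0], 0.0), ([0.0, 1.0], 5.0)]
print(spp(rows, halfspaces))  # close to [0.75, 1.75]
```

Unlike an SGD step, the proximal step shrinks the residual of the sampled term by the factor 1 / (1 + mu * ||a||^2), so it cannot overshoot even when mu is large; this is the stability property that motivates proximal point variants over SGD.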