POLO: a POLicy-based Optimization library
We present POLO --- a C++ library for large-scale parallel optimization
research that emphasizes ease-of-use, flexibility and efficiency in algorithm
design. It uses multiple inheritance and template programming to decompose
algorithms into essential policies and facilitate code reuse. With its clear
separation between algorithm and execution policies, it provides researchers
with a simple and powerful platform for prototyping ideas, evaluating them on
different parallel computing architectures and hardware platforms, and
generating compact and efficient production code. A C-API is included for
customization and data loading in high-level languages. POLO enables users to
move seamlessly from serial to multi-threaded shared-memory and multi-node
distributed-memory executors. We demonstrate how POLO allows users to implement
state-of-the-art asynchronous parallel optimization algorithms in just a few
lines of code and report experiment results from shared and distributed-memory
computing architectures. We provide both POLO and POLO.jl, a wrapper around
POLO written in the Julia language, at https://github.com/pologrp under the
permissive MIT license.
Comment: 25 pages, 7 figures
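To make the policy-based decomposition concrete, here is a minimal Python sketch of assembling an optimizer from separate step-size and execution policies via multiple inheritance. This mirrors the design idea only; POLO's actual API is C++, and every name below is hypothetical.

    # Sketch of policy-based algorithm composition in the spirit of POLO's
    # design (POLO itself is C++; these names are illustrative, not its API).
    import numpy as np

    class ConstantStep:
        """Step-size policy: fixed learning rate."""
        def __init__(self, lr=0.1):
            self.lr = lr
        def step_size(self, k):
            return self.lr

    class SerialExecutor:
        """Execution policy: plain serial loop."""
        def run(self, x0, grad, iters):
            x = np.asarray(x0, dtype=float)
            for k in range(iters):
                x = self.update(x, grad(x), k)
            return x

    class GradientDescent(ConstantStep, SerialExecutor):
        """Algorithm assembled from policies via multiple inheritance."""
        def update(self, x, g, k):
            return x - self.step_size(k) * g

    opt = GradientDescent(lr=0.2)
    # Minimize f(x) = 0.5 * ||x - 1||^2; its gradient is x - 1.
    print(opt.run(np.zeros(3), lambda x: x - 1.0, iters=100))  # ~[1, 1, 1]

Swapping in a different executor or step-size policy leaves the update rule untouched, which is the kind of reuse the library's design targets.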
A2BCD: An Asynchronous Accelerated Block Coordinate Descent Algorithm With Optimal Complexity
In this paper, we propose the Asynchronous Accelerated Nonuniform Randomized
Block Coordinate Descent algorithm (A2BCD), the first asynchronous
Nesterov-accelerated algorithm that achieves optimal complexity. This parallel
algorithm solves the unconstrained convex minimization problem, using p
computing nodes which compute updates to shared solution vectors, in an
asynchronous fashion with no central coordination. Nodes in asynchronous
algorithms do not wait for updates from other nodes before starting a new
iteration, but simply compute updates using the most recent solution
information available. This allows them to complete iterations much faster,
especially at scale, by eliminating the costly synchronization penalty that
traditional algorithms incur.
We first prove that A2BCD converges linearly to a solution with a fast
accelerated rate that matches the recently proposed NU_ACDM, so long as the
maximum delay is not too large. Somewhat surprisingly, A2BCD pays no complexity
penalty for using outdated information. We then prove lower complexity bounds
for randomized coordinate descent methods, which show that A2BCD (and hence
NU_ACDM) has optimal complexity to within a constant factor. We confirm with
numerical experiments that A2BCD outperforms NU_ACDM, which is the current
fastest coordinate descent algorithm, even at small scale. We also derive and
analyze a second-order ordinary differential equation, which is the
continuous-time limit of our algorithm, and prove it converges linearly to a
solution with a similar accelerated rate.
Comment: 33 pages, 6 figures
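To illustrate the delayed-information model that such analyses bound, here is a serial Python simulation of plain (non-accelerated) coordinate descent that reads iterates up to $\tau$ updates stale. It is a toy model of asynchrony, not A2BCD itself; the quadratic objective and damped step are illustrative assumptions.

    # Serial simulation of asynchronous coordinate descent with bounded
    # staleness; the plain (non-accelerated) scheme, not A2BCD itself.
    import numpy as np

    rng = np.random.default_rng(0)
    n, tau = 20, 5                      # dimension, maximum delay
    M = rng.standard_normal((n, n))
    A = M.T @ M + np.eye(n)             # strongly convex f(x) = 0.5 x'Ax - b'x
    b = rng.standard_normal(n)
    x_star = np.linalg.solve(A, b)

    L = np.diag(A)                      # coordinate Lipschitz constants
    x = np.zeros(n)
    history = [x.copy()]
    for k in range(20000):
        i = rng.integers(n)             # uniformly random coordinate
        stale = history[max(0, len(history) - 1 - rng.integers(tau + 1))]
        g_i = A[i] @ stale - b[i]       # coordinate gradient at a stale iterate
        x[i] -= 0.5 * g_i / L[i]        # conservative step, tolerates staleness
        history.append(x.copy())
    print("distance to optimum:", np.linalg.norm(x - x_star))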
Distributed Nesterov gradient methods over arbitrary graphs
In this letter, we introduce a distributed Nesterov method that does not
require doubly-stochastic weight matrices.
Instead, the implementation is based on a simultaneous application of both row-
and column-stochastic weights that makes this method applicable to arbitrary
(strongly-connected) graphs. Since constructing column-stochastic weights needs
additional information (the number of outgoing neighbors at each agent), not
available in certain communication protocols, we derive a variation, termed as
FROZEN, that only requires row-stochastic weights but at the expense of
additional iterations for eigenvector learning. We numerically study these
algorithms for various objective functions and network parameters and show that
the proposed distributed Nesterov methods achieve acceleration compared to the
current state-of-the-art methods for distributed optimization.
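A minimal numpy sketch of the underlying row-/column-stochastic ("push-pull") iteration with gradient tracking on a directed graph, shown here without the Nesterov momentum term; the network, weights, and quadratic costs are illustrative assumptions.

    # Row-stochastic A mixes received iterates; column-stochastic B splits
    # outgoing mass, so no doubly-stochastic matrix is needed.
    import numpy as np

    n = 5
    adj = np.eye(n, dtype=bool)
    for i in range(n):                  # directed ring (strongly connected)
        adj[(i + 1) % n, i] = True      # edge i -> i+1: node i+1 hears from i
    adj[2, 0] = True                    # one extra directed edge 0 -> 2

    A = adj / adj.sum(axis=1, keepdims=True)   # row-stochastic (receive)
    B = adj / adj.sum(axis=0, keepdims=True)   # column-stochastic (send)

    targets = np.arange(n, dtype=float)        # f_i(x) = 0.5 * (x - i)^2
    grad = lambda x: x - targets               # stacked local gradients
    x = np.zeros(n)
    y = grad(x)                                # gradient tracker, y_0 = grad(x_0)
    alpha = 0.1
    for _ in range(300):
        x_new = A @ x - alpha * y
        y = B @ y + grad(x_new) - grad(x)
        x = x_new
    print(x)   # each entry approaches mean(targets) = 2.0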
Accelerated Distributed Nesterov Gradient Descent
This paper considers the distributed optimization problem over a network,
where the objective is to optimize a global function formed by a sum of local
functions, using only local computation and communication. We develop an
Accelerated Distributed Nesterov Gradient Descent (Acc-DNGD) method. When the
objective function is convex and $L$-smooth, we show that it achieves a
$O(1/t^{1.4-\epsilon})$ convergence rate for all $\epsilon \in (0, 1.4)$.
We also show the convergence rate can be improved to $O(1/t^2)$ if the
objective function is a composition of a linear map and a strongly-convex and
smooth function. When the objective function is $\mu$-strongly convex and
$L$-smooth, we show that it achieves a linear convergence rate of
$O([1 - C(\mu/L)^{5/7}]^t)$, where $L/\mu$ is the condition number of
the objective, and $C > 0$ is some constant that does not depend on
$L/\mu$.
Comment: 55 pages, 8 figures
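For reference, the centralized Nesterov baseline whose rates the distributed schemes are measured against; a minimal sketch on a smooth convex quadratic (a toy problem, not the paper's setup).

    # Centralized Nesterov acceleration, whose O(1/t^2) rate the distributed
    # variants approach; minimal sketch on a smooth convex quadratic.
    import numpy as np

    rng = np.random.default_rng(1)
    Q = rng.standard_normal((30, 10))
    A = Q.T @ Q                         # convex quadratic f(x) = 0.5 x'Ax - b'x
    b = rng.standard_normal(10)
    L_smooth = np.linalg.eigvalsh(A).max()

    f = lambda x: 0.5 * x @ A @ x - b @ x
    grad = lambda x: A @ x - b
    x = y = np.zeros(10)
    for t in range(1, 500):
        x_new = y - grad(y) / L_smooth               # step at extrapolated point
        y = x_new + (t - 1) / (t + 2) * (x_new - x)  # Nesterov momentum
        x = x_new
    print(f(x))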
Improving Fast Dual Ascent for MPC - Part I: The Distributed Case
In dual decomposition, the dual to an optimization problem with a specific
structure is solved in a distributed fashion using (sub)gradient and, more
recently, fast gradient methods. Traditional dual decomposition suffers from
two main shortcomings. The first is that the convergence is often slow, although
fast gradient methods have significantly improved the situation. The second is
that computation of the optimal step-size requires centralized computations,
which hinders a fully distributed implementation of the algorithm. In this
paper, the first issue is addressed by providing a tighter characterization of
the dual function than what has previously been reported in the literature.
Then a distributed and a parallel algorithm are presented in which the provided
dual function approximation is minimized in each step. Since the approximation
is more accurate than the approximation used in standard and fast dual
decomposition, the convergence properties are improved. For the second issue,
we extend a recent result to allow for a fully distributed parameter selection
in the algorithm. Further, we show how to apply the proposed algorithms to
optimization problems arising in distributed model predictive control (DMPC)
and show that the proposed distributed algorithm enjoys distributed
reconfiguration, i.e., plug-and-play, in the DMPC context.
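A toy sketch of dual decomposition with fast (Nesterov-accelerated) dual ascent: two agents coupled by a consensus constraint solve their local subproblems in closed form while only the dual variable is exchanged. The problem data and step-size rule are illustrative assumptions.

    # min f1(x1) + f2(x2)  s.t.  x1 = x2,  with  f_i(x) = 0.5 * a_i * (x - c_i)^2.
    # Each agent solves its subproblem locally; only the dual variable is shared.
    a = [1.0, 4.0]
    c = [0.0, 10.0]

    def local_solve(i, lam_signed):
        # argmin_x f_i(x) + lam_signed * x (agent i's subproblem, closed form)
        return c[i] - lam_signed / a[i]

    L_dual = 1 / a[0] + 1 / a[1]        # Lipschitz constant of the dual gradient
    lam = lam_prev = 0.0
    for t in range(1, 200):
        mom = lam + (t - 1) / (t + 2) * (lam - lam_prev)   # Nesterov step
        x1 = local_solve(0, +mom)       # Lagrangian term +lam * x1
        x2 = local_solve(1, -mom)       # Lagrangian term -lam * x2
        lam_prev, lam = lam, mom + (x1 - x2) / L_dual      # ascent on the dual
    print(x1, x2)   # both approach the consensus optimum, here 8.0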
Optimal Algorithms for Distributed Optimization
In this paper, we study the optimal convergence rate for distributed convex
optimization problems in networks. We model the communication restrictions
imposed by the network as a set of affine constraints and provide optimal
complexity bounds for four different setups, namely: the function
$F(\mathbf{x}) \triangleq \sum_{i=1}^{m} f_i(\mathbf{x})$ is (i) strongly convex
and smooth, (ii) strongly convex, (iii) smooth, or (iv) just convex. Our
results show that Nesterov's
accelerated gradient descent on the dual problem can be executed in a
distributed manner and obtains the same optimal rates as in the centralized
version of the problem (up to constant or logarithmic factors) with an
additional cost related to the spectral gap of the interaction matrix. Finally,
we discuss some extensions to the proposed setup such as proximal friendly
functions, time-varying graphs, and improvement of the condition numbers.
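The communication-as-affine-constraints setup can be sketched as follows (notation assumed here; $W$ is, e.g., a gossip matrix or graph Laplacian of the network):

    \min_{\mathbf{x} = (x_1, \dots, x_m)} \; \sum_{i=1}^{m} f_i(x_i)
    \quad \text{subject to} \quad W^{1/2} \mathbf{x} = 0,

where, for a connected graph, $W^{1/2}\mathbf{x} = 0$ holds exactly when all local copies $x_i$ agree, so the constrained problem is equivalent to minimizing $F$; applying Nesterov's accelerated method to the dual of this program is what admits the distributed execution described above.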
Hybrid Conditional Gradient - Smoothing Algorithms with Applications to Sparse and Low Rank Regularization
We study a hybrid conditional gradient - smoothing algorithm (HCGS) for
solving composite convex optimization problems which contain several terms over
a bounded set. Examples of these include regularization problems with several
norms as penalties and a norm constraint. HCGS extends conditional gradient
methods to cases with multiple nonsmooth terms, in which standard conditional
gradient methods may be difficult to apply. The HCGS algorithm borrows
techniques from smoothing proximal methods and requires first-order
computations (subgradients and proximity operations). Unlike proximal methods,
HCGS benefits from the advantages of conditional gradient methods, which render
it more efficient on certain large scale optimization problems. We demonstrate
these advantages with simulations on two matrix optimization problems:
regularization of matrices with combined $\ell_1$ and trace norm penalties; and
a convex relaxation of sparse PCA.
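A toy numpy sketch of the hybrid idea: a Frank-Wolfe (conditional gradient) loop over a trace-norm ball, with the nonsmooth $\ell_1$ penalty replaced by its Huber smoothing. The data, fixed smoothing level, and step rule are illustrative assumptions, not the paper's HCGS schedule.

    # Frank-Wolfe over {||X||_* <= tau} with an l1 penalty smoothed via Huber.
    import numpy as np

    rng = np.random.default_rng(2)
    Y = rng.standard_normal((20, 20))   # data to denoise
    tau, gam, mu = 10.0, 0.5, 1e-2      # trace-norm radius, l1 weight, smoothing

    def grad(X):
        # d/dX [ 0.5 * ||X - Y||_F^2 + gam * huber_mu(X) ]
        return (X - Y) + gam * np.clip(X / mu, -1.0, 1.0)

    X = np.zeros_like(Y)
    for k in range(200):
        U, s, Vt = np.linalg.svd(grad(X))
        S = -tau * np.outer(U[:, 0], Vt[0])        # LMO over the trace-norm ball
        X += 2.0 / (k + 2) * (S - X)               # Frank-Wolfe step
    print(np.linalg.norm(X, "nuc") <= tau + 1e-6)  # iterates stay feasible: True

Note the contrast with proximal methods: each iteration needs only a top singular pair (the linear minimization oracle), never a full proximity operator for the trace norm.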
Harnessing Smoothness to Accelerate Distributed Optimization
There has been a growing effort in studying the distributed optimization
problem over a network. The objective is to optimize a global function formed
by a sum of local functions, using only local computation and communication.
Literature has developed consensus-based distributed (sub)gradient descent
(DGD) methods and has shown that they have the same convergence rate of
$O(\frac{\log t}{\sqrt{t}})$ as the centralized (sub)gradient methods (CGD)
when the function is convex but possibly nonsmooth. However, when the function
is convex and smooth, under the framework of DGD, it is unclear how to harness
the smoothness to obtain a faster convergence rate comparable to CGD's $O(\frac{1}{t})$
convergence rate. In this paper, we propose a distributed algorithm that,
despite using the same amount of communication per iteration as DGD, can
effectively harness the function smoothness and converge to the optimum with
a rate of $O(\frac{1}{t})$. If the objective function is further strongly
convex, our algorithm has a linear convergence rate. Both rates match the
convergence rate of CGD. The key step in our algorithm is a novel gradient
estimation scheme that uses history information to achieve fast and accurate
estimation of the average gradient. To motivate the necessity of history
information, we also show that it is impossible for a class of distributed
algorithms like DGD to achieve a linear convergence rate without using history
information even if the objective function is strongly convex and smooth.
Comment: 30 pages, 4 figures
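A minimal numpy sketch of a gradient-estimation scheme of this flavor (gradient tracking): each node mixes its neighbors' estimates and adds the change in its local gradient, so the tracker follows the network-average gradient. The ring network and quadratic costs are illustrative assumptions.

    # Gradient tracking on an undirected ring with doubly stochastic weights.
    import numpy as np

    n = 5
    W = np.zeros((n, n))
    for i in range(n):                  # lazy Metropolis weights on a ring
        W[i, i] = 0.5
        W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.25

    targets = np.arange(n, dtype=float)        # f_i(x) = 0.5 * (x - i)^2
    grad = lambda x: x - targets
    x = np.zeros(n)
    s = grad(x)                                # tracker starts at local gradients
    eta = 0.2
    for _ in range(300):
        x_new = W @ x - eta * s                # consensus + tracked gradient step
        s = W @ s + grad(x_new) - grad(x)      # update average-gradient estimate
        x = x_new
    print(x)   # all entries approach mean(targets) = 2.0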
A Unification and Generalization of Exact Distributed First Order Methods
Recently, there has been significant progress in the development of
distributed first order methods. (At least) two different types of methods,
designed from very different perspectives, have been proposed that achieve both
exact and linear convergence when a constant step size is used -- a favorable
feature that was not achievable by most prior methods. In this paper, we unify,
generalize, and improve convergence speed of these exact distributed first
order methods. We first carry out a novel unifying analysis that sheds light on
how the different existing methods compare. The analysis reveals that a major
difference between the methods is on how a past dual gradient of an associated
augmented Lagrangian dual function is weighted. We then capitalize on the
insights from the analysis to derive a novel method -- with a tuned past
gradient weighting -- that improves upon the existing methods. We establish for
the proposed generalized method a global R-linear convergence rate under strongly
convex costs with Lipschitz continuous gradients.
Comment: revised Dec 17, 201
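For concreteness, a toy numpy sketch of EXTRA, one exact distributed first-order method of the kind unified here; unlike plain DGD it converges with a constant step size. The network and costs are illustrative assumptions.

    # EXTRA: x_{k+2} = (I + W) x_{k+1} - W~ x_k - alpha * (g(x_{k+1}) - g(x_k)),
    # with W~ = (I + W) / 2 and a doubly stochastic mixing matrix W.
    import numpy as np

    n = 5
    W = np.zeros((n, n))
    for i in range(n):                  # lazy Metropolis weights on a ring
        W[i, i] = 0.5
        W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.25
    W_t = (np.eye(n) + W) / 2           # the "past gradient" weighting matrix

    targets = np.arange(n, dtype=float)
    grad = lambda x: x - targets        # f_i(x) = 0.5 * (x - i)^2
    alpha = 0.3
    x_prev = np.zeros(n)
    x = W @ x_prev - alpha * grad(x_prev)          # first EXTRA step
    for _ in range(300):
        x_next = (np.eye(n) + W) @ x - W_t @ x_prev - alpha * (grad(x) - grad(x_prev))
        x_prev, x = x, x_next
    print(x)   # each entry approaches mean(targets) = 2.0

The unifying observation the abstract points to is visible in the recursion: different exact methods correspond to different choices of how $\tilde{W}$ weights the past (dual) gradient information.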
Learning Supervised PageRank with Gradient-Based and Gradient-Free Optimization Methods
In this paper, we consider a non-convex loss-minimization problem of learning
Supervised PageRank models, which can account for properties not considered
by classical approaches such as the original PageRank model. We
propose gradient-based and random gradient-free methods to solve this problem.
Our algorithms are based on the concept of an inexact oracle and, unlike the
state-of-the-art gradient-based method, we manage to provide theoretical
convergence rate guarantees for both of them. In particular, under the
assumption of local convexity of the loss function, our random gradient-free
algorithm guarantees decrease of the loss function value expectation. At the
same time, we theoretically justify that, without a convexity assumption on the
loss function, our gradient-based algorithm allows us to find a point where the
stationary condition is fulfilled with a given accuracy. For both proposed
optimization algorithms, we find the settings of hyperparameters which give the
lowest complexity (i.e., the number of arithmetic operations needed to achieve
the given accuracy of the solution of the loss-minimization problem). The
resulting estimates of the complexity are also provided. Finally, we apply
proposed optimization algorithms to the web page ranking problem and compare
proposed and state-of-the-art algorithms in terms of the considered loss
function.
Comment: 34 pages, 5 figures
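A minimal sketch of the random gradient-free idea in its simplest two-point (Gaussian smoothing) form: estimate a directional derivative from two function evaluations and step along the sampled direction. The toy quadratic stands in for the PageRank loss, and the step sizes are illustrative assumptions.

    # Two-point random gradient-free method on a toy smooth objective.
    import numpy as np

    rng = np.random.default_rng(3)
    c = np.array([1.0, -2.0, 0.5])
    f = lambda x: 0.5 * np.sum((x - c) ** 2)   # stand-in for the actual loss

    x = np.zeros(3)
    h, lr = 1e-4, 0.1                          # smoothing radius, step size
    for _ in range(2000):
        u = rng.standard_normal(3)             # random search direction
        g = (f(x + h * u) - f(x)) / h * u      # two-point gradient estimate
        x -= lr * g / 3                        # ~1/dimension scaling for stability
    print(x)   # approaches c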