240 research outputs found
Optimal Algorithms for Non-Smooth Distributed Optimization in Networks
In this work, we consider the distributed optimization of non-smooth convex
functions using a network of computing units. We investigate this problem under
two regularity assumptions: (1) the Lipschitz continuity of the global
objective function, and (2) the Lipschitz continuity of local individual
functions. Under the local regularity assumption, we provide the first optimal
first-order decentralized algorithm called multi-step primal-dual (MSPD) and
its corresponding optimal convergence rate. A notable aspect of this result is
that, for non-smooth functions, while the dominant term of the error is in
$O(1/\sqrt{t})$, the structure of the communication network only impacts a
second-order term in $O(1/t)$, where $t$ is time. In other words, the error due
to limits in communication resources decreases at a fast rate even in the case
of non-strongly-convex objective functions. Under the global regularity
assumption, we provide a simple yet efficient algorithm called distributed
randomized smoothing (DRS) based on a local smoothing of the objective
function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the
optimal convergence rate, where $d$ is the underlying dimension. Comment: 17 pages
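To make the smoothing idea concrete, here is a minimal single-machine sketch of randomized smoothing on a toy non-smooth objective (the $\ell_1$ norm). The objective, step size, smoothing parameter and sample count are illustrative assumptions; the distributed MSPD/DRS machinery of the paper is not reproduced.

```python
import numpy as np

# Randomized smoothing sketch: replace the non-smooth f by the surrogate
# f_gamma(x) = E[f(x + gamma * Z)], Z ~ N(0, I), and estimate its gradient
# by averaging subgradients at Gaussian-perturbed points.
def f(x):
    return np.abs(x).sum()          # toy non-smooth convex objective ||x||_1

def subgrad(x):
    return np.sign(x)               # a subgradient of ||x||_1

def smoothed_grad(x, gamma, n_samples, rng):
    zs = rng.standard_normal((n_samples, x.size))
    return np.mean([subgrad(x + gamma * z) for z in zs], axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal(10)
gamma, step = 0.1, 0.05
for t in range(200):
    x -= step / np.sqrt(t + 1) * smoothed_grad(x, gamma, 20, rng)
print(f(x))                         # decreases toward 0, the minimum of ||x||_1
```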
GMRES-Accelerated ADMM for Quadratic Objectives
We consider the sequence acceleration problem for the alternating direction
method-of-multipliers (ADMM) applied to a class of equality-constrained
problems with strongly convex quadratic objectives, which frequently arise as
the Newton subproblem of interior-point methods. Within this context, the ADMM
update equations are linear, the iterates are confined within a Krylov
subspace, and the Generalized Minimal RESidual (GMRES) algorithm is optimal in its
ability to accelerate convergence. The basic ADMM method solves a
$\kappa$-conditioned problem in $O(\sqrt{\kappa})$ iterations. We give
theoretical justification and numerical evidence that the GMRES-accelerated
variant consistently solves the same problem in $O(\kappa^{1/4})$ iterations
for an order-of-magnitude reduction in iterations, despite a worst-case bound
of $O(\sqrt{\kappa})$ iterations. The method is shown to be competitive against
standard preconditioned Krylov subspace methods for saddle-point problems. The
method is embedded within SeDuMi, a popular open-source solver for conic
optimization written in MATLAB, and used to solve many large-scale semidefinite
programs with error that decreases like $O(1/k^2)$, instead of $O(1/k)$,
where $k$ is the iteration index. Comment: 31 pages, 7 figures. Accepted for publication in SIAM Journal on
Optimization (SIOPT)
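As a rough illustration of why GMRES can accelerate such iterations, the sketch below replaces a generic affine fixed-point map $z \mapsto Mz + b$ (a random contraction standing in for the linear ADMM update, not the paper's actual operator) with a GMRES solve of the equivalent system $(I - M)z = b$ via SciPy.

```python
import numpy as np
from scipy.sparse.linalg import gmres, LinearOperator

# For quadratic objectives the ADMM update is affine, z_{k+1} = M z_k + b,
# so its fixed point solves (I - M) z = b.  GMRES applied to that system
# returns the best iterate within the same Krylov subspace that plain
# fixed-point iteration explores.
rng = np.random.default_rng(0)
n = 200
A = rng.standard_normal((n, n))
M = 0.9 * A / np.linalg.norm(A, 2)          # toy contraction, spectral norm 0.9
b = rng.standard_normal(n)

# Plain fixed-point ("ADMM-like") iteration.
z = np.zeros(n)
for _ in range(100):
    z = M @ z + b

# GMRES on the equivalent linear system (I - M) z = b.
op = LinearOperator((n, n), matvec=lambda v: v - M @ v, dtype=float)
z_gmres, info = gmres(op, b)

print(np.linalg.norm(z - z_gmres))          # both approach the same fixed point
```

Since GMRES minimizes the residual over that Krylov subspace, it can only improve on the fixed-point iterates per matrix-vector product, which is the sense in which it is the optimal accelerator here.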
Dual-Free Stochastic Decentralized Optimization with Variance Reduction
We consider the problem of training machine learning models on distributed
data in a decentralized way. For finite-sum problems, fast single-machine
algorithms for large datasets rely on stochastic updates combined with variance
reduction. Yet, existing decentralized stochastic algorithms either do not
obtain the full speedup allowed by stochastic updates, or require oracles that
are more expensive than regular gradients. In this work, we introduce a
Decentralized stochastic algorithm with Variance Reduction called DVR. DVR only
requires computing stochastic gradients of the local functions, and is
computationally as fast as a standard stochastic variance-reduced algorithm
run on a $1/n$ fraction of the dataset, where $n$ is the number of nodes. To
derive DVR, we use Bregman coordinate descent on a well-chosen dual problem,
and obtain a dual-free algorithm using a specific Bregman divergence. We give
an accelerated version of DVR based on the Catalyst framework, and illustrate
its effectiveness with simulations on real data.
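For context, the sketch below shows the single-machine SVRG-style variance-reduced update whose cost DVR is compared against; the quadratic data, step size and epoch length are illustrative assumptions, and this is not the DVR algorithm itself.

```python
import numpy as np

# SVRG-style variance reduction for a finite sum
# f(x) = (1/N) * sum_i 0.5 * (a_i . x - b_i)^2  (toy least-squares problem).
rng = np.random.default_rng(0)
N, d = 1000, 20
A = rng.standard_normal((N, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(N)

def full_grad(x):
    return A.T @ (A @ x - b) / N

def stoch_grad(x, i):
    return A[i] * (A[i] @ x - b[i])

x = np.zeros(d)
step = 1.0 / (10 * np.max(np.sum(A ** 2, axis=1)))   # ~ 1 / (10 * max_i ||a_i||^2)
for epoch in range(20):
    x_ref, g_ref = x.copy(), full_grad(x)            # snapshot for variance reduction
    for _ in range(N):
        i = rng.integers(N)
        # Unbiased gradient estimate whose variance vanishes as x, x_ref -> optimum.
        g = stoch_grad(x, i) - stoch_grad(x_ref, i) + g_ref
        x -= step * g
print(np.linalg.norm(full_grad(x)))                  # gradient norm shrinks over epochs
```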