331 research outputs found
Block stochastic gradient iteration for convex and nonconvex optimization
The stochastic gradient (SG) method can minimize an objective function
composed of a large number of differentiable functions, or solve a stochastic
optimization problem, to a moderate accuracy. The block coordinate
descent/update (BCD) method, on the other hand, handles problems with multiple
blocks of variables by updating them one at a time; when the blocks of
variables are easier to update individually than together, BCD has a lower
per-iteration cost. This paper introduces a method that combines the features
of SG and BCD for problems with many components in the objective and with
multiple (blocks of) variables.
Specifically, a block stochastic gradient (BSG) method is proposed for
solving both convex and nonconvex programs. At each iteration, BSG approximates
the gradient of the differentiable part of the objective by randomly sampling a
small set of data or sampling a few functions from the sum term in the
objective, and then, using those samples, it updates all the blocks of
variables in either a deterministic or a randomly shuffled order. Its
convergence for both the convex and nonconvex cases is established in different
senses. In the convex case, the proposed method has the same order of
convergence rate as the SG method. In the nonconvex case, its convergence is
established in terms of the expected violation of a first-order optimality
condition. The proposed method was numerically tested on problems including
stochastic least squares and logistic regression, which are convex, as well as
low-rank tensor recovery and bilinear logistic regression, which are nonconvex.
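The per-iteration structure described above (sample a few component functions, then sweep the blocks of variables in a randomly shuffled order) can be sketched on a toy stochastic least-squares problem. This is only an illustrative sketch: the block split, batch size, and step size below are assumptions for demonstration, not values from the paper.

```python
import numpy as np

# Sketch of a block stochastic gradient (BSG) iteration: sample a small batch
# of components, then update each block of variables in shuffled order using
# gradients computed from that batch. Toy stochastic least squares with the
# variable split into two blocks x (first d1 coords) and y (the rest).
rng = np.random.default_rng(0)
n, d1, d2 = 200, 3, 2
A = rng.standard_normal((n, d1 + d2))
b = A @ rng.standard_normal(d1 + d2)       # consistent system for simplicity

x = np.zeros(d1)
y = np.zeros(d2)
batch, step = 20, 0.01                     # illustrative batch size / stepsize

for k in range(2000):
    idx = rng.choice(n, size=batch, replace=False)  # sample a few components
    for block in rng.permutation(2):                # randomly shuffled order
        r = A[idx, :d1] @ x + A[idx, d1:] @ y - b[idx]  # batch residual
        if block == 0:
            x -= step * (A[idx, :d1].T @ r) / batch     # gradient in block x
        else:
            y -= step * (A[idx, d1:].T @ r) / batch     # gradient in block y

residual = np.linalg.norm(A @ np.concatenate([x, y]) - b) / np.linalg.norm(b)
```

Note the Gauss-Seidel flavor: the residual is recomputed after each block update, so later blocks see the earlier blocks' fresh values, which is what distinguishes a block update sweep from a plain joint stochastic gradient step.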
On the Convergence of Alternating Direction Lagrangian Methods for Nonconvex Structured Optimization Problems
Nonconvex and structured optimization problems arise in many engineering
applications that demand scalable and distributed solution methods. The study
of the convergence properties of these methods is in general difficult due to
the nonconvexity of the problem. In this paper, two distributed solution
methods that combine the fast convergence properties of augmented
Lagrangian-based methods with the separability properties of alternating
optimization are investigated. The first method is adapted from the classic
quadratic penalty function method and is called the Alternating Direction
Penalty Method (ADPM). Unlike the original quadratic penalty function method,
in which single-step optimizations are adopted, ADPM uses an alternating
optimization, which in turn makes it scalable. The second method is the
well-known Alternating Direction Method of Multipliers (ADMM). It is shown that
ADPM for nonconvex problems asymptotically converges to a primal feasible point
under mild conditions, and an additional condition ensuring that it
asymptotically reaches the standard first-order necessary conditions for local
optimality is introduced. For ADMM, novel sufficient conditions under which the
algorithm asymptotically reaches the standard first-order necessary conditions
are established. Based on this, the complete convergence of ADMM for a class of
low-dimensional problems is characterized. Finally, the
results are illustrated by applying ADPM and ADMM to a nonconvex localization
problem in wireless sensor networks.
Comment: 13 pages, 6 figures
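The alternating-direction structure shared by ADPM and ADMM can be illustrated with a minimal convex ADMM sketch for the lasso problem. This is only a toy under assumed data and parameters (the abstract's actual setting is nonconvex localization): the penalty rho and regularization lam below are illustrative, not from the paper.

```python
import numpy as np

# Minimal convex ADMM sketch for: min (1/2)||Ax - b||^2 + lam*||z||_1
# subject to x = z. Each pass alternates an x-minimization, a z-minimization
# (soft-thresholding), and a multiplier update, rather than a single joint
# minimization -- the separability that makes these methods scalable.
rng = np.random.default_rng(0)
m, d = 50, 10
A = rng.standard_normal((m, d))
b = A @ (rng.standard_normal(d) * (rng.random(d) < 0.3))  # sparse ground truth

rho, lam = 1.0, 0.1                         # illustrative parameters
x, z, u = np.zeros(d), np.zeros(d), np.zeros(d)  # u: scaled dual variable
M = np.linalg.inv(A.T @ A + rho * np.eye(d))     # cached x-update solve
Atb = A.T @ b

for k in range(200):
    x = M @ (Atb + rho * (z - u))                            # x-minimization
    z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0.0)  # shrink
    u = u + x - z                                            # multiplier step

primal_gap = np.linalg.norm(x - z)          # feasibility of the constraint x = z
```

In an ADPM-style variant, the multiplier update would be replaced by an increasing penalty rho, trading dual updates for penalty growth.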
Asynchronous and Parallel Distributed Pose Graph Optimization
We present Asynchronous Stochastic Parallel Pose Graph Optimization (ASAPP),
the first asynchronous algorithm for distributed pose graph optimization (PGO)
in multi-robot simultaneous localization and mapping. By enabling robots to
optimize their local trajectory estimates without synchronization, ASAPP offers
resiliency against communication delays and alleviates the need to wait for
stragglers in the network. Furthermore, ASAPP can be applied on the
rank-restricted relaxations of PGO, a crucial class of non-convex Riemannian
optimization problems that underlies recent breakthroughs on globally optimal
PGO. Under bounded delay, we establish the global first-order convergence of
ASAPP using a sufficiently small stepsize. The derived stepsize depends on the
worst-case delay and the inherent problem sparsity, and matches known results
for synchronous algorithms when there is no delay. Numerical evaluations
on simulated and real-world datasets demonstrate favorable performance compared
to state-of-the-art synchronous approaches, and show ASAPP's resilience against a
wide range of delays in practice.
Comment: full paper with appendices
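The bounded-delay model behind the convergence guarantee can be mimicked in a tiny serial simulation: gradient steps on a toy quadratic that may read an iterate up to tau steps stale. This is a sketch of the delay model only, under assumed constants; it is not ASAPP itself, which operates on rank-restricted PGO relaxations over a Riemannian manifold.

```python
import numpy as np

# Delayed gradient descent on f(x) = 0.5 * x^T Q x: each update may use a
# gradient evaluated at an iterate up to tau steps old, mimicking asynchronous
# reads under a bounded-delay assumption. The stepsize is chosen small relative
# to the worst-case delay, echoing the sufficiently-small-stepsize condition.
rng = np.random.default_rng(1)
Q = np.diag([1.0, 2.0, 5.0])          # toy quadratic (illustrative)
x = rng.standard_normal(3)
history = [x.copy()]

tau, step = 3, 0.02                    # worst-case delay, small stepsize
for k in range(500):
    d = int(rng.integers(0, tau + 1))  # staleness of this gradient read
    stale = history[max(0, len(history) - 1 - d)]
    x = x - step * (Q @ stale)         # update using the stale gradient
    history.append(x.copy())

final_norm = float(np.linalg.norm(x))  # distance to the minimizer x* = 0
```

With step * lambda_max * (tau + 1) well below 1, the iterates still contract toward the minimizer despite stale reads; an overly large stepsize under the same delays can diverge, which is why the admissible stepsize shrinks with the worst-case delay.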