
### NESTT: A Nonconvex Primal-Dual Splitting Method for Distributed and Stochastic Optimization

We study a stochastic and distributed algorithm for nonconvex problems whose
objective consists of a sum of $N$ nonconvex $L_i/N$-smooth functions, plus a
nonsmooth regularizer. The proposed NonconvEx primal-dual SpliTTing (NESTT)
algorithm splits the problem into $N$ subproblems, and utilizes an augmented
Lagrangian based primal-dual scheme to solve it in a distributed and stochastic
manner. With a special non-uniform sampling, a version of NESTT achieves an
$\epsilon$-stationary solution using
$\mathcal{O}((\sum_{i=1}^N\sqrt{L_i/N})^2/\epsilon)$ gradient evaluations,
which can be up to $\mathcal{O}(N)$ times better than the (proximal) gradient
descent methods. It also achieves a Q-linear convergence rate for nonconvex
$\ell_1$ penalized quadratic problems with polyhedral constraints. Further, we
reveal a fundamental connection between primal-dual based methods and a few
primal-only methods such as IAG/SAG/SAGA. Comment: 35 pages, 2 figures
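
The "up to $\mathcal{O}(N)$" factor can be made concrete with a little arithmetic. The sketch below compares the quoted NESTT bound with a rough gradient-descent count of $\sum_i L_i/\epsilon$ (that is, $N$ gradients per iteration times $(\sum_i L_i/N)/\epsilon$ iterations). The gradient-descent count and the function names are assumptions made for illustration and are not taken from the abstract; constants are dropped throughout.

```python
import math

def nestt_bound(L, eps):
    """Quoted bound (sum_i sqrt(L_i / N))^2 / eps, constants dropped."""
    N = len(L)
    return sum(math.sqrt(Li / N) for Li in L) ** 2 / eps

def gd_bound(L, eps):
    """Assumed gradient-descent count: N gradients per iteration
    times (sum_i L_i / N) / eps iterations, constants dropped."""
    return sum(L) / eps

def ratio(L, eps=1e-3):
    """How many times more gradient evaluations the assumed GD count needs."""
    return gd_bound(L, eps) / nestt_bound(L, eps)

N = 100
uniform = [1.0] * N                # every component equally smooth
skewed = [1.0] + [0.0] * (N - 1)   # one component carries all the curvature

print(ratio(uniform))  # about 1: no gain when all L_i are equal
print(ratio(skewed))   # about N: the gain grows as the L_i become uneven
```

Under these assumptions the advantage comes entirely from how unevenly the constants $L_i$ are distributed, which is consistent with the abstract's emphasis on non-uniform sampling.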

### Iteration Complexity Analysis of Block Coordinate Descent Methods

In this paper, we provide a unified iteration complexity analysis for a
family of general block coordinate descent (BCD) methods, covering popular
methods such as the block coordinate gradient descent (BCGD) and the block
coordinate proximal gradient (BCPG), under various coordinate update
rules. We unify these algorithms under the so-called Block Successive
Upper-bound Minimization (BSUM) framework, and show that for a broad class of
multi-block nonsmooth convex problems, all algorithms covered by the BSUM
framework achieve a global sublinear iteration complexity of $O(1/r)$, where $r$
is the iteration index. Moreover, for the case of block coordinate minimization
(BCM) where each block is minimized exactly, we establish the sublinear
convergence rate of $O(1/r)$ without a per-block strong convexity assumption.
Further, we show that when there are only two blocks of variables, a special
BSUM algorithm with Gauss-Seidel rule can be accelerated to achieve an improved
rate of $O(1/r^2)$.
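
As a minimal illustration of the kind of method the abstract covers, the sketch below runs cyclic (Gauss-Seidel) block coordinate gradient descent on a small least-squares problem, with each coordinate treated as its own block. The problem data, the function name, and the step size $1/L_j$ are assumptions chosen for the example; for this quadratic objective and scalar blocks the gradient step with step size $1/L_j$ coincides with exact block minimization.

```python
def bcgd_least_squares(A, b, sweeps=200):
    """Cyclic coordinate gradient descent on f(x) = 0.5 * ||A x - b||^2."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    r = [-bi for bi in b]  # residual A x - b at the starting point x = 0
    # Per-coordinate Lipschitz constants L_j = ||A[:, j]||^2.
    Lc = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]
    history = []
    for _ in range(sweeps):
        for j in range(n):  # Gauss-Seidel rule: visit blocks in order
            g = sum(A[i][j] * r[i] for i in range(m))  # partial gradient
            step = g / Lc[j]
            x[j] -= step
            for i in range(m):  # keep the residual in sync with x
                r[i] -= step * A[i][j]
        history.append(0.5 * sum(ri * ri for ri in r))
    return x, history

A = [[2.0, 1.0], [1.0, 3.0], [0.0, 1.0]]
b = [1.0, 2.0, 3.0]
x, history = bcgd_least_squares(A, b)
print(x, history[-1])
```

The recorded objective values in `history` never increase from one sweep to the next, which is the basic descent property that the iteration complexity analysis builds on.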

### Asynchronous Distributed ADMM for Large-Scale Optimization - Part I: Algorithm and Convergence Analysis

Aiming at solving large-scale learning problems, this paper studies
distributed optimization methods based on the alternating direction method of
multipliers (ADMM). By formulating the learning problem as a consensus problem,
the ADMM can be used to solve the consensus problem in a fully parallel fashion
over a computer network with a star topology. However, traditional synchronized
computation does not scale well with the problem size, as the speed of the
algorithm is limited by the slowest workers. This is particularly true in a
heterogeneous network where the computing nodes experience different
computation and communication delays. In this paper, we propose an asynchronous
distributed ADMM (AD-ADMM) which can effectively improve the time efficiency of
distributed optimization. Our main interest lies in analyzing the convergence
conditions of the AD-ADMM, under the popular partially asynchronous model,
which is defined based on a maximum tolerable delay of the network.
Specifically, by considering general and possibly non-convex cost functions, we
show that the AD-ADMM is guaranteed to converge to the set of
Karush-Kuhn-Tucker (KKT) points as long as the algorithm parameters are chosen
appropriately according to the network delay. We further illustrate that the
asynchrony of the ADMM has to be handled with care, as slightly modifying the
implementation of the AD-ADMM can jeopardize the algorithm convergence, even
under a standard convex setting. Comment: 37 pages
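
For orientation, here is a minimal sketch of the synchronous consensus ADMM baseline that the abstract starts from, not the asynchronous AD-ADMM itself. It assumes scalar quadratic local costs $f_i(x) = \tfrac{1}{2} a_i (x - c_i)^2$ so that every step has a closed form; the cost functions, names, and parameter values are illustrative assumptions.

```python
def consensus_admm(a, c, rho=1.0, iters=200):
    """Synchronous consensus ADMM for min_x sum_i 0.5 * a[i] * (x - c[i])**2."""
    n = len(a)
    z = 0.0        # master (consensus) variable
    x = [0.0] * n  # local copies held by the workers
    y = [0.0] * n  # dual variables, one per worker
    for _ in range(iters):
        # Worker step: each x[i] has a closed form and could run in parallel.
        x = [(a[i] * c[i] - y[i] + rho * z) / (a[i] + rho) for i in range(n)]
        # Master step: needs every worker's result before it can proceed.
        z = sum(x[i] + y[i] / rho for i in range(n)) / n
        # Dual ascent on the consensus constraint x[i] = z.
        y = [y[i] + rho * (x[i] - z) for i in range(n)]
    return z, x

a = [1.0, 2.0, 3.0]
c = [1.0, -1.0, 2.0]
z, x = consensus_admm(a, c)
z_star = sum(ai * ci for ai, ci in zip(a, c)) / sum(a)  # closed-form minimizer
print(z, z_star)
```

The master step is where the slowest worker limits throughput in the synchronous scheme: the average over all `x[i]` cannot be formed until every worker has reported.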

### Parallel Successive Convex Approximation for Nonsmooth Nonconvex Optimization

Consider the problem of minimizing the sum of a smooth (possibly non-convex)
and a convex (possibly nonsmooth) function involving a large number of
variables. A popular approach to solve this problem is the block coordinate
descent (BCD) method whereby at each iteration only one variable block is
updated while the remaining variables are held fixed. With recent advances
in multi-core parallel processing technology, it is
desirable to parallelize the BCD method by allowing multiple blocks to be
updated simultaneously at each iteration of the algorithm. In this work, we
propose an inexact parallel BCD approach where at each iteration, a subset of
the variables is updated in parallel by minimizing convex approximations of the
original objective function. We investigate the convergence of this parallel
BCD method for both randomized and cyclic variable selection rules. We analyze
the asymptotic and non-asymptotic convergence behavior of the algorithm for
both convex and non-convex objective functions. The numerical experiments
suggest that for a special case of the Lasso minimization problem, the cyclic block
selection rule can outperform the randomized rule.
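
A rough sketch of the idea, under stated assumptions: for the Lasso objective $\tfrac{1}{2}\|Ax-b\|^2 + \lambda\|x\|_1$, each selected coordinate minimizes a convex proximal-linear approximation computed from the same iterate (so the updates could run in parallel), and the results are combined with a damping step $\gamma = 1/|S|$. The step-size choice, problem data, and function names are illustrative and are not claimed to match the paper's algorithm or experiments.

```python
import random

def soft(v, t):
    """Soft-thresholding operator, the proximal map of t * |.|."""
    return max(v - t, 0.0) - max(-v - t, 0.0)

def lasso_obj(A, b, x, lam):
    r = [sum(A[i][j] * x[j] for j in range(len(x))) - b[i] for i in range(len(b))]
    return 0.5 * sum(ri * ri for ri in r) + lam * sum(abs(v) for v in x)

def parallel_bcd_lasso(A, b, lam, block_size=2, iters=300, rule="cyclic", seed=0):
    """Jacobi-style block updates; requires block_size <= number of columns."""
    m, n = len(A), len(A[0])
    rng = random.Random(seed)
    Lc = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]
    x = [0.0] * n
    gamma = 1.0 / block_size  # assumed damping step
    history = [lasso_obj(A, b, x, lam)]
    start = 0
    for _ in range(iters):
        if rule == "cyclic":
            S = [(start + k) % n for k in range(block_size)]
            start = (start + block_size) % n
        else:  # randomized rule
            S = rng.sample(range(n), block_size)
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        # All selected coordinates use the same x and r, so this loop
        # is the part that could be distributed across cores.
        x_hat = {}
        for j in S:
            g = sum(A[i][j] * r[i] for i in range(m))
            x_hat[j] = soft(x[j] - g / Lc[j], lam / Lc[j])
        for j in S:
            x[j] += gamma * (x_hat[j] - x[j])
        history.append(lasso_obj(A, b, x, lam))
    return x, history

A = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0], [2.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
b = [1.0, 2.0, 3.0, 4.0]
x_cyc, hist_cyc = parallel_bcd_lasso(A, b, lam=0.5, rule="cyclic")
x_rnd, hist_rnd = parallel_bcd_lasso(A, b, lam=0.5, rule="randomized")
print(hist_cyc[-1], hist_rnd[-1])
```

With $\gamma = 1/|S|$ the new iterate is an average of single-coordinate minimizers, so by convexity the objective cannot increase under either selection rule. Which rule is faster on a given instance is an empirical question that this sketch does not settle.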

### Averaged Iterative Water-Filling Algorithm: Robustness and Convergence

The convergence properties of the iterative water-filling (IWF) based
algorithms have been derived in the ideal situation where the transmitters in
the network are able to obtain the exact value of the interference plus noise
(IPN) experienced at the corresponding receivers in each iteration of the
algorithm. However, these algorithms are not robust because they diverge when
there is a time-varying estimation error of the IPN, a situation that arises
in real communication systems. In this correspondence, we propose an algorithm
that possesses convergence guarantees in the presence of various forms of such
time-varying error. Moreover, we also show by simulation that in scenarios
where the interference is strong, the conventional IWF diverges while our
proposed algorithm still converges.
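
The sketch below shows one plausible reading of "averaged" water-filling: each transmitter computes a water-filling response to a noisy IPN estimate and then mixes it with its previous power allocation using a diminishing weight $1/(t+1)$. The averaging rule, the multiplicative error model, the channel-gain layout `H[j][k][n]`, and all names are assumptions for illustration; the abstract does not specify them.

```python
import random

def water_fill(inv_gain, budget):
    """Return p with p[n] = max(0, mu - inv_gain[n]) and sum(p) = budget."""
    lo, hi = min(inv_gain), max(inv_gain) + budget
    for _ in range(100):  # bisection on the water level mu
        mu = 0.5 * (lo + hi)
        if sum(max(0.0, mu - v) for v in inv_gain) > budget:
            hi = mu
        else:
            lo = mu
    mu = 0.5 * (lo + hi)
    return [max(0.0, mu - v) for v in inv_gain]

def averaged_iwf(H, noise, budget, iters=200, err=0.1, seed=0):
    """H[j][k][n]: power gain from transmitter j to receiver k on channel n
    (hypothetical layout). err < 1 bounds the relative estimation error."""
    rng = random.Random(seed)
    K, N = len(H), len(noise)
    p = [[budget / N] * N for _ in range(K)]
    for t in range(iters):
        alpha = 1.0 / (t + 1)  # assumed diminishing averaging weight
        new_p = []
        for k in range(K):
            # True interference plus noise seen by receiver k.
            ipn = [noise[n] + sum(H[j][k][n] * p[j][n]
                                  for j in range(K) if j != k)
                   for n in range(N)]
            # Time-varying estimation error (assumed multiplicative).
            est = [v * (1.0 + err * rng.uniform(-1.0, 1.0)) for v in ipn]
            wf = water_fill([est[n] / H[k][k][n] for n in range(N)], budget)
            new_p.append([(1 - alpha) * p[k][n] + alpha * wf[n]
                          for n in range(N)])
        p = new_p  # all users update simultaneously
    return p

H = [[[1.0, 0.8, 1.2], [0.3, 0.2, 0.4]],
     [[0.2, 0.3, 0.1], [0.9, 1.1, 1.0]]]
p = averaged_iwf(H, noise=[0.1, 0.1, 0.1], budget=1.0)
print(p)
```

Because each update is a convex combination of two feasible allocations, every user's power budget is respected at every iteration regardless of the estimation error.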
