    NESTT: A Nonconvex Primal-Dual Splitting Method for Distributed and Stochastic Optimization

    We study a stochastic and distributed algorithm for nonconvex problems whose objective consists of a sum of $N$ nonconvex $L_i/N$-smooth functions, plus a nonsmooth regularizer. The proposed NonconvEx primal-dual SpliTTing (NESTT) algorithm splits the problem into $N$ subproblems and uses an augmented-Lagrangian-based primal-dual scheme to solve it in a distributed and stochastic manner. With a special non-uniform sampling, a version of NESTT achieves an $\epsilon$-stationary solution using $\mathcal{O}((\sum_{i=1}^N\sqrt{L_i/N})^2/\epsilon)$ gradient evaluations, which can be up to $\mathcal{O}(N)$ times better than (proximal) gradient descent methods. It also achieves a Q-linear convergence rate for nonconvex $\ell_1$-penalized quadratic problems with polyhedral constraints. Further, we reveal a fundamental connection between primal-dual based methods and a few primal-only methods such as IAG/SAG/SAGA.
    Comment: 35 pages, 2 figures
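    A quick arithmetic check of the "up to O(N)" claim, as a minimal Python sketch. It evaluates the NESTT bound with the $1/\epsilon$ factor dropped and compares it with an assumed baseline for (proximal) gradient descent: $N$ component gradients per iteration times $O(\sum_i L_i/N)$ iterations, i.e. $\sum_i L_i$ evaluations. That baseline and the function names `nestt_bound` and `gd_bound` are illustrative assumptions, not taken from the paper.

```python
import math


def nestt_bound(L):
    """(sum_i sqrt(L_i / N))**2: the NESTT gradient-evaluation bound, 1/epsilon dropped."""
    N = len(L)
    return sum(math.sqrt(Li / N) for Li in L) ** 2


def gd_bound(L):
    """ASSUMED (proximal) gradient descent baseline: N component gradients per
    iteration times sum_i L_i / N iterations (per unit of 1/epsilon) = sum_i L_i."""
    return float(sum(L))


N = 100
uniform = [1.0] * N                     # every component equally smooth
skewed = [float(N)] + [0.0] * (N - 1)   # one component carries all the curvature
print(gd_bound(uniform) / nestt_bound(uniform))  # ratio close to 1
print(gd_bound(skewed) / nestt_bound(skewed))    # 100.0, i.e. a factor of N
```

    With equal $L_i$ the two counts coincide; the gap reaches $N$ only when the smoothness constants are highly uneven.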

    Iteration Complexity Analysis of Block Coordinate Descent Methods

    In this paper, we provide a unified iteration complexity analysis for a family of general block coordinate descent (BCD) methods, covering popular methods such as block coordinate gradient descent (BCGD) and block coordinate proximal gradient (BCPG), under various coordinate update rules. We unify these algorithms under the so-called Block Successive Upper-bound Minimization (BSUM) framework and show that, for a broad class of multi-block nonsmooth convex problems, all algorithms covered by the BSUM framework achieve a global sublinear iteration complexity of $O(1/r)$, where $r$ is the iteration index. Moreover, for block coordinate minimization (BCM), where each block is minimized exactly, we establish the sublinear convergence rate of $O(1/r)$ without a per-block strong convexity assumption. Further, we show that when there are only two blocks of variables, a special BSUM algorithm with the Gauss-Seidel rule can be accelerated to achieve an improved rate of $O(1/r^2)$.
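    A minimal sketch of one BSUM instance: cyclic (Gauss-Seidel) block coordinate proximal gradient with scalar blocks, applied to an assumed example problem, the Lasso $\min_x \tfrac12\|Ax-b\|^2 + \lambda\|x\|_1$. With the per-block step $1/L_j$, where $L_j = \|a_j\|^2$ is the squared norm of column $j$, each scalar step is also an exact block minimization, so the same loop illustrates BCM. The problem choice and the function names are hypothetical.

```python
def soft_threshold(v, t):
    """Proximal operator of t*|.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def cyclic_bcpg_lasso(A, b, lam, sweeps=100):
    """Cyclic (Gauss-Seidel) block coordinate proximal gradient with scalar blocks
    for min_x 0.5*||A x - b||^2 + lam*||x||_1 (A given as a list of rows)."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    r = list(b)  # residual b - A x, kept current after every block update
    # per-block Lipschitz constants L_j = ||a_j||^2 (squared column norms)
    col_sq = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]
    for _ in range(sweeps):
        for j in range(n):
            if col_sq[j] == 0.0:
                continue  # zero column: x_j stays at 0, the minimizer of lam*|x_j|
            grad_j = -sum(A[i][j] * r[i] for i in range(m))  # d/dx_j of 0.5*||Ax-b||^2
            new_xj = soft_threshold(x[j] - grad_j / col_sq[j], lam / col_sq[j])
            delta = new_xj - x[j]
            for i in range(m):
                r[i] -= A[i][j] * delta
            x[j] = new_xj
    return x


print(cyclic_bcpg_lasso([[1.0, 0.0], [0.0, 2.0]], [3.0, 1.0], lam=1.0))  # [2.0, 0.25]
```

    Later blocks always see the freshest values of earlier blocks through the shared residual `r`; that is what distinguishes the Gauss-Seidel rule from a Jacobi (simultaneous) rule.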

    Asynchronous Distributed ADMM for Large-Scale Optimization - Part I: Algorithm and Convergence Analysis

    Aiming at solving large-scale learning problems, this paper studies distributed optimization methods based on the alternating direction method of multipliers (ADMM). By formulating the learning problem as a consensus problem, one can use the ADMM to solve it in a fully parallel fashion over a computer network with a star topology. However, traditional synchronized computation does not scale well with the problem size, as the speed of the algorithm is limited by the slowest workers. This is particularly true in a heterogeneous network where the computing nodes experience different computation and communication delays. In this paper, we propose an asynchronous distributed ADMM (AD-ADMM), which can effectively improve the time efficiency of distributed optimization. Our main interest lies in analyzing the convergence conditions of the AD-ADMM under the popular partially asynchronous model, which is defined based on a maximum tolerable delay of the network. Specifically, by considering general and possibly nonconvex cost functions, we show that the AD-ADMM is guaranteed to converge to the set of Karush-Kuhn-Tucker (KKT) points as long as the algorithm parameters are chosen appropriately according to the network delay. We further illustrate that the asynchrony of the ADMM has to be handled with care, as slightly modifying the implementation of the AD-ADMM can jeopardize its convergence, even in a standard convex setting.
    Comment: 37 pages
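    For orientation, here is a minimal sketch of the synchronous consensus ADMM over a star topology, the baseline that AD-ADMM makes asynchronous; it is not AD-ADMM itself. It assumes scalar local costs $f_i(x) = \tfrac12(x-a_i)^2$ so that every worker update has a closed form, and uses unscaled multipliers `y[i]` with the usual local-update / master-average / dual-ascent ordering. The names `consensus_admm`, `a`, and `rho` are illustrative.

```python
def consensus_admm(a, rho=1.0, iters=200):
    """Synchronous consensus ADMM for min_x sum_i 0.5*(x - a[i])**2 on a star network.
    Worker i holds a local copy x[i] and a multiplier y[i]; the master holds z."""
    n = len(a)
    x, y, z = [0.0] * n, [0.0] * n, 0.0
    for _ in range(iters):
        # workers, in parallel: closed-form minimizer of
        # 0.5*(x_i - a_i)**2 + y_i*(x_i - z) + (rho/2)*(x_i - z)**2
        x = [(a[i] - y[i] + rho * z) / (1.0 + rho) for i in range(n)]
        # master: must wait for ALL workers before averaging (the synchronous bottleneck)
        z = sum(x[i] + y[i] / rho for i in range(n)) / n
        # workers: dual ascent on the consensus constraint x_i = z
        y = [y[i] + rho * (x[i] - z) for i in range(n)]
    return z, x


z, x = consensus_admm([1.0, 2.0, 3.0, 6.0])
print(z)  # approaches the average of a, i.e. 3.0
```

    The master's averaging line is where a synchronous implementation stalls on the slowest worker; the partially asynchronous model discussed above instead lets the master proceed with delayed information up to a bounded staleness.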

    Parallel Successive Convex Approximation for Nonsmooth Nonconvex Optimization

    Consider the problem of minimizing the sum of a smooth (possibly nonconvex) function and a convex (possibly nonsmooth) function involving a large number of variables. A popular approach to this problem is the block coordinate descent (BCD) method, whereby at each iteration only one variable block is updated while the remaining variables are held fixed. With recent advances in multi-core parallel processing technology, it is desirable to parallelize the BCD method by allowing multiple blocks to be updated simultaneously at each iteration of the algorithm. In this work, we propose an inexact parallel BCD approach where, at each iteration, a subset of the variables is updated in parallel by minimizing convex approximations of the original objective function. We investigate the convergence of this parallel BCD method for both randomized and cyclic variable selection rules, and analyze its asymptotic and non-asymptotic convergence behavior for both convex and nonconvex objective functions. Numerical experiments suggest that, for the special case of the Lasso problem, the cyclic block selection rule can outperform the randomized rule.
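    A minimal sketch of an inexact parallel (Jacobi-style) BCD step on the Lasso, assuming scalar blocks, a proximal-linear convex approximation per block, and a randomized subset of `blocks_per_iter` blocks per iteration. All chosen blocks are computed from the same current iterate, as they would be on parallel cores. The damping `gamma` (default $1/k$ for $k$ chosen blocks) is an assumed safeguard, not necessarily the paper's rule: with it, the new iterate is a convex combination of single-block exact minimizers, so by convexity the objective cannot increase.

```python
import random


def soft_threshold(v, t):
    """Proximal operator of t*|.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def parallel_bcd_lasso(A, b, lam, blocks_per_iter, iters=300, gamma=None, seed=0):
    """Randomized Jacobi-style BCD for min_x 0.5*||A x - b||^2 + lam*||x||_1.
    Each iteration picks `blocks_per_iter` scalar blocks at random and updates
    them simultaneously from the same iterate."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    k = blocks_per_iter
    gamma = 1.0 / k if gamma is None else gamma  # ASSUMED damping safeguard
    col_sq = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]
    x = [0.0] * n
    for _ in range(iters):
        # residual at the current iterate, shared by every block updated this round
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        chosen = rng.sample(range(n), k)  # randomized block selection
        targets = {}
        for j in chosen:
            if col_sq[j] == 0.0:
                continue
            grad_j = -sum(A[i][j] * r[i] for i in range(m))
            # minimizer of the proximal-linear (convex) approximation in block j
            targets[j] = soft_threshold(x[j] - grad_j / col_sq[j], lam / col_sq[j])
        for j, xhat in targets.items():
            x[j] += gamma * (xhat - x[j])  # damped move toward the block minimizer
    return x
```

    Without damping, simultaneous updates of coupled blocks can overshoot, since each block assumes the others stay fixed; a cyclic rule would instead update the chosen blocks one after another from fresh values.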

    Averaged Iterative Water-Filling Algorithm: Robustness and Convergence

    The convergence properties of iterative water-filling (IWF) based algorithms have been derived in the ideal situation where the transmitters in the network can obtain the exact value of the interference plus noise (IPN) experienced at the corresponding receivers in each iteration of the algorithm. However, these algorithms are not robust: they diverge when there is a time-varying estimation error in the IPN, a situation that arises in real communication systems. In this correspondence, we propose an algorithm that possesses convergence guarantees in the presence of various forms of such time-varying error. Moreover, we show by simulation that in scenarios where the interference is strong, the conventional IWF diverges while our proposed algorithm still converges.
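    A minimal sketch of the averaging idea, under stated assumptions: water-filling computed by bisection on the water level, interference modeled by a hypothetical scalar cross-gain matrix `cross` shared by all subcarriers, IPN estimates corrupted by Gaussian error of standard deviation `err_std`, and a hypothetical diminishing step size $\alpha_t = 1/(t+2)$. Conventional IWF jumps straight to the water-filling response; here each user only moves part of the way toward it. Because every mixed vector is a convex combination of feasible allocations, each user's power budget is preserved.

```python
import random


def waterfill(ipn, total):
    """Water-filling: p_k = max(0, mu - ipn_k), with mu chosen so that sum(p) == total."""
    lo, hi = min(ipn), max(ipn) + total  # sum is 0 at lo and >= total at hi
    for _ in range(200):  # bisection on the water level mu
        mu = 0.5 * (lo + hi)
        if sum(max(0.0, mu - v) for v in ipn) > total:
            hi = mu
        else:
            lo = mu
    mu = 0.5 * (lo + hi)
    return [max(0.0, mu - v) for v in ipn]


def averaged_iwf(cross, noise, budgets, iters=500, err_std=0.1, seed=0):
    """Averaged IWF sketch. cross[i][j] (hypothetical): gain of user j's power at
    receiver i; noise[k]: noise on subcarrier k; budgets[i]: user i's total power."""
    rng = random.Random(seed)
    users, carriers = len(budgets), len(noise)
    p = [[b / carriers] * carriers for b in budgets]  # feasible uniform start
    for t in range(iters):
        alpha = 1.0 / (t + 2)  # ASSUMED diminishing step size
        new_p = []
        for i in range(users):
            # noisy estimate of interference plus noise on each subcarrier
            ipn = [noise[k]
                   + sum(cross[i][j] * p[j][k] for j in range(users) if j != i)
                   + rng.gauss(0.0, err_std)
                   for k in range(carriers)]
            target = waterfill(ipn, budgets[i])
            # average instead of jumping to the water-filling response
            new_p.append([(1 - alpha) * pk + alpha * tk for pk, tk in zip(p[i], target)])
        p = new_p  # all users update simultaneously
    return p
```

    The step size trades tracking speed against noise suppression: with $\alpha_t = 1/(t+2)$ the iterate is effectively a running average of past water-filling responses, which damps zero-mean estimation error at the cost of slower movement.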