Search CORE

1,381 research outputs found

Distributed Big-Data Optimization via Block-Iterative Convexification and Averaging

Author: Notarnicola Ivano
Notarstefano Giuseppe
Scutari Gesualdo
Sun Ying
Publication venue
Publication date: 01/01/2017
Field of study

In this paper, we study distributed big-data nonconvex optimization in multi-agent networks. We consider the (constrained) minimization of the sum of a smooth (possibly) nonconvex function, i.e., the agents' sum-utility, plus a convex (possibly) nonsmooth regularizer. Our interest is in big-data problems wherein there is a large number of variables to optimize. If treated by means of standard distributed optimization algorithms, these large-scale problems may be intractable, due to the prohibitive local computation and communication burden at each node. We propose a novel distributed solution method whereby at each iteration agents optimize and then communicate (in an uncoordinated fashion) only a subset of their decision variables. To deal with non-convexity of the cost function, the novel scheme hinges on Successive Convex Approximation (SCA) techniques coupled with i) a tracking mechanism instrumental to locally estimate gradient averages; and ii) a novel block-wise consensus-based protocol to perform local block-averaging operations and gradient tacking. Asymptotic convergence to stationary solutions of the nonconvex problem is established. Finally, numerical results show the effectiveness of the proposed algorithm and highlight how the block dimension impacts on the communication overhead and practical convergence speed

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Distributed Nonconvex Multiagent Optimization Over Time-Varying Networks

Author: Palomar Daniel
Scutari Gesualdo
Sun Ying
Publication venue
Publication date: 14/12/2016
Field of study

We study nonconvex distributed optimization in multiagent networks where the communications between nodes is modeled as a time-varying sequence of arbitrary digraphs. We introduce a novel broadcast-based distributed algorithmic framework for the (constrained) minimization of the sum of a smooth (possibly nonconvex and nonseparable) function, i.e., the agents' sum-utility, plus a convex (possibly nonsmooth and nonseparable) regularizer. The latter is usually employed to enforce some structure in the solution, typically sparsity. The proposed method hinges on Successive Convex Approximation (SCA) techniques coupled with i) a tracking mechanism instrumental to locally estimate the gradients of agents' cost functions; and ii) a novel broadcast protocol to disseminate information and distribute the computation among the agents. Asymptotic convergence to stationary solutions is established. A key feature of the proposed algorithm is that it neither requires the double-stochasticity of the consensus matrices (but only column stochasticity) nor the knowledge of the graph sequence to implement. To the best of our knowledge, the proposed framework is the first broadcast-based distributed algorithm for convex and nonconvex constrained optimization over arbitrary, time-varying digraphs. Numerical results show that our algorithm outperforms current schemes on both convex and nonconvex problems.Comment: Copyright 2001 SS&C. Published in the Proceedings of the 50th annual Asilomar conference on signals, systems, and computers, Nov. 6-9, 2016, CA, US

arXiv.org e-Print Archive

Crossref

Distributed Big-Data Optimization via Block Communications

Author: Notarnicola Ivano
Notarstefano Giuseppe
Scutari Gesualdo
Sun Ying
Publication venue
Publication date: 01/01/2017
Field of study

We study distributed multi-agent large-scale optimization problems, wherein the cost function is composed of a smooth possibly nonconvex sum-utility plus a DC (Difference-of-Convex) regularizer. We consider the scenario where the dimension of the optimization variables is so large that optimizing and/or transmitting the entire set of variables could cause unaffordable computation and communication overhead. To address this issue, we propose the first distributed algorithm whereby agents optimize and communicate only a portion of their local variables. The scheme hinges on successive convex approximation (SCA) to handle the nonconvexity of the objective function, coupled with a novel block-signal tracking scheme, aiming at locally estimating the average of the agents' gradients. Asymptotic convergence to stationary solutions of the nonconvex problem is established. Numerical results on a sparse regression problem show the effectiveness of the proposed algorithm and the impact of the block size on its practical convergence speed and communication cost

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Stochastic Training of Neural Networks via Successive Convex Approximations

Author: Di Lorenzo Paolo
Scardapane Simone
Publication venue
Publication date: 15/06/2017
Field of study

This paper proposes a new family of algorithms for training neural networks (NNs). These are based on recent developments in the field of non-convex optimization, going under the general name of successive convex approximation (SCA) techniques. The basic idea is to iteratively replace the original (non-convex, highly dimensional) learning problem with a sequence of (strongly convex) approximations, which are both accurate and simple to optimize. Differently from similar ideas (e.g., quasi-Newton algorithms), the approximations can be constructed using only first-order information of the neural network function, in a stochastic fashion, while exploiting the overall structure of the learning problem for a faster convergence. We discuss several use cases, based on different choices for the loss function (e.g., squared loss and cross-entropy loss), and for the regularization of the NN's weights. We experiment on several medium-sized benchmark problems, and on a large-scale dataset involving simulated physical data. The results show how the algorithm outperforms state-of-the-art techniques, providing faster convergence to a better minimum. Additionally, we show how the algorithm can be easily parallelized over multiple computational units without hindering its performance. In particular, each computational unit can optimize a tailored surrogate function defined on a randomly assigned subset of the input variables, whose dimension can be selected depending entirely on the available computational power.Comment: Preprint submitted to IEEE Transactions on Neural Networks and Learning System

arXiv.org e-Print Archive

Archivio della ricerca- Università di Roma La Sapienza