59 research outputs found

    A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method

    Full text link
    In this note, we present a new averaging technique for the projected stochastic subgradient method. By using a weighted average with a weight of t+1 for each iterate w_t at iteration t, we obtain the convergence rate of O(1/t) with both an easy proof and an easy implementation. The new scheme is compared empirically to existing techniques, with similar performance behavior.Comment: 8 pages, 6 figures. Changes with previous version: Added reference to concurrently submitted work arXiv:1212.1824v1; clarifications added; typos corrected; title changed to 'subgradient method' as 'subgradient descent' is misnome

    Optimal Algorithms for Non-Smooth Distributed Optimization in Networks

    Full text link
    In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm called multi-step primal-dual (MSPD) and its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in O(1/t)O(1/\sqrt{t}), the structure of the communication network only impacts a second-order term in O(1/t)O(1/t), where tt is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a d1/4d^{1/4} multiplicative factor of the optimal convergence rate, where dd is the underlying dimension.Comment: 17 page

    Stochastic Subgradient Algorithms for Strongly Convex Optimization over Distributed Networks

    Full text link
    We study diffusion and consensus based optimization of a sum of unknown convex objective functions over distributed networks. The only access to these functions is through stochastic gradient oracles, each of which is only available at a different node, and a limited number of gradient oracle calls is allowed at each node. In this framework, we introduce a convex optimization algorithm based on the stochastic gradient descent (SGD) updates. Particularly, we use a carefully designed time-dependent weighted averaging of the SGD iterates, which yields a convergence rate of O(NNT)O\left(\frac{N\sqrt{N}}{T}\right) after TT gradient updates for each node on a network of NN nodes. We then show that after TT gradient oracle calls, the average SGD iterate achieves a mean square deviation (MSD) of O(NT)O\left(\frac{\sqrt{N}}{T}\right). This rate of convergence is optimal as it matches the performance lower bound up to constant terms. Similar to the SGD algorithm, the computational complexity of the proposed algorithm also scales linearly with the dimensionality of the data. Furthermore, the communication load of the proposed method is the same as the communication load of the SGD algorithm. Thus, the proposed algorithm is highly efficient in terms of complexity and communication load. We illustrate the merits of the algorithm with respect to the state-of-art methods over benchmark real life data sets and widely studied network topologies
    corecore