
    Efficient Distributed Online Prediction and Stochastic Optimization with Approximate Distributed Averaging

    We study distributed methods for online prediction and stochastic optimization. Our approach is iterative: in each round, nodes first perform local computations and then communicate in order to aggregate information and synchronize their decision variables. Synchronization is accomplished through the use of a distributed averaging protocol. When an exact distributed averaging protocol is used, it is known that the optimal regret bound of $\mathcal{O}(\sqrt{m})$ can be achieved using the distributed mini-batch algorithm of Dekel et al. (2012), where $m$ is the total number of samples processed across the network. We focus on methods using approximate distributed averaging protocols and show that the optimal regret bound can also be achieved in this setting. In particular, we propose a gossip-based optimization method which achieves the optimal regret bound. The amount of communication required depends on the network topology through the second largest eigenvalue of the transition matrix of a random walk on the network. In the setting of stochastic optimization, the proposed gossip-based approach achieves nearly-linear scaling: the optimization error is guaranteed to be no more than $\epsilon$ after $\mathcal{O}(\frac{1}{n \epsilon^2})$ rounds, each of which involves $\mathcal{O}(\log n)$ gossip iterations, when nodes communicate over a well-connected graph. This scaling law is also observed in numerical experiments on a cluster. Comment: 30 pages, 2 figures
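
To make the role of the second largest eigenvalue concrete, the following is a minimal sketch of approximate distributed averaging by gossip. It is not the paper's algorithm: the ring topology, the lazy random-walk weights, and the function names `ring_gossip_matrix`, `gossip_average`, and `second_largest_eigenvalue_modulus` are all assumptions chosen for illustration. A ring is also poorly connected, so it mixes more slowly than the well-connected graphs the abstract refers to.

```python
import numpy as np

def ring_gossip_matrix(n):
    """Symmetric, doubly stochastic transition matrix of a lazy random
    walk on a ring of n nodes (an illustrative topology, not the paper's)."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] += 0.5             # stay put with probability 1/2
        W[i, (i - 1) % n] += 0.25  # move to the left neighbour
        W[i, (i + 1) % n] += 0.25  # move to the right neighbour
    return W

def gossip_average(x, W, iters):
    """Run `iters` synchronous gossip iterations: each node replaces its
    value with a weighted average of its own and its neighbours' values."""
    for _ in range(iters):
        x = W @ x
    return x

def second_largest_eigenvalue_modulus(W):
    """Second largest eigenvalue magnitude of a symmetric matrix W.
    It controls the per-iteration contraction towards the exact average."""
    magnitudes = np.sort(np.abs(np.linalg.eigvalsh(W)))
    return magnitudes[-2]

if __name__ == "__main__":
    n, iters = 8, 50
    W = ring_gossip_matrix(n)
    x0 = np.arange(n, dtype=float)   # one scalar per node
    lam = second_largest_eigenvalue_modulus(W)
    xt = gossip_average(x0, W, iters)
    err = np.linalg.norm(xt - x0.mean())
    bound = lam ** iters * np.linalg.norm(x0 - x0.mean())
    print(f"lambda_2 = {lam:.4f}, error = {err:.3e}, bound = {bound:.3e}")
```

Because the matrix is symmetric and doubly stochastic, the network average is preserved at every iteration, and the distance to it shrinks by at least a factor of the second largest eigenvalue magnitude per iteration. The closer that eigenvalue is to 1, the more gossip iterations each round needs.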

    Communication Efficient Distributed Optimization using an Approximate Newton-type Method

    We present a novel Newton-type method for distributed optimization, which is particularly well suited for stochastic optimization and learning problems. For quadratic objectives, the method enjoys a linear rate of convergence which provably improves with the data size, requiring an essentially constant number of iterations under reasonable assumptions. We provide theoretical and empirical evidence of the advantages of our method compared to other approaches, such as one-shot parameter averaging and ADMM.
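
The abstract does not spell out the update rule, so the sketch below is only an illustration of the general idea on quadratic objectives, under assumptions of our own. Each machine holds a local least-squares objective, preconditions the averaged global gradient with its own local Hessian, and the resulting steps are averaged. This is compared against one-shot parameter averaging, the baseline named in the abstract. The synthetic data, the regularizer `mu`, and all function names are hypothetical.

```python
import numpy as np

def make_local_quadratics(m, n_local, d, noise=0.1, seed=0):
    """Synthetic least-squares data split over m machines (an assumption).
    Machine i holds phi_i(w) = 0.5 * w^T A_i w - b_i^T w."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=d)
    A, b = [], []
    for _ in range(m):
        X = rng.normal(size=(n_local, d))
        y = X @ w_true + noise * rng.normal(size=n_local)
        A.append(X.T @ X / n_local)   # local Hessian
        b.append(X.T @ y / n_local)   # local linear term
    return np.array(A), np.array(b)

def one_shot_average(A, b):
    """Baseline: each machine solves its own problem, then average once."""
    return np.mean([np.linalg.solve(Ai, bi) for Ai, bi in zip(A, b)], axis=0)

def approx_newton(A, b, iters, mu=0.0):
    """Illustrative approximate Newton iteration (assumed update rule):
    average the gradients, let each machine apply its local inverse
    Hessian to the global gradient, then average the resulting steps."""
    d = b.shape[1]
    w = np.zeros(d)
    for _ in range(iters):
        grad = A.mean(axis=0) @ w - b.mean(axis=0)       # averaging round 1
        steps = [np.linalg.solve(Ai + mu * np.eye(d), grad) for Ai in A]
        w = w - np.mean(steps, axis=0)                   # averaging round 2
    return w

if __name__ == "__main__":
    A, b = make_local_quadratics(m=4, n_local=500, d=5)
    w_star = np.linalg.solve(A.mean(axis=0), b.mean(axis=0))  # global minimizer
    err_newton = np.linalg.norm(approx_newton(A, b, iters=10) - w_star)
    err_avg = np.linalg.norm(one_shot_average(A, b) - w_star)
    print(f"approx. Newton error: {err_newton:.3e}, one-shot error: {err_avg:.3e}")
```

With more samples per machine, the local Hessians agree more closely with the global one, so each preconditioned step is closer to an exact Newton step. This is one way to read the claim that convergence improves with the data size.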