3 research outputs found

    A Robust Gradient Tracking Method for Distributed Optimization over Directed Networks

    In this paper, we consider the problem of distributed consensus optimization over multi-agent networks with a directed network topology. Assuming each agent has a local cost function that is smooth and strongly convex, the global objective is to minimize the average of all the local cost functions. To solve the problem, we introduce a robust gradient tracking method (R-Push-Pull) adapted from the recently proposed Push-Pull/AB algorithm. R-Push-Pull inherits the advantages of Push-Pull and enjoys linear convergence to the optimal solution with exact communication. Under noisy information exchange, R-Push-Pull is more robust than the existing gradient-tracking-based algorithms; the solutions obtained by each agent reach a neighborhood of the optimum in expectation exponentially fast under a constant stepsize policy. We provide a numerical example that demonstrates the effectiveness of R-Push-Pull.
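
    As a point of reference, here is a minimal sketch of the base Push-Pull gradient tracking update that R-Push-Pull adapts; it is not the paper's robust variant. The directed ring topology, the quadratic local costs, and the stepsize are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of the Push-Pull gradient tracking update that
# R-Push-Pull builds on (not the paper's robust variant).
# Assumptions: local costs f_i(x) = 0.5*(a_i*x - b_i)^2 on a directed
# ring, with R row-stochastic ("pull") and C column-stochastic ("push").

n, steps, alpha = 5, 300, 0.05
rng = np.random.default_rng(0)
a, b = rng.uniform(1.0, 2.0, n), rng.uniform(-1.0, 1.0, n)

def grad(x):                              # stacked local gradients
    return a * (a * x - b)

R = 0.5 * (np.eye(n) + np.roll(np.eye(n), 1, axis=1))  # row-stochastic
C = 0.5 * (np.eye(n) + np.roll(np.eye(n), 1, axis=0))  # column-stochastic

x = np.zeros(n)      # each agent's decision variable
y = grad(x)          # gradient trackers, initialized at local gradients

for _ in range(steps):
    x_next = R @ (x - alpha * y)          # pull step: mix decisions
    y = C @ y + grad(x_next) - grad(x)    # push step: track average gradient
    x = x_next

x_star = np.sum(a * b) / np.sum(a ** 2)   # minimizer of the average cost
print(np.max(np.abs(x - x_star)))         # all agents end up near x_star
```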

    On the Benefits of Multiple Gossip Steps in Communication-Constrained Decentralized Optimization

    In decentralized optimization, it is common algorithmic practice to have nodes interleave (local) gradient descent iterations with gossip (i.e., averaging over the network) steps. Motivated by the training of large-scale machine learning models, it is also increasingly common to require that messages be lossy compressed versions of the local parameters. In this paper, we show that, in such compressed decentralized optimization settings, there are benefits to having multiple gossip steps between subsequent gradient iterations, even when the cost of doing so is appropriately accounted for, e.g., by means of reducing the precision of compressed information. In particular, we show that having $O(\log\frac{1}{\epsilon})$ gradient iterations with constant step size, and $O(\log\frac{1}{\epsilon})$ gossip steps between every pair of these iterations, enables convergence to within $\epsilon$ of the optimal value for smooth non-convex objectives satisfying the Polyak-Łojasiewicz condition. This result also holds for smooth strongly convex objectives. To our knowledge, this is the first work that derives convergence results for nonconvex optimization under arbitrary communication compression.
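
    A minimal sketch of the interleaving pattern this abstract describes follows: one local gradient step, then several gossip rounds on compressed parameters. The ring topology, the lazy Metropolis weights, and the uniform quantizer standing in for the compression operator are all assumptions, not the paper's construction.

```python
import numpy as np

# Minimal sketch of interleaving one local gradient step with several
# gossip rounds on compressed parameters. The ring topology, the lazy
# Metropolis weights, and the uniform quantizer standing in for the
# compression operator are assumptions, not the paper's construction.

n, dim, T, tau, step = 8, 4, 100, 5, 0.1
rng = np.random.default_rng(1)
A = rng.normal(size=(n, dim, dim)) / dim  # data for local quadratic costs
t = rng.normal(size=(n, dim))

def grad(i, x):                   # gradient of f_i(x) = 0.5*||A_i x - t_i||^2
    return A[i].T @ (A[i] @ x - t[i])

def quantize(v, levels=256):      # crude uniform quantizer (lossy compression)
    scale = np.max(np.abs(v)) + 1e-12
    return np.round(v / scale * levels) / levels * scale

W = 0.5 * np.eye(n)               # doubly stochastic gossip matrix on a ring
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.25

x = np.zeros((n, dim))            # one parameter vector per node
for _ in range(T):
    x -= step * np.stack([grad(i, x[i]) for i in range(n)])  # gradient step
    for _ in range(tau):          # tau gossip rounds between gradient steps
        x = W @ np.stack([quantize(x[i]) for i in range(n)])

print(np.max(np.std(x, axis=0)))  # disagreement stays small across nodes
```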

    A Primal-Dual SGD Algorithm for Distributed Nonconvex Optimization

    The distributed nonconvex optimization problem of minimizing a global cost function formed by a sum of $n$ local cost functions by using local information exchange is considered. This problem is an important component of many machine learning techniques with data parallelism, such as deep learning and federated learning. We propose a distributed primal-dual stochastic gradient descent (SGD) algorithm, suitable for arbitrarily connected communication networks and any smooth (possibly nonconvex) cost functions. We show that the proposed algorithm achieves the linear speedup convergence rate $\mathcal{O}(1/\sqrt{nT})$ for general nonconvex cost functions, and the linear speedup convergence rate $\mathcal{O}(1/(nT))$ when the global cost function satisfies the Polyak-Łojasiewicz (P-L) condition, where $T$ is the total number of iterations. We also show that the output of the proposed algorithm with fixed parameters linearly converges to a neighborhood of a global optimum. We demonstrate through numerical experiments the efficiency of our algorithm in comparison with the baseline centralized SGD and recently proposed distributed SGD algorithms.
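
    The sketch below shows one generic Laplacian-based primal-dual SGD recursion of the kind this abstract describes; the exact update form, stepsize, and penalty parameters here are illustrative assumptions rather than the paper's algorithm.

```python
import numpy as np

# Minimal sketch of a generic Laplacian-based distributed primal-dual SGD
# recursion (illustrative; not necessarily the paper's exact update).
# Assumptions: undirected ring with Laplacian Lap, local costs
# f_i(x) = 0.5*||x - c_i||^2, and additive gradient noise.

n, dim, T = 6, 3, 2000
eta, alpha, beta = 0.05, 1.0, 0.5     # stepsize and penalty parameters
rng = np.random.default_rng(2)
c = rng.normal(size=(n, dim))         # the global optimum is the mean of c_i

adj = np.roll(np.eye(n), 1, axis=0) + np.roll(np.eye(n), -1, axis=0)
Lap = np.diag(adj.sum(axis=1)) - adj  # graph Laplacian of the ring

x = np.zeros((n, dim))                # primal variables, one row per node
v = np.zeros((n, dim))                # dual variables enforcing consensus

for _ in range(T):
    g = (x - c) + 0.1 * rng.normal(size=(n, dim))        # stochastic gradients
    x_next = x - eta * (alpha * Lap @ x + beta * v + g)  # primal descent
    v = v + eta * beta * Lap @ x                         # dual ascent
    x = x_next

print(np.linalg.norm(x.mean(axis=0) - c.mean(axis=0)))   # near the global optimum
```

    With a constant stepsize and persistent gradient noise, the iterates settle into a neighborhood of the optimum rather than converging exactly, which matches the fixed-parameter behavior the abstract reports.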