
    Near-Optimal Decentralized Momentum Method for Nonconvex-PL Minimax Problems

    Minimax optimization plays an important role in many machine learning tasks such as generative adversarial networks (GANs) and adversarial training. Although a wide variety of optimization methods have recently been proposed for minimax problems, most of them ignore the distributed setting in which the data is spread across multiple workers. Meanwhile, existing decentralized minimax optimization methods rely on strict assumptions such as (strong) concavity or variational inequality conditions. In this paper, we therefore propose an efficient decentralized momentum-based gradient descent ascent (DM-GDA) method for distributed nonconvex-PL minimax optimization, in which the objective is nonconvex in the primal variable and nonconcave in the dual variable while satisfying the Polyak-Lojasiewicz (PL) condition. In particular, our DM-GDA method simultaneously uses momentum-based techniques to update the variables and to estimate the stochastic gradients. Moreover, we provide a solid convergence analysis for DM-GDA and prove that it attains a near-optimal gradient complexity of $O(\epsilon^{-3})$ for finding an $\epsilon$-stationary solution of nonconvex-PL stochastic minimax problems, which matches the lower bound for nonconvex stochastic optimization. To the best of our knowledge, this is the first study of a decentralized algorithm for nonconvex-PL stochastic minimax optimization over a network. Comment: 31 pages
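    Below is a minimal Python sketch of the kind of decentralized momentum-based gradient descent ascent loop the abstract describes: each worker mixes its primal and dual copies with its neighbors through a doubly stochastic matrix and maintains momentum-based (STORM-style) estimators of the stochastic gradients. The callables grad_x and grad_y, the mixing matrix W, and all step sizes are illustrative assumptions, not the authors' exact DM-GDA method.

    import numpy as np

    def dm_gda_sketch(grad_x, grad_y, W, x0, y0, T=1000,
                      eta=0.01, gamma=0.01, beta=0.1):
        # grad_x(i, x, y), grad_y(i, x, y): stochastic gradients at worker i
        # (hypothetical user-supplied callables).
        n = W.shape[0]
        x = np.tile(np.asarray(x0, float), (n, 1))   # primal copy per worker
        y = np.tile(np.asarray(y0, float), (n, 1))   # dual copy per worker
        v = np.stack([grad_x(i, x[i], y[i]) for i in range(n)])  # gradient estimators
        w = np.stack([grad_y(i, x[i], y[i]) for i in range(n)])
        for _ in range(T):
            # gossip-average with neighbors, then descend on x and ascend on y
            x_new = W @ x - eta * v
            y_new = W @ y + gamma * w
            # momentum-based update of the stochastic gradient estimators
            for i in range(n):
                v[i] = grad_x(i, x_new[i], y_new[i]) \
                       + (1.0 - beta) * (v[i] - grad_x(i, x[i], y[i]))
                w[i] = grad_y(i, x_new[i], y_new[i]) \
                       + (1.0 - beta) * (w[i] - grad_y(i, x[i], y[i]))
            v, w = W @ v, W @ w          # mix the estimators as well
            x, y = x_new, y_new
        return x.mean(axis=0), y.mean(axis=0)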

    Decentralized projected Riemannian gradient method for smooth optimization on compact submanifolds

    We consider the problem of decentralized nonconvex optimization over a compact submanifold, where each local agent's objective function, defined by its local dataset, is smooth. Leveraging the powerful tool of proximal smoothness, we establish local linear convergence of the projected gradient descent method with unit step size for solving the consensus problem over the compact manifold. This serves as the basis for analyzing decentralized algorithms on manifolds. We then propose two decentralized methods, namely the decentralized projected Riemannian gradient descent (DPRGD) and the decentralized projected Riemannian gradient tracking (DPRGT) methods. We establish their convergence rates of $\mathcal{O}(1/\sqrt{K})$ and $\mathcal{O}(1/K)$, respectively, for reaching a stationary point. To the best of our knowledge, DPRGT is the first decentralized algorithm to achieve exact convergence for decentralized optimization over a compact manifold. The key ingredients in the proof are Lipschitz-type inequalities for the projection operator onto the compact manifold and for smooth functions on the manifold, which could be of independent interest. Finally, we demonstrate the effectiveness of our proposed methods compared to state-of-the-art ones through numerical experiments on eigenvalue problems and low-rank matrix completion. Comment: 32 pages
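    As a concrete illustration of the projected-gradient idea, the following Python sketch runs a DPRGD-style iteration over the unit sphere, used here only as a simple compact submanifold: each agent averages its neighbors' iterates, takes a step along its Riemannian gradient (the Euclidean gradient projected onto the tangent space), and projects the result back onto the manifold. The local-gradient callable grad, the mixing matrix W, and the step size are simplifying assumptions; this is not the paper's exact DPRGD/DPRGT construction.

    import numpy as np

    def proj_sphere(x):
        # nearest-point projection onto the unit sphere
        return x / np.linalg.norm(x)

    def riem_grad_sphere(x, euclid_grad):
        # project the Euclidean gradient onto the tangent space at x
        return euclid_grad - (euclid_grad @ x) * x

    def dprgd_sketch(grad, W, x0, T=500, alpha=0.05):
        # grad(i, x): Euclidean gradient of agent i's smooth local objective
        n = W.shape[0]
        x = np.tile(proj_sphere(np.asarray(x0, float)), (n, 1))
        for _ in range(T):
            x_next = np.empty_like(x)
            for i in range(n):
                mixed = W[i] @ x                     # consensus with neighbors
                step = mixed - alpha * riem_grad_sphere(x[i], grad(i, x[i]))
                x_next[i] = proj_sphere(step)        # project back onto the sphere
            x = x_next
        return x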

    Robust and Communication-Efficient Collaborative Learning

    We consider a decentralized learning problem, where a set of computing nodes aims to solve a non-convex optimization problem collaboratively. It is well known that decentralized optimization schemes face two major system bottlenecks: stragglers' delay and communication overhead. In this paper, we tackle these bottlenecks by proposing a novel decentralized gradient-based optimization algorithm named QuanTimed-DSGD. Our algorithm rests on two main ideas: (i) we impose a deadline on the local gradient computations of each node at each iteration of the algorithm, and (ii) the nodes exchange quantized versions of their local models. The first idea makes the method robust to straggling nodes and the second reduces the communication overhead. The key technical contribution of our work is to prove that, despite the non-vanishing noise from quantization and stochastic gradients, the proposed method converges exactly to the global optimum for convex loss functions and finds a first-order stationary point in non-convex scenarios. Our numerical evaluations of QuanTimed-DSGD on training the benchmark datasets MNIST and CIFAR-10 demonstrate run-time speedups of up to 3x compared to state-of-the-art decentralized optimization methods.
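    The two ideas in the abstract, deadline-limited gradient computation and quantized model exchange, can be sketched in a few lines of Python. The unbiased stochastic quantizer, the random per-node deadline budget, the sample-gradient callable grad_sample, and the mixing matrix W below are illustrative assumptions, not the paper's exact QuanTimed-DSGD specification.

    import numpy as np

    rng = np.random.default_rng(0)

    def quantize(x, step=0.05):
        # unbiased stochastic rounding to a grid of resolution step
        low = np.floor(x / step) * step
        p = (x - low) / step                  # probability of rounding up
        return low + step * (rng.random(x.shape) < p)

    def quantimed_dsgd_sketch(grad_sample, W, x0, T=200, eta=0.05, deadline=20):
        # grad_sample(i, x): gradient of one random local sample at node i
        n = W.shape[0]
        x = np.tile(np.asarray(x0, float), (n, 1))
        for _ in range(T):
            q = quantize(x)                   # nodes broadcast quantized models
            x_next = np.empty_like(x)
            for i in range(n):
                # deadline: a slow node only finishes a random number of
                # sample gradients before time runs out
                budget = int(rng.integers(1, deadline + 1))
                g = np.mean([grad_sample(i, x[i]) for _ in range(budget)], axis=0)
                x_next[i] = W[i] @ q - eta * g    # consensus + local SGD step
            x = x_next
        return x.mean(axis=0)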