3,749 research outputs found

    Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization

    Full text link
    We suggest a general oracle-based framework that captures different parallel stochastic optimization settings described by a dependency graph, and derive generic lower bounds in terms of this graph. We then use the framework and derive lower bounds for several specific parallel optimization settings, including delayed updates and parallel processing with intermittent communication. We highlight gaps between lower and upper bounds on the oracle complexity, and cases where the "natural" algorithms are not known to be optimal

    Adaptive Federated Minimax Optimization with Lower complexities

    Full text link
    Federated learning is a popular distributed and privacy-preserving machine learning paradigm. Meanwhile, minimax optimization, as an effective hierarchical optimization, is widely applied in machine learning. Recently, some federated optimization methods have been proposed to solve the distributed minimax problems. However, these federated minimax methods still suffer from high gradient and communication complexities. Meanwhile, few algorithm focuses on using adaptive learning rate to accelerate algorithms. To fill this gap, in the paper, we study a class of nonconvex minimax optimization, and propose an efficient adaptive federated minimax optimization algorithm (i.e., AdaFGDA) to solve these distributed minimax problems. Specifically, our AdaFGDA builds on the momentum-based variance reduced and local-SGD techniques, and it can flexibly incorporate various adaptive learning rates by using the unified adaptive matrix. Theoretically, we provide a solid convergence analysis framework for our AdaFGDA algorithm under non-i.i.d. setting. Moreover, we prove our algorithms obtain lower gradient (i.e., stochastic first-order oracle, SFO) complexity of O~(ϵ−3)\tilde{O}(\epsilon^{-3}) with lower communication complexity of O~(ϵ−2)\tilde{O}(\epsilon^{-2}) in finding ϵ\epsilon-stationary point of the nonconvex minimax problems. Experimentally, we conduct some experiments on the deep AUC maximization and robust neural network training tasks to verify efficiency of our algorithms.Comment: Submitted to AISTATS-202

    Byzantine Stochastic Gradient Descent

    Full text link
    This paper studies the problem of distributed stochastic optimization in an adversarial setting where, out of the mm machines which allegedly compute stochastic gradients every iteration, an α\alpha-fraction are Byzantine, and can behave arbitrarily and adversarially. Our main result is a variant of stochastic gradient descent (SGD) which finds ε\varepsilon-approximate minimizers of convex functions in T=O~(1ε2m+α2ε2)T = \tilde{O}\big( \frac{1}{\varepsilon^2 m} + \frac{\alpha^2}{\varepsilon^2} \big) iterations. In contrast, traditional mini-batch SGD needs T=O(1ε2m)T = O\big( \frac{1}{\varepsilon^2 m} \big) iterations, but cannot tolerate Byzantine failures. Further, we provide a lower bound showing that, up to logarithmic factors, our algorithm is information-theoretically optimal both in terms of sampling complexity and time complexity
    • …
    corecore