14,188 research outputs found

    A unified analysis of stochastic momentum methods for deep learning

    Full text link
    © 2018 International Joint Conferences on Artificial Intelligence. All right reserved. Stochastic momentum methods have been widely adopted in training deep neural networks. However, their theoretical analysis of convergence of the training objective and the generalization error for prediction is still under-explored. This paper aims to bridge the gap between practice and theory by analyzing the stochastic gradient (SG) method, and the stochastic momentum methods including two famous variants, i.e., the stochastic heavy-ball (SHB) method and the stochastic variant of Nesterov's accelerated gradient (SNAG) method. We propose a framework that unifies the three variants. We then derive the convergence rates of the norm of gradient for the non-convex optimization problem, and analyze the generalization performance through the uniform stability approach. Particularly, the convergence analysis of the training objective exhibits that SHB and SNAG have no advantage over SG. However, the stability analysis shows that the momentum term can improve the stability of the learned model and hence improve the generalization performance. These theoretical insights verify the common wisdom and are also corroborated by our empirical analysis on deep learning

    Adaptive Federated Minimax Optimization with Lower complexities

    Full text link
    Federated learning is a popular distributed and privacy-preserving machine learning paradigm. Meanwhile, minimax optimization, as an effective hierarchical optimization, is widely applied in machine learning. Recently, some federated optimization methods have been proposed to solve the distributed minimax problems. However, these federated minimax methods still suffer from high gradient and communication complexities. Meanwhile, few algorithm focuses on using adaptive learning rate to accelerate algorithms. To fill this gap, in the paper, we study a class of nonconvex minimax optimization, and propose an efficient adaptive federated minimax optimization algorithm (i.e., AdaFGDA) to solve these distributed minimax problems. Specifically, our AdaFGDA builds on the momentum-based variance reduced and local-SGD techniques, and it can flexibly incorporate various adaptive learning rates by using the unified adaptive matrix. Theoretically, we provide a solid convergence analysis framework for our AdaFGDA algorithm under non-i.i.d. setting. Moreover, we prove our algorithms obtain lower gradient (i.e., stochastic first-order oracle, SFO) complexity of O~(ϵ−3)\tilde{O}(\epsilon^{-3}) with lower communication complexity of O~(ϵ−2)\tilde{O}(\epsilon^{-2}) in finding ϵ\epsilon-stationary point of the nonconvex minimax problems. Experimentally, we conduct some experiments on the deep AUC maximization and robust neural network training tasks to verify efficiency of our algorithms.Comment: Submitted to AISTATS-202
    • …
    corecore