ADMM Training Algorithms for Residual Networks: Convergence, Complexity and Parallel Training

Li, Yifei; Xing, Wenxun; Xu, Jintao

Authors: Yifei Li, Wenxun Xing, Jintao Xu
Publication date: 23 October 2023

Abstract

We design a series of serial and parallel proximal point (gradient) ADMMs for the fully connected residual networks (FCResNets) training problem by introducing auxiliary variables. Convergence of the proximal point version is proven within a Kurdyka-Lojasiewicz (KL) property analysis framework, for which we construct a necessary auxiliary function, and we establish a locally R-linear or sublinear convergence rate depending on the range of the KL exponent. Moreover, the advantages of the parallel implementation, namely lower time complexity and lower per-node memory consumption, are analyzed theoretically. To the best of our knowledge, this is the first work to analyze theoretically the convergence, convergence rate, time complexity and per-node runtime memory requirement of ADMM applied to the FCResNets training problem. Experiments are reported that show the high speed, better performance, robustness and potential of the algorithms in deep network training tasks. Finally, we present the advantages and potential of our parallel training in large-scale problems.
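
For readers unfamiliar with how a KL exponent determines a convergence rate, the following sketch states the standard KL inequality and the rate regimes usually derived from it for descent-type methods. The symbols L (a generic merit function), z^k (the iterates), z^* (their limit), C, rho and theta are illustrative assumptions and are not taken from the paper; the paper's own auxiliary function, constants and exact statements may differ.

```latex
% Requires amsmath. Generic statement of the KL inequality with exponent
% \theta \in [0,1); L, z^k, z^*, C and \rho are hypothetical placeholders,
% not the paper's notation.
\[
  \bigl(L(z) - L(z^*)\bigr)^{\theta}
  \;\le\; C \,\operatorname{dist}\bigl(0, \partial L(z)\bigr)
  \qquad \text{for all } z \text{ near } z^* \text{ with } L(z) > L(z^*).
\]
% Rate regimes commonly obtained from this inequality when the iterates
% satisfy sufficient-decrease and relative-error conditions:
\[
  \|z^k - z^*\| \;\le\;
  \begin{cases}
    0 \ \text{after finitely many iterations}, & \theta = 0,\\[2pt]
    C\,\rho^{k}, \quad \rho \in (0,1) \ \text{(R-linear)}, & \theta \in (0, \tfrac12],\\[2pt]
    C\,k^{-\frac{1-\theta}{2\theta-1}} \ \text{(sublinear)}, & \theta \in (\tfrac12, 1).
  \end{cases}
\]
```

In this generic picture a smaller exponent gives a faster local rate, which is consistent with the abstract's distinction between R-linear and sublinear regimes.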

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2310.15334

Last time updated on 16/01/2024