We design a family of serial and parallel proximal point (gradient) ADMMs for
the fully connected residual network (FCResNet) training problem by
introducing auxiliary variables. Convergence of the proximal point version is
proven within a Kurdyka-Łojasiewicz (KL) property analysis framework: by
constructing a suitable auxiliary function, we establish a locally R-linear or
sublinear convergence rate depending on the range of the KL exponent.
Moreover, we theoretically analyze the advantages of the parallel
implementation in terms of lower time complexity and lower per-node memory
consumption. To the best of our knowledge, this is the first work to
theoretically analyze the convergence, convergence rate, time complexity, and
per-node runtime memory requirement of ADMM applied to the FCResNet training
problem. Experiments demonstrate the speed, performance, robustness, and
potential of our methods in deep network training tasks. Finally, we show the
advantage and potential of our parallel training on large-scale problems.