The ProxSkip algorithm for decentralized and federated learning is gaining
increasing attention due to its proven benefits in accelerating communication
while maintaining robustness against data heterogeneity. However, existing
analyses of ProxSkip are limited to the strongly convex setting and do not
achieve linear speedup, where convergence performance improves linearly with
respect to the number of nodes. Questions thus remain open about how
ProxSkip behaves in the non-convex setting and whether linear speedup is
achievable.
In this paper, we revisit decentralized ProxSkip and address both questions.
We demonstrate that the leading communication complexity of ProxSkip is
O(pσ²/(nϵ²)) for the non-convex and convex settings, and O(pσ²/(nϵ)) for
the strongly convex setting, where n represents the number of nodes, p
denotes the probability of communication, σ² signifies the level of
stochastic noise, and ϵ denotes the desired accuracy level. This
result illustrates that ProxSkip achieves linear speedup and can asymptotically
reduce communication overhead proportional to the probability of communication.
Additionally, for the strongly convex setting, we further prove that ProxSkip
can achieve linear speedup with network-independent stepsizes.
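
To make the role of the communication probability p concrete, the following is a minimal sketch of ProxSkip-style updates with probabilistic communication. It is not the paper's algorithm or experiments: the quadratic local losses f_i(x) = 0.5·||A_i x − b_i||², the use of exact averaging as the communication (prox) step, and the stepsize, probability, and iteration values are illustrative assumptions; the decentralized variant analyzed in the paper replaces exact averaging with a gossip (mixing-matrix) step and uses stochastic gradients with noise level σ².

```python
# Hedged sketch of ProxSkip with communication probability p (assumptions noted above).
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 5                      # number of nodes, problem dimension (illustrative)
A = rng.normal(size=(n, d, d))
b = rng.normal(size=(n, d))

def grad(i, x):
    """Gradient of the assumed local loss f_i(x) = 0.5*||A_i x - b_i||^2."""
    return A[i].T @ (A[i] @ x - b[i])

gamma, p, T = 0.01, 0.2, 2000    # stepsize, communication probability, iterations (illustrative)
x = np.zeros((n, d))             # local models, one row per node
h = np.zeros((n, d))             # control variates, one row per node

for t in range(T):
    # Local gradient step corrected by the control variate.
    x_hat = np.array([x[i] - gamma * (grad(i, x[i]) - h[i]) for i in range(n)])
    if rng.random() < p:
        # Communication round: here the prox of the consensus constraint is exact averaging;
        # a decentralized variant would instead apply a gossip/mixing step.
        x = np.tile(x_hat.mean(axis=0), (n, 1))
    else:
        # Skip communication: keep the purely local iterates.
        x = x_hat
    # Control-variate update; h is unchanged whenever communication is skipped.
    h = h + (p / gamma) * (x - x_hat)

print("consensus error:", np.linalg.norm(x - x.mean(axis=0)))
```

In this sketch, communication occurs only in roughly a p fraction of the T iterations, which is the sense in which the communication complexity above scales proportionally to p.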