Intermittent Pulling with Local Compensation for Communication-Efficient Federated Learning
Federated Learning is a powerful machine learning paradigm for cooperatively
training a global model on highly distributed data. A major bottleneck in the
performance of the distributed Stochastic Gradient Descent (SGD) algorithm for
large-scale Federated Learning is the communication overhead of pushing local
gradients and pulling the global model. In this paper, to reduce the communication
complexity of Federated Learning, a novel approach named Pulling Reduction with
Local Compensation (PRLC) is proposed. Specifically, each training node
intermittently pulls the global model from the server during SGD iterations,
so it is sometimes unsynchronized with the server. In such cases, it uses its
local updates to compensate for the gap between the local model and the global
model. Our rigorous theoretical analysis of PRLC yields
two important findings. First, we prove that the convergence rate of PRLC
preserves the same order as classical synchronous SGD in both the
strongly-convex and non-convex cases, with good scalability owing to linear
speedup in the number of training nodes. Second, we show that PRLC
admits a lower pulling frequency than the existing pulling reduction method
without local compensation. We also conduct extensive experiments on various
machine learning models to validate our theoretical results. Experimental
results show that our approach achieves a significant pulling reduction over
state-of-the-art methods; e.g., PRLC requires only half as many pulling
operations as LAG.
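The intermittent-pull-with-compensation loop described above can be sketched in a minimal single-machine simulation. This is a hypothetical illustration, not the paper's implementation: the scalar quadratic losses, pull probability `p`, and learning rate are all made-up parameters chosen to keep the example self-contained.

```python
import random

def grad(x, target):
    # gradient of the worker's loss 0.5 * (x - target)^2
    return x - target

def prlc(num_workers=4, steps=200, lr=0.1, p=0.5, seed=0):
    """Toy PRLC-style loop: every worker pushes its gradient each step,
    but pulls the fresh global model only with probability p; between
    pulls it compensates by applying its own local update."""
    rng = random.Random(seed)
    # each worker minimizes a different quadratic (simulates local data)
    targets = [rng.uniform(-1, 1) for _ in range(num_workers)]
    global_x = 0.0
    local_x = [0.0] * num_workers  # each worker's (possibly stale) copy
    pulls = 0
    for _ in range(steps):
        grads = [grad(local_x[i], targets[i]) for i in range(num_workers)]
        # server applies the averaged pushed gradients every step
        global_x -= lr * sum(grads) / num_workers
        for i in range(num_workers):
            if rng.random() < p:
                local_x[i] = global_x        # pull: resynchronize
                pulls += 1
            else:
                local_x[i] -= lr * grads[i]  # local compensation step
    return global_x, pulls

x, pulls = prlc()
print(round(x, 3), pulls)  # x lands near the average of the targets
```

With `p=0.5` each worker skips roughly half of the pull operations, yet the local-compensation step keeps the stale copies close enough to the global model for the averaged dynamics to converge, which is the intuition behind the paper's convergence result.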
On the Convergence of Quantized Parallel Restarted SGD for Central Server Free Distributed Training
Communication is a crucial phase of distributed training. Because the
parameter server (PS) frequently experiences network congestion, recent
studies have found that training paradigms without a centralized server
outperform the traditional server-based paradigms in terms of communication
efficiency. However, as model sizes continue to grow, these server-free
paradigms also face substantial communication overhead that seriously
degrades the performance of distributed training.
In this paper, we focus on the communication efficiency of two serverless
paradigms, Ring All-Reduce (RAR) and gossip, by proposing Quantized Parallel
Restarted Stochastic Gradient Descent (QPRSGD), an algorithm that performs
multiple local SGD updates before each global synchronization and combines
them with quantization to significantly reduce the communication overhead. We
establish a bound on the accumulated errors that depends on the
synchronization mode and the network topology, which is essential to ensure
convergence. Under both aggregation paradigms, the algorithm achieves the linear
speedup property with respect to the number of local updates as well as the
number of workers. Remarkably, under the gossip paradigm, the proposed
algorithm achieves a convergence rate that outperforms all existing
compression methods; the rate is stated in terms of the number of global
synchronizations, the number of local updates, and the number of nodes. An empirical
study on various machine learning models demonstrates that, in comparison
with Parallel SGD in a low-bandwidth network, the communication overhead is
reduced by 90% and the convergence speed is boosted by up to 18.6 times.
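The core QPRSGD pattern, several local SGD steps followed by an exchange of quantized model differences and a restart from the averaged model, can be sketched as below. This is a hypothetical illustration under simplifying assumptions: scalar quadratic losses stand in for real data, the averaging step abstracts away the RAR/gossip topology, and the quantizer grid size `s` is an illustrative parameter, not a value from the paper.

```python
import random

def quantize(v, s, rng):
    # unbiased stochastic quantization of a scalar onto a grid of spacing 1/s:
    # round down or up at random so the expectation equals v
    scaled = v * s
    low = int(scaled // 1)
    frac = scaled - low
    q = low + (1 if rng.random() < frac else 0)
    return q / s

def qprsgd(num_workers=4, rounds=50, local_steps=5, lr=0.05, s=64, seed=1):
    """Toy QPRSGD-style loop: each round, every worker runs several local
    SGD steps from the shared model, then only a quantized model
    difference is communicated; workers restart from the average."""
    rng = random.Random(seed)
    # each worker minimizes a different quadratic (simulates local data)
    targets = [rng.uniform(-1, 1) for _ in range(num_workers)]
    x = 0.0  # shared model after each global synchronization
    for _ in range(rounds):
        diffs = []
        for i in range(num_workers):
            xi = x
            for _ in range(local_steps):
                xi -= lr * (xi - targets[i])  # local SGD on worker i's loss
            # communicate only the quantized model difference
            diffs.append(quantize(xi - x, s, rng))
        x += sum(diffs) / num_workers  # averaged exchange, then restart
    return x

print(round(qprsgd(), 3))  # lands near the average of the targets
```

Because the quantizer is unbiased, the averaged quantized differences track the true averaged differences in expectation; the accumulated quantization error, which the paper bounds, appears here only as small steady-state noise around the optimum.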