50 research outputs found
SASG: Sparsification with Adaptive Stochastic Gradients for Communication-efficient Distributed Learning
Stochastic optimization algorithms implemented on distributed computing
architectures are increasingly used to tackle large-scale machine learning
applications. A key bottleneck in such distributed systems is the communication
overhead for exchanging information such as stochastic gradients between
different workers. Sparse communication with memory and the adaptive
aggregation methodology are two successful frameworks among the various
techniques proposed to address this issue. In this paper, we creatively exploit
the advantages of Sparse communication and Adaptive aggregated Stochastic
Gradients to design a communication-efficient distributed algorithm named SASG.
Specifically, we first determine the workers that need to communicate based on
the adaptive aggregation rule and then sparsify the transmitted information.
Therefore, our algorithm reduces both the overhead of communication rounds and
the number of communication bits in the distributed system. We define an
auxiliary sequence and give convergence results of the algorithm with the help
of Lyapunov function analysis. Experiments on training deep neural networks
show that our algorithm can significantly reduce the number of communication
rounds and bits compared to the previous methods, with little or no impact on
training and testing accuracy.
Comment: 12 pages, 5 figures
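To make the two ingredients concrete, the following is a minimal worker-side sketch in NumPy, assuming a squared-distance skip rule for the adaptive aggregation step and top-k sparsification with a residual memory buffer; the class and parameter names (SASGWorker, threshold, k) are illustrative and not the paper's notation or exact rule.

```python
import numpy as np

def topk_sparsify(vec, k):
    """Keep the k largest-magnitude entries of vec, zero out the rest."""
    idx = np.argpartition(np.abs(vec), -k)[-k:]
    out = np.zeros_like(vec)
    out[idx] = vec[idx]
    return out

class SASGWorker:
    """Worker-side sketch: the adaptive aggregation rule decides *whether*
    to communicate this round; top-k sparsification with memory decides
    *what* is transmitted. Hypothetical names and skip rule."""

    def __init__(self, dim, k, threshold):
        self.memory = np.zeros(dim)     # residual of entries not yet transmitted
        self.last_sent = np.zeros(dim)  # gradient the server last received from us
        self.k = k
        self.threshold = threshold      # tolerance in the (assumed) skip rule

    def local_step(self, grad):
        # Adaptive aggregation rule (assumed form): skip communication when the
        # new stochastic gradient is close to the one the server already holds.
        if np.linalg.norm(grad - self.last_sent) ** 2 <= self.threshold:
            return None                   # no communication; server reuses old gradient
        corrected = grad + self.memory    # sparse communication with memory
        sparse = topk_sparsify(corrected, self.k)
        self.memory = corrected - sparse  # remember what was dropped this round
        self.last_sent = grad
        return sparse                     # only k nonzeros go over the wire
```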
Training Faster with Compressed Gradient
Although distributed machine learning methods have the potential to speed up
the training of large deep neural networks, the communication cost has been a
notorious bottleneck constraining performance. To address this challenge,
gradient-compression-based communication-efficient distributed learning methods
were designed to reduce the communication cost, and more recently local error
feedback was incorporated to compensate for the performance loss. However, in
this paper, we show the "gradient mismatch" problem of local error feedback in
centralized distributed training and that this issue can lead to degraded
performance compared with full-precision training. To solve this critical
problem, we propose two novel techniques: 1)
step ahead; 2) error averaging. Both our theoretical and empirical results show
that our new methods can alleviate the "gradient mismatch" problem. Experiments
show that we can even train \textbf{faster with compressed gradient} than
full-precision training \textbf{regarding training epochs}.
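As a point of reference for the "gradient mismatch" issue, below is a sketch of the baseline the paper critiques: synchronous compressed SGD with purely local error feedback, written in NumPy with a top-k compressor. The function names and the compressor choice are assumptions, and the paper's step-ahead and error-averaging corrections are not reproduced here.

```python
import numpy as np

def topk_compress(vec, k):
    """Top-k compressor used purely for illustration; the paper's choice may differ."""
    idx = np.argpartition(np.abs(vec), -k)[-k:]
    out = np.zeros_like(vec)
    out[idx] = vec[idx]
    return out

def ef_compressed_sgd_step(params, worker_grads, errors, lr, k):
    """One synchronous step of compressed SGD with *local* error feedback.
    `worker_grads` and `errors` hold one vector per worker."""
    updates = []
    for i, g in enumerate(worker_grads):
        corrected = lr * g + errors[i]    # compensate with locally accumulated error
        c = topk_compress(corrected, k)
        errors[i] = corrected - c         # keep the dropped part for the next step
        updates.append(c)
    params = params - np.mean(updates, axis=0)  # server averages compressed updates
    return params, errors
```

Because the per-worker error buffers never reach the server model, the model the server updates drifts from the updates each worker is implicitly correcting for; that is one way to read the mismatch that the step-ahead and error-averaging techniques are designed to remove.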
FedDIP: Federated Learning with Extreme Dynamic Pruning and Incremental Regularization
Federated Learning (FL) has been successfully adopted for distributed
training and inference of large-scale Deep Neural Networks (DNNs). However,
DNNs are characterized by an extremely large number of parameters, thus
yielding significant challenges in exchanging these parameters among
distributed nodes and managing the memory. Although recent DNN compression
methods (e.g., sparsification, pruning) tackle such challenges, they do not
holistically consider an adaptively controlled reduction of parameter exchange
while maintaining high accuracy levels. We therefore contribute a novel
FL framework (coined FedDIP), which combines (i) dynamic model pruning with
error feedback to eliminate redundant information exchange, which contributes
to significant performance improvement, with (ii) incremental regularization
that can achieve \textit{extreme} sparsity of models. We provide convergence
analysis of FedDIP and report on a comprehensive performance and comparative
assessment against state-of-the-art methods using benchmark data sets and DNN
models. Our results showcase that FedDIP not only controls the model sparsity
but efficiently achieves similar or better performance compared to other model
pruning methods adopting incremental regularization during distributed model
training. The code is available at: https://github.com/EricLoong/feddip.
Comment: Accepted for publication at ICDM 2023 (full version on arXiv).
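For intuition only, here is a NumPy sketch of one server round combining magnitude-based dynamic pruning with an incrementally growing regularization coefficient; the function names, the L2-style penalty, and the mask update are assumptions, and FedDIP's error feedback on pruned updates is omitted for brevity.

```python
import numpy as np

def magnitude_mask(weights, sparsity):
    """Boolean mask keeping the largest-magnitude (1 - sparsity) fraction of weights."""
    k = max(1, int(round((1.0 - sparsity) * weights.size)))
    idx = np.argpartition(np.abs(weights), -k)[-k:]
    mask = np.zeros(weights.shape, dtype=bool)
    mask[idx] = True
    return mask

def feddip_style_round(global_w, client_grads, lr, sparsity, reg_coef):
    """One illustrative round: average client gradients, apply an incrementally
    scheduled regularization term (reg_coef grows across rounds), and re-derive
    the dynamic pruning mask so only unmasked weights survive."""
    avg_grad = np.mean(client_grads, axis=0)
    new_w = global_w - lr * (avg_grad + reg_coef * global_w)  # L2-style penalty (assumed)
    mask = magnitude_mask(new_w, sparsity)
    return new_w * mask, mask
```

In this reading, scheduling reg_coef upward across rounds is what pushes the model toward extreme sparsity before the mask prunes it, while exchanging only the masked weights is what reduces parameter traffic between nodes.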