Federated learning enables cooperative training among massively distributed
clients by sharing their learned local model parameters. However, as model
sizes grow, federated learning requires substantial communication bandwidth,
which limits its deployment in wireless networks. To
address this bottleneck, we introduce a residual-based federated learning
framework (ResFed), where residuals rather than model parameters are
transmitted over the network for training. In particular, we integrate two
pairs of shared predictors for model prediction in both server-to-client and
client-to-server communication. Since sender and receiver apply a common
prediction rule, both locally and globally updated models are always fully
recoverable by both the clients and the server. We highlight that the
residuals only capture the quasi-update of a model within a single
communication round and hence carry denser information with lower entropy than
model weights and gradients. Based on this property, we further apply lossy
compression to the residuals via sparsification and quantization and encode them
for efficient communication. The experimental evaluation shows that ResFed
requires remarkably less communication and achieves better accuracy than
standard federated learning by leveraging residuals that are less sensitive to
lossy compression.
For instance, to train a 4.08 MB CNN model on CIFAR-10 with 10 clients under a
non-independent and identically distributed (Non-IID) setting, our approach
achieves a compression ratio of over 700X in each communication round with
minimal impact on accuracy. To reach an accuracy of 70%, it saves around 99% of
the total communication volume, from 587.61 Mb to 6.79 Mb in upstream and to
4.61 Mb in downstream communication, averaged over all clients.
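To make the mechanism above concrete, the following is a minimal sketch of residual-based communication under simplifying assumptions: the shared predictor is taken to simply replay the last synchronized model, and compression uses top-k sparsification with uniform 8-bit quantization. The sparsity ratio, bit width, and all function names are illustrative placeholders rather than the exact ResFed configuration.

```python
import numpy as np

# Illustrative sketch (not the paper's exact pipeline): last-state predictor,
# top-k sparsification, uniform 8-bit quantization of the surviving entries.

def predict_model(history):
    """Shared prediction rule: sender and receiver hold the same history of
    synchronized models, so both produce the identical prediction."""
    return history[-1]  # assumption: predictor replays the last synchronized state


def compress_residual(residual, sparsity=0.01, bits=8):
    """Lossy compression of a residual: keep the top-k entries by magnitude,
    then uniformly quantize the kept values."""
    k = max(1, int(sparsity * residual.size))
    idx = np.argpartition(np.abs(residual), -k)[-k:]        # top-k indices
    values = residual[idx]
    scale = float(np.abs(values).max()) / (2 ** (bits - 1) - 1) or 1.0
    q_values = np.round(values / scale).astype(np.int8)     # quantized payload
    return idx, q_values, scale


def decompress_residual(idx, q_values, scale, size):
    """Rebuild a dense residual from its sparse, quantized encoding."""
    residual = np.zeros(size, dtype=np.float32)
    residual[idx] = q_values.astype(np.float32) * scale
    return residual


# --- client side (the server-to-client direction works analogously) ---
history = [np.zeros(1000, dtype=np.float32)]                # shared model history
local_model = np.random.randn(1000).astype(np.float32)      # locally trained weights

prediction = predict_model(history)
residual = local_model - prediction                         # only this is transmitted
payload = compress_residual(residual)

# --- server side: the same prediction rule makes the model recoverable ---
recovered = predict_model(history) + decompress_residual(*payload, size=1000)
history.append(recovered)                                   # keep both histories in sync

print("max reconstruction error:", np.abs(recovered - local_model).max())
```

Because client and server apply the same prediction rule and append the same reconstructed model to their histories, the transmitted residual is all that is needed to keep both sides synchronized; only the compression of the residual introduces loss.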