In Federated Learning (FL), a number of clients or devices collaborate to
train a model without sharing their data. Models are optimized locally at each client and then communicated to a central hub for aggregation. While FL is
an appealing decentralized training paradigm, heterogeneity among data from
different clients can cause the local optimization to drift away from the
global objective. In order to estimate and therefore remove this drift,
variance reduction techniques have been incorporated into FL optimization
recently. However, these approaches inaccurately estimate the clients' drift
and ultimately fail to remove it properly. In this work, we propose an adaptive
algorithm that accurately estimates drift across clients. Compared to previous works, our approach requires less storage and communication bandwidth, as well as lower compute costs. Additionally, our method improves stability by constraining the norm of the client drift estimates, making it more practical for large-scale FL. Experimental results
demonstrate that the proposed algorithm converges significantly faster and
achieves higher accuracy than the baselines across various FL benchmarks.
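As a rough illustration of the mechanism the abstract describes, the sketch below shows a SCAFFOLD-style drift-corrected local round with a norm bound on the drift estimate. It is not the paper's algorithm: the function name, the clipping rule, and the control-variate refresh are illustrative assumptions.

```python
import numpy as np

def drift_corrected_local_round(w_global, grad_fn, c_local, c_global,
                                lr=0.1, local_steps=5, max_drift_norm=1.0):
    """One drift-corrected local round for a single client (illustrative sketch)."""
    # The drift estimate is the gap between the client's control variate and
    # the global one, i.e. how far this client's gradients deviate on average.
    drift = c_local - c_global

    # Bound the norm of the drift estimate before using it, echoing the
    # stability constraint mentioned in the abstract (the exact rule here is
    # an assumption, not the paper's).
    norm = np.linalg.norm(drift)
    if norm > max_drift_norm:
        drift = drift * (max_drift_norm / norm)

    # Local SGD with the drift subtracted from each gradient, so local steps
    # track the global objective more closely under heterogeneous data.
    w = w_global.copy()
    for _ in range(local_steps):
        w = w - lr * (grad_fn(w) - drift)

    # SCAFFOLD-style refresh of the client's control variate from the gap
    # between the received global model and the locally trained model.
    c_local_new = c_local - c_global + (w_global - w) / (lr * local_steps)
    return w, c_local_new


# Toy usage: a quadratic local loss with a client-specific minimum at b,
# standing in for heterogeneous client data.
if __name__ == "__main__":
    b = np.array([1.0, -2.0])
    grad_fn = lambda w: w - b          # gradient of 0.5 * ||w - b||^2
    w0 = np.zeros(2)
    c0 = np.zeros(2)
    w_new, c_new = drift_corrected_local_round(w0, grad_fn, c0, c0)
    print("updated model:", w_new, "updated drift estimate:", c_new)
```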