Federated Learning (FL) is a machine learning paradigm that enables clients
to jointly train a global model by aggregating the locally trained models
without sharing any local training data. In practice, there is often
substantial heterogeneity (e.g., class imbalance) across the local data
distributions of these clients. Under such non-iid data
distributions across clients, FL suffers from the 'client-drift' problem where
every client converges to its own local optimum. This results in slower
convergence and poor performance of the aggregated model. To address this
limitation, we propose a novel regularization technique based on adaptive
self-distillation (ASD) for training models on the client side. Our
regularization scheme adaptively adjusts to the client's training data based
on: (1) the closeness of the local model's predictions to those of the global
model and (2) the client's label distribution. The proposed regularization can
be easily integrated atop existing state-of-the-art FL algorithms, yielding a
further boost in the performance of these off-the-shelf methods. We demonstrate
the efficacy of our proposed FL approach through extensive experiments on
multiple real-world benchmarks (including datasets with common corruptions and
perturbations) and show substantial gains in performance over the
state-of-the-art methods.
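
To make the adaptivity concrete, the following is a minimal NumPy sketch of one plausible form of such a regularizer; the specific weighting (a KL term between global and local predictions, scaled per sample by the inverse frequency of that sample's class on the client) and all function and parameter names (`asd_regularizer`, `tau`, etc.) are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(z, axis=-1):
    # numerically stable softmax
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def asd_regularizer(local_logits, global_logits, labels, num_classes, tau=2.0):
    """Hypothetical adaptive self-distillation penalty (sketch).

    Combines (1) a closeness term -- the per-sample KL divergence between
    the global model's and the local model's softened predictions -- with
    (2) a label-distribution term that up-weights samples from classes
    that are rare on this client.
    """
    p_g = softmax(global_logits / tau)   # global (teacher) predictions
    p_l = softmax(local_logits / tau)    # local (student) predictions
    # per-sample KL(global || local); zero when the two models agree
    kl = np.sum(p_g * (np.log(p_g + 1e-12) - np.log(p_l + 1e-12)), axis=1)
    # client label distribution: rarer classes get larger weights
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    freq = counts / counts.sum()
    w = 1.0 / (freq[labels] + 1e-12)
    w = w * len(labels) / w.sum()        # normalize weights to mean 1
    return float(np.mean(w * kl))
```

In a client update, this scalar would be added to the usual local loss (e.g., cross-entropy) with a tunable coefficient; it vanishes when local and global predictions coincide, and grows as the local model drifts, most strongly on under-represented classes.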