Decentralized learning enables serverless training of deep neural networks
(DNNs) in a distributed manner on multiple nodes. This allows for the use of
large datasets, as well as the ability to train with a wide variety of data
sources. However, one of the key challenges with decentralized learning is
heterogeneity in the data distribution across the nodes. In this paper, we
propose In-Distribution Knowledge Distillation (IDKD) to address the challenge
of heterogeneous data distribution. The goal of IDKD is to homogenize the data
distribution across the nodes. While such data homogenization can be achieved
by exchanging data among the nodes at the cost of privacy, IDKD achieves the same
objective using a common public dataset across nodes without breaking the
privacy constraint. This public dataset is different from the training dataset
and is used to distill the knowledge from each node and communicate it to its
neighbors through the generated labels. With traditional knowledge
distillation, the generalization of the distilled model is reduced because all
the public dataset samples are used irrespective of their similarity to the
local dataset. Thus, we introduce an Out-of-Distribution (OoD) detector at each
node to label a subset of the public dataset that maps close to the local
training data distribution. Finally, only labels corresponding to these subsets
are exchanged among the nodes, and with appropriate label averaging, each node is
fine-tuned on these data subsets along with its local data. Our experiments on
multiple image classification datasets and graph topologies show that the
proposed IDKD scheme is more effective than traditional knowledge distillation
and achieves state-of-the-art generalization performance on heterogeneously
distributed data with minimal communication overhead.
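
To make the procedure concrete, the sketch below outlines one possible IDKD round at a single node in PyTorch. The maximum-softmax-probability score standing in for the OoD detector, the 0.9 threshold, the uniform label averaging, and the helper names (select_in_distribution, average_labels, finetune) are illustrative assumptions rather than the paper's exact implementation; only the labels and keep-masks for the common public dataset would be communicated between nodes.

```python
# Illustrative sketch of one IDKD round at a single node (not the paper's exact code).
import torch
import torch.nn.functional as F


@torch.no_grad()
def select_in_distribution(model, public_loader, threshold=0.9):
    """Score each public sample with the local model and keep those that look
    in-distribution (here: maximum softmax probability >= threshold).
    Returns a keep-mask and the soft labels for the whole public set."""
    model.eval()
    keep, soft_labels = [], []
    for x, _ in public_loader:
        probs = F.softmax(model(x), dim=1)
        keep.append(probs.max(dim=1).values >= threshold)
        soft_labels.append(probs)
    return torch.cat(keep), torch.cat(soft_labels)


def average_labels(keep_masks, label_sets):
    """Average the soft labels exchanged by the nodes, counting only the nodes
    that actually kept (labeled) each public sample."""
    masks = torch.stack(keep_masks).float()              # (nodes, N)
    labels = torch.stack(label_sets)                     # (nodes, N, C)
    counts = masks.sum(dim=0).clamp(min=1)               # labeling nodes per sample
    summed = (labels * masks.unsqueeze(-1)).sum(dim=0)   # zero out unkept samples
    return summed / counts.unsqueeze(-1), masks.sum(dim=0) > 0


def finetune(model, optimizer, public_x, avg_labels, used, local_loader, epochs=1):
    """Fine-tune on the in-distribution public subset (averaged soft labels)
    together with the node's own local data (one illustrative batch per epoch)."""
    model.train()
    for _ in range(epochs):
        # distillation loss on the selected public samples (assumes they fit in one batch)
        logits = model(public_x[used])
        distill_loss = -(avg_labels[used] * F.log_softmax(logits, dim=1)).sum(1).mean()
        # supervised loss on local data
        x, y = next(iter(local_loader))
        local_loss = F.cross_entropy(model(x), y)
        optimizer.zero_grad()
        (distill_loss + local_loss).backward()
        optimizer.step()
```

In the full decentralized setting, each node would repeat such a round over the communication graph, exchanging only label tensors and keep-masks with its neighbors while the raw local data never leaves the node.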