Federated Learning (FL) enables the training of Deep Learning models without
centrally collecting possibly sensitive raw data. This paves the way for
stronger privacy guarantees when building predictive models. The most widely
used FL algorithms are parameter-averaging schemes (e.g., Federated
Averaging), which, however, have well-known limits: (i) clients must implement
the same model architecture; (ii) transmitting model weights and model updates
incurs a high communication cost that scales with the number of model
parameters; (iii) in the presence of non-IID data distributions,
parameter-averaging aggregation schemes perform poorly due to client model
drift. Federated adaptations of standard Knowledge Distillation (KD) can solve
or mitigate the weaknesses of parameter-averaging FL algorithms, while possibly
introducing other trade-offs. In this article, we provide a review of
KD-based algorithms tailored for specific FL issues.
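
To make the parameter-averaging scheme mentioned above concrete, the following is a minimal sketch of a FedAvg-style aggregation step. It is not taken from any of the reviewed works; the function name, toy parameter vectors, and dataset sizes are illustrative assumptions.

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """Weighted average of client parameter vectors (FedAvg-style sketch).

    client_weights: list of 1-D arrays, one flattened model per client
                    (all clients must share the same architecture, limit (i)).
    client_sizes:   local dataset sizes used as averaging weights.
    """
    stacked = np.stack(client_weights)                  # (n_clients, n_params)
    coeffs = np.array(client_sizes, dtype=float) / float(sum(client_sizes))
    # Each round transmits every client's full parameter vector, so the
    # communication cost grows with n_params (limit (ii)).
    return coeffs @ stacked                              # (n_params,)

# Toy usage: three clients, a model with four parameters.
clients = [np.array([0.1, 0.2, 0.3, 0.4]),
           np.array([0.0, 0.1, 0.2, 0.3]),
           np.array([0.4, 0.4, 0.4, 0.4])]
global_model = fedavg_aggregate(clients, client_sizes=[100, 50, 50])
print(global_model)
```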
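
By contrast, KD-based federated schemes typically exchange model outputs rather than parameters. The sketch below is a simplified, hypothetical illustration of averaging clients' softened predictions on a shared public set; the function names, temperature value, and toy data are assumptions and do not correspond to a specific algorithm covered in the review.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def aggregate_soft_labels(client_logits, temperature=3.0):
    """Average clients' softened predictions on a shared public set.

    client_logits: list of arrays of shape (n_public_samples, n_classes);
                   client architectures may differ, only the output
                   dimension must match.
    Returns consensus soft labels that each client can distill from locally.
    """
    probs = [softmax(l, temperature) for l in client_logits]
    return np.mean(probs, axis=0)

# Toy usage: two clients, five public samples, three classes.
rng = np.random.default_rng(0)
logits_a = rng.normal(size=(5, 3))
logits_b = rng.normal(size=(5, 3))
teacher_targets = aggregate_soft_labels([logits_a, logits_b])
print(teacher_targets.shape)   # (5, 3): cost depends on the public set size,
                               # not on the number of model parameters
```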