Knowledge Distillation for Federated Learning: a Practical Guide

Bellavista, Paolo; Mora, Alessio; Rish, Irina; Tenison, Irene

Authors: Paolo Bellavista, Alessio Mora, Irina Rish, Irene Tenison
Publication date: 9 November 2022

Abstract

Federated Learning (FL) enables the training of Deep Learning models without centrally collecting possibly sensitive raw data. This paves the way for stronger privacy guarantees when building predictive models. The most widely used FL algorithms are parameter-averaging schemes (e.g., Federated Averaging), which have well-known limitations: (i) clients must implement the same model architecture; (ii) transmitting model weights and model updates implies a high communication cost, which scales with the number of model parameters; (iii) in the presence of non-IID data distributions, parameter-averaging aggregation schemes perform poorly due to client model drift. Federated adaptations of regular Knowledge Distillation (KD) can solve and/or mitigate these weaknesses while possibly introducing other trade-offs. In this article, we provide a review of KD-based algorithms tailored for specific FL issues.

Comment: 9 pages, 1 figure

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2211.04742

Last time updated on 12/12/2022