Most cross-device federated learning (FL) studies focus on the
model-homogeneous setting where the global server model and local client models
are identical. However, such a constraint not only excludes low-end clients that
would otherwise make unique contributions to model training but also prevents
clients from training large models due to on-device resource bottlenecks. In
this work, we propose FedRolex, a partial training (PT)-based approach that
enables model-heterogeneous FL and can train a global server model larger than
the largest client model. At its core, FedRolex employs a rolling sub-model
extraction scheme that allows different parts of the global server model to be
evenly trained, which mitigates the client drift induced by the inconsistency
between individual client models and server model architectures. We show that
FedRolex outperforms state-of-the-art PT-based model-heterogeneous FL methods
(e.g., Federated Dropout) and reduces the gap between model-heterogeneous and
model-homogeneous FL, especially under the large-model large-dataset regime. In
addition, we provide a theoretical statistical analysis of its advantage over
Federated Dropout and evaluate FedRolex on an emulated real-world device
distribution to show that FedRolex can enhance the inclusiveness of FL and
boost the performance of low-end devices that would otherwise not benefit from
FL. Our code is available at: https://github.com/AIoT-MLSys-Lab/FedRolex

Comment: 20 pages, 7 figures. Published in the 36th Conference on Neural Information Processing Systems (NeurIPS 2022).
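To make the rolling sub-model extraction concrete, below is a minimal illustrative sketch (not the authors' implementation) of how a contiguous, circularly wrapped window of hidden units could advance with the communication round so that every server parameter is trained roughly equally often. The function names `rolling_submodel_indices` and `extract_submodel`, the per-round stride of one unit, and the single-layer weight-matrix setting are all assumptions made for illustration.

```python
import numpy as np

def rolling_submodel_indices(layer_width, client_capacity, round_idx):
    """Hypothetical sketch of FedRolex-style rolling extraction:
    pick a contiguous, circularly wrapped window of hidden units whose
    start position advances with the communication round, so every unit
    is covered evenly across rounds."""
    k = int(layer_width * client_capacity)   # sub-model width for this client
    start = round_idx % layer_width          # rolling start index (assumed stride of 1 per round)
    return [(start + i) % layer_width for i in range(k)]

def extract_submodel(server_weight, round_idx, capacity):
    """Slice a dense weight matrix (out_dim x in_dim) down to the rolling
    sub-model for one client; row and column indices wrap around."""
    out_idx = rolling_submodel_indices(server_weight.shape[0], capacity, round_idx)
    in_idx = rolling_submodel_indices(server_weight.shape[1], capacity, round_idx)
    return server_weight[np.ix_(out_idx, in_idx)], (out_idx, in_idx)

# Example: a low-end client with 25% capacity at round 7
W_server = np.random.randn(8, 8)
W_client, idx = extract_submodel(W_server, round_idx=7, capacity=0.25)
print(W_client.shape)  # (2, 2): rows/cols [7, 0], wrapped circularly
```

After local training, the client's updates would be scattered back into the same server indices and aggregated, which is what lets the trained window sweep over the full (larger) server model across rounds.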