Tackling the Unlimited Staleness in Federated Learning with Intertwined Data and Device Heterogeneities
The efficiency of Federated Learning (FL) is often affected by both data and
device heterogeneities. Data heterogeneity is defined as the heterogeneity of
data distributions on different clients. Device heterogeneity refers to the
clients' varying latencies in uploading their local model updates due to
heterogeneous local hardware resources, which causes the problem of
staleness when handled by asynchronous FL. Traditional schemes of
tackling the impact of staleness consider data and device heterogeneities as
two separate and independent aspects in FL, but this assumption is unrealistic
in many practical FL scenarios where data and device heterogeneities are
intertwined. In these cases, traditional schemes of weighted aggregation in FL
have been proven ineffective, and a better approach is to convert a stale
model update into a non-stale one. In this paper, we present a new FL framework
that leverages the gradient inversion technique for such conversion, hence
efficiently tackling unlimited staleness in clients' model updates. Our basic
idea is to use gradient inversion to obtain estimates of clients' local training
data from their uploaded stale model updates, and then use these estimates to
compute non-stale client model updates. In this way, we mitigate the possible
drop in data quality caused by gradient inversion, while still preserving the
clients' local data privacy. We compared our approach with existing FL
strategies on mainstream datasets and models, and experimental results
demonstrate that when tackling unlimited staleness, our approach improves the
trained model accuracy by up to 20% and speeds up the FL training process by up
to 35%.
Comment: 14 pages
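The core idea of the abstract, estimating training data from a stale gradient and then recomputing a fresh update, can be illustrated on a toy problem. The sketch below is a hedged illustration, not the paper's implementation: it assumes a scalar model f(x) = w * x with squared loss and a single private data point, and all names and constants are invented for the example.

```python
# Hedged toy sketch of gradient inversion for staleness correction
# (illustrative only; assumes a scalar model f(x) = w * x, squared loss,
# and one private data point).

def loss_grad(w, x, y):
    # d/dw (w*x - y)^2 = 2 * x * (w*x - y)
    return 2 * x * (w * x - y)

# Client side: private data, trained against a stale global weight.
x_true, y_true = 1.5, 2.0
w_stale = 0.5
g_stale = loss_grad(w_stale, x_true, y_true)   # the uploaded stale update

# Server side: invert the gradient, i.e. fit a data estimate
# (x_est, y_est) whose gradient at w_stale matches g_stale.
x_est, y_est, lr = 1.0, 1.0, 0.01
for _ in range(10000):
    err = loss_grad(w_stale, x_est, y_est) - g_stale
    # partial derivatives of 0.5 * err**2 w.r.t. x_est and y_est
    x_est -= lr * err * (4 * w_stale * x_est - 2 * y_est)
    y_est -= lr * err * (-2 * x_est)

# Convert the stale update into a non-stale one: recompute the gradient
# with the estimated data at the *current* global weight.
w_current = 2.0
g_fresh = loss_grad(w_current, x_est, y_est)
```

Note that the inversion is generally underdetermined (several data estimates can produce the same stale gradient), which is one reason the abstract emphasizes handling the possible drop in estimated data quality.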
Out of Thin Air: Exploring Data-Free Adversarial Robustness Distillation
Adversarial Robustness Distillation (ARD) is a promising task to solve the
issue of limited adversarial robustness of small capacity models while
optimizing the expensive computational costs of Adversarial Training (AT).
Despite their good robust performance, existing ARD methods are still
impractical to deploy in real-world high-security scenarios because they rely
entirely on the original data, or on publicly available data with a similar
distribution. In fact, such data are almost always private, specific, and
distinctive in scenarios that require high robustness. To tackle these issues, we propose a
challenging but significant task called Data-Free Adversarial Robustness
Distillation (DFARD), which aims to train small, easily deployable, robust
models without relying on data. We demonstrate that the challenge lies in the
lower upper bound on the information available for knowledge transfer, which
makes it crucial to mine and transfer knowledge more efficiently. Inspired by human
education, we design a plug-and-play Interactive Temperature Adjustment (ITA)
strategy to improve the efficiency of knowledge transfer and propose an
Adaptive Generator Balance (AGB) module to retain more data information. Our
method uses adaptive hyperparameters to avoid extensive parameter tuning and
significantly outperforms combinations of existing techniques. Meanwhile, it
achieves stable and reliable performance on multiple benchmarks.
Comment: Accepted by AAAI2
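To make the temperature-adjustment idea concrete, here is a hedged sketch of what an interactive temperature schedule for distillation could look like. The function names, thresholds, and the adjustment rule are hypothetical, not the paper's ITA strategy; only the general mechanism (temperature-scaled soft targets plus a feedback-driven temperature update) follows the abstract.

```python
import math

def softmax(logits, T):
    # Temperature-scaled softmax; higher T gives softer targets.
    m = max(z / T for z in logits)
    exps = [math.exp(z / T - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def adjust_temperature(T, agreement, T_min=1.0, T_max=8.0, step=0.5):
    # Hypothetical interactive rule: when the student already agrees with
    # the teacher often, lower T (harder targets); otherwise raise it.
    if agreement > 0.8:
        return max(T_min, T - step)
    return min(T_max, T + step)

teacher_logits = [2.0, 1.0, 0.1]
T = 4.0
targets = softmax(teacher_logits, T)       # soft targets at the current T
T = adjust_temperature(T, agreement=0.9)   # student doing well: T drops
```

The design choice sketched here mirrors curriculum-style teaching: soft, high-entropy targets while the student struggles, sharper targets as it improves.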
Industrial Cyber-Physical Systems-based Cloud IoT Edge for Federated Heterogeneous Distillation.
Deep convolutional networks have achieved remarkable performance in a wide range of vision-based tasks in the modern Internet of Things (IoT). Due to privacy issues and transmission costs, manually annotated data for training deep learning models are usually stored at different sites with fog and edge devices of various computing capacities. It has been proven that knowledge distillation can effectively compress well-trained neural networks into lightweight models suitable for particular devices. However, different fog and edge devices may perform different sub-tasks, and simply performing model compression on powerful cloud servers fails to make use of the private data stored at the different sites. To overcome these obstacles, we propose a novel knowledge distillation method for object recognition in real-world IoT scenarios. Our method enables flexible bidirectional online training of heterogeneous models on distributed datasets with a new ``brain storming'' mechanism and optimizable temperature parameters. In our comparison experiments, this heterogeneous brain storming method was compared to multiple state-of-the-art single-model compression methods, as well as the newest heterogeneous and homogeneous multi-teacher knowledge distillation methods. Our method outperformed the state of the art in both conventional and heterogeneous tasks. Further analysis of the ablation experiment results shows that introducing trainable temperature parameters into the conventional knowledge distillation loss can effectively ease the learning process of student networks in different methods. To the best of our knowledge, this is the first IoT-oriented method that allows asynchronous bidirectional heterogeneous knowledge distillation in deep networks.
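The "optimizable temperature parameters" mentioned above can be sketched minimally: a standard distillation loss (T^2-scaled KL divergence between temperature-softened teacher and student distributions) where the temperature itself is updated by gradient descent. This is a hedged illustration under simplifying assumptions (fixed logits, a numerical gradient instead of autodiff), not the paper's implementation.

```python
import math

def softmax(z, T):
    # Temperature-scaled softmax over a list of logits.
    m = max(v / T for v in z)
    e = [math.exp(v / T - m) for v in z]
    s = sum(e)
    return [x / s for x in e]

def kd_loss(teacher, student, T):
    # Standard distillation loss: T^2 * KL(softmax(t/T) || softmax(s/T)).
    p, q = softmax(teacher, T), softmax(student, T)
    return T * T * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [3.0, 1.0, 0.2]
student = [1.0, 0.5, 0.4]
T, lr, eps = 4.0, 0.1, 1e-4

for _ in range(100):
    # Numerical gradient of the loss w.r.t. the temperature (a stand-in
    # for treating T as a trainable parameter in autodiff).
    g = (kd_loss(teacher, student, T + eps)
         - kd_loss(teacher, student, T - eps)) / (2 * eps)
    T = max(1.0, T - lr * g)   # keep T in a sensible range
```

In a full training loop the student logits would also change each step; here they are frozen so the sketch isolates how T adapts to the teacher-student gap.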