82 research outputs found
On-device training of machine learning models on microcontrollers with a look at federated learning
Recent progress in machine learning frameworks makes it now possible to run an inference with sophisticated machine learning models on tiny microcontrollers. Model training, however, is typically done separately on powerful computers, where the training process has abundant CPU and memory resources to process the stored datasets. In this work, we explore a different approach: training the model directly on the microcontroller. We implement this approach for a keyword spotting task. Then, we extend the training process using federated learning among microcontrollers. Our experiments with model training show an overall trend of decreasing loss with the increase of training epochs. This work was partially funded by the Spanish Government under contracts PID2019-106774RB-C21, PCI2019-111850-2 (DiPET CHIST-ERA), PCI2019-111851-2 (LeadingEdge CHIST-ERA), and the Generalitat de Catalunya as Consolidated Research Group 2017-SGR-990.
On-device training of machine learning models on microcontrollers with federated learning
Recent progress in machine learning frameworks has made it possible to perform inference with models on cheap, tiny microcontrollers. Training of machine learning models for these tiny devices, however, is typically done separately on powerful computers. This way, the training process has abundant CPU and memory resources to process large stored datasets. In this work, we explore a different approach: training the machine learning model directly on the microcontroller and extending the training process with federated learning. We implement this approach for a keyword spotting task. We conduct experiments with real devices to characterize the learning behavior and resource consumption for different hyperparameters and federated learning configurations. We observed that when training locally with less data, more frequent federated learning rounds reduced the training loss more quickly, but at the cost of higher bandwidth usage and longer training time. Our results indicate that, depending on the specific application, the trade-off between the requirements and the resource usage of the system must be determined. This work has received funding from the Spanish Government under contracts PID2019-106774RB-C21, PCI2019-111851-2 (LeadingEdge CHIST-ERA), and PCI2019-111850-2 (DiPET CHIST-ERA).
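The federated training process the abstract describes can be illustrated with a weighted parameter average (the FedAvg-style aggregation commonly used in federated learning). This is a minimal sketch; the function name, weight vectors, and client sample counts are illustrative, not taken from the paper.

```python
# Minimal sketch of one federated averaging round, assuming each client
# trains locally and reports its updated weights plus its local sample count.
# All names and numbers here are illustrative, not from the paper.

def fedavg(client_weights, client_samples):
    """Weighted average of client weight vectors by local dataset size."""
    total = sum(client_samples)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_samples)) / total
        for i in range(dim)
    ]

# Three microcontroller clients with different amounts of local data:
weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
samples = [10, 30, 60]
global_weights = fedavg(weights, samples)
# → [4.0, 5.0]
```

Running such rounds more often propagates local progress faster, but each round costs one full model exchange per client, which is the bandwidth/time trade-off the abstract reports.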
Federated Neural Architecture Search
To preserve user privacy while enabling mobile intelligence, techniques have been proposed to train deep neural networks on decentralized data. However, training over decentralized data makes the already difficult design of neural architectures even harder. This difficulty is further amplified when designing and deploying different neural architectures for heterogeneous mobile platforms. In this work, we incorporate automatic neural architecture search into decentralized training, as a new DNN training paradigm called Federated Neural Architecture Search (federated NAS). To deal with the primary challenge of limited on-client computational and communication resources, we present FedNAS, a highly optimized framework for efficient federated NAS. FedNAS fully exploits the key opportunity that model candidates need not be fully re-trained during the architecture search process, and incorporates three key optimizations: parallel candidate training on partial clients, early dropping of candidates with inferior performance, and dynamic round numbers. Tested on large-scale datasets and typical CNN architectures, FedNAS achieves model accuracy comparable to a state-of-the-art NAS algorithm that trains models with centralized data, and reduces the client cost by up to two orders of magnitude compared to a straightforward design of federated NAS.
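The "early dropping" optimization mentioned above can be sketched as a simple filter over candidate architectures: after each round, any candidate whose validation accuracy trails the current best by more than some margin is removed from the search. The margin value, candidate names, and accuracies below are invented for illustration; the paper's actual dropping criterion may differ.

```python
# Sketch of early dropping of inferior architecture candidates.
# The 0.05 margin and the accuracies are illustrative assumptions.

def drop_inferior(candidates, margin=0.05):
    """candidates: dict mapping architecture name -> latest validation accuracy.

    Keeps only candidates within `margin` of the current best accuracy.
    """
    best = max(candidates.values())
    return {name: acc for name, acc in candidates.items()
            if best - acc <= margin}

round_accs = {"arch_a": 0.82, "arch_b": 0.74, "arch_c": 0.80}
survivors = drop_inferior(round_accs)
# arch_b trails the best (0.82) by 0.08 > 0.05, so it is dropped.
```

Shrinking the candidate pool this way is what lets the search avoid spending client compute on architectures that are unlikely to win.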
Federated learning on embedded devices
TinyML has gained a lot of popularity in recent years, bringing ML to devices constrained in memory, computation capacity, and power. Training models on powerful computers with big datasets and exporting the compressed resulting model to be used only for inference on microcontrollers has been studied extensively. But this method does not allow an edge device to keep learning from new data. In an era where data privacy is essential, storing and managing the datasets used to train these models can be a problem. Moving the training of the NN to the edge device can eliminate the need to store or transmit any sensitive data, since this data will be captured, used once to train the model, and discarded afterwards. Also, using FL enables multiple devices to train and share their models without the need to transmit any collected data. In this work, we explore how TinyML can be used with on-device training in combination with FL, what issues it raises, and possible solutions.
ZeroFL: Efficient On-Device Training for Federated Learning with Local Sparsity
When the available hardware cannot meet the memory and compute requirements to efficiently train high-performing machine learning models, a compromise in either the training quality or the model complexity is needed. In Federated Learning (FL), nodes are orders of magnitude more constrained than traditional server-grade hardware and are often battery powered, severely limiting the sophistication of models that can be trained under this paradigm. While most research has focused on designing better aggregation strategies to improve convergence rates and on alleviating the communication costs of FL, fewer efforts have been devoted to accelerating on-device training. This stage, which repeats hundreds of times (i.e. every round) and can involve thousands of devices, accounts for the majority of the time required to train federated models and the totality of the energy consumption at the client side. In this work, we present the first study on the unique aspects that arise when introducing sparsity at training time in FL workloads. We then propose ZeroFL, a framework that relies on highly sparse operations to accelerate on-device training. Models trained with ZeroFL and 95% sparsity achieve up to 2.3% higher accuracy compared to competitive baselines obtained from adapting a state-of-the-art sparse training framework to the FL setting. Comment: Published as a conference paper at ICLR 202
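The kind of highly sparse operation such a framework builds on can be illustrated with top-k magnitude sparsification: keep only the largest-magnitude fraction of weights and zero the rest. The 95% sparsity figure comes from the abstract, but the function below is an illustrative reconstruction, not ZeroFL's actual code.

```python
# Sketch of top-k magnitude sparsification. With sparsity=0.95, only the
# largest 5% of weights (by absolute value) survive; the rest become zero,
# so most multiply-accumulates in training can be skipped.
# Illustrative reconstruction, not the paper's implementation.

def sparsify(weights, sparsity=0.95):
    """Zero out the smallest-magnitude weights, keeping a (1 - sparsity) share."""
    k = max(1, int(len(weights) * (1 - sparsity)))
    keep = set(sorted(range(len(weights)),
                      key=lambda i: abs(weights[i]),
                      reverse=True)[:k])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]

sparse = sparsify([0.1, -2.0, 0.3, 0.05], sparsity=0.5)
# → [0.0, -2.0, 0.3, 0.0]
```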
GPT-FL: Generative Pre-trained Model-Assisted Federated Learning
In this work, we propose GPT-FL, a generative pre-trained model-assisted
federated learning (FL) framework. At its core, GPT-FL leverages generative
pre-trained models to generate diversified synthetic data. These generated data
are used to train a downstream model on the server, which is then fine-tuned
with private client data under the standard FL framework. We show that GPT-FL
consistently outperforms state-of-the-art FL methods in terms of model test
accuracy, communication efficiency, and client sampling efficiency. Through
comprehensive ablation analysis, we discover that the downstream model
generated by synthetic data plays a crucial role in controlling the direction
of gradient diversity during FL training, which enhances convergence speed and
contributes to the notable accuracy boost observed with GPT-FL. Also,
regardless of whether the target data falls within or outside the domain of the
pre-trained generative model, GPT-FL consistently achieves significant
performance gains, surpassing the results obtained by models trained solely
with FL or synthetic data.
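The three-step pipeline the abstract describes (generate synthetic data, pre-train a downstream model on the server, then fine-tune with standard FL on private client data) can be sketched as a control-flow skeleton. The function and its stand-in stubs below are hypothetical, not the authors' implementation.

```python
# Skeleton of the GPT-FL pipeline as described in the abstract.
# All function names and toy values are illustrative stand-ins.

def gpt_fl(generator, server_train, fl_finetune, clients):
    synthetic = generator()             # 1. generate diversified synthetic data
    model = server_train(synthetic)     # 2. train downstream model on the server
    return fl_finetune(model, clients)  # 3. fine-tune with private client data

# Toy stand-ins to show the control flow:
model = gpt_fl(
    generator=lambda: ["sample"] * 100,
    server_train=lambda data: {"pretrained_on": len(data)},
    fl_finetune=lambda m, c: {**m, "finetuned_with": len(c)},
    clients=["c1", "c2", "c3"],
)
# → {"pretrained_on": 100, "finetuned_with": 3}
```

The point of the structure is that only step 3 touches private data, while steps 1 and 2 give the federated fine-tuning a strong server-side starting point.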
A first look into the carbon footprint of federated learning
Despite impressive results, deep learning-based technologies also raise
severe privacy and environmental concerns induced by the training procedure
often conducted in datacenters. In response, alternatives to centralized
training such as Federated Learning (FL) have emerged. Perhaps unexpectedly,
FL, in particular, is starting to be deployed at a global scale by companies
that must adhere to new legal demands and policies originating from governments
and civil society for privacy protection. However, the potential environmental
impact related to FL remains unclear and unexplored. This paper offers the
first-ever systematic study of the carbon footprint of FL. First, we propose a
rigorous model to quantify the carbon footprint, hence facilitating the
investigation of the relationship between FL design and carbon emissions. Then,
we compare the carbon footprint of FL to traditional centralized learning. Our
findings show that FL, despite being slower to converge in some cases, may
result in a comparatively greener impact than a centralized equivalent setup.
We performed extensive experiments across different types of datasets,
settings, and various deep learning models with FL. Finally, we highlight and
connect the reported results to the future challenges and trends in FL to
reduce its environmental impact, including algorithm efficiency, hardware capabilities, and stronger industry transparency.
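A back-of-the-envelope version of the kind of carbon model such a study needs: total emissions are the per-client compute and communication energy, summed over rounds and participating clients, multiplied by the grid's carbon intensity. All numbers below are illustrative placeholders, not the paper's measured values.

```python
# Illustrative carbon estimate for a federated training run.
# Parameters and values are assumed for the example, not from the paper.

def fl_carbon_kg(rounds, clients_per_round, compute_kwh_per_client,
                 comm_kwh_per_client, carbon_intensity_kg_per_kwh):
    """Total kg CO2e: energy over all rounds/clients times grid intensity."""
    energy_kwh = rounds * clients_per_round * (
        compute_kwh_per_client + comm_kwh_per_client)
    return energy_kwh * carbon_intensity_kg_per_kwh

# 100 rounds, 10 clients each, 0.02 kWh compute + 0.005 kWh comms per client,
# on a grid emitting 0.4 kg CO2e per kWh:
print(round(fl_carbon_kg(100, 10, 0.02, 0.005, 0.4), 6))  # → 10.0
```

A model in this shape makes the paper's comparison concrete: slower convergence raises `rounds`, while deploying clients on low-carbon grids lowers the intensity factor.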