
    On-device training of machine learning models on microcontrollers with a look at federated learning

    Recent progress in machine learning frameworks now makes it possible to run inference with sophisticated machine learning models on tiny microcontrollers. Model training, however, is typically done separately on powerful computers, where the training process has abundant CPU and memory resources to process the stored datasets. In this work, we explore a different approach: training the model directly on the microcontroller. We implement this approach for a keyword spotting task and then extend the training process with federated learning among microcontrollers. Our experiments with model training show an overall trend of decreasing loss as the number of training epochs increases. This work was partially funded by the Spanish Government under contracts PID2019-106774RB-C21, PCI2019-111850-2 (DiPET CHIST-ERA), and PCI2019-111851-2 (LeadingEdge CHIST-ERA), and by the Generalitat de Catalunya as Consolidated Research Group 2017-SGR-990. Peer reviewed. Postprint (author's final draft).
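
    To make the setup above concrete, here is a minimal sketch of federated rounds among simulated devices: each device runs local SGD on its private data and a server averages the resulting weights (FedAvg). The tiny NumPy logistic-regression model is an illustrative stand-in; the paper's actual implementation trains a keyword-spotting network on real microcontrollers.

```python
# Minimal FedAvg sketch (assumed setup, see note above): logistic regression
# stands in for the keyword-spotting model trained on microcontrollers.
import numpy as np

rng = np.random.default_rng(0)

def local_train(w, X, y, epochs=5, lr=0.1):
    """Plain SGD on one device's private data; the data never leaves the device."""
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + np.exp(-xi @ w))  # sigmoid prediction
            w = w - lr * (p - yi) * xi         # gradient step on the log-loss
    return w

def fedavg(weights):
    """Server step: average the weight vectors received from all devices."""
    return np.mean(weights, axis=0)

# Three simulated devices, each with a small private dataset.
devices = [(rng.normal(size=(20, 4)), rng.integers(0, 2, size=20))
           for _ in range(3)]
w_global = np.zeros(4)

for _ in range(10):  # federated rounds
    local = [local_train(w_global.copy(), X, y) for X, y in devices]
    w_global = fedavg(local)
```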

    On-device training of machine learning models on microcontrollers with federated learning

    Recent progress in machine learning frameworks has made it possible to perform inference with machine learning models on cheap, tiny microcontrollers. Training models for these tiny devices, however, is typically done separately on powerful computers, where the training process has abundant CPU and memory resources to process large stored datasets. In this work, we explore a different approach: training the machine learning model directly on the microcontroller and extending the training process with federated learning. We implement this approach for a keyword spotting task. We conduct experiments with real devices to characterize the learning behavior and resource consumption for different hyperparameters and federated learning configurations. We observed that, when training locally with less data, more frequent federated learning rounds reduced the training loss faster, but at the cost of higher bandwidth usage and longer training time. Our results indicate that, depending on the specific application, the trade-off between the system's requirements and its resource usage must be determined. This work has received funding from the Spanish Government under contracts PID2019-106774RB-C21, PCI2019-111851-2 (LeadingEdge CHIST-ERA), and PCI2019-111850-2 (DiPET CHIST-ERA). Peer reviewed. Postprint (published version).
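
    The bandwidth side of the reported trade-off can be sketched with simple arithmetic: each round costs roughly one model upload and one download per device, so traffic grows linearly with round frequency. All constants below are illustrative assumptions, not measurements from the paper.

```python
# Back-of-the-envelope traffic estimate (all numbers are assumptions).
MODEL_BYTES = 20_000   # assumed size of the exchanged model weights
N_DEVICES = 5          # assumed number of participating boards

def traffic_per_hour(rounds_per_hour):
    # Each round: every device uploads its model and downloads the average.
    return rounds_per_hour * N_DEVICES * 2 * MODEL_BYTES

for rph in (1, 6, 60):
    print(f"{rph:>2} rounds/h -> {traffic_per_hour(rph) / 1e6:.1f} MB/h")
```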

    Federated Neural Architecture Search

    To preserve user privacy while enabling mobile intelligence, techniques have been proposed to train deep neural networks on decentralized data. However, training over decentralized data makes the already difficult design of neural architectures even harder. The difficulty is further amplified when designing and deploying different neural architectures for heterogeneous mobile platforms. In this work, we propose incorporating automatic neural architecture search into decentralized training, as a new DNN training paradigm called Federated Neural Architecture Search (federated NAS). To deal with the primary challenge of limited on-client computational and communication resources, we present FedNAS, a highly optimized framework for efficient federated NAS. FedNAS fully exploits the key opportunity that model candidates need not be fully re-trained during the architecture search process, and incorporates three key optimizations: parallel training of candidates on partial clients, early dropping of candidates with inferior performance, and dynamic round numbers. Tested on large-scale datasets and typical CNN architectures, FedNAS achieves model accuracy comparable to state-of-the-art NAS algorithms that train models on centralized data, and reduces the client cost by up to two orders of magnitude compared to a straightforward design of federated NAS.
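
    Of the three optimizations, early dropping is the easiest to sketch: candidates are evaluated periodically and the worse half is discarded at each stage, successive-halving style, so little training budget is spent on inferior architectures. The scoring stub below is hypothetical; FedNAS's actual dropping criterion and scheduling are more involved.

```python
# Successive-halving sketch of early candidate dropping (hypothetical scoring).
import random

random.seed(0)

def federated_eval(candidate):
    """Stub: would aggregate validation accuracy reported by a subset of clients."""
    return random.random() + 0.01 * candidate["width"]

candidates = [{"id": i, "width": random.randint(8, 64)} for i in range(16)]

while len(candidates) > 1:
    ranked = sorted(candidates, key=federated_eval, reverse=True)
    candidates = ranked[: len(ranked) // 2]  # drop the inferior half each stage

print("surviving architecture:", candidates[0])
```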

    Federated learning on embedded devices

    TinyML has gained a lot of popularity in recent years, bringing ML to devices constrained in memory, computation capacity, and power. Training models on powerful computers with big datasets and exporting the compressed resulting model to microcontrollers for inference only has been studied extensively. But this method does not allow an edge device to keep learning from new data. In an era where data privacy is essential, storing and managing the datasets used to train these models can be a problem. Moving the training of the neural network to the edge device can remove the need to store or transmit any sensitive data, since the data is captured, used once to train the model, and discarded afterwards. In addition, using FL enables multiple devices to train and share their models without transmitting any collected data. In this work, we explore how TinyML with on-device training can be used in combination with FL, the issues it raises, and possible solutions.
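
    The capture-train-discard pattern argued for above can be sketched in a few lines: each incoming sample updates the on-device model once and is then dropped, so no raw data is ever stored or transmitted. The sensor stub and the tiny linear model are illustrative stand-ins, not code from the thesis.

```python
# Capture -> one training step -> discard (names are illustrative).
import numpy as np

rng = np.random.default_rng(1)
w = np.zeros(4)  # tiny linear model standing in for the on-device NN

def read_sensor():
    """Hypothetical stand-in for a microphone or IMU capture."""
    x = rng.normal(size=4)
    return x, float(x.sum() > 0)  # sample and a synthetic label

for _ in range(100):
    x, y = read_sensor()               # 1. capture a fresh sample
    p = 1.0 / (1.0 + np.exp(-x @ w))   # 2. use it once for a training step
    w -= 0.05 * (p - y) * x
    del x, y                           # 3. discard: nothing stored or sent
```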

    ZeroFL: Efficient On-Device Training for Federated Learning with Local Sparsity

    When the available hardware cannot meet the memory and compute requirements to efficiently train high-performing machine learning models, a compromise in either the training quality or the model complexity is needed. In Federated Learning (FL), nodes are orders of magnitude more constrained than traditional server-grade hardware and are often battery powered, severely limiting the sophistication of models that can be trained under this paradigm. While most research has focused on designing better aggregation strategies to improve convergence rates and on alleviating the communication costs of FL, fewer efforts have been devoted to accelerating on-device training. This stage, which repeats hundreds of times (i.e., every round) and can involve thousands of devices, accounts for the majority of the time required to train federated models and for the totality of the energy consumption on the client side. In this work, we present the first study of the unique aspects that arise when introducing sparsity at training time in FL workloads. We then propose ZeroFL, a framework that relies on highly sparse operations to accelerate on-device training. Models trained with ZeroFL and 95% sparsity achieve up to 2.3% higher accuracy compared to competitive baselines obtained by adapting a state-of-the-art sparse training framework to the FL setting. Published as a conference paper at ICLR 2022.
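
    The core idea of training with local sparsity can be illustrated with plain magnitude-based masking: keep only the top-k weights by absolute value and restrict the local update to them, so most of the model stays frozen and the corresponding operations can be skipped. This is a generic sketch, not ZeroFL's actual kernels or sparsity schedule.

```python
# Generic magnitude-based sparsity mask applied during a local update.
import numpy as np

def topk_mask(w, sparsity=0.95):
    """Boolean mask keeping the (1 - sparsity) fraction of largest-|w| entries."""
    k = max(1, int(w.size * (1.0 - sparsity)))
    threshold = np.partition(np.abs(w).ravel(), -k)[-k]
    return np.abs(w) >= threshold

rng = np.random.default_rng(0)
w = rng.normal(size=(32, 32))
grad = rng.normal(size=(32, 32))   # stand-in for a locally computed gradient

mask = topk_mask(w, sparsity=0.95)
w -= 0.01 * grad * mask            # masked-out weights stay frozen
print("active weights:", int(mask.sum()), "of", w.size)
```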

    GPT-FL: Generative Pre-trained Model-Assisted Federated Learning

    In this work, we propose GPT-FL, a generative pre-trained model-assisted federated learning (FL) framework. At its core, GPT-FL leverages generative pre-trained models to generate diversified synthetic data. The generated data are used to train a downstream model on the server, which is then fine-tuned with private client data under the standard FL framework. We show that GPT-FL consistently outperforms state-of-the-art FL methods in terms of model test accuracy, communication efficiency, and client sampling efficiency. Through comprehensive ablation analysis, we discover that the downstream model trained on synthetic data plays a crucial role in controlling the direction of gradient diversity during FL training, which enhances convergence speed and contributes to the notable accuracy boost observed with GPT-FL. Moreover, regardless of whether the target data falls within or outside the domain of the pre-trained generative model, GPT-FL consistently achieves significant performance gains, surpassing the results obtained by models trained solely with FL or synthetic data.
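
    The three-stage pipeline can be sketched structurally as follows. Every interface here is a toy stand-in (a scalar model, a stub generator), since the abstract does not specify the actual APIs; only the ordering of the stages is taken from the text.

```python
# Structural sketch: (1) synthetic data from a pre-trained generative model,
# (2) server-side training on it, (3) FedAvg fine-tuning with private data.

def generate_synthetic_data(n):
    """Stub: stands in for sampling a pre-trained generative model."""
    return [0.5 + 0.1 * (i % 3) for i in range(n)]

def train_on_server(data):
    """Stage 2: fit the toy scalar model to the synthetic data (its mean)."""
    return sum(data) / len(data)

class Client:
    """Toy client whose private 'data' is a single target value."""
    def __init__(self, target):
        self.target = target
    def local_update(self, model):
        return model + 0.5 * (self.target - model)  # one local SGD-like step

def fedavg_finetune(model, clients, rounds=20):
    """Stage 3: standard FedAvg; private data never leaves the clients."""
    for _ in range(rounds):
        updates = [c.local_update(model) for c in clients]
        model = sum(updates) / len(updates)
    return model

model = train_on_server(generate_synthetic_data(100))
model = fedavg_finetune(model, [Client(t) for t in (1.0, 1.2, 0.8)])
print(f"final model: {model:.3f}")
```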

    A first look into the carbon footprint of federated learning

    Despite impressive results, deep learning-based technologies also raise severe privacy and environmental concerns, induced by the training procedure often conducted in data centers. In response, alternatives to centralized training, such as Federated Learning (FL), have emerged. Perhaps unexpectedly, FL is starting to be deployed at a global scale by companies that must adhere to new legal demands and policies originating from governments and civil society for privacy protection. However, the potential environmental impact of FL remains unclear and unexplored. This paper offers the first systematic study of the carbon footprint of FL. First, we propose a rigorous model to quantify the carbon footprint, facilitating the investigation of the relationship between FL design and carbon emissions. Then, we compare the carbon footprint of FL to traditional centralized learning. Our findings show that FL, despite being slower to converge in some cases, may result in a comparatively greener impact than an equivalent centralized setup. We performed extensive experiments across different types of datasets, settings, and deep learning models with FL. Finally, we highlight and connect the reported results to future challenges and trends in FL for reducing its environmental impact, including algorithm efficiency, hardware capabilities, and stronger industry transparency.
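
    In the spirit of such a quantification model, a deliberately simple estimate multiplies per-round device energy by the number of rounds, the number of devices, and the grid's carbon intensity. All constants below are illustrative assumptions, not the paper's model or measurements.

```python
# Toy carbon estimate: energy per round x rounds x devices x grid intensity.
DEVICE_WATTS = 5.0        # assumed power draw while training locally
ROUND_SECONDS = 120.0     # assumed local training time per round
N_DEVICES = 1000
N_ROUNDS = 500
CARBON_G_PER_KWH = 300.0  # assumed grid intensity (varies widely by country)

energy_kwh = DEVICE_WATTS * ROUND_SECONDS * N_DEVICES * N_ROUNDS / 3.6e6
co2_kg = energy_kwh * CARBON_G_PER_KWH / 1000.0
print(f"{energy_kwh:.1f} kWh -> {co2_kg:.1f} kg CO2e")
```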