5 research outputs found

    Federated learning on embedded devices

    Get PDF
    TinyML has gained a lot of popularity in recent years, bringing ML to devices constrained by memory, computation capacity and power. Training models on powerful computers with big datasets and exporting the compressed resulting model to be used only for inference on microcontrollers has been studied extensively. But this method does not allow an edge device to keep learning from new data. In an era where data privacy is essential, storing and managing the datasets used to train these models can be a problem. Moving the training of the NN to the edge device can eliminate the need to store or transmit any sensitive data, since this data will be captured, used once to train the model and discarded afterwards. Also, using FL enables multiple devices to train and share their models without the need to transmit any collected data. In this work, we explore how TinyML can be used with on-device training in combination with FL, what issues it raises, and possible solutions.
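
    As a rough illustration of the workflow this abstract describes (a sketch only, not code from the paper; the captured sample is simulated with random data and all names are illustrative), the loop below uses each sample for one training step and then discards it, so only the model weights would ever leave the device:

    # Minimal sketch of privacy-preserving on-device training: each sample is
    # captured, used for one local SGD step, and discarded. Only the weights
    # (w, b) would be shared in a federated learning round.
    import numpy as np

    rng = np.random.default_rng(0)
    w = np.zeros(64)            # model weights kept on the device
    b = 0.0
    lr = 0.01                   # learning rate

    def sgd_step(x, y):
        """One logistic-regression SGD step on a single captured sample."""
        global w, b
        z = w @ x + b
        p = 1.0 / (1.0 + np.exp(-z))
        grad = p - y            # dL/dz for binary cross-entropy
        w -= lr * grad * x
        b -= lr * grad

    for _ in range(100):
        x, y = rng.normal(size=64), rng.integers(0, 2)  # stand-in for a captured sample
        sgd_step(x, y)
        del x, y                # raw data is discarded, never stored or transmitted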

    Embedded federated learning over a LoRa mesh network

    Get PDF
    In on-device training of machine learning models on microcontrollers, a neural network is trained on the device itself. A specific approach for collaborative on-device training is federated learning. In this paper, we propose embedded federated learning on microcontroller boards using the communication capacity of a LoRa mesh network. We apply a dual-board design: the machine learning application, containing a neural network trained for a keyword spotting task, runs on the Arduino Portenta H7. For the networking of the federated learning process, the Portenta is connected to a TTGO LORA32 board that operates as a router within a LoRa mesh network. We experiment with the federated learning application on the LoRa mesh network and analyze the network-, system-, and application-level performance. The results from our experimentation suggest the feasibility of the proposed system and exemplify an implementation of a distributed application with re-trainable compute nodes, interconnected over LoRa, entirely deployed at the tiny edge. This work was supported by the Spanish Government under contracts PID2019-106774RB-C21, PCI2019-111851-2 (LeadingEdge CHIST-ERA), and PCI2019-111850-2 (DiPET CHIST-ERA).
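
    One practical consequence of the dual-board design can be sketched as follows: a LoRa frame carries only a couple of hundred bytes of payload, so model weights produced on the Portenta have to be split into numbered packets before the router board can forward them over the mesh. This is an assumption-laden illustration, not the authors' protocol; PAYLOAD_BYTES and the packet layout are made up for the example.

    # Chunking float32 model weights into LoRa-sized packets for one
    # federated round. Each packet carries (round id, sequence number,
    # total packets, payload) so the receiver can reassemble the weights.
    import struct

    PAYLOAD_BYTES = 200          # assumed usable LoRa payload per packet

    def chunk_weights(weights, round_id):
        """Serialize float32 weights into packets small enough for LoRa frames."""
        raw = struct.pack(f"<{len(weights)}f", *weights)
        chunks = [raw[i:i + PAYLOAD_BYTES] for i in range(0, len(raw), PAYLOAD_BYTES)]
        return [(round_id, seq, len(chunks), c) for seq, c in enumerate(chunks)]

    packets = chunk_weights([0.1] * 1000, round_id=3)   # 4000 bytes -> 20 packets
    print(len(packets), "packets for this federated round")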

    On-device training of machine learning models on microcontrollers with federated learning

    Get PDF
    Recent progress in machine learning frameworks has made it possible to perform inference with models on cheap, tiny microcontrollers. Training of machine learning models for these tiny devices, however, is typically done separately on powerful computers. This way, the training process has abundant CPU and memory resources to process large stored datasets. In this work, we explore a different approach: training the machine learning model directly on the microcontroller and extending the training process with federated learning. We implement this approach for a keyword spotting task. We conduct experiments with real devices to characterize the learning behavior and resource consumption for different hyperparameters and federated learning configurations. We observed that when training locally with less data, more frequent federated learning rounds reduced the training loss more quickly, but at the cost of higher bandwidth usage and longer training time. Our results indicate that, depending on the specific application, there is a need to determine the trade-off between the requirements and the resource usage of the system. This work has received funding from the Spanish Government under contracts PID2019-106774RB-C21, PCI2019-111851-2 (LeadingEdge CHIST-ERA), and PCI2019-111850-2 (DiPET CHIST-ERA).
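
    For context, the aggregation step of a federated learning round can be summarized as FedAvg-style weighted averaging of the clients' locally trained weights. The sketch below is illustrative and not the paper's implementation; the function name and sample counts are assumptions.

    # Weighted average of client weight vectors, weighted by how many local
    # samples each device trained on. The result is the new global model
    # broadcast back to the devices at the end of the round.
    import numpy as np

    def fedavg(client_weights, client_sample_counts):
        counts = np.asarray(client_sample_counts, dtype=float)
        stacked = np.stack(client_weights)                   # shape: (clients, params)
        return (stacked * counts[:, None]).sum(axis=0) / counts.sum()

    # Three devices that trained on different amounts of local data:
    w_global = fedavg(
        [np.array([0.2, 0.5]), np.array([0.4, 0.1]), np.array([0.3, 0.3])],
        [120, 40, 40],
    )
    print(w_global)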

    The Effects of Weight Quantization on Online Federated Learning for the IoT: A Case Study

    Get PDF
    Many weight quantization approaches have been explored to save communication bandwidth between the clients and the server in federated learning on high-end computing machines. However, there is a lack of weight quantization research for online federated learning on TinyML devices, which are restricted in mini-batch size, neural network size, and communication method due to their severe hardware resource constraints and power budgets. We use the term Tiny Online Federated Learning (TinyOFL) for online federated learning with TinyML devices in the Internet of Things (IoT). This paper performs a comprehensive analysis of the effects of weight quantization in TinyOFL in terms of accuracy, stability, overfitting, communication efficiency, energy consumption, and delivery time, and extracts practical guidelines on how to apply weight quantization to TinyOFL. Our analysis is supported by a TinyOFL case study with three Arduino Portenta H7 boards running federated learning clients for a keyword spotting task. Our findings include that TinyOFL tolerates more aggressive weight quantization than online learning without FL, without affecting accuracy, thanks to TinyOFL’s quasi-batch training property. For example, 7-bit weights achieved accuracy equivalent to 32-bit floating-point weights while saving communication bandwidth by 4.6×. Overfitting by increasing network width rarely occurs in TinyOFL, but may occur if strong weight quantization is applied. The experiments also showed that there is a design space for TinyOFL applications in which the accuracy loss due to weight quantization is compensated by increasing the neural network size.
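
    The trade-off the paper quantifies can be illustrated with a simple uniform n-bit quantizer (a hedged sketch, not the paper's quantizer): weights are mapped to 2^n levels before transmission and mapped back on arrival, and 7-bit weights cut the payload by roughly 32/7 ≈ 4.6× relative to 32-bit floats.

    # Uniform n-bit quantization of a weight vector: quantize to 2**bits
    # integer levels over the weights' range, then dequantize on the receiver.
    import numpy as np

    def quantize(w, bits):
        lo, hi = w.min(), w.max()
        levels = 2 ** bits - 1
        q = np.round((w - lo) / (hi - lo) * levels).astype(np.uint32)
        return q, lo, hi

    def dequantize(q, lo, hi, bits):
        levels = 2 ** bits - 1
        return lo + q.astype(np.float32) / levels * (hi - lo)

    w = np.random.default_rng(0).normal(size=1000).astype(np.float32)
    q, lo, hi = quantize(w, bits=7)
    w_hat = dequantize(q, lo, hi, bits=7)

    print("max reconstruction error:", np.abs(w - w_hat).max())
    print("bandwidth saving vs float32: about %.1fx" % (32 / 7))  # ~4.6x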

    On-Device Training of Machine Learning Models on Microcontrollers with Federated Learning

    No full text
    Recent progress in machine learning frameworks has made it possible to perform inference with models on cheap, tiny microcontrollers. Training of machine learning models for these tiny devices, however, is typically done separately on powerful computers. This way, the training process has abundant CPU and memory resources to process large stored datasets. In this work, we explore a different approach: training the machine learning model directly on the microcontroller and extending the training process with federated learning. We implement this approach for a keyword spotting task. We conduct experiments with real devices to characterize the learning behavior and resource consumption for different hyperparameters and federated learning configurations. We observed that when training locally with less data, more frequent federated learning rounds reduced the training loss more quickly, but at the cost of higher bandwidth usage and longer training time. Our results indicate that, depending on the specific application, there is a need to determine the trade-off between the requirements and the resource usage of the system.