158 research outputs found

    Energy-efficient adaptive machine learning on IoT end-nodes with class-dependent confidence

    Energy-efficient machine learning models that can run directly on edge devices are of great interest in IoT applications, as they can reduce network pressure and response latency, and improve privacy. An effective way to obtain energy efficiency with small accuracy drops is to sequentially execute a set of increasingly complex models, early-stopping the procedure for 'easy' inputs that can be confidently classified by the smallest models. As a stopping criterion, current methods employ a single threshold on the output probabilities produced by each model. In this work, we show that such a criterion is sub-optimal for datasets that include classes of different complexity, and we demonstrate a more general approach based on per-class thresholds. With experiments on a low-power end-node, we show that our method can significantly reduce the energy consumption compared to the single-threshold approach.
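    The per-class stopping criterion can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation; the names (`run_cascade`, `should_stop`) are hypothetical:

    ```python
    def should_stop(probs, thresholds):
        """Per-class criterion: stop the cascade only if the predicted class's
        probability exceeds that class's own threshold."""
        pred = max(range(len(probs)), key=probs.__getitem__)
        return probs[pred] >= thresholds[pred], pred

    def run_cascade(models, x, thresholds):
        """Run increasingly complex models, early-stopping on 'easy' inputs."""
        for model in models[:-1]:
            probs = model(x)
            stop, pred = should_stop(probs, thresholds)
            if stop:
                return pred
        # 'hard' inputs fall through to the largest model
        probs = models[-1](x)
        return max(range(len(probs)), key=probs.__getitem__)
    ```

    Raising the threshold of a 'hard' class forwards more of its inputs to the larger models, while keeping the cheap early exit for easier classes.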

    Predicting Hard Disk Failures in Data Centers Using Temporal Convolutional Neural Networks

    In modern data centers, storage system failures are major contributors to downtimes and maintenance costs. Predicting these failures by collecting measurements from disks and analyzing them with machine learning techniques can effectively reduce their impact, enabling timely maintenance. While there is a vast literature on this subject, most approaches attempt to predict hard disk failures using either classic machine learning solutions, such as Random Forests (RFs), or deep Recurrent Neural Networks (RNNs). In this work, we address hard disk failure prediction using Temporal Convolutional Networks (TCNs), a novel type of deep neural network for time-series analysis. Using a real-world dataset, we show that TCNs outperform both RFs and RNNs. Specifically, we improve the Fault Detection Rate (FDR) by ≈ 7.5% (FDR = 89.1%) compared to the state-of-the-art, while simultaneously reducing the False Alarm Rate (FAR = 0.052%). Moreover, we explore the network architecture design space, showing that TCNs are consistently superior to RNNs for a given model size and complexity, and that even relatively small TCNs can reach satisfactory performance. All the code to reproduce the results presented in this paper is available at https://github.com/ABurrello/tcn-hard-disk-failure-prediction
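    The building block of a TCN is the dilated causal convolution, in which the output at time t depends only on inputs at t, t−d, t−2d, … for dilation d. A minimal pure-Python sketch, illustrative rather than taken from the linked repository:

    ```python
    def causal_dilated_conv1d(x, w, dilation):
        """Causal dilated convolution: y[t] = sum_i w[i] * x[t - i*dilation],
        skipping taps that would look past the start of the sequence."""
        y = []
        for t in range(len(x)):
            acc = 0.0
            for i, wi in enumerate(w):
                idx = t - i * dilation
                if idx >= 0:  # causality: never read future (or pre-start) samples
                    acc += wi * x[idx]
            y.append(acc)
        return y
    ```

    Stacking such layers with exponentially growing dilations gives a receptive field that covers long disk-measurement histories at low cost.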

    Channel-wise Mixed-precision Assignment for DNN Inference on Constrained Edge Nodes

    Quantization is widely employed in both cloud and edge systems to reduce the memory occupation, latency, and energy consumption of deep neural networks. In particular, mixed-precision quantization, i.e., the use of different bit-widths for different portions of the network, has been shown to provide excellent efficiency gains with limited accuracy drops, especially with optimized bit-width assignments determined by automated Neural Architecture Search (NAS) tools. State-of-the-art mixed-precision quantization works layer-wise, i.e., it uses different bit-widths for the weights and activations tensors of each network layer. In this work, we widen the search space, proposing a novel NAS that selects the bit-width of each weight tensor channel independently. This gives the tool the additional flexibility of assigning a higher precision only to the weights associated with the most informative features. Testing on the MLPerf Tiny benchmark suite, we obtain a rich collection of Pareto-optimal models in the accuracy vs model size and accuracy vs energy spaces. When deployed on the MPIC RISC-V edge processor, our networks reduce the memory and energy for inference by up to 63% and 27%, respectively, compared to a layer-wise approach, for the same accuracy.
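    Per-channel bit-width assignment can be illustrated with a toy symmetric uniform quantizer. The function names and the quantization scheme here are assumptions for illustration, not the paper's NAS:

    ```python
    def quantize_channel(w, bits):
        """Symmetric uniform quantization of one channel's weights to `bits` bits."""
        qmax = 2 ** (bits - 1) - 1          # e.g. 127 for 8 bits
        amax = max(abs(v) for v in w)
        scale = amax / qmax if amax > 0 else 1.0
        # round to the integer grid, then map back to real values
        return [round(v / scale) * scale for v in w]

    def quantize_per_channel(weight, bitwidths):
        """weight: list of channels; bitwidths: one (NAS-chosen) width per channel."""
        return [quantize_channel(ch, b) for ch, b in zip(weight, bitwidths)]
    ```

    A NAS as described above would search over the `bitwidths` vector, spending extra bits only on the channels whose weights feed the most informative features.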

    Improving PPG-based Heart-Rate Monitoring with Synthetically Generated Data

    Improving the quality of heart-rate monitoring is the basis for continuous assessment of people’s daily health. Recent state-of-the-art heart-rate monitoring algorithms exploit PPG and inertial data to efficiently estimate subjects’ beats-per-minute (BPM) directly on wearable devices. Despite the ease of recording these signals (e.g., through commercial smartwatches), which makes this approach appealing, new challenges are arising. The first problem is fitting these algorithms into low-power, memory-constrained MCUs. Further, the PPG signal usually has a low signal-to-noise ratio due to the presence of motion artifacts (MAs) arising from movements of the subjects’ arms. In this work, we propose using synthetically generated data to improve the accuracy of PPG-based heart-rate tracking with deep neural networks, without increasing the algorithm’s complexity. Using the TEMPONet network as a baseline, we show that the HR-tracking Mean Absolute Error (MAE) can be reduced from 5.28 to 4.86 BPM on the PPGDalia dataset. Notably, to do so, we only increase the training time, keeping the inference step unchanged. Consequently, the new and more accurate network can still fit the small memory of the GAP8 MCU, occupying 429 KB when quantized to 8 bits.
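    A toy generator for synthetic PPG-like training samples, purely illustrative (the paper's actual synthesis procedure is not specified here): the signal is a sinusoid at the heart-rate frequency plus a random low-frequency 'motion artifact' component.

    ```python
    import math
    import random

    def synth_ppg(bpm, fs=32, seconds=8, ma_amp=0.3, seed=0):
        """Hypothetical synthetic PPG window: heart-rate sinusoid at bpm/60 Hz
        plus a random-frequency, random-phase motion-artifact sinusoid."""
        rng = random.Random(seed)
        hr_f = bpm / 60.0                       # heart-rate frequency in Hz
        ma_f = rng.uniform(0.2, 2.0)            # artifact frequency in Hz
        phase = rng.random() * 2 * math.pi
        return [math.sin(2 * math.pi * hr_f * t / fs)
                + ma_amp * math.sin(2 * math.pi * ma_f * t / fs + phase)
                for t in range(fs * seconds)]
    ```

    Windows like this, labeled with the known `bpm`, could be mixed into the training set so the network sees far more artifact patterns than real recordings provide, at zero inference-time cost.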

    Bioformers: Embedding Transformers for Ultra-Low Power sEMG-based Gesture Recognition

    Human-machine interaction is gaining traction in rehabilitation tasks, such as controlling prosthetic hands or robotic arms. Gesture recognition exploiting surface electromyographic (sEMG) signals is one of the most promising approaches, given that sEMG signal acquisition is non-invasive and directly related to muscle contraction. However, the analysis of these signals still presents many challenges, since similar gestures result in similar muscle contractions; the resulting signal shapes are thus almost identical, leading to low classification accuracy. To tackle this challenge, complex neural networks are employed, which require large memory footprints, consume relatively high energy, and limit the maximum battery life of the devices used for classification. This work addresses this problem by introducing Bioformers, a new family of ultra-small attention-based architectures that approaches state-of-the-art performance while reducing the number of parameters and operations by 4.9×. Additionally, by introducing a new inter-subject pre-training, we improve the accuracy of our best Bioformer by 3.39%, matching state-of-the-art accuracy without any additional inference cost. Deploying our best-performing Bioformer on a Parallel Ultra-Low-Power (PULP) microcontroller unit (MCU), the GreenWaves GAP8, we achieve an inference latency and energy of 2.72 ms and 0.14 mJ, respectively, 8.0× lower than the previous state-of-the-art neural network, while occupying just 94.2 kB of memory.
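    At the core of any transformer-style architecture such as the Bioformers is scaled dot-product attention. A minimal single-head sketch in plain Python, illustrative rather than the paper's implementation:

    ```python
    import math

    def attention(Q, K, V):
        """Scaled dot-product attention over plain lists (seq_len x dim):
        softmax(QK^T / sqrt(d)) V, computed one query at a time."""
        d = len(K[0])
        out = []
        for q in Q:
            # similarity of this query with every key, scaled by sqrt(d)
            scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
            m = max(scores)                       # subtract max for numerical stability
            exps = [math.exp(s - m) for s in scores]
            z = sum(exps)
            weights = [e / z for e in exps]
            # output = attention-weighted average of the value vectors
            out.append([sum(w * v[i] for w, v in zip(weights, V))
                        for i in range(len(V[0]))])
        return out
    ```

    An ultra-small model in this family would keep the sequence length, embedding dimension, and head count tiny, which is what shrinks the parameter and operation counts.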

    Ultra-compact binary neural networks for human activity recognition on RISC-V processors

    Human Activity Recognition (HAR) is a relevant inference task in many mobile applications. State-of-the-art HAR at the edge is typically achieved with lightweight machine learning models such as decision trees and Random Forests (RFs), whereas deep learning is less common due to its high computational complexity. In this work, we propose a novel implementation of HAR based on deep neural networks, and precisely on Binary Neural Networks (BNNs), targeting low-power general-purpose processors with a RISC-V instruction set. BNNs yield very small memory footprints and low inference complexity, thanks to the replacement of arithmetic operations with bit-wise ones. However, existing BNN implementations on general-purpose processors impose constraints tailored to complex computer vision tasks, which result in over-parametrized models for simpler problems like HAR. Therefore, we also introduce a new BNN inference library, which explicitly targets ultra-compact models. With experiments on a single-core RISC-V processor, we show that BNNs trained on two HAR datasets obtain higher classification accuracy compared to a state-of-the-art baseline based on RFs. Furthermore, our BNN reaches the same accuracy as an RF with either less memory (up to 91% savings) or higher energy efficiency (up to 70%), depending on the complexity of the features extracted by the RF.
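    The bit-wise replacement of arithmetic that makes BNNs so compact can be sketched as follows: with activations and weights constrained to {−1, +1} and packed one element per bit, a dot product reduces to an XNOR and a population count. An illustrative sketch, not the proposed library:

    ```python
    def pack_bits(values):
        """Pack a {-1, +1} vector into an int, bit i = 1 iff element i is +1."""
        return sum(1 << i for i, v in enumerate(values) if v > 0)

    def binary_dot(a_bits, w_bits, n):
        """dot(a, w) over n binarized elements = 2 * popcount(XNOR(a, w)) - n,
        since matching bits contribute +1 and mismatching bits contribute -1."""
        xnor = ~(a_bits ^ w_bits) & ((1 << n) - 1)  # mask to n bits
        return 2 * bin(xnor).count("1") - n
    ```

    On a RISC-V core, one XOR/NOT plus a popcount replaces n multiply-accumulates, which is where both the memory and energy savings come from.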

    Ambient Intelligence: A Computational Platform Perspective

    Computational platforms are a key enabling technology for materializing the Ambient Intelligence vision. Ambient Intelligence devices will require widely ranging computational power under equally wide system-level constraints on cost, reliability, and power consumption. We coarsely group computational architectures into three broad classes, namely: fixed-base network (the workhorses), wireless base network (the hummingbirds), and wireless sensor network (the butterflies). Speed and power requirements for devices in these three classes span six orders of magnitude. In this paper, we analyze commonalities and differences between these three classes of computational architectures and, starting from the analysis of representative state-of-the-art devices, we survey design trends and research directions.