
    Leveraging Automated Mixed-Low-Precision Quantization for Tiny Edge Microcontrollers

    Severe on-chip memory limitations currently prevent the deployment of the most accurate Deep Neural Network (DNN) models on tiny MicroController Units (MCUs), even when an effective 8-bit quantization scheme is used. To tackle this issue, we present an automated mixed-precision quantization flow based on the HAQ framework but tailored to the memory and computational characteristics of MCU devices. Specifically, a Reinforcement Learning (RL) agent searches for the best uniform quantization levels, among 2, 4, and 8 bits, of individual weight and activation tensors, under tight constraints on RAM and FLASH embedded memory sizes. We conduct an experimental analysis on MobileNetV1, MobileNetV2 and MNasNet models for ImageNet classification. In the quantization policy search, the RL agent selects quantization policies that maximize memory utilization. Given an MCU-class memory bound of 2 MB for weight-only quantization, the compressed models produced by the mixed-precision engine are as accurate as state-of-the-art solutions quantized with a non-uniform function, which is not suited to CPUs featuring integer-only arithmetic. This demonstrates the viability of uniform quantization, required for MCU deployments, for deep weight compression. When the activation memory budget is also limited to 512 kB, the best MobileNetV1 model scores up to 68.4% on ImageNet thanks to the quantization policy found, making it 4% more accurate than the other 8-bit networks fitting the same memory constraints.
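
The idea of assigning a bit-width per tensor under a memory bound can be illustrated with a minimal sketch. It is not the paper's HAQ-based flow and contains no RL agent: it only shows uniform quantization to 2, 4, or 8 bits and a check of whether a given per-layer policy fits a weight budget. The symmetric signed scheme, the function names, the layer sizes, and the reading of 2 MB as 2 × 1024 × 1024 bytes are all assumptions made for illustration.

```python
def quantize_uniform(values, bits):
    """Symmetric uniform quantization of floats to signed `bits`-bit integers.

    Assumption: a symmetric scheme with one scale per tensor; the paper may
    use a different uniform quantizer.
    """
    qmax = 2 ** (bits - 1) - 1            # 127 for 8 bits, 7 for 4 bits, 1 for 2 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map integer codes back to approximate real values."""
    return [x * scale for x in q]


def weight_bytes(layer_sizes, policy):
    """Weight storage in bytes when layer i holds layer_sizes[i] weights
    stored at policy[i] bits each."""
    total_bits = sum(n * b for n, b in zip(layer_sizes, policy))
    return (total_bits + 7) // 8          # round up to whole bytes


def fits(layer_sizes, policy, budget_bytes):
    """True if the mixed-precision policy respects the weight memory budget."""
    return weight_bytes(layer_sizes, policy) <= budget_bytes


# Hypothetical three-layer model and a 2 MB weight budget (assumed binary MB).
LAYER_SIZES = [1_000_000, 2_000_000, 1_200_000]
BUDGET = 2 * 1024 * 1024

all_8bit = [8, 8, 8]                      # uniform 8-bit policy
mixed = [4, 4, 2]                         # one possible mixed-precision policy
```

With these made-up sizes, `fits(LAYER_SIZES, all_8bit, BUDGET)` is false while `fits(LAYER_SIZES, mixed, BUDGET)` is true, which is the situation a policy search has to navigate while keeping accuracy as high as possible.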

    Self-Learning Hot Data Prediction: Where Echo State Network Meets NAND Flash Memories

    © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
    A good understanding of the access behavior of hot data is important for NAND flash memory because of its crucial impact on the efficiency of garbage collection (GC) and wear leveling (WL), which respectively dominate the performance and life span of an SSD. Generally, both GC and WL rely heavily on the recognition accuracy of hot data identification (HDI). In this paper, however, we propose for the first time a novel concept of hot data prediction (HDP), which makes conventional HDI unnecessary. First, we develop a hybrid optimized echo state network (HOESN), in which sufficiently unbiased and continuously shrunk output weights are learnt by a sparse regression based on L2 and L1/2 regularization. Second, quantum-behaved particle swarm optimization (QPSO) is employed to compute the reservoir parameters (i.e., global scaling factor, reservoir size, scaling coefficient and sparsity degree) to further improve prediction accuracy and reliability. Third, in a test on a chaotic benchmark (Rossler), the HOESN performs better than six recent state-of-the-art methods. Finally, simulation results on six typical metrics tested on five real disk workloads, together with on-chip experimental outcomes verified on an actual SSD prototype, indicate that our HOESN-based HDP can reliably improve the access performance and endurance of NAND flash memories. Peer reviewed.
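
For readers unfamiliar with echo state networks, the role of the reservoir parameters named above (reservoir size, sparsity degree, a global scaling factor) can be sketched with a plain reservoir state update. This is a generic ESN skeleton, not the HOESN: it omits the output-weight regression with L2 and L1/2 regularization and the QPSO parameter search, and every name and value in it is hypothetical. As a simplifying assumption, the reservoir matrix is rescaled so that its maximum absolute row sum equals `scale`; since the spectral radius never exceeds that norm, a value below 1 keeps the update contractive.

```python
import math
import random


def make_reservoir(size, sparsity, scale, seed=0):
    """Build a sparse random reservoir matrix and an input weight vector.

    size     -- number of reservoir units
    sparsity -- fraction of recurrent weights forced to zero
    scale    -- target maximum absolute row sum of the recurrent matrix
    """
    rng = random.Random(seed)
    w = [[rng.uniform(-1.0, 1.0) if rng.random() >= sparsity else 0.0
          for _ in range(size)] for _ in range(size)]
    norm = max(sum(abs(x) for x in row) for row in w)
    if norm > 0:
        w = [[x * scale / norm for x in row] for row in w]
    w_in = [rng.uniform(-1.0, 1.0) for _ in range(size)]
    return w, w_in


def step(state, u, w, w_in):
    """One reservoir update: x(t+1) = tanh(W_in * u(t) + W x(t))."""
    n = len(state)
    return [math.tanh(w_in[i] * u + sum(w[i][j] * state[j] for j in range(n)))
            for i in range(n)]


def run(inputs, w, w_in):
    """Drive the reservoir with a scalar input sequence and collect the states."""
    state = [0.0] * len(w_in)
    states = []
    for u in inputs:
        state = step(state, u, w, w_in)
        states.append(state)
    return states
```

A trained readout would then map each collected state to a prediction; how those output weights are fitted is exactly where the abstract's sparse regression comes in, and it is left out here.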

    Wearable Fall Detector Using Recurrent Neural Networks

    Falls have become a relevant public health issue due to their high prevalence and negative effects in elderly people. Wearable fall detector devices allow the implementation of continuous and ubiquitous monitoring systems. The ability to analyze temporal signals effectively with low energy consumption is one of the most relevant characteristics of these devices. Recurrent neural networks (RNNs) have demonstrated high accuracy in some problems that require analyzing sequential inputs. However, achieving appropriate response times on low-power microcontrollers remains a difficult task due to their limited hardware resources. This work presents a feasibility study on using RNN-based deep learning models to detect both falls and fall risks in real time using accelerometer signals. The effectiveness of four different architectures was analyzed using the SisFall dataset at different frequencies. The resulting models were integrated into two different embedded systems to analyze the execution times and changes in model effectiveness. Finally, a study of power consumption was carried out. A sensitivity of 88.2% and a specificity of 96.4% were obtained. The simplest models reached inference times below 34 ms, which implies the capability to detect fall events in real time with high energy efficiency. This suggests that RNN models provide an effective method that can be implemented on low-power microcontrollers for the creation of autonomous, real-time wearable fall detection systems.
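
The two reported metrics follow the usual definitions: sensitivity is the fraction of actual falls that are detected, and specificity is the fraction of non-fall events correctly left unflagged. A minimal sketch of that arithmetic is given below; the function names and the confusion-matrix counts are hypothetical and are not taken from the study.

```python
def sensitivity(true_pos, false_neg):
    """True positive rate: detected falls / all actual falls."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """True negative rate: correctly ignored non-falls / all actual non-falls."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts for illustration only.
TP, FN = 45, 5      # falls detected vs. falls missed
TN, FP = 90, 10     # non-falls ignored vs. false alarms

sens = sensitivity(TP, FN)
spec = specificity(TN, FP)
```

For a wearable detector, a missed fall (lower sensitivity) and a false alarm (lower specificity) have very different costs, which is why both figures are reported rather than a single accuracy value.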