899 research outputs found

    An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics

    Full text link
    Near-sensor data analytics is a promising direction for IoT endpoints, as it minimizes energy spent on communication and reduces network load - but it also poses security concerns, as valuable data is stored or sent over the network at various stages of the analytics pipeline. Using encryption to protect sensitive data at the boundary of the on-chip analytics engine is a way to address data security issues. To cope with the combined workload of analytics and encryption in a tight power envelope, we propose Fulmine, a System-on-Chip based on a tightly-coupled multi-core cluster augmented with specialized blocks for compute-intensive data processing and encryption functions, supporting software programmability for regular computing tasks. The Fulmine SoC, fabricated in 65nm technology, consumes less than 20mW on average at 0.8V achieving an efficiency of up to 70pJ/B in encryption, 50pJ/px in convolution, or up to 25MIPS/mW in software. As a strong argument for real-life flexible application of our platform, we show experimental results for three secure analytics use cases: secure autonomous aerial surveillance with a state-of-the-art deep CNN consuming 3.16pJ per equivalent RISC op; local CNN-based face detection with secured remote recognition in 5.74pJ/op; and seizure detection with encrypted data collection from EEG within 12.7pJ/op.Comment: 15 pages, 12 figures, accepted for publication to the IEEE Transactions on Circuits and Systems - I: Regular Paper

    A 64mW DNN-based Visual Navigation Engine for Autonomous Nano-Drones

    Full text link
    Fully-autonomous miniaturized robots (e.g., drones), with artificial intelligence (AI) based visual navigation capabilities are extremely challenging drivers of Internet-of-Things edge intelligence capabilities. Visual navigation based on AI approaches, such as deep neural networks (DNNs) are becoming pervasive for standard-size drones, but are considered out of reach for nanodrones with size of a few cm2{}^\mathrm{2}. In this work, we present the first (to the best of our knowledge) demonstration of a navigation engine for autonomous nano-drones capable of closed-loop end-to-end DNN-based visual navigation. To achieve this goal we developed a complete methodology for parallel execution of complex DNNs directly on-bard of resource-constrained milliwatt-scale nodes. Our system is based on GAP8, a novel parallel ultra-low-power computing platform, and a 27 g commercial, open-source CrazyFlie 2.0 nano-quadrotor. As part of our general methodology we discuss the software mapping techniques that enable the state-of-the-art deep convolutional neural network presented in [1] to be fully executed on-board within a strict 6 fps real-time constraint with no compromise in terms of flight results, while all processing is done with only 64 mW on average. Our navigation engine is flexible and can be used to span a wide performance range: at its peak performance corner it achieves 18 fps while still consuming on average just 3.5% of the power envelope of the deployed nano-aircraft.Comment: 15 pages, 13 figures, 5 tables, 2 listings, accepted for publication in the IEEE Internet of Things Journal (IEEE IOTJ

    Low Power Implementation of Non Power-of-Two FFTs on Coarse-Grain Reconfigurable Architectures

    Get PDF
    The DRM standard for digital radio broadcast in the AM band requires integrated devices for radio receivers at very low power. A System on Chip (SoC) call DiMITRI was developed based on a dual ARM9 RISC core architecture. Analyses showed that most computation power is used in the Coded Orthogonal Frequency Division Multiplexing (COFDM) demodulation to compute Fast Fourier Transforms (FFT) and inverse transforms (IFFT) on complex samples. These FFTs have to be computed on non power-of-two numbers of samples, which is very uncommon in the signal processing world. The results obtained with this chip, lead to the objective to decrease the power dissipated by the COFDM demodulation part using a coarse-grain reconfigurable structure as a coprocessor. This paper introduces two different coarse-grain architectures: PACT XPP technology and the Montium, developed by the University of Twente, and presents the implementation of a\ud Fast Fourier Transform on 1920 complex samples. The implementation result on the Montium shows a saving of a factor 35 in terms of processing time, and 14 in terms of power consumption compared to the RISC implementation, and a\ud smaller area. Then, as a conclusion, the paper presents the next steps of the development and some development issues

    Mr.Wolf: An Energy-Precision Scalable Parallel Ultra Low Power SoC for IoT Edge Processing

    Get PDF
    This paper presents Mr. Wolf, a parallel ultra-low power (PULP) system on chip (SoC) featuring a hierarchical architecture with a small (12 kgates) microcontroller (MCU) class RISC-V core augmented with an autonomous IO subsystem for efficient data transfer from a wide set of peripherals. The small core can offload compute-intensive kernels to an eight-core floating-point capable of processing engine available on demand. The proposed SoC, implemented in a 40-nm LP CMOS technology, features a 108-mu W fully retentive memory (512 kB). The IO subsystem is capable of transferring up to 1.6 Gbit/s from external devices to the memory in less than 2.5 mW. The eight-core compute cluster achieves a peak performance of 850 million of 32-bit integer multiply and accumulate per second (MMAC/s) and 500 million of 32-bit floating-point multiply and accumulate per second (MFMAC/s) -1 GFlop/s-with an energy efficiency up to 15 MMAC/s/mW and 9 MFMAC/s/mW. These building blocks are supported by aggressive on-chip power conversion and management, enabling energy-proportional heterogeneous computing for always-on IoT end nodes improving performance by several orders of magnitude with respect to traditional single-core MCUs within a power envelope of 153 mW. We demonstrated the capabilities of the proposed SoC on a wide set of near-sensor processing kernels showing that Mr. Wolf can deliver performance up to 16.4 GOp/s with energy efficiency up to 274 MOp/s/mW on real-life applications, paving the way for always-on data analytics on high-bandwidth sensors at the edge of the Internet of Things

    Instruction Set Extension of a Low-End Reconfigurable Microcontroller in Bit-Sorting Implementation

    Get PDF
    The microcontroller-based system is currently having a tremendous boost with the revelation of platforms such as the Internet of Things. Low-end families of microcontroller architecture are still in demand albeit less technologically advanced due to its better I/O better application and control. However, there is clearly a lack of computational capability of the low-end architecture that will affect the pre-processing stage of the received data. The purpose of this research is to combine the best feature of an 8-bit microcontroller architecture together with the computationally complex operations without incurring extra resources. The modules’ integration is implemented using instruction set architecture (ISA) extension technique and is developed on the Field Programmable Gate Array (FPGA). Extensive simulations were performed with the and a comprehensive methodology is proposed. It was found that the ISA extension from 12-bit to 16-bit has produced a faster execution time with fewer resource utilization when implementing the bit-sorting algorithm. The overall development process used in this research is flexible enough for further investigation either by extending its module to more complex algorithms or evaluating other designs of its components

    Fourier Based Three Phase Power Metering System

    Get PDF
    The increased application of higher frequency nonlinear loads, such as electronic fluorescent ballasts and higher speed adjustable drives, has resulted in the need to monitor the higher power system harmonics which were largely ignored in earlier power monitors. Addressing this need requires a meter with substantially higher sample rate and greater computational power. This paper describes a zero-blind, three-phase, three-element power meter that samples three voltages and four currents at 256 points per cycle. The instrument relies on the FFT to compute real and reactive power at each harmonic and reports total real, reactive, and distortion power on each phase. The innovative design is based on a multiprocessor chip which incorporates a DSP for acquisition and point metering, a CISC processor for floating point summary data, and a RISC processor for interface to support communications with the host PC. The paper concludes with a system evaluation on highly distorted industrial power system waveforms

    PULP-NN: Accelerating Quantized Neural Networks on Parallel Ultra-Low-Power RISC-V Processors

    Get PDF
    We present PULP-NN, an optimized computing library for a parallel ultra-low-power tightly coupled cluster of RISC-V processors. The key innovation in PULP-NN is a set of kernels for quantized neural network inference, targeting byte and sub-byte data types, down to INT-1, tuned for the recent trend toward aggressive quantization in deep neural network inference. The proposed library exploits both the digital signal processing extensions available in the PULP RISC-V processors and the cluster\u2019s parallelism, achieving up to 15.5 MACs/cycle on INT-8 and improving performance by up to 63 7 with respect to a sequential implementation on a single RISC-V core implementing the baseline RV32IMC ISA. Using PULP-NN, a CIFAR-10 network on an octa-core cluster runs in 30 7 and 19.6 7 less clock cycles than the current state-of-the-art ARM CMSIS-NN library, running on STM32L4 and STM32H7 MCUs, respectively. The proposed library, when running on a GAP-8 processor, outperforms by 36.8 7 and by 7.45 7 the execution on energy efficient MCUs such as STM32L4 and high-end MCUs such as STM32H7 respectively, when operating at the maximum frequency. The energy efficiency on GAP-8 is 14.1 7 higher than STM32L4 and 39.5 7 higher than STM32H7, at the maximum efficiency operating point. This article is part of the theme issue \u2018Harmonizing energy-autonomous computing and intelligence\u2019

    PULP-NN: A computing library for quantized neural network inference at the edge on RISC-V based parallel ultra low power clusters

    Get PDF
    We present PULP-NN, a multicore computing library for a parallel ultra-low-power cluster of RISC-V based processors. The library consists of a set of kernels for Quantized Neural Network (QNN) inference on edge devices, targeting byte and sub-byte data types, down to INT-1. Our software solution exploits the digital signal processing (DSP) extensions available in the PULP RISC-V processors and the cluster's parallelism, improving performance by up to 63 7 with respect to a baseline implementation on a single RISC-V core implementing the RV32IMC ISA. Using the PULP-NN routines, the inference of a CIFAR-10 QNN model runs in 30 7 and 19.6 7 less clock cycles than the current state-of-the-art ARM CMSIS-NN library, running on an STM32L4 and an STM32H7 MCUs, respectively. By running the library kernels on the GAP-8 processor at the maximum efficiency operating point, the energy efficiency on GAP-8 is 14.1 7 higher than STM32L4 and 39.5 7 than STM32H7

    An Automated Dna Strands Detection System Featuring 32-Bit Arm7tdmi Microcontroller And Vga-Cmos Digital Image Sensor.

    Get PDF
    Genetic DNA recognition is a routine experiment for detecting the origin of the species. Electrophoresis is one of the processes for such detection which has been used extensively. Pengecaman genetik DNA ialah eksperimen rutin untuk mengesan asal usul sesuatu spesis. Proses electrophoresis ialah salah satu proses pengecaman yang digunakan secara meluas
    corecore