4,341 research outputs found

    YodaNN: An Architecture for Ultra-Low Power Binary-Weight CNN Acceleration

    Get PDF
    Convolutional neural networks (CNNs) have revolutionized the world of computer vision over the last few years, pushing image classification beyond human accuracy. The computational effort of today's CNNs requires power-hungry parallel processors or GP-GPUs. Recent developments in CNN accelerators for system-on-chip integration have reduced energy consumption significantly. Unfortunately, even these highly optimized devices are above the power envelope imposed by mobile and deeply embedded applications and face hard limitations caused by CNN weight I/O and storage. This prevents the adoption of CNNs in future ultra-low power Internet of Things end-nodes for near-sensor analytics. Recent algorithmic and theoretical advancements enable competitive classification accuracy even when limiting CNNs to binary (+1/-1) weights during training. These new findings bring major optimization opportunities in the arithmetic core by removing the need for expensive multiplications, as well as reducing I/O bandwidth and storage. In this work, we present an accelerator optimized for binary-weight CNNs that achieves 1510 GOp/s at 1.2 V on a core area of only 1.33 MGE (Million Gate Equivalent) or 0.19 mm2^2 and with a power dissipation of 895 {\mu}W in UMC 65 nm technology at 0.6 V. Our accelerator significantly outperforms the state-of-the-art in terms of energy and area efficiency achieving 61.2 TOp/s/[email protected] V and 1135 GOp/s/[email protected] V, respectively

    Counting digital filters

    Get PDF
    Several embodiments of a counting digital filter of the non-recursive type are disclosed. In each embodiment two registers, at least one of which is a shift register, are included. The shift register received j sub x-bit data input words bit by bit. The kth data word is represented by the integer

    Digital servo control of random sound test excitation

    Get PDF
    A digital servocontrol system for random noise excitation of a test object in a reverberant acoustic chamber employs a plurality of sensors spaced in the sound field to produce signals in separate channels which are decorrelated and averaged. The average signal is divided into a plurality of adjacent frequency bands cyclically sampled by a time division multiplex system, converted into digital form, and compared to a predetermined spectrum value stored in digital form. The results of the comparisons are used to control a time-shared up-down counter to develop gain control signals for the respective frequency bands in the spectrum of random sound energy picked up by the microphones

    Spacecraft Microminiature PAM Decommutator System

    Get PDF
    Operation and testing of spacecraft microminiature PAM decommutator syste

    Implementation and Evaluation of Power Consumption of an Iris Pre-processing Algorithm on Modern FPGA

    Get PDF
    In this article, the efficiency and applicability of several power reduction techniques applied on a modern 65nm FPGA is described. For image erosion and dilation algorithms, two major solutions were tested and compared with respect to power and energy consumption. Firstly the algorithm was run on a general purpose processor (gpp) NIOS and then hardware architecture of an Intellectual Property (IP) was designed. Furthermore IPs design was improved by applying a number of power optimization techniques. They involved RTL level clock gating, power driven synthesis with fitting and appropriate coding style. Results show that hardware implementation is much more energy efficient than a general purpose processor and power optimization schemes can reduce the overall power consumption on an FPGA

    An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics

    Full text link
    Near-sensor data analytics is a promising direction for IoT endpoints, as it minimizes energy spent on communication and reduces network load - but it also poses security concerns, as valuable data is stored or sent over the network at various stages of the analytics pipeline. Using encryption to protect sensitive data at the boundary of the on-chip analytics engine is a way to address data security issues. To cope with the combined workload of analytics and encryption in a tight power envelope, we propose Fulmine, a System-on-Chip based on a tightly-coupled multi-core cluster augmented with specialized blocks for compute-intensive data processing and encryption functions, supporting software programmability for regular computing tasks. The Fulmine SoC, fabricated in 65nm technology, consumes less than 20mW on average at 0.8V achieving an efficiency of up to 70pJ/B in encryption, 50pJ/px in convolution, or up to 25MIPS/mW in software. As a strong argument for real-life flexible application of our platform, we show experimental results for three secure analytics use cases: secure autonomous aerial surveillance with a state-of-the-art deep CNN consuming 3.16pJ per equivalent RISC op; local CNN-based face detection with secured remote recognition in 5.74pJ/op; and seizure detection with encrypted data collection from EEG within 12.7pJ/op.Comment: 15 pages, 12 figures, accepted for publication to the IEEE Transactions on Circuits and Systems - I: Regular Paper
    • 

    corecore