
    An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration

    We empirically evaluate an undervolting technique, i.e., underscaling the circuit supply voltage below the nominal level, to improve the power-efficiency of Convolutional Neural Network (CNN) accelerators mapped to Field Programmable Gate Arrays (FPGAs). Undervolting below a safe voltage level can lead to timing faults due to excessive circuit latency increase. We evaluate the reliability-power trade-off for such accelerators. Specifically, we experimentally study the reduced-voltage operation of multiple components of real FPGAs, characterize the corresponding reliability behavior of CNN accelerators, propose techniques to minimize the drawbacks of reduced-voltage operation, and combine undervolting with architectural CNN optimization techniques, i.e., quantization and pruning. We investigate the effect of environmental temperature on the reliability-power trade-off of such accelerators. We perform experiments on three identical samples of modern Xilinx ZCU102 FPGA platforms with five state-of-the-art image classification CNN benchmarks. This setup allows us to study the effects of our undervolting technique under both software and hardware variability. We achieve more than 3X power-efficiency (GOPs/W) gain via undervolting. 2.6X of this gain is the result of eliminating the voltage guardband region, i.e., the safe voltage region below the nominal level that is set by the FPGA vendor to ensure correct functionality under worst-case environmental and circuit conditions. 43% of the power-efficiency gain is due to further undervolting below the guardband, which comes at the cost of accuracy loss in the CNN accelerator. We evaluate an effective frequency underscaling technique that prevents this accuracy loss, and find that it reduces the power-efficiency gain from 43% to 25%. Comment: To appear at the DSN 2020 conference.
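    As a back-of-the-envelope illustration of where such gains come from (a sketch with hypothetical voltages, not the paper's measurement setup), the snippet below applies the first-order CMOS dynamic-power relation P = kV²f to show how guardband elimination and below-guardband undervolting compound into an overall GOPs/W gain.

```python
# Minimal sketch: first-order estimate of power-efficiency (GOPs/W) gain
# from undervolting, assuming dynamic power scales as P = k * V^2 * f.
# All voltage/throughput numbers below are hypothetical placeholders,
# not measurements from the paper.

def gops_per_watt(gops: float, v: float, f: float, k: float = 1.0) -> float:
    """Power efficiency under the simple dynamic-power model P = k * V^2 * f."""
    power = k * v**2 * f
    return gops / power

V_NOM, V_GUARDBAND, V_LOW = 0.85, 0.70, 0.60   # hypothetical rail voltages (V)
F = 200e6                                      # fixed clock (Hz), no freq scaling
GOPS = 100.0                                   # fixed throughput at constant clock

eff_nom = gops_per_watt(GOPS, V_NOM, F)
eff_gb = gops_per_watt(GOPS, V_GUARDBAND, F)   # guardband eliminated, still safe
eff_uv = gops_per_watt(GOPS, V_LOW, F)         # below guardband: faults possible

print(f"gain from guardband elimination: {eff_gb / eff_nom:.2f}x")
print(f"extra gain below guardband:      {eff_uv / eff_gb:.2f}x")
print(f"total gain:                      {eff_uv / eff_nom:.2f}x")
```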

    Circuits and Systems Advances in Near Threshold Computing

    Modern society is witnessing a sea change in ubiquitous computing, in which people have embraced computing systems as an indispensable part of day-to-day existence. Computation, storage, and communication abilities of smartphones, for example, have undergone monumental changes over the past decade. At the same time, the global emphasis on creating and sustaining green environments is leading to a rapid and ongoing proliferation of edge computing systems and applications. As a broad spectrum of healthcare, home, and transport applications shift to the edge of the network, near-threshold computing (NTC) is emerging as one of the most promising low-power computing platforms. An NTC device sets its supply voltage close to its threshold voltage, dramatically reducing its energy consumption. Despite showing substantial promise in terms of energy efficiency, NTC is yet to see widespread commercial adoption. This is because circuits and systems operating at NTC suffer from several problems, including increased sensitivity to process variation, reliability problems, performance degradation, and security vulnerabilities. To realize its potential, we need designs, techniques, and solutions to overcome the challenges associated with NTC circuits and systems. Readers of this book will be able to familiarize themselves with recent advances in electronic systems, with a focus on near-threshold computing.
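    A useful way to see both the promise and the limits of NTC is the energy-per-operation curve: dynamic energy falls roughly as V², while leakage energy per operation grows as the circuit slows down near the threshold voltage. The sketch below uses illustrative constants (not real device data) to locate the resulting minimum-energy point just above threshold.

```python
# Minimal sketch of the energy-per-operation trade-off that motivates NTC.
# Dynamic energy per op ~ C * V^2; leakage energy per op ~ I_leak * V * delay,
# where delay grows sharply as V approaches the threshold voltage Vth.
# All constants are illustrative, not from a real device model.

VTH = 0.30          # hypothetical threshold voltage (V)
C = 1.0             # normalized switched capacitance
I_LEAK = 0.05       # normalized leakage current

def delay(v: float) -> float:
    # Alpha-power-law-style delay model: slows down steeply near Vth.
    return v / (v - VTH) ** 1.4

def energy_per_op(v: float) -> float:
    e_dyn = C * v**2
    e_leak = I_LEAK * v * delay(v)
    return e_dyn + e_leak

voltages = [round(0.35 + 0.01 * i, 2) for i in range(60)]  # 0.35 V .. 0.94 V
v_opt = min(voltages, key=energy_per_op)
print(f"minimum-energy point near {v_opt} V "
      f"({energy_per_op(v_opt):.3f} normalized J/op), "
      f"close to but above Vth = {VTH} V")
```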

    Runtime adaptive IoMT node on multi-core processor platform

    The Internet of Medical Things (IoMT) paradigm is becoming mainstream in multiple clinical trials and healthcare procedures. Thanks to innovative technologies, latest-generation communication networks, and state-of-the-art portable devices, IoMT opens up new scenarios for data collection and continuous patient monitoring. Two very important aspects should be considered to make the most of this paradigm. First, moving the processing task from the cloud to the edge leads to several advantages, such as responsiveness, portability, scalability, and reliability of the sensor node. Second, in order to increase the accuracy of the system, state-of-the-art cognitive algorithms based on artificial intelligence and deep learning must be integrated. Sensor nodes often need to be battery powered and to remain active for a long time without an external power source. Therefore, one of the challenges to be addressed during the design and development of IoMT devices concerns energy optimization. Our work proposes an implementation of cognitive data analysis based on deep learning techniques on a resource-constrained computing platform. To handle power efficiency, we introduce a component called Adaptive runtime Manager (ADAM). This component takes care of dynamically reconfiguring the hardware and the software of the device during execution, in order to better adapt it to the workload and the required operating mode. To test the system under high computational load on a multi-core platform, cognitive analysis of Electrocardiogram (ECG) traces has been adopted on the Orlando prototype board by STMicroelectronics, considering both single-channel and six-channel simultaneous cases. Experimental results show that by managing the sensor node configuration at runtime, energy savings of at least 15% can be achieved.
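    The abstract does not describe ADAM at code level, but runtime managers of this kind typically follow a monitor-decide-reconfigure loop. The sketch below is a hypothetical minimal version, with made-up throughput and power models: it picks the lowest-power (cores, frequency) configuration whose estimated throughput still covers the current workload.

```python
# Hypothetical sketch of an ADAM-like runtime manager decision step (not the
# paper's actual implementation): choose the cheapest (cores, frequency)
# configuration that still sustains the workload's required throughput.

from dataclasses import dataclass

@dataclass
class Config:
    cores: int
    freq_mhz: int

    def throughput(self) -> float:
        # Crude model: throughput grows with cores and frequency, with 85%
        # parallel efficiency per extra core (an assumption, not measured).
        return self.freq_mhz * (1 + 0.85 * (self.cores - 1))

    def power(self) -> float:
        # Crude model: power scales with core count and quadratically with
        # frequency (since frequency scaling usually implies voltage scaling).
        return self.cores * (self.freq_mhz / 100.0) ** 2

CONFIGS = [Config(c, f) for c in (1, 2, 4) for f in (100, 200, 400)]

def reconfigure(required_throughput: float) -> Config:
    """Pick the lowest-power configuration meeting the requirement."""
    feasible = [c for c in CONFIGS if c.throughput() >= required_throughput]
    if not feasible:
        return max(CONFIGS, key=Config.throughput)  # best effort
    return min(feasible, key=Config.power)

# Example: single-channel ECG needs little compute, six-channel needs more.
print(reconfigure(150.0))   # light load -> few cores, low frequency
print(reconfigure(1000.0))  # heavy load -> more cores and higher frequency
```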

    Optimized Biosignals Processing Algorithms for New Designs of Human Machine Interfaces on Parallel Ultra-Low Power Architectures

    The aim of this dissertation is to explore Human Machine Interfaces (HMIs) in a variety of biomedical scenarios. The research addresses typical challenges in wearable and implantable devices for diagnostic, monitoring, and prosthetic purposes, suggesting a methodology for tailoring such applications to cutting-edge embedded architectures. The main challenge is the enhancement of high-level applications, also introducing Machine Learning (ML) algorithms, using parallel programming and specialized hardware to improve performance. The majority of these algorithms are computationally intensive, posing significant challenges for deployment on embedded devices, which have several limitations in terms of memory size, maximum operating frequency, and battery life. The proposed solutions take advantage of a Parallel Ultra-Low Power (PULP) architecture, heavily optimizing execution on specific target architectures by exploiting both software and hardware resources. The thesis starts by describing a methodology that can be considered a guideline for efficiently implementing algorithms on embedded architectures. This is followed by several case studies in the biomedical field, starting with the analysis of a Hand Gesture Recognition application based on the Hyperdimensional Computing algorithm, which allows fast on-chip retraining, and a comparison with the state-of-the-art Support Vector Machine (SVM); a Brain-Computer Interface (BCI) to detect the brain's response to a visual stimulus follows in the manuscript. Furthermore, a seizure detection application is also presented, exploring different solutions for the dimensionality reduction of the input signals. The last part is dedicated to an exploration of typical modules for the development of optimized ECG-based applications.
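    As a flavor of the Hyperdimensional Computing approach used in the hand-gesture case study (a generic textbook-style sketch, not the dissertation's implementation), the example below encodes quantized sensor channels into high-dimensional bipolar hypervectors by binding and bundling, accumulates per-class prototypes, and classifies by cosine similarity.

```python
# Generic Hyperdimensional Computing (HDC) classifier sketch. Features are
# encoded by binding per-channel "ID" hypervectors with quantized "level"
# hypervectors, bundling them into one vector, and comparing against
# per-class prototypes. Channel/level counts below are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
D = 10_000          # hypervector dimensionality
CHANNELS = 4        # e.g., EMG channels (assumption)
LEVELS = 8          # quantization levels per channel

id_hvs = rng.choice([-1, 1], size=(CHANNELS, D))     # random channel IDs
level_hvs = rng.choice([-1, 1], size=(LEVELS, D))    # random level codes

def encode(sample: np.ndarray) -> np.ndarray:
    """Bind each channel ID with its quantized level, then bundle (sum)."""
    levels = np.clip((sample * LEVELS).astype(int), 0, LEVELS - 1)
    bound = id_hvs * level_hvs[levels]               # elementwise binding
    return np.sign(bound.sum(axis=0))                # bundling + bipolarize

def train(samples, labels, n_classes: int) -> np.ndarray:
    protos = np.zeros((n_classes, D))
    for x, y in zip(samples, labels):
        protos[y] += encode(x)                       # accumulate class prototype
    return protos

def classify(sample: np.ndarray, protos: np.ndarray) -> int:
    hv = encode(sample)
    sims = protos @ hv / (np.linalg.norm(protos, axis=1) * np.linalg.norm(hv))
    return int(np.argmax(sims))

# Toy usage: two synthetic "gestures" with different channel intensity ranges.
a = rng.random((20, CHANNELS)) * 0.4          # class 0: low intensities
b = 0.6 + rng.random((20, CHANNELS)) * 0.4    # class 1: high intensities
X = np.vstack([a, b])
y = np.array([0] * 20 + [1] * 20)
protos = train(X, y, n_classes=2)
print(classify(a[0], protos), classify(b[0], protos))  # expected: 0 1
```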

    Always-On 674 µW @ 4 GOP/s Error Resilient Binary Neural Networks with Aggressive SRAM Voltage Scaling on a 22nm IoT End-Node

    Binary Neural Networks (BNNs) have been shown to be robust to random bit-level noise, making aggressive voltage scaling attractive as a power-saving technique for both logic and SRAMs. In this work, we introduce the first fully programmable IoT end-node system-on-chip (SoC) capable of executing software-defined, hardware-accelerated BNNs at ultra-low voltage. Our SoC exploits a hybrid memory scheme where error-vulnerable SRAMs are complemented by reliable standard-cell memories to safely store critical data under aggressive voltage scaling. On a prototype in 22nm FDX technology, we demonstrate that both the logic and SRAM voltage can be dropped to 0.5 V without any accuracy penalty on a BNN trained for the CIFAR-10 dataset, improving energy efficiency by 2.2X w.r.t. nominal conditions. Furthermore, we show that the supply voltage can be dropped to 0.42 V (50% of nominal) while keeping more than 99% of the nominal accuracy (with a bit error rate of ~1/1000). At this operating point, our prototype performs 4 GOP/s (15.4 inference/s on the CIFAR-10 dataset) by computing up to 13 binary operations per pJ, achieving 22.8 inference/s/mW while keeping within a peak power envelope of 674 µW, low enough to enable always-on operation in ultra-low power smart cameras, long-lifetime environmental sensors, and insect-sized pico-drones. Comment: Submitted to the ISICAS2020 journal special issue.
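    The underlying robustness claim, that BNN outputs tolerate random bit-level noise such as SRAM read failures at low voltage, can be reproduced in miniature: flip each stored binary weight with probability p (the abstract cites a bit error rate of roughly 1/1000 at 0.42 V) and observe how little a single XNOR-popcount layer's output changes. The toy model below is not the SoC's accelerator.

```python
# Toy illustration of BNN error resilience under random bit flips in stored
# weights (e.g., SRAM read failures under aggressive voltage scaling).
# A single XNOR-popcount layer, not the paper's accelerator.

import numpy as np

rng = np.random.default_rng(42)
N_IN, N_OUT = 1024, 64

weights = rng.choice([-1, 1], size=(N_OUT, N_IN))   # binary weights
x = rng.choice([-1, 1], size=N_IN)                  # binary activations

def binary_layer(w: np.ndarray, x: np.ndarray) -> np.ndarray:
    # XNOR-popcount is equivalent to a dot product over {-1, +1} values,
    # followed by sign binarization.
    return np.sign(w @ x)

clean = binary_layer(weights, x)

for ber in (1e-3, 1e-2, 1e-1):                      # injected bit error rates
    flips = rng.random(weights.shape) < ber         # flip each bit w.p. ber
    noisy_w = np.where(flips, -weights, weights)
    noisy = binary_layer(noisy_w, x)
    agreement = np.mean(noisy == clean)
    print(f"BER {ber:.0e}: {agreement:.1%} of output bits unchanged")
```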

    Dynamic voltage scaling algorithms for soft and hard real-time systems

    Dynamic Voltage Scaling (DVS) has not been fully explored as a means of further minimizing the energy consumption of microprocessors and prolonging the operational life of real-time systems. In this dissertation, workload-prediction-based DVS for soft real-time systems and offline convex-optimization-based DVS for hard real-time systems are investigated. The proposed algorithms for soft and hard real-time systems are implemented on a small-scale wireless sensor network (WSN) and a simulation model, respectively.
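    The core arithmetic behind DVS is worth spelling out: with dynamic energy per cycle scaling roughly as V² and frequency roughly proportional to V, stretching a task to finish exactly at its deadline costs much less energy than racing at full speed and idling. A worked example with illustrative numbers:

```python
# Minimal DVS sketch: stretch a task's execution to its deadline and compare
# energy against running at full speed and idling. Assumes f ~ V, so energy
# per cycle ~ (f / f_max)^2, and ignores leakage and idle power.
# All numbers are illustrative, not from the dissertation.

CYCLES = 2e8          # task length in cycles
F_MAX = 1e9           # maximum frequency (Hz)
DEADLINE = 0.5        # seconds

def energy(cycles: float, f: float) -> float:
    # Normalized: energy per cycle proportional to (f / F_MAX)^2.
    return cycles * (f / F_MAX) ** 2

# Full speed: finish in 0.2 s, then idle until the 0.5 s deadline.
e_full = energy(CYCLES, F_MAX)

# DVS: scale frequency so execution exactly fills the deadline (zero slack).
f_dvs = CYCLES / DEADLINE          # 4e8 Hz
assert f_dvs <= F_MAX
e_dvs = energy(CYCLES, f_dvs)

print(f"full-speed energy: {e_full:.3g} (normalized)")
print(f"DVS energy:        {e_dvs:.3g} (normalized)")
print(f"savings:           {1 - e_dvs / e_full:.0%}")   # 84% here
```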

    Challenges and Opportunities in Near-Threshold DNN Accelerators around Timing Errors

    AI evolution is accelerating, and Deep Neural Network (DNN) inference accelerators are at the forefront of the ad hoc architectures evolving to support the immense throughput required for AI computation. However, much more energy-efficient design paradigms are needed to realize the full potential of AI evolution and curtail energy consumption. The Near-Threshold Computing (NTC) design paradigm is a strong candidate for providing the required energy efficiency. However, NTC operation is plagued by significant performance and reliability concerns arising from timing errors. In this paper, we dive deep into DNN architecture to uncover unique challenges and opportunities for operation in the NTC paradigm. By performing rigorous simulations of a TPU systolic array, we reveal the severity of timing errors and their impact on inference accuracy at NTC. We analyze various attributes, such as the data-delay relationship, delay disparity within arithmetic units, utilization patterns, hardware homogeneity, and workload characteristics, and uncover unique localized and global techniques to deal with timing errors at NTC.
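    One point the abstract raises, that a timing error's impact depends on where it strikes, can be made concrete with a toy experiment: model a timing fault as a bit flip in one partial product of a multiply-accumulate and observe that high-order flips corrupt the output exponentially more than low-order ones. The sketch below is illustrative, not the paper's TPU-level simulation.

```python
# Toy illustration of why timing-error position matters in a MAC unit:
# a late-arriving (and thus wrong) high-order bit perturbs the accumulated
# value far more than a low-order one.

import numpy as np

rng = np.random.default_rng(7)
a = rng.integers(-128, 128, size=256)      # int8-style operands
b = rng.integers(-128, 128, size=256)

def mac_with_bit_error(a, b, bit: int) -> int:
    """Accumulate a*b, flipping `bit` of one randomly chosen partial product."""
    products = (a * b).astype(np.int64)
    victim = rng.integers(len(products))
    products[victim] ^= (1 << bit)         # timing fault modeled as a bit flip
    return int(products.sum())

golden = int((a.astype(np.int64) * b).sum())
for bit in (0, 4, 8, 12, 14):
    err = abs(mac_with_bit_error(a, b, bit) - golden)
    print(f"flip in bit {bit:2d}: absolute output error = {err}")  # 2**bit
```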

    Improving the Efficiency of Energy Harvesting Embedded Systems

    In the past decade, mobile embedded systems such as cell phones and tablets have infiltrated and dramatically transformed our lives. The computation power, storage capacity, and data communication speed of mobile devices have increased tremendously, and they are used for ever more critical applications with intensive computation and communication. As a result, battery lifetime has become increasingly important and tends to be one of the key considerations for consumers. Research has been carried out to improve the efficiency of the lithium-ion battery, a specific member of the more general Electrical Energy Storage (EES) family that is widely used in mobile systems, as well as the efficiency of other electrical energy storage systems such as supercapacitors, lead-acid batteries, and nickel-hydrogen batteries. Previous studies show that hybrid electrical energy storage (HEES), a mixture of different EES technologies, gives the best performance. On the other hand, the Energy Harvesting (EH) technique has the potential to solve the problem once and for all by providing a green and semi-permanent supply of energy to embedded systems. However, the harvested power is subject to the uncertainty of the environment and the variation of the weather, so a stable and consistent power supply cannot always be guaranteed. The limited lifetime of the EES system and the instability of the EH system can be overcome by combining the two into an energy harvesting embedded system and making them work cooperatively. In an energy harvesting embedded system, if the harvested power is sufficient for the workload, extra power can be stored in the EES element; if the harvested power falls short, the energy stored in the EES bank can be used to support the load demand. How much energy can be stored in the charging phase, and how long the EES bank will last, are affected by many factors, including the efficiency of the energy harvesting module, the input/output voltages of the DC-DC converters, the status of the EES elements, and the characteristics of the workload. In this thesis, when the harvested energy is abundant, our goal is to store as much surplus energy as possible in the EES bank under variation of the harvesting power and the workload power. We investigate the impact of workload scheduling and Dynamic Voltage and Frequency Scaling (DVFS) of the embedded system on the energy efficiency of the EES bank in the charging phase. We propose a fast heuristic algorithm to minimize the energy overhead of the DC-DC converter while satisfying the timing constraints of the embedded workload and maximizing the energy stored in the HEES system. The proposed algorithm improves the efficiency of charging and discharging in an energy harvesting embedded system. On the other hand, when the harvesting rate is low, the workload's power consumption is supplied by the EES bank. In this case, we try to minimize the energy consumption of the embedded system to extend its EES bank life. We consider the scenario where the workload has uncertainties and runs on a heterogeneous multi-core system; the workload variation is represented by the selection of conditional branches, which activate or deactivate a set of instructions belonging to a task. We employ both task scheduling and DVFS techniques for energy optimization. Our scheduling algorithm considers the statistical information of the workload to minimize the mean power consumption of the application while satisfying a hard deadline constraint. The proposed DVFS algorithm has pseudo-linear complexity and achieves energy reductions comparable to the solutions found by mathematical programming. Due to its capability of slack reclaiming, our DVFS technique is less sensitive to small changes in hardware or workload and works more robustly than techniques without slack reclaiming.
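    Slack reclaiming, which the abstract credits for the robustness of the proposed DVFS technique, has a simple core idea: when a task finishes earlier than its worst-case estimate, the freed time is donated to the next task so it can run at a lower frequency. The sketch below is a generic greedy version with illustrative numbers, not the thesis's algorithm.

```python
# Hypothetical sketch of greedy slack reclaiming in DVFS scheduling. When a
# task finishes earlier than its worst case, the leftover time is donated to
# the next task, which can then run at a lower (cheaper) frequency.
# Energy model: E ~ cycles * (f / F_MAX)^2 (normalized, leakage ignored).

F_MAX = 1e9  # Hz

def run(tasks, period, reclaim=True):
    """tasks: list of (worst_case_cycles, actual_cycles) pairs; each task has
    `period` seconds of budget, plus any slack reclaimed from its predecessor."""
    slack, total_energy = 0.0, 0.0
    for wc, actual in tasks:
        budget = period + (slack if reclaim else 0.0)
        f = min(F_MAX, wc / budget)        # slowest f meeting the WCET in budget
        elapsed = actual / f               # task actually finishes here
        slack = budget - elapsed           # unused time, reclaimable
        total_energy += actual * (f / F_MAX) ** 2
    return total_energy

tasks = [(2e8, 1e8)] * 4                   # tasks use only 50% of their WCET
base = run(tasks, period=0.25, reclaim=False)
reclaimed = run(tasks, period=0.25, reclaim=True)
print(f"energy without reclaiming: {base:.3g}")
print(f"energy with reclaiming:    {reclaimed:.3g} "
      f"({1 - reclaimed / base:.0%} lower)")
```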

    Efficient Estimation of Reliability Metrics for Circuits in Deca-Nanometer Nodes

    In modern integrated circuit (IC) technologies, with the downscaling of device dimensions, various degradation modes constitute major reliability concerns. Bias Temperature Instability (BTI) is a representative example, posing a significant reliability threat in Field-Effect Transistor (FET) technologies, and has been known for more than 30 years. The first model that tried to explain this phenomenon was based on Reaction-Diffusion (RD) theory and was developed nearly 30 years ago. Recently, an atomistic model has been proposed that enables the modeling of BTI in modern technologies. Among existing software for simulating BTI degradation, tools based on the atomistic theory are computationally prohibitive when it comes to simulating complex circuits consisting of a large number of devices, while tools based on the RD model are unable to accurately capture the BTI-induced degradation, especially in devices with small dimensions. This thesis presents a novel simulation framework that is efficient yet highly accurate. A subset of an embedded Static Random Access Memory (SRAM) is used for verification purposes. The functional yield of the circuit over three years of operation is estimated, along with other reliability metrics such as defects per million (DPM), mean time to failure (MTTF), and the failures-in-time (FIT) rate. Finally, the interplay between these metrics is discussed, and efficient computation methods are proposed for each one.
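    The reliability metrics named above have standard, mutually consistent definitions: DPM is the expected number of failing parts per million shipped, the FIT rate counts failures per 10^9 device-hours, and under a constant-failure-rate (exponential) assumption MTTF is the reciprocal of the failure rate. A small worked example with hypothetical inputs:

```python
# Worked example of the reliability metrics mentioned above, using their
# standard definitions; the 0.2% failure probability is a hypothetical input,
# not a result from the thesis.
import math

HOURS_3_YEARS = 3 * 365 * 24              # 26,280 operating hours per part

p_fail_3y = 0.002                         # assumed: 0.2% of parts fail in 3 years

# DPM: expected failing parts per million shipped (over the 3-year window).
dpm = p_fail_3y * 1e6

# Constant failure rate lambda from P(fail by t) = 1 - exp(-lambda * t).
lam = -math.log(1.0 - p_fail_3y) / HOURS_3_YEARS   # failures per device-hour

fit = lam * 1e9                           # FIT: failures per 1e9 device-hours
mttf_years = 1.0 / lam / (365 * 24)       # MTTF = 1/lambda (exponential model)

print(f"DPM  = {dpm:.0f}")                # 2000
print(f"FIT  = {fit:.1f}")                # ~76.2
print(f"MTTF = {mttf_years:.0f} years")   # ~1500
```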