
    An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration

    We empirically evaluate an undervolting technique, i.e., underscaling the circuit supply voltage below the nominal level, to improve the power-efficiency of Convolutional Neural Network (CNN) accelerators mapped to Field Programmable Gate Arrays (FPGAs). Undervolting below a safe voltage level can lead to timing faults due to excessive circuit latency increase. We evaluate the reliability-power trade-off for such accelerators. Specifically, we experimentally study the reduced-voltage operation of multiple components of real FPGAs, characterize the corresponding reliability behavior of CNN accelerators, propose techniques to minimize the drawbacks of reduced-voltage operation, and combine undervolting with architectural CNN optimization techniques, i.e., quantization and pruning. We investigate the effect of environmental temperature on the reliability-power trade-off of such accelerators. We perform experiments on three identical samples of modern Xilinx ZCU102 FPGA platforms with five state-of-the-art image classification CNN benchmarks. This setup allows us to study the effects of our undervolting technique under both software and hardware variability. We achieve more than 3X power-efficiency (GOPs/W) gain via undervolting. 2.6X of this gain is the result of eliminating the voltage guardband region, i.e., the safe voltage region below the nominal level that is set by the FPGA vendor to ensure correct functionality under worst-case environmental and circuit conditions. 43% of the power-efficiency gain is due to further undervolting below the guardband, which comes at the cost of accuracy loss in the CNN accelerator. We evaluate an effective frequency underscaling technique that prevents this accuracy loss, and find that it reduces the power-efficiency gain from 43% to 25%. Comment: To appear at the DSN 2020 conference.
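    As a back-of-the-envelope illustration of where such gains come from (a sketch with hypothetical voltages, not the paper's measurement setup), the snippet below applies the first-order CMOS dynamic-power relation P = kV²f to show how guardband elimination and below-guardband undervolting compound into an overall GOPs/W gain.

```python
# Minimal sketch: first-order estimate of power-efficiency (GOPs/W) gain
# from undervolting, assuming dynamic power scales as P = k * V^2 * f.
# All voltage/throughput numbers below are hypothetical placeholders,
# not measurements from the paper.

def gops_per_watt(gops: float, v: float, f: float, k: float = 1.0) -> float:
    """Power efficiency under the simple dynamic-power model P = k * V^2 * f."""
    power = k * v**2 * f
    return gops / power

V_NOM, V_GUARDBAND, V_LOW = 0.85, 0.70, 0.60   # hypothetical rail voltages (V)
F = 200e6                                      # fixed clock (Hz), no freq scaling
GOPS = 100.0                                   # fixed throughput at constant clock

eff_nom = gops_per_watt(GOPS, V_NOM, F)
eff_gb = gops_per_watt(GOPS, V_GUARDBAND, F)   # guardband eliminated, still safe
eff_uv = gops_per_watt(GOPS, V_LOW, F)         # below guardband: faults possible

print(f"gain from guardband elimination: {eff_gb / eff_nom:.2f}x")
print(f"extra gain below guardband:      {eff_uv / eff_gb:.2f}x")
print(f"total gain:                      {eff_uv / eff_nom:.2f}x")
```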

    Circuits and Systems Advances in Near Threshold Computing

    Modern society is witnessing a sea change in ubiquitous computing, in which people have embraced computing systems as an indispensable part of day-to-day existence. Computation, storage, and communication abilities of smartphones, for example, have undergone monumental changes over the past decade. At the same time, the global emphasis on creating and sustaining green environments is leading to a rapid and ongoing proliferation of edge computing systems and applications. As a broad spectrum of healthcare, home, and transport applications shift to the edge of the network, near-threshold computing (NTC) is emerging as one of the most promising low-power computing platforms. An NTC device sets its supply voltage close to its threshold voltage, dramatically reducing its energy consumption. Despite showing substantial promise in terms of energy efficiency, NTC is yet to see widespread commercial adoption. This is because circuits and systems operating at NTC suffer from several problems, including increased sensitivity to process variation, reliability problems, performance degradation, and security vulnerabilities. To realize its potential, we need designs, techniques, and solutions to overcome the challenges associated with NTC circuits and systems. Readers of this book will be able to familiarize themselves with recent advances in electronic systems, with a focus on near-threshold computing.
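    A useful way to see both the promise and the limits of NTC is the energy-per-operation curve: dynamic energy falls roughly as V², while leakage energy per operation grows as the circuit slows down near the threshold voltage. The sketch below uses illustrative constants (not real device data) to locate the resulting minimum-energy point just above threshold.

```python
# Minimal sketch of the energy-per-operation trade-off that motivates NTC.
# Dynamic energy per op ~ C * V^2; leakage energy per op ~ I_leak * V * delay,
# where delay grows sharply as V approaches the threshold voltage Vth.
# All constants are illustrative, not from a real device model.

VTH = 0.30          # hypothetical threshold voltage (V)
C = 1.0             # normalized switched capacitance
I_LEAK = 0.05       # normalized leakage current

def delay(v: float) -> float:
    # Alpha-power-law-style delay model: slows down steeply near Vth.
    return v / (v - VTH) ** 1.4

def energy_per_op(v: float) -> float:
    e_dyn = C * v**2
    e_leak = I_LEAK * v * delay(v)
    return e_dyn + e_leak

voltages = [round(0.35 + 0.01 * i, 2) for i in range(60)]  # 0.35 V .. 0.94 V
v_opt = min(voltages, key=energy_per_op)
print(f"minimum-energy point near {v_opt} V "
      f"({energy_per_op(v_opt):.3f} normalized J/op), "
      f"close to but above Vth = {VTH} V")
```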

    Runtime adaptive IoMT node on multi-core processor platform

    The Internet of Medical Things (IoMT) paradigm is becoming mainstream in multiple clinical trials and healthcare procedures. Thanks to innovative technologies, latest-generation communication networks, and state-of-the-art portable devices, IoMT opens up new scenarios for data collection and continuous patient monitoring. Two very important aspects should be considered to make the most of this paradigm. First, moving the processing task from the cloud to the edge leads to several advantages, such as responsiveness, portability, scalability, and reliability of the sensor node. Second, in order to increase the accuracy of the system, state-of-the-art cognitive algorithms based on artificial intelligence and deep learning must be integrated. Sensor nodes often need to be battery powered and to remain active for a long time without an external power source. Therefore, one of the challenges to be addressed during the design and development of IoMT devices concerns energy optimization. Our work proposes an implementation of cognitive data analysis based on deep learning techniques on a resource-constrained computing platform. To handle power efficiency, we introduce a component called Adaptive runtime Manager (ADAM). This component takes care of dynamically reconfiguring the hardware and the software of the device during execution, in order to better adapt it to the workload and the required operating mode. To test the system under high computational load on a multi-core platform, cognitive analysis of Electrocardiogram (ECG) traces has been adopted on the Orlando prototype board by STMicroelectronics, considering both single-channel and six-channel simultaneous cases. Experimental results show that by managing the sensor node configuration at runtime, energy savings of at least 15% can be achieved.
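    The abstract does not describe ADAM at code level, but runtime managers of this kind typically follow a monitor-decide-reconfigure loop. The sketch below is a hypothetical minimal version, with made-up throughput and power models: it picks the lowest-power (cores, frequency) configuration whose estimated throughput still covers the current workload.

```python
# Hypothetical sketch of an ADAM-like runtime manager decision step (not the
# paper's actual implementation): choose the cheapest (cores, frequency)
# configuration that still sustains the workload's required throughput.

from dataclasses import dataclass

@dataclass
class Config:
    cores: int
    freq_mhz: int

    def throughput(self) -> float:
        # Crude model: throughput grows with cores and frequency, with 85%
        # parallel efficiency per extra core (an assumption, not measured).
        return self.freq_mhz * (1 + 0.85 * (self.cores - 1))

    def power(self) -> float:
        # Crude model: power scales with core count and quadratically with
        # frequency (since frequency scaling usually implies voltage scaling).
        return self.cores * (self.freq_mhz / 100.0) ** 2

CONFIGS = [Config(c, f) for c in (1, 2, 4) for f in (100, 200, 400)]

def reconfigure(required_throughput: float) -> Config:
    """Pick the lowest-power configuration meeting the requirement."""
    feasible = [c for c in CONFIGS if c.throughput() >= required_throughput]
    if not feasible:
        return max(CONFIGS, key=Config.throughput)  # best effort
    return min(feasible, key=Config.power)

# Example: single-channel ECG needs little compute, six-channel needs more.
print(reconfigure(150.0))   # light load -> few cores, low frequency
print(reconfigure(1000.0))  # heavy load -> more cores and higher frequency
```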

    Optimized Biosignals Processing Algorithms for New Designs of Human Machine Interfaces on Parallel Ultra-Low Power Architectures

    The aim of this dissertation is to explore Human Machine Interfaces (HMIs) in a variety of biomedical scenarios. The research addresses typical challenges in wearable and implantable devices for diagnostic, monitoring, and prosthetic purposes, suggesting a methodology for tailoring such applications to cutting-edge embedded architectures. The main challenge is the enhancement of high-level applications, also introducing Machine Learning (ML) algorithms, using parallel programming and specialized hardware to improve performance. The majority of these algorithms are computationally intensive, posing significant challenges for deployment on embedded devices, which have several limitations in terms of memory size, maximum operating frequency, and battery life. The proposed solutions take advantage of a Parallel Ultra-Low Power (PULP) architecture, heavily optimizing execution on specific target architectures by exploiting both software and hardware resources. The thesis starts by describing a methodology that can be considered a guideline for efficiently implementing algorithms on embedded architectures. This is followed by several case studies in the biomedical field, starting with the analysis of a Hand Gesture Recognition application based on the Hyperdimensional Computing algorithm, which allows fast on-chip retraining, and a comparison with the state-of-the-art Support Vector Machine (SVM); a Brain-Computer Interface (BCI) to detect the brain's response to a visual stimulus follows in the manuscript. Furthermore, a seizure detection application is also presented, exploring different solutions for the dimensionality reduction of the input signals. The last part is dedicated to an exploration of typical modules for the development of optimized ECG-based applications.
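    As a flavor of the Hyperdimensional Computing approach used in the hand-gesture case study (a generic textbook-style sketch, not the dissertation's implementation), the example below encodes quantized sensor channels into high-dimensional bipolar hypervectors by binding and bundling, accumulates per-class prototypes, and classifies by cosine similarity.

```python
# Generic Hyperdimensional Computing (HDC) classifier sketch. Features are
# encoded by binding per-channel "ID" hypervectors with quantized "level"
# hypervectors, bundling them into one vector, and comparing against
# per-class prototypes. Channel/level counts below are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
D = 10_000          # hypervector dimensionality
CHANNELS = 4        # e.g., EMG channels (assumption)
LEVELS = 8          # quantization levels per channel

id_hvs = rng.choice([-1, 1], size=(CHANNELS, D))     # random channel IDs
level_hvs = rng.choice([-1, 1], size=(LEVELS, D))    # random level codes

def encode(sample: np.ndarray) -> np.ndarray:
    """Bind each channel ID with its quantized level, then bundle (sum)."""
    levels = np.clip((sample * LEVELS).astype(int), 0, LEVELS - 1)
    bound = id_hvs * level_hvs[levels]               # elementwise binding
    return np.sign(bound.sum(axis=0))                # bundling + bipolarize

def train(samples, labels, n_classes: int) -> np.ndarray:
    protos = np.zeros((n_classes, D))
    for x, y in zip(samples, labels):
        protos[y] += encode(x)                       # accumulate class prototype
    return protos

def classify(sample: np.ndarray, protos: np.ndarray) -> int:
    hv = encode(sample)
    sims = protos @ hv / (np.linalg.norm(protos, axis=1) * np.linalg.norm(hv))
    return int(np.argmax(sims))

# Toy usage: two synthetic "gestures" with different channel intensity ranges.
a = rng.random((20, CHANNELS)) * 0.4          # class 0: low intensities
b = 0.6 + rng.random((20, CHANNELS)) * 0.4    # class 1: high intensities
X = np.vstack([a, b])
y = np.array([0] * 20 + [1] * 20)
protos = train(X, y, n_classes=2)
print(classify(a[0], protos), classify(b[0], protos))  # expected: 0 1
```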

    Always-On 674 µW @ 4 GOP/s Error Resilient Binary Neural Networks with Aggressive SRAM Voltage Scaling on a 22nm IoT End-Node

    Binary Neural Networks (BNNs) have been shown to be robust to random bit-level noise, making aggressive voltage scaling attractive as a power-saving technique for both logic and SRAMs. In this work, we introduce the first fully programmable IoT end-node system-on-chip (SoC) capable of executing software-defined, hardware-accelerated BNNs at ultra-low voltage. Our SoC exploits a hybrid memory scheme where error-vulnerable SRAMs are complemented by reliable standard-cell memories to safely store critical data under aggressive voltage scaling. On a prototype in 22nm FDX technology, we demonstrate that both the logic and SRAM voltage can be dropped to 0.5 V without any accuracy penalty on a BNN trained for the CIFAR-10 dataset, improving energy efficiency by 2.2X w.r.t. nominal conditions. Furthermore, we show that the supply voltage can be dropped to 0.42 V (50% of nominal) while keeping more than 99% of the nominal accuracy (with a bit error rate of ~1/1000). At this operating point, our prototype performs 4 GOP/s (15.4 inference/s on the CIFAR-10 dataset) by computing up to 13 binary operations per pJ, achieving 22.8 inference/s/mW while keeping within a peak power envelope of 674 µW, low enough to enable always-on operation in ultra-low power smart cameras, long-lifetime environmental sensors, and insect-sized pico-drones. Comment: Submitted to the ISICAS2020 journal special issue.
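    The underlying robustness claim, that BNN outputs tolerate random bit-level noise such as SRAM read failures at low voltage, can be reproduced in miniature: flip each stored binary weight with probability p (the abstract cites a bit error rate of roughly 1/1000 at 0.42 V) and observe how little a single XNOR-popcount layer's output changes. The toy model below is not the SoC's accelerator.

```python
# Toy illustration of BNN error resilience under random bit flips in stored
# weights (e.g., SRAM read failures under aggressive voltage scaling).
# A single XNOR-popcount layer, not the paper's accelerator.

import numpy as np

rng = np.random.default_rng(42)
N_IN, N_OUT = 1024, 64

weights = rng.choice([-1, 1], size=(N_OUT, N_IN))   # binary weights
x = rng.choice([-1, 1], size=N_IN)                  # binary activations

def binary_layer(w: np.ndarray, x: np.ndarray) -> np.ndarray:
    # XNOR-popcount is equivalent to a dot product over {-1, +1} values,
    # followed by sign binarization.
    return np.sign(w @ x)

clean = binary_layer(weights, x)

for ber in (1e-3, 1e-2, 1e-1):                      # injected bit error rates
    flips = rng.random(weights.shape) < ber         # flip each bit w.p. ber
    noisy_w = np.where(flips, -weights, weights)
    noisy = binary_layer(noisy_w, x)
    agreement = np.mean(noisy == clean)
    print(f"BER {ber:.0e}: {agreement:.1%} of output bits unchanged")
```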

    Dynamic voltage scaling algorithms for soft and hard real-time systems

    Dynamic Voltage Scaling (DVS) has not been fully explored as a means of further minimizing the energy consumption of microprocessors and prolonging the operational life of real-time systems. In this dissertation, workload-prediction-based DVS for soft real-time systems and offline convex-optimization-based DVS for hard real-time systems are investigated. The proposed algorithms for soft and hard real-time systems are implemented on a small-scale wireless sensor network (WSN) and a simulation model, respectively.
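    The core arithmetic behind DVS is worth spelling out: with dynamic energy per cycle scaling roughly as V² and frequency roughly proportional to V, stretching a task to finish exactly at its deadline costs much less energy than racing at full speed and idling. A worked example with illustrative numbers:

```python
# Minimal DVS sketch: stretch a task's execution to its deadline and compare
# energy against running at full speed and idling. Assumes f ~ V, so energy
# per cycle ~ (f / f_max)^2, and ignores leakage and idle power.
# All numbers are illustrative, not from the dissertation.

CYCLES = 2e8          # task length in cycles
F_MAX = 1e9           # maximum frequency (Hz)
DEADLINE = 0.5        # seconds

def energy(cycles: float, f: float) -> float:
    # Normalized: energy per cycle proportional to (f / F_MAX)^2.
    return cycles * (f / F_MAX) ** 2

# Full speed: finish in 0.2 s, then idle until the 0.5 s deadline.
e_full = energy(CYCLES, F_MAX)

# DVS: scale frequency so execution exactly fills the deadline (zero slack).
f_dvs = CYCLES / DEADLINE          # 4e8 Hz
assert f_dvs <= F_MAX
e_dvs = energy(CYCLES, f_dvs)

print(f"full-speed energy: {e_full:.3g} (normalized)")
print(f"DVS energy:        {e_dvs:.3g} (normalized)")
print(f"savings:           {1 - e_dvs / e_full:.0%}")   # 84% here
```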

    Challenges and Opportunities in Near-Threshold DNN Accelerators around Timing Errors

    AI evolution is accelerating, and Deep Neural Network (DNN) inference accelerators are at the forefront of the ad hoc architectures evolving to support the immense throughput required for AI computation. However, much more energy-efficient design paradigms are needed to realize the full potential of AI evolution and curtail energy consumption. The Near-Threshold Computing (NTC) design paradigm is a strong candidate for providing the required energy efficiency. However, NTC operation is plagued by significant performance and reliability concerns arising from timing errors. In this paper, we dive deep into DNN architecture to uncover unique challenges and opportunities for operation in the NTC paradigm. By performing rigorous simulations of a TPU systolic array, we reveal the severity of timing errors and their impact on inference accuracy at NTC. We analyze various attributes, such as the data-delay relationship, delay disparity within arithmetic units, utilization patterns, hardware homogeneity, and workload characteristics, and uncover unique localized and global techniques to deal with timing errors at NTC.
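    One point the abstract raises, that a timing error's impact depends on where it strikes, can be made concrete with a toy experiment: model a timing fault as a bit flip in one partial product of a multiply-accumulate and observe that high-order flips corrupt the output exponentially more than low-order ones. The sketch below is illustrative, not the paper's TPU-level simulation.

```python
# Toy illustration of why timing-error position matters in a MAC unit:
# a late-arriving (and thus wrong) high-order bit perturbs the accumulated
# value far more than a low-order one.

import numpy as np

rng = np.random.default_rng(7)
a = rng.integers(-128, 128, size=256)      # int8-style operands
b = rng.integers(-128, 128, size=256)

def mac_with_bit_error(a, b, bit: int) -> int:
    """Accumulate a*b, flipping `bit` of one randomly chosen partial product."""
    products = (a * b).astype(np.int64)
    victim = rng.integers(len(products))
    products[victim] ^= (1 << bit)         # timing fault modeled as a bit flip
    return int(products.sum())

golden = int((a.astype(np.int64) * b).sum())
for bit in (0, 4, 8, 12, 14):
    err = abs(mac_with_bit_error(a, b, bit) - golden)
    print(f"flip in bit {bit:2d}: absolute output error = {err}")  # 2**bit
```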

    Improving the Efficiency of Energy Harvesting Embedded Systems

    In the past decade, mobile embedded systems such as cell phones and tablets have infiltrated and dramatically transformed our lives. The computation power, storage capacity, and data communication speed of mobile devices have increased tremendously, and they are used for ever more critical applications with intensive computation and communication. As a result, battery lifetime has become increasingly important and tends to be one of the key considerations for consumers. Research has been carried out to improve the efficiency of the lithium-ion battery, a specific member of the more general Electrical Energy Storage (EES) family that is widely used in mobile systems, as well as the efficiency of other electrical energy storage systems such as supercapacitors, lead-acid batteries, and nickel-hydrogen batteries. Previous studies show that hybrid electrical energy storage (HEES), a mixture of different EES technologies, gives the best performance. On the other hand, the Energy Harvesting (EH) technique has the potential to solve the problem once and for all by providing a green and semi-permanent supply of energy to embedded systems. However, the harvested power is subject to the uncertainty of the environment and the variation of the weather, so a stable and consistent power supply cannot always be guaranteed. The limited lifetime of the EES system and the instability of the EH system can be overcome by combining the two into an energy harvesting embedded system and making them work cooperatively. In an energy harvesting embedded system, if the harvested power is sufficient for the workload, extra power can be stored in the EES element; if the harvested power falls short, the energy stored in the EES bank can be used to support the load demand. How much energy can be stored in the charging phase, and how long the EES bank will last, are affected by many factors, including the efficiency of the energy harvesting module, the input/output voltages of the DC-DC converters, the status of the EES elements, and the characteristics of the workload. In this thesis, when the harvested energy is abundant, our goal is to store as much surplus energy as possible in the EES bank under variation of the harvesting power and the workload power. We investigate the impact of workload scheduling and Dynamic Voltage and Frequency Scaling (DVFS) of the embedded system on the energy efficiency of the EES bank in the charging phase. We propose a fast heuristic algorithm to minimize the energy overhead of the DC-DC converter while satisfying the timing constraints of the embedded workload and maximizing the energy stored in the HEES system. The proposed algorithm improves the efficiency of charging and discharging in an energy harvesting embedded system. On the other hand, when the harvesting rate is low, the workload's power consumption is supplied by the EES bank. In this case, we try to minimize the energy consumption of the embedded system to extend its EES bank life. We consider the scenario where the workload has uncertainties and runs on a heterogeneous multi-core system; the workload variation is represented by the selection of conditional branches, which activate or deactivate a set of instructions belonging to a task. We employ both task scheduling and DVFS techniques for energy optimization. Our scheduling algorithm considers the statistical information of the workload to minimize the mean power consumption of the application while satisfying a hard deadline constraint. The proposed DVFS algorithm has pseudo-linear complexity and achieves energy reductions comparable to the solutions found by mathematical programming. Due to its capability of slack reclaiming, our DVFS technique is less sensitive to small changes in hardware or workload and works more robustly than techniques without slack reclaiming.
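    Slack reclaiming, which the abstract credits for the robustness of the proposed DVFS technique, has a simple core idea: when a task finishes earlier than its worst-case estimate, the freed time is donated to the next task so it can run at a lower frequency. The sketch below is a generic greedy version with illustrative numbers, not the thesis's algorithm.

```python
# Hypothetical sketch of greedy slack reclaiming in DVFS scheduling. When a
# task finishes earlier than its worst case, the leftover time is donated to
# the next task, which can then run at a lower (cheaper) frequency.
# Energy model: E ~ cycles * (f / F_MAX)^2 (normalized, leakage ignored).

F_MAX = 1e9  # Hz

def run(tasks, period, reclaim=True):
    """tasks: list of (worst_case_cycles, actual_cycles) pairs; each task has
    `period` seconds of budget, plus any slack reclaimed from its predecessor."""
    slack, total_energy = 0.0, 0.0
    for wc, actual in tasks:
        budget = period + (slack if reclaim else 0.0)
        f = min(F_MAX, wc / budget)        # slowest f meeting the WCET in budget
        elapsed = actual / f               # task actually finishes here
        slack = budget - elapsed           # unused time, reclaimable
        total_energy += actual * (f / F_MAX) ** 2
    return total_energy

tasks = [(2e8, 1e8)] * 4                   # tasks use only 50% of their WCET
base = run(tasks, period=0.25, reclaim=False)
reclaimed = run(tasks, period=0.25, reclaim=True)
print(f"energy without reclaiming: {base:.3g}")
print(f"energy with reclaiming:    {reclaimed:.3g} "
      f"({1 - reclaimed / base:.0%} lower)")
```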

    Efficient Estimation of Reliability Metrics for Circuits in Deca-Nanometer Nodes

    In modern integrated circuit (IC) technologies, with the downscaling of device dimensions, various degradation modes constitute major reliability concerns. Bias Temperature Instability (BTI) is a representative example, posing a significant reliability threat in Field-Effect Transistor (FET) technologies, and has been known for more than 30 years. The first model that tried to explain this phenomenon was based on Reaction-Diffusion (RD) theory and was developed nearly 30 years ago. Recently, an atomistic model has been proposed that enables the modeling of BTI in modern technologies. Among existing software for simulating BTI degradation, tools based on the atomistic theory are computationally prohibitive when it comes to simulating complex circuits consisting of a large number of devices, while tools based on the RD model are unable to accurately capture the BTI-induced degradation, especially in devices with small dimensions. This thesis presents a novel simulation framework that is efficient yet highly accurate. A subset of an embedded Static Random Access Memory (SRAM) is used for verification purposes. The functional yield of the circuit over three years of operation is estimated, along with other reliability metrics such as defects per million (DPM), mean time to failure (MTTF), and the failures-in-time (FIT) rate. Finally, the interplay between these metrics is discussed, and efficient computation methods are proposed for each one.
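    The reliability metrics named above have standard, mutually consistent definitions: DPM is the expected number of failing parts per million shipped, the FIT rate counts failures per 10^9 device-hours, and under a constant-failure-rate (exponential) assumption MTTF is the reciprocal of the failure rate. A small worked example with hypothetical inputs:

```python
# Worked example of the reliability metrics mentioned above, using their
# standard definitions; the 0.2% failure probability is a hypothetical input,
# not a result from the thesis.
import math

HOURS_3_YEARS = 3 * 365 * 24              # 26,280 operating hours per part

p_fail_3y = 0.002                         # assumed: 0.2% of parts fail in 3 years

# DPM: expected failing parts per million shipped (over the 3-year window).
dpm = p_fail_3y * 1e6

# Constant failure rate lambda from P(fail by t) = 1 - exp(-lambda * t).
lam = -math.log(1.0 - p_fail_3y) / HOURS_3_YEARS   # failures per device-hour

fit = lam * 1e9                           # FIT: failures per 1e9 device-hours
mttf_years = 1.0 / lam / (365 * 24)       # MTTF = 1/lambda (exponential model)

print(f"DPM  = {dpm:.0f}")                # 2000
print(f"FIT  = {fit:.1f}")                # ~76.2
print(f"MTTF = {mttf_years:.0f} years")   # ~1500
```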