Search CORE

348 research outputs found

MoRS: An approximate fault modelling framework for reduced-voltage SRAMs

Author: Cristal Kestelman Adrián
Ergin Oguz
Salami Behzad
Unsal Osman Sabri
Yuksel Ismail Emir
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/06/2022
Field of study

On-chip memory (usually based on Static RAMs-SRAMs) are crucial components for various computing devices including heterogeneous devices, e.g, GPUs, FPGAs, ASICs to achieve high performance. Modern workloads such as Deep Neural Networks (DNNs) running on these heterogeneous fabrics are highly dependent on the on-chip memory architecture for efficient acceleration. Hence, improving the energy-efficiency of such memories directly leads to an efficient system. One of the common methods to save energy is undervolting i.e., supply voltage underscaling below the nominal level. Such systems can be safely undervolted without incurring faults down to a certain voltage limit. This safe range is also called voltage guardband. However, reducing voltage below the guardband level without decreasing frequency causes timing-based faults. In this paper, we propose MoRS, a framework that generates the first approximate undervolting fault model using real faults extracted from experimental undervolting studies on SRAMs to build the model. We inject the faults generated by MoRS into the on-chip memory of the DNN accelerator to evaluate the resilience of the system under the test. MoRS has the advantage of simplicity without any need for high-time overhead experiments while being accurate enough in comparison to a fully randomly-generated fault injection approach. We evaluate our experiment in popular DNN workloads by mapping weights to SRAMs and measure the accuracy difference between the output of the MoRS and the real data. Our results show that the maximum difference between real fault data and the output fault model of MoRS is 6.21%, whereas the maximum difference between real data and random fault injection model is 23.2%. In terms of average proximity to the real data, the output of MoRS outperforms the random fault injection approach by 3.21x.This work is partially funded by Open Transprecision Computing (OPRECOM) project, Summer of Code 2020.Peer ReviewedPostprint (author's final draft

UPCommons. Portal del coneixement obert de la UPC

Always-On 674uW @ 4GOP/s Error Resilient Binary Neural Networks with Aggressive SRAM Voltage Scaling on a 22nm IoT End-Node

Author: Benini Luca
Conti Francesco
Di Mauro Alfio
Rossi Davide
Schiavone Pasquale Davide
Publication venue
Publication date: 01/01/2020
Field of study

Binary Neural Networks (BNNs) have been shown to be robust to random bit-level noise, making aggressive voltage scaling attractive as a power-saving technique for both logic and SRAMs. In this work, we introduce the first fully programmable IoT end-node system-on-chip (SoC) capable of executing software-defined, hardware-accelerated BNNs at ultra-low voltage. Our SoC exploits a hybrid memory scheme where error-vulnerable SRAMs are complemented by reliable standard-cell memories to safely store critical data under aggressive voltage scaling. On a prototype in 22nm FDX technology, we demonstrate that both the logic and SRAM voltage can be dropped to 0.5Vwithout any accuracy penalty on a BNN trained for the CIFAR-10 dataset, improving energy efficiency by 2.2X w.r.t. nominal conditions. Furthermore, we show that the supply voltage can be dropped to 0.42V (50% of nominal) while keeping more than99% of the nominal accuracy (with a bit error rate ~1/1000). In this operating point, our prototype performs 4Gop/s (15.4Inference/s on the CIFAR-10 dataset) by computing up to 13binary ops per pJ, achieving 22.8 Inference/s/mW while keeping within a peak power envelope of 674uW - low enough to enable always-on operation in ultra-low power smart cameras, long-lifetime environmental sensors, and insect-sized pico-drones.Comment: Submitted to ISICAS2020 journal special issu

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

MoRS: An approximate fault modelling framework for reduced-voltage SRAMs

Author: Cristal Kestelman Adrián
Ergin Oguz
Salami Behzad
Unsal Osman Sabri
Yuksel Ismail Emir
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/06/2022
Field of study

arXiv.org e-Print Archive

UPCommons. Portal del coneixement obert de la UPC

Computing the Similarity Estimate Using Approximate Memory

Author: Ertl Otmar
Liu Shanshan
Lombardi Fabrizio
Niknia Farzad
Reviriego Vasallo Pedro
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/09/2022
Field of study

In many computing applications there is a need to compute the similarity of sets of elements. When the sets have many elements or comparison involves many sets, computing the similarity requires significant computational effort and storage capacity. As in most cases, a reasonably accurate estimate is sufficient, many algorithms for similarity estimation have been proposed during the last decades. Those algorithms compute signatures for the sets and use them to estimate similarity. However, as the number of sets that need to be compared grows, even these similarity estimation algorithms require significant memory with its associated power dissipation. This article for the first time considers the use of approximate memories for similarity estimation. A theoretical analysis and simulation results are provided; initially it is shown that similarity sketches can tolerate large bit error rates and thus, they can benefit from using approximate memories without substantially compromising the accuracy of the similarity estimate. An understanding of the effect of errors in the stored signatures on the similarity estimate is pursued. A scheme to mitigate the impact of errors is presented; the proposed scheme tolerates even larger bit error rates and does not need additional memory. For example, bit error rates of up to 10 4 have less than a 1% impact on the accuracy of the estimate when the memory is unprotected, and larger bit errors rates can be tolerated if the memory is parity protected. These findings can be used for voltage supply scaling and increasing the refresh time in SRAMs and DRAMs. Based on those initial results, an enhanced implementation is further proposed for unprotected memories that further extends the range of tolerated BERs and enables power savings of up to 61.31% for SRAMs. In conclusion, this article shows that the use of approximate memories in sketches for similarity estimation provides significant benefits with a negligible impact on accuracy.This work was supported by ACHILLES project PID2019-104207RB-I00 and Go2Edge network RED2018-102585-T funded by the Spanish Agencia Estatal de Investigación (AEI) 10.13039/501100011033 and by the Madrid Community research project TAPIR-CM under Grant P2018/TCS-4496. The research of S. Liu and F. Lombardi was supported by NSF under Grants CCF-1953961 and 1812467

Universidad Carlos III de Madrid e-Archivo

TuRaN: True Random Number Generation Using Supply Voltage Underscaling in SRAMs

Author: Bostancı F. Nisa
Ergin Oğuz
Ghiasi Nika Mansouri
Mutlu Onur
Olgun Ataberk
Salami Behzad
Tuğrul Yahya Can
Yağlıkçı A. Giray
Yüksel İsmail Emir
Publication venue
Publication date: 20/11/2022
Field of study

Prior works propose SRAM-based TRNGs that extract entropy from SRAM arrays. SRAM arrays are widely used in a majority of specialized or general-purpose chips that perform the computation to store data inside the chip. Thus, SRAM-based TRNGs present a low-cost alternative to dedicated hardware TRNGs. However, existing SRAM-based TRNGs suffer from 1) low TRNG throughput, 2) high energy consumption, 3) high TRNG latency, and 4) the inability to generate true random numbers continuously, which limits the application space of SRAM-based TRNGs. Our goal in this paper is to design an SRAM-based TRNG that overcomes these four key limitations and thus, extends the application space of SRAM-based TRNGs. To this end, we propose TuRaN, a new high-throughput, energy-efficient, and low-latency SRAM-based TRNG that can sustain continuous operation. TuRaN leverages the key observation that accessing SRAM cells results in random access failures when the supply voltage is reduced below the manufacturer-recommended supply voltage. TuRaN generates random numbers at high throughput by repeatedly accessing SRAM cells with reduced supply voltage and post-processing the resulting random faults using the SHA-256 hash function. To demonstrate the feasibility of TuRaN, we conduct SPICE simulations on different process nodes and analyze the potential of access failure for use as an entropy source. We verify and support our simulation results by conducting real-world experiments on two commercial off-the-shelf FPGA boards. We evaluate the quality of the random numbers generated by TuRaN using the widely-adopted NIST standard randomness tests and observe that TuRaN passes all tests. TuRaN generates true random numbers with (i) an average (maximum) throughput of 1.6Gbps (1.812Gbps), (ii) 0.11nJ/bit energy consumption, and (iii) 278.46us latency

arXiv.org e-Print Archive

Repository for Publications and Research Data

Recommended from our members

A RISC-V Vector Processor With Simultaneous-Switching Switched-Capacitor DC-DC Converters in 28 nm FDSOI

Author: Alon E
Asanović K
Avizienis R
Bailey S
Blagojević M
Chen PH
Chiu PF
Flatresse P
Jevtić R
Keller B
Kwak J
Le HP
Lee Y
Nikolić B
Puggelli A
Richards B
Sutardja N
Waterman A
Zimmer B
Publication venue: eScholarship, University of California
Publication date: 01/04/2016
Field of study

This work demonstrates a RISC-V vector microprocessor implemented in 28 nm FDSOI with fully integrated simultaneous-switching switched-capacitor DC-DC (SC DC-DC) converters and adaptive clocking that generates four on-chip voltages between 0.45 and 1 V using only 1.0 V core and 1.8 V IO voltage inputs. The converters achieve high efficiency at the system level by switching simultaneously to avoid charge-sharing losses and by using an adaptive clock to maximize performance for the resulting voltage ripple. Details about the implementation of the DC-DC switches, DC-DC controller, and adaptive clock are provided, and the sources of conversion loss are analyzed based on measured results. This system pushes the capabilities of dynamic voltage scaling by enabling fast transitions (20 ns), simple packaging (no off-chip passives), low area overhead (16%), high conversion efficiency (80%-86%), and high energy efficiency (26.2 DP GFLOPS/W) for mobile devices

eScholarship - University of California

Approximate Computing Survey, Part II: Application-Specific & Architectural Approximation Techniques and Applications

Author: Armeniakos Giorgos
Hanif Muhammad Abdullah
Jiao Xun
Leon Vasileios
Pekmestzi Kiamal
Shafique Muhammad
Soudris Dimitrios
Publication venue
Publication date: 20/07/2023
Field of study

The challenging deployment of compute-intensive applications from domains such Artificial Intelligence (AI) and Digital Signal Processing (DSP), forces the community of computing systems to explore new design approaches. Approximate Computing appears as an emerging solution, allowing to tune the quality of results in the design of a system in order to improve the energy efficiency and/or performance. This radical paradigm shift has attracted interest from both academia and industry, resulting in significant research on approximation techniques and methodologies at different design layers (from system down to integrated circuits). Motivated by the wide appeal of Approximate Computing over the last 10 years, we conduct a two-part survey to cover key aspects (e.g., terminology and applications) and review the state-of-the art approximation techniques from all layers of the traditional computing stack. In Part II of our survey, we classify and present the technical details of application-specific and architectural approximation techniques, which both target the design of resource-efficient processors/accelerators & systems. Moreover, we present a detailed analysis of the application spectrum of Approximate Computing and discuss open challenges and future directions.Comment: Under Review at ACM Computing Survey

arXiv.org e-Print Archive

Study and development of innovative strategies for energy-efficient cross-layer design of digital VLSI systems based on Approximate Computing

Author: STAZI GIULIA
Publication venue
Publication date: 18/02/2020
Field of study

The increasing demand on requirements for high performance and energy efficiency in modern digital systems has led to the research of new design approaches that are able to go beyond the established energy-performance tradeoff. Looking at scientific literature, the Approximate Computing paradigm has been particularly prolific. Many applications in the domain of signal processing, multimedia, computer vision, machine learning are known to be particularly resilient to errors occurring on their input data and during computation, producing outputs that, although degraded, are still largely acceptable from the point of view of quality. The Approximate Computing design paradigm leverages the characteristics of this group of applications to develop circuits, architectures, algorithms that, by relaxing design constraints, perform their computations in an approximate or inexact manner reducing energy consumption. This PhD research aims to explore the design of hardware/software architectures based on Approximate Computing techniques, filling the gap in literature regarding effective applicability and deriving a systematic methodology to characterize its benefits and tradeoffs. The main contributions of this work are: -the introduction of approximate memory management inside the Linux OS, allowing dynamic allocation and de-allocation of approximate memory at user level, as for normal exact memory; - the development of an emulation environment for platforms with approximate memory units, where faults are injected during the simulation based on models that reproduce the effects on memory cells of circuital and architectural techniques for approximate memories; -the implementation and analysis of the impact of approximate memory hardware on real applications: the H.264 video encoder, internally modified to allocate selected data buffers in approximate memory, and signal processing applications (digital filter) using approximate memory for input/output buffers and tap registers; -the development of a fully reconfigurable and combinatorial floating point unit, which can work with reduced precision formats

Archivio della ricerca- Università di Roma La Sapienza