Search CORE

9 research outputs found

Voltage stacking for near/sub-threshold operation

Author: Singh Kamlesh
Publication venue: Eindhoven University of Technology
Publication date: 21/10/2021
Field of study

A fully integrated 2:1 self-oscillating switched-capacitor DC-DC converter in 28 nm UTBB FD-SOI

Author: Jani Mäkipää
Lauri Koskinen
Markus Hiienkari
Matthew Turnquist
Publication venue: 'MDPI AG'
Publication date: 27/10/2022
Field of study

The importance of energy-constrained processors continues to grow especially for ultra-portable sensor-based platforms for the Internet-of-Things (IoT). Processors for these IoT applications primarily operate at near-threshold (NT) voltages and have multiple power modes. Achieving high conversion efficiency within the DC–DC converter that supplies these processors is critical since energy consumption of the DC–DC/processor system is proportional to the DC–DC converter efficiency. The DC–DC converter must maintain high efficiency over a large load range generated from the multiple power modes of the processor. This paper presents a fully integrated step-down self-oscillating switched-capacitor DC–DC converter that is capable of meeting these challenges. The area of the converter is 0.0104 mm2 and is designed in 28 nm ultra-thin body and buried oxide fully-depleted SOI (UTBB FD-SOI). Back-gate biasing within FD-SOI is utilized to increase the load power range of the converter. With an input of 1 V and output of 460 mV, measurements of the converter show a minimum efficiency of 75% for 79 nW to 200 µW loads. Measurements with an off-chip NT processor load show efficiency up to 86%. The converter’s large load power range and high efficiency make it an excellent fit for energy-constrained processors.</p

UTUPub

Mr.Wolf: An Energy-Precision Scalable Parallel Ultra Low Power SoC for IoT Edge Processing

Author: Benini L.
Loi I.
Pullini A.
Rossi D.
Tagliavini G.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019
Field of study

This paper presents Mr. Wolf, a parallel ultra-low power (PULP) system on chip (SoC) featuring a hierarchical architecture with a small (12 kgates) microcontroller (MCU) class RISC-V core augmented with an autonomous IO subsystem for efficient data transfer from a wide set of peripherals. The small core can offload compute-intensive kernels to an eight-core floating-point capable of processing engine available on demand. The proposed SoC, implemented in a 40-nm LP CMOS technology, features a 108-mu W fully retentive memory (512 kB). The IO subsystem is capable of transferring up to 1.6 Gbit/s from external devices to the memory in less than 2.5 mW. The eight-core compute cluster achieves a peak performance of 850 million of 32-bit integer multiply and accumulate per second (MMAC/s) and 500 million of 32-bit floating-point multiply and accumulate per second (MFMAC/s) -1 GFlop/s-with an energy efficiency up to 15 MMAC/s/mW and 9 MFMAC/s/mW. These building blocks are supported by aggressive on-chip power conversion and management, enabling energy-proportional heterogeneous computing for always-on IoT end nodes improving performance by several orders of magnitude with respect to traditional single-core MCUs within a power envelope of 153 mW. We demonstrated the capabilities of the proposed SoC on a wide set of near-sensor processing kernels showing that Mr. Wolf can deliver performance up to 16.4 GOp/s with energy efficiency up to 274 MOp/s/mW on real-life applications, paving the way for always-on data analytics on high-bandwidth sensors at the edge of the Internet of Things

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Vector processing-aware advanced clock-gating techniques for low-power fused multiply-add

Author: Cristal Kestelman Adrián
Palomar Pérez Óscar
Ratkovic Ivan
Stanic Milan
Unsal Osman Sabri
Valero Cortés Mateo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018
Field of study

The need for power efficiency is driving a rethink of design decisions in processor architectures. While vector processors succeeded in the high-performance market in the past, they need a retailoring for the mobile market that they are entering now. Floating-point (FP) fused multiply-add (FMA), being a functional unit with high power consumption, deserves special attention. Although clock gating is a well-known method to reduce switching power in synchronous designs, there are unexplored opportunities for its application to vector processors, especially when considering active operating mode. In this research, we comprehensively identify, propose, and evaluate the most suitable clock-gating techniques for vector FMA units (VFUs). These techniques ensure power savings without jeopardizing the timing. We evaluate the proposed techniques using both synthetic and “real-world” application-based benchmarking. Using vector masking and vector multilane-aware clock gating, we report power reductions of up to 52%, assuming active VFU operating at the peak performance. Among other findings, we observe that vector instruction-based clock-gating techniques achieve power savings for all vector FP instructions. Finally, when evaluating all techniques together, using “real-world” benchmarking, the power reductions are up to 80%. Additionally, in accordance with processor design trends, we perform this research in a fully parameterizable and automated fashion.The research leading to these results has received funding from the RoMoL ERC Advanced Grant GA 321253 and is supported in part by the European Union (FEDER funds) under contract TTIN2015-65316-P. The work of I. Ratkovic was supported by a FPU research grant from the Spanish MECD.Peer ReviewedPostprint (author's final draft

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

A RISC-V vector processor with tightly-integrated switched-capacitor DC-DC converters in 28nm FDSOI

Author: Alberto Puggelli
Andrew Waterman
Ben Keller
Brian Richards
Brian Zimmer
Elad Alon
Hanh-phuc Le
Jaehwa Kwak
Milovan Blagojevic
Nicholas Sutardja
Pi-feng Chiu
Po-hung Chen
Rimas Avizienis
Ruzica Jevtic
Stevo Bailey
Yunsup Lee
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 04/12/2015
Field of study

This work demonstrates a RISC-V vector microproces-sor implemented in 28nm FDSOI with fully-integrated non-interleaved switched-capacitor DCDC (SC-DCDC) converters and adaptive clocking that generates four on-chip voltages between 0.5V and 1V using only 1.0V core and 1.8V IO voltage inputs. The design pushes the capabilities of dynamic voltage scaling by enabling fast transitions (20ns), simple packaging (no off-chip passives), low area overhead (16%), high conversion efficiency (80-86%), and high energy effi-ciency (26.2 DP GFLOPS/W) for mobile devices

CiteSeerX

Crossref

Microarchitectures pour la sauvegarde incrémentale, robuste et efficace dans les systèmes à alimentation intermittente

Author: Pala Davide
Publication venue: HAL CCSD
Publication date: 10/11/2022
Field of study

Embedded devices powered with environmental energy harvesting, have to sustain computation while experiencing unexpected power failures.To preserve the progress across the power interruptions, Non-Volatile Memories (NVMs) are used to quickly save the state. This dissertation first presents an overview and comparison of different NVM technologies, based on different surveys from the literature. The second contribution we propose is a dedicated backup controller, called Freezer, that implements an on-demand incremental backup scheme. This can make the size of the backup 87.7% smaller then a full-memory backup strategy from the state of the art (SoA). Our third contribution addresses the problem of corruption of the state, due to interruptions during the backup process. Two algorithms are presented, that improve on the Freezer incremental backup process, making it robust to errors, by always guaranteeing the existence of a correct state, that can be restored in case of backup errors. These two algorithms can consume 23% less energy than the usual double-buffering technique used in the SoA. The fourth contribution, addresses the scalability of our proposed approach. Combining Freezer with Bloom filters, we introduce a backup scheme that can cover much larger address spaces, while achieving a backup size which is half the size of the regular Freezer approach.Les appareils embarqués alimentés par la récupération d'énergie environnementale doivent maintenir le calcul tout en subissant des pannes de courant inattendues. Pour préserver la progression à travers les interruptions de courant, des mémoires non volatiles (NVM) sont utilisées pour enregistrer rapidement l'état. Cette thèse présente d'abord une vue d'ensemble et une comparaison des différentes technologies NVM, basées sur différentes enquêtes de la littérature. La deuxième contribution que nous proposons est un contrôleur de sauvegarde dédié, appelé Freezer, qui implémente un schéma de sauvegarde incrémentale à la demande. Cela peut réduire la taille de la sauvegarde de 87,7% à celle d'une stratégie de sauvegarde à mémoire complète de l'état de l'art. Notre troisième contribution aborde le problème de la corruption de l'état, due aux interruptions pendant le processus de sauvegarde. Deux algorithmes sont présentés, qui améliorent le processus de sauvegarde incrémentale de Freezer, le rendant robuste aux erreurs, en garantissant toujours l'existence d'un état correct, qui peut être restauré en cas d'erreurs de sauvegarde. Ces deux algorithmes peuvent consommer

23\%

d'énergie en moins que la technique de ``double-buffering'' utilisée dans l'état de l'art. La quatrième contribution porte sur l'évolutivité de notre approche proposée. En combinant Freezer avec des filtres Bloom, nous introduisons un schéma de sauvegarde qui peut couvrir des espaces d'adressage beaucoup plus grands, tout en obtenant une taille de sauvegarde qui est la moitié de la taille de l'approche Freezer habituelle

INRIA a CCSD electronic archive server

Deep in-memory computing

Author: Kang Mingu
Publication venue
Publication date: 01/08/2017
Field of study

There is much interest in embedding data analytics into sensor-rich platforms such as wearables, biomedical devices, autonomous vehicles, robots, and Internet-of-Things to provide these with decision-making capabilities. Such platforms often need to implement machine learning (ML) algorithms under stringent energy constraints with battery-powered electronics. Especially, energy consumption in memory subsystems dominates such a system's energy efficiency. In addition, the memory access latency is a major bottleneck for overall system throughput. To address these issues in memory-intensive inference applications, this dissertation proposes deep in-memory accelerator (DIMA), which deeply embeds computation into the memory array, employing two key principles: (1) accessing and processing multiple rows of memory array at a time, and (2) embedding pitch-matched low-swing analog processing at the periphery of bitcell array. The signal-to-noise ratio (SNR) is budgeted by employing low-swing operations in both memory read and processing to exploit the application level's error immunity for aggressive energy efficiency. This dissertation first describes the system rationale underlying the DIMA's processing stages by identifying the common functional flow across a diverse set of inference algorithms. Based on the analysis, this dissertation presents a multi-functional DIMA to support four algorithms: support vector machine (SVM), template matching (TM), k-nearest neighbor (k-NN), and matched filter. The circuit and architectural level design techniques and guidelines are provided to address the challenges in achieving multi-functionality. A prototype integrated circuit (IC) of a multi-functional DIMA was fabricated with a 16 KB SRAM array in a 65 nm CMOS process. Measurement results show up to 5.6X and 5.8X energy and delay reductions leading to 31X energy delay product (EDP) reduction with negligible (<1%) accuracy degradation as compared to the conventional 8-b fixed-point digital implementation optimally designed for each algorithm. Then, DIMA also has been applied to more complex algorithms: (1) convolutional neural network (CNN), (2) sparse distributed memory (SDM), and (3) random forest (RF). System-level simulations of CNN using circuit behavioral models in a 45 nm SOI CMOS demonstrate that high probability (>0.99) of handwritten digit recognition can be achieved using the MNIST database, along with a 24.5X reduced EDP, a 5.0X reduced energy, and a 4.9X higher throughput as compared to the conventional system. The DIMA-based SDM architecture also achieves up to 25X and 12X delay and energy reductions, respectively, over conventional SDM with negligible accuracy degradation (within 0.4%) for 16X16 binary-pixel image classification. A DIMA-based RF was realized as a prototype IC with a 16 KB SRAM array in a 65 nm process. To the best of our knowledge, this is the first IC realization of an RF algorithm. The measurement results show that the prototype achieves a 6.8X lower EDP compared to a conventional design at the same accuracy (94%) for an eight-class traffic sign recognition problem. The multi-functional DIMA and extension to other algorithms naturally motivated us to consider a programmable DIMA instruction set architecture (ISA), namely MATI. This dissertation explores a synergistic combination of the instruction set, architecture and circuit design to achieve the programmability without losing DIMA's energy and throughput benefits. Employing silicon-validated energy, delay and behavioral models of deep in-memory components, we demonstrate that MATI is able to realize nine ML benchmarks while incurring negligible overhead in energy (< 0.1%), and area (4.5%), and in throughput, over a fixed four-function DIMA. In this process, MATI is able to simultaneously achieve enhancements in both energy (2.5X to 5.5X) and throughput (1.4X to 3.4X) for an overall EDP improvement of up to 12.6X over fixed-function digital architectures

Illinois Digital Environment for Access to Learning and Scholarship Repository

Understanding Quantum Technologies 2022

Author: Ezratty Olivier
Publication venue
Publication date: 27/10/2022
Field of study

Understanding Quantum Technologies 2022 is a creative-commons ebook that provides a unique 360 degrees overview of quantum technologies from science and technology to geopolitical and societal issues. It covers quantum physics history, quantum physics 101, gate-based quantum computing, quantum computing engineering (including quantum error corrections and quantum computing energetics), quantum computing hardware (all qubit types, including quantum annealing and quantum simulation paradigms, history, science, research, implementation and vendors), quantum enabling technologies (cryogenics, control electronics, photonics, components fabs, raw materials), quantum computing algorithms, software development tools and use cases, unconventional computing (potential alternatives to quantum and classical computing), quantum telecommunications and cryptography, quantum sensing, quantum technologies around the world, quantum technologies societal impact and even quantum fake sciences. The main audience are computer science engineers, developers and IT specialists as well as quantum scientists and students who want to acquire a global view of how quantum technologies work, and particularly quantum computing. This version is an extensive update to the 2021 edition published in October 2021.Comment: 1132 pages, 920 figures, Letter forma

arXiv.org e-Print Archive