Search CORE

6 research outputs found

High performance and energy-efficient instruction cache design and optimisation for ultra-low-power multi-core clusters

Author: Chen Jie <1991>
Publication venue: Alma Mater Studiorum - Università di Bologna
Publication date: 12/07/2022
Field of study

High Energy efficiency and high performance are the key regiments for Internet of Things (IoT) end-nodes. Exploiting cluster of multiple programmable processors has recently emerged as a suitable solution to address this challenge. However, one of the main bottlenecks for multi-core architectures is the instruction cache. While private caches fall into data replication and wasting area, fully shared caches lack scalability and form a bottleneck for the operating frequency. Hence we propose a hybrid solution where a larger shared cache (L1.5) is shared by multiple cores connected through a low-latency interconnect to small private caches (L1). However, it is still limited by large capacity miss with a small L1. Thus, we propose a sequential prefetch from L1 to L1.5 to improve the performance with little area overhead. Moreover, to cut the critical path for better timing, we optimized the core instruction fetch stage with non-blocking transfer by adopting a 4 x 32-bit ring buffer FIFO and adding a pipeline for the conditional branch. We present a detailed comparison of different instruction cache architectures' performance and energy efficiency recently proposed for Parallel Ultra-Low-Power clusters. On average, when executing a set of real-life IoT applications, our two-level cache improves the performance by up to 20% and loses 7% energy efficiency with respect to the private cache. Compared to a shared cache system, it improves performance by up to 17% and keeps the same energy efficiency. In the end, up to 20% timing (maximum frequency) improvement and software control enable the two-level instruction cache with prefetch adapt to various battery-powered usage cases to balance high performance and energy efficiency

AMS Tesi di Dottorato

Low energy digital circuit design using sub-threshold operation

Author: Calhoun Benton Highsmith, 1978-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2006
Field of study

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, February 2006.Includes bibliographical references (p. 189-202).Scaling of process technologies to deep sub-micron dimensions has made power management a significant concern for circuit designers. For emerging low power applications such as distributed micro-sensor networks or medical applications, low energy operation is the primary concern instead of speed, with the eventual goal of harvesting energy from the environment. Sub-threshold operation offers a promising solution for ultra-low-energy applications because it often achieves the minimum energy per operation. While initial explorations into sub-threshold circuits demonstrate its promise, sub-threshold circuit design remains in its infancy. This thesis makes several contributions that make sub-threshold design more accessible to circuit designers. First, a model for energy consumption in sub-threshold provides an analytical solution for the optimum VDD to minimize energy. Fitting this model to a generic circuit allows easy estimation of the impact of processing and environmental parameters on the minimum energy point. Second, analysis of device sizing for sub-threshold circuits shows the trade-offs between sizing for minimum energy and for minimum voltage operation.(cont.) A programmable FIR filter test chip fabricated in 0.18pum bulk CMOS provides measurements to confirm the model and the sizing analysis. Third, a low-overhead method for integrating sub-threshold operation with high performance applications extends dynamic voltage scaling across orders of magnitude of frequency and provides energy scalability down to the minimum energy point. A 90nm bulk CMOS test chip confirms the range of operation for ultra-dynamic voltage scaling. Finally, sub-threshold operation is extended to memories. Analysis of traditional SRAM bitcells and architectures leads to development of a new bitcell for robust sub-threshold SRAM operation. The sub-threshold SRAM is analyzed experimentally in a 65nm bulk CMOS test chip.by Benton H. Calhoun.Ph.D

DSpace@MIT

Ultra-low-power SRAM design in high variability advanced CMOS

Author: Verma Naveen
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2009
Field of study

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009.Cataloged from PDF version of thesis.Includes bibliographical references (p. 163-181).Embedded SRAMs are a critical component in modern digital systems, and their role is preferentially increasing. As a result, SRAMs strongly impact the overall power, performance, and area, and, in order to manage these severely constrained trade-offs, they must be specially designed for target applications. Highly energy-constrained systems (e.g. implantable biomedical devices, multimedia handsets, etc.) are an important class of applications driving ultra-low-power SRAMs. This thesis analyzes the energy of an SRAM sub-array. Since supply- and threshold-voltage have a strong effect, targets for these are established in order to optimize energy. Despite the heavy emphasis on leakage-energy, analysis of a high-density 256x256 sub-array in 45nm LP CMOS points to two necessary optimizations: (1) aggressive supply-voltage reduction (in addition to Vt elevation), and (2) performance enhancement. Important SRAM metrics, including read/write/hold-margin and read-current, are also investigated to identify trade-offs of these optimizations. Based on the need to lower supply-voltage, a 0.35V 256kb SRAM is demonstrated in 65nm LP CMOS. It uses an 8T bit-cell with peripheral circuit-assists to improve write-margin and bit-line leakage. Additionally, redundancy, to manage the increasing impact of variability in the periphery, is proposed to improve the area-offset trade-off of sense-amplifiers, demonstrating promise for highly advanced technology nodes. Based on the need to improve performance, which is limited by density constraints, a 64kb SRAM, using an offset-compensating sense-amplifier, is demonstrated in 45nm LP CMOS with high-density 0.25[mu]m2 bit-cells.(cont.) The sense-amplifier is regenerative, but non -strobed, overcoming timing uncertainties limiting performance, and it is single-ended, for compatibility with 8T cells. Compared to a conventional strobed sense-amplifier, it achieves 34% improvement in worst-case access-time and 4x improvement in the standard deviation of the access-time.by Naveen Verma.Ph.D

DSpace@MIT

A 128 kb 7T SRAM Using a Single-Cycle Boosting Mechanism in 28 nm FD–SOI

Author: Andersson Oskar
Cathelin Andreia
Ciampolini Lorenzo
Mohammadi Babak
Nguyen Joseph
Rodrigues Joachim
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018
Field of study

A 128kb ultra-low voltage SRAM, based on a leakage optimized single-WELL 7T bitcell in 28 nm FD–SOI technology is presented. An ideal power management scenario in a single supply system is achieved by permanently keeping the storage elements in the vicinity of the retention voltage. Performance and reliability is regained by boosting the voltage on critical nodes. The cost of voltage boost generation unit is minimized by 66 low-power and area efficient on-chip chargepumps, i.e., 64 for boosting the voltages on write-bitlines and 2 for the wordlines. The charge pump energy overhead is reduced by introducing a new boost paradigm with an on-demand activation mechanism that generates the required boost level in a single clock cycle. A sense amplifier-less read architecture enables a reliable and high performance read operation. Measurements identify several meritorious metrics. The minimum read energy is identified as 8.4fJ/bit-access, achieved for 90 MHz operation at0.3 V. Furthermore, the minimum operating voltage is measuredas 240 mV, and data is retained in ultra-low voltage regime,ranging down to 0.2V. The bitcell area, implemented usingstandard design rules, is 0.261µm2. The entire memory, includingthe digital test circuitry, occupies 0.161 mm2 of chip area

Lund University Publications

A 128 kb 7T SRAM Using a Single-Cycle Boosting Mechanism in 28-nm FD–SOI

Author: Andreia Cathelin
Babak Mohammadi
Joachim Neves Rodrigues
Joseph Nguyen
Lorenzo Ciampolini
Oskar Andersson
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

Crossref

CMOS SPAD-based image sensor for single photon counting and time of flight imaging

Author: Dutton Neale Arthur William
Publication venue: The University of Edinburgh
Publication date: 27/06/2016
Field of study

The facility to capture the arrival of a single photon, is the fundamental limit to the detection of quantised electromagnetic radiation. An image sensor capable of capturing a picture with this ultimate optical and temporal precision is the pinnacle of photo-sensing. The creation of high spatial resolution, single photon sensitive, and time-resolved image sensors in complementary metal oxide semiconductor (CMOS) technology offers numerous benefits in a wide field of applications. These CMOS devices will be suitable to replace high sensitivity charge-coupled device (CCD) technology (electron-multiplied or electron bombarded) with significantly lower cost and comparable performance in low light or high speed scenarios. For example, with temporal resolution in the order of nano and picoseconds, detailed three-dimensional (3D) pictures can be formed by measuring the time of flight (TOF) of a light pulse. High frame rate imaging of single photons can yield new capabilities in super-resolution microscopy. Also, the imaging of quantum effects such as the entanglement of photons may be realised. The goal of this research project is the development of such an image sensor by exploiting single photon avalanche diodes (SPAD) in advanced imaging-specific 130nm front side illuminated (FSI) CMOS technology. SPADs have three key combined advantages over other imaging technologies: single photon sensitivity, picosecond temporal resolution and the facility to be integrated in standard CMOS technology. Analogue techniques are employed to create an efficient and compact imager that is scalable to mega-pixel arrays. A SPAD-based image sensor is described with 320 by 240 pixels at a pitch of 8μm and an optical efficiency or fill-factor of 26.8%. Each pixel comprises a SPAD with a hybrid analogue counting and memory circuit that makes novel use of a low-power charge transfer amplifier. Global shutter single photon counting images are captured. These exhibit photon shot noise limited statistics with unprecedented low input-referred noise at an equivalent of 0.06 electrons. The CMOS image sensor (CIS) trends of shrinking pixels, increasing array sizes, decreasing read noise, fast readout and oversampled image formation are projected towards the formation of binary single photon imagers or quanta image sensors (QIS). In a binary digital image capture mode, the image sensor offers a look-ahead to the properties and performance of future QISs with 20,000 binary frames per second readout with a bit error rate of 1.7 x 10-3. The bit density, or cumulative binary intensity, against exposure performance of this image sensor is in the shape of the famous Hurter and Driffield densitometry curves of photographic film. Oversampled time-gated binary image capture is demonstrated, capturing 3D TOF images with 3.8cm precision in a 60cm range

Edinburgh Research Archive