Search CORE

721 research outputs found

Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review

Author: Gujarathi Hemal S
McDonald-Maier Klaus D
Qadri Muhammad Yasir
Publication venue: 'Academy Publisher'
Publication date: 01/01/2009
Field of study

The technological evolution has increased the number of transistors for a given die area significantly and increased the switching speed from few MHz to GHz range. Such inversely proportional decline in size and boost in performance consequently demands shrinking of supply voltage and effective power dissipation in chips with millions of transistors. This has triggered substantial amount of research in power reduction techniques into almost every aspect of the chip and particularly the processor cores contained in the chip. This paper presents an overview of techniques for achieving the power efficiency mainly at the processor core level but also visits related domains such as buses and memories. There are various processor parameters and features such as supply voltage, clock frequency, cache and pipelining which can be optimized to reduce the power consumption of the processor. This paper discusses various ways in which these parameters can be optimized. Also, emerging power efficient processor architectures are overviewed and research activities are discussed which should help reader identify how these factors in a processor contribute to power consumption. Some of these concepts have been already established whereas others are still active research areas. © 2009 ACADEMY PUBLISHER

University of Essex Research Repository

CiteSeerX

Crossref

Gate-level timing analysis and waveform evaluation

Author: Li Chaobo
Publication venue: SURFACE at Syracuse University
Publication date: 15/05/2015
Field of study

Static timing analysis (STA) is an integral part of modern VLSI chip design. Table lookup based methods are widely used in current industry due to its fast runtime and mature algorithms. Conventional STA algorithms based on table-lookup methods are developed under many assumptions in timing analysis; however, most of those assumptions, such as that input signals and output signals can be accurately modeled as ramp waveforms, are no longer satisfactory to meet the increasing demand of accuracy for new technologies. In this dissertation, we discuss several crucial issues that conventional STA has not taken into consideration, and propose new methods to handle these issues and show that new methods produce accurate results. In logic circuits, gates may have multiple inputs and signals can arrive at these inputs at different times and with different waveforms. Different arrival times and waveforms of signals can cause very different responses. However, multiple-input transition effects are totally overlooked by current STA tools. Using a conventional single-input transition model when multiple-input transition happens can cause significant estimation errors in timing analysis. Previous works on this issue focus on developing a complicated gate model to simulate the behavior of logic gates. These methods have high computational cost and have to make significant changes to the prevailing STA tools, and are thus not feasible in practice. This dissertation proposes a simplified gate model, uses transistor connection structures to capture the behavior of multiple-input transitions and requires no change to the current STA tools. Another issue with table lookup based methods is that the load of each gate in technology libraries is modeled as a single lumped capacitor. But in the real circuit, the Abstract 2 gate connects to its subsequent gates via metal wires. As the feature size of integrated circuit scales down, the interconnection cannot be seen as a simple capacitor since the resistive shielding effect will largely affect the equivalent capacitance seen from the gate. As the interconnection has numerous structures, tabulating the timing data for various interconnection structures is not feasible. In this dissertation, by using the concept of equivalent admittance, we reduce an arbitrary interconnection structure into an equivalent π-model RC circuit. Many previous works have mapped the π-model to an effective capacitor, which makes the table lookup based methods useful again. However, a capacitor cannot be equivalent to a π-model circuit, and will thus result in significant inaccuracy in waveform evaluation. In order to obtain an accurate waveform at gate output, a piecewise waveform evaluation method is proposed in this dissertation. Each part of the piecewise waveform is evaluated according to the gate characteristic and load structures. Another contribution of this dissertation research is a proposed equivalent waveform search method. The signal waveforms can be very complicated in the real circuits because of noises, race hazards, etc. The conventional STA only uses one attribute (i.e., transition time) to describe the waveform shape which can cause significant estimation errors. Our approach is to develop heuristic search functions to find equivalent ramps to approximate input waveforms. Here the transition time of a final ramp can be completely different from that of the original waveform, but we can get higher accuracy on output arrival time and transition time. All of the methods mentioned in this dissertation require no changes to the prevailing STA tools, and have been verified across different process technologies

Syracuse University Research Facility and Collaborative Environment

Power Reductions with Energy Recovery Using Resonant Topologies

Author: Bezzam Ignatius S.A.
Publication venue: Scholar Commons
Publication date: 05/05/2015
Field of study

The problem of power densities in system-on-chips (SoCs) and processors has become more exacerbated recently, resulting in high cooling costs and reliability issues. One of the largest components of power consumption is the low skew clock distribution network (CDN), driving large load capacitance. This can consume as much as 70% of the total dynamic power that is lost as heat, needing elaborate sensing and cooling mechanisms. To mitigate this, resonant clocking has been utilized in several applications over the past decade. An improved energy recovering reconfigurable generalized series resonance (GSR) solution with all the critical support circuitry is developed in this work. This LC resonant clock driver is shown to save about 50% driver power (\u3e40% overall), on a 22nm process node and has 50% less skew than a non-resonant driver at 2GHz. It can operate down to 0.2GHz to support other energy savings techniques like dynamic voltage and frequency scaling (DVFS). As an example, GSR can be configured for the simpler pulse series resonance (PSR) operation to enable further power saving for double data rate (DDR) applications, by using de-skewing latches instead of flip-flop banks. A PSR based subsystem for 40% savings in clocking power with 40% driver active area reduction xii is demonstrated. This new resonant driver generates tracking pulses at each transition of clock for dual edge operation across DVFS. PSR clocking is designed to drive explicit-pulsed latches with negative setup time. Simulations using 45nm IBM/PTM device and interconnect technology models, clocking 1024 flip-flops show the reductions, compared to non-resonant clocking. DVFS range from 2GHz/1.3V to 200MHz/0.5V is obtained. The PSR frequency is set \u3e3× the clock rate, needing only 1/10th the inductance of prior-art LC resonance schemes. The skew reductions are achieved without needing to increase the interconnect widths owing to negative set-up times. Applications in data circuits are shown as well with a 90nm example. Parallel resonant and split-driver non-resonant configurations as well are derived from GSR. Tradeoffs in timing performance versus power, based on theoretical analysis, are compared for the first time and verified. This enables synthesis of an optimal topology for a given application from the GSR

Scholar Commons - Santa Clara University

Glitch-Optimized Circuit Blocks for Low-Power High-Performance Booth Multipliers

Author: Anuradha Chathuranga Ranasinghe
Gerez Sabih H.
Publication venue
Publication date: 01/09/2020
Field of study

University of Twente Research Information

Statistical circuit simulations - from ‘atomistic’ compact models to statistical standard cell characterisation

Author: Kamsani Noor 'Ain
Publication venue
Publication date: 01/01/2011
Field of study

This thesis describes the development and application of statistical circuit simulation methodologies to analyse digital circuits subject to intrinsic parameter fluctuations. The specific nature of intrinsic parameter fluctuations are discussed, and we explain the crucial importance to the semiconductor industry of developing design tools which accurately account for their effects. Current work in the area is reviewed, and three important factors are made clear: any statistical circuit simulation methodology must be based on physically correct, predictive models of device variability; the statistical compact models describing device operation must be characterised for accurate transient analysis of circuits; analysis must be carried out on realistic circuit components. Improving on previous efforts in the field, we posit a statistical circuit simulation methodology which accounts for all three of these factors. The established 3-D Glasgow atomistic simulator is employed to predict electrical characteristics for devices aimed at digital circuit applications, with gate lengths from 35 nm to 13 nm. Using these electrical characteristics, extraction of BSIM4 compact models is carried out and their accuracy in performing transient analysis using SPICE is validated against well characterised mixed-mode TCAD simulation results for 35 nm devices. Static d.c. simulations are performed to test the methodology, and a useful analytic model to predict hard logic fault limitations on CMOS supply voltage scaling is derived as part of this work. Using our toolset, the effect of statistical variability introduced by random discrete dopants on the dynamic behaviour of inverters is studied in detail. As devices scaled, dynamic noise margin variation of an inverter is increased and higher output load or input slew rate improves the noise margins and its variation. Intrinsic delay variation based on CV/I delay metric is also compared using ION and IEFF definitions where the best estimate is obtained when considering ION and input transition time variations. Critical delay distribution of a path is also investigated where it is shown non-Gaussian. Finally, the impact of the cell input slew rate definition on the accuracy of the inverter cell timing characterisation in NLDM format is investigated

Glasgow Theses Service

OpenGrey Repository

Recommended from our members

Nasics: A `Fabric-Centric\u27 Approach Towards Integrated Nanosystems

Author: Narayanan Pritish
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/02/2013
Field of study

This dissertation addresses the fundamental problem of how to build computing systems for the nanoscale. With CMOS reaching fundamental limits, emerging nanomaterials such as semiconductor nanowires, carbon nanotubes, graphene etc. have been proposed as promising alternatives. However, nanoelectronics research has largely focused on a `device-first\u27 mindset without adequately addressing system-level capabilities, challenges for integration and scalable assembly. In this dissertation, we propose to develop an integrated nano-fabric, (broadly defined as nanostructures/devices in conjunction with paradigms for assembly, inter-connection and circuit styles), as opposed to approaches that focus on MOSFET replacement devices as the ultimate goal. In the `fabric-centric\u27 mindset, design choices at individual levels are made compatible with the fabric as a whole and minimize challenges for nanomanufacturing while achieving system-level benefits vs. scaled CMOS. We present semiconductor nanowire based nano-fabrics incorporating these fabric-centric principles called NASICs and N3ASICs and discuss how we have taken them from initial design to experimental prototype. Manufacturing challenges are mitigated through careful design choices at multiple levels of abstraction. Regular fabrics with limited customization mitigate overlay alignment requirements. Cross-nanowire FET devices and interconnect are assembled together as part of the uniform regular fabric without the need for arbitrary fine-grain interconnection at the nanoscale, routing or device sizing. Unconventional circuit styles are devised that are compatible with regular fabric layouts and eliminate the requirement for using complementary devices. Core fabric concepts are introduced and validated. Detailed analyses on device-circuit co-design and optimization, cascading, noise and parameter variation are presented. Benchmarking of nanowire processor designs vs. equivalent scaled 16nm CMOS shows up to 22X area, 30X power benefits at comparable performance, and with overlay precision that is achievable with present-day technology. Building on the extensive manufacturing-friendly fabric framework, we present recent experimental efforts and key milestones that have been attained towards realizing a proof-of-concept prototype at dimensions of 30nm and below

ScholarWorks@UMass Amherst

Power Efficient Data-Aware SRAM Cell for SRAM-Based FPGA Architecture

Author: Singh Ajay Kumar
Publication venue: 'IntechOpen'
Publication date: 31/05/2017
Field of study

The design of low-power SRAM cell becomes a necessity in today\u27s FPGAs, because SRAM is a critical component in FPGA design and consumes a large fraction of the total power. The present chapter provides an overview of various factors responsible for power consumption in FPGA and discusses the design techniques of low-power SRAM-based FPGA at system level, device level, and architecture levels. Finally, the chapter proposes a data-aware dynamic SRAM cell to control the power consumption in the cell. Stack effect has been adopted in the design to reduce the leakage current. The various peripheral circuits like address decoder circuit, write/read enable circuits, and sense amplifier have been modified to implement a power-efficient SRAM-based FPGA

IntechOpen

Sub-10nm Transistors for Low Power Computing: Tunnel FETs and Negative Capacitance FETs

Author: Sharma Ankit
Publication venue: 'Purdue University (bepress)'
Publication date: 01/01/2018
Field of study

One of the major roadblocks in the continued scaling of standard CMOS technology is its alarmingly high leakage power consumption. Although circuit and system level methods can be employed to reduce power, the fundamental limit in the overall energy efficiency of a system is still rooted in the MOSFET operating principle: an injection of thermally distributed carriers, which does not allow subthreshold swing (SS) lower than 60mV/dec at room temperature. Recently, a new class of steep-slope devices like Tunnel FETs (TFETs) and Negative-Capacitance FETs (NCFETs) have garnered intense interest due to their ability to surpass the 60mV/dec limit on SS at room temperature. The focus of this research is on the simulation and design of TFETs and NCFETs for ultra-low power logic and memory applications. Using full band quantum mechanical model within the Non-Equilibrium Greens Function (NEGF) formalism, source-underlapping has been proposed as an effective technique to lower the SS in GaSb-InAs TFETs. Band-tail states, associated with heavy source doping, are shown to significantly degrade the SS in TFETs from their ideal value. To solve this problem, undoped source GaSb-InAs TFET in an i-i-n configuration is proposed. A detailed circuit-to-system level evaluation is performed to investigate the circuit level metrics of the proposed devices. To demonstrate their potential in a memory application, a 4T gain cell (GC) is proposed, which utilizes the low-leakage and enhanced drain capacitance of TFETs to realize a robust and long retention time GC embedded-DRAMs. The device/circuit/system level evaluation of proposed TFETs demonstrates their potential for low power digital applications. The second part of the thesis focuses on the design space exploration of hysteresis-free Negative Capacitance FETs (NCFETs). A cross-architecture analysis using HfZrOx ferroelectric (FE-HZO) integrated on bulk MOSFET, fully-depleted SOI-FETs, and sub-10nm FinFETs shows that FDSOI and FinFET configurations greatly benefit the NCFET performance due to their undoped body and improved gate-control which enables better capacitance matching with the ferroelectric. A low voltage NC-FinFET operating down to 0.25V is predicted using ultra-thin 3nm FE-HZO. Next, we propose one-transistor ferroelectric NOR type (Fe-NOR) non-volatile memory based on HfZrOx ferroelectric FETs (FeFETs). The enhanced drain-channel coupling in ultrashort channel FeFETs is utilized to dynamically modulate memory window of storage cells thereby resulting in simple erase-, program-and read-operations. The simulation analysis predicts sub-1V program/erase voltages in the proposed Fe-NOR memory array and therefore presents a significantly lower power alternative to conventional FeRAM and NOR flash memories

Purdue E-Pubs

Exploration and Design of High Performance Variation Tolerant On-Chip Interconnects

Author: Nigussie Ethiopia
Publication venue: Annales Universitatis Turkuensis AI 410
Publication date: 08/09/2010
Field of study

Siirretty Doriast

UTUPub

A fast and retargetable framework for logic-IP-internal electromigration assessment comprehending advanced waveform effects

Author: Cortadella Jordi
Jain Palkesh
Sapatnekar Sachin S.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016
Field of study

A new methodology for system-on-chip-level logic-IP-internal electromigration verification is presented in this paper, which significantly improves accuracy by comprehending the impact of the parasitic RC loading and voltage-dependent pin capacitance in the library model. It additionally provides an on-the-fly retargeting capability for reliability constraints by allowing arbitrary specifications of lifetimes, temperatures, voltages, and failure rates, as well as interoperability of the IPs across foundries. The characterization part of the methodology is expedited through the intelligent IP-response modeling. The ultimate benefit of the proposed approach is demonstrated on a 28-nm design by providing an on-the-fly specification of retargeted reliability constraints. The results show a high correlation with SPICE and were obtained with an order of magnitude reduction in the verification runtime.Peer ReviewedPostprint (author's final draft

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC