35 research outputs found
Circuit Techniques for Adaptive and Reliable High Performance Computing.
Increasing power density with process scaling has caused stagnation in the clock speed of modern microprocessors. Accordingly, designers have adopted message passing and shared memory based multicore architectures in order to keep up with the rapidly rising demand for computing throughput. At the same time, applications are not entirely parallel and improving single-thread performance continues to remain critical. Additionally, reliability is also worsening with process scaling, and margining for failures due to process and environmental variations in modern technologies consumes an increasingly large portion of the power/performance envelope. In the wake of multicore computing, reliability of signal synchronization between the cores is also becoming increasingly critical. This forces designers to search for alternate efficient methods to improve compute performance while addressing reliability. Accordingly, this dissertation presents innovative circuit and architectural techniques for variation-tolerance, performance and reliability targeted at datapath logic, signal synchronization and memories.
Firstly, a domino logic based design style for datapath logic is presented that uses Adaptive Robustness Tuning (ART) in addition to timing speculation to provide up to 71% performance gains over conventional domino logic in 32bx32b multiplier in 65nm CMOS. Margins are reduced until functionality errors are detected, that are used to guide the tuning.
Secondly, for signal synchronization across clock domains, a new class of dynamic logic based synchronizers with single-cycle synchronization latency is presented, where pulses, rather than stable intermediate voltages cause metastability. Such pulses are amplified using skewed inverters to improve mean time between failures by ~1e6x over jamb latches and double flip-flops at 2GHz in 65nm CMOS.
Thirdly, a reconfigurable sensing scheme for 6T SRAMs is presented that employs auto-zero calibration and pre-amplification to improve sensing reliability (by up to 1.2 standard deviations of NMOS threshold voltage in 28nm CMOS); this increased reliability is in turn traded for ~42% sensing speedup.
Finally, a main memory architecture design methodology to address reliability and power in the context of Exascale computing systems is presented. Based on 3D-stacked DRAMs, the methodology co-optimizes DRAM access energy, refresh power and the increased cost of error resilience, to meet stringent power and reliability constraints.PhDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/107238/1/bharan_1.pd
Cross-Layer Optimization for Power-Efficient and Robust Digital Circuits and Systems
With the increasing digital services demand, performance and power-efficiency
become vital requirements for digital circuits and systems. However, the
enabling CMOS technology scaling has been facing significant challenges of
device uncertainties, such as process, voltage, and temperature variations. To
ensure system reliability, worst-case corner assumptions are usually made in
each design level. However, the over-pessimistic worst-case margin leads to
unnecessary power waste and performance loss as high as 2.2x. Since
optimizations are traditionally confined to each specific level, those safe
margins can hardly be properly exploited.
To tackle the challenge, it is therefore advised in this Ph.D. thesis to
perform a cross-layer optimization for digital signal processing circuits and
systems, to achieve a global balance of power consumption and output quality.
To conclude, the traditional over-pessimistic worst-case approach leads to
huge power waste. In contrast, the adaptive voltage scaling approach saves
power (25% for the CORDIC application) by providing a just-needed supply
voltage. The power saving is maximized (46% for CORDIC) when a more aggressive
voltage over-scaling scheme is applied. These sparsely occurred circuit errors
produced by aggressive voltage over-scaling are mitigated by higher level error
resilient designs. For functions like FFT and CORDIC, smart error mitigation
schemes were proposed to enhance reliability (soft-errors and timing-errors,
respectively). Applications like Massive MIMO systems are robust against lower
level errors, thanks to the intrinsically redundant antennas. This property
makes it applicable to embrace digital hardware that trades quality for power
savings.Comment: 190 page
Sensing Schemes for STT-MRAMs structured with high TMR in low RA MTJs
In this work, we investigated the sensing challenges of spin-transfer torque MRAMs structured with perpendicular
magnetic tunnel junctions with a high tunneling magnetoresistance ratio in a low resistance-area product. To overcome
the problems of reading this type of memory, we have proposed a voltage sensing amplifier topology and compared its
performance to that of the current sensing amplifier in terms of power, speed, and bit error rate performance. We have
verified that the proposed sensing scheme offers a substantial improvement in bit-error-rate performance. To enumerate
the read operations of the proposed sensing scheme with the proposed cross-coupled capacitive feedback technique on
the clamped circuity have successfully been performed a 2.5X reduction in average low power and a 13X increase in
average reading speed compared with the previous works due to its device structure and the proposed circuit technique.This work is part of a project that has received funding from the
European Union’s H2020 research and innovation programme under the
Marie Skłodowska-Curie grant agreement No 691178, and supported by the
TUBITAK-Career project #113E76
Improving power of L1 data cache and register file utilizing critical path instructions
As transistor’s feature size shrinks, power becomes one of the limiting factors in design of modern processors. Cache and register file are the two power hungry components in processors, consuming more than one third of total processors’ power budget. In this thesis, we propose new architectures for cache and register file to reduce power consumption. In the new architectures, we have SRAM cells operating at two different voltage levels and we change the structure of the cells so that they dynamically switch between nominal and reduced supply voltage.
Since power is proportional to voltage squared, an effective method to reduce power is lowering supply voltage. However, one of the side effects of using SRAM cells with reduced voltage is performance penalty. As supply voltage reduces, it takes longer to read/write from/to an SRAM cell. In this thesis, we exploit critical path instructions to overcome the performance impact of voltage scaling. Critical path instructions are chain of dependent instructions that constrain speed of processors. Those cells that are accessed frequently by critical instructions are assigned to use nominal supply voltage to preserve performance. On the other side, the cells that are seldom accessed by critical instructions are assigned to low supply voltage to reduce power consumption. To reduce overhead of voltage switching, we monitor critical instructions within long intervals and adjust the voltage of cells only when the intervals are elapsed.
We have evaluated our optimization techniques using a combination of circuit and architectural simulators. First, we used HSPICE to measure both dynamic and static power and also latency of SRAM cells for nominal and reduced supply voltages. Then, the results from HSPICE were fed into Simplescalar for architectural evaluations. Our simulation results reveal that the low power cache and register file reduce power consumption significantly with negligible impact on performance
Perpendicular STT-MTJs with Double Reference Layers and its Application to Downscaled Memory Cells
Chip design presents problems due to scaling as the technology node reaches to
the physical limits. The roadmap to 7nm technology node and beyond is already traced
and overcome the problems in power and energy dissipation have become a
fundamental part in the chip design...El diseño del chip presenta problemas debido al escalamiento de dispositivos a
medida que el nodo tecnológico llega a sus límites físicos. La ruta para el desarrollo de
nodos de 7nm en adelante se ha trazado, y superar los problemas de potencia y
disipación de energía se ha convertido una parte fundamental para el diseño de chips..
先端プロセス技術における混載SRAMの高信頼・低電力化に関する研究
13301甲第4843号博士(工学)金沢大学博士論文本文Ful
Sub-10nm Transistors for Low Power Computing: Tunnel FETs and Negative Capacitance FETs
One of the major roadblocks in the continued scaling of standard CMOS technology is its alarmingly high leakage power consumption. Although circuit and system level methods can be employed to reduce power, the fundamental limit in the overall energy efficiency of a system is still rooted in the MOSFET operating principle: an injection of thermally distributed carriers, which does not allow subthreshold swing (SS) lower than 60mV/dec at room temperature. Recently, a new class of steep-slope devices like Tunnel FETs (TFETs) and Negative-Capacitance FETs (NCFETs) have garnered intense interest due to their ability to surpass the 60mV/dec limit on SS at room temperature. The focus of this research is on the simulation and design of TFETs and NCFETs for ultra-low power logic and memory applications. Using full band quantum mechanical model within the Non-Equilibrium Greens Function (NEGF) formalism, source-underlapping has been proposed as an effective technique to lower the SS in GaSb-InAs TFETs. Band-tail states, associated with heavy source doping, are shown to significantly degrade the SS in TFETs from their ideal value. To solve this problem, undoped source GaSb-InAs TFET in an i-i-n configuration is proposed. A detailed circuit-to-system level evaluation is performed to investigate the circuit level metrics of the proposed devices. To demonstrate their potential in a memory application, a 4T gain cell (GC) is proposed, which utilizes the low-leakage and enhanced drain capacitance of TFETs to realize a robust and long retention time GC embedded-DRAMs. The device/circuit/system level evaluation of proposed TFETs demonstrates their potential for low power digital applications. The second part of the thesis focuses on the design space exploration of hysteresis-free Negative Capacitance FETs (NCFETs). A cross-architecture analysis using HfZrOx ferroelectric (FE-HZO) integrated on bulk MOSFET, fully-depleted SOI-FETs, and sub-10nm FinFETs shows that FDSOI and FinFET configurations greatly benefit the NCFET performance due to their undoped body and improved gate-control which enables better capacitance matching with the ferroelectric. A low voltage NC-FinFET operating down to 0.25V is predicted using ultra-thin 3nm FE-HZO. Next, we propose one-transistor ferroelectric NOR type (Fe-NOR) non-volatile memory based on HfZrOx ferroelectric FETs (FeFETs). The enhanced drain-channel coupling in ultrashort channel FeFETs is utilized to dynamically modulate memory window of storage cells thereby resulting in simple erase-, program-and read-operations. The simulation analysis predicts sub-1V program/erase voltages in the proposed Fe-NOR memory array and therefore presents a significantly lower power alternative to conventional FeRAM and NOR flash memories