35 research outputs found

    Circuit Techniques for Adaptive and Reliable High Performance Computing.

    Full text link
    Increasing power density with process scaling has caused stagnation in the clock speed of modern microprocessors. Accordingly, designers have adopted message passing and shared memory based multicore architectures in order to keep up with the rapidly rising demand for computing throughput. At the same time, applications are not entirely parallel and improving single-thread performance continues to remain critical. Additionally, reliability is also worsening with process scaling, and margining for failures due to process and environmental variations in modern technologies consumes an increasingly large portion of the power/performance envelope. In the wake of multicore computing, reliability of signal synchronization between the cores is also becoming increasingly critical. This forces designers to search for alternate efficient methods to improve compute performance while addressing reliability. Accordingly, this dissertation presents innovative circuit and architectural techniques for variation-tolerance, performance and reliability targeted at datapath logic, signal synchronization and memories. Firstly, a domino logic based design style for datapath logic is presented that uses Adaptive Robustness Tuning (ART) in addition to timing speculation to provide up to 71% performance gains over conventional domino logic in 32bx32b multiplier in 65nm CMOS. Margins are reduced until functionality errors are detected, that are used to guide the tuning. Secondly, for signal synchronization across clock domains, a new class of dynamic logic based synchronizers with single-cycle synchronization latency is presented, where pulses, rather than stable intermediate voltages cause metastability. Such pulses are amplified using skewed inverters to improve mean time between failures by ~1e6x over jamb latches and double flip-flops at 2GHz in 65nm CMOS. Thirdly, a reconfigurable sensing scheme for 6T SRAMs is presented that employs auto-zero calibration and pre-amplification to improve sensing reliability (by up to 1.2 standard deviations of NMOS threshold voltage in 28nm CMOS); this increased reliability is in turn traded for ~42% sensing speedup. Finally, a main memory architecture design methodology to address reliability and power in the context of Exascale computing systems is presented. Based on 3D-stacked DRAMs, the methodology co-optimizes DRAM access energy, refresh power and the increased cost of error resilience, to meet stringent power and reliability constraints.PhDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/107238/1/bharan_1.pd

    Cross-Layer Optimization for Power-Efficient and Robust Digital Circuits and Systems

    Full text link
    With the increasing digital services demand, performance and power-efficiency become vital requirements for digital circuits and systems. However, the enabling CMOS technology scaling has been facing significant challenges of device uncertainties, such as process, voltage, and temperature variations. To ensure system reliability, worst-case corner assumptions are usually made in each design level. However, the over-pessimistic worst-case margin leads to unnecessary power waste and performance loss as high as 2.2x. Since optimizations are traditionally confined to each specific level, those safe margins can hardly be properly exploited. To tackle the challenge, it is therefore advised in this Ph.D. thesis to perform a cross-layer optimization for digital signal processing circuits and systems, to achieve a global balance of power consumption and output quality. To conclude, the traditional over-pessimistic worst-case approach leads to huge power waste. In contrast, the adaptive voltage scaling approach saves power (25% for the CORDIC application) by providing a just-needed supply voltage. The power saving is maximized (46% for CORDIC) when a more aggressive voltage over-scaling scheme is applied. These sparsely occurred circuit errors produced by aggressive voltage over-scaling are mitigated by higher level error resilient designs. For functions like FFT and CORDIC, smart error mitigation schemes were proposed to enhance reliability (soft-errors and timing-errors, respectively). Applications like Massive MIMO systems are robust against lower level errors, thanks to the intrinsically redundant antennas. This property makes it applicable to embrace digital hardware that trades quality for power savings.Comment: 190 page

    Sensing Schemes for STT-MRAMs structured with high TMR in low RA MTJs

    Get PDF
    In this work, we investigated the sensing challenges of spin-transfer torque MRAMs structured with perpendicular magnetic tunnel junctions with a high tunneling magnetoresistance ratio in a low resistance-area product. To overcome the problems of reading this type of memory, we have proposed a voltage sensing amplifier topology and compared its performance to that of the current sensing amplifier in terms of power, speed, and bit error rate performance. We have verified that the proposed sensing scheme offers a substantial improvement in bit-error-rate performance. To enumerate the read operations of the proposed sensing scheme with the proposed cross-coupled capacitive feedback technique on the clamped circuity have successfully been performed a 2.5X reduction in average low power and a 13X increase in average reading speed compared with the previous works due to its device structure and the proposed circuit technique.This work is part of a project that has received funding from the European Union’s H2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 691178, and supported by the TUBITAK-Career project #113E76

    Improving power of L1 data cache and register file utilizing critical path instructions

    Get PDF
    As transistor’s feature size shrinks, power becomes one of the limiting factors in design of modern processors. Cache and register file are the two power hungry components in processors, consuming more than one third of total processors’ power budget. In this thesis, we propose new architectures for cache and register file to reduce power consumption. In the new architectures, we have SRAM cells operating at two different voltage levels and we change the structure of the cells so that they dynamically switch between nominal and reduced supply voltage. Since power is proportional to voltage squared, an effective method to reduce power is lowering supply voltage. However, one of the side effects of using SRAM cells with reduced voltage is performance penalty. As supply voltage reduces, it takes longer to read/write from/to an SRAM cell. In this thesis, we exploit critical path instructions to overcome the performance impact of voltage scaling. Critical path instructions are chain of dependent instructions that constrain speed of processors. Those cells that are accessed frequently by critical instructions are assigned to use nominal supply voltage to preserve performance. On the other side, the cells that are seldom accessed by critical instructions are assigned to low supply voltage to reduce power consumption. To reduce overhead of voltage switching, we monitor critical instructions within long intervals and adjust the voltage of cells only when the intervals are elapsed. We have evaluated our optimization techniques using a combination of circuit and architectural simulators. First, we used HSPICE to measure both dynamic and static power and also latency of SRAM cells for nominal and reduced supply voltages. Then, the results from HSPICE were fed into Simplescalar for architectural evaluations. Our simulation results reveal that the low power cache and register file reduce power consumption significantly with negligible impact on performance

    Perpendicular STT-MTJs with Double Reference Layers and its Application to Downscaled Memory Cells

    Get PDF
    Chip design presents problems due to scaling as the technology node reaches to the physical limits. The roadmap to 7nm technology node and beyond is already traced and overcome the problems in power and energy dissipation have become a fundamental part in the chip design...El diseño del chip presenta problemas debido al escalamiento de dispositivos a medida que el nodo tecnológico llega a sus límites físicos. La ruta para el desarrollo de nodos de 7nm en adelante se ha trazado, y superar los problemas de potencia y disipación de energía se ha convertido una parte fundamental para el diseño de chips..

    先端プロセス技術における混載SRAMの高信頼・低電力化に関する研究

    Get PDF
    13301甲第4843号博士(工学)金沢大学博士論文本文Ful

    Sub-10nm Transistors for Low Power Computing: Tunnel FETs and Negative Capacitance FETs

    Get PDF
    One of the major roadblocks in the continued scaling of standard CMOS technology is its alarmingly high leakage power consumption. Although circuit and system level methods can be employed to reduce power, the fundamental limit in the overall energy efficiency of a system is still rooted in the MOSFET operating principle: an injection of thermally distributed carriers, which does not allow subthreshold swing (SS) lower than 60mV/dec at room temperature. Recently, a new class of steep-slope devices like Tunnel FETs (TFETs) and Negative-Capacitance FETs (NCFETs) have garnered intense interest due to their ability to surpass the 60mV/dec limit on SS at room temperature. The focus of this research is on the simulation and design of TFETs and NCFETs for ultra-low power logic and memory applications. Using full band quantum mechanical model within the Non-Equilibrium Greens Function (NEGF) formalism, source-underlapping has been proposed as an effective technique to lower the SS in GaSb-InAs TFETs. Band-tail states, associated with heavy source doping, are shown to significantly degrade the SS in TFETs from their ideal value. To solve this problem, undoped source GaSb-InAs TFET in an i-i-n configuration is proposed. A detailed circuit-to-system level evaluation is performed to investigate the circuit level metrics of the proposed devices. To demonstrate their potential in a memory application, a 4T gain cell (GC) is proposed, which utilizes the low-leakage and enhanced drain capacitance of TFETs to realize a robust and long retention time GC embedded-DRAMs. The device/circuit/system level evaluation of proposed TFETs demonstrates their potential for low power digital applications. The second part of the thesis focuses on the design space exploration of hysteresis-free Negative Capacitance FETs (NCFETs). A cross-architecture analysis using HfZrOx ferroelectric (FE-HZO) integrated on bulk MOSFET, fully-depleted SOI-FETs, and sub-10nm FinFETs shows that FDSOI and FinFET configurations greatly benefit the NCFET performance due to their undoped body and improved gate-control which enables better capacitance matching with the ferroelectric. A low voltage NC-FinFET operating down to 0.25V is predicted using ultra-thin 3nm FE-HZO. Next, we propose one-transistor ferroelectric NOR type (Fe-NOR) non-volatile memory based on HfZrOx ferroelectric FETs (FeFETs). The enhanced drain-channel coupling in ultrashort channel FeFETs is utilized to dynamically modulate memory window of storage cells thereby resulting in simple erase-, program-and read-operations. The simulation analysis predicts sub-1V program/erase voltages in the proposed Fe-NOR memory array and therefore presents a significantly lower power alternative to conventional FeRAM and NOR flash memories
    corecore