13 research outputs found

    Enhancing Performance and Energy Consumption of HER Caches by Adding Associativity

    Full text link
    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-54420-0_45Unlike other previous techniques, the recently proposed Hard Error Recovery (HER) fault-tolerant cache provides 100% fault-coverage in L1 data caches. This full coverage makes the HER cache appropiate for fault-dominated future technology nodes. An n-way set-associative HER cache implements one cache way with fast SRAM banks and the remaining ways with eDRAM banks to address power and area. Since the number of eDRAM cache blocks used in a specific HER cache organization depends on the cache associativity (i.e., the implemented number of ways), we expect that the performance and energy consumption provided by a given HER cache design strongly depends on the cache geometry. In this work we study the behavior of the HER cache design when applied to a highly associative L1 cache like those found in some modern microprocessors. In particular this work explores a 32KB 8-way associative L1 data cache such as the one used in Intel Haswell microarchitecture. Experimental results show that, at low-power modes compared to a conventional cache with the same storage capacity and number of ways, area, leakage power, and dynamic energy savings of a 4-way HER cache are by 25%, 85%, and 62%, respectively. These percentages are further improved (by 40%, 89%, and 68%, respectively) when the cache associativity is increased to 8 ways, while the performance loss with respect to both an 8-way conventional cache and the 4-way HER cache is minimal.This work was supponed by Generalitat de Catalunya (200950R1250), by FP7 program of the European Commission (TRAMS-248789), by Spanish Ministerio de Economía y Competitividad (MINECO) and by FEDER funds under Grant TlN2012-38341-C04-01 and TIN2010-18368.Lorente Garcés, VJ.; Valero Bresó, A.; Canal, R. (2014). Enhancing Performance and Energy Consumption of HER Caches by Adding Associativity. En Euro-Par 2013: Parallel Processing Workshops. Springer. 454-464. https://doi.org/10.1007/978-3-642-54420-0_45S454464Bhavnagarwala, A.J., et al.: The Impact of Intrinsic Device Fluctuations on CMOS SRAM Cell Stability. IEEE Journal of Solid-State Circuits 36(4), 658–665 (2001)Mukhopadhyay, S., et al.: Modeling of Failure Probability and Statistical Design of SRAM Array for Yield Enhancement in Nanoscaled CMOS. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 24(12), 1859–1880 (2005)Shirvani, P.P., McCluskey, E.J.: PADded Cache: A New Fault-Tolerance Technique for Cache Memories. In: Proceedings of the 17th IEEE VLSI Test Symposium, pp. 440–445 (1999)Wilkerson, C., et al.: Trading off Cache Capacity for Reliability to Enable Low Voltage Operation. In: Proceedings of the 35th Annual International Symposium on Computer Architecture, pp. 203–214 (2008)Agarwal, A., et al.: Process Variation in Embedded Memories: Failure Analysis and Variation Aware Architecture. IEEE Journal of Solid-State Circuits 40(9), 1804–1814 (2005)Ansari, A., et al.: Archipelago: A Polymorphic Cache Design for Enabling Robust Near-Threshold Operation. In: Proceedings of the 17th International Symposium on High Performance Computer Architecture, pp. 539–550 (2011)Nomura, S., et al.: Sampling + DMR: Practical and Low-overhead Permanent Fault Detection. In: Proceedings of the 38th Annual International Symposium on Computer Architecture, pp. 201–212 (2011)Sinharoy, B., et al.: IBM POWER7 multicore server processor. IBM Journal of Research and Development 55(3) (2011)Lorente, V., et al.: Combining RAM technologies for hard-error recovery in L1 data caches working at very-low power modes. In: Proceedings of the Design, Automation, and Test in Europe Conference, pp. 83–88 (2013)Kanter, D.: Intel’s Haswell CPU Microarchitecture, ”Real World Technologies” (November 13, 2012), http://www.realworldtech.com/haswell-cpu/Paul, S., et al.: Reliability-Driven ECC Allocation for Multiple Bit Error Resilience in Processor Cache. IEEE Transactions on Computers 60(1), 20–34 (2011)Alameldeen, A.R., et al.: Adaptive Cache Design to Enable Reliable Low-Voltage Operation. IEEE Transactions on Computers 60, 50–63 (2011)Dreslinski, R.G., et al.: Reconfigurable Energy Efficient Near Threshold Cache Architectures. In: Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture, pp. 459–470 (2008)Wilkerson, C., et al.: Reducing Cache Power with Low-Cost, Multi-bit Error-Correcting Codes. In: Proceedings of the 37th Annual International Symposium on Computer Architecture, pp. 83–93 (2010)Burger, D., Austin, T.M.: The SimpleScalar Tool Set, Version 2.0. ACM SIGARCH Computer Architecture News 25(3), 13–25 (1997)Thoziyoor, S., et al.: CACTI 5.1. Hewlett-Packard Laboratories, Palo Alto, Technical Report (2008)spec2000: Standard Performance Evaluation Corporation, http://www.spec.org/cpu2000Kulkarni, J.P., et al.: A 160 mV, Fully Differential, Robust Schmitt Trigger Based Sub-threshold SRAM. In: Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design, pp. 171–176 (2007)Keeth, B., et al.: DRAM Circuit Design. Fundamental and High-Speed Topics. John Wiley and Sons, Inc., Hoboken (2008)Mueller, W., et al.: Challenges for the DRAM Cell Scaling to 40nm. In: IEEE International Electron Devices Meeting 4, pp. 336–339 (2005

    Online error detection and correction of erratic bits in register files

    Get PDF
    Aggressive voltage scaling needed for low power in each new process generation causes large deviations in the threshold voltage of minimally sized devices of the 6T SRAM cell. Gate oxide scaling can cause large transient gate leakage (a trap in the gate oxide), which is known as the erratic bits phenomena. Register file protection is necessary to prevent errors from quickly spreading to different parts of the system, which may cause applications to crash or silent data corruption. This paper proposes a simple and cost-effective mechanism that increases the resiliency of the register files to erratic bits. Our mechanism detects those registers that have erratic bits, recovers from the error and quarantines the faulty register. After the quarantine period, it is able to detect whether they are fully operational with low overhead.Postprint (published version

    Design and modelling of different SRAM's based on CNTFET 32nm technology

    Full text link
    Carbon nanotube field-effect transistor (CNTFET) refers to a field-effect transistor that utilizes a single carbon nanotube or an array of carbon nanotubes as the channel material instead of bulk silicon in the traditional MOSFET structure. Since it was first demonstrated in 1998, there have been tremendous developments in CNTFETs, which promise for an alternative material to replace silicon in future electronics. Carbon nanotubes are promising materials for the nano-scale electron devices such as nanotube FETs for ultra-high density integrated circuits and quantum-effect devices for novel intelligent circuits, which are expected to bring a breakthrough in the present silicon technology. A Static Random Access Memory (SRAM) is designed to plug two needs: i) The SRAM provides as cache memory, communicating between central processing unit and Dynamic Random Access Memory (DRAM). ii) The SRAM technology act as driving force for low power application since SRAM is portable compared to DRAM, and SRAM doesn't require any refresh current. On the basis of acquired knowledge, we present different SRAM's designed for the conventional CNTFET. HSPICE simulations of this circuit using Stanford CNTFET model shows a great improvement in power saving.Comment: 15 Page

    Performance-effective operation below Vcc-min

    Get PDF
    Continuous circuit miniaturization and increased process variability point to a future with diminishing returns from dynamic voltage scaling. Operation below Vcc-min has been proposed recently as a mean to reverse this trend. The goal of this paper is to minimize the performance loss due to reduced cache capacity when operating below Vcc-min. A simple method is proposed: disable faulty blocks at low voltage. The method is based on observations regarding the distributions of faults in an array according to probability theory. The key lesson, from the probability analysis, is that as the number of uniformly distributed random faulty cells in an array increases the faults increasingly occur in already faulty blocks. The probability analysis is also shown to be useful for obtaining insight about the reliability implications of other cache techniques. For one configuration used in this paper, block disabling is shown to have on the average 6.6% and up to 29% better performance than a previously proposed scheme for low voltage cache operation. Furthermore, block-disabling is simple and less costly to implement and does not degrade performance at or above Vcc-min operation. Finally, it is shown that a victim-cache enables higher and more deterministic performance for a block-disabled cache

    Variation-tolerant ultra low-power heterojunction tunnel FET SRAM design

    Full text link

    An Ultra-Low-Power 75mV 64-Bit Current-Mode Majority-Function Adder

    Get PDF
    Ultra-low-power circuits are becoming more desirable due to growing portable device markets and they are also becoming more interesting and applicable today in biomedical, pharmacy and sensor networking applications because of the nano-metric scaling and CMOS reliability improvements. In this thesis, three main achievements are presented in ultra-low-power adders. First, a new majority function algorithm for carry and the sum generation is presented. Then with this algorithm and implied new architecture, we achieved a circuit with 75mV supply voltage operation. Last but not least, a 64 bit current-mode majority-function adder based on the new architecture and algorithm is successfully tested at 75mV supply voltage. The circuit consumed 4.5nW or 3.8pJ in one of the worst conditions

    Analysis and design of a subthreshold CMOS Schmitt trigger circuit

    Get PDF
    Tese (doutorado) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Engenharia Elétrica, Florianópolis, 2017.Nesta tese, o disparador Schmitt (ou Schmitt trigger) CMOS clássico (ST) operando em inversão fraca é analisado. A transferência de tensão DC completa é determinada, incluindo expressões analíticas para as tensões dos nós internos. A transferência de tensão DC resultante do ST apresenta um comportamento contínuo mesmo na presença da histerese. Nesse caso, a característica da tensão de saída entre os limites da histerese é formada por um segmento metaestável, que pode ser explicado em termos das resistências negativas dos subcircuitos NMOS e PMOS do ST. A tensão mínima para o aparecimento da histerese é determinada fazendo-se a análise de pequenos sinais. A análise de pequenos sinais também é utilizada para a estimativa da largura do laço de histerese. É mostrado que a histerese não aparece para tensões de alimentação menores que 75 mV em 300 K. A análise do ST operando como amplificador também foi feita. A razão ótima dos transistores foi determinada com o objetivo de se maximizar o ganho de tensão. A comparação do disparador Schmitt com o inversor CMOS convencional destaca as vantagens e desvantagens de cada um para aplicações de ultra-baixa tensão. Também é mostrado que o ST é teoricamente capaz de operar (com ganho de tensão absoluto ?1) com uma tensão de alimentação tão baixa quanto 31.5 mV, a qual é menor do que o conhecido limite prévio de 36 mV, para o inversor convencional. Como amplificador, o ST possui ganho de tensão absoluto consideravelmente maior que o inversor convencional na mesma tensão de alimentação. Três circuitos integrados foram projetados e fabricados para estudar o comportamento do ST com tensões de alimentação entre 50 mV e 1000 mV.Abstract : In this thesis, the classical CMOS Schmitt trigger (ST) operating in weak inversion is analyzed. The complete DC voltage transfer characteristic is determined, including analytical expressions for the internal node voltage. The resulting voltage transfer characteristic of the ST presents a continuous output behavior even when hysteresis is present. In this case, the output voltage characteristic between the hysteresis limits is formed by a metastable segment, which can be explained in terms of the negative resistance of the NMOS and PMOS subcircuits of the ST. The minimum supply voltage at which hysteresis appears is determined carrying out small-signal analysis, which is also used to estimate the hysteresis width. It is shown that hysteresis does not appear for supply voltages lower than 75 mV at 300 K. The analysis of the ST operating as a voltage amplifier was also carried out. Optimum transistor ratios were determined aiming at voltage gain maximization. The comparison of the ST with the standard CMOS inverter highlights the relative benefits and drawbacks of each one in ULV applications. It is also shown that the ST is theoretically capable of operating (voltage gain ?1) at a supply voltage as low as 31.5 mV, which is lower than the well-known limit of 36 mV, for the standard CMOS inverter. As an amplifier, the ST shows considerable higher absolute voltage gains than those showed by the conventional inverter at the same supply voltages. Three test chips were designed and fabricated to study the operation of the ST at supply voltages between 50 mV and 1000 mV

    Design and analysis of SRAMs for energy harvesting systems

    Get PDF
    PhD ThesisAt present, the battery is employed as a power source for wide varieties of microelectronic systems ranging from biomedical implants and sensor net-works to portable devices. However, the battery has several limitations and incurs many challenges for the majority of these systems. For instance, the design considerations of implantable devices concern about the battery from two aspects, the toxic materials it contains and its lifetime since replacing the battery means a surgical operation. Another challenge appears in wire-less sensor networks, where hundreds or thousands of nodes are scattered around the monitored environment and the battery of each node should be maintained and replaced regularly, nonetheless, the batteries in these nodes do not all run out at the same time. Since the introduction of portable systems, the area of low power designs has witnessed extensive research, driven by the industrial needs, towards the aim of extending the lives of batteries. Coincidentally, the continuing innovations in the field of micro-generators made their outputs in the same range of several portable applications. This overlap creates a clear oppor-tunity to develop new generations of electronic systems that can be powered, or at least augmented, by energy harvesters. Such self-powered systems benefit applications where maintaining and replacing batteries are impossi-ble, inconvenient, costly, or hazardous, in addition to decreasing the adverse effects the battery has on the environment. The main goal of this research study is to investigate energy harvesting aware design techniques for computational logic in order to enable the capa- II bility of working under non-deterministic energy sources. As a case study, the research concentrates on a vital part of all computational loads, SRAM, which occupies more than 90% of the chip area according to the ITRS re-ports. Essentially, this research conducted experiments to find out the design met-ric of an SRAM that is the most vulnerable to unpredictable energy sources, which has been confirmed to be the timing. Accordingly, the study proposed a truly self-timed SRAM that is realized based on complete handshaking protocols in the 6T bit-cell regulated by a fully Speed Independent (SI) tim-ing circuitry. The study proved the functionality of the proposed design in real silicon. Finally, the project enhanced other performance metrics of the self-timed SRAM concentrating on the bit-line length and the minimum operational voltage by employing several additional design techniques.Umm Al-Qura University, the Ministry of Higher Education in the Kingdom of Saudi Arabia, and the Saudi Cultural Burea

    Low Voltage Circuit Design Techniques for Cubic-Millimeter Computing.

    Full text link
    Cubic-millimeter computers complete with microprocessors, memories, sensors, radios and power sources are becomingly increasingly viable. Power consumption is one of the last remaining barriers to cubic-millimeter computing and is the subject of this work. In particular, this work focuses on minimizing power consumption in digital circuits using low voltage operation. Chapter 2 includes a general discussion of low voltage circuit behavior, specifically that at subthreshold voltages. In Chapter 3, the implications of transistor scaling on subthreshold circuits are considered. It is shown that the slow scaling of gate oxide relative to the device channel length leads to a 60% reduction in Ion/Ioff between the 90nm and 32nm nodes, which results in sub-optimal static noise margins, delay, and power consumption. It is also shown that simple modifications to gate length and doping can alleviate some of these problems. Three low voltage test-chips are discussed for the remainder of this work. The first test-chip implements the Subliminal Processor (Chapter 4), a sub-200mV 8-bit microprocessor fabricated in a 0.13µm technology. Measurements first show that the Subliminal Processor consumes only 3.5pJ/instruction at Vdd=350mV. Measurements of 20 dies then reveal that proper body biasing can eliminate performance variations and reduce mean energy substantially at low voltage. Finally, measurements are used to explore the effectiveness of body biasing, voltage scaling, and various gate sizing techniques for improving speed. The second test-chip implements the Phoenix Processor (Chapter 5), a low voltage 8-bit microprocessor optimized for minimum power operation in standby mode. The Phoenix Processor was fabricated in a 0.18µm technology in an area of only 915x915µm2. The aggressive standby mode strategy used in the Phoenix Processor is discussed thoroughly. Measurements at Vdd=0.5V show that the test-chip consumes 226nW in active mode and only 35.4pW in standby mode, making an on-chip battery a viable option. Finally, the third test-chip implements a low voltage image sensor (Chapter 6). A 128x128 image sensor array was fabricated in a 0.13µm technology. Test-chip measurements reveal that operation below 0.6V is possible with power consumption of only 1.9µW at 0.6V. Extensive characterization is presented with specific emphasis on noise characteristics and power consumption.Ph.D.Electrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/62233/1/hansons_1.pd
    corecore