29 research outputs found

    Yield-Aware Leakage Power Reduction of On-Chip SRAMs

    Get PDF
    Leakage power dissipation of on-chip static random access memories (SRAMs) constitutes a significant fraction of the total chip power consumption in state-of-the-art microprocessors and system-on-chips (SoCs). Scaling the supply voltage of SRAMs during idle periods is a simple yet effective technique to reduce their leakage power consumption. However, supply voltage scaling also results in the degradation of the cells’ robustness, and thus reduces their capability to retain data reliably. This is particularly resulting in the failure of an increasing number of cells that are already weakened by excessive process parameters variations and/or manufacturing imperfections in nano-meter technologies. Thus, with technology scaling, it is becoming increasingly challenging to maintain the yield while attempting to reduce the leakage power of SRAMs. This research focuses on characterizing the yield-leakage tradeoffs and developing novel techniques for a yield-aware leakage power reduction of SRAMs. We first demonstrate that new fault behaviors emerge with the introduction of a low-leakage standby mode to SRAMs. In particular, it is shown that there are some types of defects in SRAM cells that start to cause failures only when the drowsy mode is activated. These defects are not sensitized in the active operating mode, and thus escape the traditional March tests. Fault models for these newly observed fault behaviors are developed and described in this thesis. Then, a new low-complexity test algorithm, called March RAD, is proposed that is capable of detecting all the drowsy faults as well as the simple traditional faults. Extreme process parameters variations can also result in SRAM cells with very weak data-retention capability. The probability of such cells may be very rare in small memory arrays, however, in large arrays, their probability is magnified by the huge number of bit-cells integrated on a single chip. Hence, it is critical also to account for such extremal events while attempting to scale the supply voltage of SRAMs. To estimate the statistics of such rare events within a reasonable computational time, we have employed concepts from extreme value theory (EVT). This has enabled us to accurately model the tail of the cell failure probability distribution versus the supply voltage. Analytical models are then developed to characterize the yield-leakage tradeoffs in large modern SRAMs. It is shown that even a moderate scaling of the supply voltage of large SRAMs can potentially result in significant yield losses, especially in processes with highly fluctuating parameters. Thus, we have investigated the application of fault-tolerance techniques for a more efficient leakage reduction of SRAMs. These techniques allow for a more aggressive voltage scaling by providing tolerance to the failures that might occur during the sleep mode. The results show that in a 45-nm technology, assuming 10% variation in transistors threshold voltage, repairing a 64KB memory using only 8 redundant rows or incorporating single error correcting codes (ECCs) allows for ~90% leakage reduction while incurring only ~1% yield loss. The combination of redundancy and ECC, however, allows to reach the practical limits of leakage reduction in the analyzed benchmark, i.e., ~95%. Applying an identical standby voltage to all dies, regardless of their specific process parameters variations, can result in too many cell failures in some dies with heavily skewed process parameters, so that they may no longer be salvageable by the employed fault-tolerance techniques. To compensate for the inter-die variations, we have proposed to tune the standby voltage of each individual die to its corresponding minimum level, after manufacturing. A test algorithm is presented that can be used to identify the minimum applicable standby voltage to each individual memory die. A possible implementation of the proposed tuning technique is also demonstrated. Simulation results in a 45-nm predictive technology show that tuning standby voltage of SRAMs can enhance data-retention yield by an additional 10%−50%, depending on the severity of the variations

    Robust low-power digital circuit design in nano-CMOS technologies

    Get PDF
    Device scaling has resulted in large scale integrated, high performance, low-power, and low cost systems. However the move towards sub-100 nm technology nodes has increased variability in device characteristics due to large process variations. Variability has severe implications on digital circuit design by causing timing uncertainties in combinational circuits, degrading yield and reliability of memory elements, and increasing power density due to slow scaling of supply voltage. Conventional design methods add large pessimistic safety margins to mitigate increased variability, however, they incur large power and performance loss as the combination of worst cases occurs very rarely. In-situ monitoring of timing failures provides an opportunity to dynamically tune safety margins in proportion to on-chip variability that can significantly minimize power and performance losses. We demonstrated by simulations two delay sensor designs to detect timing failures in advance that can be coupled with different compensation techniques such as voltage scaling, body biasing, or frequency scaling to avoid actual timing failures. Our simulation results using 45 nm and 32 nm technology BSIM4 models indicate significant reduction in total power consumption under temperature and statistical variations. Future work involves using dual sensing to avoid useless voltage scaling that incurs a speed loss. SRAM cache is the first victim of increased process variations that requires handcrafted design to meet area, power, and performance requirements. We have proposed novel 6 transistors (6T), 7 transistors (7T), and 8 transistors (8T)-SRAM cells that enable variability tolerant and low-power SRAM cache designs. Increased sense-amplifier offset voltage due to device mismatch arising from high variability increases delay and power consumption of SRAM design. We have proposed two novel design techniques to reduce offset voltage dependent delays providing a high speed low-power SRAM design. Increasing leakage currents in nano-CMOS technologies pose a major challenge to a low-power reliable design. We have investigated novel segmented supply voltage architecture to reduce leakage power of the SRAM caches since they occupy bulk of the total chip area and power. Future work involves developing leakage reduction methods for the combination logic designs including SRAM peripherals

    Software-Based Self-Test of Set-Associative Cache Memories

    Get PDF
    Embedded microprocessor cache memories suffer from limited observability and controllability creating problems during in-system tests. This paper presents a procedure to transform traditional march tests into software-based self-test programs for set-associative cache memories with LRU replacement. Among all the different cache blocks in a microprocessor, testing instruction caches represents a major challenge due to limitations in two areas: 1) test patterns which must be composed of valid instruction opcodes and 2) test result observability: the results can only be observed through the results of executed instructions. For these reasons, the proposed methodology will concentrate on the implementation of test programs for instruction caches. The main contribution of this work lies in the possibility of applying state-of-the-art memory test algorithms to embedded cache memories without introducing any hardware or performance overheads and guaranteeing the detection of typical faults arising in nanometer CMOS technologie

    Efficient cache architectures for reliable hybrid voltage operation using EDC codes

    Get PDF
    Semiconductor technology evolution enables the design of sensor-based battery-powered ultra-low-cost chips (e.g., below 1 p) required for new market segments such as body, urban life and environment monitoring. Caches have been shown to be the highest energy and area consumer in those chips. This paper proposes a novel, hybrid-operation (high Vcc, ultra-low Vcc), single-Vcc domain cache architecture based on replacing energy-hungry bitcells (e.g., 10T) by more energy-efficient and smaller cells (e.g., 8T) enhanced with Error Detection and Correction (EDC) features for high reliability and performance predictability. Our architecture is proven to largely outperform existing solutions in terms of energy and area.Postprint (author’s final draft

    Combining RAM technologies for hard-error recovery in L1 data caches working at very-low power modes

    Full text link
    ©2013 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Low-power modes in modern microprocessors rely on low frequencies and low voltages to reduce the energy budget. Nevertheless, manufacturing induced parameter variations can make SRAM cells unreliable producing hard errors at supply voltages below Vccmin. Recent proposals provide a rather low fault-coverage due to the fault coverage/overhead trade-off. We propose a new faulttolerant L1 cache, which combines SRAM and eDRAM cells in L1 data caches to provide 100% SRAM hard-error fault coverage. Results show that, compared to a conventional cache and assuming 50% failure probability at low-power mode, leakage and dynamic energy savings are by 85% and 62%, respectively, with a minimal impact on performance.This work was supported by the Spanish MICINN (TIN2010-18368) with the Consolider-Ingenio 2010 Programme co-funded by the European Commission FEDER funds (CSD2006-00046) and co-funded with the Plan E funds (TIN2009-14475-C04-01). Additionaly, it was supported by Generalitat de Catalunya (2009SGR1250), by FP7 program of the European Commission (TRAMS-248789), and by Spanish MINECO (TIN2012-38341-C04-01).Lorente Garcés, VJ.; Valero Bresó, A.; Sahuquillo Borrás, J.; Petit Martí, SV.; Canal, R.; López Rodríguez, PJ.; Duato Marín, JF. (2013). Combining RAM technologies for hard-error recovery in L1 data caches working at very-low power modes. IEEE, ACM. https://doi.org/10.7873/DATE.2013.031

    Microarchitectural techniques to reduce energy consumption in the memory hierarchy

    Get PDF
    This thesis states that dynamic profiling of the memory reference stream can improve energy and performance in the memory hierarchy. The research presented in this theses provides multiple instances of using lightweight hardware structures to profile the memory reference stream. The objective of this research is to develop microarchitectural techniques to reduce energy consumption at different levels of the memory hierarchy. Several simple and implementable techniques were developed as a part of this research. One of the techniques identifies and eliminates redundant refresh operations in DRAM and reduces DRAM refresh power. Another, reduces leakage energy in L2 and higher level caches for multiprocessor systems. The emphasis of this research has been to develop several techniques of obtaining energy savings in caches using a simple hardware structure called the counting Bloom filter (CBF). CBFs have been used to predict L2 cache misses and obtain energy savings by not accessing the L2 cache on a predicted miss. A simple extension of this technique allows CBFs to do way-estimation of set associative caches to reduce energy in cache lookups. Another technique using CBFs track addresses in a Virtual Cache and reduce false synonym lookups. Finally this thesis presents a technique to reduce dynamic power consumption in level one caches using significance compression. The significant energy and performance improvements demonstrated by the techniques presented in this thesis suggest that this work will be of great value for designing memory hierarchies of future computing platforms.Ph.D.Committee Chair: Lee, Hsien-Hsin S.; Committee Member: Cahtterjee,Abhijit; Committee Member: Mukhopadhyay, Saibal; Committee Member: Pande, Santosh; Committee Member: Yalamanchili, Sudhaka

    Multi-criteria optimization for energy-efficient multi-core systems-on-chip

    Get PDF
    The steady down-scaling of transistor dimensions has made possible the evolutionary progress leading to today’s high-performance multi-GHz microprocessors and core based System-on-Chip (SoC) that offer superior performance, dramatically reduced cost per function, and much-reduced physical size compared to their predecessors. On the negative side, this rapid scaling however also translates to high power densities, higher operating temperatures and reduced reliability making it imperative to address design issues that have cropped up in its wake. In particular, the aggressive physical miniaturization have increased CMOS fault sensitivity to the extent that many reliability constraints pose threat to the device normal operation and accelerate the onset of wearout-based failures. Among various wearout-based failure mechanisms, Negative biased temperature instability (NBTI) has been recognized as the most critical source of device aging. The urge of reliable, low-power circuits is driving the EDA community to develop new design techniques, circuit solutions, algorithms, and software, that can address these critical issues. Unfortunately, this challenge is complicated by the fact that power and reliability are known to be intrinsically conflicting metrics: traditional solutions to improve reliability such as redundancy, increase of voltage levels, and up-sizing of critical devices do contrast with traditional low-power solutions, which rely on compact architectures, scaled supply voltages, and small devices. This dissertation focuses on methodologies to bridge this gap and establishes an important link between low-power solutions and aging effects. More specifically, we proposed new architectural solutions based on power management strategies to enable the design of low-power, aging aware cache memories. Cache memories are one of the most critical components for warranting reliable and timely operation. However, they are also more susceptible to aging effects. Due to symmetric structure of a memory cell, aging occurs regardless of the fact that a cell (or word) is accessed or not. Moreover, aging is a worst-case matric and line with worst-case access pattern determines the aging of the entire cache. In order to stop the aging of a memory cell, it must be put into a proper idle state when a cell (or word) is not accessed which require proper management of the idleness of each atomic unit of power management. We have proposed several reliability management techniques based on the idea of cache partitioning to alleviate NBTI-induced aging and obtain joint energy and lifetime benefits. We introduce graceful degradation mechanism which allows different cache blocks into which a cache is partitioned to age at different rates. This implies that various sub-blocks become unreliable at different times, and the cache keeps functioning with reduced efficiency. We extended the capabilities of this architecture by integrating the concept of reconfigurable caches to maintain the performance of the cache throughout its lifetime. By this strategy, whenever a block becomes unreliable, the remaining cache is reconfigured to work as a smaller size cache with only a marginal degradation of performance. Many mission-critical applications require guaranteed lifetime of their operations and therefore the hardware implementing their functionality. Such constraints are usually enforced by means of various reliability enhancing solutions mostly based on redundancy which are not energy-friendly. In our work, we have proposed a novel cache architecture in which a smart use of cache partitions for redundancy allows us to obtain cache that meet a desired lifetime target with minimal energy consumption

    Cache architectures based on heterogeneous technologies to deal with manufacturing errors

    Full text link
    [EN] SRAM technology has traditionally been used to implement processor caches since it is the fastest existing RAM technology.However,one of the major drawbacks of this technology is its high energy consumption.To reduce this energy consumption modern processors mainly use two complementary techniques: i)low-power operating modes and ii)low-power memory technologies.The first technique allows the processor working at low clock frequencies and supply voltages.The main limitation of this technique is that manufacturing defects can significantly affect the reliability of SRAM cells when working these modes.The second technique brings alternative technologies such as eDRAM, which provides minimum area and power consumption.The main drawback of this memory technology is that reads are destructive and eDRAM cells work slower than SRAM ones. This thesis presents three main contributions regarding low-power caches and heterogeneous technologies: i)an study that identifies the optimal capacitance of eDRAM cells, ii)a novel cache design that tolerates the faults produced by SRAM cells in low-power modes, iii)a methodology that allows obtain the optimal operating frequency/voltage level when working with low-power modes. Regarding the first contribution,in this work SRAM and eDRAM technologies are combined to achieve a low-power fast cache that requires smaller area than conventional designs and that tolerates SRAM failures.First,this dissertation focuses on one of the main critical aspects of the design of heterogeneous caches:eDRAM cell capacitance.In this dissertation the optimal capacitance for an heterogeneous L1 data cache is identified by analyzing the compromise between performance and energy consumption.Experimental results show that an heterogeneous cache implemented with 10fF capacitors offers similar performance as a conventional SRAM cache while providing 55% energy savings and reducing by 29% the cache area. Regarding the second contribution,this thesis proposes a novel organization for a fault-tolerant heterogeneous cache.Currently,reducing the supply voltage is a mechanism widely used to reduce consumption and applies when the system workload activity decreases.However,SRAM cells cause different types of failures when the supply voltage is reduced and thus they limit the minimum operating voltage of the microprocessor. In the proposal,memory cells implemented with eDRAM technology serve as backup in case of failure of SRAM cells, because the correct operation of eDRAM cells is not affected by reduced voltages. The proposed architecture has two working modes: high-performance mode for supply voltages that do not induce SRAM cell failures, and low-power mode for those voltages that cause SRAM cell failures. In high-performance mode, the cache provides full capacity, which enables the processor to achieve its maximum performance. In low-power mode, the effective capacity of the cache is reduced because some of the eDRAM cells are dedicated to recover from SRAM failures. Experimental results show that the performance is scarcely reduced (e.g. less than 2.7% across all the studied benchmarks) with respect to an ideal SRAM cache without failures. Finally,this thesis proposes a methodology to find the optimal frequency/voltage level regarding energy consumption for the designed heterogeneous cache. For this purpose, first SRAM failure types and their probabilities are characterized.Then,the energy consumption of different frequency/voltage levels is evaluated when the system works in low-power mode.The study shows that, mainly due to the impact of SRAM failures on performance,the optimal combination of voltage and frequency from the energy point of view does not always correspond to the minimum voltage.[ES] La tecnología SRAM se ha utilizado tradicionalmente para implementar las memorias cache debido a que es la tecnología de memoria RAM más rápida existente.Por contra,uno de los principales inconvenientes de esta tecnología es su elevado consumo energético.Para reducirlo los procesadores modernos suelen emplear dos técnicas complementarias:i) modos de funcionamiento de bajo consumo y ii)tecnologías de bajo consumo.La primeras técnica consiste en utilizar bajas frecuencias y voltajes de funcionamiento.La principal limitación de esta técnica es que los defectos de fabricación pueden afectar notablemente a la fiabilidad de las celdas SRAM en estos modos.La segunda técnica agrupa tecnologías alternativas como la eDRAM,que ofrece área y consumo mínimos.El inconveniente de esta tecnología es que las lecturas son destructivas y es más lenta que la SRAM. Esta tesis presenta tres contribuciones principales centradas en caches de bajo consumo y tecnologías heterogéneas: i)estudio de la capacitancia óptima de las celdas eDRAM, ii)diseño de una cache tolerante a fallos producidos en las celdas SRAM en modos de bajo consumo, iii)metodología para obtener la relación óptima entre voltaje y frecuencia en procesadores con modos de bajo consumo. Respecto a la primera contribución,en este trabajo se combinan las tecnologías SRAM y eDRAM para conseguir una memoria cache rápida, de bajo consumo, área reducida, y tolerante a los fallos inherentes a la tecnología SRAM.En primer lugar,esta disertación se centra en uno de los aspectos críticos de diseño de caches heterogéneas SRAM/eDRAM: la capacitancia de los condensadores implementados con tecnología eDRAM.En esta tesis se identifica la capacitancia óptima de una cache de datos L1 heterogénea mediante el estudio del compromiso entre prestaciones y consumo energético.Los resultados experimentales muestran que condensadores de 10fF ofrecen prestaciones similares a las de una cache SRAM convencional ahorrando un 55% de consumo y reduciendo un 29% el área ocupada por la cache. Respecto a la segunda contribución,esta tesis propone una organización de cache heterogénea tolerante a fallos.Actualmente,reducir el voltaje de alimentación es un mecanismo muy utilizado para reducir el consumo en condiciones de baja carga.Sin embargo,las celdas SRAM producen distintos tipos de fallos cuando se reduce el voltaje de alimentación y por tanto limitan el voltaje mínimo de funcionamiento del microprocesador. En la cache heterogénea propuesta,las celdas de memoria implementadas con tecnología eDRAM sirven de copia de seguridad en caso de fallo de las celdas SRAM, ya que el correcto funcionamiento de las celdas eDRAM no se ve afectado por tensiones reducidas.La arquitectura propuesta consta de dos modos de funcionamiento: high-performance mode para voltajes de alimentación que no inducen fallos en celdas implementadas en tecnología SRAM, y low-power mode para aquellos que sí lo hacen. En el modo high-performance mode,el procesador dispone de toda la capacidad de la cache.En el modo low-power mode se reduce la capacidad efectiva de la cache puesto que algunas de las celdas eDRAM se dedican a la recuperación de fallos de celdas SRAM.El estudio de prestaciones realizado muestra que éstas bajan hasta un máximo de 2.7% con respecto a una cache perfecta sin fallos. Finalmente, en esta tesis se propone una metodología para encontrar la relación óptima de voltaje/frecuencia con respecto al consumo energético sobre la cache heterogénea previamente diseñada. Para ello,primero se caracterizan los tipos de fallos SRAM y las probabilidades de fallo de los mismos.Después,se evalúa el consumo energético de diferentes combinaciones de voltaje/frecuencia cuando el sistema se encuentra en un modo de bajo consumo.El estudio muestra que la combinación óptima de voltaje y frecuencia desde el punto de vista energético no siempre corresponde al mínimo voltaje debido al imp[CA] La tecnologia SRAM s'ha utilitzat tradicionalment per a implementar les memòries cau degut a que és la tecnologia de memòria RAM més ràpida existent.Per contra, un dels principals inconvenients d'aquesta tecnologia és el seu elevat consum energètic.Per a reduir el consum els processadors moderns solen emprar dues tècniques complementàries: i)modes de funcionament de baix consum i ii)tecnologies de baix consum.La primera tècnica consisteix en utilitzar baixes freqüències i voltatges de funcionament.La principal limitació d'aquesta tècnica és que els defectes de fabricació poden afectar notablement a la fiabilitat de les cel·les SRAM en aquests modes.La segona tècnica agrupa tecnologies alternatives com la eDRAM, que ofereix àrea i consum mínims.L'inconvenient d'aquesta tecnologia és que les lectures són destructives i és més lenta que la SRAM. Aquesta tesi presenta tres contribucions principals centrades en caus de baix consum i tecnologies heterogènies: i)estudi de la capacitancia òptima de les cel·les eDRAM, ii)disseny d'una cau tolerant a fallades produïdes en les cel·les SRAM en modes de baix consum, iii)metodologia per a obtenir la relació òptima entre voltatge i freqüència en processadors amb modes de baix consum. Respecte a la primera contribució, en aquest treball es combinen les tecnologies SRAM i eDRAM per a aconseguir una memòria cau ràpida, de baix consum, àrea reduïda, i tolerant a les fallades inherents a la tecnologia SRAM.En primer lloc, aquesta dissertació se centra en un dels aspectes crítics de disseny de caus heterogènies: la capacitancia dels condensadors implementats amb tecnologia eDRAM.En aquesta dissertació s'identifica la capacitancia òptima d'una cache de dades L1 heterogènia mitjançant l'estudi del compromís entre prestacions i consum energètic.Els resultats experimentals mostren que condensadors de 10fF ofereixen prestacions similars a les d'una cau SRAM convencional estalviant un 55% de consum i reduint un 29% l'àrea ocupada per la cau. Respecte a la segona contribució, aquesta tesi proposa una organització de cau heterogènia tolerant a fallades.Actualment,reduir el voltatge d'alimentació és un mecanisme molt utilitzat per a reduir el consum en condicions de baixa càrrega.Per contra, les cel·les SRAM produeixen diferents tipus de fallades quan es redueix el voltatge d'alimentació i per tant limiten el voltatge mínim de funcionament del microprocessador. En la cau heterogènia proposta, les cel·les de memòria implementades amb tecnologia eDRAM serveixen de còpia de seguretat en cas de fallada de les cel·les SRAM, ja que el correcte funcionament de les cel·les eDRAM no es veu afectat per tensions reduïdes.L'arquitectura proposada consta de dues maneres de funcionament: high-performance mode per a voltatges d'alimentació que no indueixen fallades en cel·les implementades en tecnologia SRAM,i low-power mode per a aquells que sí ho fan.En el mode high-performance,el processador disposa de tota la capacitat de la cau.En el mode low-power es redueix la capacitat efectiva de la cau posat que algunes de les cel·les eDRAM es dediquen a la recuperació de fallades de cel·les SRAM.L'estudi de prestacions realitzat mostra que aquestes baixen fins a un màxim de 2.7% pel que fa a una cache perfecta sense fallades. Finalment,en aquesta tesi es proposa una metodologia per a trobar la relació òptima de voltatge/freqüència pel que fa al consum energètic sobre la cau heterogènia prèviament dissenyada.Per a açò,primer es caracteritzen els tipus de fallades SRAM i les probabilitats de fallada de les mateixes.Després,s'avalua el consum energètic de diferents combinacions de voltatge/freqüència quan el sistema es troba en un mode de baix consum.L'estudi mostra que la combinació òptima de voltatge i freqüència des del punt de vista energètic no sempre correspon al mínim voltatge degut a l'impacte de les fallades de SRAM en les presLorente Garcés, VJ. (2015). Cache architectures based on heterogeneous technologies to deal with manufacturing errors [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/58428TESI

    A survey of system level power management schemes in the dark-silicon era for many-core architectures

    Get PDF
    Power consumption in Complementary Metal Oxide Semiconductor (CMOS) technology has escalated to a point that only a fractional part of many-core chips can be powered-on at a time. Fortunately, this fraction can be increased at the expense of performance through the dark-silicon solution. However, with many-core integration set to be heading towards its thousands, power consumption and temperature increases per time, meaning the number of active nodes must be reduced drastically. Therefore, optimized techniques are demanded for continuous advancement in technology. Existing efforts try to overcome this challenge by activating nodes from different parts of the chip at the expense of communication latency. Other efforts on the other hand employ run-time power management techniques to manage the power performance of the cores trading-off performance for power. We found out that, for a significant amount of power to saved and high temperature to be avoided, focus should be on reducing the power consumption of all the on-chip components. Especially, the memory hierarchy and the interconnect. Power consumption can be minimized by, reducing the size of high leakage power dissipating elements, turning-off idle resources and integrating power saving materials

    Standby Supply Voltage Minimization for Reliable Nanoscale SRAMs

    Get PDF
    corecore