19 research outputs found

    Reliability-aware memory design using advanced reconfiguration mechanisms

    Get PDF
    Fast and Complex Data Memory systems has become a necessity in modern computational units in today's integrated circuits. These memory systems are integrated in form of large embedded memory for data manipulation and storage. This goal has been achieved by the aggressive scaling of transistor dimensions to few nanometer (nm) sizes, though; such a progress comes with a drawback, making it critical to obtain high yields of the chips. Process variability, due to manufacturing imperfections, along with temporal aging, mainly induced by higher electric fields and temperature, are two of the more significant threats that can no longer be ignored in nano-scale embedded memory circuits, and can have high impact on their robustness. Static Random Access Memory (SRAM) is one of the most used embedded memories; generally implemented with the smallest device dimensions and therefore its robustness can be highly important in nanometer domain design paradigm. Their reliable operation needs to be considered and achieved both in cell and also in architectural SRAM array design. Recently, and with the approach to near/below 10nm design generations, novel non-FET devices such as Memristors are attracting high attention as a possible candidate to replace the conventional memory technologies. In spite of their favorable characteristics such as being low power and highly scalable, they also suffer with reliability challenges, such as process variability and endurance degradation, which needs to be mitigated at device and architectural level. This thesis work tackles such problem of reliability concerns in memories by utilizing advanced reconfiguration techniques. In both SRAM arrays and Memristive crossbar memories novel reconfiguration strategies are considered and analyzed, which can extend the memory lifetime. These techniques include monitoring circuits to check the reliability status of the memory units, and architectural implementations in order to reconfigure the memory system to a more reliable configuration before a fail happens.Actualmente, el diseño de sistemas de memoria en circuitos integrados busca continuamente que sean más rápidos y complejos, lo cual se ha vuelto de gran necesidad para las unidades de computación modernas. Estos sistemas de memoria están integrados en forma de memoria embebida para una mejor manipulación de los datos y de su almacenamiento. Dicho objetivo ha sido conseguido gracias al agresivo escalado de las dimensiones del transistor, el cual está llegando a las dimensiones nanométricas. Ahora bien, tal progreso ha conllevado el inconveniente de una menor fiabilidad, dado que ha sido altamente difícil obtener elevados rendimientos de los chips. La variabilidad de proceso - debido a las imperfecciones de fabricación - junto con la degradación de los dispositivos - principalmente inducido por el elevado campo eléctrico y altas temperaturas - son dos de las más relevantes amenazas que no pueden ni deben ser ignoradas por más tiempo en los circuitos embebidos de memoria, echo que puede tener un elevado impacto en su robusteza final. Static Random Access Memory (SRAM) es una de las celdas de memoria más utilizadas en la actualidad. Generalmente, estas celdas son implementadas con las menores dimensiones de dispositivos, lo que conlleva que el estudio de su robusteza es de gran relevancia en el actual paradigma de diseño en el rango nanométrico. La fiabilidad de sus operaciones necesita ser considerada y conseguida tanto a nivel de celda de memoria como en el diseño de arquitecturas complejas basadas en celdas de memoria SRAM. Actualmente, con el diseño de sistemas basados en dispositivos de 10nm, dispositivos nuevos no-FET tales como los memristores están atrayendo una elevada atención como posibles candidatos para reemplazar las actuales tecnologías de memorias convencionales. A pesar de sus características favorables, tales como el bajo consumo como la alta escabilidad, ellos también padecen de relevantes retos de fiabilidad, como son la variabilidad de proceso y la degradación de la resistencia, la cual necesita ser mitigada tanto a nivel de dispositivo como a nivel arquitectural. Con todo esto, esta tesis doctoral afronta tales problemas de fiabilidad en memorias mediante la utilización de técnicas de reconfiguración avanzada. La consideración de nuevas estrategias de reconfiguración han resultado ser validas tanto para las memorias basadas en celdas SRAM como en `memristive crossbar¿, donde se ha observado una mejora significativa del tiempo de vida en ambos casos. Estas técnicas incluyen circuitos de monitorización para comprobar la fiabilidad de las unidades de memoria, y la implementación arquitectural con el objetivo de reconfigurar los sistemas de memoria hacia una configuración mucho más fiables antes de que el fallo suced

    Improving the Reliability of Microprocessors under BTI and TDDB Degradations

    Get PDF
    Reliability is a fundamental challenge for current and future microprocessors with advanced nanoscale technologies. With smaller gates, thinner dielectric and higher temperature microprocessors are vulnerable under aging mechanisms such as Bias Temperature Instability (BTI) and Temperature Dependent Dielectric Breakdown (TDDB). Under continuous stress both parametric and functional errors occur, resulting compromised microprocessor lifetime. In this thesis, based on the thorough study on BTI and TDDB mechanisms, solutions are proposed to mitigating the aging processes on memory based and random logic structures in modern out-of-order microprocessors. A large area of processor core is occupied by memory based structure that is vulnerable to BTI induced errors. The problem is exacerbated when PBTI degradation in NMOS is as severe as NBTI in PMOS in high-k metal gate technology. Hence a novel design is proposed to recover 4 internal gates within a SRAM cell simultaneously to mitigate both NBTI and PBTI effects. This technique is applied to both the L2 cache banks and the busy function units with storage cells in out-of-order pipeline in two different ways. For the L2 cache banks, redundant cache bank is added exclusively for proactive recovery rotation. For the critical and busy function units in out-of-order pipelines, idle cycles are exploited at per-buffer-entry level. Different from memory based structures, combinational logic structures such as function units in execution stage can not use low overhead redundancy to tolerate errors due to their irregular structure. A design framework that aims to improve the reliability of the vulnerable functional units of a processor core is designed and implemented. The approach is designing a generic function unit (GFU) that can be reconfigured to replace a particular functional unit (FU) while it is being recovered for improved lifetime. Although flexible, the GFU is slower than the original target FUs. So GFU is carefully designed so as to minimize the performance loss when it is in-use. More schemes are also designed to avoid using the GFU on performance critical paths of a program execution

    Device and Circuit Level EMI Induced Vulnerability: Modeling and Experiments

    Get PDF
    Electro-magnetic interference (EMI) commonly exists in electronic equipment containing semiconductor-based integrated circuits (ICs). Metal-oxide-semiconductor field-effect-transistors (MOSFETs) in the ICs may be disrupted under EMI conditions due to transient voltage-current surges, and their internal states may change undesirably. In this work, the vulnerabilities of silicon MOSFETs under EMI are studied at the device and the circuit levels, categorized as non-permanent upsets (``Soft Errors'') and permanent damages (``Hard Failures''). The Soft Errors, such as temporary bit errors and waveform distortions, may happen or be intensified under EMI, as the transient disruptions activate unwanted and highly non-linear changes inside MOSFETs, such as Impact Ionization and Snapback. The system may be corrected from the erroneous state when the EMI condition is removed. We simulate planar silicon n-type MOSFETs at the device level to study the physical mechanisms leading to or complicate the short-term, signal-level Soft Errors. We experimentally tested commercially available MOSFET devices. Not included in regular MOSFET models, exponential-like current increases as the terminal voltage increases are observed and explained using the device-level knowledge. We develop a compact Soft Error model, compatible with circuit simulators using lumped (or compact-model) components and closed-form expressions, such as SPICE, and calibrate it with our in-house experimental data using an in-house extraction technique based on the Genetic Algorithm. Example circuits are simulated using the extracted device model and under EMI-induced transient disruptions. The EMI voltage-current disruptions may also lead to permanent Hard Failures that cannot be repaired without replacement. One type of Hard Failures, the MOSFET gate dielectric (or ``oxide'') breakdown, can result in input-output relation changes and additional thermal runaway. We have fabricated individual MOSFET devices at the FabLab at the University of Maryland NanoCenter. We experimentally stress-test the fabricated devices and observe the rapid, permanent oxide breakdown. Then, we simulate a nano-scale FinFET device with ultra-thin gate oxide at the device level. Then, we apply the knowledge from our experiments to the simulated FinFET, producing a gate oxide breakdown Hard Failure circuit model. The proposed workflow enables the evaluation of EMI-induced vulnerabilities in circuit simulations before actual fabrication and experiments, which can help the early-stage prototyping process and reduce the development time

    Experimental Characterization of Random Telegraph Noise and Hot Carrier Aging of Nano-scale MOSFETs

    Get PDF
    One of the emerging challenges in the scaling of MOSFETs is the reliability of ultra-thin gate dielectrics. Various sources can cause device aging, such as hot carrier aging (HCA), negative bias temperature instability (NBTI), positive bias temperature instability (PBTI), and time dependent device breakdown (TDDB). Among them, hot carrier aging (HCA) has attracted much attention recently, because it is limiting the device lifetime. As the channel length of MOSFETs becomes smaller, the lateral electrical field increases and charge carriers become sufficiently energetic (“hot”) to cause damage to the device when they travel through the space charge region near the drain. Unlike aging that causes device parameters, such as threshold voltage, to drift in one direction, nano-scale devices also suffer from Random Telegraph Noise (RTN), where the current can fluctuate under fixed biases. RTN is caused by capturing/emitting charge carriers from/to the conduction channel. As the device sizes are reduced to the nano-meters, a single trap can cause substantial fluctuation in the current and threshold voltage. Although early works on HCA and RTN have improved the understanding, many issues remain unresolved and the aim of this project is to address these issues. The project is broadly divided into three parts: (i) an investigation on the HCA kinetics and how to predict HCA-induced device lifetime, (ii) a study of the interaction between HCA and RTN, and (iii) developing a new technique for directly measuring the RTN-induced jitter in the threshold voltage. To predict the device lifetime, a reliable aging kinetics is indispensable. Although early works show that HCA follows a power law, there are uncertainties in the extraction of the time exponent, making the prediction doubtful. A systematic experimental investigation was carried out in Chapter 4 and both the stress conditions and measurement parameters were carefully selected. It was found that the forward saturation current, commonly used in early work for monitoring HCA, leads to an overestimation of time exponents, because part of the damaged region is screened off by the space charges near the drain. Another source of errors comes from the inclusion of as-grown defects in the aging kinetics, which is not caused by aging. This leads to an underestimation of the time exponent. After correcting these errors, a reliable HCA kinetics is established and its predictive capability is demonstrated. There is confusion on how HCA and RTN interact and this is researched into in Chapter 5. The results show that for a device of average RTN, HCA only has a modest impact on RTN. RTN can either increase or decrease after HCA, depending on whether the local current under the RTN traps is rising or reducing. For a device of abnormally high RTN, RTN reduces substantially after HCA and the mechanism for this reduction is explored. The RTN-induced threshold voltage jitter, ∆Vth, is difficult to measure, as it is typically small and highly dynamic. Early works estimate this ∆Vth from the change in drain current and the accuracy of this estimation is not known. Chapter 6 focuses on developing a new ‘Trigger-When-Charged’ technique for directly measuring the RTN-induced ∆Vth. It will be shown that early works overestimate ∆Vth by a factor of two and the origin of this overestimation is investigated. This thesis consists of seven chapters. Chapter 1 introduces the project and its objectives. A literature review is given in Chapter 2. Chapter 3 covers the test facilities, measurement techniques, and devices used in this project. The main experimental results and analysis are given in Chapters 4-6, as described above. Finally, Chapter 7 concludes the project and discusses future works

    Exploiting heterogeneity in Chip-Multiprocessor Design

    Get PDF
    In the past decade, semiconductor manufacturers are persistent in building faster and smaller transistors in order to boost the processor performance as projected by Moore’s Law. Recently, as we enter the deep submicron regime, continuing the same processor development pace becomes an increasingly difficult issue due to constraints on power, temperature, and the scalability of transistors. To overcome these challenges, researchers propose several innovations at both architecture and device levels that are able to partially solve the problems. These diversities in processor architecture and manufacturing materials provide solutions to continuing Moore’s Law by effectively exploiting the heterogeneity, however, they also introduce a set of unprecedented challenges that have been rarely addressed in prior works. In this dissertation, we present a series of in-depth studies to comprehensively investigate the design and optimization of future multi-core and many-core platforms through exploiting heteroge-neities. First, we explore a large design space of heterogeneous chip multiprocessors by exploiting the architectural- and device-level heterogeneities, aiming to identify the optimal design patterns leading to attractive energy- and cost-efficiencies in the pre-silicon stage. After this high-level study, we pay specific attention to the architectural asymmetry, aiming at developing a heterogeneity-aware task scheduler to optimize the energy-efficiency on a given single-ISA heterogeneous multi-processor. An advanced statistical tool is employed to facilitate the algorithm development. In the third study, we shift our concentration to the device-level heterogeneity and propose to effectively leverage the advantages provided by different materials to solve the increasingly important reliability issue for future processors

    A Holistic Solution for Reliability of 3D Parallel Systems

    Full text link
    As device scaling slows down, emerging technologies such as 3D integration and carbon nanotube field-effect transistors are among the most promising solutions to increase device density and performance. These emerging technologies offer shorter interconnects, higher performance, and lower power. However, higher levels of operating temperatures and current densities project significantly higher failure rates. Moreover, due to the infancy of the manufacturing process, high variation, and defect densities, chip designers are not encouraged to consider these emerging technologies as a stand-alone replacement for Silicon-based transistors. The goal of this dissertation is to introduce new architectural and circuit techniques that can work around high-fault rates in the emerging 3D technologies, improving performance and reliability comparable to Silicon. We propose a new holistic approach to the reliability problem that addresses the necessary aspects of an effective solution such as detection, diagnosis, repair, and prevention synergically for a practical solution. By leveraging 3D fabric layouts, it proposes the underlying architecture to efficiently repair the system in the presence of faults. This thesis presents a fault detection scheme by re-executing instructions on idle identical units that distinguishes between transient and permanent faults while localizing it to the granularity of a pipeline stage. Furthermore, with the use of a dynamic and adaptive reconfiguration policy based on activity factors and temperature variation, we propose a framework that delivers a significant improvement in lifetime management to prevent faults due to aging. Finally, a design framework that can be used for large-scale chip production while mitigating yield and variation failures to bring up Carbon Nano Tube-based technology is presented. The proposed framework is capable of efficiently supporting high-variation technologies by providing protection against manufacturing defects at different granularities: module and pipeline-stage levels.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/168118/1/javadb_1.pd

    Cross-Layer Approaches for an Aging-Aware Design of Nanoscale Microprocessors

    Get PDF
    Thanks to aggressive scaling of transistor dimensions, computers have revolutionized our life. However, the increasing unreliability of devices fabricated in nanoscale technologies emerged as a major threat for the future success of computers. In particular, accelerated transistor aging is of great importance, as it reduces the lifetime of digital systems. This thesis addresses this challenge by proposing new methods to model, analyze and mitigate aging at microarchitecture-level and above

    AI/ML Algorithms and Applications in VLSI Design and Technology

    Full text link
    An evident challenge ahead for the integrated circuit (IC) industry in the nanometer regime is the investigation and development of methods that can reduce the design complexity ensuing from growing process variations and curtail the turnaround time of chip manufacturing. Conventional methodologies employed for such tasks are largely manual; thus, time-consuming and resource-intensive. In contrast, the unique learning strategies of artificial intelligence (AI) provide numerous exciting automated approaches for handling complex and data-intensive tasks in very-large-scale integration (VLSI) design and testing. Employing AI and machine learning (ML) algorithms in VLSI design and manufacturing reduces the time and effort for understanding and processing the data within and across different abstraction levels via automated learning algorithms. It, in turn, improves the IC yield and reduces the manufacturing turnaround time. This paper thoroughly reviews the AI/ML automated approaches introduced in the past towards VLSI design and manufacturing. Moreover, we discuss the scope of AI/ML applications in the future at various abstraction levels to revolutionize the field of VLSI design, aiming for high-speed, highly intelligent, and efficient implementations
    corecore