Unreliable Silicon: Circuit through System-Level Techniques for Mitigating the Adverse Effects of Process Variation, Device Degradation and Environmental Conditions.
Designing and manufacturing integrated circuits in advanced, highly scaled process technologies that meet stringent specification sets is an increasingly unreliable proposition. Dimensional process variations, time- and stress-dependent device degradation, and potentially varying environmental conditions exacerbate deviations in the performance, power, and even functionality of integrated circuits. This work explores a system-level adaptive design philosophy intended to mitigate the power and performance impact of unreliable silicon devices and presents enabling circuits for SRAM variation mitigation and in-situ measurement of device degradation in 130nm and 45nm process technologies. An adaptation of RAZOR-based DVS designed for on-chip memory power reduction and reliability lifetime improvement enables the elimination of 250 mV of voltage margin in a 1.8V design, with up to 500 mV of reduction when allowing 5% of memory operations to use multiple cycles. A novel PID-controlled dynamic reliability management (DRM) system is presented, allowing a user-specified circuit lifetime to be dynamically managed via dynamic voltage and frequency scaling. A peak performance improvement of 20-35% is achievable in typical processing systems by allowing brief periods of elevated-voltage operation through the real-time DRM system, while minimizing voltage during non-critical periods of operation to maximize circuit lifetime. A probabilistic analysis of oxide breakdown using the percolation model indicates the need for 1000-2000 integrated in-situ sensors to achieve oxide lifetime prediction error at or under 10%. The conclusions from the oxide analysis are used to guide the design of a series of novel on-chip reliability monitoring circuits for use in a real-time DRM system. A 130nm in-situ oxide breakdown measurement sensor presented here is the first published design of an oxide-breakdown-oriented circuit and is compatible with the standard-cell "place and route" design styles used in the majority of application-specific integrated circuit designs. Measured results show increases in gate oxide leakage of 14-35% after accelerated stress testing. A second-generation design of the on-chip oxide degradation sensor is presented that reduces stress-mode power consumption by 111,785X over the initial design while providing an ideal 1:1 mapping of gate leakage to output frequency in extracted simulations.
Ph.D. Electrical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/60701/1/ekarl_1.pd
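As an illustration of the PID-controlled DRM concept, the sketch below shows a minimal control loop that trims supply voltage so that the projected circuit lifetime tracks a user-specified target. It is not the thesis implementation: the wear model, controller gains, voltage range, and control interval are all invented for the example.

```python
# Minimal sketch of a PID-controlled dynamic reliability management (DRM)
# loop: the controller trims supply voltage so that projected circuit
# lifetime tracks a user-specified target. The wear model, gains, and
# voltage bounds below are illustrative assumptions, not measured values.

TARGET_LIFETIME_YEARS = 7.0
V_NOM, V_MIN, V_MAX = 1.0, 0.8, 1.2    # volts (assumed operating range)
KP, KI, KD = 0.05, 0.01, 0.02          # hand-picked example gains

def wear_rate(v):
    """Assumed voltage-accelerated wear: relative aging rate at voltage v."""
    return (v / V_NOM) ** 4            # placeholder acceleration exponent

def projected_lifetime(consumed, elapsed_years):
    """Extrapolate total life from the fraction of wear budget consumed."""
    return elapsed_years / consumed if consumed > 0 else float("inf")

v, consumed, integral, prev_err = V_NOM, 0.0, 0.0, 0.0
dt = 1.0 / 52.0                        # one control step per week
for step in range(1, 52 * 10):
    consumed += wear_rate(v) * dt / TARGET_LIFETIME_YEARS
    err = projected_lifetime(consumed, step * dt) - TARGET_LIFETIME_YEARS
    integral += err * dt
    deriv = (err - prev_err) / dt
    prev_err = err
    # Positive error -> lifetime headroom -> allow a brief voltage boost
    # for peak performance; negative error -> back off to extend life.
    v = min(V_MAX, max(V_MIN, V_NOM + KP * err + KI * integral + KD * deriv))
```

The point of the structure is the one made in the abstract: headroom in the lifetime budget is spent as brief elevated-voltage intervals, and the controller pulls the voltage back down during non-critical periods.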
Solid State Circuits Technologies
The evolution of solid-state circuit technology has a long history within a relatively short period of time. This technology has led to the modern information society that connects us and our tools, to a large market, and to many types of products and applications. Solid-state circuit technology continuously evolves via breakthroughs and improvements every year. This book is devoted to reviewing and presenting novel approaches to some of the main issues involved in this exciting and vigorous technology. The book is composed of 22 chapters, written by authors from 30 different institutions located in 12 different countries throughout the Americas, Asia and Europe, reflecting the wide international contribution to the book. The broad range of subjects presented offers a general overview of the main issues in modern solid-state circuit technology, while also offering in-depth analysis of specific subjects for specialists. We believe the book is of great scientific and educational value for many readers. I am profoundly indebted to all of those involved in the work. First and foremost, I would like to acknowledge and thank the authors, who worked hard and generously agreed to share their results and knowledge. Second, I would like to express my gratitude to the Intech team, which invited me to edit the book and gave me its full support and a fruitful experience while working together to complete it.
Techniques for Aging, Soft Errors and Temperature to Increase the Reliability of Embedded On-Chip Systems
This thesis investigates the challenge of providing an abstracted, yet sufficiently accurate, reliability estimation for embedded on-chip systems. In addition, it proposes new techniques to protect the register files within processors against aging effects and soft errors. It also introduces a novel thermal measurement setup that clearly captures infrared images of modern multi-core processors.
Reliability-aware memory design using advanced reconfiguration mechanisms
Fast and complex data memory systems have become a necessity in the computational units of today's integrated circuits. These memory systems are integrated in the form of large embedded memories for data manipulation and storage. This has been achieved by the aggressive scaling of transistor dimensions to sizes of a few nanometers (nm); such progress, however, comes with a drawback: it is increasingly difficult to obtain high chip yields. Process variability, due to manufacturing imperfections, along with temporal aging, mainly induced by higher electric fields and temperatures, are two of the more significant threats that can no longer be ignored in nano-scale embedded memory circuits and can have a high impact on their robustness.
Static Random Access Memory (SRAM) is one of the most widely used embedded memories. It is generally implemented with the smallest device dimensions, so its robustness is highly important in the nanometer design paradigm. Reliable operation must be considered and achieved both at the cell level and in the architectural design of SRAM arrays.
Recently, with the approach of near/below-10nm design generations, novel non-FET devices such as memristors are attracting considerable attention as possible candidates to replace conventional memory technologies. In spite of favorable characteristics such as low power and high scalability, they also suffer from reliability challenges, such as process variability and endurance degradation, which need to be mitigated at the device and architectural levels.
This thesis tackles such reliability concerns in memories by utilizing advanced reconfiguration techniques. For both SRAM arrays and memristive crossbar memories, novel reconfiguration strategies are considered and analyzed that can extend the memory lifetime. These techniques include monitoring circuits to check the reliability status of the memory units, and architectural implementations that reconfigure the memory system into a more reliable configuration before a failure happens.
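The monitor-and-reconfigure strategy can be pictured with a small sketch: periodically poll a per-block reliability monitor and remap any degrading block onto a spare before it fails. This is a minimal illustration under assumed thresholds and a made-up remapping table, not the architecture evaluated in the thesis.

```python
# Sketch of a monitor-and-reconfigure policy: poll a per-block degradation
# monitor and remap any block that crosses a warning threshold onto a spare
# *before* it fails. Monitor readings, the threshold, and the remapping
# table are illustrative assumptions.

WARN_LEVEL = 0.8                       # assumed normalized degradation threshold

class MemoryArray:
    def __init__(self, n_blocks, n_spares):
        self.block_map = list(range(n_blocks))   # logical -> physical block
        self.spares = list(range(n_blocks, n_blocks + n_spares))
        self.degradation = [0.0] * (n_blocks + n_spares)

    def monitor(self, phys):
        """Stand-in for an on-chip reliability monitor readout."""
        return self.degradation[phys]

    def reconfigure(self):
        for logical, phys in enumerate(self.block_map):
            if self.monitor(phys) >= WARN_LEVEL and self.spares:
                # A real design would also migrate the block's contents;
                # here we only update the logical-to-physical mapping.
                self.block_map[logical] = self.spares.pop(0)

mem = MemoryArray(n_blocks=8, n_spares=2)
mem.degradation[3] = 0.9               # pretend block 3 is aging fast
mem.reconfigure()
assert mem.block_map[3] == 8           # logical block 3 now uses a spare
```

The essential property is that reconfiguration is proactive: the remap happens on a warning threshold, not on an observed failure.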
Dependable Embedded Systems
This Open Access book introduces readers to many new techniques for enhancing and optimizing reliability in embedded systems that have emerged particularly within the last five years. It introduces the most prominent reliability concerns from today's point of view and briefly recapitulates the progress in the community so far. Unlike books that focus on a single abstraction level, such as the circuit level or the system level alone, this book deals with reliability challenges across different levels, starting from the physical level all the way up to the system level (cross-layer approaches). The book aims to demonstrate how new hardware/software co-design solutions can effectively mitigate reliability degradation such as transistor aging, process variation, temperature effects, soft errors, etc. It provides readers with the latest insights into novel, cross-layer methods and models with respect to the dependability of embedded systems; describes cross-layer approaches that can leverage reliability through techniques that are proactively designed with respect to techniques at other layers; and explains run-time adaptation and concepts/means of self-organization, in order to achieve error resiliency in complex, future many-core systems.
A Performance-Efficient and Practical Processor Error Recovery Framework
Continued reduction in the size of the transistor has affected the reliability of the processors built from them. This is primarily due to factors such as inaccuracies during manufacturing, as well as non-ideal operating conditions, causing transistors to slow down consistently and eventually leading to permanent breakdown and erroneous operation of the processor. Permanent transistor breakdowns, or faults, can occur at any point in the processor's lifetime. Errors are the discrepancies in the output of faulty circuits. This dissertation shows that components containing faults can continue operating if the errors they cause are within certain bounds. Further, the lifetime of a processor can be increased by adding supportive structures that start working once the processor develops these hard errors.
This dissertation has three major contributions, namely REPAIR, FaultSim and PreFix. REPAIR is a fault-tolerant system requiring minimal changes to the processor design. It uses an external Instruction Re-execution Unit (IRU) to perform operations that the faulty processor might have erroneously executed. Instructions that are found to use faulty hardware are re-executed on the IRU. REPAIR shows that the performance overhead of such targeted re-execution is low for a limited number of faults.
FaultSim is a fast fault simulator capable of simulating large circuits at the transistor level. It was developed in this dissertation to understand the effect of faults on different circuits. It performs digital-logic-based simulations, trading off analogue accuracy for speed, while still being able to support most fault models. A 32-bit addition takes under 15 microseconds while simulating more than 1500 transistors. FaultSim can also be integrated into an architectural simulator, which added a performance overhead of 10 to 26 percent to a simulation. The results obtained show that single faults cause an error in an adder in less than 10 percent of the inputs.
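The kind of experiment behind that adder statistic can be illustrated with a much simpler sketch. FaultSim works at the transistor level; the fragment below instead injects a gate-level stuck-at fault into a small ripple-carry adder and exhaustively counts erroneous sums, purely to show the shape of the measurement. The 4-bit width, the fault site, and the stuck-at model are assumptions for the example.

```python
# Gate-level sketch of single-fault error-rate measurement: inject one
# stuck-at fault into a ripple-carry adder and count the inputs whose sum
# comes out wrong. (FaultSim itself simulates at the transistor level.)
from itertools import product

WIDTH = 4

def full_adder(a, b, cin, fault=None, idx=None, bit=None):
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    if fault is not None and idx == bit:
        s = fault                      # stuck-at fault on this sum node
    return s, cout

def add(x, y, fault=None, bit=None):
    carry, result = 0, 0
    for i in range(WIDTH):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry,
                              fault=fault, idx=i, bit=bit)
        result |= s << i
    return result | (carry << WIDTH)

# Exhaustively compare faulty vs. fault-free sums for one stuck-at-0 fault.
errors = sum(add(x, y) != add(x, y, fault=0, bit=2)
             for x, y in product(range(2 ** WIDTH), repeat=2))
total = (2 ** WIDTH) ** 2
print(f"{errors}/{total} inputs mis-add ({100 * errors / total:.1f}%)")
```

Note that a stuck-at fault on a primary sum node is far more visible than a typical transistor-level defect, so the percentage this toy reports will be much higher than the sub-10% figure from the dissertation; the sketch only illustrates the counting methodology.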
PreFix brings together the fault models created using FaultSim and the design directions found using REPAIR. PreFix performs re-execution of instructions on a remote core, which picks up instructions to execute from a global instruction buffer. Error prediction and detection are used to reduce the number of re-executed instructions. PreFix has an area overhead of 3.5 percent in the setup used, and its performance overhead is within 5 percent of the fault-free case. This dissertation shows that faults in processors can be tolerated without explicitly switching off any component, and that minimal redundancy is sufficient to achieve this.
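A minimal sketch of the targeted re-execution idea shared by REPAIR and PreFix follows; the unit names, instruction format, and simulated fault are all assumptions for illustration, not the dissertation's microarchitecture.

```python
# Sketch of targeted re-execution: instructions are checked against a set
# of known-faulty functional units, and only those that touched faulty
# hardware are replayed on a separate known-good checker.
import operator

FAULTY_UNITS = {"alu1"}                # assume diagnosis has flagged alu1

def execute(instr, unit):
    """Primary execution; fault behaviour is simulated by corrupting the result."""
    result = instr["op"](*instr["args"])
    return result ^ 1 if unit in FAULTY_UNITS else result

def reexecute(instr):
    """Replay on a known-good checker (the IRU in REPAIR's terms, a
    remote core in PreFix's)."""
    return instr["op"](*instr["args"])

def issue(instr, unit):
    result = execute(instr, unit)
    # Replay only instructions routed through faulty hardware, keeping the
    # re-execution overhead proportional to the number of faults.
    return reexecute(instr) if unit in FAULTY_UNITS else result

inst = {"op": operator.add, "args": (40, 2)}
assert issue(inst, "alu0") == 42       # healthy unit: result accepted as-is
assert issue(inst, "alu1") == 42       # faulty unit: replayed and corrected
```

PreFix's error prediction would sit in front of the replay decision, filtering out instructions unlikely to be affected so that even fewer re-executions are needed.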
Energy-Efficient and Reliable Computing in Dark Silicon Era
Dark silicon denotes the phenomenon that, due to thermal and power constraints, the fraction of transistors that can operate at full frequency is decreasing with each technology generation. Moore's law and Dennard scaling were coupled appropriately for five decades to deliver commensurate exponential performance, first via single-core and later via multi-core designs. However, recalculating Dennard scaling for recent small technology nodes shows that the ongoing multi-core growth demands exponentially increasing thermal design power to achieve a linear performance increase. This process hits a power wall, which raises the amount of dark or dim silicon on future multi/many-core chips more and more. Furthermore, the increasing number of transistors on a single chip, their susceptibility to internal defects, and aging phenomena, which are exacerbated by high chip thermal density, make monitoring and managing chip reliability before and after activation a necessity. The approaches and experimental investigations proposed in this thesis focus on two main tracks: 1) power awareness and 2) reliability awareness in the dark silicon era; these two tracks are then combined. In the first track, the main goal is to increase the returns in terms of the most important features in chip design, such as performance and throughput, while the maximum power limit is honored. In fact, we show that by managing power while having dark silicon, all the traditional benefits of proceeding along Moore's law can still be achieved in the dark silicon era, albeit to a lesser degree. Via the track of reliability awareness in the dark silicon era, we show that dark silicon can be considered an opportunity to be exploited for different kinds of benefits, namely lifetime increase and online testing. We discuss how dark silicon can be exploited to guarantee that the system lifetime stays above a certain target value and, furthermore, how it can be exploited to apply low-cost, non-intrusive online testing to the cores. After the demonstration of power and reliability awareness under dark silicon, two approaches are discussed as case studies in which power and reliability awareness are combined. The first approach demonstrates how chip reliability can be used as a supplementary metric for power-reliability management, while the second provides a trade-off between workload performance and system reliability by simultaneously honoring the given power budget and a target reliability.
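The dark-silicon budgeting problem in the first track can be made concrete with a toy enumeration: under a fixed chip power budget, pick the mix of full-speed, dim, and dark cores that maximizes throughput. All power and performance numbers below are illustrative assumptions, not measurements from the thesis.

```python
# Toy illustration of the dark-silicon trade-off: under a fixed chip power
# budget, choose how many cores may run at full frequency and how many must
# stay dim or dark. The per-core power/performance figures are assumed.

POWER_BUDGET_W = 60.0
N_CORES = 16
P_FULL, P_DIM = 6.0, 2.5               # watts per core (assumed)
PERF_FULL, PERF_DIM = 1.0, 0.55        # relative throughput per core (assumed)

def best_mix():
    """Enumerate (full, dim) core mixes and keep the highest-throughput
    one that honours the power budget; the remaining cores stay dark."""
    best = (0, 0, 0.0)
    for full in range(N_CORES + 1):
        for dim in range(N_CORES - full + 1):
            power = full * P_FULL + dim * P_DIM
            perf = full * PERF_FULL + dim * PERF_DIM
            if power <= POWER_BUDGET_W and perf > best[2]:
                best = (full, dim, perf)
    return best

full, dim, perf = best_mix()
print(f"{full} full-speed cores, {dim} dim cores, "
      f"{N_CORES - full - dim} dark cores -> throughput {perf:.2f}")
```

With these assumed numbers, dim cores are more energy-efficient per unit of throughput than full-speed cores, so the budget-honouring optimum mixes a few fast cores with many dim ones rather than running everything flat out; that is the linear-performance-versus-exponential-power tension the abstract describes.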
FCC-hh: The Hadron Collider: Future Circular Collider Conceptual Design Report Volume 3
In response to the 2013 Update of the European Strategy for Particle Physics (EPPSU), the Future Circular Collider (FCC) study was launched as a world-wide international collaboration hosted by CERN. The FCC study covered an energy-frontier hadron collider (FCC-hh), a highest-luminosity high-energy lepton collider (FCC-ee), the corresponding 100 km tunnel infrastructure, the physics opportunities of these two colliders, and a high-energy LHC based on FCC-hh technology. This document constitutes the third volume of the FCC Conceptual Design Report, devoted to the hadron collider FCC-hh. It summarizes the FCC-hh physics discovery opportunities; presents the FCC-hh accelerator design, performance reach, and staged operation plan; discusses the underlying technologies, the civil engineering and technical infrastructure; and sketches a possible implementation. Combining ingredients from the Large Hadron Collider (LHC) and the high-luminosity LHC upgrade, and adding novel technologies and approaches, the FCC-hh design aims at significantly extending the energy frontier to 100 TeV. Its unprecedented centre-of-mass collision energy will make the FCC-hh a unique instrument to explore physics beyond the Standard Model, offering great direct sensitivity to new physics and discoveries.
- …