Degradation in FPGAs: Monitoring, Modeling and Mitigation
This dissertation targets transistor aging degradation and the associated thermal challenges in FPGAs, since aging depends exponentially on chip temperature. The main objectives are to perform experimentation, analysis and device-level model abstraction for modeling degradation in FPGAs, to monitor the FPGA in order to keep track of aging rates and, ultimately, to propose an aging-aware FPGA design flow that mitigates aging.
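The exponential aging-temperature relation referred to above is typically captured by an Arrhenius-type acceleration factor. The following minimal sketch assumes an Arrhenius model with a placeholder activation energy; the values are technology-dependent illustrations, not parameters from this dissertation:

    import math

    K_B = 8.617e-5  # Boltzmann constant, eV/K

    def aging_acceleration(temp_c, ref_temp_c=55.0, e_a=0.6):
        """Relative aging rate at temp_c versus ref_temp_c (Arrhenius model).

        e_a is an assumed activation energy in eV; real designs extract it
        from accelerated-stress measurements.
        """
        t, t_ref = temp_c + 273.15, ref_temp_c + 273.15
        return math.exp((e_a / K_B) * (1.0 / t_ref - 1.0 / t))

    print(aging_acceleration(85.0))  # a die 30 K hotter ages roughly 6x faster here

Under these assumed parameters, a modest rise in operating temperature multiplies the aging rate several-fold, which is why thermal monitoring is integral to aging mitigation.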
Techniques for Aging, Soft Errors and Temperature to Increase the Reliability of Embedded On-Chip Systems
This thesis investigates the challenge of providing an abstracted, yet sufficiently accurate, reliability estimation for embedded on-chip systems. In addition, it proposes new techniques to increase the reliability of register files within processors against aging effects and soft errors. It also introduces a novel thermal measurement setup that clearly captures infrared images of modern multi-core processors.
Multi-criteria optimization for energy-efficient multi-core systems-on-chip
The steady down-scaling of transistor dimensions has made possible the evolutionary progress leading to today's high-performance multi-GHz microprocessors and core-based Systems-on-Chip (SoCs) that offer superior performance, dramatically reduced cost per function, and much-reduced physical size compared to their predecessors. On the negative side, this rapid scaling also translates into high power densities, higher operating temperatures and reduced reliability, making it imperative to address the design issues that have cropped up in its wake. In particular, aggressive physical miniaturization has increased CMOS fault sensitivity to the extent that many reliability constraints pose a threat to normal device operation and accelerate the onset of wearout-based failures. Among the various wearout-based failure mechanisms, Negative Bias Temperature Instability (NBTI) has been recognized as the most critical source of device aging.
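For context, the long-term NBTI-induced threshold-voltage shift is often written as a temperature-activated power law (a generic reaction-diffusion-style form, not necessarily the exact model adopted in this dissertation):

\[ \Delta V_{th}(t) \;\propto\; \exp\!\left(-\frac{E_a}{kT}\right) \, t^{\,n}, \qquad n \approx 1/6, \]

where E_a is an activation energy, T the temperature and t the stress time; the exponential temperature term is what couples aging so tightly to thermal behaviour.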
The need for reliable, low-power circuits is driving the EDA community to develop new design techniques, circuit solutions, algorithms and software that can address these critical issues. Unfortunately, this challenge is complicated by the fact that power and reliability are intrinsically conflicting metrics: traditional solutions to improve reliability, such as redundancy, increased voltage levels and up-sizing of critical devices, conflict with traditional low-power solutions, which rely on compact architectures, scaled supply voltages and small devices.
This dissertation focuses on methodologies to bridge this gap and establishes an important link between low-power solutions and aging effects. More specifically, we propose new architectural solutions based on power management strategies to enable the design of low-power, aging-aware cache memories.
Cache memories are one of the most critical components for guaranteeing reliable and timely operation, yet they are also highly susceptible to aging effects. Due to the symmetric structure of a memory cell, aging occurs regardless of whether a cell (or word) is accessed. Moreover, aging is a worst-case metric: the line with the worst-case access pattern determines the aging of the entire cache. To stop the aging of a memory cell, it must be put into a proper idle state whenever it is not accessed, which requires proper management of the idleness of each atomic unit of power management.
We have proposed several reliability management techniques based on the idea of cache partitioning to alleviate NBTI-induced aging and obtain joint energy and lifetime benefits. We introduce a graceful degradation mechanism that allows the blocks into which a cache is partitioned to age at different rates. As a result, the various sub-blocks become unreliable at different times, and the cache keeps functioning with reduced efficiency. We extended the capabilities of this architecture by integrating the concept of reconfigurable caches to maintain the performance of the cache throughout its lifetime: whenever a block becomes unreliable, the remaining cache is reconfigured to work as a smaller cache with only marginal performance degradation.
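As a rough illustration of the graceful-degradation idea, the sketch below models a cache as independently aging partitions that are retired once a stress budget is exceeded; the partition count, stress accounting and threshold are illustrative assumptions rather than the dissertation's actual controller:

    class PartitionedCache:
        """Toy model of a cache whose partitions age and retire independently."""

        def __init__(self, n_partitions=8, age_limit=1.0):
            self.ages = [0.0] * n_partitions    # accumulated NBTI stress
            self.alive = [True] * n_partitions
            self.age_limit = age_limit

        def apply_stress(self, stress_per_partition):
            """Accumulate stress; retire partitions that exceed the budget."""
            for i, stress in enumerate(stress_per_partition):
                if self.alive[i]:
                    self.ages[i] += stress
                    if self.ages[i] >= self.age_limit:
                        # graceful degradation: keep working as a smaller cache
                        self.alive[i] = False

        def effective_capacity(self):
            """Fraction of the original capacity that is still reliable."""
            return sum(self.alive) / len(self.alive)

Steering traffic so that stress accumulates unevenly across partitions is precisely what makes them fail at different times instead of all at once, letting capacity degrade gradually.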
Many mission-critical applications require a guaranteed lifetime for their operation and, therefore, for the hardware implementing their functionality. Such constraints are usually enforced by means of various reliability-enhancing solutions, mostly based on redundancy, which are not energy-friendly. In our work, we have proposed a novel cache architecture in which a smart use of cache partitions for redundancy yields caches that meet a desired lifetime target with minimal energy consumption.
Thermal Management for Dependable On-Chip Systems
This thesis addresses the dependability issues of on-chip systems from a thermal perspective. This includes an explanation and analysis of models that show the relationship between dependability and temperature. Additionally, multiple novel methods for on-chip thermal management are introduced, aiming to optimize thermal properties. The methods are analyzed through simulation and through infrared thermal camera measurements.
Solid State Circuits Technologies
The evolution of solid-state circuit technology has a long history within a relatively short period of time. This technology has led to the modern information society that connects us and our tools, to a large market, and to many types of products and applications. Solid-state circuit technology continuously evolves via breakthroughs and improvements every year. This book is devoted to reviewing and presenting novel approaches for some of the main issues involved in this exciting and vigorous technology. The book is composed of 22 chapters, written by authors from 30 institutions located in 12 countries throughout the Americas, Asia and Europe, thus reflecting the wide international contribution to the book. The broad range of subjects presented offers a general overview of the main issues in modern solid-state circuit technology, while also offering in-depth analysis of specific subjects for specialists. We believe the book is of great scientific and educational value for many readers. I am profoundly indebted to all of those involved in this work. First and foremost, I would like to acknowledge and thank the authors, who worked hard and generously agreed to share their results and knowledge. Second, I would like to express my gratitude to the Intech team that invited me to edit the book and gave me their full support and a fruitful experience while working together to compile it.
Design for prognostics and security in field programmable gate arrays (FPGAs).
There is an evolutionary progression of Field Programmable Gate Arrays (FPGAs) toward more complex, high-power-density architectures such as Systems-on-Chip (SoC) and Adaptive Compute Acceleration Platforms (ACAP). Primarily, this is attributable to continual transistor miniaturisation and ever more innovative and efficient IC manufacturing processes. Concurrently, the degradation mechanism of Bias Temperature Instability (BTI) has become more pronounced in its ageing impact. It can weaken the reliability of VLSI devices, FPGAs in particular, due to their run-time reconfigurability. At the same time, the vulnerability of FPGAs to device-level attacks in the growing cyber and hardware threat environment is multiplying, as the weakened reliability realm opens the door for rogue elements to intervene. The insertion of highly stealthy and malicious circuitry, called hardware Trojans, into FPGAs is one such malicious intervention. While such attacks adversely affect the security of these devices, they also substantially undermine their reliability. Hitherto, security and reliability have been treated as two separate entities impacting FPGA health. This has resulted in fragmented solutions that do not reflect the true state of the FPGA's operational and functional readiness, thereby making these devices even more prone to hardware attacks; the recent Spectre and Meltdown vulnerabilities are key examples. This research addresses these concerns by adopting an integrated approach and investigating FPGA security and reliability as two inter-dependent entities, with an additional dimension of health estimation/prognostics. The design and implementation of a small-footprint frequency and threshold-voltage-shift detection sensor, a novel hardware Trojan, and an online transistor dynamic scaling circuitry present a viable FPGA security scheme that helps build a strong microarchitectural-level defence against unscrupulous hardware attacks. Augmented with an efficient kernel-based learning technique for FPGA health estimation/prognostics, the optimal integrated solution proves to be more dependable and trustworthy than the prevalent disjointed approach.
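The kernel-based health estimation mentioned above can be pictured as a generic kernel ridge regressor mapping on-chip sensor readings to an estimated degradation level. This is a minimal numpy sketch under assumed data and hyperparameters, not the specific learning formulation of this thesis:

    import numpy as np

    def rbf_kernel(a, b, gamma=0.5):
        """Gaussian (RBF) kernel matrix between sample sets a and b."""
        d2 = (a[:, None, :] - b[None, :, :]) ** 2
        return np.exp(-gamma * d2.sum(-1))

    # Assumed training data: x = on-chip sensor reading (% frequency shift),
    # y = degradation label (% path-delay increase). Purely illustrative.
    X = np.array([[0.1], [0.4], [0.9], [1.5]])
    y = np.array([0.5, 1.2, 2.8, 4.9])

    lam = 1e-3                                    # ridge regularisation weight
    K = rbf_kernel(X, X)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

    def predict(x_new):
        """Health estimate for a new sensor reading."""
        return rbf_kernel(np.atleast_2d(x_new), X) @ alpha

    print(predict([0.7]))   # interpolated degradation estimate

The appeal of such kernel methods for prognostics is that they fit small, nonlinear sensor-to-degradation datasets without requiring an explicit physical model.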
Reliability in the face of variability in nanometer embedded memories
In this thesis, we have investigated the impact of parametric variations on the behaviour of one performance-critical processor structure: embedded memories. As variations manifest as a spread in power and performance, as a first step we propose a novel modeling methodology that helps evaluate the impact of circuit-level optimizations on architecture-level design choices. Choices made at the design stage ensure that conflicting requirements from higher levels are decoupled. We then complement such design-time optimizations with a runtime mechanism that takes advantage of adaptive body-biasing to lower power whilst improving performance in the presence of variability. Our proposal uses novel fully-digital variation-tracking hardware based on embedded DRAM (eDRAM) cells to monitor run-time changes in cache latency and leakage. A special fine-grain body-bias generator uses the measurements to generate the optimal body bias needed to meet the required yield targets. A novel variation-tolerant and soft-error-hardened eDRAM cell is also proposed as an alternative candidate for replacing existing SRAM-based designs in latency-critical memory structures. In the ultra-low-power domain, where reliable operation is limited by the minimum voltage of operation (Vddmin), we analyse the impact of failures on cache functional margin and functional yield. Towards this end, we have developed a fully automated tool (INFORMER) capable of estimating memory-wide metrics such as power, performance and yield accurately and rapidly. Using the developed tool, we then evaluate the effectiveness of a new class of hybrid techniques in improving cache yield through failure prevention and correction. Having a holistic perspective of memory-wide metrics helps us arrive at design choices optimized simultaneously for the multiple metrics needed to maintain lifetime requirements.
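A memory-yield estimator of the kind described can be pictured, very roughly, as Monte Carlo sampling of threshold-voltage variation followed by a pass/fail margin check per cell. The distribution parameters and margin model below are placeholders, not values from the thesis:

    import numpy as np

    rng = np.random.default_rng(0)

    def cache_yield(n_cells=16 * 1024, n_dies=500, sigma_vt=0.03, margin=0.14):
        """Fraction of dies in which every cell keeps a positive read margin.

        Toy model: a cell fails when its threshold-voltage deviation, drawn
        from N(0, sigma_vt), exceeds the assumed margin; a single failing
        cell kills the die (no redundancy or ECC modeled).
        """
        failing_dies = 0
        for _ in range(n_dies):
            dvt = rng.normal(0.0, sigma_vt, size=n_cells)
            if np.any(np.abs(dvt) > margin):
                failing_dies += 1
        return 1.0 - failing_dies / n_dies

    print(cache_yield())   # about 0.95 under these placeholder numbers

Because a die fails if any one of thousands of cells fails, per-cell failure probabilities must be driven to the sub-ppm range, which is why tail-accurate estimation tools matter in this domain.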
Identification and Rejuvenation of NBTI-Critical Logic Paths in Nanoscale Circuits
The Negative Bias Temperature Instability (NBTI) phenomenon is widely agreed to be one of the main reliability concerns in nanoscale circuits. It increases the threshold voltage of pMOS transistors and thus slows down signal propagation along logic paths between flip-flops. NBTI may cause intermittent faults and, ultimately, permanent functional failures of the circuit. In this paper, we propose an innovative NBTI mitigation approach that rejuvenates the nanoscale logic along NBTI-critical paths. The method is based on hierarchical identification of NBTI-critical paths and on the generation of rejuvenation stimuli using an Evolutionary Algorithm. A new, fast, yet accurate model for computing NBTI-induced delays at gate level is developed, based on intensive SPICE simulations of individual gates. The generated rejuvenation stimuli drive into the recovery phase those pMOS transistors that are most critical for the NBTI-induced path delay. The rejuvenation procedure is intended to be applied to the circuit periodically, as an execution overhead. Experimental results on a set of designs demonstrate a reduction of NBTI-induced delays by up to two times with an execution overhead of 0.1% or less. The proposed approach is aimed at extending the reliable lifetime of nanoelectronics.
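The stimulus-generation step can be imagined as a simple genetic search over primary-input vectors, scored by how many critical pMOS devices a vector puts into recovery (input high, so the pMOS is off and relaxes). The fitness function, encoding and EA parameters below are illustrative assumptions, not the paper's actual setup:

    import random

    random.seed(1)
    N_INPUTS = 16
    # Hypothetical mapping: which primary input controls each NBTI-critical
    # pMOS device, and which input value puts that device into recovery.
    CRITICAL = [(random.randrange(N_INPUTS), random.choice([0, 1])) for _ in range(24)]

    def fitness(vec):
        """Count critical pMOS transistors driven into recovery by this vector."""
        return sum(1 for inp, relax_val in CRITICAL if vec[inp] == relax_val)

    def evolve(pop_size=30, generations=40, mut_rate=0.05):
        pop = [[random.randint(0, 1) for _ in range(N_INPUTS)] for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=fitness, reverse=True)
            elite = pop[: pop_size // 2]           # keep the fittest half
            children = []
            for _ in range(pop_size - len(elite)):
                a, b = random.sample(elite, 2)
                cut = random.randrange(1, N_INPUTS)  # one-point crossover
                child = a[:cut] + b[cut:]
                child = [1 - g if random.random() < mut_rate else g for g in child]
                children.append(child)
            pop = elite + children
        return max(pop, key=fitness)

    best = evolve()
    print(best, fitness(best))

In the paper's setting the fitness would come from the gate-level NBTI delay model rather than a lookup table, but the search loop has the same shape.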
Dependable Embedded Systems
This Open Access book introduces readers to many new techniques for enhancing and optimizing reliability in embedded systems that have emerged particularly within the last five years. It presents the most prominent reliability concerns from today's point of view and roughly recapitulates the progress in the community so far. Unlike other books that focus on a single abstraction level, such as the circuit level or the system level alone, this book deals with reliability challenges across different levels, from the physical level all the way up to the system level (cross-layer approaches). The book aims to demonstrate how new hardware/software co-design solutions can be proposed to effectively mitigate reliability degradation such as transistor aging, process variation, temperature effects and soft errors. It provides readers with the latest insights into novel, cross-layer methods and models with respect to the dependability of embedded systems; describes cross-layer approaches that can leverage reliability through techniques that are pro-actively designed with respect to techniques at other layers; and explains run-time adaptation and concepts/means of self-organization, in order to achieve error resiliency in complex, future many-core systems.
Robust Design of Variation-Sensitive Digital Circuits
The nano age has already begun, with typical feature dimensions smaller than 100 nm. According to the International Technology Roadmap for Semiconductors (ITRS), the operating frequency is expected to increase up to 12 GHz, and a single chip will contain over 12 billion transistors by 2020. ITRS also predicts that the scaling of CMOS devices and process technology, as it is known today, will become much more difficult as the industry advances towards the 16 nm technology node and beyond. This aggressive scaling of CMOS technology has pushed devices to their physical limits. Design goals are now governed by several factors beyond power, performance and area, such as process variations, radiation-induced soft errors, and aging degradation mechanisms. These new design challenges strongly impact the parametric yield of nanometer digital circuits and also cause functional yield losses in variation-sensitive digital circuits such as Static Random Access Memory (SRAM) and flip-flops. Moreover, sub-threshold SRAM and flip-flop circuits, driven by the strong demand for lower power consumption, show even larger sensitivity to these challenges, which reduces their robustness and yield. Accordingly, it is not surprising that the ITRS considers variability and reliability the most challenging obstacles to the robust design of nanometer digital circuits.
Soft errors are considered one of the main reliability and robustness concerns for SRAM arrays in sub-100 nm technologies due to the low operating voltage, small node capacitance, and high packing density. The soft error immunity of SRAM arrays is also affected by process variations. We develop statistical, design-oriented soft error immunity variation models for super-threshold and sub-threshold SRAM cells, accounting for die-to-die and within-die variations. This work provides new design insights and highlights the important design knobs that can be used to reduce the variation in SRAM cell soft error immunity. The developed models are scalable, bias-dependent, and require only the knowledge of easily measurable parameters, which makes them useful for early design exploration, circuit optimization and technology prediction. The derived models are verified using Monte Carlo SPICE simulations on an industrial hardware-calibrated 65 nm CMOS technology.
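A common reference point for such models is the empirical soft-error-rate relation (the Hazucha-Svensson form, given here as background rather than as the exact model developed in this work):

\[ \mathrm{SER} \;\propto\; F \cdot A \cdot \exp\!\left(-\frac{Q_{crit}}{Q_s}\right), \]

where F is the particle flux, A the sensitive area, Q_crit the minimum collected charge that upsets the cell, and Q_s the charge-collection efficiency of the technology; process variations modulate the SER mainly through their effect on Q_crit.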
The demand for higher performance leads to very deep pipelining, which means that hundreds of thousands of flip-flops are required to control the data flow under strict timing constraints. A violation of the timing constraints at a flip-flop can result in latching incorrect data, causing the overall system to malfunction. In addition, flip-flop power dissipation represents a considerable fraction of the total power dissipation. Sub-threshold flip-flops are considered the most energy-efficient solution for low-power applications in which performance is of secondary importance. Accordingly, statistical gate sizing is applied to different flip-flop topologies for timing yield improvement of super-threshold flip-flops and power yield improvement of sub-threshold flip-flops. Following that, a comparative analysis of these flip-flop topologies, considering the overhead required for yield improvement, is performed. This comparative analysis provides useful recommendations that help designers select the flip-flop topology that best satisfies their system specifications while taking the impact of process variations and robustness requirements into account.
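For intuition, if the flip-flop data-path delay under process variation is modeled as Gaussian (a textbook simplification, not necessarily the statistical framework used in this work), the timing yield at clock period T_clk is

\[ Y_{timing} \;=\; \Phi\!\left(\frac{T_{clk} - \mu_{delay}}{\sigma_{delay}}\right), \]

where \Phi is the standard normal CDF; statistical gate sizing improves yield by lowering \mu_{delay} and/or tightening \sigma_{delay}, at a cost in area and power, which is exactly the overhead the comparative analysis accounts for.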
Adaptive Body Bias (ABB) allows tuning of the transistor threshold voltage, Vt, by controlling the transistor body voltage. A forward body bias reduces Vt, increasing device speed at the expense of increased leakage power; a reverse body bias increases Vt, reducing leakage power but slowing the device. The impact of process variations is therefore mitigated by speeding up slow, less leaky devices or slowing down fast, highly leaky ones. Ideally, ABB would bias each device in a design independently, to mitigate within-die variations; however, supplying so many separate voltages inside a die results in a large area overhead. On the other hand, using the same body bias for all devices on a die limits its ability to compensate for within-die variations. Thus, the granularity of the ABB scheme is a trade-off between within-die variation compensation capability and the associated area overhead. This work introduces new ABB circuits whose area overhead is 143X lower than that of previous ABB circuits. In addition, these ABB circuits are resolution-free, since no digital-to-analog or analog-to-digital converters are required in their implementation. The ABB circuits are applied to high-performance critical paths emulating a real microprocessor architecture, for process variation compensation, and to SRAM arrays, for compensation of Negative Bias Temperature Instability (NBTI) aging and process variations. The effectiveness of the new ABB circuits is verified by post-layout simulation results and test chip measurements in a triple-well 65 nm CMOS technology.
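The physical handle that ABB exploits is the classical body effect (the textbook long-channel relation, not a model specific to this work):

\[ V_t \;=\; V_{t0} \;+\; \gamma\left(\sqrt{2\phi_F + V_{SB}} \;-\; \sqrt{2\phi_F}\right), \]

where V_{t0} is the zero-bias threshold voltage, \gamma the body-effect coefficient, \phi_F the Fermi potential and V_{SB} the source-to-body voltage; a forward body bias drives V_{SB} negative and lowers V_t, while a reverse bias raises it.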
The highly capacitive nodes of wide fan-in dynamic circuits and SRAM bitlines limit the performance of these circuits. Moreover, mitigating process variations by statistical gate sizing increases this capacitance further and fails to achieve the target yield improvement. We propose new negative capacitance circuits that reduce the overall parasitic capacitance of these highly capacitive nodes. These circuits are applied to wide fan-in dynamic circuits, for timing yield improvement of up to 99.87%, and to SRAM arrays, for read access yield improvement of up to 100%. The area and power overheads of the new negative capacitance circuits are amortized over the large die area of the microprocessor and the SRAM array. Their effectiveness is verified by post-layout simulation results and test chip measurements in a 65 nm CMOS technology.
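The underlying first-order reasoning is the delay of charging or discharging a node (a textbook approximation, not this work's extracted model):

\[ t_d \;\approx\; \frac{C_{node}\,\Delta V}{I_{drive}}, \]

so a circuit that presents a negative capacitance at the node lowers the effective C_node and shortens evaluation time on wide fan-in dynamic gates and bitlines, which is what translates into the reported timing and read-access yield gains.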