360 research outputs found
Recommended from our members
Variation-Tolerant and Voltage-Scalable Integrated Circuits Design
Ultra-low-voltage (ULV) operation where the supply voltage of the digital computing hardware is scaled down to the level near or below transistor threshold voltage (e.g. 300-500mV) is a key technique to achieve high computing energy efficiency. It has enabled many new exciting applications in the field of Internet of Things (IoT) devices and energy-constrained applications such as medical implants, environment sensors, and micro-robots. Ultra-low-voltage (ULV) operation is also commonly used with the emerging architectures that are often non Von-Neumann style to empower energy-efficient cognitive computing.
One the biggest challenge in realizing ULV design is the large circuit delay variability. To guarantee functionality in the worst-case process, voltage, and temperature (PVT) condition, the traditional safety margin approach requires operating at a slower clock frequency or higher supply voltage which significantly limits the achievable energy efficiency of the hardware. To fully claim the energy efficiency of ULV, the large circuit delay variation needs to be adaptively handled. However, the existing adaptive techniques that are optimized for nominal supply voltage operation and traditional Von-Neumann architectures become inefficient for ULV designs and emerging architectures.
This thesis presents adaptive techniques based on timing error detection and correction (EDAC) that are more suitable for the energy-constrained ULV designs and the emerging architectures. The proposed techniques are demonstrated in three test chips: (1) R-Processor: A 0.4V resilient processor with a voltage-scalable and low-overhead in-situ EDAC technique. It achieves 38% energy efficiency improvement or 2.3X throughput improvement as compared to the traditional safety margin approach. (2) A 450mV timing-margin-free waveform sorter for brain computer interface (BCI) microsystem. It achieves 49.3% higher energy efficiency and 35.6% higher throughput than the traditional safety margin approach. (3) Ultra-low-power and robust power-management system which consists of a microprocessor employing ULV EDAC, 63-ratio integrated switched-capacitor DC-DC converter, and a fully-digital error based regulation controller.
In this thesis, we also explore circuits for emerging techniques. The first is temperature sensors for dynamic-thermal-management (DTM). The modern high-performance microprocessors suffer from ever-increasing power densities which has led to reliability concerns and increased cooling costs from excessive heat. In order to monitor and manage the thermal behavior, DTM techniques embed multiple temperature sensors and use its information. The size, accuracy, and voltage-scalability of the sensor are critical for the performance of DTM. Therefore, we propose a temperature sensor that directly senses transistor threshold voltage and the test chip demonstrates 9X smaller area, 3X higher accuracy, and 200mV lower voltage scalability (down to 400mV) than the previous state-of-art.
Another area of exploration is interconnect design for ultra-dynamic-voltage-scaling (UDVS) systems. UDVS has been proposed for applications that require both high performance and high energy efficiency. UDVS can provide peak performance with nominal supply voltage when work load is high. When work load is moderate or low, UDVS systems can switch to ULV operation for higher energy efficiency. One of the critical challenges for developing UDVS systems is the inflexibility in various circuit fabrics that are often optimized for a single supply voltage. One critical example is conventional repeater based long interconnects which suffers from non-optimal performance and energy efficiency in UDVS systems. Therefore, in this thesis, we propose a reconfigurable interconnect design based on regenerators and demonstrate near optimal performance and energy efficiency across the supply voltage of 0.3V and 1V
Exploiting Adaptive Techniques to Improve Processor Energy Efficiency
Rapid device-miniaturization keeps on inducing challenges in building energy efficient microprocessors. As the size of the transistors continuously decreasing, more uncertainties emerge in their operations. On the other hand, integrating more and more transistors on a single chip accentuates the need to lower its supply-voltage. This dissertation investigates one of the primary device uncertainties - timing error, in microprocessor performance bottleneck in NTC era. Then it proposes various innovative techniques to exploit these opportunities to maintain processor energy efficiency, in the context of emerging challenges. Evaluated with the cross-layer methodology, the proposed approaches achieve substantial improvements in processor energy efficiency, compared to other start-of-art techniques
A Timing-Monitoring Sequential for Forward and Backward Error-Detection in 28 nm FD-SOI
The increasing impact of variability on near-threshold nanometer circuits calls for a tighter online monitoring and control of the available timing margins. Error-detection sequentials are widely used together with error-correction techniques to operate digital designs with such carefully controlled far-below-worst-case margins, ensuring their correct operation even in the presence of uncertainties and variations. However, these registers are often designed only to either detect setup timing violations or to measure the available positive timing slack for a small detection-window. In this paper we propose a timing-monitoring sequential that provides both timing-monitoring modes, which can be selected at run-time depending on the desired timing-monitoring strategy. As the detection window of the presented circuit depends on the duty-cycle of the clock, either slow paths or fast paths can be monitored and measured with wide timing windows. The performance of this timing-monitoring sequential is evaluated in a 28nm FD-SOI process with post-layout simulations which show that the circuit is able to monitor a positive timing slack as small as 140 ps or to measure a path delay as fast as 50 ps. The proposed circuit is applied to a digital multiplier that was fabricated in a test chip and measurements show that the timing-monitoring sequentials are able to measure the critical path of the multiplier with a 1% accuracy and without incurring any timing violation
Circuits and Systems Advances in Near Threshold Computing
Modern society is witnessing a sea change in ubiquitous computing, in which people have embraced computing systems as an indispensable part of day-to-day existence. Computation, storage, and communication abilities of smartphones, for example, have undergone monumental changes over the past decade. However, global emphasis on creating and sustaining green environments is leading to a rapid and ongoing proliferation of edge computing systems and applications. As a broad spectrum of healthcare, home, and transport applications shift to the edge of the network, near-threshold computing (NTC) is emerging as one of the promising low-power computing platforms. An NTC device sets its supply voltage close to its threshold voltage, dramatically reducing the energy consumption. Despite showing substantial promise in terms of energy efficiency, NTC is yet to see widescale commercial adoption. This is because circuits and systems operating with NTC suffer from several problems, including increased sensitivity to process variation, reliability problems, performance degradation, and security vulnerabilities, to name a few. To realize its potential, we need designs, techniques, and solutions to overcome these challenges associated with NTC circuits and systems. The readers of this book will be able to familiarize themselves with recent advances in electronics systems, focusing on near-threshold computing
Timing speculation and adaptive reliable overclocking techniques for aggressive computer systems
Computers have changed our lives beyond our own imagination in the past several decades. The continued and progressive advancements in VLSI technology and numerous micro-architectural innovations have played a key role in the design of spectacular low-cost high performance computing systems that have become omnipresent in today\u27s technology driven world. Performance and dependability have become key concerns as these ubiquitous computing machines continue to drive our everyday life. Every application has unique demands, as they run in diverse operating environments. Dependable, aggressive and adaptive systems improve efficiency in terms of speed, reliability and energy consumption.
Traditional computing systems run at a fixed clock frequency, which is determined by taking into account the worst-case timing paths, operating conditions, and process variations. Timing speculation based reliable overclocking advocates going beyond worst-case limits to achieve best performance while not avoiding, but detecting and correcting a modest number of timing errors. The success of this design methodology relies on the fact that timing critical paths are rarely exercised in a design, and typical execution happens much faster than the timing requirements dictated by worst-case design methodology. Better-than-worst-case design methodology is advocated by several recent research pursuits, which exploit dependability techniques to enhance computer system performance.
In this dissertation, we address different aspects of timing speculation based adaptive reliable overclocking schemes, and evaluate their role in the design of low-cost, high performance, energy efficient and dependable systems. We visualize various control knobs in the design that can be favorably controlled to ensure different design targets.
As part of this research, we extend the SPRIT3E, or Superscalar PeRformance Improvement Through Tolerating Timing Errors, framework, and characterize the extent of application dependent performance acceleration achievable in superscalar processors by scrutinizing the various parameters that impact the operation beyond worst-case limits. We study the limitations imposed by short-path constraints on our technique, and present ways to exploit them to maximize performance gains. We analyze the sensitivity of our technique\u27s adaptiveness by exploring the necessary hardware requirements for dynamic overclocking schemes. Experimental analysis based on SPEC2000 benchmarks running on a SimpleScalar Alpha processor simulator, augmented with error rate data obtained from hardware simulations of a superscalar processor, are presented.
Even though reliable overclocking guarantees functional correctness, it leads to higher power consumption. As a consequence, reliable overclocking without considering on-chip temperatures will bring down the lifetime reliability of the chip. In this thesis, we analyze how reliable overclocking impacts the on-chip temperature of a microprocessor and evaluate the effects of overheating, due to such reliable dynamic frequency tuning mechanisms, on the lifetime reliability of these systems. We then evaluate the effect of performing thermal throttling, a technique that clamps the on-chip temperature below a predefined value, on system performance and reliability. Our study shows that a reliably overclocked system with dynamic thermal management achieves 25% performance improvement, while lasting for 14 years when being operated within 353K.
Over the past five decades, technology scaling, as predicted by Moore\u27s law, has been the bedrock of semiconductor technology evolution. The continued downscaling of CMOS technology to deep sub-micron gate lengths has been the primary reason for its dominance in today\u27s omnipresent silicon microchips. Even as the transition to the next technology node is indispensable, the initial cost and time associated in doing so presents a non-level playing field for the competitors in the semiconductor business. As part of this thesis, we evaluate the capability of speculative reliable overclocking mechanisms to maximize performance at a given technology level. We evaluate its competitiveness when compared to technology scaling, in terms of performance, power consumption, energy and energy delay product. We present a comprehensive comparison for integer and floating point SPEC2000 benchmarks running on a simulated Alpha processor at three different technology nodes in normal and enhanced modes. Our results suggest that adopting reliable overclocking strategies will help skip a technology node altogether, or be competitive in the market, while porting to the next technology node.
Reliability has become a serious concern as systems embrace nanometer technologies. In this dissertation, we propose a novel fault tolerant aggressive system that combines soft error protection and timing error tolerance. We replicate both the pipeline registers and the pipeline stage combinational logic. The replicated logic receives its inputs from the primary pipeline registers while writing its output to the replicated pipeline registers. The organization of redundancy in the proposed Conjoined Pipeline system supports overclocking, provides concurrent error detection and recovery capability for soft errors, intermittent faults and timing errors, and flags permanent silicon defects. The fast recovery process requires no checkpointing and takes three cycles. Back annotated post-layout gate-level timing simulations, using 45nm technology, of a conjoined two-stage arithmetic pipeline and a conjoined five-stage DLX pipeline processor, with forwarding logic, show that our approach, even under a severe fault injection campaign, achieves near 100% fault coverage and an average performance improvement of about 20%, when dynamically overclocked
Approaches to multiprocessor error recovery using an on-chip interconnect subsystem
For future multicores, a dedicated interconnect subsystem for on-chip monitors was found to be highly beneficial in terms of scalability, performance and area. In this thesis, such a monitor network (MNoC) is used for multicores to support selective error identification and recovery and maintain target chip reliability in the context of dynamic voltage and frequency scaling (DVFS). A selective shared memory multiprocessor recovery is performed using MNoC in which, when an error is detected, only the group of processors sharing an application with the affected processors are recovered. Although the use of DVFS in contemporary multicores provides significant protection from unpredictable thermal events, a potential side effect can be an increased processor exposure to soft errors. To address this issue, a flexible fault prevention and recovery mechanism has been developed to selectively enable a small amount of per-core dual modular redundancy (DMR) in response to increased vulnerability, as measured by the processor architectural vulnerability factor (AVF). Our new algorithm for DMR deployment aims to provide a stable effective soft error rate (SER) by using DMR in response to DVFS caused by thermal events. The algorithm is implemented in real-time on the multicore using MNoC and controller which evaluates thermal information and multicore performance statistics in addition to error information. DVFS experiments with a multicore simulator using standard benchmarks show an average 6% improvement in overall power consumption and a stable SER by using selective DMR versus continuous DMR deployment
Cross-Layer Optimization for Power-Efficient and Robust Digital Circuits and Systems
With the increasing digital services demand, performance and power-efficiency
become vital requirements for digital circuits and systems. However, the
enabling CMOS technology scaling has been facing significant challenges of
device uncertainties, such as process, voltage, and temperature variations. To
ensure system reliability, worst-case corner assumptions are usually made in
each design level. However, the over-pessimistic worst-case margin leads to
unnecessary power waste and performance loss as high as 2.2x. Since
optimizations are traditionally confined to each specific level, those safe
margins can hardly be properly exploited.
To tackle the challenge, it is therefore advised in this Ph.D. thesis to
perform a cross-layer optimization for digital signal processing circuits and
systems, to achieve a global balance of power consumption and output quality.
To conclude, the traditional over-pessimistic worst-case approach leads to
huge power waste. In contrast, the adaptive voltage scaling approach saves
power (25% for the CORDIC application) by providing a just-needed supply
voltage. The power saving is maximized (46% for CORDIC) when a more aggressive
voltage over-scaling scheme is applied. These sparsely occurred circuit errors
produced by aggressive voltage over-scaling are mitigated by higher level error
resilient designs. For functions like FFT and CORDIC, smart error mitigation
schemes were proposed to enhance reliability (soft-errors and timing-errors,
respectively). Applications like Massive MIMO systems are robust against lower
level errors, thanks to the intrinsically redundant antennas. This property
makes it applicable to embrace digital hardware that trades quality for power
savings.Comment: 190 page
Design of variability compensation architectures of digital circuits with adaptive body bias
The most critical concern in circuit is to achieve high level of performance with very tight power constraint. As the high performance circuits moved beyond 45nm technology one of the major issues is the parameter variation i.e. deviation in process, temperature and voltage (PVT) values from nominal specifications. A key process parameter subject to variation is the transistor threshold voltage (Vth) which impacts two important parameters: frequency and leakage power. Although the degradation can be compensated by the worstcase scenario based over-design approach, it induces remarkable power and performance overhead which is undesirable in tightly constrained designs. Dynamic voltage scaling (DVS) is a more power efficient approach, however its coarse granularity implies difficulty in handling fine grained variations. These factors have contributed to the growing interest in power aware robust circuit design. We propose a variability compensation architecture with adaptive body bias, for low power applications using 28nm FDSOI technology. The basic approach is based on a dynamic prediction and prevention of possible circuit timing errors. In our proposal we are using a Canary logic technique that enables the typical-case design. The body bias generation is based on a DLL type method which uses an external reference generator and voltage controlled delay line (VCDL) to generate the forward body bias (FBB) control signals. The adaptive technique is used for dynamic detection and correction of path failures in digital designs due to PVT variations. Instead of tuning the supply voltage, the key idea of the design approach is to tune the body bias voltage bymonitoring the error rate during operation. The FBB increases operating speed with an overhead in leakage power
- …