33 research outputs found

    Timing speculation and adaptive reliable overclocking techniques for aggressive computer systems

    Get PDF
    Computers have changed our lives beyond our own imagination in the past several decades. The continued and progressive advancements in VLSI technology and numerous micro-architectural innovations have played a key role in the design of spectacular low-cost high performance computing systems that have become omnipresent in today\u27s technology driven world. Performance and dependability have become key concerns as these ubiquitous computing machines continue to drive our everyday life. Every application has unique demands, as they run in diverse operating environments. Dependable, aggressive and adaptive systems improve efficiency in terms of speed, reliability and energy consumption. Traditional computing systems run at a fixed clock frequency, which is determined by taking into account the worst-case timing paths, operating conditions, and process variations. Timing speculation based reliable overclocking advocates going beyond worst-case limits to achieve best performance while not avoiding, but detecting and correcting a modest number of timing errors. The success of this design methodology relies on the fact that timing critical paths are rarely exercised in a design, and typical execution happens much faster than the timing requirements dictated by worst-case design methodology. Better-than-worst-case design methodology is advocated by several recent research pursuits, which exploit dependability techniques to enhance computer system performance. In this dissertation, we address different aspects of timing speculation based adaptive reliable overclocking schemes, and evaluate their role in the design of low-cost, high performance, energy efficient and dependable systems. We visualize various control knobs in the design that can be favorably controlled to ensure different design targets. As part of this research, we extend the SPRIT3E, or Superscalar PeRformance Improvement Through Tolerating Timing Errors, framework, and characterize the extent of application dependent performance acceleration achievable in superscalar processors by scrutinizing the various parameters that impact the operation beyond worst-case limits. We study the limitations imposed by short-path constraints on our technique, and present ways to exploit them to maximize performance gains. We analyze the sensitivity of our technique\u27s adaptiveness by exploring the necessary hardware requirements for dynamic overclocking schemes. Experimental analysis based on SPEC2000 benchmarks running on a SimpleScalar Alpha processor simulator, augmented with error rate data obtained from hardware simulations of a superscalar processor, are presented. Even though reliable overclocking guarantees functional correctness, it leads to higher power consumption. As a consequence, reliable overclocking without considering on-chip temperatures will bring down the lifetime reliability of the chip. In this thesis, we analyze how reliable overclocking impacts the on-chip temperature of a microprocessor and evaluate the effects of overheating, due to such reliable dynamic frequency tuning mechanisms, on the lifetime reliability of these systems. We then evaluate the effect of performing thermal throttling, a technique that clamps the on-chip temperature below a predefined value, on system performance and reliability. Our study shows that a reliably overclocked system with dynamic thermal management achieves 25% performance improvement, while lasting for 14 years when being operated within 353K. Over the past five decades, technology scaling, as predicted by Moore\u27s law, has been the bedrock of semiconductor technology evolution. The continued downscaling of CMOS technology to deep sub-micron gate lengths has been the primary reason for its dominance in today\u27s omnipresent silicon microchips. Even as the transition to the next technology node is indispensable, the initial cost and time associated in doing so presents a non-level playing field for the competitors in the semiconductor business. As part of this thesis, we evaluate the capability of speculative reliable overclocking mechanisms to maximize performance at a given technology level. We evaluate its competitiveness when compared to technology scaling, in terms of performance, power consumption, energy and energy delay product. We present a comprehensive comparison for integer and floating point SPEC2000 benchmarks running on a simulated Alpha processor at three different technology nodes in normal and enhanced modes. Our results suggest that adopting reliable overclocking strategies will help skip a technology node altogether, or be competitive in the market, while porting to the next technology node. Reliability has become a serious concern as systems embrace nanometer technologies. In this dissertation, we propose a novel fault tolerant aggressive system that combines soft error protection and timing error tolerance. We replicate both the pipeline registers and the pipeline stage combinational logic. The replicated logic receives its inputs from the primary pipeline registers while writing its output to the replicated pipeline registers. The organization of redundancy in the proposed Conjoined Pipeline system supports overclocking, provides concurrent error detection and recovery capability for soft errors, intermittent faults and timing errors, and flags permanent silicon defects. The fast recovery process requires no checkpointing and takes three cycles. Back annotated post-layout gate-level timing simulations, using 45nm technology, of a conjoined two-stage arithmetic pipeline and a conjoined five-stage DLX pipeline processor, with forwarding logic, show that our approach, even under a severe fault injection campaign, achieves near 100% fault coverage and an average performance improvement of about 20%, when dynamically overclocked

    Combining Structural and Timing Errors in Overclocked Inexact Speculative Adders

    Get PDF
    Worst-case design is used in IoT devices and high performance data centers to ensure reliability, leading to a power efficiency loss. Recently, approximate computing has been proposed to trade off accuracy for efficiency. In this paper, we use Inexact Speculative Adders, which redesign the adder architecture to shorten its critical path and improve performance, but introduces controlled structural errors. On the other hand, overclocking is used to reduce conservative timing guardbands but could normally introduce catastrophic timing errors, we thus apply a supervised learning model to overclock speculative adders and predict their timing errors. We build a methodology to combine both structural and timing errors and analyze how they interplay with each other to limit the overal errors

    Spéculation temporelle algorithmique pour accélérateurs de réseaux de neuro

    Get PDF
    In this paper, we propose a technique for improving the efficiency of hardwareaccelerators based on timing speculation (overclocking) and fault tolerance. We augment theaccelerator with a lightweight error detection mechanism to protect against timing errors, enablingaggressive timing speculation. We demonstrate the validity of our approach for the convolutionlayers in Convolutional Neural Networks (CNN). We present an implementation of a fault-tolerantCNN accelerator combined with the lightweight error detection for convolution layers. The errordetection mechanism we have developed works at the algorithm level, based on algebraic propertiesof the computation, allowing the full implementation to be realized using High-Level Synthesistools. We use a set of Zybo boards to experimentally demonstrate that overclocking boosts thefrequency by 17-36% with low chances of error, and that the infrequent errors can be detected witha negligible overhead (only 1000 LUTs)

    Superscalar Processor Performance Enhancement through Reliable Dynamic Clock Frequency Tuning

    Get PDF
    Synchronous circuits are typically clocked considering worst case timing paths so that timing errors are avoided under all circumstances. In the case of a pipelined processor, this has special implications since the operating frequency of the entire pipeline is limited by the slowest stage. Our goal, in this paper, is to achieve higher performance in superscalar processors by dynamically varying the operating frequency during run time past worst case limits. The key objective is to see the effect of overclocking on superscalar processors for various benchmark applications, and analyze the associated overhead, in terms of extra hardware and error recovery penalty, when the clock frequency is adjusted dynamically. We tolerate timing errors occurring at speeds higher than what the circuit is designed to operate at by implementing an efficient error detection and recovery mechanism. We also study the limitations imposed by minimum path constraints on our technique. Experimental results show that an average performance gain up to 57% across all benchmark applications is achievable
    corecore