
    A low-power cache system for high-performance processors

    Degree system: new ; Report number: Kou 3439 ; Degree type: Doctor of Engineering ; Date conferred: 12-Sep-11 ; Waseda University degree number: Shin 576

    Chapter One – An Overview of Architecture-Level Power- and Energy-Efficient Design Techniques

    Power dissipation and energy consumption have become the primary design constraint for almost all computer systems over the last 15 years. Both computer architects and circuit designers aim to reduce power and energy (without performance degradation) at all design levels, as power is currently the main obstacle to further scaling according to Moore's law. The aim of this survey is to provide a comprehensive overview of state-of-the-art power- and energy-efficient techniques. We classify techniques by the component to which they apply, which is the most natural way from a designer's point of view. We further divide the techniques by the component of power/energy they optimize (static or dynamic), covering in that way the complete low-power design flow at the architectural level. We conclude that only a holistic approach that assumes optimizations at all design levels can lead to significant savings.

    Fault Tolerant Electronic System Design

    Due to technology scaling, which brings reduced transistor size, higher density, lower voltage and more aggressive clock frequencies, VLSI devices are becoming more sensitive to soft errors. Especially for devices used in safety- and mission-critical applications, dependability and reliability are becoming increasingly important constraints during the development of systems built on or around them. Other phenomena (e.g., aging and wear-out effects) also have a negative impact on the reliability of modern circuits. Recent research shows that, even at sea level, radiation particles can still induce soft errors in electronic systems. On one hand, processor-based systems are commonly used in a wide variety of applications, including safety-critical and high-availability missions, e.g., in the automotive, biomedical and aerospace domains. In these fields, an error may produce catastrophic consequences. Thus, dependability is a primary target that must be achieved while taking into account tight constraints in terms of cost, performance, power and time to market. With standards and regulations (e.g., ISO-26262, DO-254, IEC-61508) clearly specifying the targets to be achieved and the methods to prove their achievement, techniques working at the system level are particularly attractive. On the other hand, Field Programmable Gate Array (FPGA) devices are becoming more and more attractive, also in safety- and mission-critical applications, due to the high performance, low power consumption and flexibility for reconfiguration they provide. Two types of FPGAs are commonly used, distinguished by their configuration memory cell technology: SRAM-based and Flash-based FPGAs. For SRAM-based FPGAs, the SRAM cells of the configuration memory are highly susceptible to radiation-induced effects, which can lead to system failure; for Flash-based FPGAs, even though their non-volatile configuration memory cells are almost immune to Single Event Upsets induced by energetic particles, the floating-gate switches and the logic cells in the configuration tiles can still suffer from Single Event Effects when hit by a highly charged particle. Analysis and mitigation techniques for Single Event Effects on FPGAs are therefore becoming increasingly important in the design flow, especially when reliability is one of the main requirements.

    Harnessing Checker Hierarchy for Reliable Microprocessors

    Traditional fault-tolerant multi-threading architectures provide good fault tolerance by re-executing all computations. However, such full re-execution significantly increases the demand on processor resources, resulting in severe performance degradation. To address this problem, this dissertation presents Active Verification Management (AVM) approaches that utilize a checker hierarchy to increase performance with a minimal effect on overall reliability. Based on a simplified queueing model, AVM employs a filter checker that prioritizes the verification candidates so that verification is performed selectively. This dissertation proposes three filter checkers - based on (1) result usage, (2) result bitwidth, and (3) result anomaly - that exploit correctness-criticality metrics and anomaly speculation. Binary Correctness Criticality (BCC) and Likelihood of Correctness Criticality (LoCC) are metrics that quantify whether an instruction is important for reliability and how likely an instruction is to be correctness-critical, respectively. Based on the BCC, a result-usage-based filter checker mitigates the verification workload by bypassing instructions that are unnecessary for correct execution. Computing the LoCC is accomplished by exploiting information redundancy, compressing computationally useful data bits. Numerical significance hints let the result-bitwidth-based filter checker guide verification priority effectively before the re-execution process starts. A result-anomaly-based filter checker exploits a value similarity property, defined by the frequent occurrence of partially identical values. Based on the biased distribution of the similarity distance measure, this dissertation further investigates another application that exploits similar values for soft error tolerance with anomaly speculation. Extensive measurements show that the majority of instructions produce values that differ from the previous result value in only a few bits. Experimental results show that the proposed schemes make the processor 180% faster than a traditional fully fault-tolerant processor, with a minimal impact on the overall soft error rate. Without AVM, congestion at the checker degrades performance by as much as 57% compared to a non-fault-tolerant processor. These results indicate that the proposed AVM has the potential to solve the verification congestion problem when perfect fault coverage is not needed.
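
    As a rough illustration of the result-bitwidth idea above, the C sketch below flags results whose upper-order bits are pure sign extension and assigns them a lower verification priority; the 64-bit operand width, the 16-bit threshold and all names are illustrative assumptions, not details taken from the dissertation.

        /* Hypothetical result-bitwidth-based filter checker: results whose
         * upper-order bits carry no information (pure sign extension) are
         * assumed to be less correctness-critical and get lower priority. */
        #include <stdint.h>
        #include <stdio.h>

        /* Effective bitwidth: bits needed to represent the value in two's
         * complement, ignoring redundant sign-extension bits. */
        static int effective_bitwidth(int64_t v)
        {
            uint64_t u = (v < 0) ? ~(uint64_t)v : (uint64_t)v;
            int bits = 1;                      /* sign bit */
            while (u) { bits++; u >>= 1; }
            return bits;
        }

        /* Priority 0 = verify first; 1 = narrow result, verify later or skip. */
        static int verification_priority(int64_t result, int width_threshold)
        {
            return (effective_bitwidth(result) <= width_threshold) ? 1 : 0;
        }

        int main(void)
        {
            int64_t results[] = { 12, -3, 0x7FFFCAFEBABELL, -40000 };
            for (int i = 0; i < 4; i++)
                printf("result %lld -> width %2d, priority %d\n",
                       (long long)results[i],
                       effective_bitwidth(results[i]),
                       verification_priority(results[i], 16));
            return 0;
        }

    In this sketch, narrow results (priority 1) would be placed behind wide, potentially correctness-critical results in the checker's verification queue.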

    Exploiting Natural On-chip Redundancy for Energy Efficient Memory and Computing

    Power density is currently the primary design constraint across most computing segments and the main performance-limiting factor. For years, industry has kept power density constant while increasing frequency and lowering transistor supply (Vdd) and threshold (Vth) voltages. However, Vth scaling has stopped because leakage current is exponentially related to it. Transistor count and integration density keep doubling every process generation (Moore's Law), but the power budget caps the amount of hardware that can be active at the same time, leading to dark silicon. With each new generation, there are more resources available, but we cannot fully exploit their performance potential. In recent years, different research trends have explored how to cope with dark silicon and unlock the energy efficiency of chips, including Near-Threshold voltage Computing (NTC) and approximate computing. NTC aggressively lowers Vdd to values near Vth. This allows a substantial reduction in power, as dynamic power scales quadratically with supply voltage. The resulting power reduction could be used to activate more chip resources and potentially achieve performance improvements. Unfortunately, Vdd scaling is limited by the tight functionality margins of on-chip SRAM transistors. When scaling Vdd down to near-threshold values, manufacturing-induced parameter variations affect the functionality of SRAM cells, which eventually become unreliable. A large number of emerging applications, on the other hand, feature an intrinsic error-resilience property, tolerating a certain amount of noise. In this context, approximate computing takes advantage of this observation and exploits the gap between the level of accuracy required by the application and the level of accuracy delivered by the computation, provided that reducing the accuracy translates into an energy gain. However, deciding which instructions and data, and which techniques, are best suited for approximation still poses a major challenge. This dissertation contributes in these two directions. First, it proposes a new approach to mitigate the impact of SRAM failures due to parameter variation for effective operation at ultra-low voltages. We identify two levels of natural on-chip redundancy: cache level and content level. The first arises because of the replication of blocks in multi-level cache hierarchies. We exploit this redundancy with a cache management policy that allocates blocks to entries taking into account the nature of the cache entry and the use pattern of the block. This policy obtains performance improvements between 2% and 34% with respect to block disabling, a technique of similar complexity, while incurring no additional storage overhead. The latter (content-level redundancy) arises because of the redundancy of data in real-world applications. We exploit this redundancy by compressing cache blocks to fit them into partially functional cache entries. At the cost of a slight overhead increase, we can obtain performance within 2% of that obtained when the cache is built with fault-free cells, even if more than 90% of the cache entries have at least one faulty cell. Then, we analyze how the intrinsic noise tolerance of emerging applications can be exploited to design an approximate Instruction Set Architecture (ISA). Exploiting the ISA redundancy, we explore a set of techniques to approximate the execution of instructions across a set of emerging applications, pointing out the potential of reducing the complexity of the ISA and the trade-offs of the approach. In a proof-of-concept implementation, the ISA is shrunk in two dimensions: breadth (i.e., simplifying instructions) and depth (i.e., dropping instructions). This proof-of-concept shows that energy can be reduced by 20.6% on average at around 14.9% accuracy loss.
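
    As a minimal sketch of the content-level idea above, the C fragment below decides whether a compressed block fits in the functional portion of a partially faulty cache entry; the sub-block granularity, the fault-map encoding and all names are illustrative assumptions rather than the thesis design.

        #include <stdbool.h>
        #include <stdio.h>

        #define SUBBLOCKS_PER_ENTRY 8   /* assumed fault-map granularity */

        /* Bit i of fault_map set means sub-block i has at least one faulty cell. */
        static int functional_subblocks(unsigned fault_map)
        {
            int ok = 0;
            for (int i = 0; i < SUBBLOCKS_PER_ENTRY; i++)
                if (!(fault_map & (1u << i)))
                    ok++;
            return ok;
        }

        /* A block can be allocated to the entry if its compressed size, in
         * sub-blocks, does not exceed the number of functional sub-blocks. */
        static bool fits_in_entry(int compressed_subblocks, unsigned fault_map)
        {
            return compressed_subblocks <= functional_subblocks(fault_map);
        }

        int main(void)
        {
            unsigned faulty_entry = 0xA1;   /* sub-blocks 0, 5 and 7 faulty */
            printf("functional sub-blocks: %d\n", functional_subblocks(faulty_entry));
            printf("4 compressed sub-blocks fit: %s\n",
                   fits_in_entry(4, faulty_entry) ? "yes" : "no");
            printf("6 compressed sub-blocks fit: %s\n",
                   fits_in_entry(6, faulty_entry) ? "yes" : "no");
            return 0;
        }

    A real allocation policy would additionally consider where the functional sub-blocks sit within the entry and how the block is laid out after compression; the sketch only captures the capacity check.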

    Cross layer reliability estimation for digital systems

    Forthcoming manufacturing technologies hold the promise of increasing the performance and functionality of multifunctional computing systems thanks to a remarkable growth in device integration density. Despite the benefits introduced by these technology improvements, reliability is becoming a key challenge for the semiconductor industry. With transistor sizes reaching atomic dimensions, vulnerability to unavoidable fluctuations in the manufacturing process and to environmental stress rises dramatically. Failing to meet a reliability requirement may add excessive re-design cost and may have severe consequences for the success of a product. Worst-case design with large margins to guarantee reliable operation has been employed for a long time. However, it is reaching a limit that makes it economically unsustainable due to its performance, area, and power cost. One of the open challenges for future technologies is building "dependable" systems on top of unreliable components, which will degrade and even fail during the normal lifetime of the chip. Conventional design techniques are highly inefficient: they expend a significant amount of energy to tolerate device unpredictability by adding safety margins to a circuit's operating voltage, clock frequency or charge stored per bit. Unfortunately, the additional costs introduced to compensate for unreliability are rapidly becoming unacceptable in today's environment, where power consumption is often the limiting factor for integrated circuit performance and energy efficiency is a top concern. Attention should be paid to tailoring techniques that improve the reliability of a system to its requirements, ending up with cost-effective solutions that favor the success of the product on the market. Cross-layer reliability is one of the most promising approaches to achieve this goal. Cross-layer reliability techniques take into account the interactions between the layers composing a complex system (i.e., the technology, hardware and software layers) to implement efficient cross-layer fault mitigation mechanisms. Fault tolerance mechanisms are implemented at different layers, from the technology level up to the software layer, to optimize the system by exploiting the inherent capability of each layer to mask lower-level faults. For this purpose, cross-layer reliability design techniques need to be complemented with cross-layer reliability evaluation tools, able to precisely assess the reliability level of a selected design early in the design cycle. Accurate and early reliability estimates would enable the exploration of the system design space and the optimization of multiple constraints such as performance, power consumption, cost and reliability. This Ph.D. thesis is devoted to the development of new methodologies and tools to evaluate and optimize the reliability of complex digital systems during the early design stages. More specifically, techniques addressing hardware accelerators (i.e., FPGAs and GPUs), microprocessors and full systems are discussed. All developed methodologies are presented in conjunction with their application to real-world use cases belonging to different computational domains.

    Parallel error detection using heterogeneous cores

    Microprocessor error detection is increasingly important, as the growing number of transistors in modern systems heightens their vulnerability. In addition, many modern workloads in domains such as the automotive and health industries are increasingly error-intolerant, due to strict safety standards. However, current detection techniques require duplication of all hardware structures, causing a considerable increase in power consumption and chip area. Solutions in the literature involve running the code multiple times on the same hardware, which reduces performance significantly and cannot capture all errors. We have designed a novel hardware-only solution for error detection that exploits parallelism in the checking code which may not exist in the original execution. We pair a high-performance out-of-order core with a set of small low-power cores, each of which checks a portion of the out-of-order core's execution. Our system enables the detection of both hard and soft errors, with low area, power and performance overheads. This work was supported by the Engineering and Physical Sciences Research Council (EPSRC), through grant references EP/K026399/1 and EP/M506485/1, and Arm Ltd.
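
    To make the division of checking work concrete, the C sketch below (an assumption-laden illustration, not the paper's hardware design) partitions a committed-instruction stream into fixed-length segments, each tagged with an architectural checkpoint, and assigns segments round-robin to a small pool of checker cores so that several segments can be re-executed in parallel.

        #include <stdint.h>
        #include <stdio.h>

        #define SEGMENT_LEN  1000      /* committed instructions per segment (assumed) */
        #define NUM_CHECKERS 4         /* number of small checker cores (assumed) */

        struct checkpoint { uint64_t pc; uint64_t regs[32]; };

        struct segment {
            struct checkpoint start;   /* state the checker re-executes from */
            uint64_t first_inst;       /* index of first committed instruction */
            uint64_t length;           /* instructions covered by this segment */
        };

        /* Round-robin dispatch: segment i is checked by core i % NUM_CHECKERS,
         * so up to NUM_CHECKERS segments are verified concurrently. */
        static int checker_for_segment(uint64_t segment_index)
        {
            return (int)(segment_index % NUM_CHECKERS);
        }

        int main(void)
        {
            for (uint64_t seg = 0; seg < 6; seg++) {
                struct segment s = {
                    .start      = { .pc = 0x400000 },  /* placeholder checkpoint */
                    .first_inst = seg * SEGMENT_LEN,
                    .length     = SEGMENT_LEN,
                };
                printf("segment %llu (insts %llu..%llu) -> checker core %d\n",
                       (unsigned long long)seg,
                       (unsigned long long)s.first_inst,
                       (unsigned long long)(s.first_inst + s.length - 1),
                       checker_for_segment(seg));
            }
            return 0;
        }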

    Reconfiguration of field programmable logic in embedded systems


    Reducing soft errors through operand width aware policies

    Soft errors are an important challenge in contemporary microprocessors. Particle hits on the components of a processor are expected to create an increasing number of transient errors with each new microprocessor generation. In this paper, we propose simple mechanisms that effectively reduce vulnerability to soft errors in a processor. Our designs are generally motivated by the fact that many of the values produced and consumed in a processor are narrow, so their upper-order bits are meaningless. Soft errors caused by a particle strike to these higher-order bits can be avoided by simply identifying these narrow values. Alternatively, soft errors can be detected or corrected on narrow values by replicating the vulnerable portion of the value in the storage space provided for the upper-order bits of these operands. As a faster but less fault-tolerant alternative to ECC and parity, we offer a variety of schemes that make use of narrow values and analyze their efficiency in reducing the soft error vulnerability of different data-holding components of a processor. On average, techniques that make use of the narrowness of values can provide 49 percent error detection, 45 percent error correction, or 27 percent error avoidance coverage for single-bit upsets in the first-level data cache across all SPEC2K benchmarks. In other structures, such as the immediate field of the issue queue, an average error detection rate of 64 percent is achieved.
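
    As a minimal sketch of the replication scheme described above (the exact encoding is an assumption, not the paper's design), the C code below duplicates the lower half of a narrow 64-bit operand into its unused upper half and later compares the two copies to detect a single-bit upset.

        #include <stdbool.h>
        #include <stdint.h>
        #include <stdio.h>

        /* True if the value fits in 32 bits, i.e. bits 63..32 are sign extension. */
        static bool is_narrow(uint64_t v)
        {
            int64_t s = (int64_t)v;
            return s == (int64_t)(int32_t)s;
        }

        /* Protect a narrow value by duplicating its low half into the high half. */
        static uint64_t protect(uint64_t v)
        {
            uint32_t low = (uint32_t)v;
            return ((uint64_t)low << 32) | low;
        }

        /* Check a stored word: the copies must match; then restore sign extension. */
        static bool check_and_recover(uint64_t stored, uint64_t *out)
        {
            uint32_t low  = (uint32_t)stored;
            uint32_t high = (uint32_t)(stored >> 32);
            if (low != high)
                return false;                        /* bit flip detected */
            *out = (uint64_t)(int64_t)(int32_t)low;  /* original narrow value */
            return true;
        }

        int main(void)
        {
            uint64_t value = (uint64_t)(int64_t)-12345;
            if (!is_narrow(value))
                return 0;                    /* wide values fall back to ECC/parity */

            uint64_t stored  = protect(value);
            uint64_t flipped = stored ^ (1ull << 7);   /* inject a single-bit upset */

            uint64_t recovered;
            printf("clean copy ok:   %d\n", check_and_recover(stored, &recovered));
            printf("flipped copy ok: %d\n", check_and_recover(flipped, &recovered));
            return 0;
        }

    With two copies the scheme only detects the upset; correction in the spirit of the abstract would require extra redundancy, such as a third copy or a per-copy parity bit, to decide which copy is the good one.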