25 research outputs found

    A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services

    Datacenter workloads demand high computational capabilities, flexibility, power efficiency, and low cost. It is challenging to improve all of these factors simultaneously. To advance datacenter capabilities beyond what commodity server designs can provide, we designed and built a composable, reconfigurable hardware fabric based on field programmable gate arrays (FPGAs). Each server in the fabric contains one FPGA, and all FPGAs within a 48-server rack are interconnected over a low-latency, high-bandwidth network. We describe a medium-scale deployment of this fabric on a bed of 1632 servers, and measure its effectiveness in accelerating the ranking component of the Bing web search engine. We describe the requirements and architecture of the system, detail the critical engineering challenges and solutions needed to make the system robust in the presence of failures, and measure the performance, power, and resilience of the system. Under high load, the large-scale reconfigurable fabric improves the ranking throughput of each server by 95% at a desirable latency distribution or reduces tail latency by 29% at a fixed throughput. In other words, the reconfigurable fabric enables the same throughput using only half the number of servers.
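
    The step from a 95% per-server throughput gain to "half the number of servers" can be checked with a short calculation. The symbols below (T for baseline per-server throughput, L for total load, N for server count) are illustrative assumptions, not notation from the paper, and the sketch assumes load divides evenly across servers.

```latex
\[
N_{\text{base}} = \frac{L}{T}, \qquad
N_{\text{fabric}} = \frac{L}{1.95\,T} = \frac{N_{\text{base}}}{1.95} \approx 0.51\,N_{\text{base}}
\]
```

    Under these assumptions, the fabric needs roughly 51% of the baseline server count for the same load, which matches the abstract's "only half".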

    Online Low-Cost Defect Tolerance Solutions for Microprocessor Designs.

    One of the major driving forces of the semiconductor industry is the continuous scaling of the silicon process technology. Over the last four decades, the scaling into a new silicon technology every few years offered smaller and faster transistors that made possible the development of high-performance microprocessors. This technological achievement fueled the widespread adoption of microprocessor-based products in applications that touch every aspect of our life. However, many device experts warn that the continued transistor size scaling into smaller dimensions will inevitably result in silicon technologies that are much less reliable than the current ones. Microprocessors manufactured in future silicon technologies will likely experience failures due to silicon defects. In the absence of any viable alternative technology, the success of the semiconductor industry in the future will depend on the creation of cost-effective mechanisms to tolerate silicon defects while the microprocessor is in operation. This thesis is focused on the development of defect-tolerance techniques that will provide low-cost mechanisms to protect a microprocessor from silicon defects. The approach of these novel defect-tolerance solutions represents a new way of thinking in the field of defect-tolerant design. In particular, traditional approaches to defect-tolerant design saddle a system with redundant components that continuously verify computation. In contrast, the proposed BulletProof approach provides low-cost periodic hardware checking. Furthermore, to lower the cost of hardware checking, the silicon defect detection process is shifted from hardware to software using a software-based approach, the ACE Framework. This thesis also makes the case that the hardware resources of the ACE framework can be used for other applications to add value and ease its adoption in future-generation microprocessors. Finally, this thesis presents CrashTest, a novel FPGA-based framework used to assess the threats and the reliability requirements of a microprocessor. Altogether, the defect-tolerance solutions presented in this thesis provide a cost-effective defect-tolerance framework that makes possible the development of reliable microprocessors using unreliable silicon technologies. This enables the continuation of silicon scaling into smaller but possibly less reliable transistors, a key requirement for the development of the next generation of microprocessors and the extension of microprocessor-based products into new applications.
    Ph.D., Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/62317/1/kypros_1.pd

    A Hardware-Based Method for Dynamically Detecting Instruction-Isomorphism and its Application to Branch Prediction

    This paper proposes a hardware-based heuristic method for implementing various transformations and detecting isomorphism in the dynamic dependence graph of a program. This enables on-the-fly identification of isomorphic instructions, which may be useful for improving the performance of several microarchitectural mechanisms. This work considers the application of the proposed method to conditional branch prediction. The empirical results using SPEC benchmarks suggest that the proposed method may be useful for increasing prediction accuracy and improving performance. Specifically, it is shown for a 4-way processor that a 16KB gshare predictor combined with a 16KB overriding isomorphic predictor can achieve better performance than either a 32KB gshare or a 32KB combining gshare/bimodal predictor.
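
    For readers unfamiliar with the gshare baseline named above, the following is a minimal sketch of a conventional gshare predictor, not the paper's isomorphic predictor. It assumes 2-bit saturating counters and a table index formed by XORing the branch address with the global history; the class name, the 2-bit counter width, and the PC shift are assumptions made for illustration. With 16 index bits and 2-bit counters the table occupies 16KB.

```python
class GsharePredictor:
    """Textbook-style gshare sketch: 2-bit counters indexed by PC XOR history."""

    def __init__(self, index_bits=16):
        # 2**16 entries * 2 bits = 16KB of counter state (assumed sizing).
        self.mask = (1 << index_bits) - 1
        self.counters = [1] * (1 << index_bits)  # start weakly not-taken
        self.history = 0  # global history of recent branch outcomes

    def _index(self, pc):
        # Drop the low PC bits (assumes 4-byte aligned instructions),
        # then XOR with the global history and keep index_bits bits.
        return ((pc >> 2) ^ self.history) & self.mask

    def predict(self, pc):
        # Counter values 2 and 3 mean "predict taken".
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the actual outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & self.mask
```

    A typical use is to call `predict(pc)` at fetch and `update(pc, taken)` once the branch resolves; an overriding predictor such as the one the abstract describes would sit behind a structure like this and replace its prediction when the two disagree.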

    Software-Based Online Detection of Hardware Defects: Mechanisms, Architectural Support, and Evaluation

    As silicon process technology scales deeper into the nanometer regime, hardware defects are becoming more common. Such defects are bound to hinder the correct operation of future processor systems, unless new online techniques become available to detect and to tolerate them while preserving the integrity of software applications running on the system. This paper proposes a new, software-based, defect detection and diagnosis technique. We introduce a novel set of instructions, called Access-Control Extension (ACE), that can access and control the microprocessor’s internal state. Special firmware periodically suspends microprocessor execution and uses the ACE instructions to run directed tests on the hardware. When a hardware defect is present, these tests can diagnose and locate it, and then activate system repair through resource reconfiguration. The software nature of our framework makes it flexible: testing techniques can be modified/upgraded in the field to trade off performance with reliability without requiring any change to the hardware. We evaluated our technique on a commercial chip-multiprocessor based on Sun’s Niagara and found that it can provide very high coverage, with 99.22% of all silicon defects detected. Moreover, our results show that the average performance overhead of software-based testing is only 5.5%. Based on a detailed RTL-level implementation of our technique, we find its area overhead to be quite modest, with only a 5.8% increase in total chip area.

    The Significance of Affectors and Affectees Correlations for Branch Prediction

    This work investigates the potential of direction-correlations to improve branch prediction. There are two types of direction-correlation: affectors and affectees. This work considers for the first time their implications at a basic level. These correlations are determined based on dataflow graph information and are used to select the subset of global branch history bits used for prediction. If this subset is small, then affectors and affectees can be useful to cut down learning time and reduce aliasing in prediction tables. This paper extends previous work explaining why and how correlation-based predictors work by analyzing the properties of direction-correlations. It also shows that branch history selected using oracle knowledge of direction-correlations improves the accuracy of the limit and realistic conditional branch predictors that won the recent branch prediction contest by up to 30% and 17%, respectively. The findings in this paper call for the investigation of predictors that can efficiently learn correlations from long branch histories in which the relevant bits may be non-consecutive, with holes between them.
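
    The idea of predicting from a selected subset of global history bits can be illustrated with a small sketch. It assumes the positions of the correlated branches in the history are already known (the abstract obtains them from oracle dataflow information); the function name and the list-based history representation are hypothetical.

```python
def select_history(history_bits, correlated_positions):
    """Keep only the outcomes at the given history positions.

    history_bits: sequence of 0/1 outcomes, index 0 = most recent branch.
    correlated_positions: positions assumed to hold affector/affectee
    branches; positions beyond the recorded history are skipped.
    """
    return tuple(
        history_bits[pos]
        for pos in correlated_positions
        if pos < len(history_bits)
    )


# Five recorded outcomes, of which positions 0, 2, and 4 are assumed relevant.
selected = select_history([1, 0, 1, 1, 0], [0, 2, 4])
```

    The shorter selected tuple, rather than the full history, would then index the prediction table, which is why a small subset can shorten learning time and reduce aliasing.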

    A Flexible Software-Based Framework for Online Detection of Hardware Defects

    This work proposes a new, software-based, defect detection and diagnosis technique. We introduce a novel set of instructions, called Access-Control Extensions (ACE), that can access and control the microprocessor’s internal state. Special firmware periodically suspends microprocessor execution and uses the ACE instructions to run directed tests on the hardware. When a hardware defect is present, these tests can diagnose and locate it, and then activate system repair through resource reconfiguration. The software nature of our framework makes it flexible: testing techniques can be modified/upgraded in the field to trade off performance with reliability without requiring any change to the hardware. We describe and evaluate different execution models for using the ACE framework. We also describe how the proposed ACE framework can be extended and utilized to improve the quality of post-silicon debugging and manufacturing testing of modern processors. We evaluated our technique on a commercial chip-multiprocessor based on Sun’s Niagara and found that it can provide very high coverage, with 99.22 percent of all silicon defects detected. Moreover, our results show that the average performance overhead of software-based testing is only 5.5 percent. Based on a detailed register transfer level (RTL) implementation of our technique, we find its area and power consumption overheads to be modest, with a 5.8 percent increase in total chip area and a 4 percent increase in the chip’s overall power consumption. Index Terms: Reliability, hardware defects, online defect detection, testing, online self-test, post-silicon debugging, manufacturing test.

    Low-Cost Protection for SER Upsets and Silicon Defects

    Extreme transistor scaling trends in silicon technology are soon to reach a point where manufactured systems will suffer from limited device reliability and severely reduced lifetime, due to early transistor failures, gate oxide wear-out, manufacturing defects, and radiation-induced soft errors (SER). In this paper we present a low-cost technique to harden a microprocessor pipeline and caches against these reliability threats. Our approach utilizes online built-in self-test (BIST) and microarchitectural checkpointing to detect, diagnose, and recover the computation impaired by silicon defects or SER events. The approach works by periodically testing the processor to determine if the system is broken. If so, we reconfigure the processor to avoid using the broken component. A similar mechanism is used to detect SER faults, with the difference that recovery is implemented by re-execution. By utilizing low-cost techniques to address defects and SER, we keep protection costs significantly lower than traditional fault-tolerance approaches while providing high levels of coverage for a wide range of faults. Using detailed gate-level simulation, we find that our approach provides 95% and 99% coverage for silicon defects and SER events, respectively, with only a 14% area overhead.