2 research outputs found

    Adaptive execution assistance for multiplexed fault-tolerant chip multiprocessors

    Full text link
    Relentless scaling of CMOS fabrication technology has made contemporary integrated circuits increasingly susceptible to transient faults, wearout-related permanent faults, intermittent faults and process variations. Therefore, mechanisms to mitigate the effects of decreased reliability are expected to become essential components of future general-purpose microprocessors. In this paper, we introduce a new throughput-efficient architecture for multiplexed fault-tolerant chip multiprocessors (CMPs). Our proposal relies on the new technique of adaptive execution assistance, which dynamically varies instruction outcomes forwarded from the leading core to the trailing core based on measures of trailing core performance. We identify policies and design low-overhead hardware mechanisms to achieve this. Our work also introduces a new priority-based thread-scheduling algorithm for multiplexed architectures that improves multiplexed fault-tolerant CMP throughput by prioritizing stalled threads. Through simulation-based evaluation, we find that our proposal delivers 17.2% higher throughput than perfect dual modular redundant (DMR) execution and outperforms previous proposals for throughput-efficient CMP architectures.

    Fault Tolerance Through Re-execution in Multiscalar Architecture

    No full text
    Multi-threading and multiscaling are two fundamental microarchitecture approaches that are expected to stay on the existing performance gain curve. Both of these approaches assume that integrated circuits with over a billion transistors will become available in the near future. Such large integrated circuits imply reduced design tolerances and hence increased failure probability. Conventional hardware redundancy techniques for achieving the desired reliability in computation may severely limit the performance of such high-performance processors. Hence we need to study novel methods to exploit the inherent redundancy of the microarchitectures, without unduly affecting the performance, to provide correct program execution and/or detect failures (permanent or transient) that can occur in the hardware. This paper proposes a time redundancy technique suitable for multiscalar architectures. In the multiscalar architecture, there are usually several processing units to exploit the instruction-level parallelism that exists in a given program. The technique in this paper uses a majority of the processing units for executing the program as in the traditional multiscalar paradigm while using the remainder of the processing units for re-executing the committed instructions. By comparing the results from the two program executions, errors caused by permanent or transient faults in the processing units can be detected. Simulation results presented in this paper demonstrate that this can be achieved with about 5-15% performance degradation.
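
The re-execute-and-compare idea in the abstract above can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the paper's mechanism: the instruction format, the `alu` and `faulty_alu` functions, and `detect_mismatches` are all hypothetical names invented for illustration, and a real multiscalar processor would do this in hardware across processing units rather than in a loop.

```python
from typing import Callable, List, Tuple

# Hypothetical instruction format: (opcode, operand_a, operand_b).
Instruction = Tuple[str, int, int]
Unit = Callable[[str, int, int], int]


def alu(op: str, a: int, b: int) -> int:
    """A fault-free processing unit (illustrative only)."""
    if op == "add":
        return a + b
    if op == "sub":
        return a - b
    if op == "mul":
        return a * b
    raise ValueError(f"unknown opcode: {op}")


def faulty_alu(op: str, a: int, b: int) -> int:
    """Assumed fault model: the multiplier flips the lowest result bit."""
    result = alu(op, a, b)
    return result ^ 1 if op == "mul" else result


def detect_mismatches(program: List[Instruction],
                      primary: Unit, checker: Unit) -> List[int]:
    """Re-execute each committed instruction on a second unit and
    return the indices where the two results disagree."""
    mismatches = []
    for index, (op, a, b) in enumerate(program):
        committed = primary(op, a, b)   # result from the main execution
        rechecked = checker(op, a, b)   # result from the re-execution
        if committed != rechecked:
            mismatches.append(index)
    return mismatches


program = [("add", 1, 2), ("mul", 3, 4), ("sub", 9, 5)]
```

Calling `detect_mismatches(program, faulty_alu, alu)` flags only the multiply instruction, while two fault-free units produce no mismatches. A comparison like this detects a fault but cannot say which unit was wrong.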