research

Balancing soft error coverage with lifetime reliability in redundantly multithreaded processors

Abstract

Silicon reliability is a key challenge facing the microprocessor industry. Processors need to be designed such that they are resilient against both soft errors and lifetime reliability phenomena. However, techniques developed to address one class of reliability problems may impact other aspects of silicon reliability. In this paper, we show that Redundant Multi-Threading (RMT), which provides soft error protection, exacerbates lifetime reliability. We then explore two different architectural approaches to tackle this problem, namely, Dynamic Voltage Scaling (DVS) and partial RMT. We show that each approach has certain strengths and weaknesses with respect to performance, soft error coverage, and lifetime reliability. We then propose and evaluate a hybrid approach that combines DVS and partial RMT. We show that this approach provides better improvement in lifetime reliability than DVS or partial RMT alone, buys back a significant amount of performance that is lost due to DVS, and provides nearly complete soft error coverage. I

    Similar works

    Available Versions

    Last time updated on 01/04/2019