Multiversioning hardware transactional memory for fail-operational multithreaded applications

Abstract

Modern safety-critical embedded applications like autonomous driving need to be fail-operational, while high performance and low power consumption are demanded simultaneously. The prevalent fault tolerance mechanisms suffer from disadvantages: Some (e.g. triple modular redundancy) require a substantial amount of duplication, resulting in high hardware costs and power consumption. Others, like lockstep, require supplementary checkpointing mechanisms to recover from errors. Further approaches (e.g. software-based process-level redundancy) cannot handle the indeterminism caused by multithreaded execution. This paper presents a novel approach for fail-operational systems using hardware transactional memory for embedded systems. The hardware transactional memory is extended to support multiple versions, enabling redundant atomic operations and recovery in case of an error. In our FPGA-based evaluation, we executed the PARSEC benchmark suite with fault tolerance on 12 cores. The evaluation shows that multiversioning can successfully recover from all transient errors with an overhead comparable to fault tolerance mechanisms without recovery

    Similar works