Managing Soft-errors in Transactional Systems

Abstract

Multicore architectures are becoming increasingly prone to soft errors, i.e., transient faults caused by external physical phenomena such as electrical noise and cosmic particle strikes. As core counts increase, the soft-error rate grows along with the accelerating transistor density on chips. The impact of these errors on business-critical applications being deployed on multicore hardware can be significant. We present an active replication-based approach that fully masks such errors for transactional applications. We partition the computational cores, fully replicate objects across partitions, and concurrently execute transactional requests on all partitions, so that every object access is local. Transactional requests are globally ordered and delivered across partitions using optimistic atomic broadcast. Hardware message passing, an important emerging trend in multicore architectures, is exploited to mitigate communication costs. We report preliminary results obtained with an implementation of our approach on 36-core Tilera TILE-Gx hardware with a scalable on-chip mesh network.
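To illustrate why a global order lets replicated partitions mask a transient fault, here is a minimal single-process sketch. It assumes the request order has already been finalized (it does not model the optimistic delivery phase of atomic broadcast or hardware message passing), and it assumes faults are masked by a majority vote over partition outputs; the abstract does not specify the masking mechanism. `Partition`, `deliver_in_order`, `deposit`, and `faulty_at` are hypothetical names chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Optional

Txn = Callable[[dict], object]

@dataclass
class Partition:
    """Hypothetical model of one partition: a group of cores holding a full copy of all objects."""
    pid: int
    store: dict = field(default_factory=dict)
    faulty_at: Optional[int] = None  # request index at which a simulated soft error hits the output

    def execute(self, seq: int, txn: Txn):
        # All reads and writes touch only this partition's local replica.
        result = txn(self.store)
        if self.faulty_at == seq:
            # Simulated transient fault: only this partition's reported result is corrupted.
            result = ("corrupted", self.pid)
        return result

def deliver_in_order(partitions, ordered_txns):
    """Execute the same globally ordered requests on every partition; mask faults by majority vote (assumed)."""
    outputs = []
    for seq, txn in enumerate(ordered_txns):
        results = [p.execute(seq, txn) for p in partitions]
        value, votes = Counter(results).most_common(1)[0]
        if votes <= len(partitions) // 2:
            raise RuntimeError(f"no majority for request {seq}")
        outputs.append(value)
    return outputs

def deposit(amount: int) -> Txn:
    """Hypothetical deterministic transaction: add `amount` to a balance and return the new value."""
    def txn(store: dict):
        store["balance"] = store.get("balance", 0) + amount
        return store["balance"]
    return txn

# Three partitions; partition 1 suffers a simulated soft error while executing request 1.
partitions = [Partition(0), Partition(1, faulty_at=1), Partition(2)]
outputs = deliver_in_order(partitions, [deposit(10), deposit(5), deposit(-3)])
print(outputs)  # [10, 15, 12]
```

Because every partition applies the same deterministic transactions in the same order, the correct replicas agree, and the single corrupted output is outvoted. This only holds if execution is deterministic; an input order that differed between partitions would make honest replicas disagree, which is why global ordering is a prerequisite for masking.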
