Improving redundant multithreading performance for soft-error detection in HPC applications
Graduation thesis (Master's in Computing), Instituto Tecnológico de Costa Rica, Escuela de Computación, 2018
As HPC systems move towards extreme scale, soft errors that lead to silent data corruption become a major concern. In this thesis, we propose three optimizations to the classical Redundant Multithreading (RMT) approach that speed up soft-error detection. First, we use Simultaneous Multithreading (SMT) to place sibling replicated threads on the same physical core, so that they can exchange data efficiently to expose errors. Because some HPC applications cannot fully exploit SMT to improve performance, we propose using these otherwise underused resources for fault tolerance instead. Second, we present variable aggregation, which groups several values together and checks the merged value to speed up soft-error detection. Third, we introduce selective checking, which reduces the number of checked values to a minimum. The last two techniques lower the overall performance overhead by relaxing the scope of soft-error detection. Our experimental evaluation, run on recent multicore processors with representative HPC benchmarks, shows that using SMT for fault tolerance can improve RMT performance. It also shows that, at a constant computing-power budget and with the optimizations applied, the overhead of the technique can be significantly lower than that of classical RMT replicated execution. Furthermore, these results show that RMT can be a viable solution for soft-error detection at extreme scale.
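How replicated threads and variable aggregation fit together can be illustrated with a small Java sketch. It is not the thesis implementation: the class `RmtSketch`, the `compute` kernel, the group size `GROUP`, and the use of XOR over raw bit patterns as the aggregation operator are all hypothetical. A leading thread sends one aggregated word per group to a trailing thread, which recomputes the group and compares. Plain Java cannot pin the two threads to SMT siblings of one core, so that part of the technique is not shown; selective checking would correspond to comparing only some of the groups.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: leading and trailing threads run the same computation and
// compare one aggregated word per group of values (variable aggregation).
// Thread placement on SMT siblings is NOT controllable from plain Java.
final class RmtSketch {
    static final int GROUP = 8; // hypothetical aggregation width: values folded per check

    // Stand-in for the replicated region of an HPC kernel (deterministic).
    static double compute(int i) {
        return Math.sqrt(i) * 1.5 + i;
    }

    // Aggregation: XOR the raw bit patterns of a group of values.
    // One corrupted value always changes the result; several errors may cancel.
    static long aggregate(double[] values, int from, int to) {
        long acc = 0L;
        for (int i = from; i < to; i++) {
            acc ^= Double.doubleToRawLongBits(values[i]);
        }
        return acc;
    }

    /** Runs n iterations redundantly; faultAt < 0 disables the injected error. Returns mismatching groups. */
    static int run(int n, int faultAt) {
        BlockingQueue<Long> channel = new ArrayBlockingQueue<>(64); // leading -> trailing
        int[] mismatches = {0};

        Thread leading = new Thread(() -> {
            double[] out = new double[n];
            try {
                for (int start = 0; start < n; start += GROUP) {
                    int end = Math.min(start + GROUP, n);
                    for (int i = start; i < end; i++) out[i] = compute(i);
                    channel.put(aggregate(out, start, end)); // one exchange per group
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread trailing = new Thread(() -> {
            double[] out = new double[n];
            try {
                for (int start = 0; start < n; start += GROUP) {
                    int end = Math.min(start + GROUP, n);
                    for (int i = start; i < end; i++) {
                        out[i] = compute(i);
                        if (i == faultAt) out[i] += 1.0; // simulated soft error in the replica
                    }
                    if (channel.take() != aggregate(out, start, end)) mismatches[0]++;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        leading.start();
        trailing.start();
        try {
            leading.join();
            trailing.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return mismatches[0]; // join() makes the trailing thread's writes visible here
    }
}
```

With XOR aggregation, any single corrupted value in a group changes the checked word, but two errors that flip the same bit cancel out; this is one concrete way in which aggregation relaxes the detection scope in exchange for fewer comparisons.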
BatchQueue: a producer/consumer queue optimized for multicore systems
National audience
Sequential applications can take advantage of multicore systems by using pipeline parallelism to increase their performance. In such a parallelism scheme, the achievable speedup is limited by the overhead of core-to-core communication. This paper presents the BatchQueue algorithm, a fast communication system designed to optimize the use of the hardware cache, in particular with respect to prefetching. BatchQueue improves performance by a factor of 2: it can send one data word in 3.5 nanoseconds on a 64-bit system, which corresponds to a throughput of about 2 GiB/s.
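The batching idea, in which the producer and consumer synchronize once per batch rather than once per item, can be sketched in Java (9 or later, for `Thread.onSpinWait`). This is an assumption-based illustration, not the published BatchQueue algorithm: the class `BatchedSpscQueue`, the two-half buffer layout, and the per-half item counter are hypothetical, and the cache-line and prefetching optimizations the paper targets are not modeled.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical batched single-producer/single-consumer queue. The buffer is split
// into two halves; each half is handed over as one batch with a single volatile write.
final class BatchedSpscQueue {
    private final long[] slots;
    private final int batch;
    // published[h] = number of items ready in half h; 0 means the half is free.
    private final AtomicIntegerArray published = new AtomicIntegerArray(2);

    private int prodHalf = 0, prodCount = 0;                    // producer-only state
    private int consHalf = 0, consIndex = 0, consAvailable = 0; // consumer-only state

    BatchedSpscQueue(int batch) {
        if (batch <= 0) throw new IllegalArgumentException("batch must be positive");
        this.batch = batch;
        this.slots = new long[2 * batch];
    }

    /** Producer thread only. */
    void put(long value) {
        if (prodCount == 0) {
            // Wait until the consumer has released this half.
            while (published.get(prodHalf) != 0) Thread.onSpinWait();
        }
        slots[prodHalf * batch + prodCount++] = value; // plain write, published by flush()
        if (prodCount == batch) flush();
    }

    /** Producer thread only: hand over the current (possibly partial) batch. */
    void flush() {
        if (prodCount == 0) return;
        published.set(prodHalf, prodCount); // volatile write makes the slot writes visible
        prodHalf ^= 1;
        prodCount = 0;
    }

    /** Consumer thread only; spins until an item is available. */
    long take() {
        if (consIndex == consAvailable) {
            if (consAvailable != 0) { // current half fully read: release it to the producer
                published.set(consHalf, 0);
                consHalf ^= 1;
            }
            int n;
            while ((n = published.get(consHalf)) == 0) Thread.onSpinWait();
            consAvailable = n;
            consIndex = 0;
        }
        return slots[consHalf * batch + consIndex++];
    }

    /** Demo: sends 1..count through the queue and returns the consumer's sum. */
    static long transferSum(int count, int batch) {
        BatchedSpscQueue q = new BatchedSpscQueue(batch);
        long[] sum = {0};
        Thread consumer = new Thread(() -> {
            for (int i = 0; i < count; i++) sum[0] += q.take();
        });
        consumer.start();
        for (long v = 1; v <= count; v++) q.put(v);
        q.flush(); // publish the trailing partial batch
        try {
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum[0];
    }
}
```

A single producer thread calls `put` and must call `flush` to hand over a trailing partial batch; a single consumer thread calls `take`. Because each half changes hands through one atomic counter, the consumer never reads slots the producer is still filling, and the cost of cross-core synchronization is paid once per batch.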
Using proxy design pattern for transparent redundant execution
12th Turkish National Software Engineering Symposium (UYMS 2018), Istanbul, Turkey, 10 to 12 September 2018
In this study, we propose a transparent model for the reliable execution of object-oriented software. We design a generic object-oriented programming tool for redundant software execution that provides the desired level of reliability against transient hardware faults. To achieve this, we use the Proxy design pattern, one of the well-known GoF design patterns intended to make software systems flexible and easy to maintain. The Proxy design pattern provides controlled access to an existing object and a transparent mechanism for adding new functionality when the object is accessed. Combining dynamic proxies and annotations in the Java programming language, we present RedundantCaller, a generic, transparent, and configurable tool for redundant execution and majority voting. Our tool takes any object and creates a dynamic proxy for it that executes the object's methods multiple times in separate threads and performs majority voting in the background, requiring minimal changes to the original user code. Thanks to annotations, users can configure the redundant execution scheme on a per-method basis. Our experiments demonstrate that, through multithreaded execution, our tool provides a significant level of reliability to any object-oriented software with a reasonable performance degradation.
Acknowledgment: National Center for High Performance Computing (UHeM), (1005202018
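A minimal sketch of the approach, built on `java.lang.reflect.Proxy` and a runtime-retained annotation, follows. The names `Redundant`, `copies`, `RedundantProxy`, `Calculator`, and `FlakyCalculator` are hypothetical and do not reproduce RedundantCaller's actual API; the sketch runs each annotated method in several threads and returns the value that a strict majority of the copies agree on.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical per-method configuration, read at run time by the proxy.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Redundant {
    int copies() default 3;
}

interface Calculator {
    @Redundant(copies = 3)
    int square(int x);

    int identity(int x); // not annotated: forwarded once
}

final class RedundantProxy {
    static <T> T wrap(T target, Class<T> iface) {
        InvocationHandler handler = (proxy, method, args) -> {
            Redundant cfg = method.getAnnotation(Redundant.class);
            if (cfg == null) return method.invoke(target, args);

            ExecutorService pool = Executors.newFixedThreadPool(cfg.copies());
            try {
                List<Future<Object>> runs = new ArrayList<>();
                for (int i = 0; i < cfg.copies(); i++) {
                    Callable<Object> copy = () -> method.invoke(target, args);
                    runs.add(pool.submit(copy)); // each copy runs in its own thread
                }
                Map<Object, Integer> votes = new HashMap<>(); // result -> number of copies
                for (Future<Object> run : runs) votes.merge(run.get(), 1, Integer::sum);
                for (Map.Entry<Object, Integer> e : votes.entrySet()) {
                    if (e.getValue() > cfg.copies() / 2) return e.getKey(); // strict majority
                }
                throw new IllegalStateException("no majority for " + method.getName());
            } finally {
                pool.shutdown();
            }
        };
        return iface.cast(Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }
}

// Test double whose second call to square() returns a corrupted value.
final class FlakyCalculator implements Calculator {
    private final AtomicInteger calls = new AtomicInteger();

    @Override
    public int square(int x) {
        int r = x * x;
        return calls.incrementAndGet() == 2 ? r + 1 : r; // simulated transient fault
    }

    @Override
    public int identity(int x) {
        return x;
    }
}
```

Wrapping `new FlakyCalculator()` with `RedundantProxy.wrap(..., Calculator.class)` makes `square(7)` return 49 even though one of the three copies returns 50, while the unannotated `identity` method is forwarded once. Results are grouped with `equals`/`hashCode`, so this sketch assumes return types that implement them meaningfully.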
Runtime asynchronous fault tolerance via speculation
Transient faults are emerging as a critical reliability concern in modern microprocessors. Redundant hardware solutions are commonly deployed to detect transient faults, but they are less flexible and cost-effective than software solutions. However, existing software solutions have been impractical because of their high performance overheads. To address this problem, this paper presents Runtime Asynchronous Fault Tolerance via Speculation (RAFT), the fastest transient fault detection technique known to date. Serving as a layer between the application and the underlying platform, RAFT automatically generates two symmetric program instances from a program binary. It detects transient faults non-invasively and exploits high-confidence value speculation to achieve low runtime overhead. Evaluation on a commodity multicore system demonstrates that RAFT delivers a geomean performance overhead of 2.83% on a set of 30 SPEC CPU benchmarks and STAMP benchmarks. Compared with existing transient fault detection techniques, RAFT exhibits the best performance and fault coverage, without requiring any change to the hardware or the software applications.
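The non-blocking side of asynchronous checking can be sketched as follows, simplified to one leading and one trailing thread instead of two symmetric program instances. This is not RAFT's mechanism: `SpeculativeChecker`, `syscallArg`, and the bit-flip injection are hypothetical. The leading thread performs each "system call" immediately, speculating that verification will succeed, while the trailing thread compares values off the critical path.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of speculative, asynchronous checking (not RAFT itself).
final class SpeculativeChecker {
    // Stand-in for the argument a program would pass to a system call at step i.
    static long syscallArg(int i) {
        return 31L * i + 7;
    }

    /**
     * Leading thread: performs every "system call" at once (appends to externalOutput),
     * speculating that the check will pass. Trailing thread: recomputes and compares later.
     * Returns the first step whose values disagreed, or -1; faultAt < 0 injects nothing.
     */
    static int run(int steps, int faultAt, List<Long> externalOutput) {
        BlockingQueue<Long> pending = new LinkedBlockingQueue<>(); // unbounded: leading never blocks
        int[] firstMismatch = {-1};

        Thread leading = new Thread(() -> {
            for (int i = 0; i < steps; i++) {
                long arg = syscallArg(i);
                if (i == faultAt) arg ^= 1L; // simulated bit flip in the leading instance
                pending.add(arg);            // hand the value to the checker
                externalOutput.add(arg);     // "execute" the call without waiting for the check
            }
        });
        Thread trailing = new Thread(() -> {
            try {
                for (int i = 0; i < steps; i++) {
                    if (pending.take() != syscallArg(i) && firstMismatch[0] < 0) firstMismatch[0] = i;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        leading.start();
        trailing.start();
        try {
            leading.join();
            trailing.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return firstMismatch[0];
    }
}
```

The corrupted value has already been written out by the time the mismatch is reported: checking off the critical path trades containment for lower overhead, which matches the detection-only goal described in the abstract.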
Parallel error detection using heterogeneous cores
Microprocessor error detection is increasingly important, as the number of transistors in modern systems heightens their vulnerability. In addition, many modern workloads in domains such as the automotive and health industries are increasingly error-intolerant due to strict safety standards.
However, current detection techniques require duplication of all hardware structures, causing a considerable increase in power consumption and chip area. Solutions in the literature involve running the code multiple times on the same hardware, which reduces performance significantly and cannot capture all errors.
We have designed a novel hardware-only solution for error detection that exploits parallelism in the checking code that may not exist in the original execution. We pair a high-performance out-of-order core with a set of small, low-power cores, each of which checks a portion of the out-of-order core's execution. Our system detects both hard and soft errors with low area, power, and performance overheads.
This work was supported by the Engineering and Physical Sciences Research Council (EPSRC), through grant references EP/K026399/1 and EP/M506485/1, and by Arm Ltd.
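The key observation, that re-checking an execution has parallelism the original execution lacks, can be illustrated in software even though the design described above is hardware-only. In this hypothetical sketch (`ParallelChecking`, the LCG `step` function, and the segment length are assumptions), a sequential recurrence cannot be parallelized, but once the "main core" has recorded the state at every segment boundary, a pool of "checker cores" can re-execute all segments at the same time and compare each result with the next checkpoint.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical software analogue of parallel checking on small cores.
final class ParallelChecking {
    // One step of an inherently sequential recurrence (a 64-bit LCG; overflow wraps).
    static long step(long x) {
        return x * 6364136223846793005L + 1442695040888963407L;
    }

    /** "Main core": runs all steps sequentially and records the state after every seg steps. */
    static long[] mainExecution(long seed, int steps, int seg, int faultAt) {
        long[] checkpoints = new long[(steps + seg - 1) / seg + 1];
        long x = seed;
        checkpoints[0] = x;
        for (int i = 0; i < steps; i++) {
            x = step(x);
            if (i == faultAt) x ^= 1L << 17; // simulated error on the main core
            int done = i + 1;
            if (done % seg == 0 || done == steps) checkpoints[(done + seg - 1) / seg] = x;
        }
        return checkpoints;
    }

    /** "Checker cores": re-run every segment in parallel; returns indices of failing segments. */
    static List<Integer> check(long[] checkpoints, int steps, int seg, int checkers) {
        ExecutorService pool = Executors.newFixedThreadPool(checkers);
        try {
            List<Future<Boolean>> verdicts = new ArrayList<>();
            for (int k = 0; k + 1 < checkpoints.length; k++) {
                int start = k * seg;
                int end = Math.min(start + seg, steps);
                long from = checkpoints[k];
                long expected = checkpoints[k + 1];
                verdicts.add(pool.submit(() -> {
                    long y = from;
                    for (int i = start; i < end; i++) y = step(y);
                    return y == expected; // segment agrees with the main execution?
                }));
            }
            List<Integer> failing = new ArrayList<>();
            for (int k = 0; k < verdicts.size(); k++) {
                if (!verdicts.get(k).get()) failing.add(k);
            }
            return failing;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

A fault on the main core is reported for exactly the segment in which it occurred: later segments start from the already-corrupted checkpoint, so they agree with the main execution, and the error is localized to one segment.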
ParaMedic: Heterogeneous Parallel Error Correction
The cost of processor error detection can be reduced significantly by exploiting the parallelism that exists in a repeated copy of an execution, which may not exist in the original code, to split the redundant work across a large number of small, highly efficient cores. However, such schemes do not provide a method for automatic error recovery.
We develop ParaMedic, an architecture that allows efficient automatic correction of errors in a system that uses parallel heterogeneous cores for detection, providing a fully fail-safe system that does not propagate errors to other systems and can recover without manual intervention. It uses logging to roll back any computation that occurred after a detected error, along with a set of techniques that provide error-checking parallelism while still preventing incorrect processor values from escaping in multicore environments, where ordering the individual processors' logs is not sufficient to roll back execution. Across a set of single- and multi-threaded benchmarks, we achieve 3.1% and 1.5% overhead, respectively, compared with 1.9% and 1% for error detection alone.
Arm Ltd.
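Log-based rollback of the kind described above can be sketched with an undo log in Java. The class `UndoLogRecovery`, the prefix-sum segment, and the use of a whole-memory snapshot as the checker's input are hypothetical simplifications; the sketch covers a single thread only and does not address the multicore log-ordering problem that ParaMedic solves.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical single-threaded sketch of undo-log recovery (not ParaMedic's hardware design).
final class UndoLogRecovery {
    private final long[] memory;
    private final Deque<long[]> undoLog = new ArrayDeque<>(); // entries: {address, old value}

    UndoLogRecovery(int size) {
        memory = new long[size];
    }

    long load(int addr) {
        return memory[addr];
    }

    void store(int addr, long value) {
        undoLog.push(new long[] {addr, memory[addr]}); // log the old value before overwriting
        memory[addr] = value;
    }

    void commit() {   // segment verified: its log entries are no longer needed
        undoLog.clear();
    }

    void rollback() { // segment failed: restore old values, newest store first
        while (!undoLog.isEmpty()) {
            long[] entry = undoLog.pop();
            memory[(int) entry[0]] = entry[1];
        }
    }

    // The protected computation: in-place prefix sums over memory[0..n).
    private void runSegment(int n, boolean injectFault) {
        for (int i = 1; i < n; i++) {
            long v = load(i - 1) + load(i);
            if (injectFault && i == n / 2) v ^= 1L << 3; // simulated soft error
            store(i, v);
        }
    }

    // Stand-in for a checker core: recompute the segment from the saved starting state.
    private static long[] recompute(long[] start, int n) {
        long[] m = Arrays.copyOf(start, start.length);
        for (int i = 1; i < n; i++) m[i] += m[i - 1];
        return m;
    }

    /** Runs, checks, and on mismatch rolls back and retries; assumes earlier stores were committed. */
    int executeChecked(int n, boolean faultOnFirstAttempt) {
        long[] before = memory.clone();
        for (int attempt = 1; ; attempt++) {
            runSegment(n, faultOnFirstAttempt && attempt == 1);
            if (Arrays.equals(memory, recompute(before, n))) {
                commit();
                return attempt;
            }
            rollback(); // memory is back to 'before'; execute the segment again
        }
    }
}
```

Starting from memory holding 1, 2, ..., 8 (committed), `executeChecked(8, true)` detects the injected error on the first attempt, undoes its seven stores, and succeeds on the second attempt, leaving the prefix sums 1, 3, 6, 10, 15, 21, 28, 36 in memory.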