Modeling RTL Fault Models Behavior to Increase the Confidence on TSIM-based Fault Injection

Author: Abella Ferrer Jaume
Espinosa Jaime
Hernandez Carles
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 04/07/2016

Future high-performance safety-relevant applications require microcontrollers that deliver higher performance than the existing certified ones. However, means for assessing their dependability are needed so that they can be certified against safety-critical certification standards (e.g., ISO 26262). Dependability assessment analyses performed at a high level of abstraction inject single faults to investigate the effects these have on the system. In this work we show that single faults do not capture the whole picture, due to fault multiplicities and reactivations. We then show that injecting complex fault models that consider multiplicities and reactivations at higher levels of abstraction yields substantially different results, indicating that a change in the methodology is needed. The research leading to these results has received funding from the Ministry of Science and Technology of Spain under contract TIN2015-65316-P and the HiPEAC Network of Excellence. Carles Hernández is jointly funded by the Spanish Ministry of Economy and Competitiveness (MINECO) and FEDER funds through grant TIN2014-60404-JIN. Jaume Abella has been partially supported by the MINECO under Ramon y Cajal postdoctoral fellowship number RYC-2013-14717. Postprint (author's final draft

Crossref

UPCommons. Portal del coneixement obert de la UPC

Building on Quicksand

Author: Campbell David
Helland Pat
Publication date: 01/01/2009

Reliable systems have always been built out of unreliable components. Early on, the reliable components were small, such as mirrored disks or ECC (Error Correcting Codes) in core memory. These systems were designed such that failures of these small components were transparent to the application. Later, the size of the unreliable components grew larger and semantic challenges crept into the application when failures occurred. As the granularity of the unreliable component grows, the latency to communicate with a backup becomes unpalatable. This leads to a more relaxed model for fault tolerance. The primary system will acknowledge the work request and its actions without waiting to ensure that the backup is notified of the work. This improves the responsiveness of the system. There are two implications of asynchronous state capture: 1) Everything promised by the primary is probabilistic. There is always a chance that an untimely failure shortly after the promise results in a backup proceeding without knowledge of the commitment. Hence, nothing is guaranteed! 2) Applications must ensure eventual consistency. Since work may be stuck in the primary after a failure and reappear later, the processing order for work cannot be guaranteed. Platform designers are struggling to make this easier for their applications. Emerging patterns of eventual consistency and probabilistic execution may soon yield a way for applications to express requirements for a "looser" form of consistency while providing availability in the face of ever larger failures. This paper recounts portions of the evolution of these trends, attempts to show the patterns that span these changes, and talks about future directions as we continue to "build on quicksand". Comment: CIDR 200

arXiv.org e-Print Archive

CiteSeerX

Fault-tolerant behavior in state-of-the-art grid workflow management systems

Author: Fahringer T.
Kacsuk P.
Kertesz A.
Plankensteiner K.
Prodan R.
Publication venue: CoreGRID
Publication date: 18/10/2007

WestminsterResearch

Enabling portable I/O analysis of commercially sensitive HPC applications through workload replication

Author: Dickson James
Harris Duncan
Herdman J. A.
Jarvis Stephen A.
Maheswaran Satheesh
Miller Mark C.
Wright Steven A.
Publication venue: Cray User Group
Publication date: 01/05/2017

Benchmarking and analyzing I/O performance across high-performance computing (HPC) platforms is necessary to identify performance bottlenecks and guide effective use of new and existing storage systems. Doing this with large production applications, which can often be commercially sensitive and lack portability, is not a straightforward task, and the availability of a representative proxy for I/O workloads can help to provide a solution. We use Darshan I/O characterization and the MACSio proxy application to replicate five production workloads, showing how these can be used effectively to investigate I/O performance when migrating between HPC systems ranging from small local clusters to leadership-scale machines. Preliminary results indicate that it is possible to generate datasets that match the target application with a good degree of accuracy. This enables a predictive performance analysis study of a representative workload to be conducted on five different systems. The results of this analysis are used to identify how workloads exhibit different I/O footprints on a file system and what effect file system configuration can have on performance.

Warwick Research Archives Portal Repository

SPOT: A DSL for Extending Fortran Programs with Metaprogramming

Author: Jeff Gray
Songqing Yue
Publication venue: Hindawi Limited

Crossref