Search CORE

96,419 research outputs found

Havens: Explicit Reliable Memory Regions for HPC Applications

Author: Engelmann Christian
Hukerikar Saurabh
Publication date: 26/10/2016

Supporting error resilience in future exascale-class supercomputing systems is a critical challenge. Due to transistor scaling trends and increasing memory density, scientific simulations are expected to experience more interruptions caused by transient errors in system memory. Existing hardware-based detection and recovery techniques will be inadequate to cope with high memory fault rates. In this paper we propose a partial memory protection scheme based on region-based memory management. We define the concept of regions, called havens, that provide fault protection for program objects. We provide reliability for these regions through a software-based parity protection mechanism. Our approach enables critical program objects to be placed in havens. Unlike algorithm-based fault tolerance techniques, the fault coverage provided by our approach is application-agnostic. Comment: 2016 IEEE High Performance Extreme Computing Conference (HPEC '16), September 2016, Waltham, MA, US

arXiv.org e-Print Archive

Crossref
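
The Havens abstract mentions a software-based parity protection mechanism for memory regions but gives no details. The Python sketch below is one possible illustration under stated assumptions, not the authors' implementation: the region is modeled as a list of 64-bit words, each word carries one parity bit used to locate a corrupted word, and a region-wide XOR word is used to rebuild it. The class name Haven and the methods write and scrub are hypothetical.

```python
from dataclasses import dataclass, field
from functools import reduce
from operator import xor
from typing import List

WORD_MASK = (1 << 64) - 1  # assumption: each element is treated as a 64-bit word


def bit_parity(word: int) -> int:
    """Return 1 if the 64-bit word has an odd number of set bits, else 0."""
    return bin(word & WORD_MASK).count("1") & 1


@dataclass
class Haven:
    """Hypothetical software-protected region of 64-bit words."""
    words: List[int]
    word_parity: List[int] = field(init=False)
    region_xor: int = field(init=False)

    def __post_init__(self) -> None:
        self.words = [w & WORD_MASK for w in self.words]
        # Build protection metadata: one parity bit per word plus an XOR of all words.
        self.word_parity = [bit_parity(w) for w in self.words]
        self.region_xor = reduce(xor, self.words, 0)

    def write(self, index: int, value: int) -> None:
        """Protected store: update the word and its metadata incrementally."""
        value &= WORD_MASK
        old = self.words[index]
        self.words[index] = value
        self.word_parity[index] = bit_parity(value)
        # XOR out the old word and XOR in the new one.
        self.region_xor ^= old ^ value

    def scrub(self) -> List[int]:
        """Return indices of words failing their parity check.

        If exactly one word fails, rebuild it from the region XOR and the
        other words. Two or more failing words are only reported, and an
        even number of bit flips within one word is not detected at all.
        """
        bad = [i for i, w in enumerate(self.words)
               if bit_parity(w) != self.word_parity[i]]
        if len(bad) == 1:
            i = bad[0]
            others = reduce(xor, (w for j, w in enumerate(self.words) if j != i), 0)
            self.words[i] = self.region_xor ^ others
        return bad
```

For example, flipping one bit of a stored word (h.words[2] ^= 1 << 4 on Haven([3, 5, 7, 11])) makes scrub() report index 2 and restore the value 7. Keeping the per-word parity separate from the region XOR is what lets the sketch both locate and repair a single bad word; a single region-wide parity word on its own could only detect the error.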

A runtime heuristic to selectively replicate tasks for application-specific reliability targets

Author: Labarta Mancho Jesús José
Subasi Omer
Unsal Osman Sabri
Yalcin Gulay
Zyulkyarov Ferad
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2016

In this paper we propose a runtime-based selective task replication technique for task-parallel high performance computing applications. Our selective task replication technique is automatic and does not require modification or recompilation of the OS, compiler, or application code. Our heuristic, which we call App_FIT, selects tasks to replicate such that the specified reliability target for an application is achieved. In our experimental evaluation, we show that the App_FIT selective replication heuristic is low-overhead and highly scalable. In addition, the results indicate that complete task replication is overkill for achieving reliability targets. We show that with App_FIT, we can tolerate pessimistic exascale error rates with only 53% of the tasks being replicated. This work was supported by the FI-DGR 2013 scholarship and the European Community’s Seventh Framework Programme [FP7/2007-2013] under the Mont-blanc 2 Project (www.montblanc-project.eu), grant agreement no. 610402, and in part by the European Union (FEDER funds) under contract TIN2015-65316-P. Peer Reviewed. Postprint (author's final draft)

Crossref

UPCommons. Open knowledge portal of the UPC
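
The App_FIT abstract states only that tasks are selected for replication so that an application-level reliability target is met. As a rough illustration, and not the App_FIT heuristic itself, the Python sketch below assumes that each task has a known, additive FIT (failures in time) estimate, that a replicated task contributes no FIT, and that all estimates are available up front rather than discovered at runtime. Under those assumptions, replicating the largest contributors first yields the fewest replicas. Task and select_replicas are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Task:
    name: str
    fit: float  # hypothetical estimated FIT contribution if the task is NOT replicated


def select_replicas(tasks: List[Task], fit_target: float) -> Set[str]:
    """Choose the fewest tasks to replicate so that the summed FIT of
    unreplicated tasks is at most fit_target.

    Assumes FIT values add up across tasks and that a replicated task
    contributes zero FIT (its errors are caught by the replica).
    """
    if fit_target < 0:
        raise ValueError("fit_target must be non-negative")
    unprotected = sum(t.fit for t in tasks)
    replicated: Set[str] = set()
    # Replicating the largest contributors first minimizes the replica count.
    for task in sorted(tasks, key=lambda t: t.fit, reverse=True):
        if unprotected <= fit_target:
            break
        replicated.add(task.name)
        unprotected -= task.fit
    return replicated
```

For example, with tasks of 50, 30, 15 and 5 FIT and a target of 40 FIT, the sketch replicates the two largest tasks, i.e. 50% of the tasks. A runtime heuristic such as the one described in the abstract must make this decision as tasks are created, without the full sorted list, which is why this offline version should be read only as an idealized baseline.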

Measurement-free topological protection using dissipative feedback

Author: Fujii Keisuke
Imoto Nobuyuki
Kitagawa Masahiro
Negoro Makoto
Publication venue: American Physical Society (APS)
Publication date: 24/01/2014

Protecting quantum information from decoherence due to environmental noise is vital for fault-tolerant quantum computation. To this end, standard quantum error correction employs parallel projective measurements of individual particles, which makes the system extremely complicated. Here we propose measurement-free topological protection in two dimensions without any selective addressing of individual particles. We make use of engineered dissipative dynamics and feedback operations to reduce the entropy generated by decoherence in such a way that quantum information is topologically protected. We calculate an error threshold, below which quantum information is protected, without assuming selective addressing, projective measurements, or instantaneous classical processing. All physical operations are local and translationally invariant, and no parallel projective measurement is required, which implies high scalability. Furthermore, since the engineered dissipative dynamics we utilize have been well studied in quantum simulation, the proposed scheme could offer a promising route from quantum simulation to fault-tolerant quantum information processing. Comment: 17 pages, 6 figures

arXiv.org e-Print Archive

Directory of Open Access Journals

Osaka University Knowledge Archive

Institutional Repositories DataBase (IRDB)