Havens: Explicit Reliable Memory Regions for HPC Applications
Supporting error resilience in future exascale-class supercomputing systems
is a critical challenge. Due to transistor scaling trends and increasing memory
density, scientific simulations are expected to experience more interruptions
caused by transient errors in the system memory. Existing hardware-based
detection and recovery techniques will be inadequate to manage the presence of
high memory fault rates.
In this paper we propose a partial memory protection scheme based on
region-based memory management. We define the concept of regions called havens
that provide fault protection for program objects. We provide reliability for
the regions through a software-based parity protection mechanism. Our approach
enables critical program objects to be placed in these havens. The fault
coverage provided by our approach is application agnostic, unlike
algorithm-based fault tolerance techniques.
Comment: 2016 IEEE High Performance Extreme Computing Conference (HPEC '16),
September 2016, Waltham, MA, US
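The abstract above does not spell out how the parity mechanism works, so the following is only a minimal sketch of one way a software parity-protected region could behave. The `Haven` class, its block layout, and the single XOR parity block are illustrative assumptions, not the paper's API or design.

```python
class Haven:
    """Hypothetical parity-protected region (an assumption, not the paper's design).

    The region holds fixed-size blocks plus one XOR parity block computed
    across all of them.
    """

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        self.blocks = [bytearray(block_size) for _ in range(num_blocks)]
        self.parity = bytearray(block_size)  # XOR of all-zero blocks is zero

    def write(self, index: int, data: bytes) -> None:
        if len(data) != self.block_size:
            raise ValueError("data must fill exactly one block")
        old = self.blocks[index]
        # Incremental update: remove the old contents from the parity
        # and fold in the new contents.
        for i in range(self.block_size):
            self.parity[i] ^= old[i] ^ data[i]
        self.blocks[index] = bytearray(data)

    def _xor_blocks(self, skip: int = -1) -> bytearray:
        acc = bytearray(self.block_size)
        for j, block in enumerate(self.blocks):
            if j == skip:
                continue
            for i in range(self.block_size):
                acc[i] ^= block[i]
        return acc

    def is_consistent(self) -> bool:
        """Detect corruption: the stored parity must match a fresh XOR."""
        return self._xor_blocks() == self.parity

    def recover(self, index: int) -> None:
        """Rebuild one block from the parity and the remaining blocks.

        Assumes the caller already knows which block is bad; parity alone
        does not locate the fault.
        """
        acc = self._xor_blocks(skip=index)
        for i in range(self.block_size):
            acc[i] ^= self.parity[i]
        self.blocks[index] = acc
```

In this sketch, a program would place a critical object in a block via `write`, call `is_consistent` to detect a flipped bit, and call `recover` on the affected block. A single parity block can repair only one damaged block at a time, which is the usual trade-off of parity against full replication.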
A runtime heuristic to selectively replicate tasks for application-specific reliability targets
In this paper we propose a runtime-based selective task replication technique for task-parallel high performance computing applications. Our selective task replication technique is automatic and does not require modification or recompilation of the OS, compiler, or application code. Our heuristic, which we call App_FIT, selects tasks to replicate such that the specified reliability target for an application is achieved. In our experimental evaluation, we show that the App_FIT selective replication heuristic is low-overhead and highly scalable. In addition, results indicate that complete task replication is overkill for achieving reliability targets. We show that with App_FIT, we can tolerate pessimistic exascale error rates with only 53% of the tasks being replicated.
This work was supported by the FI-DGR 2013 scholarship and the European Community's Seventh Framework Programme [FP7/2007-2013] under the Mont-blanc 2 Project (www.montblanc-project.eu), grant agreement no. 610402 and in part by the European Union (FEDER funds) under contract TIN2015-65316-P.
Peer Reviewed
Postprint (author's final draft)
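To make the idea of replicating only enough tasks to meet a reliability target concrete, here is a minimal greedy sketch. It is not the App_FIT heuristic itself, which the abstract describes as a runtime technique without giving its details. The function name, the per-task failure-rate estimates, and the assumption that a replicated task contributes no unprotected failure rate are all hypothetical.

```python
from typing import Dict, Set


def select_for_replication(task_fit: Dict[str, float], target_fit: float) -> Set[str]:
    """Greedy stand-in for a selective replication policy (not App_FIT).

    task_fit maps a task name to its estimated failure rate (assumed known).
    Tasks are replicated, highest rate first, until the summed rate of the
    remaining unreplicated tasks is at or below target_fit.
    """
    replicated: Set[str] = set()
    unprotected = sum(task_fit.values())
    by_rate = sorted(task_fit.items(), key=lambda kv: kv[1], reverse=True)
    for name, fit in by_rate:
        if unprotected <= target_fit:
            break  # target already met; no further replication needed
        replicated.add(name)
        # Assumption: a replicated task no longer counts toward the total.
        unprotected -= fit
    return replicated
```

With four tasks whose estimated rates are 50, 30, 15, and 5 and a target of 25, this sketch replicates only the two highest-rate tasks, which illustrates why full replication can be unnecessary for a given target.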
Measurement-free topological protection using dissipative feedback
Protecting quantum information from decoherence due to environmental noise is
vital for fault-tolerant quantum computation. To this end, standard quantum
error correction employs parallel projective measurements of individual
particles, which makes the system extremely complicated. Here we propose
measurement-free topological protection in two dimensions without any selective
addressing of individual particles. We make use of engineered dissipative
dynamics and feedback operations to reduce the entropy generated by decoherence
in such a way that quantum information is topologically protected. We calculate
an error threshold, below which quantum information is protected, without
assuming selective addressing, projective measurements, or instantaneous
classical processing. All physical operations are local and translationally
invariant, and no parallel projective measurement is required, which implies
high scalability. Furthermore, since the engineered dissipative dynamics we
utilized has been well studied in quantum simulation, the proposed scheme can
be a promising route progressing from quantum simulation to fault-tolerant
quantum information processing.
Comment: 17 pages, 6 figures