6 research outputs found

Runtime asynchronous fault tolerance via speculation

Author: August David I
Ghosh Soumyadeep
Huang Jialu
Lee Jae W
Mahlke Scott A
Zhang Yun
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2012

Transient faults are emerging as a critical reliability concern in modern microprocessors. Redundant hardware solutions are commonly deployed to detect transient faults, but they are less flexible and less cost-effective than software solutions. However, software solutions are rendered impractical because of high performance overheads. To address this problem, this paper presents Runtime Asynchronous Fault Tolerance via Speculation (RAFT), the fastest transient fault detection technique known to date. Serving as a layer between the application and the underlying platform, RAFT automatically generates two symmetric program instances from a program binary. It detects transient faults in a non-invasive way and exploits high-confidence value speculation to achieve low runtime overhead. Evaluation on a commodity multicore system demonstrates that RAFT delivers a geomean performance overhead of 2.83% on a set of 30 SPEC CPU benchmarks and STAMP benchmarks. Compared with existing transient fault detection techniques, RAFT exhibits the best performance and fault coverage, without requiring any change to the hardware or the software applications.

Princeton University Open Access Repository

Crossref

Deviation-tolerant computation in concurrent failure-prone hardware

Author: Marculescu D.
Stanley-Marbell P.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2008

Repository TU/e

Pure OAI Repository

Quantitative Robustness Analysis of Quantum Programs (Extended Version)

Author: Hicks Michael
Hietala Kesha
Hung Shih-Han
Wu Xiaodi
Ying Mingsheng
Zhu Shaopeng
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018

Quantum computation is a topic of significant recent interest, with practical advances coming from both research and industry. A major challenge in quantum programming is dealing with errors (quantum noise) during execution. Because quantum resources (e.g., qubits) are scarce, classical error correction techniques applied at the level of the architecture are currently cost-prohibitive. But while this reality means that quantum programs are almost certain to have errors, there as yet exists no principled means to reason about erroneous behavior. This paper attempts to fill this gap by developing a semantics for erroneous quantum while-programs, as well as a logic for reasoning about them. This logic permits proving a property we have identified, called $\epsilon$-robustness, which characterizes possible "distance" between an ideal program and an erroneous one. We have proved the logic sound, and showed its utility on several case studies, notably: (1) analyzing the robustness of noisy versions of the quantum Bernoulli factory (QBF) and quantum walk (QW); (2) demonstrating the (in)effectiveness of different error correction schemes on single-qubit errors; and (3) analyzing the robustness of a fault-tolerant version of QBF.

Comment: 34 pages, LaTeX; v2: fixed typo

arXiv.org e-Print Archive

OPUS - University of Technology Sydney

Static typing for a faulty lambda calculus

Author: David I. August
David Walker
George A. Reis
Jay Ligatti
Lester Mackey
Publication venue: ACM Press
Publication date: 01/01/2006

A transient hardware fault occurs when an energetic particle strikes a transistor, causing it to change state. These faults do not cause permanent damage, but may result in incorrect program execution by altering signal transfers or stored values. While the likelihood that such transient faults will cause any significant damage may seem remote, over the last several years transient faults have caused costly failures in high-end machines at America Online, eBay, and the Los Alamos Neutron Science Center, among others [6, 44, 15]. Because susceptibility to transient faults is proportional to the size and density of transistors, the problem of transient faults will become increasingly important in the coming decades. This paper defines the first formal, type-theoretic framework for studying reliable computation in the presence of transient faults. More specifically, it defines λzap, a lambda calculus that exhibits intermittent data faults. In order to detect and recover from these faults, λzap programs replicate intermediate computations and use majority voting, thereby modeling software-based fault tolerance techniques studied extensively, but informally [10, 20, 30, 31, 32

CiteSeerX

Crossref

Static typing for a faulty lambda calculus

Author: Baumann R. C.
David I. August
David Walker
George A. Reis
Holm J. G.
Jay Ligatti
Lester Mackey
Morrisett G.
Oh N.
Rotenberg E.
Shao Z.
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref