Runtime asynchronous fault tolerance via speculation
Transient faults are emerging as a critical reliability concern in modern microprocessors. Redundant hardware solutions are commonly deployed to detect transient faults, but they are less flexible and less cost-effective than software solutions. However, software solutions are rendered impractical because of high performance overheads. To address this problem, this paper presents Runtime Asynchronous Fault Tolerance via Speculation (RAFT), the fastest transient fault detection technique known to date. Serving as a layer between the application and the underlying platform, RAFT automatically generates two symmetric program instances from a program binary. It detects transient faults in a non-invasive way and exploits high-confidence value speculation to achieve low runtime overhead. Evaluation on a commodity multicore system demonstrates that RAFT delivers a geomean performance overhead of 2.83% on a set of 30 SPEC CPU benchmarks and STAMP benchmarks. Compared with existing transient fault detection techniques, RAFT exhibits the best performance and fault coverage, without requiring any change to the hardware or the software applications.
Quantitative Robustness Analysis of Quantum Programs (Extended Version)
Quantum computation is a topic of significant recent interest, with practical
advances coming from both research and industry. A major challenge in quantum
programming is dealing with errors (quantum noise) during execution. Because
quantum resources (e.g., qubits) are scarce, classical error correction
techniques applied at the level of the architecture are currently
cost-prohibitive. But while this reality means that quantum programs are almost
certain to have errors, there as yet exists no principled means to reason about
erroneous behavior. This paper attempts to fill this gap by developing a
semantics for erroneous quantum while-programs, as well as a logic for
reasoning about them. This logic permits proving a property we have identified,
called ε-robustness, which characterizes possible "distance" between
an ideal program and an erroneous one. We have proved the logic sound, and
showed its utility on several case studies, notably: (1) analyzing the
robustness of noisy versions of the quantum Bernoulli factory (QBF) and quantum
walk (QW); (2) demonstrating the (in)effectiveness of different error
correction schemes on single-qubit errors; and (3) analyzing the robustness of
a fault-tolerant version of QBF.
Comment: 34 pages, LaTeX; v2: fixed typo
Static typing for a faulty lambda calculus
A transient hardware fault occurs when an energetic particle strikes a transistor, causing it to change state. These faults do not cause permanent damage, but may result in incorrect program execution by altering signal transfers or stored values. While the likelihood that such transient faults will cause any significant damage may seem remote, over the last several years transient faults have caused costly failures in high-end machines at America Online, eBay, and the Los Alamos Neutron Science Center, among others [6, 44, 15]. Because susceptibility to transient faults is proportional to the size and density of transistors, the problem of transient faults will become increasingly important in the coming decades. This paper defines the first formal, type-theoretic framework for studying reliable computation in the presence of transient faults. More specifically, it defines λzap, a lambda calculus that exhibits intermittent data faults. In order to detect and recover from these faults, λzap programs replicate intermediate computations and use majority voting, thereby modeling software-based fault tolerance techniques studied extensively, but informally [10, 20, 30, 31, 32