Algorithmic fault tolerance using the Lanczos method
We consider the problem of algorithm-based fault tolerance and make two major contributions. First, we show how very general sequences of polynomials can be used to generate the checksums, so as to reduce the chance of numerical overflows. Second, we show how the Lanczos process can be applied in the error-location and correction steps, so as to save on the amount of work and to facilitate actual hardware implementation.
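The checksum idea this abstract builds on can be seen in the classic row/column-checksum scheme for matrix multiplication (Huang–Abraham ABFT), which the polynomial-based generation described above generalizes. A minimal sketch, with assumed example matrices, not the paper's own construction:

```python
def matmul(A, B):
    """Plain list-of-lists matrix product."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def with_checksum_row(A):
    """Append a row of column sums (the checksum row) to A."""
    return A + [[sum(row[j] for row in A) for j in range(len(A[0]))]]

def with_checksum_col(B):
    """Append a column of row sums (the checksum column) to B."""
    return [row + [sum(row)] for row in B]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

# The product of checksummed factors is itself fully checksummed.
Cc = matmul(with_checksum_row(A), with_checksum_col(B))
C = [row[:-1] for row in Cc[:-1]]

assert Cc[-1][:-1] == [sum(C[i][j] for i in range(2)) for j in range(2)]
assert [row[-1] for row in Cc[:-1]] == [sum(row) for row in C]

# A single corrupted entry is located by the intersection of the failing
# row and column checks, and corrected from the checksum discrepancy.
Cc[0][1] += 9
row_err = [sum(Cc[i][:-1]) - Cc[i][-1] for i in range(2)]
col_err = [sum(Cc[i][j] for i in range(2)) - Cc[-1][j] for j in range(2)]
i = next(k for k, e in enumerate(row_err) if e != 0)
j = next(k for k, e in enumerate(col_err) if e != 0)
Cc[i][j] -= row_err[i]
```

After correction, `Cc[0][1]` is restored to its true value of 22.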
Status and Future Perspectives for Lattice Gauge Theory Calculations to the Exascale and Beyond
In this and a set of companion whitepapers, the USQCD Collaboration lays out
a program of science and computing for lattice gauge theory. These whitepapers
describe how calculation using lattice QCD (and other gauge theories) can aid
the interpretation of ongoing and upcoming experiments in particle and nuclear
physics, as well as inspire new ones.
Comment: 44 pages. 1 of USQCD whitepapers
Reliable Linear, Sesquilinear and Bijective Operations On Integer Data Streams Via Numerical Entanglement
A new technique is proposed for fault-tolerant linear, sesquilinear and
bijective (LSB) operations on integer data streams, such as:
scaling, additions/subtractions, inner or outer vector products, permutations
and convolutions. In the proposed method, the input integer data streams
are linearly superimposed to form numerically-entangled integer data
streams that are stored in-place of the original inputs. A series of LSB
operations can then be performed directly using these entangled data streams.
The results are extracted from the entangled output streams by additions
and arithmetic shifts. Any soft errors affecting any single disentangled output
stream are guaranteed to be detectable via a specific post-computation
reliability check. In addition, when utilizing a separate processor core for
each of the streams, the proposed approach can recover all outputs after
any single fail-stop failure. Importantly, unlike algorithm-based fault
tolerance (ABFT) methods, the number of operations required for the
entanglement, extraction and validation of the results is linearly related to
the number of the inputs and does not depend on the complexity of the performed
LSB operations. We have validated our proposal in an Intel processor (Haswell
architecture with AVX2 support) via fast Fourier transforms, circular
convolutions, and matrix multiplication operations. Our analysis and
experiments reveal that the proposed approach incurs between to
reduction in processing throughput for a wide variety of LSB operations. This
overhead is 5 to 1000 times smaller than that of the equivalent ABFT method
that uses a checksum stream. Thus, our proposal can be used in fault-generating
processor hardware or safety-critical applications, where high reliability is
required without the cost of ABFT or modular redundancy.
Comment: to appear in IEEE Trans. on Signal Processing, 201
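A toy illustration of the in-place superposition and shift-based extraction described above (a two-stream sum/difference entanglement chosen for simplicity; the paper's actual construction differs and supports more streams):

```python
# Two integer streams are superimposed in place: e1 = a + b, e2 = a - b.
# Linear (LSB) operations commute with this superposition, so they can be
# applied directly to the entangled streams; the results are then
# disentangled with additions and a one-bit arithmetic shift.
a = [3, -1, 4, 1]
b = [5, 9, -2, 6]

e1 = [x + y for x, y in zip(a, b)]   # stored in place of a
e2 = [x - y for x, y in zip(a, b)]   # stored in place of b

def scale(stream, c):
    """An example LSB operation: scaling by an integer constant."""
    return [c * x for x in stream]

# Operate directly on the entangled streams...
f1, f2 = scale(e1, 7), scale(e2, 7)

# ...then extract: a' = (f1 + f2) >> 1, b' = (f1 - f2) >> 1.
# (f1 + f2 = 2 * 7a is always even, so the arithmetic shift is exact,
# including for negative values.)
a_out = [(x + y) >> 1 for x, y in zip(f1, f2)]
b_out = [(x - y) >> 1 for x, y in zip(f1, f2)]

assert a_out == scale(a, 7)
assert b_out == scale(b, 7)
```

Note the extraction cost here is one addition and one shift per output element, independent of the operation applied, mirroring the abstract's claim that overhead scales with the number of inputs rather than with the complexity of the LSB operation.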
Early Fault-Tolerant Quantum Computing
Over the past decade, research in quantum computing has tended to fall into
one of two camps: noisy intermediate-scale quantum (NISQ) and
fault-tolerant quantum computing (FTQC). Yet, a growing body of work has been
investigating how to use quantum computers in transition between these two
eras. This envisions operating with tens of thousands to millions of physical
qubits, able to support fault-tolerant protocols, though operating close to the
fault-tolerant threshold. Two challenges emerge from this picture: how to model
the performance of devices that are continually improving and how to design
algorithms to make the best use of these devices? In this work, we develop a
model for the performance of early fault-tolerant quantum computing (EFTQC)
architectures and use this model to elucidate the regimes in which algorithms
suited to such architectures are advantageous. As a concrete example, we show
that, for the canonical task of phase estimation, in a regime of moderate
scalability and using just over one million physical qubits, the "reach" of
the quantum computer can be extended (compared to the standard approach) from
90-qubit instances to over 130-qubit instances using a simple early
fault-tolerant quantum algorithm, which reduces the number of operations per
circuit by a factor of 100 and increases the number of circuit repetitions by a
factor of 10,000. This clarifies the role that such algorithms might play in
the era of limited-scalability quantum computing.Comment: 20 pages, 8 figures with desmos links, plus appendi