818 research outputs found
Parallel Tempering Simulation of the three-dimensional Edwards-Anderson Model with Compact Asynchronous Multispin Coding on GPU
Monte Carlo simulations of the Ising model play an important role in the
field of computational statistical physics, and they have revealed many
properties of the model over the past few decades. However, the effect of
frustration due to random disorder, in particular the possible spin glass
phase, remains a crucial but poorly understood problem. One of the obstacles in
the Monte Carlo simulation of random frustrated systems is their long
relaxation time, making an efficient parallel implementation on state-of-the-art
computation platforms highly desirable. The Graphics Processing Unit (GPU) is
such a platform that provides an opportunity to significantly enhance the
computational performance and thus gain new insight into this problem. In this
paper, we present optimization and tuning approaches for the CUDA
implementation of the spin glass simulation on GPUs. We discuss the integration
of various design alternatives, such as GPU kernel construction with minimal
communication, memory tiling, and look-up tables. We present a binary data
format, Compact Asynchronous Multispin Coding (CAMSC), which provides an
additional speedup compared with the traditionally used Asynchronous
Multispin Coding (AMSC). Our overall design sustains a performance of 33.5
picoseconds per spin flip attempt for simulating the three-dimensional
Edwards-Anderson model with parallel tempering, which significantly improves
the performance over existing GPU implementations.
Comment: 15 pages, 18 figures
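The replica-exchange step of parallel tempering mentioned in this abstract can be sketched as follows. This is a minimal CPU illustration of the standard Metropolis swap rule, not the paper's CUDA or CAMSC implementation; the function names, the `replica_at` bookkeeping list, and the use of Python's `random.Random` are assumptions made for the example.

```python
import math
import random


def swap_probability(beta_a, beta_b, energy_a, energy_b):
    """Standard acceptance probability for exchanging two replicas.

    The replica at inverse temperature beta_a has energy energy_a,
    and likewise for b: p = min(1, exp((beta_a - beta_b) * (E_a - E_b))).
    """
    delta = (beta_a - beta_b) * (energy_a - energy_b)
    return 1.0 if delta >= 0 else math.exp(delta)


def exchange_sweep(betas, energies, replica_at, rng):
    """Attempt swaps between neighbouring temperatures, in place.

    replica_at[k] is the index of the replica currently simulated at
    betas[k] (hypothetical bookkeeping chosen for this sketch).
    """
    for k in range(len(betas) - 1):
        a, b = replica_at[k], replica_at[k + 1]
        p = swap_probability(betas[k], betas[k + 1], energies[a], energies[b])
        if rng.random() < p:
            # Exchange temperatures by swapping the replica labels.
            replica_at[k], replica_at[k + 1] = b, a


if __name__ == "__main__":
    rng = random.Random(0)
    betas = [2.0, 1.5, 1.0]          # coldest first (assumed ordering)
    energies = [-10.0, -14.0, -12.0]  # made-up energies, one per replica
    replica_at = [0, 1, 2]
    exchange_sweep(betas, energies, replica_at, rng)
    print(replica_at)
```

Swapping labels rather than spin configurations keeps the exchange cheap, which matters when the configurations themselves live in GPU memory.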
Hierarchical fractional-step approximations and parallel kinetic Monte Carlo algorithms
We present a mathematical framework for constructing and analyzing parallel
algorithms for lattice Kinetic Monte Carlo (KMC) simulations. The resulting
algorithms have the capacity to simulate a wide range of spatio-temporal scales
in spatially distributed, non-equilibrium physicochemical processes with complex
chemistry and transport micro-mechanisms. The algorithms can be tailored to
specific hierarchical parallel architectures such as multi-core processors or
clusters of Graphical Processing Units (GPUs). The proposed parallel algorithms
are controlled-error approximations of kinetic Monte Carlo algorithms,
departing from the predominant paradigm of creating parallel KMC algorithms
with exactly the same master equation as the serial one.
Our methodology relies on a spatial decomposition of the Markov operator
underlying the KMC algorithm into a hierarchy of operators corresponding to the
processors' structure in the parallel architecture. Based on this operator
decomposition, we formulate Fractional Step Approximation schemes by employing
the Trotter Theorem and its random variants; these schemes (a) determine the
communication schedule between processors, and (b) are run independently on
each processor through a serial KMC simulation, called a kernel, on each
fractional step time-window.
Furthermore, the proposed mathematical framework allows us to rigorously
justify the numerical and statistical consistency of the proposed algorithms,
showing the convergence of our approximating schemes to the original serial
KMC. The approach also provides a systematic evaluation of different processor
communication schedules.
Comment: 34 pages, 9 figures
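The fractional-step idea in this abstract can be illustrated with the textbook operator-splitting formulas. The notation is an assumption made for this sketch and is not taken from the paper: $\mathcal{L}$ denotes the generator of the KMC Markov process, split as $\mathcal{L} = \mathcal{L}_1 + \mathcal{L}_2$ according to a spatial decomposition of the lattice, and $\Delta t$ is the fractional-step time window.

```latex
% Trotter product formula for the split generator L = L_1 + L_2
e^{t\mathcal{L}} = \lim_{n \to \infty}
  \left( e^{\frac{t}{n}\mathcal{L}_1}\, e^{\frac{t}{n}\mathcal{L}_2} \right)^{n}

% Lie splitting over one time window (local error O(\Delta t^2))
e^{\Delta t\,\mathcal{L}} \approx
  e^{\Delta t\,\mathcal{L}_1}\, e^{\Delta t\,\mathcal{L}_2}

% Strang splitting over one time window (local error O(\Delta t^3))
e^{\Delta t\,\mathcal{L}} \approx
  e^{\frac{\Delta t}{2}\mathcal{L}_1}\, e^{\Delta t\,\mathcal{L}_2}\,
  e^{\frac{\Delta t}{2}\mathcal{L}_1}
```

Each factor $e^{\Delta t\,\mathcal{L}_i}$ corresponds to running a serial KMC kernel on one part of the decomposition for the length of the window, and the order of the factors plays the role of a communication schedule.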
Demonstration of a scaling advantage for a quantum annealer over simulated annealing
The observation of an unequivocal quantum speedup remains an elusive
objective for quantum computing. The D-Wave quantum annealing processors have
been at the forefront of experimental attempts to address this goal, given
their relatively large numbers of qubits and programmability. A complete
determination of the optimal time-to-solution (TTS) using these processors has
not been possible to date, preventing definitive conclusions about the presence
of a scaling advantage. The main technical obstacle has been the inability to
verify an optimal annealing time within the available range. Here we overcome
this obstacle and present a class of problem instances for which we observe an
optimal annealing time using a D-Wave 2000Q processor over a range spanning up
to more than 2000 qubits. This allows us to perform an optimal TTS
benchmarking analysis and perform a comparison to several classical algorithms,
including simulated annealing, spin-vector Monte Carlo, and discrete-time
simulated quantum annealing. We establish the first example of a scaling
advantage for an experimental quantum annealer over classical simulated
annealing: we find that the D-Wave device exhibits certifiably better scaling
than simulated annealing, with confidence, over the range of problem
sizes that we can test. However, we do not find evidence for a quantum speedup:
simulated quantum annealing exhibits the best scaling by a significant margin.
Our construction of instance classes with verifiably optimal annealing times
opens up the possibility of generating many new such classes, paving the way
for further definitive assessments of scaling advantages using current and
future quantum annealing devices.
Comment: 26 pages, 22 figures. v2: Updated benchmarking results with
additional analysis. v3: Updated to published version
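The time-to-solution (TTS) metric used in this abstract is commonly defined as the annealing time multiplied by the number of repetitions needed to see the solution at least once with a target probability. The sketch below uses that common definition. The 0.99 default target, the clamp to at least one run, and the function names are assumptions of this example, not details confirmed by the abstract.

```python
import math


def time_to_solution(anneal_time, success_prob, target=0.99):
    """TTS = anneal_time * ln(1 - target) / ln(1 - success_prob).

    success_prob is the single-run probability of finding the solution.
    The repetition count is clamped to at least one run (an assumption).
    """
    if not 0.0 < success_prob <= 1.0:
        raise ValueError("success_prob must be in (0, 1]")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    if success_prob == 1.0:
        return anneal_time  # a single run always succeeds
    repeats = math.log(1.0 - target) / math.log(1.0 - success_prob)
    return anneal_time * max(1.0, repeats)


def optimal_tts(measurements, target=0.99):
    """Return (anneal_time, tts) minimising TTS over measured pairs.

    measurements is an iterable of (anneal_time, success_prob) pairs.
    """
    scored = [(t, time_to_solution(t, p, target)) for t, p in measurements]
    return min(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    data = [(1.0, 0.02), (5.0, 0.20), (20.0, 0.45), (100.0, 0.70)]
    print(optimal_tts(data))
```

An optimal annealing time is only verified when the minimum lies strictly inside the tested range of annealing times; if it sits at the shortest available time, the true optimum may be shorter still, which is the obstacle the abstract describes.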