372 research outputs found
Parallel Tempering Simulation of the three-dimensional Edwards-Anderson Model with Compact Asynchronous Multispin Coding on GPU
Monte Carlo simulations of the Ising model play an important role in the
field of computational statistical physics, and they have revealed many
properties of the model over the past few decades. However, the effect of
frustration due to random disorder, in particular the possible spin glass
phase, remains a crucial but poorly understood problem. One of the obstacles in
the Monte Carlo simulation of random frustrated systems is their long
relaxation time making an efficient parallel implementation on state-of-the-art
computation platforms highly desirable. The Graphics Processing Unit (GPU) is
such a platform that provides an opportunity to significantly enhance the
computational performance and thus gain new insight into this problem. In this
paper, we present optimization and tuning approaches for the CUDA
implementation of the spin glass simulation on GPUs. We discuss the integration
of various design alternatives, such as GPU kernel construction with minimal
communication, memory tiling, and look-up tables. We present a binary data
format, Compact Asynchronous Multispin Coding (CAMSC), which provides an
additional speedup compared with the traditionally used Asynchronous
Multispin Coding (AMSC). Our overall design sustains a performance of 33.5
picoseconds per spin flip attempt for simulating the three-dimensional
Edwards-Anderson model with parallel tempering, which significantly improves
the performance over existing GPU implementations.Comment: 15 pages, 18 figure
JANUS: an FPGA-based System for High Performance Scientific Computing
This paper describes JANUS, a modular massively parallel and reconfigurable
FPGA-based computing system. Each JANUS module has a computational core and a
host. The computational core is a 4x4 array of FPGA-based processing elements
with nearest-neighbor data links. Processors are also directly connected to an
I/O node attached to the JANUS host, a conventional PC. JANUS is tailored for,
but not limited to, the requirements of a class of hard scientific applications
characterized by regular code structure, unconventional data manipulation
instructions and not too large data-base size. We discuss the architecture of
this configurable machine, and focus on its use on Monte Carlo simulations of
statistical mechanics. On this class of application JANUS achieves impressive
performances: in some cases one JANUS processing element outperfoms high-end
PCs by a factor ~ 1000. We also discuss the role of JANUS on other classes of
scientific applications.Comment: 11 pages, 6 figures. Improved version, largely rewritten, submitted
to Computing in Science & Engineerin
Simulating spin systems on IANUS, an FPGA-based computer
We describe the hardwired implementation of algorithms for Monte Carlo
simulations of a large class of spin models. We have implemented these
algorithms as VHDL codes and we have mapped them onto a dedicated processor
based on a large FPGA device. The measured performance on one such processor is
comparable to O(100) carefully programmed high-end PCs: it turns out to be even
better for some selected spin models. We describe here codes that we are
currently executing on the IANUS massively parallel FPGA-based system.Comment: 19 pages, 8 figures; submitted to Computer Physics Communication
Janus II: a new generation application-driven computer for spin-system simulations
This paper describes the architecture, the development and the implementation
of Janus II, a new generation application-driven number cruncher optimized for
Monte Carlo simulations of spin systems (mainly spin glasses). This domain of
computational physics is a recognized grand challenge of high-performance
computing: the resources necessary to study in detail theoretical models that
can make contact with experimental data are by far beyond those available using
commodity computer systems. On the other hand, several specific features of the
associated algorithms suggest that unconventional computer architectures, which
can be implemented with available electronics technologies, may lead to order
of magnitude increases in performance, reducing to acceptable values on human
scales the time needed to carry out simulation campaigns that would take
centuries on commercially available machines. Janus II is one such machine,
recently developed and commissioned, that builds upon and improves on the
successful JANUS machine, which has been used for physics since 2008 and is
still in operation today. This paper describes in detail the motivations behind
the project, the computational requirements, the architecture and the
implementation of this new machine and compares its expected performances with
those of currently available commercial systems.Comment: 28 pages, 6 figure
Ianus: an Adpative FPGA Computer
Dedicated machines designed for specific computational algorithms can
outperform conventional computers by several orders of magnitude. In this note
we describe {\it Ianus}, a new generation FPGA based machine and its basic
features: hardware integration and wide reprogrammability. Our goal is to build
a machine that can fully exploit the performance potential of new generation
FPGA devices. We also plan a software platform which simplifies its
programming, in order to extend its intended range of application to a wide
class of interesting and computationally demanding problems. The decision to
develop a dedicated processor is a complex one, involving careful assessment of
its performance lead, during its expected lifetime, over traditional computers,
taking into account their performance increase, as predicted by Moore's law. We
discuss this point in detail
Critical properties of the four-state Commutative Random Permutation Glassy Potts model in three and four dimensions
We investigate the critical properties of the four-state commutative random
permutation glassy Potts model in three and four dimensions by means of Monte
Carlo simulation and of a finite size scaling analysis. Thanks to the use of a
field programmable gate array we have been able to thermalize a large number of
samples of systems with large volume. This has allowed us to observe a
spin-glass ordered phase in d=4 and to study the critical properties of the
transition. In d=3, our results are consistent with the presence of a
Kosterlitz-Thouless transition, but we cannot exclude transient effects due to
a value of the lower critical dimension slightly below 3.Comment: 9 pages, 8 Postscript figure
Applications of a quantum random number generator to simulations in condense matter physics
We study the importance of the quality of random numbers in Monte Carlo simulations of 2D Ising systems. Simulations are carried out at critical temperature to find the dynamic scaling law of the linear relaxation time. Our aim is to show that statistical correlations that appear in large Ising simulations performed with pseudorandom numbers can be corrected using a quantum random number generator (QRNG). To achieve high speeds and large systems, Ising lattices are simulated on a field programmable gate array (FPGA) with an optical QRNG
A Parallel Hardware Architecture For Quantum Annealing Algorithm Acceleration
Quantum Annealing (QA) is an emerging technique, derived from Simulated Annealing, providing metaheuristics for multivariable optimisation problems. Studies have shown that it can be applied to solve NP-hard problems with faster convergence and better quality of result than other traditional heuristics, with potential applications in a variety of fields, from transport logistics to circuit synthesis and optimisation. In this paper, we present a hardware architecture implementing a QA-based solver for the Multidimensional Knapsack Problem, designed to improve the performance of the algorithm by exploiting parallelised computation. We synthesised the architecture using as a target an Altera FPGA board and simulated the execution for solving a set of benchmarks available in the literature. Simulation results show that the proposed implementation is about 100 times faster than a single-thread general-purpose CPU without impact on the accuracy of the solution
- …