A scalable parallel Monte Carlo algorithm for atomistic simulations of precipitation in alloys
We present an extension of the semi-grandcanonical (SGC) ensemble that we
refer to as the variance-constrained semi-grandcanonical (VC-SGC) ensemble. It
allows for transmutation Monte Carlo simulations of multicomponent systems in
multiphase regions of the phase diagram and lends itself to scalable
simulations on massively parallel platforms. By combining transmutation moves
with molecular dynamics steps, structural relaxations and thermal vibrations in
realistic alloys can be taken into account. In this way, we construct a robust
and efficient simulation technique that is ideally suited for large-scale
simulations of precipitation in multicomponent systems in the presence of
structural disorder. To illustrate the algorithm introduced in this work, we
study the precipitation of Cu in nanocrystalline Fe.
Comment: 12 pages, 10 figures
QCD simulations with staggered fermions on GPUs
We report on our implementation of the RHMC algorithm for the simulation of
lattice QCD with two staggered flavors on Graphics Processing Units, using the
NVIDIA CUDA programming language. The main feature of our code is that the GPU
is not used just as an accelerator, but instead the whole Molecular Dynamics
trajectory is performed on it. After pointing out the main bottlenecks and how
to circumvent them, we discuss the performance obtained. We present some
preliminary results regarding OpenCL and multi-GPU extensions of our code and
discuss future perspectives.
Comment: 22 pages, 14 eps figures, final version to be published in Computer Physics Communications
SKIRT: hybrid parallelization of radiative transfer simulations
We describe the design, implementation and performance of the new hybrid
parallelization scheme in our Monte Carlo radiative transfer code SKIRT, which
has been used extensively for modeling the continuum radiation of dusty
astrophysical systems including late-type galaxies and dusty tori. The hybrid
scheme combines distributed memory parallelization, using the standard Message
Passing Interface (MPI) to communicate between processes, and shared memory
parallelization, providing multiple execution threads within each process to
avoid duplication of data structures. The synchronization between multiple
threads is accomplished through atomic operations without high-level locking
(also called lock-free programming). This improves the scaling behavior of the
code and substantially simplifies the implementation of the hybrid scheme. The
result is an extremely flexible solution that adjusts to the number of
available nodes, processors and memory, and consequently performs well on a
wide variety of computing architectures.
Comment: 21 pages, 20 figures