FISH: A 3D parallel MHD code for astrophysical applications
FISH is a fast and simple ideal magneto-hydrodynamics code that scales to
~10,000 processes for a Cartesian computational domain of ~1000^3 cells. The
simplicity of FISH has been achieved by the rigorous application of the
operator splitting technique, while second order accuracy is maintained by the
symmetric ordering of the operators. Between directional sweeps, the
three-dimensional data is rotated in memory so that the sweep is always
performed in a cache-efficient way along the direction of contiguous memory.
Hence, the code only requires a one-dimensional description of the conservation
equations to be solved. This approach also enables an elegant novel
parallelisation of the code that is based on persistent communications with MPI
for cubic domain decomposition on machines with distributed memory. This scheme
is then combined with an additional OpenMP parallelisation of different sweeps
that can take advantage of clusters of shared memory. We document the detailed
implementation of a second order TVD advection scheme based on flux
reconstruction. The magnetic fields are evolved by a constrained transport
scheme. We show that the subtraction of a simple estimate of the hydrostatic
gradient from the total gradients can significantly reduce the dissipation of
the advection scheme in simulations of gravitationally bound hydrostatic
objects. Through its simplicity and efficiency, FISH is as well-suited for
hydrodynamics classes as for large-scale astrophysical simulations on
high-performance computer clusters. In preparation for the release of a public
version, we demonstrate the performance of FISH in a suite of astrophysically
orientated test cases.
Comment: 27 pages, 11 figures
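The sweep-and-rotate idea can be illustrated with a minimal 2D sketch (this is an illustration of the technique, not the FISH implementation): a purely one-dimensional advection update is applied row by row, and the grid is transposed between sweeps so that each direction is always traversed along contiguous memory. The symmetric X-Y-X (Strang) ordering of the operators keeps the splitting second order in time. The first-order upwind flux and all function names here are illustrative choices.

```python
def sweep_1d(row, v, dt, dx):
    """First-order upwind advection of one contiguous row (assumes v > 0,
    periodic boundaries via Python's negative indexing)."""
    n = len(row)
    c = v * dt / dx
    return [row[i] - c * (row[i] - row[i - 1]) for i in range(n)]

def rotate(grid):
    """Transpose the data in memory so the other axis becomes contiguous."""
    return [list(col) for col in zip(*grid)]

def step_symmetric(grid, v, dt, dx):
    """One time step with symmetric (Strang) ordering of the 1D sweeps:
    X half step, rotate, Y full step, rotate back, X half step."""
    grid = [sweep_1d(r, v, dt / 2, dx) for r in grid]  # X sweep, half step
    grid = rotate(grid)
    grid = [sweep_1d(r, v, dt, dx) for r in grid]      # Y sweep, full step
    grid = rotate(grid)
    grid = [sweep_1d(r, v, dt / 2, dx) for r in grid]  # X sweep, half step
    return grid
```

Because each sweep only ever touches one contiguous row, only a one-dimensional update routine is needed, exactly as the abstract describes.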
Parallel Tempering Simulation of the three-dimensional Edwards-Anderson Model with Compact Asynchronous Multispin Coding on GPU
Monte Carlo simulations of the Ising model play an important role in the
field of computational statistical physics, and they have revealed many
properties of the model over the past few decades. However, the effect of
frustration due to random disorder, in particular the possible spin glass
phase, remains a crucial but poorly understood problem. One of the obstacles in
the Monte Carlo simulation of random frustrated systems is their long
relaxation time, making an efficient parallel implementation on state-of-the-art
computation platforms highly desirable. The Graphics Processing Unit (GPU) is
such a platform that provides an opportunity to significantly enhance the
computational performance and thus gain new insight into this problem. In this
paper, we present optimization and tuning approaches for the CUDA
implementation of the spin glass simulation on GPUs. We discuss the integration
of various design alternatives, such as GPU kernel construction with minimal
communication, memory tiling, and look-up tables. We present a binary data
format, Compact Asynchronous Multispin Coding (CAMSC), which provides an
additional speedup compared with the traditionally used Asynchronous
Multispin Coding (AMSC). Our overall design sustains a performance of 33.5
picoseconds per spin flip attempt for simulating the three-dimensional
Edwards-Anderson model with parallel tempering, which significantly improves
the performance over existing GPU implementations.
Comment: 15 pages, 18 figures
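The core trick behind multispin coding can be sketched briefly: each bit of a machine word holds one spin (here, the same site in many independent replicas), so a single XOR compares all spin pairs at once, and bitwise full adders accumulate the per-lane count of antiparallel neighbours into bit-planes. The sketch below is illustrative only; the paper's CAMSC format and CUDA kernels are considerably more involved.

```python
def add_bitplane(planes, x):
    """Ripple-carry add a 1-bit-per-lane operand x into a little-endian
    list of bit-planes (each plane is one bit of the per-lane sum)."""
    for i in range(len(planes)):
        carry = planes[i] & x
        planes[i] ^= x
        x = carry
    if x:
        planes.append(x)
    return planes

def antiparallel_count(center, neighbors):
    """For every bit lane, count how many neighbor spins differ from the
    center spin. One XOR per neighbor compares all lanes simultaneously."""
    planes = []
    for n in neighbors:
        planes = add_bitplane(planes, center ^ n)
    return planes  # bit-planes of the per-lane count, least significant first
```

A lane's energy change under a spin flip follows directly from this count, which is why the bit-plane representation feeds naturally into look-up-table-based acceptance tests.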
NLSEmagic: Nonlinear Schr\"odinger Equation Multidimensional Matlab-based GPU-accelerated Integrators using Compact High-order Schemes
We present a simple-to-use yet powerful code package called NLSEmagic to
numerically integrate the nonlinear Schr\"odinger equation in one, two, and
three dimensions. NLSEmagic is a high-order finite-difference code package
which utilizes graphic processing unit (GPU) parallel architectures. The codes
running on the GPU are many times faster than their serial counterparts, and
are much cheaper to run than on standard parallel clusters. The codes are
developed with usability and portability in mind, and therefore are written to
interface with MATLAB utilizing custom GPU-enabled C codes with the
MEX-compiler interface. The packages are freely distributed, including user
manuals and set-up files.
Comment: 37 pages, 13 figures
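The kind of update such an integrator performs can be sketched generically with a second-order central difference in space and classical RK4 in time for the 1D cubic NLSE, i ψ_t = -½ ψ_xx + s|ψ|² ψ, on a periodic grid. This is a plain illustration of a finite-difference NLSE step, not NLSEmagic's actual scheme, which uses higher-order compact differences and GPU kernels.

```python
def nlse_rhs(psi, dx, s=1.0):
    """Right-hand side of psi_t = i(0.5*psi_xx - s*|psi|^2*psi),
    with a second-order central difference and periodic boundaries."""
    n = len(psi)
    out = []
    for j in range(n):
        lap = (psi[(j + 1) % n] - 2 * psi[j] + psi[j - 1]) / dx ** 2
        out.append(1j * (0.5 * lap - s * abs(psi[j]) ** 2 * psi[j]))
    return out

def rk4_step(psi, dt, dx):
    """One classical fourth-order Runge-Kutta step."""
    def ax(v, c, w):
        return [vi + c * wi for vi, wi in zip(v, w)]
    k1 = nlse_rhs(psi, dx)
    k2 = nlse_rhs(ax(psi, dt / 2, k1), dx)
    k3 = nlse_rhs(ax(psi, dt / 2, k2), dx)
    k4 = nlse_rhs(ax(psi, dt, k3), dx)
    return [p + dt / 6 * (a + 2 * b + 2 * c + d)
            for p, a, b, c, d in zip(psi, k1, k2, k3, k4)]
```

On a GPU, the loop over grid points maps to one thread per point; a useful sanity check for any such integrator is that the L2 norm of ψ stays (approximately) conserved.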
An Efficient Cell List Implementation for Monte Carlo Simulation on GPUs
Maximizing the performance potential of the modern day GPU architecture
requires judicious utilization of available parallel resources. Although
dramatic runtime reductions can often be obtained through straightforward mappings,
further performance improvements often require algorithmic redesigns to more
closely exploit the target architecture. In this paper, we focus on efficient
molecular simulations for the GPU and propose a novel cell list algorithm that
better utilizes its parallel resources. Our goal is an efficient GPU
implementation of large-scale Monte Carlo simulations for the grand canonical
ensemble. This is a particularly challenging application because there is
inherently less computation and parallelism than in similar applications with
molecular dynamics. Consistent with the results of prior researchers, our
simulation results show traditional cell list implementations for Monte Carlo
simulations of molecular systems offer effectively no performance improvement
for small systems [5, 14], even when ported to the GPU. However, for larger
systems, the cell list implementation offers significant gains in performance.
Furthermore, our novel cell list approach results in better performance for all
problem sizes when compared with other GPU implementations with or without cell
lists.
Comment: 30 pages
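The basic cell-list idea can be sketched as follows (a generic 2D, periodic-box version for illustration; the paper's GPU-oriented algorithm differs): particles are binned into square cells no smaller than the interaction cutoff, so a neighbor search only has to inspect the 9 cells around a particle instead of all N particles.

```python
def build_cell_list(positions, box, cutoff):
    """Bin 2D positions into square cells of side >= cutoff (periodic box)."""
    ncell = max(1, int(box // cutoff))
    size = box / ncell
    cells = {}
    for idx, (x, y) in enumerate(positions):
        key = (int(x / size) % ncell, int(y / size) % ncell)
        cells.setdefault(key, []).append(idx)
    return cells, ncell

def neighbors_within(positions, box, cutoff, i):
    """Indices of particles within cutoff of particle i, checking only the
    3x3 block of cells around i's cell (minimum-image convention)."""
    cells, ncell = build_cell_list(positions, box, cutoff)
    size = box / ncell
    xi, yi = positions[i]
    cx, cy = int(xi / size) % ncell, int(yi / size) % ncell
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for j in cells.get(((cx + dx) % ncell, (cy + dy) % ncell), []):
                if j == i:
                    continue
                rx = (positions[j][0] - xi + box / 2) % box - box / 2
                ry = (positions[j][1] - yi + box / 2) % box - box / 2
                if rx * rx + ry * ry <= cutoff * cutoff:
                    found.append(j)
    return sorted(found)
```

For small systems the bookkeeping overhead of the cells outweighs the pruning, which matches the abstract's observation that traditional cell lists pay off only at larger system sizes.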
Simulating spin models on GPU
Over the last couple of years it has been realized that the vast
computational power of graphics processing units (GPUs) could be harvested for
purposes other than the video game industry. This power, which at least
nominally exceeds that of current CPUs by large factors, results from the
relative simplicity of the GPU architectures as compared to CPUs, combined with
a large number of parallel processing units on a single chip. To benefit from
this setup for general computing purposes, the problems at hand need to be
prepared in a way to profit from the inherent parallelism and hierarchical
structure of memory accesses. In this contribution I discuss the performance
potential for simulating spin models, such as the Ising model, on GPU as
compared to conventional simulations on CPU.
Comment: 5 pages, 4 figures, elsarticle
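The parallelism that makes spin models GPU-friendly comes from checkerboard decomposition: sites on one sublattice interact only with the other sublattice, so all sites of one colour can be updated simultaneously. The serial loops in this sketch stand in for one GPU thread per site; it is a textbook Metropolis update, not any specific GPU implementation from the paper.

```python
import math
import random

def checkerboard_sweep(spins, beta, rng):
    """One Metropolis sweep of a 2D Ising lattice (J = 1, periodic),
    updating the two checkerboard sublattices in turn. Within a sublattice
    the sites are independent, so each inner loop could run fully in
    parallel on a GPU."""
    n = len(spins)
    for parity in (0, 1):
        for x in range(n):
            for y in range(n):
                if (x + y) % 2 != parity:
                    continue
                nn = (spins[(x + 1) % n][y] + spins[(x - 1) % n][y]
                      + spins[x][(y + 1) % n] + spins[x][(y - 1) % n])
                dE = 2.0 * spins[x][y] * nn  # energy change of flipping
                if dE <= 0 or rng.random() < math.exp(-beta * dE):
                    spins[x][y] = -spins[x][y]
    return spins
```

The hierarchical memory structure mentioned above enters when the sublattice is tiled into shared-memory blocks, but the data dependence pattern is exactly the one shown here.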
StochKit-FF: Efficient Systems Biology on Multicore Architectures
The stochastic modelling of biological systems is an informative and, in some
cases, very adequate technique, which may however be more expensive than other
modelling approaches, such as differential equations. We
present StochKit-FF, a parallel version of StochKit, a reference toolkit for
stochastic simulations. StochKit-FF is based on the FastFlow programming
toolkit for multicores and exploits the novel concept of selective memory. We
evaluate StochKit-FF on a model of HIV infection dynamics, with the aim of
extracting information from efficiently run experiments, here in terms of
average and variance and, in the longer term, of more structured data.
Comment: 14 pages + cover page
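The simulation kernel underlying such toolkits is Gillespie's stochastic simulation algorithm, and the "keep only aggregates, not whole trajectories" idea can be sketched with a running (Welford-style) mean and variance over many independent runs. This is an illustrative reduction to one decay reaction; StochKit-FF's selective memory and FastFlow pipeline are more general.

```python
import random

def ssa_decay(x0, k, t_end, rng):
    """Gillespie SSA for the single reaction X -> 0 with rate constant k:
    draw exponential waiting times from the total propensity k*x."""
    x, t = x0, 0.0
    while x > 0:
        t += rng.expovariate(k * x)  # time to the next reaction event
        if t > t_end:
            break
        x -= 1
    return x

def mean_and_variance(samples):
    """Online aggregation: keeps only a running mean and second moment,
    never the full set of trajectories."""
    mean, m2, n = 0.0, 0.0, 0
    for s in samples:
        n += 1
        d = s - mean
        mean += d / n
        m2 += d * (s - mean)
    var = m2 / (n - 1) if n > 1 else 0.0
    return mean, var
```

Because each SSA run is independent, the runs parallelize trivially across cores, and only the small aggregate state needs to be merged at the end.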
The GENGA Code: Gravitational Encounters in N-body simulations with GPU Acceleration
We describe an open source GPU implementation of a hybrid symplectic N-body
integrator, GENGA (Gravitational ENcounters with Gpu Acceleration), designed to
integrate planet and planetesimal dynamics in the late stage of planet
formation and stability analyses of planetary systems. GENGA uses a hybrid
symplectic integrator to handle close encounters with very good energy
conservation, which is essential in long-term planetary system integration. We
extended the second order hybrid integration scheme to higher orders. The GENGA
code supports three simulation modes: Integration of up to 2048 massive bodies,
integration with up to a million test particles, or parallel integration of a
large number of individual planetary systems. We compare the results of GENGA
to Mercury and pkdgrav2 with respect to energy conservation and performance,
and find that the energy conservation of GENGA is comparable to that of
Mercury and around
two orders of magnitude better than pkdgrav2. GENGA runs up to 30 times faster
than Mercury and up to eight times faster than pkdgrav2. GENGA is written in
CUDA C and runs on all NVIDIA GPUs with compute capability of at least 2.0.
Comment: Accepted by ApJ. 18 pages, 17 figures, 4 tables
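Hybrid symplectic schemes of this kind build on the kick-drift-kick leapfrog, whose hallmark is a bounded (rather than secularly growing) energy error over long integrations. A minimal sketch for a test particle around a fixed central mass (illustrative only; GENGA's hybrid integrator switches integration methods during close encounters and runs on the GPU):

```python
import math

def kdk_step(pos, vel, dt, gm=1.0):
    """One kick-drift-kick leapfrog step for a test particle in the field
    of a central mass with GM = gm. Symplectic: energy error stays bounded."""
    def accel(p):
        r2 = p[0] * p[0] + p[1] * p[1]
        inv_r3 = 1.0 / (r2 * math.sqrt(r2))
        return (-gm * p[0] * inv_r3, -gm * p[1] * inv_r3)
    ax, ay = accel(pos)
    vel = (vel[0] + 0.5 * dt * ax, vel[1] + 0.5 * dt * ay)  # kick (half)
    pos = (pos[0] + dt * vel[0], pos[1] + dt * vel[1])      # drift (full)
    ax, ay = accel(pos)
    vel = (vel[0] + 0.5 * dt * ax, vel[1] + 0.5 * dt * ay)  # kick (half)
    return pos, vel

def energy(pos, vel, gm=1.0):
    """Specific orbital energy: kinetic plus potential."""
    r = math.hypot(pos[0], pos[1])
    return 0.5 * (vel[0] ** 2 + vel[1] ** 2) - gm / r
```

Monitoring this energy over many orbits is the standard test of symplectic behaviour, which is exactly the comparison the abstract reports against Mercury and pkdgrav2.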