Mixing multi-core CPUs and GPUs for scientific simulation software
Recent technological and economic developments have led to widespread availability of
multi-core CPUs and specialist accelerator processors such as graphical processing units
(GPUs). The accelerated computational performance possible from these devices can be very
high for some application paradigms. Software languages and systems such as NVIDIA's
CUDA and Khronos consortium's open compute language (OpenCL) support a number of
individual parallel application programming paradigms. To scale up the performance of some
complex systems simulations, a hybrid of multi-core CPUs for coarse-grained parallelism and
many-core GPUs for data parallelism is necessary. We describe our use of hybrid
applications using threading approaches and multi-core CPUs to control independent GPU devices.
We present speed-up data and discuss multi-threading software issues for the applications
level programmer and offer some suggested areas for language development and integration
between coarse-grained and fine-grained multi-thread systems. We discuss results from three
common simulation algorithmic areas including: partial differential equations; graph cluster
metric calculations and random number generation. We report on programming experiences
and selected performance for these algorithms on: single and multiple GPUs; multi-core CPUs;
a CellBE; and using OpenCL. We discuss programmer usability issues and the outlook and
trends in multi-core programming for scientific applications developers.
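To make the hybrid pattern concrete, here is a minimal sketch of the approach the abstract describes: one coarse-grained OpenMP thread per GPU, each driving its own fine-grained data-parallel kernel. The kernel body, problem size and launch configuration are illustrative assumptions, not the authors' code.

#include <omp.h>
#include <cuda_runtime.h>
#include <stdio.h>

/* Fine-grained data parallelism: a placeholder per-element update. */
__global__ void step_kernel(float *field, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) field[i] *= 0.5f;
}

int main(void) {
    int ngpu = 0;
    cudaGetDeviceCount(&ngpu);
    if (ngpu < 1) return 1;

    /* Coarse-grained parallelism: one OpenMP thread drives each GPU. */
    #pragma omp parallel num_threads(ngpu)
    {
        int dev = omp_get_thread_num();
        cudaSetDevice(dev);              /* bind this CPU thread to its GPU */

        const int n = 1 << 20;           /* illustrative problem size */
        float *d_field;
        cudaMalloc(&d_field, n * sizeof(float));
        step_kernel<<<(n + 255) / 256, 256>>>(d_field, n);
        cudaDeviceSynchronize();
        cudaFree(d_field);
    }
    printf("ran one host thread per GPU on %d device(s)\n", ngpu);
    return 0;
}

Binding one persistent host thread to each device is the pattern argued for above: the CPU side handles coarse-grained task decomposition while each GPU runs the data-parallel inner loop.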
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS
GROMACS is a widely used package for biomolecular simulation, and over the
last two decades it has evolved from small-scale efficiency to advanced
heterogeneous acceleration and multi-level parallelism targeting some of the
largest supercomputers in the world. Here, we describe some of the ways we have
been able to realize this through the use of parallelization on all levels,
combined with a constant focus on absolute performance. Release 4.6 of GROMACS
uses SIMD acceleration on a wide range of architectures, GPU offloading
acceleration, and both OpenMP and MPI parallelism within and between nodes,
respectively. The recent work on acceleration made it necessary to revisit the
fundamental algorithms of molecular simulation, including the concept of
neighbor searching, and we discuss the present and future challenges we see for
exascale simulation, in particular very fine-grained task parallelism. We
also discuss the software management, code peer review and continuous
integration testing required for a project of this complexity.
Comment: EASC 2014 conference proceedings.
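A minimal sketch of the MPI-between-nodes, OpenMP-within-node layering the abstract refers to; the reduction loop merely stands in for a force or energy computation and is purely illustrative, not GROMACS code.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* Each rank owns a slice of the iteration space (MPI between nodes);
       within the rank, OpenMP threads share the fine-grained loop. */
    double local = 0.0;
    #pragma omp parallel for reduction(+:local)
    for (int i = rank; i < 1000000; i += nranks)
        local += 1.0 / (1.0 + i);     /* stand-in for a force/energy term */

    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("sum = %f across %d ranks\n", total, nranks);
    MPI_Finalize();
    return 0;
}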
Hardware acceleration of reaction-diffusion systems: a guide to optimisation of pattern formation algorithms using OpenACC
Reaction-diffusion systems (RDS) have widespread applications in computational ecology, biology, computer graphics and the visual arts. For the scientific applications, a major barrier to the development of effective simulation models is their computational complexity: it takes a great deal of processing power to simulate enough replicates for reliable conclusions to be drawn. Optimizing the computation is thus highly desirable in order to obtain more results with fewer resources. Existing optimizations of RDS tend to be low-level and GPGPU-based. Here we apply the higher-level OpenACC framework to two case studies: a simple RDS to learn the ‘workings’ of OpenACC, and a more realistic and complex example. Our results show that simple parallelization directives and minimal data transfer can produce a useful performance improvement. The relative simplicity of porting OpenACC code between heterogeneous hardware is a key benefit to the scientific computing community in terms of both speed-up and portability.
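A minimal sketch of the kind of OpenACC port the paper discusses, assuming a simple explicit diffusion stencil on a square grid; the grid size, rate constant and data clauses are illustrative assumptions rather than the authors' case studies.

#define N 512

/* One explicit diffusion step over interior points; a single directive
   exposes the stencil sweep to the accelerator, and the data clauses
   manage host/device transfers. */
void rd_step(const float *u, float *unew, float du, float dt) {
    #pragma acc parallel loop collapse(2) copyin(u[0:N*N]) copyout(unew[0:N*N])
    for (int i = 1; i < N - 1; i++) {
        for (int j = 1; j < N - 1; j++) {
            float lap = u[(i-1)*N+j] + u[(i+1)*N+j]
                      + u[i*N+j-1]   + u[i*N+j+1] - 4.0f * u[i*N+j];
            unew[i*N+j] = u[i*N+j] + dt * du * lap;  /* diffusion term only */
        }
    }
}

A full reaction-diffusion model would add the reaction terms inside the loop body; the point of the sketch is that the parallelization itself is a single directive.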
A GPU-accelerated package for simulation of flow in nanoporous source rocks with many-body dissipative particle dynamics
Mesoscopic simulations of hydrocarbon flow in source shales are challenging,
in part due to the heterogeneous shale pores with sizes ranging from a few
nanometers to a few micrometers. Additionally, the sub-continuum fluid-fluid
and fluid-solid interactions in nano- to micro-scale shale pores, which are
physically and chemically sophisticated, must be captured. To address those
challenges, we present a GPU-accelerated package for simulation of flow in
nano- to micro-pore networks with a many-body dissipative particle dynamics
(mDPD) mesoscale model. Based on a fully distributed parallel paradigm, the
code offloads all intensive workloads to GPUs. Other advancements, such as
smart particle packing and a no-slip boundary condition in complex pore
geometries, are also implemented for the construction and simulation of
realistic shale pores from 3D nanometer-resolution stack images. Our code is
validated for accuracy and compared against the CPU counterpart for speedup. In
our benchmark tests, the code delivers nearly perfect strong scaling and weak
scaling (with up to 512 million particles) on up to 512 K20X GPUs on Oak Ridge
National Laboratory's (ORNL) Titan supercomputer. Moreover, a single-GPU
benchmark on ORNL's SummitDev and IBM's AC922 suggests that the host-to-device
NVLink can boost performance over PCIe by a remarkable 40%. Lastly, we
demonstrate, through a flow simulation in realistic shale pores, that the CPU
counterpart requires 840 Power9 cores to rival the performance delivered by our
package with four V100 GPUs on ORNL's Summit architecture. This simulation
package enables quick-turnaround and high-throughput mesoscopic numerical
simulations for investigating complex flow phenomena in nano- to micro-porous
rocks with realistic pore geometries.
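The NVLink-versus-PCIe observation can be checked with a simple host-to-device bandwidth micro-benchmark; the sketch below is an assumption-laden illustration (buffer size, repetition count, pinned-memory choice), not the paper's benchmark code.

#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    const size_t bytes = 1ull << 28;      /* 256 MiB transfer buffer */
    float *h, *d;
    cudaMallocHost(&h, bytes);            /* pinned memory for peak bandwidth */
    cudaMalloc(&d, bytes);

    cudaEvent_t t0, t1;
    cudaEventCreate(&t0); cudaEventCreate(&t1);
    cudaEventRecord(t0);
    for (int rep = 0; rep < 10; rep++)    /* repeat to amortize launch noise */
        cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);

    float ms = 0.f;
    cudaEventElapsedTime(&ms, t0, t1);
    printf("H2D bandwidth: %.1f GiB/s\n",
           10.0 * bytes / (ms * 1e-3) / (1ull << 30));
    cudaFreeHost(h); cudaFree(d);
    return 0;
}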
Sapporo2: A versatile direct N-body library
Astrophysical direct N-body methods were among the first production
algorithms to be implemented using NVIDIA's CUDA architecture. Now, almost
seven years later, the GPU is the most widely used accelerator device in
astronomy for simulating stellar systems. In this paper we present the
implementation of the Sapporo2 N-body library, which allows researchers to
use the GPU for N-body
simulations with little to no effort. The first version, released five years
ago, is actively used, but lacks advanced features and versatility in numerical
precision and support for higher order integrators. In this updated version we
have rebuilt the code from scratch and added support for OpenCL,
multi-precision and higher order integrators. We show how to tune these codes
for different GPU architectures and how to continue utilizing the GPU
optimally even when only a small number of particles is integrated.
This careful tuning allows Sapporo2 to be faster than Sapporo1 even with the
added options and double precision data loads. The code runs on a range of
NVIDIA and AMD GPUs in single and double precision accuracy. With the addition
of OpenCL support the library is also able to run on CPUs and other
accelerators that support OpenCL.
Comment: 15 pages, 7 figures. Accepted for publication in Computational Astrophysics and Cosmology.
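A minimal sketch of the classic direct-summation kernel such a library builds on, with Plummer softening; the data layout (mass stored in the w component of float4) and the single-precision choice are assumptions, and the production library adds blocking, shared-memory tiling and multi-precision variants on top of this.

/* O(N^2) direct-summation gravity: each thread accumulates the
   acceleration on one body from all others. */
__global__ void nbody_acc(const float4 *body, float3 *acc, int n, float eps2) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float3 a = make_float3(0.f, 0.f, 0.f);
    for (int j = 0; j < n; j++) {
        float dx = body[j].x - body[i].x;
        float dy = body[j].y - body[i].y;
        float dz = body[j].z - body[i].z;
        float r2 = dx*dx + dy*dy + dz*dz + eps2;  /* Plummer softening */
        float inv = rsqrtf(r2);
        float w = body[j].w * inv * inv * inv;    /* m_j / r^3 */
        a.x += w * dx; a.y += w * dy; a.z += w * dz;
    }
    acc[i] = a;
}

Libraries of this kind typically hide such kernels behind a host API so that integrators can swap precision and integration order without touching device code.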
A Study of Speed of the Boundary Element Method as applied to the Realtime Computational Simulation of Biological Organs
In this work, the possibility of simulating biological organs in realtime
using the Boundary Element Method (BEM) is investigated. Biological organs
are assumed to follow linear elastostatic material behavior, and the constant
boundary element is the element type used. First, a Graphics Processing Unit
(GPU) is used to speed up the BEM computations to achieve realtime
performance.
Next, instead of the GPU, a computer cluster is used. Results indicate that BEM
is fast enough to provide for realtime graphics if biological organs are
assumed to follow linear elastostatic material behavior. Although the present
work does not conduct any simulation using nonlinear material models, results
from using the linear elastostatic material model imply that it would be
difficult to obtain realtime performance if highly nonlinear material models
that properly characterize biological organs are used. Although the use of BEM
for the simulation of biological organs is not new, the results presented in
the present study are not found elsewhere in the literature.
Comment: preprint, draft, 2 tables, 47 references, 7 files. Codes that can
solve three dimensional linear elastostatic problems using constant boundary
elements (of triangular shape) while ignoring body forces are provided as
supplementary files; the codes are distributed under the MIT License in three
versions: i) MATLAB version, ii) Fortran 90 version (sequential code), and
iii) Fortran 90 version (parallel code).
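Part of why BEM maps well to GPUs is that assembling the dense influence matrix is embarrassingly parallel; the sketch below fills one coefficient per thread, with a placeholder integrand standing in for the elastostatic fundamental solution integrated over a constant triangular element. Everything here is an illustrative assumption, not the paper's supplementary code.

/* Placeholder integrand: a real BEM code evaluates the Kelvin fundamental
   solution over element 'col' at the collocation point of element 'row'. */
__device__ float influence(int row, int col) {
    return (row == col) ? 1.0f : 0.0f;
}

/* Dense influence-matrix assembly: one coefficient per thread. */
__global__ void assemble(float *A, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n)
        A[row * n + col] = influence(row, col);
}

Because the resulting system is dense, the subsequent solve is also a natural fit for GPU linear algebra libraries, which is consistent with the speed-ups the study reports.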