21,891 research outputs found
Mixing multi-core CPUs and GPUs for scientific simulation software
Recent technological and economic developments have led to widespread availability of
multi-core CPUs and specialist accelerator processors such as graphical processing units
(GPUs). The accelerated computational performance possible from these devices can be very
high for some application paradigms. Software languages and systems such as NVIDIA's
CUDA and the Khronos consortium's Open Computing Language (OpenCL) support a number of
individual parallel application programming paradigms. To scale up the performance of some
complex systems simulations, a hybrid of multi-core CPUs for coarse-grained parallelism and
very many core GPUs for data parallelism is necessary. We describe our use of hybrid
applications using threading approaches and multi-core CPUs to control independent GPU devices.
We present speed-up data and discuss multi-threading software issues for the applications
level programmer and offer some suggested areas for language development and integration
between coarse-grained and fine-grained multi-thread systems. We discuss results from three
common simulation algorithmic areas: partial differential equations, graph cluster
metric calculations, and random number generation. We report on programming experiences
and selected performance for these algorithms on single and multiple GPUs, multi-core CPUs,
a CellBE, and using OpenCL. We discuss programmer usability issues and the outlook and
trends in multi-core programming for scientific applications developers.
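The hybrid pattern described in this abstract, one CPU thread controlling each independent device, can be sketched as follows. This is a minimal illustration and not the authors' code: `simulated_device_kernel` and `run_on_devices` are hypothetical names, and the "device" work is an ordinary Python function standing in for a real GPU kernel launch.

```python
import threading


def simulated_device_kernel(device_id, chunk):
    """Hypothetical stand-in for a data-parallel kernel run on one device."""
    return sum(x * x for x in chunk)


def run_on_devices(data, n_devices):
    """Coarse-grained parallelism: one host thread per (simulated) device."""
    # Give each device an interleaved share of the data.
    chunks = [data[i::n_devices] for i in range(n_devices)]
    partial = [None] * n_devices

    def worker(device_id):
        # Each thread writes only to its own slot, so no lock is needed.
        partial[device_id] = simulated_device_kernel(device_id, chunks[device_id])

    threads = [threading.Thread(target=worker, args=(d,)) for d in range(n_devices)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Host-side reduction of the per-device partial results.
    return sum(partial)


if __name__ == "__main__":
    print(run_on_devices(list(range(10)), 3))
```

In a real hybrid code each worker would bind to its own device and launch kernels there; the host-side structure of start, join, and reduce would likely look similar.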
Harvesting graphics power for MD simulations
We discuss an implementation of molecular dynamics (MD) simulations on a
graphics processing unit (GPU) in the NVIDIA CUDA language. We tested our code
on a modern GPU, the NVIDIA GeForce 8800 GTX. Results for two MD algorithms
suitable for short-ranged and long-ranged interactions, and a congruential
shift random number generator are presented. The performance of the GPU is
compared to that of its main processor counterpart. We achieve speedups of up to 80,
40 and 150 fold, respectively. With the newest generation of GPUs one can run
standard MD simulations at 10^7 flops/$.
Comment: 12 pages, 5 figures. Submitted to Mol. Si
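As an illustration of what a short-ranged MD force evaluation involves, here is a minimal CPU sketch of an all-pairs force loop with a cutoff. The Lennard-Jones potential, the function name `lj_forces`, and the parameters `epsilon` and `sigma` are assumptions made for this example; the abstract does not state which potential or algorithm the authors used.

```python
def lj_forces(positions, cutoff, epsilon=1.0, sigma=1.0):
    """All-pairs Lennard-Jones forces with a radial cutoff (assumed potential).

    positions: list of [x, y, z] coordinates. Returns one force vector per particle.
    """
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    cutoff2 = cutoff * cutoff
    for i in range(n):
        for j in range(i + 1, n):
            d = [positions[i][k] - positions[j][k] for k in range(3)]
            r2 = sum(c * c for c in d)
            if r2 >= cutoff2:
                continue  # short-ranged: ignore pairs beyond the cutoff
            s6 = (sigma * sigma / r2) ** 3
            # F_i = 24*eps*(2*(sigma/r)^12 - (sigma/r)^6) / r^2 * (r_i - r_j)
            coef = 24.0 * epsilon * (2.0 * s6 * s6 - s6) / r2
            for k in range(3):
                forces[i][k] += coef * d[k]
                forces[j][k] -= coef * d[k]  # Newton's third law
    return forces


if __name__ == "__main__":
    print(lj_forces([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]], cutoff=2.5))
```

On a GPU the outer loop over `i` is the natural candidate for one-thread-per-particle parallelism, which is one plausible reason such loops map well to data-parallel hardware.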
Accelerating Monte Carlo simulations with an NVIDIA® graphics processor
Modern graphics cards, commonly used in desktop computers, have evolved beyond a simple interface between processor and display to incorporate sophisticated calculation engines that can be applied to general purpose computing. The Monte Carlo algorithm for modelling photon transport in turbid media has been implemented on an NVIDIA® 8800 GT graphics card using the CUDA toolkit. The Monte Carlo method relies on following the trajectories of millions of photons through the sample, often taking hours or days to complete. The graphics-processor implementation, processing roughly 110 million scattering events per second, was found to run more than 70 times faster than a similar, single-threaded implementation on a 2.67 GHz desktop computer.
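The photon-tracking loop that such a simulation repeats millions of times can be sketched on a CPU as below. This is a simplified illustration under stated assumptions, not the paper's implementation: a one-dimensional slab geometry, isotropic scattering, and termination of low-weight photons by depositing the remaining weight rather than by roulette. The function name `simulate_photons` and the parameters `mu_a`, `mu_s`, and `thickness` are chosen for this example.

```python
import math
import random


def simulate_photons(n_photons, mu_a, mu_s, thickness, seed=0):
    """Track photon packets through a slab; return weight fractions and event count."""
    rng = random.Random(seed)
    mu_t = mu_a + mu_s          # total interaction coefficient
    albedo = mu_s / mu_t        # fraction of weight surviving each interaction
    absorbed = reflected = transmitted = 0.0
    events = 0
    for _ in range(n_photons):
        z, uz, weight = 0.0, 1.0, 1.0   # depth, direction cosine, packet weight
        while True:
            # Free path length sampled from an exponential distribution.
            step = -math.log(1.0 - rng.random()) / mu_t
            z += uz * step
            if z < 0.0:
                reflected += weight
                break
            if z > thickness:
                transmitted += weight
                break
            events += 1
            absorbed += weight * (1.0 - albedo)
            weight *= albedo
            if weight < 1e-4:
                absorbed += weight      # simplification: deposit the remainder
                break
            # Isotropic scattering: new direction cosine uniform in [-1, 1].
            uz = 2.0 * rng.random() - 1.0
    return (absorbed / n_photons, reflected / n_photons,
            transmitted / n_photons, events)


if __name__ == "__main__":
    print(simulate_photons(2000, mu_a=0.1, mu_s=10.0, thickness=1.0))
```

Because each photon history is independent of the others, the outer loop is the part that a GPU implementation would plausibly distribute across threads.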
QCDGPU: open-source package for Monte Carlo lattice simulations on OpenCL-compatible multi-GPU systems
The multi-GPU open-source package QCDGPU for lattice Monte Carlo simulations
of pure SU(N) gluodynamics in an external magnetic field at finite temperature and
of the O(N) model is developed. The code is implemented in OpenCL, tested on AMD and
NVIDIA GPUs and on AMD and Intel CPUs, and may run on other OpenCL-compatible devices.
The package has minimal external library dependencies and is OS
platform-independent. It is optimized for heterogeneous computing through the
possibility of dividing the lattice into non-equivalent parts to hide the
difference in performance of the devices used. QCDGPU has a client-server part
for distributed simulations. The package is designed to produce lattice gauge
configurations as well as to analyze previously generated ones. QCDGPU may be
executed in fault-tolerant mode. The core of the Monte Carlo procedure is based on the PRNGCL
library for pseudo-random number generation on OpenCL-compatible devices,
which contains several of the most popular pseudo-random number generators.
Comment: Presented at the Third International Conference "High Performance
Computing" (HPC-UA 2013), Kyiv, Ukraine; 9 pages, 2 figures
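The idea of dividing a lattice into non-equivalent parts so that faster devices receive more work can be sketched as a proportional split. This is an illustrative assumption about how such a division might be computed, not QCDGPU's actual scheme; `split_lattice` and its `performance` argument (relative device throughputs) are hypothetical.

```python
def split_lattice(n_slices, performance):
    """Split n_slices lattice slices across devices in proportion to performance."""
    total = sum(performance)
    shares = [n_slices * p / total for p in performance]
    sizes = [int(s) for s in shares]          # round each share down
    leftover = n_slices - sum(sizes)
    # Hand the remaining slices to the devices with the largest fractional parts.
    order = sorted(range(len(shares)),
                   key=lambda i: shares[i] - sizes[i],
                   reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes


if __name__ == "__main__":
    # A device three times faster than its partner gets three times the slices.
    print(split_lattice(32, [3.0, 1.0]))
```

With well-chosen weights, each device should finish its part in roughly the same wall-clock time, which is what hides the performance difference.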
Pseudo-random number generators for Monte Carlo simulations on Graphics Processing Units
Basic uniform pseudo-random number generators are implemented on ATI Graphics
Processing Units (GPUs). The performance results of the implemented generators
(multiplicative linear congruential (GGL), XOR-shift (XOR128), RANECU, RANMAR,
RANLUX and Mersenne Twister (MT19937)) on CPU and GPU are discussed. The
speed-up factor obtained is in the hundreds in comparison with the CPU. The RANLUX
generator is found to be the most appropriate for use on GPUs in Monte Carlo
simulations. A brief review of the pseudo-random number generators used in
modern software packages for Monte Carlo simulations in high-energy physics is
presented.
Comment: 31 pages, 9 figures, 3 tables
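Two of the simpler generator families named above can be sketched in a few lines. The constants used here are assumptions drawn from the widely published forms of these generators (multiplier 16807 with modulus 2^31 - 1 for the multiplicative congruential generator, and Marsaglia's xorshift with shifts 11, 8, and 19 and his default seeds for XOR128); the paper's exact parameter choices are not given in the abstract.

```python
MASK32 = 0xFFFFFFFF


def ggl_stream(seed, count):
    """Multiplicative congruential generator: x <- 16807 * x mod (2^31 - 1)."""
    x = seed
    out = []
    for _ in range(count):
        x = (16807 * x) % 2147483647
        out.append(x)
    return out


def xor128_stream(count, x=123456789, y=362436069, z=521288629, w=88675123):
    """Marsaglia's 128-bit xorshift generator producing 32-bit outputs."""
    out = []
    for _ in range(count):
        t = (x ^ (x << 11)) & MASK32    # keep intermediate values to 32 bits
        x, y, z = y, z, w
        w = (w ^ (w >> 19) ^ t ^ (t >> 8)) & MASK32
        out.append(w)
    return out


if __name__ == "__main__":
    print(ggl_stream(1, 3))
    print(xor128_stream(3))
```

Both generators keep only a few words of state, so one independent stream per GPU thread is cheap; the larger state of generators such as MT19937 makes that trade-off less straightforward.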