Search CORE

1,053 research outputs found

Mixing multi-core CPUs and GPUs for scientific simulation software

Author: Hawick K.A.
Leist A.
Playne D.P.
Publication venue: 'Massey University'
Publication date: 01/01/2010
Field of study

Recent technological and economic developments have led to widespread availability of multi-core CPUs and specialist accelerator processors such as graphical processing units (GPUs). The accelerated computational performance possible from these devices can be very high for some applications paradigms. Software languages and systems such as NVIDIA's CUDA and Khronos consortium's open compute language (OpenCL) support a number of individual parallel application programming paradigms. To scale up the performance of some complex systems simulations, a hybrid of multi-core CPUs for coarse-grained parallelism and very many core GPUs for data parallelism is necessary. We describe our use of hybrid applica- tions using threading approaches and multi-core CPUs to control independent GPU devices. We present speed-up data and discuss multi-threading software issues for the applications level programmer and o er some suggested areas for language development and integration between coarse-grained and ne-grained multi-thread systems. We discuss results from three common simulation algorithmic areas including: partial di erential equations; graph cluster metric calculations and random number generation. We report on programming experiences and selected performance for these algorithms on: single and multiple GPUs; multi-core CPUs; a CellBE; and using OpenCL. We discuss programmer usability issues and the outlook and trends in multi-core programming for scienti c applications developers

Massey Research Online

Developing Efﬁcient Discrete Simulations on Multicore and GPU Architectures

Author: Cagigas Muñiz Daniel
Díaz del Río Fernando
Guisado Lízar José Luís
Jiménez-Morales Francisco de Paula
López-Torres Manuel Ramón
Publication venue: 'MDPI AG'
Publication date: 01/01/2020
Field of study

In this paper we show how to efﬁciently implement parallel discrete simulations on multicoreandGPUarchitecturesthrougharealexampleofanapplication: acellularautomatamodel of laser dynamics. We describe the techniques employed to build and optimize the implementations using OpenMP and CUDA frameworks. We have evaluated the performance on two different hardware platforms that represent different target market segments: high-end platforms for scientiﬁc computing, using an Intel Xeon Platinum 8259CL server with 48 cores, and also an NVIDIA Tesla V100GPU,bothrunningonAmazonWebServer(AWS)Cloud;and on a consumer-oriented platform, using an Intel Core i9 9900k CPU and an NVIDIA GeForce GTX 1050 TI GPU. Performance results were compared and analyzed in detail. We show that excellent performance and scalability can be obtained in both platforms, and we extract some important issues that imply a performance degradation for them. We also found that current multicore CPUs with large core numbers can bring a performance very near to that of GPUs, and even identical in some cases.Ministerio de Economía, Industria y Competitividad, Gobierno de España (MINECO), and the Agencia Estatal de Investigación (AEI) of Spain, coﬁnanced by FEDER funds (EU) TIN2017-89842

idUS. Depósito de Investigación Universidad de Sevilla

Accelerating Monte Carlo simulations with an NVIDIA® graphics processor

Author: Blaschke Johannes
Jordan Robert
Künnemeyer Rainer
Martinsen Paul
Publication venue: 'Elsevier BV'
Publication date: 01/01/2009
Field of study

Modern graphics cards, commonly used in desktop computers, have evolved beyond a simple interface between processor and display to incorporate sophisticated calculation engines that can be applied to general purpose computing. The Monte Carlo algorithm for modelling photon transport in turbid media has been implemented on an NVIDIA® 8800gt graphics card using the CUDA toolkit. The Monte Carlo method relies on following the trajectory of millions of photons through the sample, often taking hours or days to complete. The graphics-processor implementation, processing roughly 110 million scattering events per second, was found to run more than 70 times faster than a similar, single-threaded implementation on a 2.67 GHz desktop computer

Research Commons@Waikato

Development of a GPU-based Monte Carlo dose calculation code for coupled electron-photon transport

Author: Amitava Majumdar
Bielajew A F Hirayama H Nelson W R Rogers D W O
Briesmeister J F
Dongju Choi
Gu X
Gu X
Jacques R Taylor R Wong J McNutt T
Josep Sempau
Lensch H Strzodka R
Li J S
Li M
Ma C M
Men C
Nelson W R Hirayama H Rogers D W O
NVIDIA
Salvat F Fernández-Varea J M Baró J Sempau J
Salvat F Fernández-Varea J M Sempau J
Sempau J
Sharp G C
Steve B Jiang
Woodcock E Murphy T Hemmings P Longworth S
Xu F
Xuejun Gu
Xun Jia
Yan G R
Publication venue: 'IOP Publishing'
Publication date: 22/03/2010
Field of study

Monte Carlo simulation is the most accurate method for absorbed dose calculations in radiotherapy. Its efficiency still requires improvement for routine clinical applications, especially for online adaptive radiotherapy. In this paper, we report our recent development on a GPU-based Monte Carlo dose calculation code for coupled electron-photon transport. We have implemented the Dose Planning Method (DPM) Monte Carlo dose calculation package (Sempau et al, Phys. Med. Biol., 45(2000)2263-2291) on GPU architecture under CUDA platform. The implementation has been tested with respect to the original sequential DPM code on CPU in phantoms with water-lung-water or water-bone-water slab geometry. A 20 MeV mono-energetic electron point source or a 6 MV photon point source is used in our validation. The results demonstrate adequate accuracy of our GPU implementation for both electron and photon beams in radiotherapy energy range. Speed up factors of about 5.0 ~ 6.6 times have been observed, using an NVIDIA Tesla C1060 GPU card against a 2.27GHz Intel Xeon CPU processor.Comment: 13 pages, 3 figures, and 1 table. Paper revised. Figures update

arXiv.org e-Print Archive

Crossref

Pseudo-Random Streams for Distributed and Parallel Stochastic Simulations on GP-GPU

Author: C Mazel
D R C Hill
Entacher K
Hellekalek P
J Passerat-Palmbach
Kirk D
Langdon W
Langdon W
Matsumoto M
Saito M
Traore M
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012
Field of study

International audienceRandom number generation is a key element of stochastic simulations. It has been widely studied for sequential applications purposes, enabling us to reliably use pseudo-random numbers in this case. Unfortunately, we cannot be so enthusiastic when dealing with parallel stochastic simulations. Many applications still neglect random stream parallelization, leading to potentially biased results. In particular parallel execution platforms, such as Graphics Processing Units (GPUs), add their constraints to those of Pseudo-Random Number Generators (PRNGs) used in parallel. This results in a situation where potential biases can be combined with performance drops when parallelization of random streams has not been carried out rigorously. Here, we propose criteria guiding the design of good GPU-enabled PRNGs. We enhance our comments with a study of the techniques aiming to parallelize random streams correctly, in the context of GPU-enabled stochastic simulations

Crossref

HAL Clermont Université