3,845 research outputs found

Parallel Unsmoothed Aggregation Algebraic Multigrid Algorithms on GPUs

Authors: A Krechel, D Goddeke, G Haase, G Karypis, GE Blelloch, H Grossauer, H Sterck De, J Bolz, N Bell, O Axelsson, PS Vassilevski, R Courant, TV Kolev, VE Henson, W Joubert
Publication date: 11/02/2013

We design and implement a parallel algebraic multigrid method for isotropic graph Laplacian problems on multicore Graphical Processing Units (GPUs). The proposed AMG method is based on the aggregation framework. The setup phase of the algorithm uses a parallel maximal independent set algorithm in forming aggregates and the resulting coarse level hierarchy is then used in a K-cycle iteration solve phase with an $\ell^1$-Jacobi smoother. Numerical tests of a parallel implementation of the method for graphics processors are presented to demonstrate its effectiveness. Comment: 18 pages, 3 figures
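
The $\ell^1$-Jacobi smoother named in the abstract above can be illustrated with a minimal sketch. It assumes the common variant in which each row is scaled by its absolute row sum, d_i = sum_j |a_ij|; the abstract does not state the exact definition used. The function name `l1_jacobi` and the dense list-of-lists matrix are illustrative choices only, not the paper's GPU implementation.

```python
def l1_jacobi(A, b, x, sweeps=1):
    """Apply `sweeps` l1-Jacobi sweeps to A x = b, starting from x.

    Assumption: the scaling is d_i = sum_j |A[i][j]| (absolute row sum).
    A is a dense square matrix given as a list of rows.
    """
    n = len(A)
    # l1 scaling: one value per row, never smaller than |a_ii|
    d = [sum(abs(a) for a in row) for row in A]
    for _ in range(sweeps):
        # Residual r = b - A x, computed from the previous iterate only,
        # so every row can be updated independently (in parallel on a GPU)
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [x[i] + r[i] / d[i] for i in range(n)]
    return x


# Example: a small symmetric positive definite system with solution [1, 1, 1]
A = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
b = [1.0, 0.0, 1.0]
x = l1_jacobi(A, b, [0.0, 0.0, 0.0], sweeps=200)
```

Because each sweep reads only the previous iterate, the update has no data dependencies between rows, which is one reason Jacobi-type smoothers are a natural fit for GPUs.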

arXiv.org e-Print Archive

Crossref

Pseudo-random number generators for Monte Carlo simulations on Graphics Processing Units

Authors: Anderson, Anselmi, Babich, Block, Clark, Demchik, Di Pierro, Egri, James, Janke, Luscher, L'Ecuyer, Marsaglia, Matsumoto, Panneton, Park, Preis, Thomas, Vadim Demchik, Yin
Publication venue: 'Elsevier BV'
Publication date: 09/03/2010

Basic uniform pseudo-random number generators are implemented on ATI Graphics Processing Units (GPUs). The performance of the implemented generators (multiplicative linear congruential (GGL), XOR-shift (XOR128), RANECU, RANMAR, RANLUX and Mersenne Twister (MT19937)) on CPU and GPU is discussed. The GPU versions achieve a speed-up of hundreds of times over the CPU. The RANLUX generator is found to be the most appropriate for use on GPUs in Monte Carlo simulations. A brief review of the pseudo-random number generators used in modern software packages for Monte Carlo simulations in high-energy physics is presented. Comment: 31 pages, 9 figures, 3 tables
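
As an illustration of one of the generators listed above, here is a minimal sketch of a 128-bit XOR-shift generator. It assumes that XOR128 refers to Marsaglia's `xor128` with shift constants 11, 8 and 19 and his published default seeds; the abstract does not confirm these parameters. The class name `Xor128` is hypothetical, and this is plain CPU code, not the GPU implementation the paper describes.

```python
MASK32 = 0xFFFFFFFF  # keep every intermediate value in 32 bits


class Xor128:
    """Sketch of Marsaglia's xor128 generator (assumed parameters)."""

    def __init__(self, x=123456789, y=362436069, z=521288629, w=88675123):
        # Default seeds are the ones given in Marsaglia's xorshift paper
        self.x, self.y, self.z, self.w = x, y, z, w

    def next_u32(self):
        """Return the next 32-bit unsigned integer."""
        t = (self.x ^ (self.x << 11)) & MASK32
        # Rotate the four-word state
        self.x, self.y, self.z = self.y, self.z, self.w
        self.w = (self.w ^ (self.w >> 19)) ^ (t ^ (t >> 8))
        return self.w

    def next_float(self):
        """Return a uniform float in [0, 1)."""
        return self.next_u32() / 4294967296.0


rng = Xor128()
samples = [rng.next_float() for _ in range(5)]
```

The state is only four 32-bit words and the update uses nothing but shifts and XORs, which keeps per-thread storage small when many independent streams run side by side.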

arXiv.org e-Print Archive

CiteSeerX

Crossref

SAPPORO: A way to turn your graphics cards into a GRAPE-6

Authors: Aarseth, Anderson, Belleman, Dorband, Evghenii Gaburov, Fernando, Ford, Gualandris, Harfst, Heggie, Makino, Nitadori, Plummer, Portegies Zwart, Simon Portegies Zwart, Stefan Harfst, Sussman, van Meel
Publication venue: 'Elsevier BV'
Publication date: 01/01/2009

We present Sapporo, a library for performing high-precision gravitational N-body simulations on NVIDIA Graphical Processing Units (GPUs). Our library mimics the GRAPE-6 library, and N-body codes currently running on GRAPE-6 can switch to Sapporo by a simple relinking of the library. The precision of our library is comparable to that of GRAPE-6, even though internally the GPU hardware is limited to single-precision arithmetic. This limitation is effectively overcome by emulating double precision when calculating the distance between particles. The performance loss of this operation is small (< 20%) compared to the advantage of being able to run at high precision. We tested the library using several GRAPE-6-enabled N-body codes, in particular Starlab and phiGRAPE. We measured a peak performance of 800 Gflop/s when running with 10^6 particles on a PC with four commercial G92 architecture GPUs (two GeForce 9800GX2). As a production test, we simulated a 32k Plummer model with equal-mass stars well beyond core collapse. The simulation took 41 days, during which the mean performance was 113 Gflop/s. The GPU did not show any problems from running in a production environment for such an extended period of time. Comment: 13 pages, 9 figures, accepted to New Astronomy
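
The idea of emulating double precision for particle separations, mentioned in the abstract above, can be sketched as follows. The sketch assumes a "double-single" representation in which each coordinate is stored as a pair of single-precision numbers (a leading part and a small correction); the abstract does not describe Sapporo's actual scheme. The helper names `f32`, `split`, `ds_diff` and `naive_diff` are hypothetical, and single precision is simulated on the CPU by rounding through the `struct` module.

```python
import struct


def f32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def split(x):
    """Store x as (hi, lo): hi is x rounded to single, lo is the remainder."""
    hi = f32(x)
    lo = f32(x - hi)
    return hi, lo


def ds_diff(a, b):
    """Difference a - b of two (hi, lo) pairs using single-precision steps.

    The leading parts of nearby positions cancel almost exactly, so the
    small corrections carry the separation.
    """
    return f32(f32(a[0] - b[0]) + f32(a[1] - b[1]))


def naive_diff(xa, xb):
    """Difference computed after rounding both inputs to single precision."""
    return f32(f32(xa) - f32(xb))


# Two particles whose separation is far below single-precision resolution
xi, xj = 1.0, 1.0 + 1e-9
separation_naive = naive_diff(xj, xi)            # separation is lost
separation_ds = ds_diff(split(xj), split(xi))    # separation is retained
```

Only the position difference is treated this way in the sketch; the rest of a force calculation could continue in ordinary single precision, which is consistent with the modest overhead the abstract reports.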

arXiv.org e-Print Archive

Crossref

International Migration, Integration and Social Cohesion online publications

Lattice QCD based on OpenCL

Authors: Bach Matthias, Lindenstruth Volker, Philipsen Owe, Pinke Christopher
Publication venue: 'Elsevier BV'
Publication date: 26/09/2012

We present an OpenCL-based Lattice QCD application using a heatbath algorithm for the pure gauge case and Wilson fermions in the twisted mass formulation. The implementation is platform independent and can be used on AMD or NVIDIA GPUs, as well as on classical CPUs. On the AMD Radeon HD 5870, our double-precision dslash implementation performs at 60 GFLOPS over a wide range of lattice sizes. The hybrid Monte Carlo implementation presented reaches a speedup of a factor of four over the reference code running on a server CPU. Comment: 19 pages, 11 figures

arXiv.org e-Print Archive

GSI Repository