A pilgrimage to gravity on GPUs

Author: A. Ahmad
A. Gualandris
A. Tanikawa
E. Gaburov
E. Holmberg
E.N. Dorband
G.J. Sussman
J. Barnes
J. Bédorf
J. Goodman
J. Makino
J.H. Applegate
J.R. Hurley
K. Nitadori
L. Nyland
M. Fujii
P. Hut
R. Spurzem
R. Yokota
R.G. Belleman
R.H. Miller
S. Harfst
S. Inagaki
S. Portegies Zwart
S. von Hoerner
S.F. Portegies Zwart
S.J. Aarseth
T. Fukushige
T.S. van Albada
W. Dehnen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 13/04/2012

In this short review we present the developments over the last five decades that have led to the use of Graphics Processing Units (GPUs) for astrophysical simulations. Since the introduction of NVIDIA's Compute Unified Device Architecture (CUDA) in 2007, the GPU has become a valuable tool for N-body simulations and is so popular these days that almost all papers about high-precision N-body simulations use methods that are accelerated by GPUs. With GPU hardware becoming more advanced and being used for more advanced algorithms, such as gravitational tree-codes, we see a bright future for GPU-like hardware in computational astrophysics. Comment: To appear in: European Physical Journal "Special Topics": "Computer Simulations on Graphics Processing Units". 18 pages, 8 figures
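The direct N-body method at the center of this review evaluates every pairwise gravitational interaction, an O(N^2) workload that maps well onto GPUs. A minimal pure-Python sketch of that force loop follows, assuming units with G = 1 and Plummer softening `eps` (both our choices, not taken from the paper); on a GPU the outer loop would typically be spread over one thread per particle, which this version only mimics sequentially.

```python
import math

def direct_accelerations(pos, mass, G=1.0, eps=0.0):
    """All-pairs (O(N^2)) gravitational accelerations with Plummer softening.

    Illustrative sketch: G = 1 units and the softening length eps are assumptions.
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    eps2 = eps * eps
    # Outer loop: the "i-particles"; a CUDA kernel would give each one a thread.
    for i in range(n):
        xi, yi, zi = pos[i]
        ax = ay = az = 0.0
        # Inner loop: each i-particle sums the pull of every "j-particle".
        for j in range(n):
            if j == i:
                continue
            dx = pos[j][0] - xi
            dy = pos[j][1] - yi
            dz = pos[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + eps2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            s = G * mass[j] * inv_r3
            ax += s * dx
            ay += s * dy
            az += s * dz
        acc[i] = [ax, ay, az]
    return acc
```

Because each pair contributes equal and opposite forces, the mass-weighted sum of the accelerations vanishes, which is a cheap sanity check for any such kernel.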

arXiv.org e-Print Archive

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

Leiden University Scholary Publications

GeNN: a code generation framework for accelerated brain simulations

Author: AJ Cope
C Rossant
DF Goodman
E Ros
EM Izhikevich
HÜ Dinkelbach
I Raikov
J Baladron
JM Nageswaran
MA Swertz
ML Hines
NF Rulkov
P Gleeson
R Brette
SC Eisenstat
T Nowotny
VK Pallipuram
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/05/2015

Large-scale numerical simulations of detailed brain circuit models are important for identifying hypotheses on brain functions and testing their consistency and plausibility. An ongoing challenge for simulating realistic models is, however, computational speed. In this paper, we present the GeNN (GPU-enhanced Neuronal Networks) framework, which aims to facilitate the use of graphics accelerators for computational models of large-scale neuronal networks to address this challenge. GeNN is an open-source library that generates code to accelerate the execution of network simulations on NVIDIA GPUs through a flexible and extensible interface that does not require in-depth technical knowledge from its users. We present performance benchmarks showing that a 200-fold speedup compared to a single CPU core can be achieved for a network of one million conductance-based Hodgkin-Huxley neurons, but that the speedup can differ for other models. GeNN is available for Linux, Mac OS X and Windows platforms. The source code, user manual, tutorials, Wiki, in-depth example projects and all other related information can be found on the project website http://genn-team.github.io/genn/
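GeNN's central idea, as the abstract describes it, is to generate the simulation code instead of asking users to write GPU code by hand. The toy generator below illustrates only that idea: it splices a user-supplied per-neuron update snippet, written with hypothetical `$(name)` placeholders, into a CUDA C kernel template. The template, the placeholder convention, and the function name are assumptions for illustration, not GeNN's actual interface or generated output.

```python
import re

def generate_neuron_kernel(name, variables, sim_code):
    """Emit CUDA C source for a per-neuron update kernel.

    Hypothetical sketch: `variables` maps state-variable names to C types, and
    `sim_code` uses $(var) placeholders for those variables.
    """
    def substitute(match):
        var = match.group(1)
        if var not in variables:
            raise KeyError(f"unknown variable {var!r}")
        return f"l{var}"  # refer to the thread-local register copy

    body = re.sub(r"\$\((\w+)\)", substitute, sim_code)
    params = ", ".join(f"{ctype}* d_{v}" for v, ctype in variables.items())
    # Load state into registers, run the user code, write the state back.
    loads = "\n".join(f"    {ctype} l{v} = d_{v}[id];" for v, ctype in variables.items())
    stores = "\n".join(f"    d_{v}[id] = l{v};" for v in variables)
    return (f"__global__ void update_{name}({params}, unsigned int n)\n"
            "{\n"
            "    unsigned int id = blockIdx.x * blockDim.x + threadIdx.x;\n"
            "    if (id >= n) return;\n"
            f"{loads}\n"
            f"    {body}\n"
            f"{stores}\n"
            "}\n")
```

For example, `generate_neuron_kernel("lif", {"V": "float"}, "$(V) += 0.1f * (-$(V) + 1.5f);")` yields a kernel in which each thread loads `V` into a register, applies the update, and writes it back; the user never touches thread indexing.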

Crossref

PubMed Central

Sussex Research Online

Higher-order CFD and Interface Tracking Methods on Highly-Parallel MPI and GPU systems

Author: Appleyard J.
Drikakis Dimitris
Publication venue: 'Elsevier BV'
Publication date: 01/07/2011

A computational investigation of the effects of higher-order accurate schemes on parallel performance was carried out on two different computational systems: a traditional CPU-based MPI cluster and a system of four Graphics Processing Units (GPUs) controlled by a single quad-core CPU. The investigation was based on the solution of the level set equations for interface tracking using a High-Order Upstream Central (HOUC) scheme. Different variants of the HOUC scheme were employed together with a 3rd-order TVD Runge-Kutta time integration. An increase in performance of two orders of magnitude was seen when comparing a single CPU core to a single GPU, with a greater increase at higher orders of accuracy and at lower precision.
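The abstract pairs an upwind-biased HOUC discretization with third-order TVD Runge-Kutta time stepping. Below is a minimal pure-Python sketch of that combination for 1-D level-set advection, phi_t + u phi_x = 0, on a periodic grid. Assumptions: a constant velocity u > 0, and the fifth-order upwind-biased stencil below as a stand-in for one HOUC variant; the paper's exact coefficients, boundary treatment, and multi-dimensional setup are not given in the abstract.

```python
import math

# Assumed stand-in for a fifth-order HOUC variant: upwind-biased stencil for
# d(phi)/dx with positive velocity, keyed by grid offset, overall factor 1/(60 dx).
HOUC5 = {-3: -2.0, -2: 15.0, -1: -60.0, 0: 20.0, 1: 30.0, 2: -3.0}

def ddx_houc5(phi, dx):
    """Fifth-order upwind-biased derivative on a periodic grid (u > 0)."""
    n = len(phi)
    return [sum(c * phi[(i + k) % n] for k, c in HOUC5.items()) / (60.0 * dx)
            for i in range(n)]

def rhs(phi, u, dx):
    """Semi-discrete level-set advection: dphi/dt = -u dphi/dx."""
    return [-u * d for d in ddx_houc5(phi, dx)]

def tvd_rk3_step(phi, u, dx, dt):
    """One third-order TVD Runge-Kutta step in Shu-Osher form."""
    k = rhs(phi, u, dx)
    p1 = [p + dt * r for p, r in zip(phi, k)]
    k = rhs(p1, u, dx)
    p2 = [0.75 * p + 0.25 * (q + dt * r) for p, q, r in zip(phi, p1, k)]
    k = rhs(p2, u, dx)
    return [p / 3.0 + 2.0 / 3.0 * (q + dt * r) for p, q, r in zip(phi, p2, k)]
```

Advecting phi = sin(x) on 64 points over one full period with dt = 0.5 dx returns a profile very close to the initial one. Each grid point's update reads only a fixed neighborhood, which is what makes such schemes map naturally onto one GPU thread per point.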

Cranfield CERES

Computational Physics on Graphics Processing Units

Author: A. Asadchev
A. Castro
A. Harju
A. McAdams
A.G. Anderson
A.P. Lyubartsev
A.W. Götz
B.L. Tembre
C. Bonati
C. McNeile
C.M. Isborn
D.J. Hardy
E. Darve
G. Bhanot
G. Egri
G. Kresse
H.J. Rothe
I. Montvay
I. Samish
I. Ufimtsev
I.S. Ufimtsev
J. Enkovaara
J. Gao
J. Hubbard
J.A. Anderson
J.A. McCammon
J.E. Stone
J.S. Meredith
K. Esler
K. Moreland
K. Yasuda
L. Genovese
L. Greengard
L. Gu
L. Ha
M. Bordag
M. Göckeler
M. Hasenbusch
M. Hutchinson
M. Macedonia
M.C. Gutzwiller
M.C. Payne
M.P. Allen
N. Cardoso
N. Goodnight
N. Luehr
N.A. Gumerov
P. Giannozzi
P. Kipfer
P. Petreczky
R. Parr
R.D. Mawhinney
R.D. Skeel
R.G. Belleman
S. Hakala
S. Ihnatsenka
S. Maintz
T. Shirakawa
T. Siro
T. Takahashi
T.W. Chiu
V. Rokhlin
V. Springel
W. Jia
W. Kohn
W.M.C. Foulkes
X. Andrade
Y. Aoki
Y. Chen
Z. Fodor
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

The use of graphics processing units for scientific computations is an emerging strategy that can significantly speed up a wide range of algorithms. In this review, we discuss advances made in the field of computational physics, focusing on classical molecular dynamics and on quantum simulations for electronic structure calculations using density functional theory, wave function techniques, and quantum field theory. Comment: Proceedings of the 11th International Conference, PARA 2012, Helsinki, Finland, June 10-13, 201

arXiv.org e-Print Archive

Crossref

Algorithmic patterns for $\mathcal{H}$-matrices on many-core processors

Author: Zaspel Peter
Publication date: 01/01/2017

In this work, we consider the reformulation of hierarchical ($\mathcal{H}$) matrix algorithms for many-core processors with a model implementation on graphics processing units (GPUs). $\mathcal{H}$-matrices approximate specific dense matrices, e.g., from discretized integral equations or kernel ridge regression, leading to log-linear time complexity in dense matrix-vector products. The parallelization of $\mathcal{H}$-matrix operations on many-core processors is difficult due to the complex nature of the underlying algorithms. While previous algorithmic advances for many-core hardware focused on accelerating existing $\mathcal{H}$-matrix CPU implementations with many-core processors, we here aim to rely entirely on that processor type. As our main contribution, we introduce the parallel algorithmic patterns needed to map the full $\mathcal{H}$-matrix construction and the fast matrix-vector product to many-core hardware. Crucial ingredients here are space-filling curves, parallel tree traversal, and batching of linear algebra operations. The resulting model GPU implementation, hmglib, is, to the best of the authors' knowledge, the first entirely GPU-based open-source $\mathcal{H}$-matrix library of this kind. We conclude this work with an in-depth performance analysis and a comparative performance study against a standard $\mathcal{H}$-matrix library, highlighting profound speedups of our many-core parallel approach.
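The abstract lists space-filling curves as one ingredient for mapping $\mathcal{H}$-matrix construction to many-core hardware but does not name the curve. As an illustration only, the sketch below uses the Morton (Z-order) curve, a common choice: interleaving the bits of quantized 2-D coordinates gives a 1-D key, and sorting by that key tends to place spatially nearby points at nearby indices, so geometric clusters become contiguous index ranges. The function names and the 10-bit quantization are assumptions, not hmglib's API.

```python
def part1by1(v):
    """Spread the lower 16 bits of v so a zero bit separates each original bit."""
    v &= 0x0000FFFF
    v = (v | (v << 8)) & 0x00FF00FF
    v = (v | (v << 4)) & 0x0F0F0F0F
    v = (v | (v << 2)) & 0x33333333
    v = (v | (v << 1)) & 0x55555555
    return v

def morton2d(ix, iy):
    """Z-order key: bits of ix in even positions, bits of iy in odd positions."""
    return part1by1(ix) | (part1by1(iy) << 1)

def z_order(points, bits=10):
    """Indices of 2-D points in [0, 1]^2, sorted along the Z-order curve.

    Illustrative sketch: coordinates are quantized to `bits` bits (at most 16).
    """
    scale = (1 << bits) - 1

    def key(i):
        x, y = points[i]
        return morton2d(int(x * scale), int(y * scale))

    return sorted(range(len(points)), key=key)
```

For four points, one per quadrant, `z_order` visits lower-left, lower-right, upper-left, upper-right, the characteristic "Z". Key computation is independent per point and the sort is a standard parallel primitive, which is why such orderings suit GPUs.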

arXiv.org e-Print Archive

edoc

Sapporo2: A versatile direct $N$-body library

Author: Bédorf Jeroen
Gaburov Evghenii
Zwart Simon Portegies
Publication date: 01/01/2015

Astrophysical direct $N$-body methods were among the first production algorithms to be implemented using NVIDIA's CUDA architecture. Now, almost seven years later, the GPU is the most used accelerator device in astronomy for simulating stellar systems. In this paper we present the implementation of the Sapporo2 $N$-body library, which allows researchers to use the GPU for $N$-body simulations with little to no effort. The first version, released five years ago, is actively used, but lacks advanced features and versatility in numerical precision and support for higher-order integrators. In this updated version we have rebuilt the code from scratch and added support for OpenCL, multi-precision, and higher-order integrators. We show how to tune these codes for different GPU architectures and how to keep utilizing the GPU optimally even when only a small number of particles ($N < 100$) is integrated. This careful tuning allows Sapporo2 to be faster than Sapporo1 even with the added options and double-precision data loads. The code runs on a range of NVIDIA and AMD GPUs in single and double precision. With the addition of OpenCL support, the library is also able to run on CPUs and other accelerators that support OpenCL. Comment: 15 pages, 7 figures. Accepted for publication in Computational Astrophysics and Cosmology
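Higher-order integrators need more than the acceleration from a direct-summation library: the fourth-order Hermite scheme, commonly paired with libraries of this kind, also needs the jerk (the time derivative of the acceleration). The pure-Python sketch below shows which quantities are summed over all pairs and how a predictor-corrector step consumes them. It assumes G = 1, a single shared time step, and function names of our own; it is not Sapporo2's API.

```python
import math

# Sketch only: G = 1, shared time step, illustrative names (not Sapporo2's API).

def acc_jerk(pos, vel, mass, eps2=0.0):
    """Direct-summation acceleration and jerk for every particle (O(N^2))."""
    n = len(pos)
    acc = [[0.0] * 3 for _ in range(n)]
    jerk = [[0.0] * 3 for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dr = [pos[j][k] - pos[i][k] for k in range(3)]
            dv = [vel[j][k] - vel[i][k] for k in range(3)]
            r2 = sum(d * d for d in dr) + eps2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            rv = sum(a * b for a, b in zip(dr, dv)) / r2  # (dr . dv) / r^2
            for k in range(3):
                acc[i][k] += mass[j] * dr[k] * inv_r3
                jerk[i][k] += mass[j] * (dv[k] - 3.0 * rv * dr[k]) * inv_r3
    return acc, jerk


def hermite_step(pos, vel, mass, dt, eps2=0.0):
    """One fourth-order Hermite predictor-corrector step."""
    n = len(pos)
    a0, j0 = acc_jerk(pos, vel, mass, eps2)
    # Predictor: Taylor expansion using acceleration and jerk.
    pp = [[pos[i][k] + vel[i][k] * dt + a0[i][k] * dt**2 / 2 + j0[i][k] * dt**3 / 6
           for k in range(3)] for i in range(n)]
    vp = [[vel[i][k] + a0[i][k] * dt + j0[i][k] * dt**2 / 2
           for k in range(3)] for i in range(n)]
    # Force evaluation at the predicted state: the part a GPU library offloads.
    a1, j1 = acc_jerk(pp, vp, mass, eps2)
    # Corrector: combine start-of-step and predicted forces.
    v_new = [[vel[i][k] + (a0[i][k] + a1[i][k]) * dt / 2
              + (j0[i][k] - j1[i][k]) * dt**2 / 12
              for k in range(3)] for i in range(n)]
    x_new = [[pos[i][k] + (vel[i][k] + v_new[i][k]) * dt / 2
              + (a0[i][k] - a1[i][k]) * dt**2 / 12
              for k in range(3)] for i in range(n)]
    return x_new, v_new


def total_energy(pos, vel, mass):
    """Kinetic plus unsoftened potential energy (G = 1)."""
    kin = sum(0.5 * m * sum(v * v for v in vi) for m, vi in zip(mass, vel))
    pot = 0.0
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            pot -= mass[i] * mass[j] / math.dist(pos[i], pos[j])
    return kin + pot
```

For an equal-mass circular binary integrated over one orbit with dt = 0.01, the relative energy error stays far below 1e-6. Production codes would reuse `a1, j1` as the next step's `a0, j0` and use individual block time steps; both are omitted here for brevity.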

arXiv.org e-Print Archive

Springer - Publisher Connector

Leiden University Scholary Publications

PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation

Author: Ahmed Fasih
Andreas Klöckner
Bell
Bryan Catanzaro
Buck
Chandler
Dalcín
Eich
Feldman
Flanagan
Frigo
Group
Hestenes
Hesthaven
Kennedy
Klöckner
Lam
Langtangen
Lindholm
McCarthy
McCool
Nicolas Pinto
Oliphant
Owens
Paul Ivanov
Pinto
Prud’homme
Reynders
Seiler
Stein
Valiant
van Hateren
Veldhuizen
Wang
Whaley
Yunsup Lee
Publication venue: 'Elsevier BV'
Publication date: 29/03/2011

High-performance computing has recently seen a surge of interest in heterogeneous systems, with an emphasis on modern Graphics Processing Units (GPUs). These devices offer tremendous potential for performance and efficiency in important large-scale applications of computational science. However, exploiting this potential can be challenging, as one must adapt to the specialized and rapidly evolving computing environment currently exhibited by GPUs. One way of addressing this challenge is to embrace better techniques and develop tools tailored to these devices. This article presents one simple technique, GPU run-time code generation (RTCG), along with PyCUDA and PyOpenCL, two open-source toolkits that support this technique. In introducing PyCUDA and PyOpenCL, this article proposes the combination of a dynamic, high-level scripting language with the massive performance of a GPU as a compelling two-tiered computing platform, potentially offering significant performance and productivity advantages over conventional single-tier, static systems. The concept of RTCG is simple and easily implemented using existing, robust infrastructure. Nonetheless, it is powerful enough to support (and encourage) the creation of custom application-specific tools by its users. The premise of the paper is illustrated by a wide range of examples where the technique has been applied with considerable success. Comment: Submitted to Parallel Computing, Elsevier
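In its simplest form, RTCG means building kernel source as a string at run time, baking in values that are only known then. The sketch below takes polynomial coefficients supplied at run time and emits an element-wise CUDA C kernel with those coefficients inlined as literals in Horner form. The function names and template are hypothetical. With PyCUDA, such a string would be compiled with `pycuda.compiler.SourceModule` and the kernel retrieved via `get_function("poly_eval")`; that step needs a CUDA GPU and is not shown.

```python
def horner_expression(coeffs, var="x"):
    """Expression for sum(coeffs[k] * var**k), written in Horner form.

    Hypothetical helper; the output is valid in both C and Python syntax.
    """
    if not coeffs:
        raise ValueError("need at least one coefficient")
    expr = repr(float(coeffs[-1]))
    for c in reversed(coeffs[:-1]):
        expr = f"({expr})*{var} + {float(c)!r}"
    return expr

# Kernel template; doubled braces are literal braces for str.format.
KERNEL_TEMPLATE = """
__global__ void poly_eval(const double* x_in, double* y_out, unsigned int n)
{{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {{
        double x = x_in[i];
        y_out[i] = {expr};
    }}
}}
"""

def make_poly_kernel(coeffs):
    """Specialize the kernel source for one particular set of coefficients."""
    return KERNEL_TEMPLATE.format(expr=horner_expression(coeffs))
```

Because the coefficients become compile-time constants, the GPU compiler can fold them directly into the arithmetic, one of the benefits of generating code late rather than writing one generic kernel.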

arXiv.org e-Print Archive

Crossref