196 research outputs found

PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation

Author: Ahmed Fasih
Andreas Klöckner
Bell
Bryan Catanzaro
Buck
Chandler
Dalcín
Eich
Feldman
Flanagan
Frigo
Group
Hestenes
Hesthaven
Kennedy
Klöckner
Lam
Langtangen
Lindholm
McCarthy
McCool
Nicolas Pinto
Oliphant
Owens
Paul Ivanov
Pinto
Prud’homme
Reynders
Seiler
Stein
Valiant
van Hateren
Veldhuizen
Wang
Whaley
Yunsup Lee
Publication venue: 'Elsevier BV'
Publication date: 29/03/2011

High-performance computing has recently seen a surge of interest in heterogeneous systems, with an emphasis on modern Graphics Processing Units (GPUs). These devices offer tremendous potential for performance and efficiency in important large-scale applications of computational science. However, exploiting this potential can be challenging, as one must adapt to the specialized and rapidly evolving computing environment currently exhibited by GPUs. One way of addressing this challenge is to embrace better techniques and develop tools tailored to their needs. This article presents one simple technique, GPU run-time code generation (RTCG), along with PyCUDA and PyOpenCL, two open-source toolkits that support this technique. In introducing PyCUDA and PyOpenCL, this article proposes the combination of a dynamic, high-level scripting language with the massive performance of a GPU as a compelling two-tiered computing platform, potentially offering significant performance and productivity advantages over conventional single-tier, static systems. The concept of RTCG is simple and easily implemented using existing, robust infrastructure. Nonetheless, it is powerful enough to support (and encourage) the creation of custom application-specific tools by its users. The premise of the paper is illustrated by a wide range of examples where the technique has been applied with considerable success.
Comment: Submitted to Parallel Computing, Elsevie
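The run-time code generation idea described in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: the template, the function `generate_elementwise_kernel`, and the kernel name `scale_by_two` are hypothetical names chosen for illustration. The sketch only builds CUDA C source as a string from Python; the compile step is shown as a comment because it needs PyCUDA and a CUDA device.

```python
from string import Template

# Hypothetical kernel template: the kernel name, element type, and
# per-element expression are filled in at run time.
KERNEL_TEMPLATE = Template("""
__global__ void ${name}(${dtype} *out, const ${dtype} *in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = ${expr};
}
""")


def generate_elementwise_kernel(name, dtype, expr):
    """Return CUDA C source for an elementwise kernel as a string."""
    return KERNEL_TEMPLATE.substitute(name=name, dtype=dtype, expr=expr)


source = generate_elementwise_kernel("scale_by_two", "float", "2.0f * in[i]")
print(source)

# With PyCUDA installed, a CUDA device available, and a context created
# (for example via `import pycuda.autoinit`), the generated source could
# then be compiled and loaded at run time:
#   from pycuda.compiler import SourceModule
#   module = SourceModule(source)
#   kernel = module.get_function("scale_by_two")
```

Because the source is ordinary Python data until it is compiled, the scripting layer can specialize the kernel (types, expressions, constants) for each problem before handing it to the GPU toolchain.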

arXiv.org e-Print Archive

Crossref

Statistical analysis of network data and evolution on GPUs: High-performance statistical computing

Author: Michael P. H. Stumpf
Thomas W. Thorne
Publication venue
Publication date: 09/02/2012

Network analysis typically involves a set of repetitive tasks that are particularly amenable to poor-man's parallelization. This is therefore an ideal application area for GPU architectures, which help to alleviate the tedium inherent to statistically sound analysis of network data. Here we will illustrate the use of GPUs in a range of applications, which include percolation processes on networks, the evolution of protein-protein interaction networks, and the fusion of different types of biomedical and disease data in the context of molecular interaction networks. We will pay particular attention to the numerical performance of different routines that are frequently invoked in network analysis problems. We conclude with a review of recent developments in the generation of random numbers that address the specific requirements posed by GPUs and high-performance computing needs.

Nature Precedings

High-Performance Multi-Mode Ptychography Reconstruction on Distributed GPUs

Author: Campbell Stuart I.
Chu Yong S.
Dong Zhihua
Fang Yao-Lung L.
Ha Sungsoo
Huang Xiaojing
Lin Meifeng
Xu Wei
Yan Hanfei
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 30/08/2018

Ptychography is an emerging imaging technique that is able to provide wavelength-limited spatial resolution from specimens with extended lateral dimensions. As a scanning microscopy method, a typical two-dimensional image requires a number of data frames. As a diffraction-based imaging technique, the real-space image has to be recovered through iterative reconstruction algorithms. Due to these two inherent aspects, a ptychographic reconstruction is generally a computation-intensive and time-consuming process, which limits the throughput of this method. We report an accelerated version of the multi-mode difference map algorithm for ptychography reconstruction using multiple distributed GPUs. This approach leverages available scientific computing packages in Python, including mpi4py and PyCUDA, with the core computation functions implemented in CUDA C. Interestingly, we find that even with MPI collective communications, the weak scaling in the number of GPU nodes can still remain nearly constant. Most importantly, for realistic diffraction measurements, we observe a speedup ranging from a factor of 10 to 10^3 depending on the data size, which reduces the reconstruction time remarkably from hours to typically about 1 minute and is thus critical for real-time data processing and visualization.
Comment: work presented in NYSDS 201

arXiv.org e-Print Archive

Crossref

PyCOOL - a Cosmological Object-Oriented Lattice code written in Python

Author: A. Chambers
A. Klöckner
A.V. Frolov
D. Groen
E. Gaburov
G. Khanna
G.N. Felder
H.-Y. Schive
J. Sainio
K.-I. Ishikawa
M.A. Amin
N. Nakasato
NVIDIA
P. Micikevicius
R. Capuzzo-Dolcetta
R. Easther
R.J. Brunner
S. Banerjee
S. Ord
S. von Hoerner
S.K. Chung
T. Hiramatsu
T. Szalay
V. Anselmi
V. Demchik
Publication venue: 'IOP Publishing'
Publication date: 30/04/2012

There are a number of different phenomena in the early universe that have to be studied numerically with lattice simulations. This paper presents a graphics processing unit (GPU) accelerated Python program called PyCOOL that solves the evolution of scalar fields in a lattice with very precise symplectic integrators. The program has been written with the intention of hitting a sweet spot of speed, accuracy and user friendliness. This has been achieved by using the Python language with the PyCUDA interface to make a program that is easy to adapt to different scalar field models. In this paper we derive the symplectic dynamics that govern the evolution of the system and then present the implementation of the program in Python and PyCUDA. The functionality of the program is tested in a chaotic inflation preheating model, a single field oscillon case and in a supersymmetric curvaton model which leads to Q-ball production. We have also compared the performance of a consumer graphics card to a professional Tesla compute card in these simulations. We find that the program is not only accurate but also very fast. To further increase the usefulness of the program we have equipped it with numerous post-processing functions that provide useful information about the cosmological model. These include various spectra and statistics of the fields. The program can additionally be used to calculate the generated curvature perturbation. The program is publicly available under the GNU General Public License at https://github.com/jtksai/PyCOOL . Some additional information can be found at http://www.physics.utu.fi/tiedostot/theory/particlecosmology/pycool/ .
Comment: 23 pages, 12 figures; some typos correcte
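To illustrate what a symplectic integrator is, here is a minimal sketch of the simplest such scheme, the second-order kick-drift-kick leapfrog, applied to a single harmonic oscillator. This is an assumption-laden toy, not PyCOOL's integrator: the abstract does not specify the scheme, and the functions `leapfrog` and `energy` are hypothetical names introduced here.

```python
def leapfrog(q, p, dt, steps, force):
    """Kick-drift-kick leapfrog for H = p**2/2 + V(q), with force = -dV/dq."""
    for _ in range(steps):
        p += 0.5 * dt * force(q)  # half kick: update momentum
        q += dt * p               # full drift: update position
        p += 0.5 * dt * force(q)  # half kick: update momentum again
    return q, p


def energy(q, p):
    """Energy of a unit-mass, unit-frequency harmonic oscillator."""
    return 0.5 * p * p + 0.5 * q * q


# Toy problem (an assumption for illustration): V(q) = q**2 / 2, so force = -q.
q0, p0 = 1.0, 0.0
q1, p1 = leapfrog(q0, p0, dt=0.01, steps=100_000, force=lambda q: -q)

# A symplectic scheme keeps the energy error bounded over long runs
# instead of letting it grow steadily.
drift = abs(energy(q1, p1) - energy(q0, p0))
print(drift)
```

The bounded energy error over many steps is the property that makes this family of methods attractive for long lattice evolutions.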

arXiv.org e-Print Archive

Crossref

CUDArray: CUDA-based NumPy

Author: Larsen Anders Boesen Lindbo
Publication venue: Technical University of Denmark
Publication date: 01/01/2014

Online Research Database In Technology