PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation
High-performance computing has recently seen a surge of interest in
heterogeneous systems, with an emphasis on modern Graphics Processing Units
(GPUs). These devices offer tremendous potential for performance and efficiency
in important large-scale applications of computational science. However,
exploiting this potential can be challenging, as one must adapt to the
specialized and rapidly evolving computing environment currently exhibited by
GPUs. One way of addressing this challenge is to embrace better techniques and
develop tools tailored to their needs. This article presents one simple
technique, GPU run-time code generation (RTCG), along with PyCUDA and PyOpenCL,
two open-source toolkits that support this technique.
In introducing PyCUDA and PyOpenCL, this article proposes the combination of
a dynamic, high-level scripting language with the massive performance of a GPU
as a compelling two-tiered computing platform, potentially offering significant
performance and productivity advantages over conventional single-tier, static
systems. The concept of RTCG is simple and easily implemented using existing,
robust infrastructure. Nonetheless, it is powerful enough to support (and
encourage) the creation of custom application-specific tools by its users. The
premise of the paper is illustrated by a wide range of examples where the
technique has been applied with considerable success. Comment: Submitted to Parallel Computing, Elsevier
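The idea of run-time code generation described in this abstract can be illustrated with a minimal sketch. It assumes that kernel source is built as a string from a template and then specialized for values known only at run time; the kernel name `scale_kernel`, the helper `generate_kernel_source`, and the template parameters are hypothetical and do not come from the article. Only the source-generation step is shown, so the sketch runs without a GPU.

```python
from string import Template

# Hypothetical CUDA C kernel template. The placeholders ${dtype} and ${factor}
# are filled in at run time; none of these names come from the article.
KERNEL_TEMPLATE = Template("""
__global__ void scale_kernel(${dtype} *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = data[i] * ${factor};
}
""")


def generate_kernel_source(dtype: str, factor: float) -> str:
    """Return CUDA C source specialized for an element type and a constant."""
    return KERNEL_TEMPLATE.substitute(dtype=dtype, factor=repr(float(factor)))


# Specialize the kernel for single-precision data and a constant factor.
source = generate_kernel_source("float", 2.5)
print(source)

# Assumption: on a machine with a CUDA-capable GPU and PyCUDA installed, the
# generated string could then be compiled at run time, for example with
# pycuda.compiler.SourceModule(source). That step is omitted here.
```

Because the constant is baked into the generated source, the compiler sees it as a literal rather than as a kernel argument, which is one of the benefits usually attributed to this approach.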
Statistical analysis of network data and evolution on GPUs: High-performance statistical computing
Network analysis typically involves a set of repetitive tasks that are particularly amenable to poor-man's parallelization. This is therefore an ideal application area for GPU architectures, which help to alleviate the tedium inherent to statistically sound analysis of network data. Here we will illustrate the use of GPUs in a range of applications, which include percolation processes on networks, the evolution of protein-protein interaction networks, and the fusion of different types of biomedical and disease data in the context of molecular interaction networks. We will pay particular attention to the numerical performance of different routines that are frequently invoked in network analysis problems. We conclude with a review of recent developments in the generation of random numbers that address the specific requirements posed by GPUs and high-performance computing needs.
High-Performance Multi-Mode Ptychography Reconstruction on Distributed GPUs
Ptychography is an emerging imaging technique that is able to provide
wavelength-limited spatial resolution from specimens with extended lateral
dimensions. As a scanning microscopy method, a typical two-dimensional image
requires a number of data frames. As a diffraction-based imaging technique, the
real-space image has to be recovered through iterative reconstruction
algorithms. Due to these two inherent aspects, a ptychographic reconstruction
is generally a computation-intensive and time-consuming process, which limits
the throughput of this method. We report an accelerated version of the
multi-mode difference map algorithm for ptychography reconstruction using
multiple distributed GPUs. This approach leverages available scientific
computing packages in Python, including mpi4py and PyCUDA, with the core
computation functions implemented in CUDA C. Interestingly, we find that even
with MPI collective communications, the weak scaling in the number of GPU nodes
can still remain nearly constant. Most importantly, for realistic diffraction
measurements, we observe a speedup ranging from a factor of to
depending on the data size, which reduces the reconstruction time remarkably
from hours to typically about 1 minute and is thus critical for real-time data
processing and visualization. Comment: work presented in NYSDS 201
PyCOOL - a Cosmological Object-Oriented Lattice code written in Python
There are a number of different phenomena in the early universe that have to
be studied numerically with lattice simulations. This paper presents a graphics
processing unit (GPU) accelerated Python program called PyCOOL that solves the
evolution of scalar fields in a lattice with very precise symplectic
integrators. The program has been written with the intention to hit a sweet
spot of speed, accuracy and user friendliness. This has been achieved by using
the Python language with the PyCUDA interface to make a program that is easy to
adapt to different scalar field models. In this paper we derive the symplectic
dynamics that govern the evolution of the system and then present the
implementation of the program in Python and PyCUDA. The functionality of the
program is tested in a chaotic inflation preheating model, a single field
oscillon case and in a supersymmetric curvaton model which leads to Q-ball
production. We have also compared the performance of a consumer graphics card
to a professional Tesla compute card in these simulations. We find that the
program is not only accurate but also very fast. To further increase the
usefulness of the program we have equipped it with numerous post-processing
functions that provide useful information about the cosmological model. These
include various spectra and statistics of the fields. The program can be
additionally used to calculate the generated curvature perturbation. The
program is publicly available under the GNU General Public License at
https://github.com/jtksai/PyCOOL . Some additional information can be found
at http://www.physics.utu.fi/tiedostot/theory/particlecosmology/pycool/ . Comment: 23 pages, 12 figures; some typos corrected
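The abstract above relies on symplectic integrators. As a rough illustration of why such schemes suit long simulations, the following sketch applies a second-order kick-drift-kick leapfrog to a single harmonic oscillator with unit mass. This is an assumption chosen for brevity: it is not PyCOOL's integrator or its scalar field equations, and the function names are hypothetical.

```python
def leapfrog(q, p, dt, steps, force):
    """Advance (q, p) with kick-drift-kick leapfrog, a symplectic scheme."""
    for _ in range(steps):
        p += 0.5 * dt * force(q)  # half kick: update momentum
        q += dt * p               # full drift: update position (unit mass)
        p += 0.5 * dt * force(q)  # half kick: update momentum again
    return q, p


def energy(q, p):
    """Energy of a unit-mass, unit-frequency harmonic oscillator."""
    return 0.5 * p * p + 0.5 * q * q


# Toy problem (not from the paper): harmonic oscillator with force -q.
q0, p0 = 1.0, 0.0
q1, p1 = leapfrog(q0, p0, dt=0.01, steps=100_000, force=lambda q: -q)

# The energy error stays small and bounded instead of growing with step count.
drift = abs(energy(q1, p1) - energy(q0, p0))
print(f"energy error after 100000 steps: {drift:.2e}")
```

The bounded energy error over many steps is the property that makes symplectic schemes attractive for the long lattice evolutions the abstract describes.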