A Multi-GPU Programming Library for Real-Time Applications
We present MGPU, a C++ programming library targeted at single-node multi-GPU
systems. Such systems combine disproportionately high floating-point
performance with high data locality and are thus well suited to implementing
real-time algorithms.
We describe the library design, programming interface and implementation
details in light of this specific problem domain. The core concepts of this
work are a novel kind of container abstraction and MPI-like communication
methods for intra-system communication. We further demonstrate how MGPU is used
as a framework for porting existing GPU libraries to multi-device
architectures. Putting our library to the test, we accelerate an iterative
non-linear image reconstruction algorithm for real-time magnetic resonance
imaging using multiple GPUs. We achieve a speed-up of about 1.7 using 2 GPUs
and reach a final speed-up of 2.1 with 4 GPUs. These promising results lead us
to conclude that multi-GPU systems are a viable solution for real-time MRI
reconstruction as well as signal-processing applications in general.
Comment: 15 pages, 10 figures
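MGPU itself is a C++ library and the abstract does not show its interface. As a rough illustration of the segmented-container idea (data split across the GPUs of one node, combined with an MPI-like reduction), here is a minimal Python sketch using CuPy; the helper names scatter and allreduce_sum are assumptions for illustration, not MGPU's actual API.

```python
# Illustrative only: a Python/CuPy analogue of a segmented container whose
# parts live on different GPUs of one node, reduced with an MPI-like sum.
# scatter/allreduce_sum are hypothetical names, not MGPU's actual API.
import numpy as np
import cupy as cp

def scatter(host_array, n_gpus):
    """Split a host array into per-device segments (one chunk per GPU)."""
    segments = []
    for dev, chunk in enumerate(np.array_split(host_array, n_gpus)):
        with cp.cuda.Device(dev):               # make GPU `dev` current
            segments.append(cp.asarray(chunk))  # host -> device copy
    return segments

def allreduce_sum(segments):
    """Combine per-device partial sums on the host (MPI_Allreduce-style)."""
    total = 0.0
    for seg in segments:
        with seg.device:            # reduce each segment on its own GPU
            total += float(seg.sum())
    return total

x = np.random.rand(1 << 20).astype(np.float32)
segs = scatter(x, cp.cuda.runtime.getDeviceCount())
print(allreduce_sum(segs), float(x.sum()))  # the two sums should agree
```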
Accelerated Modeling of Near and Far-Field Diffraction for Coronagraphic Optical Systems
Accurately predicting the performance of coronagraphs and tolerancing optical
surfaces for high-contrast imaging requires a detailed accounting of
diffraction effects. Unlike simple Fraunhofer diffraction modeling, near and
far-field diffraction effects, such as the Talbot effect, are captured by
plane-to-plane propagation using Fresnel and angular-spectrum methods. This
approach requires a sequence of computationally intensive Fourier transforms
and quadratic phase functions, which limit the design and aberration
sensitivity parameter space that can be explored at high fidelity in the
course of coronagraph design. This study presents the results of optimizing the
multi-surface propagation module of the open source Physical Optics Propagation
in PYthon (POPPY) package. This optimization was performed by implementing and
benchmarking Fourier transforms and array operations on graphics processing
units, as well as optimizing multithreaded numerical calculations using the
NumExpr Python library where appropriate, to speed up the end-to-end simulation of
observatory and coronagraph optical systems. Using realistic systems, this
study demonstrates a greater than five-fold decrease in wall-clock runtime over
POPPY's previous implementation and describes opportunities for further
improvements in diffraction modeling performance.
Comment: Presented at SPIE ASTI 2018, Austin, Texas. 11 pages, 6 figures
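The kernel that POPPY chains many times is an FFT, a multiplication by a frequency-domain transfer function, and an inverse FFT. Below is a minimal NumPy sketch of one angular-spectrum propagation step; the grid size, wavelength, and propagation distance are illustrative assumptions, not values from the study.

```python
# A minimal NumPy sketch of one angular-spectrum propagation step, the
# operation POPPY chains many times (FFT -> transfer function -> inverse
# FFT). All numeric parameters below are illustrative assumptions.
import numpy as np

def angular_spectrum_propagate(field, wavelength, dx, z):
    """Propagate a complex field a distance z (all units in meters)."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=dx)                     # spatial frequencies
    fxx, fyy = np.meshgrid(fx, fx, indexing="ij")
    kz2 = (1.0 / wavelength) ** 2 - fxx**2 - fyy**2  # (k_z / 2pi)^2
    prop = kz2 > 0                                   # propagating modes only
    kz = 2 * np.pi * np.sqrt(np.where(prop, kz2, 0.0))
    transfer = np.where(prop, np.exp(1j * kz * z), 0.0)
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

# Example: propagate a 2 mm circular aperture by 0.1 m at 633 nm.
n, dx = 1024, 10e-6
y, x = np.mgrid[-n // 2:n // 2, -n // 2:n // 2] * dx
aperture = (np.hypot(x, y) < 1e-3).astype(complex)
out = angular_spectrum_propagate(aperture, 633e-9, dx, z=0.1)
```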
Application of graphics processing units to search pipelines for gravitational waves from coalescing binaries of compact objects
We report a novel application of a graphics processing unit (GPU) for the purpose of accelerating the search pipelines for gravitational waves from coalescing binaries of compact objects. A speed-up of 16-fold in total has been achieved with an NVIDIA GeForce 8800 Ultra GPU card compared with one core of a 2.5 GHz Intel Q9300 central processing unit (CPU). We show that substantial improvements are possible and discuss the reduction in CPU count required for the detection of inspiral sources afforded by the use of GPUs.
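The pipelines in question are matched-filtering searches: the detector strain is correlated against inspiral templates at every time shift, which is done efficiently with FFTs, the dominant cost the GPU accelerates. Below is a toy NumPy sketch under a white-noise assumption; the chirp-like template is an illustrative stand-in, not a physical inspiral waveform.

```python
# Toy matched filter under a white-noise assumption: FFT-based correlation
# of data against a template at every time shift, the core operation the
# GPU accelerates. The template is an illustrative chirp-like stand-in.
import numpy as np

def matched_filter_snr(data, template):
    """SNR time series of `template` against `data`, assuming white noise."""
    n = len(data)
    corr = np.fft.irfft(np.fft.rfft(data) * np.conj(np.fft.rfft(template)), n)
    return corr / np.sqrt(np.sum(template**2))  # unit SNR in pure noise

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 4096)
template = np.sin(2 * np.pi * (50 + 200 * t) * t) * np.exp(-4 * (1 - t))
data = rng.normal(size=t.size) + 3.0 * np.roll(template, 1234)  # hidden signal
snr = matched_filter_snr(data, template)
print(int(np.argmax(np.abs(snr))))  # recovers the 1234-sample time shift
```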
TensorFlow Doing HPC
TensorFlow is a popular emerging open-source programming framework supporting
the execution of distributed applications on heterogeneous hardware. While
TensorFlow was initially designed for developing Machine Learning (ML)
applications, it aims to support a much broader range of applications outside
the ML domain, potentially including HPC applications. However, very few
experiments have been
conducted to evaluate TensorFlow performance when running HPC workloads on
supercomputers. This work addresses this gap by implementing four traditional HPC
benchmark applications: STREAM, matrix-matrix multiply, a Conjugate Gradient (CG)
solver and a Fast Fourier Transform (FFT). We analyze their performance on two
supercomputers with accelerators and evaluate the potential of TensorFlow for
developing HPC applications. Our tests show that TensorFlow can fully take
advantage of high-performance networks and accelerators on supercomputers.
Running our TensorFlow STREAM benchmark, we obtain over 50% of theoretical
communication bandwidth on our testing platform. We find an approximately 2x,
1.7x and 1.8x performance improvement when increasing the number of GPUs from
two to four in the matrix-matrix multiply, CG and FFT applications,
respectively. All our performance results demonstrate that TensorFlow has high
potential to also emerge as an HPC programming framework for heterogeneous
supercomputers.
Comment: Accepted for publication at The Ninth International Workshop on
Accelerators and Hybrid Exascale Systems (AsHES'19)
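To illustrate what such a benchmark looks like at the framework level, here is a minimal TensorFlow 2 sketch of a STREAM-triad-style kernel (a = b + s*c). The array size, repetition count, and device string are assumptions; this is not the paper's benchmark code.

```python
# A STREAM-triad-style bandwidth test (a = b + s*c) expressed in
# TensorFlow 2. Array size, repetitions and device are assumptions.
import time
import tensorflow as tf

N = 1 << 26  # 64M float32 elements per array (256 MB each)
device = "/GPU:0" if tf.config.list_physical_devices("GPU") else "/CPU:0"

with tf.device(device):
    b = tf.random.uniform([N], dtype=tf.float32)
    c = tf.random.uniform([N], dtype=tf.float32)
    s = tf.constant(3.0)

@tf.function
def triad(b, c, s):
    return b + s * c  # touches three arrays: two reads, one write

triad(b, c, s)  # warm-up: trace, compile and first kernel launch
reps = 20
start = time.perf_counter()
for _ in range(reps):
    a = triad(b, c, s)
_ = a.numpy()  # synchronise with the device before stopping the clock
elapsed = time.perf_counter() - start
print(f"triad bandwidth ~ {3 * 4 * N * reps / elapsed / 1e9:.1f} GB/s")
```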
Landau Gauge Fixing on GPUs
In this paper we present and explore the performance of Landau gauge fixing
on GPUs using CUDA. We consider the steepest descent algorithm with Fourier
acceleration, and compare the GPU performance with a parallel CPU
implementation. For the lattice volumes studied, we find that the computational
power of a single Tesla C2070 GPU is equivalent to approximately 256 CPU cores.
Comment: 10 pages, 3 figures and 3 tables
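To convey the structure of the algorithm, here is a toy NumPy sketch of steepest-descent gauge fixing with Fourier acceleration for a 2D compact U(1) model; the paper itself works with SU(3) links in CUDA, and the lattice size, step size, and sweep count below are illustrative assumptions.

```python
# Toy 2D compact U(1) analogue of Fourier-accelerated steepest-descent
# Landau gauge fixing (the paper uses SU(3) links in CUDA). Each sweep:
# lattice divergence -> momentum-space preconditioning by p_max^2/p^2 ->
# gauge rotation. All parameters here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
L, alpha = 32, 0.08
theta = rng.uniform(-1.0, 1.0, size=(2, L, L))  # link angles theta_mu(x)

# Preconditioner p_max^2 / p^2 with the zero mode projected out.
k = np.fft.fftfreq(L)
phat2 = 4 * (np.sin(np.pi * k)[:, None] ** 2 + np.sin(np.pi * k)[None, :] ** 2)
precond = phat2.max() / np.where(phat2 > 0, phat2, np.inf)

for sweep in range(200):
    # Divergence Delta(x) = sum_mu [A_mu(x) - A_mu(x-mu)], with A ~ sin(theta).
    a = np.sin(theta)
    div = sum(a[mu] - np.roll(a[mu], 1, axis=mu) for mu in range(2))
    # Fourier-accelerated gauge rotation phi = -alpha F^-1[precond * F[div]].
    phi = -alpha * np.fft.ifft2(precond * np.fft.fft2(div)).real
    for mu in range(2):
        theta[mu] += phi - np.roll(phi, -1, axis=mu)  # theta += phi(x) - phi(x+mu)

print("final |div|^2 per site:", float((div**2).mean()))
```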