Search CORE

210 research outputs found

Pragmatic Performance Portability with OpenMP 4.x

Author: Gaudin Wayne
Martineau Matt
McIntosh-Smith Simon
Price James
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016
Field of study

Crossref

Explore Bristol Research

Learning from the Success of MPI

Author: A. Geist
A. Skjellum
C.H. Koelbel
J. Boyle
J. Cownie
J. Dongarra
J.L. Traeff
K. Krechmer
Message Passing Interface Forum
Message Passing Interface Forum MPI2
N. Carriero
O. Zaki
P.B. Hansen
R. Hempel
R.C. Whaley
R.W. Numrich
W. Gropp
W. Gropp
W.W. Carlson
Publication venue
Publication date: 01/01/2001
Field of study

The Message Passing Interface (MPI) has been extremely successful as a portable way to program high-performance parallel computers. This success has occurred in spite of the view of many that message passing is difficult and that other approaches, including automatic parallelization and directive-based parallelism, are easier to use. This paper argues that MPI has succeeded because it addresses all of the important issues in providing a parallel programming model.Comment: 12 pages, 1 figur

arXiv.org e-Print Archive

CiteSeerX

Crossref

UNT Digital Library

CoreTSAR: Task Scheduling for Accelerator-aware Runtimes

Author: de Supinski Bronis R.
Feng Wu-chun
Rountree Barry
Scogland Thomas R. W.
Publication venue
Publication date: 01/01/2012
Field of study

Heterogeneous supercomputers that incorporate computational accelerators such as GPUs are increasingly popular due to their high peak performance, energy efficiency and comparatively low cost. Unfortunately, the programming models and frameworks designed to extract performance from all computational units still lack the flexibility of their CPU-only counterparts. Accelerated OpenMP improves this situation by supporting natural migration of OpenMP code from CPUs to a GPU. However, these implementations currently lose one of OpenMP’s best features, its flexibility: typical OpenMP applications can run on any number of CPUs. GPU implementations do not transparently employ multiple GPUs on a node or a mix of GPUs and CPUs. To address these shortcomings, we present CoreTSAR, our runtime library for dynamically scheduling tasks across heterogeneous resources, and propose straightforward extensions that incorporate this functionality into Accelerated OpenMP. We show that our approach can provide nearly linear speedup to four GPUs over only using CPUs or one GPU while increasing the overall flexibility of Accelerated OpenMP

Computer Science Technical Reports @Virginia Tech

Tracking Performance Portability on the Yellow Brick Road to Exascale

Author: Deakin Tom J
Lin Tom
Mcintosh-Smith Simon N
Poenaru Andrei
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/11/2020
Field of study

Explore Bristol Research

Libpsht - algorithms for efficient spherical harmonic transforms

Author: Bluestein
Doroshkevich
Driscoll
Frigo
Górski
Huffenberger
Kostelec
Lewis
M. Reinecke
Prézeau
Reinecke
Wiaux
Publication venue: 'EDP Sciences'
Publication date: 07/01/2011
Field of study

Libpsht (or "library for Performant Spherical Harmonic Transforms") is a collection of algorithms for efficient conversion between spatial-domain and spectral-domain representations of data defined on the sphere. The package supports transforms of scalars as well as spin-1 and spin-2 quantities, and can be used for a wide range of pixelisations (including HEALPix, GLESP and ECP). It will take advantage of hardware features like multiple processor cores and floating-point vector operations, if available. Even without this additional acceleration, the employed algorithms are among the most efficient (in terms of CPU time as well as memory consumption) currently being used in the astronomical community. The library is written in strictly standard-conforming C90, ensuring portability to many different hard- and software platforms, and allowing straightforward integration with codes written in various programming languages like C, C++, Fortran, Python etc. Libpsht is distributed under the terms of the GNU General Public License (GPL) version 2 and can be downloaded from http://sourceforge.net/projects/libpsht.Comment: 9 pages, 8 figures, accepted by A&

arXiv.org e-Print Archive

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

MPG.PuRe

Libsharp - spherical harmonic transforms revisited

Author: Reinecke Martin
Seljebotn Dag Sverre
Publication venue: 'EDP Sciences'
Publication date: 22/04/2013
Field of study

We present libsharp, a code library for spherical harmonic transforms (SHTs), which evolved from the libpsht library, addressing several of its shortcomings, such as adding MPI support for distributed memory systems and SHTs of fields with arbitrary spin, but also supporting new developments in CPU instruction sets like the Advanced Vector Extensions (AVX) or fused multiply-accumulate (FMA) instructions. The library is implemented in portable C99 and provides an interface that can be easily accessed from other programming languages such as C++, Fortran, Python etc. Generally, libsharp's performance is at least on par with that of its predecessor; however, significant improvements were made to the algorithms for scalar SHTs, which are roughly twice as fast when using the same CPU capabilities. The library is available at http://sourceforge.net/projects/libsharp/ under the terms of the GNU General Public License

arXiv.org e-Print Archive

CiteSeerX

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

MPG.PuRe

On the Porting and Optimisation of Physics Simulations for Heterogeneous Parallel Processors

Author: Martineau Matt J
Publication venue
Publication date: 25/06/2019
Field of study

Explore Bristol Research