Parallel Spherical Harmonic Transforms on heterogeneous architectures (GPUs/multi-core CPUs)
Spherical Harmonic Transforms (SHT) are at the heart of many scientific and
practical applications ranging from climate modelling to cosmological
observations. In many of these areas new, cutting-edge science goals have
recently been proposed, requiring simulations and analyses of experimental or
observational data at very high resolutions and of unprecedented volumes. Both
these aspects pose a formidable challenge for the currently existing
implementations of the transforms.
This paper describes parallel algorithms for computing SHT with two variants
of intra-node parallelism appropriate for novel supercomputer architectures,
multi-core processors and Graphics Processing Units (GPUs). It also discusses
their performance, alone and embedded within a top-level, MPI-based
parallelisation layer ported from the S2HAT library, in terms of their
accuracy, overall efficiency and scalability. We show that our inverse SHT run
on GeForce 400 Series GPUs equipped with the latest CUDA architecture ("Fermi")
outperforms the state-of-the-art implementation for a multi-core processor
executed on a current Intel Core i7-2600K. Furthermore, we show that an
MPI/CUDA version of the inverse transform run on a cluster of 128 Nvidia Tesla
S1070 is as much as 3 times faster than the hybrid MPI/OpenMP version executed
on the same number of quad-core Intel Nehalem processors for problem sizes
motivated by our target applications. The performance of the direct transforms
is, however, found to be at best comparable in these cases. We discuss in
detail the algorithmic solutions devised for the major steps involved in the
transform calculation, emphasising those with a major impact on their overall
performance, and elucidate the sources of the dichotomy between the direct and
the inverse operations.
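The structure that the paper parallelises can be illustrated with a minimal, brute-force inverse (synthesis) SHT. This is an illustrative sketch only, not the S2HAT or CUDA implementation described above: it naively sums a_lm Y_lm over all (l, m), whereas a production code would use Legendre recurrences per latitude ring and an FFT over m.

```python
# Hedged sketch of an inverse (synthesis) SHT: f(theta, phi) =
# sum_{l<=lmax} sum_{|m|<=l} a_lm Y_lm(theta, phi). Brute force for clarity;
# real implementations split this into per-ring Legendre work plus FFTs.
import numpy as np
from scipy.special import sph_harm  # sph_harm(m, l, azimuth, polar)

def inverse_sht(alm, lmax, thetas, phis):
    """alm: dict mapping (l, m) -> coefficient; thetas: polar angles (rings);
    phis: azimuthal angles. Returns the complex map on the theta x phi grid."""
    f = np.zeros((len(thetas), len(phis)), dtype=complex)
    for l in range(lmax + 1):
        for m in range(-l, l + 1):
            a = alm.get((l, m), 0.0)
            if a == 0.0:
                continue
            for i, theta in enumerate(thetas):
                # one latitude ring at a time: all phis for fixed theta
                f[i, :] += a * sph_harm(m, l, phis, theta)
    return f
```

With only the monopole coefficient set, the synthesised map is the constant Y_00 = 1/(2*sqrt(pi)), a quick correctness check.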
Using hybrid GPU/CPU kernel splitting to accelerate spherical convolutions
We present a general method for accelerating by more than an order of
magnitude the convolution of pixelated functions on the sphere with a
radially-symmetric kernel. Our method splits the kernel into a compact
real-space component and a compact spherical harmonic space component. These
components can then be convolved in parallel using an inexpensive commodity GPU
and a CPU. We provide models for the computational cost of both real-space and
Fourier space convolutions and an estimate for the approximation error. Using
these models we can determine the optimum split that minimizes the wall clock
time for the convolution while satisfying the desired error bounds. We apply
this technique to the problem of simulating a cosmic microwave background (CMB)
anisotropy sky map at the resolution typical of the high resolution maps
produced by the Planck mission. For the main Planck CMB science channels we
achieve a speedup of over a factor of ten, assuming an acceptable fractional
rms error of order 1e-5 in the power spectrum of the output map.
Comment: 9 pages, 11 figures, 1 table; accepted by Astronomy & Computing with
minor revisions. arXiv admin note: substantial text overlap with
arXiv:1211.355
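The split-selection logic described above can be sketched compactly. Because the real-space part (GPU) and the harmonic-space part (CPU) run concurrently, the wall-clock time is the maximum of the two cost models, and the optimum is the feasible split minimising that maximum. The cost and error models below are placeholder callables, not the paper's calibrated models:

```python
# Hedged sketch: pick the kernel split that minimises wall-clock time
# subject to an error tolerance. cost_real / cost_harm / error are
# user-supplied models of the two convolution halves (assumed, illustrative).
def choose_split(candidates, cost_real, cost_harm, error, tol):
    feasible = [c for c in candidates if error(c) <= tol]
    # the two halves run in parallel (GPU + CPU), so wall time = max of the two
    return min(feasible, key=lambda c: max(cost_real(c), cost_harm(c)))
```

For symmetric toy models the optimum lands where the two halves balance, which is the intuition behind overlapping the GPU and CPU work.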
Multi-Architecture Monte-Carlo (MC) Simulation of Soft Coarse-Grained Polymeric Materials: SOft coarse grained Monte-carlo Acceleration (SOMA)
Multi-component polymer systems are important for the development of new
materials because of their ability to phase-separate or self-assemble into
nano-structures. The Single-Chain-in-Mean-Field (SCMF) algorithm in conjunction
with a soft, coarse-grained polymer model is an established technique to
investigate these soft-matter systems. Here we present an implementation of
this method: SOft coarse grained Monte-carlo Acceleration (SOMA). It is
suitable to simulate large system sizes with up to billions of particles, yet
versatile enough to study properties of different kinds of molecular
architectures and interactions. We achieve efficient simulations by
employing accelerators such as GPUs on both workstations and
supercomputers. The implementation remains flexible and maintainable
because it is written in a scientific programming language enhanced with
OpenACC pragmas for the accelerators. We present implementation details and
features of the program package, investigate the scalability of our
implementation SOMA, and discuss two applications, which cover system sizes
that are difficult to reach with other, common particle-based simulation
methods.
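The SCMF idea that makes this method accelerator-friendly can be sketched in a few lines. This is an illustrative toy, not SOMA's actual (OpenACC) code: beads move against a quasi-instantaneous mean field that is frozen for the duration of a cycle, so the per-bead Metropolis moves are independent and map naturally onto GPU threads. The `field` callable and step parameters are assumptions for the sketch:

```python
# Hedged sketch of one Single-Chain-in-Mean-Field (SCMF) Monte Carlo cycle.
# The density-derived field is held fixed within the cycle, which decouples
# the bead moves from each other (the source of the method's parallelism).
import numpy as np

rng = np.random.default_rng(0)

def scmf_cycle(positions, field, box, step=0.1, beta=1.0):
    """positions: (N, 3) bead coordinates; field: callable giving the
    mean-field energy at a position, frozen for this cycle; box: box length."""
    for i in range(len(positions)):
        trial = (positions[i] + rng.uniform(-step, step, 3)) % box
        dE = field(trial) - field(positions[i])
        # Metropolis acceptance against the frozen mean field
        if dE <= 0 or rng.random() < np.exp(-beta * dE):
            positions[i] = trial
    return positions
```

In the real algorithm the field is recomputed from the particle densities between cycles; here it is simply passed in to keep the sketch self-contained.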
PGAS-FMM: Implementing a distributed fast multipole method using the X10 programming language
The fast multipole method (FMM) is a complex, multi-stage algorithm over a distributed tree data structure, with multiple levels of parallelism and inherent data locality. X10 is a modern partitioned global address space language with support for asynchronous activities.
GPU-Based Data Processing for 2-D Microwave Imaging on MAST
The Synthetic Aperture Microwave Imaging (SAMI) diagnostic is a Mega Amp Spherical Tokamak (MAST) diagnostic based at Culham Centre for Fusion Energy. The acceleration of the SAMI diagnostic data-processing code by a graphics processing unit is presented, demonstrating acceleration of up to 60 times compared to the original IDL (Interactive Data Language) data-processing code. SAMI will now be capable of intershot processing allowing pseudo-real-time control so that adjustments and optimizations can be made between shots. Additionally, for the first time the analysis of many shots will be possible