699 research outputs found

Recommended from our members

Preparing sparse solvers for exascale computing.

Authors: Anzt Hartwig; Boman Erik; Curfman McInnes Lois; Falgout Rob; Ghysels Pieter; Heroux Michael; Li Xiaoye; Meier Yang Ulrike; Rajamanickam Sivasankaran; Rupp Karl; Smith Barry; Tran Mills Richard; Yamazaki Ichitaro
Publication venue: eScholarship, University of California
Publication date: 01/03/2020

Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing Project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current successes and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.

eScholarship - University of California

A Modular Approach to Performance, Portability and Productivity for 3D Wave Models

Authors: Bilbao Stefan; Dubach Christophe; Gray Alan; Steuwer Michel; Stoltzfus Larisa
Publication date: 01/01/2017

No abstract available

Edinburgh Research Explorer

Enlighten

A Language and Hardware Independent Approach to Quantum-Classical Computing

Authors: Chen Mengsu; Dumitrescu Eugene F.; Feng Wu-chun; Humble Travis S.; Liakh Dmitry; McCaskey Alexander J.
Publication date: 01/01/2018

Heterogeneous high-performance computing (HPC) systems offer novel architectures which accelerate specific workloads through judicious use of specialized coprocessors. A promising architectural approach for future scientific computations is provided by heterogeneous HPC systems integrating quantum processing units (QPUs). To this end, we present XACC (eXtreme-scale ACCelerator), a programming model and software framework that enables quantum acceleration within standard or HPC software workflows. XACC follows a coprocessor machine model that is independent of the underlying quantum computing hardware, thereby enabling quantum programs to be defined and executed on a variety of QPU types through a unified application programming interface. Moreover, XACC defines a polymorphic low-level intermediate representation and an extensible compiler frontend that enables language-independent quantum programming, thus promoting integration and interoperability across the quantum programming landscape. In this work we define the software architecture enabling our hardware- and language-independent approach, and demonstrate its usefulness across a range of quantum computing models through illustrative examples involving the compilation and execution of gate and annealing-based quantum programs.

arXiv.org e-Print Archive

Directory of Open Access Journals

Development of a low-level, algebra-based library to provide platform portability on hybrid supercomputers

Authors: Alsalti Baldellou Àdel; Colomer Rey Guillem; Gorobets Andrei; Oliva Llena Asensio; Trias Miquel Francesc Xavier; Álvarez Farré Xavier
Publication venue: Scipedia
Publication date: 01/01/2022

Continuous enhancement in hardware technologies enables scientific computing to advance incessantly and reach further aims. Since the start of the global race for exascale high-performance computing, massively-parallel devices of various architectures have been incorporated into the newest supercomputers, leading to an increasing hybridization of compute nodes. In this context of accelerated innovation, software portability and efficiency become crucial. Traditionally, scientific computing software development using mesh methods is based on calculations in iterative stencil loops over a discretized geometry, the mesh. Despite being intuitive and versatile, the interdependency between algorithms and their computational implementations in stencil applications usually results in a large number of subroutines and introduces an inevitable complexity when it comes to portability and sustainability. An alternative is to break the interdependency between the algorithm and its implementation, and then to cast the calculations into a minimalist set of kernels. Algebra-based implementations rely on a reduced set of basic linear algebra subroutines, which simplifies the deployment of software in hybrid computing systems. In this work, we tackle the development of a fully-portable, algebraic library that can be coupled beneath other high-level, algebra-oriented frameworks. Namely, this library provides platform portability in the simplest possible manner (i.e., the user develops applications in a purely sequential style). Internally, algebraic objects are distributed among computing devices using a multilevel decomposition approach. Data exchanges between computing units or between nodes are hidden by a multithreaded overlapping scheme.

The work of X.A.F., A.A.B., A.O., and F.X.T. has been financially supported by the following R+D projects: RETOtwin (PDC2021-120970-I00), given by MCIN/AEI/10.13039/501100011033 and European Union Next Generation EU/PRTR, and FusionCAT (001-P-001722), given by Generalitat de Catalunya RIS3CAT-FEDER. X.A.F. has also been supported by a predoctoral contract (2019FI B2-00076) by the Government of Catalonia. A.A.B. has also been supported by the predoctoral grants DIN2018-010061 and 2019-DI-90, given by MCIN/AEI/10.13039/501100011033 and the Catalan Agency for Management of University and Research Grants (AGAUR), respectively. The work of A.G. has been funded by the Russian Science Foundation, project 19-11-00299. The studies of this work have been carried out using computational resources of the Barcelona Supercomputing Center (IM-2020-3-0030 and IM-2022-1-0015). The authors thankfully acknowledge these institutions.

Peer Reviewed. Postprint (published version).
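
The abstract above contrasts stencil loops with algebra-based kernels. As a rough illustration only (not code from the library described), the sketch below applies a 1D three-point stencil once as a hand-written loop and once as a generic sparse matrix-vector product in CSR format. All function names are hypothetical, and the 1D operator and zero-valued boundary treatment are assumptions made to keep the example small.

```python
# Sketch: a 1D three-point stencil expressed two ways.
# All names here (stencil_loop, build_csr, spmv) are illustrative,
# not taken from the library described in the abstract.

def stencil_loop(u):
    """Apply the stencil (1, -2, 1), treating values outside the domain as zero."""
    n = len(u)
    out = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out[i] = left - 2.0 * u[i] + right
    return out


def build_csr(n):
    """Assemble the same operator as a sparse matrix in CSR form."""
    values, col_idx, row_ptr = [], [], [0]
    for i in range(n):
        for j, coeff in ((i - 1, 1.0), (i, -2.0), (i + 1, 1.0)):
            if 0 <= j < n:  # drop entries that fall outside the domain
                values.append(coeff)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def spmv(values, col_idx, row_ptr, x):
    """Generic sparse matrix-vector product y = A x; knows nothing about stencils."""
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


u = [0.0, 1.0, 4.0, 9.0, 16.0]
A = build_csr(len(u))
# Both formulations give the same result on this input.
assert spmv(*A, u) == stencil_loop(u)
```

In this formulation only `spmv` would need a device-specific implementation; the operator itself becomes data, which is the portability argument the abstract makes.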

UPCommons. Portal del coneixement obert de la UPC

Towards Accelerating High-Order Stencils on Modern GPUs and Emerging Architectures with a Portable Framework

Authors: Araya-Polo Mauricio; Mellor-Crummey John; Sai Ryuichi; Xu Jinfan
Publication date: 08/09/2023

PDE discretization schemes yielding stencil-like computing patterns are commonly used for seismic modeling, weather forecasting, and other scientific applications. Achieving HPC-level stencil computations on one architecture is challenging, and porting to other architectures without sacrificing performance requires significant effort, especially in this golden age of many distinctive architectures. To help developers achieve performance, portability, and productivity with stencil computations, we developed StencilPy. With StencilPy, developers write stencil computations in a high-level domain-specific language, which promotes productivity, while its backends generate efficient code for existing and emerging architectures, including NVIDIA, AMD, and Intel GPUs, A64FX, and STX. StencilPy demonstrates promising performance results on par with hand-written code, maintains cross-architectural performance portability, and enhances productivity. Its modular design enables easy configuration, customization, and extension. A 25-point star-shaped stencil written in StencilPy is one-quarter of the length of a hand-crafted CUDA code and achieves similar performance on an NVIDIA H100 GPU.
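
For readers unfamiliar with the term, a 25-point star-shaped stencil in 3D reads the centre point plus four neighbours in each direction along each of the three axes (1 + 3 × 2 × 4 = 25). The sketch below is plain Python, not StencilPy code; the function names are hypothetical and the weights are placeholders chosen for illustration, not real finite-difference coefficients.

```python
# Sketch of a 25-point star-shaped stencil in 3D (radius 4 along each axis).
# Plain Python for clarity; this is not StencilPy code, and the weights
# below are placeholders rather than real finite-difference coefficients.

RADIUS = 4


def star_offsets(radius):
    """Return the centre plus `radius` neighbours in each direction along x, y, z."""
    offsets = [(0, 0, 0)]
    for axis in range(3):
        for d in range(1, radius + 1):
            for sign in (-1, 1):
                off = [0, 0, 0]
                off[axis] = sign * d
                offsets.append(tuple(off))
    return offsets


def apply_star(grid, weights, radius=RADIUS):
    """Apply the stencil to every interior point; other points are left at 0.0."""
    n = len(grid)  # assumes a cubic n x n x n grid stored as nested lists
    offsets = star_offsets(radius)
    out = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for i in range(radius, n - radius):
        for j in range(radius, n - radius):
            for k in range(radius, n - radius):
                acc = 0.0
                for di, dj, dk in offsets:
                    dist = abs(di) + abs(dj) + abs(dk)  # distance from the centre
                    acc += weights[dist] * grid[i + di][j + dj][k + dk]
                out[i][j][k] = acc
    return out


n = 9  # smallest cube with one interior point for radius 4
grid = [[[1.0] * n for _ in range(n)] for _ in range(n)]
weights = [-24.0, 1.0, 1.0, 1.0, 1.0]  # placeholder weights indexed by distance
result = apply_star(grid, weights)
```

With these placeholder weights summing to zero over the 25 points, a constant input field maps to zero at the interior point, which is a quick sanity check on the offset pattern.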

arXiv.org e-Print Archive