9 research outputs found

GPU accelerated implementations of a generalized eigenvalue solver for Hermitian matrices with systematic energy and time to solution analysis

Author: Solcà Raffaele G.
Publication venue: ETH Zürich
Publication date: 01/01/2016

Repository for Publications and Research Data

A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks

Authors: Dongarra Jack; Haidar Azzam; Schulthess Thomas; Solcà Raffaele; Tomov Stanimire
Publication venue: SAGE Publications
Publication date: 30/08/2013

The adoption of hybrid CPU–GPU nodes in traditional supercomputing platforms such as the Cray-XK6 opens acceleration opportunities for electronic structure calculations in materials science and chemistry applications, where medium-sized generalized eigenvalue problems must be solved many times. These eigenvalue problems are too small to effectively solve on distributed systems, but can benefit from the massive computing power concentrated on a single-node, hybrid CPU–GPU system. However, hybrid systems call for the development of new algorithms that efficiently exploit heterogeneity and massive parallelism of not just GPUs, but of multicore/manycore CPUs as well. Addressing these demands, we developed a generalized eigensolver featuring novel algorithms of increased computational intensity (compared with the standard algorithms), decomposition of the computation into fine-grained memory aware tasks, and their hybrid execution. The resulting eigensolvers are state-of-the-art in high-performance computing, significantly outperforming existing libraries. We describe the algorithm and analyze its performance impact on applications of interest when different fractions of eigenvectors are needed by the host electronic structure code.

Crossref

The University of Manchester - Institutional Repository

Performance optimizations for porting the openQ*D package to GPUs

Authors: Gruber Roman; Kozhevnikov Anton; Marinkovic Marina; Schulthess Thomas C.; Solcà Raffaele
Publication venue: Sissa Medialab
Publication date: 01/01/2022

The openQ*D code has been used by the RC* collaboration for the generation of fully dynamical QCD+QED gauge configurations with C* boundary conditions. In this talk, optimization of solvers provided with the openQ*D package relevant for porting the code to GPU-accelerated supercomputing platforms is discussed. We present the analysis of the current implementations of the GCR solver preconditioned with the Schwarz alternating procedure for ill-conditioned Dirac operators. With the goal of enabling support for GPUs from various vendors, a novel method of adaptive CPU/GPU-hybrid implementation is proposed. ISSN: 1824-803

Repository for Publications and Research Data

Red-Blue Pebbling Revisited: Near Optimal Parallel Matrix-Matrix Multiplication

Authors: Besta Maciej; Hoefler Torsten; Kabić Marko; Kwasniewski Grzegorz; Solcà Raffaele; VandeVondele Joost
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/11/2019

We propose COSMA: a parallel matrix-matrix multiplication algorithm that is near communication-optimal for all combinations of matrix dimensions, processor counts, and memory sizes. The key idea behind COSMA is to derive an optimal (up to a factor of 0.03% for 10MB of fast memory) sequential schedule and then parallelize it, preserving I/O optimality. To achieve this, we use the red-blue pebble game to precisely model MMM dependencies and derive constructive and tight sequential and parallel I/O lower bound proofs. Compared to 2D or 3D algorithms, which fix processor decomposition upfront and then map it to the matrix dimensions, it reduces communication volume by up to √3 times. COSMA outperforms the established ScaLAPACK, CARMA, and CTF algorithms in all scenarios up to 12.8x (2.2x on average), achieving up to 88% of Piz Daint's peak performance. Our work does not require any hand tuning and is maintained as an open source implementation.

arXiv.org e-Print Archive

Repository for Publications and Research Data

Crossref

DCA++ project: Sustainable and scalable development of a high-performance research code

Authors: Balduzzi Giovanni; Doak Peter W.; Hähner Urs R.; Maier Thomas A.; Schulthess Thomas C.; Solcà Raffaele
Publication venue: IOP Publishing
Publication date: 01/01/2019

Scientific discoveries across all fields, from physics to biology, are increasingly driven by computer simulations. At the same time, the computational demand of many problems necessitates large-scale calculations on high-performance supercomputers. Developing and maintaining the underlying codes, however, has become a challenging task due to a combination of factors. Leadership computer systems require massive parallelism, while their architectures are diversifying. New sophisticated algorithms are continuously developed and have to be implemented efficiently for such complex systems. Finally, the multidisciplinary nature of modern science involves large, changing teams to work on a given codebase. Using the example of the DCA++ project, a highly scalable and efficient research code to solve quantum many-body problems, we explore how computational science can overcome these challenges by adopting modern software engineering approaches. We present our principles for scientific software development and describe concrete practices to meet them, adapted from agile software development frameworks. ISSN: 1742-6588, 1742-659

Repository for Publications and Research Data

DCA++: A software framework to solve correlated electron problems with modern quantum cluster methods

Authors: Alvarez Gonzalo; Bulling Urs; Maier Thomas A.; Schulthess Thomas C.; Solcà Raffaele; Staar Peter; Summers Michael S.
Publication venue: Elsevier BV
Publication date: 01/01/2020

We present the first open release of the DCA++ project, a high-performance research software framework to solve quantum many-body problems with cutting edge quantum cluster algorithms. DCA++ implements the dynamical cluster approximation (DCA) and its DCA+ extension with a continuous self-energy. The algorithms capture nonlocal correlations in strongly correlated electron systems, thereby giving insight into high-Tc superconductivity. The code's scalability allows efficient usage of systems at all scales, from workstations to leadership computers. With regard to the increasing heterogeneity of modern computing machines, DCA++ provides portable performance on conventional and emerging new architectures, such as hybrid CPU–GPU, sustaining multiple petaflops on ORNL's Titan and CSCS’ Piz Daint supercomputers. Moreover, we show how sustainable and scalable development of the code base has been achieved by adopting standard techniques of the software industry. These include employing a distributed version control system, applying test-driven development and following continuous integration. ISSN: 0010-4655, 1879-294

Repository for Publications and Research Data