Hybrid CPU-GPU generation of the Hamiltonian and overlap matrices in FLAPW methods
In this paper we focus on the integration of high-performance numerical libraries in ab initio codes and the portability of performance and scalability. The target of our work is FLEUR, a software for electronic structure calculations developed at Forschungszentrum Jülich over the course of two decades. The presented work follows up on a previous effort to modernize legacy code by re-engineering and rewriting it in terms of highly optimized libraries. We illustrate how this initial effort to obtain efficient and portable shared-memory code enables fast porting of the code to emerging heterogeneous architectures. More specifically, we port the code to nodes equipped with multiple GPUs. We divide our study into two parts. First, we show considerable speedups attained by minor and relatively straightforward code changes that off-load parts of the computation to the GPUs. Then, we identify further possible improvements to achieve even higher performance and scalability. On a system consisting of 16 cores and 2 GPUs, we observe speedups of up to 5x with respect to our optimized shared-memory code, which in turn means between 7.5x and 12.5x speedup with respect to the original FLEUR code.
Accelerating the computation of FLAPW methods on heterogeneous architectures
Legacy codes in computational science and engineering have been very successful in providing essential functionality to researchers. However, they are not capable of exploiting the massive parallelism provided by emerging heterogeneous architectures. The lack of portable performance and scalability puts them at high risk: either they evolve or they are destined to be executed on older platforms and small clusters. One example of a legacy code which would heavily benefit from a modern redesign is FLEUR, a software for electronic structure calculations. In previous work, the computational bottleneck of FLEUR was partially re-engineered to have a modular design that relies on standard building blocks, namely the BLAS and LAPACK libraries. In this paper, we demonstrate how the initial redesign enables portability to heterogeneous architectures. More specifically, we study different approaches to port the code to architectures consisting of multi-core CPUs equipped with one or more coprocessors such as Nvidia GPUs and Intel Xeon Phis. Our final code attains over 70% of the architectures' peak performance and outperforms Nvidia's and Intel's libraries. On JURECA, the large tier-0 cluster where FLEUR is often executed, the code takes advantage of the full power of the computing nodes, attaining 5x speedup over the sole use of the CPUs.
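The off-loading pattern both FLEUR papers describe, splitting a large matrix product between host and accelerator and running the two parts concurrently, can be sketched in plain Python. This is an illustration, not the papers' code: the "GPU" worker below is a stand-in (a real port would call cuBLAS or a similar library), and both workers share one naive kernel so the example stays self-contained.

```python
from concurrent.futures import ThreadPoolExecutor

def gemm(a, b):
    # naive matrix product; a real code would call an optimized BLAS here
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def hybrid_gemm(a, b, gpu_fraction=0.75):
    # split A's rows: the first chunk is "off-loaded" (stand-in for a GPU
    # kernel), the rest stays on the CPU, and both run concurrently,
    # mirroring the CPU-GPU overlap described in the abstracts
    cut = int(len(a) * gpu_fraction)
    with ThreadPoolExecutor(max_workers=2) as pool:
        gpu_part = pool.submit(gemm, a[:cut], b)
        cpu_part = pool.submit(gemm, a[cut:], b)
        return gpu_part.result() + cpu_part.result()
```

The `gpu_fraction` knob is where the tuning described in both abstracts lives: balancing the split so neither device idles while the other finishes.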
The LAPW method with eigendecomposition based on the Hari--Zimmermann generalized hyperbolic SVD
In this paper we propose an accurate, highly parallel algorithm for the generalized eigendecomposition of a matrix pair $(A, B)$, given in the factored form $(F^{\ast} J F, G^{\ast} G)$. Matrices $A$ and $B$ are generally complex and Hermitian, and $B$ is positive definite. Matrices of this type emerge from the representation of the Hamiltonian of a quantum mechanical system in terms of an overcomplete set of basis functions. This expansion is part of a class of models within the broad field of Density Functional Theory, which is considered the gold standard in condensed matter physics. The overall algorithm consists of four phases, the second and the fourth being optional, where the last two phases are the computation of the generalized hyperbolic SVD of a complex matrix pair $(F, G)$, according to a given matrix $J$ defining the hyperbolic scalar product. If $J = I$, then these two phases compute the GSVD in parallel very accurately and efficiently.

Comment: The supplementary material is available at https://web.math.pmf.unizg.hr/mfbda/papers/sm-SISC.pdf due to its size. This revised manuscript is currently being considered for publication.
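To make the connection in the abstract concrete, the following is a sketch in our own notation (a hedged reconstruction, not a quotation from the paper) of how a hyperbolic SVD of the factors solves the eigenproblem of the pair:

```latex
\[
  A x = \lambda B x, \qquad (A, B) = (F^{\ast} J F,\; G^{\ast} G).
\]
If a generalized hyperbolic SVD of $(F, G)$ with respect to $J$ yields a
nonsingular $X$ with
\[
  F X = U \Sigma_F \quad (U^{\ast} J U = J), \qquad
  G X = V \Sigma_G \quad (V^{\ast} V = I),
\]
and $\Sigma_F, \Sigma_G$ diagonal, then by congruence
\[
  X^{\ast} A X = \Sigma_F^{\ast} J \Sigma_F, \qquad
  X^{\ast} B X = \Sigma_G^{\ast} \Sigma_G,
\]
both diagonal, so the columns of $X$ are generalized eigenvectors with
$\lambda_i = j_i\, \sigma_{F,i}^2 / \sigma_{G,i}^2$, where $j_i = \pm 1$ are
the diagonal entries of $J$. For $J = I$ this reduces to the ordinary GSVD
and all $\lambda_i \ge 0$.
```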
GPAW: open Python package for electronic-structure calculations
We review the GPAW open-source Python package for electronic structure
calculations. GPAW is based on the projector-augmented wave method and can
solve the self-consistent density functional theory (DFT) equations using three
different wave-function representations, namely real-space grids, plane waves,
and numerical atomic orbitals. The three representations are complementary and
mutually independent and can be connected by transformations via the real-space
grid. This multi-basis feature renders GPAW highly versatile and unique among
similar codes. By virtue of its modular structure, the GPAW code constitutes an
ideal platform for implementation of new features and methodologies. Moreover,
it is well integrated with the Atomic Simulation Environment (ASE) providing a
flexible and dynamic user interface. In addition to ground-state DFT
calculations, GPAW supports many-body GW band structures, optical excitations
from the Bethe-Salpeter Equation (BSE), variational calculations of excited
states in molecules and solids via direct optimization, and real-time
propagation of the Kohn-Sham equations within time-dependent DFT. A range of
more advanced methods to describe magnetic excitations and non-collinear
magnetism in solids are also now available. In addition, GPAW can calculate
non-linear optical tensors of solids, charged crystal point defects, and much
more. Recently, support of GPU acceleration has been achieved with minor
modifications of the GPAW code thanks to the CuPy library. We end the review
with an outlook describing some future plans for GPAW.
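The CuPy-based GPU support mentioned above is possible because CuPy mirrors NumPy's array API, so an array backend can be chosen once and the rest of the code left unchanged. A minimal sketch of that pattern follows; this is not GPAW's actual code, and the import-fallback chain is our assumption for machines without a GPU.

```python
# pick an array backend once; everything downstream is backend-agnostic
try:
    import cupy as xp   # GPU arrays, if CuPy and a device are available
except ImportError:
    import numpy as xp  # CPU fallback exposing the same API

def normalize(v):
    # runs unchanged on NumPy and CuPy arrays
    return v / xp.linalg.norm(v)
```

Because both libraries expose the same functions under `xp`, "minor modifications" in the sense of the review largely means routing array creation and linear algebra through such a backend handle.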
Commemorative Issue in Honor of Professor Karlheinz Schwarz on the Occasion of His 80th Birthday
A collection of 18 scientific papers written in honor of Professor Karlheinz Schwarz's 80th birthday. The main topics include spectroscopy, excited states, DFT developments, results analysis, solid states, and surfaces.
Performance Modeling and Prediction for Dense Linear Algebra
This dissertation introduces measurement-based performance modeling and
prediction techniques for dense linear algebra algorithms. As a core principle,
these techniques avoid executions of such algorithms entirely, and instead
predict their performance through runtime estimates for the underlying compute
kernels. For a variety of operations, these predictions make it possible to quickly select
the fastest algorithm configurations from available alternatives. We consider
two scenarios that cover a wide range of computations:
To predict the performance of blocked algorithms, we design
algorithm-independent performance models for kernel operations that are
generated automatically once per platform. For various matrix operations,
instantaneous predictions based on such models both accurately identify the
fastest algorithm, and select a near-optimal block size.
For performance predictions of BLAS-based tensor contractions, we propose
cache-aware micro-benchmarks that take advantage of the highly regular
structure inherent to contraction algorithms. At merely a fraction of a
contraction's runtime, predictions based on such micro-benchmarks identify the
fastest combination of tensor traversal and compute kernel.
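The core idea of the dissertation, predicting a kernel's runtime from small measurements instead of executing the full algorithm, can be illustrated with a toy model. The cubic default below is our assumption (it matches a GEMM-like kernel); the dissertation's models are considerably more refined than this single-exponent extrapolation.

```python
import time

def measure(kernel, n, reps=3):
    # best-of-reps wall-clock time for the kernel at problem size n
    best = float("inf")
    for _ in range(reps):
        start = time.perf_counter()
        kernel(n)
        best = min(best, time.perf_counter() - start)
    return best

def predict(kernel, n_small, n_target, exponent=3):
    # measurement-based prediction: run only a small instance, then scale
    # by the asymptotic cost ratio -- the large instance never executes
    return measure(kernel, n_small) * (n_target / n_small) ** exponent

def pick_fastest(kernels, n_small, n_target, exponent=3):
    # model-based selection among alternative kernels/configurations,
    # at a fraction of the cost of timing each one at full size
    return min(kernels, key=lambda k: predict(k, n_small, n_target, exponent))
```

`pick_fastest` is the payoff described in the abstract: choosing among algorithm configurations from predictions alone, without ever running the candidates at the target size.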