Benefits from using mixed precision computations in the ELPA-AEO and ESSEX-II eigensolver projects

We first briefly report on the status and recent achievements of the ELPA-AEO (Eigenvalue Solvers for Petaflop Applications - Algorithmic Extensions and Optimizations) and ESSEX II (Equipping Sparse Solvers for Exascale) projects. In both collaborative efforts, scientists from the application areas, mathematicians, and computer scientists work together to develop and make available efficient, highly parallel methods for the solution of eigenvalue problems. Then we focus on a topic addressed in both projects: the use of mixed precision computations to enhance efficiency. We give a more detailed description of our approaches for benefiting from either lower or higher precision in three selected contexts, and of the results thus obtained.
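As a rough illustration of the general idea of mixing precisions in an eigensolver (not the specific approaches of the ELPA-AEO or ESSEX-II projects, which this abstract does not spell out), the sketch below runs the expensive eigensolve in single precision and then re-evaluates the eigenvalues as Rayleigh quotients in double precision. The function names `eigh_mixed_precision` and `make_test_matrix` are hypothetical, and the test matrix with eigenvalues 1..n is an assumption made only so that the exact answer is known.

```python
import numpy as np


def eigh_mixed_precision(A):
    """Solve a symmetric eigenproblem in float32, then refine eigenvalues in float64.

    Returns (w_low, w_refined, V): the raw single-precision eigenvalues,
    the Rayleigh-quotient-refined eigenvalues, and the eigenvectors.
    """
    A64 = np.asarray(A, dtype=np.float64)
    # Cheap stage: full symmetric eigensolve in single precision.
    w32, V32 = np.linalg.eigh(A64.astype(np.float32))
    # Refinement stage: Rayleigh quotients v^T A v / v^T v in double precision.
    V = V32.astype(np.float64)
    AV = A64 @ V
    w_refined = np.sum(V * AV, axis=0) / np.sum(V * V, axis=0)
    return w32.astype(np.float64), w_refined, V


def make_test_matrix(n, seed=0):
    """Build a symmetric matrix whose exact eigenvalues are 1, 2, ..., n (assumption for the demo)."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    exact = np.arange(1.0, n + 1.0)
    A = (Q * exact) @ Q.T          # Q @ diag(exact) @ Q.T
    return 0.5 * (A + A.T), exact  # symmetrize away rounding noise


if __name__ == "__main__":
    A, exact = make_test_matrix(20)
    w_low, w_ref, _ = eigh_mixed_precision(A)
    print("max error, float32 eigenvalues:", np.max(np.abs(w_low - exact)))
    print("max error, refined eigenvalues:", np.max(np.abs(w_ref - exact)))
```

The refinement costs one extra matrix-matrix product in double precision, which is small compared with the eigensolve itself; whether that trade-off pays off in practice depends on the hardware and the accuracy required.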

arXiv.org e-Print Archive

Institute of Transport Research:Publications

MPG.PuRe

Performance analysis and comparison of parallel eigensolvers on Blue Gene architectures

Author: Münchhalfen Jan Felix
Publication venue: Forschungszentrum Jülich GmbH Zentralbibliothek, Verlag
Publication date: 01/01/2013

The solution of eigenproblems with dense, symmetric system matrices is a core task in many fields of computational science and engineering. As the problem complexity, and thus the size of the matrices involved, increases, the use of distributed-memory supercomputer architectures and parallel algorithms becomes inevitable. Nearly all modern eigensolver algorithms implement a tridiagonal reduction of the system matrix and a subsequent solution of the tridiagonalized eigenproblem. Additionally, back transformation of the eigenvectors is required if these are of interest. In the context of this thesis, implementations of two fundamentally different approaches to the parallel solution of eigenproblems were benchmarked, reviewed, and compared with particular regard to their performance on the Blue Gene/P and Blue Gene/Q supercomputers JUGENE and JUQUEEN at the Forschungszentrum Jülich: ELPA, which implements an optimized version of the divide-and-conquer algorithm, and Elemental, which utilizes the PMRRR implementation of the MR3 algorithm. ELPA features two fundamentally different kinds of tridiagonalization, the standard one-stage approach and a two-stage approach. The comparison of the two-stage reduction with the direct reduction was a primary concern in the performance analysis.
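The three phases named in this abstract (tridiagonal reduction, solution of the tridiagonal eigenproblem, back transformation of the eigenvectors) can be sketched serially as follows. This is a minimal illustration, assuming a plain one-stage Householder reduction; it is not ELPA's or Elemental's implementation, and the tridiagonal stage simply calls NumPy's dense `eigh` as a stand-in for divide-and-conquer or MR3. The function names are hypothetical.

```python
import numpy as np


def householder_tridiagonalize(A):
    """Reduce symmetric A to tridiagonal T with A = Q @ T @ Q.T (one-stage reduction)."""
    T = np.array(A, dtype=float)
    n = T.shape[0]
    Q = np.eye(n)
    for k in range(n - 2):
        x = T[k + 1:, k]
        # Choose the sign that avoids cancellation when forming v.
        alpha = -np.copysign(np.linalg.norm(x), x[0])
        v = x.copy()
        v[0] -= alpha
        norm_v = np.linalg.norm(v)
        if norm_v == 0.0:
            continue  # column is already zero below the diagonal
        v /= norm_v
        # Apply the reflector H = I - 2 v v^T from the left and from the right.
        T[k + 1:, :] -= 2.0 * np.outer(v, v @ T[k + 1:, :])
        T[:, k + 1:] -= 2.0 * np.outer(T[:, k + 1:] @ v, v)
        # Accumulate Q = H_0 H_1 ... for the back transformation.
        Q[:, k + 1:] -= 2.0 * np.outer(Q[:, k + 1:] @ v, v)
    return T, Q


def eigh_via_tridiagonal(A):
    """Eigenpairs of symmetric A via reduction, tridiagonal solve, back transformation."""
    T, Q = householder_tridiagonalize(A)
    d = np.diag(T).copy()      # main diagonal
    e = np.diag(T, 1).copy()   # first off-diagonal
    T_clean = np.diag(d) + np.diag(e, 1) + np.diag(e, -1)
    # Stand-in tridiagonal solver (dense eigh; does not exploit the structure).
    w, Z = np.linalg.eigh(T_clean)
    V = Q @ Z                  # back transformation of the eigenvectors
    return w, V


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = rng.standard_normal((6, 6))
    A = 0.5 * (M + M.T)
    w, V = eigh_via_tridiagonal(A)
    print("residual norm:", np.linalg.norm(A @ V - V * w))
```

If only eigenvalues are needed, the accumulation of `Q` and the final product `Q @ Z` can be skipped, which is why the abstract treats back transformation as an optional step.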

Juelich Shared Electronic Resources

Roadmap on Electronic Structure Codes in the Exascale Era

Electronic structure calculations have been instrumental in providing many important insights into a range of physical and chemical properties of various molecular and solid-state systems. Their importance to various fields, including materials science, chemical sciences, computational chemistry and device physics, is underscored by the large fraction of available public supercomputing resources devoted to these calculations. As we enter the exascale era, exciting new opportunities to increase simulation numbers, sizes, and accuracies present themselves. In order to realize these promises, the community of electronic structure software developers will, however, first have to tackle a number of challenges pertaining to the efficient use of new architectures that will rely heavily on massive parallelism and hardware accelerators. This roadmap provides a broad overview of the state of the art in electronic structure calculations and of the various new directions being pursued by the community. It covers 14 electronic structure codes, presenting their current status, their development priorities over the next five years, and their plans towards tackling the challenges and leveraging the opportunities presented by the advent of exascale computing.

Comment: Submitted as a roadmap article to Modelling and Simulation in Materials Science and Engineering; Address any correspondence to Vikram Gavini ([email protected]) and Danny Perez ([email protected])

arXiv.org e-Print Archive

DIAL UCLouvain

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

Roadmap on Electronic Structure Codes in the Exascale Era

Author: Baroni S., Blum V., Bowler D., Buccheri A., Chelikowsky J., Das S., Dawson W., Delugas P., Dogan M., Draxl C., Galli G., Gavini V., Genovese L., Giannozzi P., Giantomassi M., Gonze X., Govoni M., Gulans A., Gygi F., Herbert J., Kokott S., Kühne T., Liou K., Miyazaki T., Motamarri P., Nakata A., Pask J., Perez D., Plessl C., Ratcliff L., Richard R., Rossi M., Schade R., Scheffler M., Schütt O., Suryanarayana P., Torrent M., Truflandier L., Windus T., Xu Q., Yu V.
Publication date: 26/09/2022

UCL Discovery

MPG.PuRe


Solving Large Dense Symmetric Eigenproblem on Hybrid Architectures

Author: Davidović Davor
Publication date: 01/01/2014

The dense symmetric eigenproblem is one of the most significant problems in numerical linear algebra and arises in numerous research fields such as bioinformatics, computational chemistry, and meteorology. In recent years, the problems arising in these fields have become larger than ever, resulting in growing demands on both computational power and storage capacity. In such problems, the eigenproblem becomes the main computational bottleneck, and its solution requires extremely high computational power. Modern computing architectures that can meet these growing demands combine traditional multi-core processors with general-purpose GPUs and are called hybrid systems. These systems exhibit very high performance when the data fits into the GPU memory; however, if the volume of data exceeds the total GPU memory, i.e. the data is out-of-core from the GPU perspective, performance decreases rapidly. This dissertation focuses on the development of algorithms that solve dense symmetric eigenproblems on hybrid GPU-based architectures. In particular, it aims at developing eigensolvers that exhibit very high performance even if a problem is out-of-core for the GPU. The developed out-of-core eigensolvers are evaluated and compared on real problems that arise in the simulation of molecular motions. In such problems the data, usually too large to fit into the GPU memory, are stored in the main memory and copied to the GPU memory in pieces. That approach results in a performance drop due to the slow interconnect and high memory latency. To overcome this problem, an approach is presented that applies a blocking strategy and redesigns the existing eigensolvers in order to decrease the volume of data transferred and the number of memory transfers.
This approach designs and implements a set of block-oriented, communication-avoiding BLAS routines that overlap data transfers with computation. Next, these routines are applied to speed up the following eigensolvers: the solver based on the multi-stage reduction to a tridiagonal form, the Krylov subspace-based method, and the spectral divide-and-conquer method. Although the out-of-core BLAS routines significantly improve the performance of these three eigensolvers, a careful redesign is required in order to tackle the solution of large eigenproblems on hybrid CPU-GPU systems. In the out-of-core multi-stage reduction approach, the factor that most influences performance is the band size of the resulting band matrix. On the other hand, the Krylov subspace-based method, although it is based on memory-bound BLAS-2 operations, is the fastest method if only a small subset of the eigenpairs is required. Finally, the spectral divide-and-conquer algorithm, which has a significantly higher arithmetic cost than the other two eigensolvers, achieves extremely high performance since it can be performed entirely in terms of compute-bound BLAS-3 operations. Furthermore, its high arithmetic cost is reduced by exploiting the special structure of a matrix. The results presented in the dissertation show that the three out-of-core eigensolvers, for a set of specific macromolecular problems, significantly outperform the multi-core variants and attain high flop rates even if the data do not fit into the GPU memory. This demonstrates that it is possible to solve large eigenproblems on modest computing systems equipped with a single GPU.
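The blocking strategy described in this abstract can be pictured with a tiled matrix product in which only a few tiles are resident on the device at any time. The sketch below is an assumption-laden illustration, not the dissertation's routines: it runs on the CPU, simulates host-to-device copies with `np.array`, counts transfers instead of timing them, and does not overlap transfers with computation. The name `blocked_matmul` and the default tile size are hypothetical.

```python
import numpy as np


def blocked_matmul(A, B, block=64):
    """Compute C = A @ B tile by tile, as if only three tiles fit in device memory.

    Returns (C, transfers), where transfers counts the simulated tile copies
    between host and device.
    """
    m, k = A.shape
    k2, n = B.shape
    if k != k2:
        raise ValueError("inner dimensions do not match")
    C = np.zeros((m, n), dtype=np.result_type(A, B))
    transfers = 0
    for i in range(0, m, block):
        for j in range(0, n, block):
            # Accumulator tile stays "on the device" for the whole inner loop.
            acc = np.zeros((min(block, m - i), min(block, n - j)), dtype=C.dtype)
            for p in range(0, k, block):
                a_tile = np.array(A[i:i + block, p:p + block])  # simulated host-to-device copy
                b_tile = np.array(B[p:p + block, j:j + block])  # simulated host-to-device copy
                transfers += 2
                acc += a_tile @ b_tile  # compute-bound BLAS-3 work on the tiles
            C[i:i + block, j:j + block] = acc  # simulated device-to-host copy
            transfers += 1
    return C, transfers


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((100, 70))
    B = rng.standard_normal((70, 90))
    C, transfers = blocked_matmul(A, B, block=32)
    print("max abs difference:", np.max(np.abs(C - A @ B)))
    print("simulated tile transfers:", transfers)
```

Larger tiles reduce the number of transfers at the cost of more device memory per tile, which is the trade-off a blocking strategy has to balance against the available GPU memory.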

Full-text Institutional Repository of the Ruđer Bošković Institute

Scientific Application Requirements for Leadership Computing at the Exascale

Publication venue: 'Office of Scientific and Technical Information (OSTI)'

Crossref