Search CORE

33 research outputs found

Optimization of condensed matter physics application with OpenMP tasking model

Author: Chatterjee Arghya
Criado Joel
Garcia Gasulla Marta
Hernández Óscar
Labarta Mancho Jesús José
Sirvent Raül
Álvarez Gonzalo
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019

The Density Matrix Renormalization Group (DMRG++) is a condensed matter physics application used to study superconductivity properties of materials. Its main computations consist of calculating the Hamiltonian matrix, which requires sparse matrix-vector multiplications. This paper presents task-based parallelization and optimization strategies of the Hamiltonian algorithm. The algorithm is implemented as a mini-application in C++ and parallelized with OpenMP. The optimization leverages tasking features, such as dependencies and priorities, included in the OpenMP 4.5 standard. The code refactoring targets performance as much as programmability. The optimized version achieves a speedup of 8.0× with 8 threads and 20.5× with 40 threads on a Power9 computing node while reducing the memory consumption to 90 MB with respect to the original code, by adding fewer than ten OpenMP directives. This work is partially supported by the Spanish Government through Programa Severo Ochoa (SEV2015-0493), by the Spanish Ministry of Science and Technology (project TIN2015-65316-P), by the Generalitat de Catalunya (contract 2017-SGR-1414) and by the BSC-IBM Deep Learning Research Agreement, under JSA “Application porting, analysis and optimization for POWER and POWER AI”. This work was partially supported by the Scientific Discovery through Advanced Computing (SciDAC) program funded by U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research and Basic Energy Sciences, Division of Materials Sciences and Engineering. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. Peer Reviewed. Postprint (author's final draft)
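The speedups quoted in this abstract can be turned into parallel efficiency, a derived figure the abstract itself does not report. The sketch below assumes the usual definition (speedup divided by thread count); the function name `parallel_efficiency` and the `reported` dictionary are illustrative and not taken from the paper.

```python
def parallel_efficiency(speedup: float, threads: int) -> float:
    """Return speedup / threads; 1.0 corresponds to ideal linear scaling."""
    if threads <= 0:
        raise ValueError("threads must be positive")
    return speedup / threads


# Speedups quoted in the abstract (Power9 node), keyed by thread count.
reported = {8: 8.0, 40: 20.5}

# Assumed metric: efficiency = speedup / threads (not stated in the abstract).
efficiencies = {t: parallel_efficiency(s, t) for t, s in reported.items()}

for threads, eff in efficiencies.items():
    print(f"{threads} threads: efficiency {eff:.2%}")
```

Under this definition the 8-thread run scales linearly, while the 40-thread run reaches roughly half of the ideal speedup.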

Crossref

UPCommons. Portal del coneixement obert de la UPC

Role-shifting threads: Increasing OpenMP malleability to address load imbalance at MPI and OpenMP

Author: Criado Ledesma Joel
Garcia Gasulla Marta
López Herrero Víctor
Ramirez Miranda Guillem
Teruel García Xavier
Vinyals Ylla Català Joan
Publication venue: SAGE Publications
Publication date: 01/10/2023

This paper presents the evolution of the free agent threads for OpenMP to the new role-shifting threads model and their integration with the Dynamic Load Balancing (DLB) library. We demonstrate how free agent threads can improve resource utilization in OpenMP applications with load imbalance in their nested parallel regions. We also demonstrate how DLB efficiently manages the malleability exposed by the role-shifting threads to address load imbalance issues. We use three real-world scientific applications: one to demonstrate that free agents alone can improve the OpenMP model without external tools, and two MPI+OpenMP applications, one of them with a coupling case, to illustrate the potential of the free agent threads’ malleability with an external resource manager to increase the efficiency of the system. In addition, we demonstrate that the new implementation is more usable than the former one, letting the runtime system automatically make decisions that were previously made by the programmer. All software is released open-source. This work has received funding from the DEEP Projects, at the European Commission’s FP7, H2020, and EuroHPC Programmes, under Grant Agreements 287530, 610476, 754304, and 955606. The project PCI2021-121958 is financed by the Spanish State Research Agency - Ministry of Science and Innovation. It also has the support of the Spanish Ministry of Science and Innovation (Computacion de Altas Prestaciones VIII: PID2019-107255GB). Peer Reviewed. Postprint (author's final draft)

UPCommons. Portal del coneixement obert de la UPC

Performance analysis and optimization of the FFTXlib on the Intel Knights Landing architecture

Author: Affinito Fabio
Cavazzoni Carlo
Gimenez Judit
Labarta Mancho Jesús José
López Victor
Morillo Julian
Wagner Michael
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2017

In this paper, we address the decreasing performance of the FFTXlib, the Fast Fourier Transformation (FFT) kernel of Quantum ESPRESSO, when scaling to a full KNL node. An increased performance in the FFTXlib will likewise increase the performance of the entire Quantum ESPRESSO code, one of the most used plane-wave DFT codes in the material science community. Our approach focuses on, first, overlapping computation and communication and, second, decreasing resource contention for higher compute efficiency. To achieve this, we use the OmpSs programming model based on task dependencies. We allow overlapping of computation and communication by converting all steps of the FFT into tasks following a flow dependency. In the same way, we decrease resource contention by converting each FFT into an individual task that can be scheduled asynchronously. In both cases, multiple FFTs can be computed in parallel. The task-based optimizations are implemented in the FFTXlib and show up to 10% runtime reduction on the already highly optimized version. Since the task scheduling is done dynamically during execution by the parallel runtime, not statically by the user, it also frees the user from having to find the ideal parallel configuration. We gratefully acknowledge the support of the MaX and POP projects, which have received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 676598 and 676553, respectively. Peer Reviewed. Postprint (author's final draft)

Crossref

UPCommons. Portal del coneixement obert de la UPC

Enabling task parallelism for many-core architectures

Author: Atkinson Patrick R
Publication date: 28/09/2021

Explore Bristol Research

Quantification of 3D spatial correlations between state variables and distances to the grain boundary network in full-field crystal plasticity spectral method simulations

Author: Kühbach M.
Roters F.
Publication venue: IOP Publishing
Publication date: 28/09/2019

Deformation microstructure heterogeneities play a pivotal role during dislocation patterning and interface network restructuring. Thus, they indirectly affect how an alloy recrystallizes, if at all. Given this relevance, it has become common practice to study the evolution of deformation microstructure heterogeneities with 3D experiments and full-field crystal plasticity computer simulations, including tools such as the spectral method. Quantifying material point to grain or phase boundary distances, though, is a practical challenge with spectral method crystal plasticity models because these discretize the material volume rather than explicitly mesh the grain and phase boundary interface network. This limitation calls for the development of interface reconstruction algorithms which enable us to develop specific data post-processing protocols to quantify spatial correlations between state variable values at each material point and the points' corresponding distance to the closest grain or phase boundary. This work contributes to advancing such post-processing routines. Specifically, two grain reconstruction and three distancing methods are developed to solve the above challenge. The individual strengths and limitations of these methods, as well as the efficiency of their parallel implementation, are assessed with an exemplary DAMASK large-scale crystal plasticity study. We apply the new tool to assess the evolution of subtle stress and disorientation gradients towards grain boundaries. Comment: Manuscript submitted to Modelling and Simulation in Materials Science and Engineering

arXiv.org e-Print Archive

MPG.PuRe

The LAPW method with eigendecomposition based on the Hari--Zimmermann generalized hyperbolic SVD

Author: Di Napoli Edoardo
Novaković Vedran
Singer Sanja
Čaklović Gayatri
Publication venue: Society for Industrial & Applied Mathematics (SIAM)
Publication date: 01/01/2020

In this paper we propose an accurate, highly parallel algorithm for the generalized eigendecomposition of a matrix pair $(H, S)$, given in a factored form $(F^{\ast} J F, G^{\ast} G)$. Matrices $H$ and $S$ are generally complex and Hermitian, and $S$ is positive definite. This type of matrices emerges from the representation of the Hamiltonian of a quantum mechanical system in terms of an overcomplete set of basis functions. This expansion is part of a class of models within the broad field of Density Functional Theory, which is considered the golden standard in condensed matter physics. The overall algorithm consists of four phases, the second and the fourth being optional, where the two last phases are computation of the generalized hyperbolic SVD of a complex matrix pair $(F, G)$, according to a given matrix $J$ defining the hyperbolic scalar product. If $J = I$, then these two phases compute the GSVD in parallel very accurately and efficiently. Comment: The supplementary material is available at https://web.math.pmf.unizg.hr/mfbda/papers/sm-SISC.pdf due to its size. This revised manuscript is currently being considered for publication
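As a reading aid, the problem named in this abstract can be written out explicitly. This is a sketch under the standard definition of a generalized Hermitian eigenproblem; the eigenvector $x$ and eigenvalue $\lambda$ are symbols introduced here for illustration and do not appear in the abstract.

```latex
\[
  H x = \lambda\, S x,
  \qquad H = F^{\ast} J F,
  \qquad S = G^{\ast} G,
\]
\[
  \text{so that}\quad F^{\ast} J F\, x = \lambda\, G^{\ast} G\, x .
\]
```

For $J = I$ this reduces to $F^{\ast} F\, x = \lambda\, G^{\ast} G\, x$, the eigenproblem usually associated with the GSVD of the pair $(F, G)$, which is consistent with the abstract's closing remark.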

arXiv.org e-Print Archive

Publikationsserver der RWTH Aachen University

Juelich Shared Electronic Resources

Performance analysis and optimization of an HPC application: DMRG++

Author: Criado Ledesma Joel
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/07/2019

DMRG++ (Density Matrix Renormalization Group) is a condensed matter physics application oriented to HPC, originally developed by Oak Ridge National Laboratory (ORNL). In this project, we will focus on improving the compute-intensive kernel of the application, using a miniapp that encapsulates this critical section. Starting from an initial OpenMP implementation that uses several nested parallel for constructs, we will explore different alternatives to improve its execution time and memory consumption through the OpenMP task dependency model, following an iterative strategy of in-depth application analysis and development. In this way, we not only contribute by improving a scientific application, but also show effective analysis techniques and best practices for programmability and parallelization focused on applications with irregular workloads

UPCommons. Portal del coneixement obert de la UPC

Modelling the behaviour of dense colloidal suspensions of cuboids. Effect of ordered crowding on dynamics and microrheology.

Author: Tonti Luca
Publication date: 01/08/2023

The University of Manchester - Institutional Repository