MPI+X: task-based parallelization and dynamic load balance of finite element assembly

Author: Artigues Antoni
Ferrer Roger
Garcia-Gasulla Marta
Houzeaux Guillaume
Labarta Jesús
López Victor
Vázquez Mariano
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2018

The main computing tasks of a finite element (FE) code for solving partial differential equations (PDEs) are the algebraic system assembly and the iterative solver. This work focuses on the first task, in the context of a hybrid MPI+X paradigm. Although we describe the algorithms in the FE context, a similar strategy can be straightforwardly applied to other discretization methods, such as the finite volume method. The matrix assembly consists of a loop over the elements of the MPI partition to compute the element matrices and right-hand sides and to assemble them into the system local to each MPI partition. In an MPI+X hybrid parallelism context, X has traditionally consisted of loop parallelism using OpenMP. Several strategies have been proposed in the literature to implement this loop parallelism, such as coloring or substructuring techniques, to circumvent the race condition that appears when assembling the element system into the local system. The main drawback of the first technique is the decrease in instructions per cycle (IPC) due to poor spatial locality. The second technique avoids this issue but requires extensive changes in the implementation, which can be cumbersome when several element loops must be treated. We propose an alternative based on task parallelism of the element loop, using some extensions to the OpenMP programming model. The taskification of the assembly solves both aforementioned problems. In addition, dynamic load balance is applied using the DLB library, which is especially efficient in the presence of hybrid meshes, where the relative costs of the different elements are impossible to estimate a priori. This paper presents the proposed methodology, its implementation, and its validation through the solution of large computational mechanics problems on up to 16k cores.
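The coloring strategy that the abstract mentions as prior work can be illustrated with a minimal Python sketch. It is not the paper's task-based method or its code: the function names (`color_elements`, `assemble_by_color`, `linear_1d_stiffness`), the greedy coloring rule, the dense matrix storage, and the 1D four-element mesh are all assumptions made for illustration. Elements that share a node receive different colors, so all elements of one color write to disjoint matrix entries and could be assembled concurrently without a race condition.

```python
from collections import defaultdict


def color_elements(elements):
    """Greedy coloring: two elements that share a node never get the same color."""
    node_to_elems = defaultdict(list)
    for e, nodes in enumerate(elements):
        for n in nodes:
            node_to_elems[n].append(e)

    colors = [-1] * len(elements)
    for e, nodes in enumerate(elements):
        # Colors already taken by neighbouring elements (those sharing a node).
        used = {colors[other]
                for n in nodes
                for other in node_to_elems[n]
                if colors[other] >= 0}
        c = 0
        while c in used:
            c += 1
        colors[e] = c
    return colors


def assemble_by_color(elements, element_matrix, n_nodes):
    """Assemble a dense global matrix, processing one color group at a time."""
    colors = color_elements(elements)
    K = [[0.0] * n_nodes for _ in range(n_nodes)]
    for c in range(max(colors, default=-1) + 1):
        # Elements of color c share no nodes, so the iterations of this inner
        # loop touch disjoint entries of K and could run in parallel.
        for e, nodes in enumerate(elements):
            if colors[e] != c:
                continue
            ke = element_matrix(e)
            for a, i in enumerate(nodes):
                for b, j in enumerate(nodes):
                    K[i][j] += ke[a][b]
    return K, colors


def linear_1d_stiffness(e):
    """Hypothetical element matrix: unit-length 1D linear element."""
    return [[1.0, -1.0], [-1.0, 1.0]]


# Four 1D elements on five nodes: 0-1-2-3-4.
elements = [(0, 1), (1, 2), (2, 3), (3, 4)]
K, colors = assemble_by_color(elements, linear_1d_stiffness, 5)
```

In this sketch consecutive elements of one color are not adjacent in the mesh, which hints at the spatial-locality penalty the abstract attributes to coloring.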

arXiv.org e-Print Archive

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

QPACE 2 and Domain Decomposition on the Intel Xeon Phi

Author: Arts Paul
Bloch Jacques
Georg Peter
Glaessle Benjamin
Heybrock Simon
Komatsubara Yu
Lohmayer Robert
Mages Simon
Mendl Bernhard
Meyer Nils
Parcianello Alessio
Pleiter Dirk
Rappl Florian
Rossi Mauro
Solbrig Stefan
Tecchiolli Giampietro
Wettig Tilo
Zanier Gianpaolo
Publication date: 01/01/2015

We give an overview of QPACE 2, a custom-designed supercomputer based on Intel Xeon Phi processors, developed in a collaboration between Regensburg University and Eurotech. We give some general recommendations on how to write high-performance code for the Xeon Phi, then discuss our implementation of a domain-decomposition-based solver and present a number of benchmarks. Comment: plenary talk at Lattice 2014, to appear in the conference proceedings PoS(LATTICE2014), 15 pages, 9 figures.

arXiv.org e-Print Archive

Juelich Shared Electronic Resources

JURECA: Data Centric and Booster Modules implementing the Modular Supercomputing Architecture at Jülich Supercomputing Centre

Author: Thörnig Philipp
von St. Vieth Benedikt
Publication venue: 'Forschungszentrum Julich, Zentralbibliothek'
Publication date: 01/01/2021

JURECA is a Pre-Exascale Modular Supercomputer operated by Jülich Supercomputing Centre at Forschungszentrum Jülich. The system combines a flexible Data Centric (DC) module, based on the Atos BullSequana XH2000 with a selection of best-of-its-kind components, and a scalability-focused Booster module, delivered by Intel and Dell Technologies and based on the Xeon Phi many-core processor. With its novel architecture, it supports a wide variety of high-performance computing and data analytics workloads.

Journal of large-scale research facilities (JLSRF)

Juelich Shared Electronic Resources