8,105 research outputs found

A highly scalable parallel implementation of balancing domain decomposition by constraints

Author: Badia Santiago
Martín Alberto F.
Principe Javier
Publication date: 01/01/2020

In this work we propose a novel parallelization approach for two-level balancing domain decomposition by constraints preconditioning based on overlapping fine-grid and coarse-grid duties in time. The global set of MPI tasks is split into those that have fine-grid duties and those that have coarse-grid duties, and the different computations and communications in the algorithm are then rescheduled and mapped in such a way that the maximum degree of overlapping is achieved while preserving data dependencies among them. In many ranges of interest, the extra cost associated with the coarse-grid problem can be fully masked by fine-grid related computations (which are embarrassingly parallel). Apart from discussing code implementation details, the paper also presents a comprehensive set of numerical experiments that includes weak scalability analyses, with structured and unstructured meshes and exact and inexact solvers, for the 3D Poisson and linear elasticity problems on a pair of state-of-the-art multicore-based distributed-memory machines. This experimental study reveals remarkable weak scalability in the solution of problems with thousands of millions of unknowns on several tens of thousands of computational cores.
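The task splitting and time overlapping described in this abstract can be illustrated with a toy model. The sketch below is not the authors' implementation: `split_ranks`, `serialized_time`, and `overlapped_time` are hypothetical helpers, the timings are invented, and a real code would work with MPI communicators rather than Python lists.

```python
def split_ranks(n_ranks, n_coarse):
    """Partition rank ids into fine-duty and coarse-duty groups.

    Hypothetical helper: the last n_coarse ranks get coarse-grid duties,
    all remaining ranks get fine-grid duties.
    """
    if not 0 < n_coarse < n_ranks:
        raise ValueError("need at least one fine rank and one coarse rank")
    fine = list(range(n_ranks - n_coarse))
    coarse = list(range(n_ranks - n_coarse, n_ranks))
    return fine, coarse


def serialized_time(t_fine, t_coarse):
    """Cost per step when coarse work runs after fine work (no overlap)."""
    return t_fine + t_coarse


def overlapped_time(t_fine, t_coarse):
    """Idealized cost per step when both duties run concurrently.

    Assumes perfect overlap and ignores data dependencies and communication.
    """
    return max(t_fine, t_coarse)


if __name__ == "__main__":
    fine, coarse = split_ranks(8, 1)
    print("fine ranks:", fine, "coarse ranks:", coarse)
    # Invented timings: the coarse cost is hidden whenever t_coarse <= t_fine.
    print("serialized:", serialized_time(2.0, 1.5))
    print("overlapped:", overlapped_time(2.0, 1.5))
```

In this idealized model the coarse-grid cost is fully masked as long as the coarse time does not exceed the fine time, which is the regime the abstract refers to.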

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Scipedia

Balancing domain decomposition by constraints and perturbation

Author: Badia Santiago
Nguyen Hieu Trung
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2016

In this paper, we formulate and analyze a perturbed formulation of the balancing domain decomposition by constraints (BDDC) method. We prove that the perturbed BDDC has the same polylogarithmic bound for the condition number as the standard formulation. Two types of properly scaled zero-order perturbations are considered: one uses a mass matrix, and the other uses a Robin-type boundary condition, i.e., a mass matrix on the interface. With perturbation, the well-posedness of the local Neumann problems and the global coarse problem is automatically guaranteed, so coarse degrees of freedom need to be defined only for convergence purposes and not for well-posedness. This allows a much simpler implementation, as no complicated corner selection algorithm is needed. Minimal coarse spaces using only face or edge constraints can also be considered. They are very useful in extreme-scale calculations, where the coarse problem is usually the bottleneck that can jeopardize scalability. The perturbation also adds extra robustness, as the perturbed formulation works even when the constraints fail to eliminate a small number of subdomain rigid body modes from the standard BDDC space. This is extremely important when solving problems on unstructured meshes partitioned by automatic graph partitioners, since arbitrary disconnected subdomains are possible. Numerical results are provided to support the theoretical findings. Peer reviewed. Postprint (published version).

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

On the scalability of inexact balancing domain decomposition by constraints with overlapped coarse/fine corrections

Author: Badia Santiago
Martín Alberto F.
Principe Javier
Publication date: 01/01/2020

In this work, we analyze the scalability of inexact two-level balancing domain decomposition by constraints (BDDC) preconditioners for Krylov subspace iterative solvers, when using a highly scalable asynchronous parallel implementation where fine and coarse correction computations are overlapped in time. This way, the coarse-grid problem can be fully overlapped by fine-grid computations (which are embarrassingly parallel) in a wide range of cases. Further, we consider inexact solvers to reduce the computational cost/complexity and memory consumption of coarse and local problems and boost the scalability of the solver. From our numerical experiments, we conclude that the BDDC preconditioner is quite insensitive to inexact solvers. In particular, one cycle of algebraic multigrid (AMG) is enough to attain algorithmic scalability. Further, the clear reduction of computing time and memory requirements of inexact solvers compared to sparse direct ones makes it possible to scale far beyond state-of-the-art BDDC implementations. Excellent weak scalability results have been obtained with the proposed inexact/overlapped implementation of the two-level BDDC preconditioner, up to 93,312 cores and 20 billion unknowns on JUQUEEN. Further, we have also applied the proposed setting to unstructured meshes and partitions for the pressure Poisson solver in the backward-facing step benchmark domain.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Scipedia

Adaptive FETI-DP and BDDC methods for highly heterogeneous elliptic finite element problems in three dimensions

Author: Kühn Martin Joachim
Publication date: 16/02/2018

Numerical methods are often well-suited for the solution of (elliptic) partial differential equations (PDEs) modeling naturally occurring processes. Many different solvers can be applied to systems which are obtained after discretization by the finite element method. Parallel architectures in modern computers facilitate the efficient use of diverse divide-and-conquer strategies. The intuitive approach of dividing a large (global) problem into subproblems, which are then solved in parallel, can significantly reduce the solution time. The solvers on the local subproblems should then deliver the contributions of the global solution restricted to the subdomains of the computational region. The class of domain decomposition methods provides widely used iterative algorithms for the parallel solution of implicit finite element problems. Often, an additional coarse space, which introduces a coupling between the subdomains, is used to ensure a global transport of information between the subdomains across the entire domain. The FETI-DP and BDDC domain decomposition methods are highly scalable parallel algorithms. However, when the parameter or coefficient distribution in the underlying partial differential equation becomes highly heterogeneous, classical methods, with a priori chosen coarse spaces, might not converge in a limited number of iterations. A remedy is offered by problem-dependent coarse spaces. These coarse spaces can be provided by adaptive methods, which can then improve the convergence at the cost of additional constraints. In this thesis, we introduce robust FETI-DP and BDDC methods for three-dimensional problems. These methods incorporate constraints, which are computed from local eigenvalue problems on faces and edges between subdomains, into the coarse space. The implementation of the constraints is performed by a deflation or balancing approach or by partial finite element assembly after a transformation of basis. For the latter, we introduce the generalized transformation-of-basis approach and show its correspondence to a deflation or balancing approach. An efficient parallel implementation of adaptive FETI-DP is discussed in the last part of this thesis. We provide weak and strong parallel scalability results for our adaptive algorithm executed on the supercomputer magnitUDE of the University of Duisburg-Essen. For weak scaling, we can show very good results up to 4,096 cores. We can also present very good strong scaling results up to 864 cores.
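For readers unfamiliar with the terms, weak and strong scaling are commonly quantified by parallel efficiency. The sketch below uses the usual textbook definitions; the function names and the sample timings are illustrative assumptions and are not taken from the thesis.

```python
def strong_scaling_efficiency(t_ref, p_ref, t_p, p):
    """Efficiency for a fixed total problem size.

    t_ref: run time on the reference core count p_ref
    t_p:   run time on p cores
    Ideal strong scaling halves the time when the cores double, giving 1.0.
    """
    return (t_ref * p_ref) / (t_p * p)


def weak_scaling_efficiency(t_ref, t_p):
    """Efficiency for a fixed problem size per core.

    Ideal weak scaling keeps the run time constant as cores and problem size
    grow together, giving 1.0.
    """
    return t_ref / t_p


if __name__ == "__main__":
    # Invented timings, for illustration only.
    print(strong_scaling_efficiency(t_ref=100.0, p_ref=8, t_p=25.0, p=64))
    print(weak_scaling_efficiency(t_ref=10.0, t_p=12.5))
```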

Kölner UniversitätsPublikationsServer

Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS

Author: A Arnold
A Faradjian
B Hess
C Schütte
G Wilson
JA Anderson
JC Phillips
KJ Bowers
L Verlet
M Eleftheriou
M Shirts
MJ Abraham
P Eastman
R Yokota
S Pronk
S Páll
U Essmann
W Humphrey
WM Brown
Y Andoh
Y Sugita
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

GROMACS is a widely used package for biomolecular simulation, and over the last two decades it has evolved from small-scale efficiency to advanced heterogeneous acceleration and multi-level parallelism targeting some of the largest supercomputers in the world. Here, we describe some of the ways we have been able to realize this through the use of parallelization on all levels, combined with a constant focus on absolute performance. Release 4.6 of GROMACS uses SIMD acceleration on a wide range of architectures, GPU offloading acceleration, and both OpenMP and MPI parallelism within and between nodes, respectively. The recent work on acceleration made it necessary to revisit the fundamental algorithms of molecular simulation, including the concept of neighborsearching, and we discuss the present and future challenges we see for exascale simulation - in particular a very fine-grained task parallelism. We also discuss the software management, code peer review and continuous integration testing required for a project of this complexity. Comment: EASC 2014 conference proceedin

arXiv.org e-Print Archive

Publikationer från KTH

Crossref

Digitala Vetenskapliga Arkivet - Academic Archive On-line

MPG.PuRe

Segregated Runge–Kutta time integration of convection-stabilized mixed finite element schemes for wall-unresolved LES of incompressible flows

Author: Badia Santiago
Colomés Gené Oriol
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017

In this work, we develop a high-performance numerical framework for the large eddy simulation (LES) of incompressible flows. The spatial discretization of the nonlinear system is carried out using mixed finite element (FE) schemes supplemented with symmetric projection stabilization of the convective term and a penalty term for the divergence constraint. These additional terms introduced at the discrete level have been proved to act as implicit LES models. In order to perform meaningful wall-unresolved simulations, we consider a weak imposition of the boundary conditions using a Nitsche-type scheme, where the tangential component penalty term is designed to act as a wall law. Next, segregated Runge–Kutta (SRK) schemes (recently proposed by the authors for laminar flow problems) are applied to the LES of turbulent flows. By the introduction of a penalty term on the trace of the acceleration, these methods exhibit excellent stability properties for both implicit and explicit treatment of the convective terms. SRK schemes are excellent for large-scale simulations, since they reduce the computational cost of the linear system solves by splitting velocity and pressure computations at the time integration level, leading to two uncoupled systems. The pressure system is a Darcy-type problem that can easily be preconditioned using a traditional block-preconditioning scheme that only requires a Poisson solver. In the end, only coercive systems have to be solved, which can be effectively preconditioned by multilevel domain decomposition schemes, which are both optimal and scalable. The framework is applied to the Taylor–Green and turbulent channel flow benchmarks in order to prove the accuracy of the convection-stabilized mixed FEs as LES models and SRK time integrators. The scalability of the preconditioning techniques (in space only) has also been proven for one step of the SRK scheme for the Taylor–Green flow using uniform meshes. Moreover, a turbulent flow around a NACA profile is solved to show the applicability of the proposed algorithms for a realistic problem. Peer reviewed. Postprint (author's final draft).

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

A scalable parallel finite element framework for growing geometries. Application to metal additive manufacturing

Author: Ayachit U
Burstedde C
Carslaw HS
Cole KD
Ern A
Kaufman L
Kergaßner A
Lindgren LE
Mozaffar M
Schroeder WJ
Wohlers Associates Inc
Publication venue: 'Wiley'
Publication date: 01/01/2019

This work introduces an innovative parallel, fully distributed finite element framework for growing geometries and its application to metal additive manufacturing. It is well known that virtual part design and qualification in additive manufacturing requires highly accurate multiscale and multiphysics analyses. Only high performance computing tools are able to handle such complexity in time frames compatible with time-to-market. However, efficiency, without loss of accuracy, has rarely held the centre stage in the numerical community. Here, in contrast, the framework is designed to adequately exploit the resources of high-end distributed-memory machines. It is grounded on three building blocks: (1) hierarchical adaptive mesh refinement with octree-based meshes; (2) a parallel strategy to model the growth of the geometry; (3) state-of-the-art parallel iterative linear solvers. Computational experiments consider the heat transfer analysis at the part scale of the printing process by powder-bed technologies. After verification against a 3D benchmark, a strong-scaling analysis assesses performance and identifies major sources of parallel overhead. A third numerical example examines the efficiency and robustness of (2) in a curved 3D shape. Unprecedented parallelism and scalability were achieved in this work. Hence, this framework helps to take on higher complexity and/or accuracy, not only in part-scale simulations of metal or polymer additive manufacturing, but also in welding, sedimentation, atherosclerosis, or any other physical problem where the physical domain of interest grows in time.

arXiv.org e-Print Archive

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

UPCommons. Portal del coneixement obert de la UPC

Scipedia