381 research outputs found
Preparing sparse solvers for exascale computing.
Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing Project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current success and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.
Extending a serial 3D two-phase CFD code to parallel execution over MPI by using the PETSc library for domain decomposition
To leverage the last two decades' transition in High-Performance Computing
(HPC) towards clusters of compute nodes bound together with fast interconnects,
a modern scalable CFD code must be able to efficiently distribute work amongst
several nodes using the Message Passing Interface (MPI). MPI can enable very
large simulations running on very large clusters, but it is necessary that the
bulk of the CFD code be written with MPI in mind, an obstacle to parallelizing
an existing serial code.
In this work we present the results of extending an existing two-phase 3D
Navier-Stokes solver, which was completely serial, to a parallel execution
model using MPI. The 3D Navier-Stokes equations for two immiscible
incompressible fluids are solved by the continuum surface force method, while
the location of the interface is determined by the level-set method.
We employ the Portable Extensible Toolkit for Scientific Computing (PETSc)
for domain decomposition (DD) in a framework where only a fraction of the code
needs to be altered. We study the strong and weak scaling of the resulting
code. Cases are studied that are relevant to the fundamental understanding of
oil/water separation in electrocoalescers. Comment: 8 pages, 6 figures, final version for the CFD 2014 conferenc
An Experimental Study of Two-Level Schwarz Domain Decomposition Preconditioners on GPUs
The generalized Dryja--Smith--Widlund (GDSW) preconditioner is a two-level
overlapping Schwarz domain decomposition (DD) preconditioner that couples a
classical one-level overlapping Schwarz preconditioner with an
energy-minimizing coarse space. When used to accelerate the convergence rate of
Krylov subspace iterative methods, the GDSW preconditioner provides robustness
and scalability for the solution of sparse linear systems arising from the
discretization of a wide range of partial differential equations. In this paper,
we present FROSch (Fast and Robust Schwarz), a domain decomposition solver
package which implements GDSW-type preconditioners for both CPU and GPU
clusters. To improve the solver performance on GPUs, we use a novel
decomposition to run multiple MPI processes on each GPU, reducing both the solver's
computational and storage costs and potentially improving the convergence rate.
This allowed us to obtain competitive or faster performance using GPUs compared
to using CPUs alone. We demonstrate the performance of FROSch on the Summit
supercomputer with NVIDIA V100 GPUs, where we used NVIDIA Multi-Process Service
(MPS) to implement our decomposition strategy.
The solver has a wide variety of algorithmic and implementation choices,
which poses both opportunities and challenges for its GPU implementation. We
conduct a thorough experimental study with different solver options including
the exact or inexact solution of the local overlapping subdomain problems on a
GPU. We also discuss the effect of using the iterative variant of the
incomplete LU factorization and sparse-triangular solve as the approximate
local solver, and using lower precision for computing the whole FROSch
preconditioner. Overall, the solve time was reduced by factors of about
using GPUs, while the GPU acceleration of the numerical setup time
depends on the solver options and the local matrix sizes. Comment: Accepted for publication in IPDPS'2
Parallel accelerated cyclic reduction preconditioner for three-dimensional elliptic PDEs with variable coefficients
We present a robust and scalable preconditioner for the solution of
large-scale linear systems that arise from the discretization of elliptic PDEs
amenable to rank compression. The preconditioner is based on hierarchical
low-rank approximations and the cyclic reduction method. The setup and
application phases of the preconditioner achieve log-linear complexity in
memory footprint and number of operations, and numerical experiments exhibit
good weak and strong scalability at large processor counts in a distributed
memory environment. Numerical experiments with linear systems that feature
symmetry and nonsymmetry, definiteness and indefiniteness, constant and
variable coefficients demonstrate the preconditioner's applicability and
robustness. Furthermore, it is possible to control the number of iterations via
the accuracy threshold of the hierarchical matrix approximations and their
arithmetic operations, and the tuning of the admissibility condition parameter.
Together, these parameters allow for optimization of the memory requirements
and performance of the preconditioner. Comment: 24 pages, Elsevier Journal of Computational and Applied Mathematics,
Dec 201
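As a rough illustration of the cyclic reduction idea named in the abstract above, the following minimal sketch solves a scalar tridiagonal system by repeatedly eliminating the even-indexed unknowns. It is not the authors' preconditioner, which combines cyclic reduction with hierarchical low-rank approximations for three-dimensional problems. The function names `cyclic_reduction` and `tridiag_matvec` are hypothetical, and the sketch assumes nonzero pivots (for example a diagonally dominant matrix) with `a[0] == 0` and `c[-1] == 0`.

```python
def tridiag_matvec(a, b, c, x):
    """Multiply the tridiagonal matrix (sub a, diag b, super c) by x."""
    n = len(b)
    y = []
    for i in range(n):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        y.append(left + b[i] * x[i] + right)
    return y


def cyclic_reduction(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] recursively.

    Assumes a[0] == 0, c[-1] == 0, and that no pivot b[i] becomes zero.
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]

    # Reduction: use the even-indexed equations to eliminate the even
    # unknowns from every odd-indexed equation, halving the system size.
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        alpha = a[i] / b[i - 1]
        new_b = b[i] - alpha * c[i - 1]
        new_c = 0.0
        new_d = d[i] - alpha * d[i - 1]
        if i + 1 < n:
            gamma = c[i] / b[i + 1]
            new_b -= gamma * a[i + 1]
            new_c = -gamma * c[i + 1]
            new_d -= gamma * d[i + 1]
        ra.append(-alpha * a[i - 1])
        rb.append(new_b)
        rc.append(new_c)
        rd.append(new_d)

    # Recurse on the reduced system for the odd-indexed unknowns.
    x = [0.0] * n
    x[1::2] = cyclic_reduction(ra, rb, rc, rd)

    # Back-substitution: recover each even unknown from its own equation.
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x
```

Within one level, the eliminations are independent of one another, which is what makes the method attractive in parallel settings. In this sketch they run sequentially.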
Randomized Computations for Efficient and Robust Finite Element Domain Decomposition Methods in Electromagnetics
Numerical modeling of electromagnetic (EM) phenomena has proved to be an effective and efficient tool in the design and optimization of modern electronic devices, integrated circuits (IC) and RF systems. However, the generality, efficiency and reliability/resilience of computational EM solvers are often criticised because the underlying characteristics of the simulated problems usually differ, which makes the development of a general, "black-box" EM solver a difficult task.
In this work, we aim to propose a reliable/resilient, scalable and efficient finite-element-based domain decomposition method (FE-DDM) as a general CEM solver to tackle such ultimate CEM problems to some extent. We recognize the rank deficiency property of the Dirichlet-to-Neumann (DtN) operators involved in the previously proposed FETI-2 DDM formulation and apply this principle to improve the computational efficiency and robustness of FETI-2 DDM. Specifically, the rank-deficient DtN operator is computed by a randomized computation method that was originally proposed to approximate the matrix singular value decomposition (SVD). Numerical results show that savings of up to 35% in run-time and 75% in memory for the computation of the DtN operators can be achieved on a realistic example. Later, this rank deficiency principle is incorporated into a new global DDM preconditioner (W-FETI) that is inspired by the matrix Woodbury identity. A numerical study of the eigenspectrum shows the validity of the proposed W-FETI global preconditioner. Several industrial-scale examples show significant iterative convergence advantage of W-FETI that uses 35%-80% matrix-vector-products (MxVs) than state-of-the-art DDM solvers.
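The randomized approach mentioned in the abstract above can be pictured with a generic randomized range finder: multiply the matrix by a few random vectors, orthonormalize the result, and project. The following is a minimal sketch on a small dense matrix, not the authors' DtN computation. The names `randomized_low_rank` and `orthonormal_columns`, the oversampling amount and the drop tolerance are all assumptions made for illustration.

```python
import random


def matmul(A, B):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def orthonormal_columns(Y, tol=1e-10):
    """Modified Gram-Schmidt on the columns of Y.

    Columns that are numerically dependent on earlier ones are dropped.
    Returns a matrix Q (list of rows) with orthonormal columns.
    """
    basis = []
    for col in transpose(Y):
        v = list(col)
        scale = sum(t * t for t in v) ** 0.5
        for q in basis:
            dot = sum(qi * vi for qi, vi in zip(q, v))
            v = [vi - dot * qi for qi, vi in zip(q, v)]
        norm = sum(t * t for t in v) ** 0.5
        if scale > 0.0 and norm > tol * scale:
            basis.append([t / norm for t in v])
    return transpose(basis)


def randomized_low_rank(A, k, oversample=2, seed=0):
    """Return (Q, B) with A approximately equal to Q @ B.

    Q has orthonormal columns spanning a sampled range of A; B = Q^T A.
    """
    rng = random.Random(seed)
    n = len(A[0])
    # Gaussian test matrix with k + oversample columns (an assumption).
    omega = [[rng.gauss(0.0, 1.0) for _ in range(k + oversample)]
             for _ in range(n)]
    Y = matmul(A, omega)          # sample the range of A
    Q = orthonormal_columns(Y)    # orthonormal basis for the samples
    B = matmul(transpose(Q), A)   # small projected matrix
    return Q, B
```

A singular value decomposition of the small matrix `B` would then give an approximate SVD of `A`. The savings come from working with a matrix whose row count matches the numerical rank instead of the full dimension.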
The LifeV library: engineering mathematics beyond the proof of concept
LifeV is a library for the finite element (FE) solution of partial
differential equations in one, two, and three dimensions. It is written in C++
and designed to run on diverse parallel architectures, including cloud and high
performance computing facilities. In spite of its academic research nature,
meaning a library for the development and testing of new methods, one
distinguishing feature of LifeV is its use on real world problems and it is
intended to provide a tool for many engineering applications. It has
actually been used in computational hemodynamics, including cardiac mechanics and
fluid-structure interaction problems, in porous media, ice sheets dynamics for
both forward and inverse problems. In this paper we give a short overview of
the features of LifeV and its coding paradigms on simple problems. The main
focus is on the parallel environment which is mainly driven by domain
decomposition methods and based on external libraries such as MPI, the Trilinos
project, HDF5 and ParMetis.
Dedicated to the memory of Fausto Saleri. Comment: Review of the LifeV Finite Element librar