A PETSc parallel-in-time solver based on MGRIT algorithm
We address the development of a modular implementation of the MGRIT (MultiGrid-In-Time) algorithm within the PETSc (Portable, Extensible Toolkit for Scientific Computation) library, to solve the linear and nonlinear systems that arise from discretizing evolutionary models with a parallel-in-time approach. Our aim is to make it possible to predict the performance gain achievable by using the MGRIT approach instead of the Time Stepping integrator (TS). To this end, we analyze the performance parameters of the algorithm that determine a priori the best number of processing elements and grid levels to use when scaling MGRIT, regarded as a parallel iterative algorithm proceeding along the time dimension.
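To make the MGRIT idea concrete, here is a minimal sketch of two-level MGRIT with F-relaxation, which is equivalent to Parareal. It solves the scalar test equation u' = lam * u with backward Euler. The code is plain Python and does not use PETSc or its TS interface. The function name `mgrit_two_level` and all parameter values are hypothetical. The sketch assumes a linear problem, so a simple residual-correction scheme suffices. It shows the two ingredients whose cost a performance model would weigh: the independent per-interval F-relaxation and the short sequential coarse solve.

```python
# Minimal two-level MGRIT sketch (F-relaxation, linear correction scheme) for
# the scalar test equation u' = lam * u with backward Euler. Illustrative only;
# hypothetical names, not PETSc's TS API or the paper's implementation.


def mgrit_two_level(lam, u0, t_final, n_coarse, m, max_iter=20, tol=1e-10):
    """Solve all backward-Euler steps at once with two-level MGRIT.

    The fine grid has n_coarse * m steps (m >= 2); every m-th point is a C-point.
    Returns the fine-grid solution and the history of C-point residual norms.
    """
    n_fine = n_coarse * m
    h = t_final / n_fine
    phi = 1.0 / (1.0 - lam * h)          # fine propagator: one backward-Euler step
    phi_c = 1.0 / (1.0 - lam * m * h)    # coarse propagator: step size m * h

    u = [0.0] * (n_fine + 1)
    u[0] = u0                            # the initial condition is exact at t = 0

    def f_relax():
        # Update the F-points of each coarse interval from its left C-point.
        # The intervals are independent, which is where the parallelism lies.
        for k in range(n_coarse):
            for j in range(1, m):
                i = k * m + j
                u[i] = phi * u[i - 1]

    history = []
    for _ in range(max_iter):
        f_relax()
        # Residual of the fine equation u_i - phi * u_{i-1} = 0 at the C-points.
        r = [phi * u[k * m - 1] - u[k * m] for k in range(1, n_coarse + 1)]
        res = max(abs(x) for x in r)
        history.append(res)
        if res < tol:
            break
        # Sequential coarse-grid solve for the C-point error, then correct.
        e = 0.0
        for k in range(1, n_coarse + 1):
            e = phi_c * e + r[k - 1]
            u[k * m] += e
    f_relax()  # make the F-points consistent with the final C-point values
    return u, history


if __name__ == "__main__":
    sol, hist = mgrit_two_level(lam=-1.0, u0=1.0, t_final=1.0, n_coarse=10, m=4)
    seq = [(1.0 / 1.025) ** i for i in range(len(sol))]  # sequential stepping, h = 0.025
    print("iterations:", len(hist))
    print("max deviation from sequential stepping:",
          max(abs(a - b) for a, b in zip(sol, seq)))
```

In exact arithmetic, the first k C-points are exact after k iterations, so the method never needs more than n_coarse iterations. A speedup therefore requires convergence in far fewer iterations than that.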
Multilevel convergence analysis of multigrid-reduction-in-time
This paper presents a multilevel convergence framework for
multigrid-reduction-in-time (MGRIT) as a generalization of previous two-grid
estimates. The framework provides a priori upper bounds on the convergence of
MGRIT V- and F-cycles, with different relaxation schemes, by deriving the
respective residual and error propagation operators. The residual and error
operators are functions of the time stepping operator, analyzed directly and
bounded in norm, both numerically and analytically. We present various upper
bounds of different computational cost and varying sharpness. These upper
bounds are complemented by proposing analytic formulae for the approximate
convergence factor of V-cycle algorithms that take the number of fine grid time
points, the temporal coarsening factors, and the eigenvalues of the time
stepping operator as parameters.
The paper concludes with supporting numerical investigations of parabolic
(anisotropic diffusion) and hyperbolic (wave equation) model problems. We
assess the sharpness of the bounds and the quality of the approximate
convergence factors. Observations from these numerical investigations
demonstrate the value of the proposed multilevel convergence framework for
estimating MGRIT convergence a priori and for the design of a convergent
algorithm. We further highlight that the theory captures observations from the
literature, including that two-level Parareal and multilevel MGRIT with
F-relaxation do not yield scalable algorithms, and that a stronger relaxation
scheme is beneficial. An important observation is that with increasing
numbers of levels MGRIT convergence deteriorates for the hyperbolic model
problem, while constant convergence factors can be achieved for the diffusion
equation. The theory also indicates that L-stable Runge-Kutta schemes are more
amenable to multilevel parallel-in-time integration with MGRIT than A-stable
Runge-Kutta schemes.
Comment: 26 pages; 17 pages Supplementary Material
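The notion of an error propagation operator can be made concrete in the simplest setting. The sketch below covers only two levels and a single scalar mode, so it is not the paper's multilevel framework. It builds the C-point error propagator of one MGRIT iteration column by column and reports its infinity norm for F- and FCF-relaxation. The function names, the backward-Euler mode formulas, and the sample values of z are assumptions chosen for illustration. Here lam is the fine-step eigenvalue, mu the coarse-step eigenvalue, and m the coarsening factor.

```python
# Two-level, single-mode illustration (hypothetical helper names): build the
# C-point error propagator of one MGRIT iteration by applying the iteration to
# unit errors, then compare F- and FCF-relaxation.


def error_propagator(lam, mu, m, n_c, relax="F"):
    """Return E (n_c x n_c) mapping C-point errors before/after one iteration."""
    lm = lam ** m                      # eigenvalue of m fine steps
    cols = []
    for j in range(n_c):
        eps = [0j] * n_c               # unit error at C-point j+1 (C-point 0 is exact)
        eps[j] = 1.0
        if relax == "FCF":
            # C-relaxation after the first F-sweep: eps_k <- lam^m * eps_{k-1}
            eps = [0j] + [lm * eps[k - 1] for k in range(1, n_c)]
        # C-point residual after F-relaxation: r_k = eps_k - lam^m * eps_{k-1}
        r = [eps[k] - (lm * eps[k - 1] if k > 0 else 0) for k in range(n_c)]
        # Coarse solve d_k = mu * d_{k-1} + r_k, then correct: eps <- eps - d
        d = 0j
        for k in range(n_c):
            d = mu * d + r[k]
            eps[k] -= d
        cols.append(eps)
    return [[cols[j][i] for j in range(n_c)] for i in range(n_c)]


def inf_norm(mat):
    """Maximum absolute row sum."""
    return max(sum(abs(x) for x in row) for row in mat)


def backward_euler_modes(z, m):
    """Fine and coarse eigenvalues for backward Euler, z = h * spatial eigenvalue."""
    return 1.0 / (1.0 - z), 1.0 / (1.0 - m * z)


if __name__ == "__main__":
    m, n_c = 4, 32
    for label, z in [("diffusion-like, z = -0.5", -0.5), ("wave-like, z = 0.5i", 0.5j)]:
        lam, mu = backward_euler_modes(z, m)
        for relax in ("F", "FCF"):
            E = error_propagator(lam, mu, m, n_c, relax)
            print(f"{label:26s} {relax:3s} ||E||_inf = {inf_norm(E):.3e}")
```

For this scalar case the entries are E[i][j] = (lam^m - mu) mu^(i-j-1) for i > j. The norm therefore grows with how poorly mu approximates lam^m and with how close |mu| is to 1. FCF-relaxation multiplies by an extra factor |lam|^m, which is never larger than 1 for these stable modes.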
A Flexible Patch-Based Lattice Boltzmann Parallelization Approach for Heterogeneous GPU-CPU Clusters
Sustaining a large fraction of single GPU performance in parallel
computations is considered to be the major problem of GPU-based clusters. In
this article, this topic is addressed in the context of a lattice Boltzmann
flow solver that is integrated in the WaLBerla software framework. We propose a
multi-GPU implementation using a block-structured MPI parallelization, suitable
for load balancing and heterogeneous computations on CPUs and GPUs. The
overhead required for multi-GPU simulations is discussed in detail and it is
demonstrated that the kernel performance can be sustained to a large extent.
With our GPU implementation, we achieve nearly perfect weak scalability on
InfiniBand clusters. However, in strong scaling scenarios, multi-GPU systems make less
efficient use of the hardware than IBM BG/P and x86 clusters. Hence, a cost
analysis must determine the best course of action for a particular simulation
task. Additionally, weak scaling results of heterogeneous simulations conducted
on CPUs and GPUs simultaneously are presented using clusters equipped with
varying node configurations.
Comment: 20 pages, 12 figures
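The following sketch illustrates why a patch-based decomposition preserves the serial result. It uses a hypothetical D1Q3 lattice Boltzmann (BGK) diffusion model on a periodic 1D domain. The model, the relaxation rate OMEGA, and all function names are assumptions; the paper's solver is 3D, runs inside waLBerla, and uses MPI. The domain is advanced once as a single array and once split into patches with one ghost cell per side. Before each streaming step the ghosts are filled from neighbouring patches, which is the exchange MPI would perform between ranks.

```python
# Hypothetical D1Q3 BGK sketch: serial run vs. patch run with ghost-cell exchange.
# Patches stand in for the blocks an MPI rank would own; no MPI is used here.

W = (2.0 / 3.0, 1.0 / 6.0, 1.0 / 6.0)  # weights for lattice velocities 0, +1, -1
OMEGA = 1.2                             # BGK relaxation rate (assumed value)


def collide(cells):
    """In-place BGK collision towards the rest equilibrium f_eq_i = w_i * rho."""
    for c in cells:
        rho = c[0] + c[1] + c[2]
        for i in range(3):
            c[i] += OMEGA * (W[i] * rho - c[i])


def stream_serial(f):
    """Periodic streaming on one global array of cells."""
    n = len(f)
    return [[f[x][0], f[(x - 1) % n][1], f[(x + 1) % n][2]] for x in range(n)]


def split(f, n_patches):
    """Cut the domain into equal patches, each padded with a ghost cell per side."""
    size = len(f) // n_patches
    return [[[0.0] * 3] + [list(c) for c in f[p * size:(p + 1) * size]] + [[0.0] * 3]
            for p in range(n_patches)]


def step_patches(patches):
    for patch in patches:
        collide(patch[1:-1])                   # interior cells only
    n = len(patches)
    for p, patch in enumerate(patches):        # ghost exchange, periodic neighbours
        patch[0] = list(patches[(p - 1) % n][-2])
        patch[-1] = list(patches[(p + 1) % n][1])
    for p, patch in enumerate(patches):        # stream using the filled ghosts
        interior = [[patch[x][0], patch[x - 1][1], patch[x + 1][2]]
                    for x in range(1, len(patch) - 1)]
        patches[p] = [[0.0] * 3] + interior + [[0.0] * 3]


def gather(patches):
    return [c for patch in patches for c in patch[1:-1]]


def initial_state(n):
    # A density bump in the middle third of the domain, initialised at equilibrium.
    rho = [2.0 if n // 3 <= x < 2 * n // 3 else 1.0 for x in range(n)]
    return [[W[i] * r for i in range(3)] for r in rho]


if __name__ == "__main__":
    n, steps = 24, 50
    serial = initial_state(n)
    for _ in range(steps):
        collide(serial)
        serial = stream_serial(serial)
    patches = split(initial_state(n), n_patches=4)
    for _ in range(steps):
        step_patches(patches)
    diff = max(abs(a - b) for cs, cp in zip(serial, gather(patches)) for a, b in zip(cs, cp))
    print("max difference serial vs patches:", diff)
```

The patch run performs exactly the same arithmetic per cell. Any overhead of the block-structured approach therefore comes from the ghost exchange, not from the kernel, which is the quantity the multi-GPU overhead analysis is concerned with.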
Invasive Computing in HPC with X10
High-performance computing with thousands of cores relies on distributed
memory for reasons of memory consistency. Resource
management on such systems usually relies on a static assignment of
resources at the start of each application. Such static scheduling
cannot start an application whose required resources are in use by
others, because the resources assigned to a running application cannot
be reduced without stopping it. This lack of dynamic
adaptive scheduling leaves resources idle until the remaining
requested resources become available. Additionally, applications
with changing resource requirements lead to idle or less
efficiently used resources. The invasive computing paradigm proposes
dynamic resource scheduling and applications that can dynamically
adapt to changing resource requirements.
As a case study, we developed an invasive resource manager as
well as a multigrid solver with dynamically changing resource demands.
The scalability of such a multigrid solver changes during its execution,
and because memory is distributed, the solver requires data migration
upon reallocation.
To counter the complexity introduced by the additional
interfaces, e.g. for data migration, we use the X10 programming
language for improved programmability. Our results show
improved application throughput and dynamic adaptivity. In addition,
we present our extension of the distributed arrays of X10 to
support data migration.
This work was supported by the German Research Foundation
(DFG) as part of the Transregional Collaborative Research Centre
“Invasive Computing” (SFB/TR 89)
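When an application's resources grow or shrink at run time, a block-distributed array must be redistributed among the new set of places. The sketch below computes such a migration plan. It is plain Python, not X10, and it does not reproduce the authors' DistArray extension. The balanced block distribution and the function names are assumptions made for illustration.

```python
# Hypothetical sketch (not X10's DistArray API): plan the data migration for a
# block-distributed array when its owner count changes from p_old to p_new.


def block_range(n, p, rank):
    """Half-open range [start, end) owned by `rank` in a balanced block distribution."""
    base, extra = divmod(n, p)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


def migration_plan(n, p_old, p_new):
    """Return (src, dst, start, end) transfers; src == dst means data stays local."""
    plan = []
    for src in range(p_old):
        s0, e0 = block_range(n, p_old, src)
        for dst in range(p_new):
            s1, e1 = block_range(n, p_new, dst)
            lo, hi = max(s0, s1), min(e0, e1)
            if lo < hi:                  # the old and new blocks overlap
                plan.append((src, dst, lo, hi))
    return plan


if __name__ == "__main__":
    # Shrinking a 10-element array from 4 places to 2, e.g. after the
    # resource manager reassigns cores to another application.
    for src, dst, lo, hi in migration_plan(10, 4, 2):
        kind = "local" if src == dst else "send"
        print(f"place {src} -> place {dst}: indices [{lo}, {hi}) ({kind})")
```

Only overlaps with src != dst require communication. This shows why reallocation has a cost that a dynamic scheduler must weigh against the throughput it gains.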
An object-oriented approach for parallel self adaptive mesh refinement on block structured grids
Self-adaptive mesh refinement dynamically matches the computational demands of a solver for partial differential equations to the activity in the application's domain. In this paper we present two C++ class libraries, P++ and AMR++, which significantly simplify the development of sophisticated adaptive mesh refinement codes on (massively) parallel distributed memory architectures. The development builds on our previous research in this area. The C++ class libraries provide abstractions that separate the issues of developing parallel adaptive mesh refinement applications into those of parallelism, abstracted by P++, and adaptive mesh refinement, abstracted by AMR++. P++ is a parallel array class library that permits efficient development of architecture-independent codes for structured grid applications, and AMR++ provides support for self-adaptive mesh refinement on block-structured grids of rectangular non-overlapping blocks. Using these libraries, the application programmer's work is reduced primarily to specifying the serial single-grid application; the parallel and self-adaptive mesh refinement code is then obtained with minimal effort. Initial results are presented for simple singular perturbation problems solved by self-adaptive multilevel techniques (FAC, AFAC), implemented on the basis of prototypes of the P++/AMR++ environment. Singular perturbation problems frequently arise in large applications, e.g. in computational fluid dynamics. Their solutions usually have layers that require adaptive mesh refinement and fast basic solvers in order to be resolved efficiently.
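The sketch below shows, in 1D, the step that turns a solution with layers into a set of non-overlapping refinement blocks. It flags cells with a steep jump, widens each flag by a buffer, and merges contiguous flagged cells into blocks. This is a hypothetical illustration, not the P++/AMR++ API. The threshold, the buffer width, and the simple run-length clustering are assumptions; 2D and 3D codes typically use more elaborate clustering into rectangles.

```python
import math

# Hypothetical 1D sketch, not the P++/AMR++ API: flag cells where the solution
# varies sharply, add a buffer, and group flagged cells into non-overlapping
# blocks [start, end) that refined patches would cover.


def flag_cells(u, threshold):
    """Flag cell i (between nodes i and i+1) when |u[i+1] - u[i]| exceeds threshold."""
    return [abs(u[i + 1] - u[i]) > threshold for i in range(len(u) - 1)]


def cluster(flags, buffer=1):
    """Merge flagged cells, widened by `buffer` cells per side, into disjoint blocks."""
    n = len(flags)
    marked = [False] * n
    for i, flagged in enumerate(flags):
        if flagged:
            for j in range(max(0, i - buffer), min(n, i + buffer + 1)):
                marked[j] = True
    blocks, i = [], 0
    while i < n:
        if marked[i]:
            j = i
            while j < n and marked[j]:
                j += 1
            blocks.append((i, j))   # one block (an interval in 1D)
            i = j
        else:
            i += 1
    return blocks


if __name__ == "__main__":
    # Boundary layers of width ~eps at both ends, typical of a singularly
    # perturbed problem; 100 uniform cells on [0, 1].
    eps, n = 0.01, 100
    x = [i / n for i in range(n + 1)]
    u = [math.exp(-xi / eps) + math.exp(-(1.0 - xi) / eps) for xi in x]
    print(cluster(flag_cells(u, threshold=0.05)))
```

Only the few cells inside each layer end up in a block. This concentration of effort is what lets adaptive refinement resolve layers efficiently while the rest of the domain stays on the coarse grid.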
- …