261 research outputs found

    A PETSc parallel-in-time solver based on MGRIT algorithm

    Get PDF
    We address the development of a modular implementation of the MGRIT (MultiGrid-In-Time) algorithm to solve linear and nonlinear systems that arise from the discretization of evolutionary models with a parallel-in-time approach in the context of the PETSc (the Portable, Extensible Toolkit for Scientific computing) library. Our aim is to give the opportunity of predicting the performance gain achievable when using the MGRIT approach instead of the Time Stepping integrator (TS). To this end, we analyze the performance parameters of the algorithm that provide a-priori the best number of processing elements and grid levels to use to address the scaling of MGRIT, regarded as a parallel iterative algorithm proceeding along the time dimensio

    Multilevel convergence analysis of multigrid-reduction-in-time

    Full text link
    This paper presents a multilevel convergence framework for multigrid-reduction-in-time (MGRIT) as a generalization of previous two-grid estimates. The framework provides a priori upper bounds on the convergence of MGRIT V- and F-cycles, with different relaxation schemes, by deriving the respective residual and error propagation operators. The residual and error operators are functions of the time stepping operator, analyzed directly and bounded in norm, both numerically and analytically. We present various upper bounds of different computational cost and varying sharpness. These upper bounds are complemented by proposing analytic formulae for the approximate convergence factor of V-cycle algorithms that take the number of fine grid time points, the temporal coarsening factors, and the eigenvalues of the time stepping operator as parameters. The paper concludes with supporting numerical investigations of parabolic (anisotropic diffusion) and hyperbolic (wave equation) model problems. We assess the sharpness of the bounds and the quality of the approximate convergence factors. Observations from these numerical investigations demonstrate the value of the proposed multilevel convergence framework for estimating MGRIT convergence a priori and for the design of a convergent algorithm. We further highlight that observations in the literature are captured by the theory, including that two-level Parareal and multilevel MGRIT with F-relaxation do not yield scalable algorithms and the benefit of a stronger relaxation scheme. An important observation is that with increasing numbers of levels MGRIT convergence deteriorates for the hyperbolic model problem, while constant convergence factors can be achieved for the diffusion equation. The theory also indicates that L-stable Runge-Kutta schemes are more amendable to multilevel parallel-in-time integration with MGRIT than A-stable Runge-Kutta schemes.Comment: 26 pages; 17 pages Supplementary Material

    A Flexible Patch-Based Lattice Boltzmann Parallelization Approach for Heterogeneous GPU-CPU Clusters

    Full text link
    Sustaining a large fraction of single GPU performance in parallel computations is considered to be the major problem of GPU-based clusters. In this article, this topic is addressed in the context of a lattice Boltzmann flow solver that is integrated in the WaLBerla software framework. We propose a multi-GPU implementation using a block-structured MPI parallelization, suitable for load balancing and heterogeneous computations on CPUs and GPUs. The overhead required for multi-GPU simulations is discussed in detail and it is demonstrated that the kernel performance can be sustained to a large extent. With our GPU implementation, we achieve nearly perfect weak scalability on InfiniBand clusters. However, in strong scaling scenarios multi-GPUs make less efficient use of the hardware than IBM BG/P and x86 clusters. Hence, a cost analysis must determine the best course of action for a particular simulation task. Additionally, weak scaling results of heterogeneous simulations conducted on CPUs and GPUs simultaneously are presented using clusters equipped with varying node configurations.Comment: 20 pages, 12 figure

    Invasive Computing in HPC with X10

    Get PDF
    High performance computing with thousands of cores relies on distributed memory due to memory consistency reasons. The resource management on such systems usually relies on static assignment of resources at the start of each application. Such a static scheduling is incapable of starting applications with required resources being used by others since a reduction of resources assigned to applications without stopping them is not possible. This lack of dynamic adaptive scheduling leads to idling resources until the remaining amount of requested resources gets available. Additionally, applications with changing resource requirements lead to idling or less efficiently used resources. The invasive computing paradigm suggests dynamic resource scheduling and applications able to dynamically adapt to changing resource requirements. As a case study, we developed an invasive resource manager as well as a multigrid with dynamically changing resource demands. Such a multigrid has changing scalability behavior during its execution and requires data migration upon reallocation due to distributed memory systems. To counteract the additional complexity introduced by the additional interfaces, e. g. for data migration, we use the X10 programming language for improved programmability. Our results show improved application throughput and the dynamic adaptivity. In addition, we show our extension for the distributed arrays of X10 to support data migrationThis work was supported by the German Research Foundation (DFG) as part of the Transregional Collaborative Research Centre “Invasive Computing” (SFB/TR 89)

    An object-oriented approach for parallel self adaptive mesh refinement on block structured grids

    Get PDF
    Self-adaptive mesh refinement dynamically matches the computational demands of a solver for partial differential equations to the activity in the application's domain. In this paper we present two C++ class libraries, P++ and AMR++, which significantly simplify the development of sophisticated adaptive mesh refinement codes on (massively) parallel distributed memory architectures. The development is based on our previous research in this area. The C++ class libraries provide abstractions to separate the issues of developing parallel adaptive mesh refinement applications into those of parallelism, abstracted by P++, and adaptive mesh refinement, abstracted by AMR++. P++ is a parallel array class library to permit efficient development of architecture independent codes for structured grid applications, and AMR++ provides support for self-adaptive mesh refinement on block-structured grids of rectangular non-overlapping blocks. Using these libraries, the application programmers' work is greatly simplified to primarily specifying the serial single grid application and obtaining the parallel and self-adaptive mesh refinement code with minimal effort. Initial results for simple singular perturbation problems solved by self-adaptive multilevel techniques (FAC, AFAC), being implemented on the basis of prototypes of the P++/AMR++ environment, are presented. Singular perturbation problems frequently arise in large applications, e.g. in the area of computational fluid dynamics. They usually have solutions with layers which require adaptive mesh refinement and fast basic solvers in order to be resolved efficiently
    • …
    corecore