36 research outputs found

    Cluster-based communication and load balancing for simulations on dynamically adaptive grids

    Short paper. The present paper introduces a new communication and load-balancing scheme based on a clustering of the grid, which we use for the efficient parallelization of simulations on dynamically adaptive grids. With a partitioning based on space-filling curves (SFCs), this yields several advantageous properties regarding memory requirements and load balancing. However, for such an SFC-based partitioning, additional connectivity information has to be stored and updated for dynamically changing grids. In this work, we present our approach to keep this connectivity information run-length encoded (RLE) only for the interfaces shared between partitions. Using special properties of the underlying grid traversal and of the communication scheme, we update this connectivity information implicitly for dynamically changing grids and can represent it as a sparse communication graph: graph nodes (partitions) represent bulks of connected grid cells, and each graph edge (RLE connectivity information) represents a unique relation between adjacent partitions. This directly leads to an efficient shared-memory parallelization with graph nodes assigned to computing cores and an efficient en bloc data exchange via graph edges. We further refer to such a partitioning approach with RLE meta information as a cluster-based domain decomposition and to each partition as a cluster. With the sparse communication graph in mind, we then extend the connectivity information represented by the graph edges with MPI ranks, yielding an en bloc communication for distributed-memory systems and a hybrid parallelization. For data migration, the stack-based intra-cluster communication allows a very low memory footprint, and the RLE leads to efficient updates of connectivity information. Our benchmark is based on a shallow water simulation on a dynamically adaptive grid. We conducted performance studies for MPI-only and hybrid parallelizations, yielding an efficiency of over 90% on 256 cores. Furthermore, we demonstrate the applicability of cluster-based optimizations on distributed-memory systems. We would like to thank the Munich Centre of Advanced Computing for funding this project by providing computing time on the MAC Cluster. This work was partly supported by the German Research Foundation (DFG) as part of the Transregional Collaborative Research Centre “Invasive Computing” (SFB/TR 89).
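
The run-length encoding of interface connectivity described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the per-edge list of neighbouring cluster ids, the function names, and the `refine_edge` update rule are all assumptions made for illustration.

```python
from itertools import groupby


def rle_encode(neighbor_ids):
    """Collapse per-edge neighbour ids into (cluster_id, run_length) pairs."""
    return [(nid, sum(1 for _ in run)) for nid, run in groupby(neighbor_ids)]


def rle_decode(runs):
    """Expand (cluster_id, run_length) pairs back into one id per edge."""
    return [nid for nid, count in runs for _ in range(count)]


def refine_edge(runs, edge_index):
    """Split one interface edge in two: only the length of its run grows."""
    updated, start = [], 0
    for nid, count in runs:
        hit = start <= edge_index < start + count
        updated.append((nid, count + 1 if hit else count))
        start += count  # advance by the original run length
    return updated


# Hypothetical interface of one cluster: the adjacent cluster id for each
# boundary edge, listed in traversal order.
interface = [3, 3, 3, 3, 7, 7, 2, 2, 2]
runs = rle_encode(interface)
print(runs)                  # one entry per adjacent cluster
print(refine_edge(runs, 5))  # refining an edge shared with cluster 7
```

In this sketch the encoded interface holds one entry per adjacent cluster rather than one per edge, and a local refinement changes a single counter.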

    SFC-based Communication Metadata Encoding for Adaptive Mesh

    This volume of the series “Advances in Parallel Computing” contains the proceedings of the International Conference on Parallel Programming – ParCo 2013 – held from 10 to 13 September 2013 in Garching, Germany. The conference was hosted by the Technische Universität München (Department of Informatics) and the Leibniz Supercomputing Centre. The present paper studies two adaptive mesh refinement (AMR) codes whose grids rely on recursive subdivision in combination with space-filling curves (SFCs). A non-overlapping domain decomposition based upon these SFCs yields several well-known advantageous properties with respect to communication demands, balancing, and partition connectivity. However, the administration of the meta data, i.e. tracking which partitions exchange data in which cardinality, is nontrivial due to the SFC’s fractal meandering and the dynamic adaptivity. We introduce an analysed tree grammar for the meta data that restricts it, without loss of information, hierarchically along the subdivision tree and applies run-length encoding. Hence, the meta data memory footprint is very small, and the meta data can be computed and maintained on-the-fly even for permanently changing grids. It facilitates a fork-join pattern for shared data parallelism. It also facilitates replicated data parallelism that tackles latency and bandwidth constraints, respectively, through communication in the background and through reduced memory requirements, since no adjacency information is stored per element. We demonstrate this by means of shared- and distributed-memory parallelized domain decompositions. This work was supported by the German Research Foundation (DFG) as part of the Transregional Collaborative Research Centre “Invasive Computing” (SFB/TR 89). It is partially based on work supported by Award No. UK-c0020, made by the King Abdullah University of Science and Technology (KAUST).
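
A non-overlapping domain decomposition along a space-filling curve can be sketched as follows. This is a minimal illustration under stated assumptions: the Morton (Z-order) curve is swapped in purely for brevity and is not claimed to be the curve used by the two AMR codes, the grid is a regular 4×4 grid rather than an adaptive one, and the function names are hypothetical.

```python
def morton_key(x, y, bits=8):
    """Interleave the bits of x (even positions) and y (odd positions)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def partition_along_curve(cells, num_parts):
    """Order cells along the curve, then cut the sequence into contiguous chunks."""
    ordered = sorted(cells, key=lambda c: morton_key(*c))
    base, extra = divmod(len(ordered), num_parts)
    parts, start = [], 0
    for p in range(num_parts):
        size = base + (1 if p < extra else 0)  # spread the remainder evenly
        parts.append(ordered[start:start + size])
        start += size
    return parts


# Assumed toy input: a regular 4x4 grid split into four partitions.
cells = [(x, y) for x in range(4) for y in range(4)]
parts = partition_along_curve(cells, 4)
for rank, part in enumerate(parts):
    print(rank, part)
```

Because each partition is a contiguous piece of the curve, balancing reduces to cutting a one-dimensional sequence, and in this toy case each partition ends up as one 2×2 quadrant.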

    Novel Graph-based Adaptive Triangular Mesh Refinement for Finite-volume Discretizations

    A novel graph-based adaptive mesh refinement technique for triangular finite-volume discretizations is described for solving second-order partial differential equations. Adaptively refined meshes are built to solve time-dependent problems at low computational cost. In the proposed approach, nodes can be flexibly linked and traversed among neighbors at different levels of refinement, and volumes are refined in a way that allows a straightforward and strictly local update of the data structure. In addition, linear equation system solvers based on the minimization of functionals can be easily used, specifically the Conjugate Gradient Method. Numerical and analytical tests were carried out to study the required execution time and the data storage cost. These tests confirmed the advantages of the proposed approach in elliptic and parabolic problems.
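
The Conjugate Gradient Method mentioned in this abstract can be sketched in a matrix-free form, where the solver only needs a function that applies the operator. The sketch below is a textbook version, not the paper's solver; the 1D Laplacian test operator `laplace_1d`, the tolerance, and the right-hand side are assumptions chosen so the example is self-contained.

```python
def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite operator given as matvec."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the initial guess x = 0
    p = list(r)  # first search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(p)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def laplace_1d(v):
    """Assumed test operator: tridiagonal (-1, 2, -1) with zero boundary values."""
    n = len(v)
    return [
        2.0 * v[i]
        - (v[i - 1] if i > 0 else 0.0)
        - (v[i + 1] if i < n - 1 else 0.0)
        for i in range(n)
    ]


b = [1.0] * 5
x = conjugate_gradient(laplace_1d, b)
print(x)  # exact solution is [2.5, 4.0, 4.5, 4.0, 2.5]
```

Passing the operator as a function rather than an assembled matrix is one way such a solver could sit on top of a graph traversal, since each application only visits a node and its neighbours.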

    High-performance tsunami modelling with modern GPU technology

    PhD thesis. Earthquake-induced tsunamis commonly propagate in the deep ocean as long waves and develop into sharp-fronted surges moving rapidly coastward, which may be effectively simulated by hydrodynamic models solving the nonlinear shallow water equations (SWEs). Tsunamis can cause substantial economic and human losses, which could be mitigated through early warning systems given efficient and accurate modelling. Most existing tsunami models require long simulation times for real-world applications. This thesis presents a graphics processing unit (GPU) accelerated finite volume hydrodynamic model using the compute unified device architecture (CUDA) for computationally efficient tsunami simulations. Compared with a standard PC, the model is able to reduce run-time by a factor of > 40. The validated model is used to reproduce the 2011 Japan tsunami. Two source models were tested, one based on tsunami waveform inversion and another using deep-ocean tsunameters. Vertical sea surface displacement is computed by the Okada model, assuming instantaneous sea-floor deformation. Both source models can reproduce the wave propagation at offshore and nearshore gauges, but the tsunameter-based model better simulates the first wave amplitude. Effects of grid resolutions between 450 and 3600 m, slope limiters, and numerical accuracy are also investigated for the simulation of the 2011 Japan tsunami. Grid resolutions of 1-2 km perform well with a proper limiter; the Sweby limiter is optimal for coarser resolutions, recovers wave peaks better than minmod, and is more numerically stable than Superbee. One hour of tsunami propagation can be predicted 50 times faster on a regular low-cost PC-hosted GPU than on a single CPU. For 450 m resolution on a larger-memory server-hosted GPU, performance increased by ~70 times. Finally, two adaptive mesh refinement (AMR) techniques, simplified dynamic adaptive grids on CPU and a static adaptive grid on GPU, are introduced to provide multi-scale simulations. Both can reduce run-time by ~3 times while maintaining acceptable accuracy. The proposed computationally efficient tsunami model is expected to provide a new practical tool for tsunami modelling for different purposes, including real-time warning, evacuation planning, risk management and city planning.
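
The three limiters compared in this abstract are commonly written as functions of the ratio r of consecutive solution gradients. The sketch below uses those standard textbook forms; the value `beta=1.5` for the Sweby limiter is an assumption, since the abstract does not state which parameter the thesis uses.

```python
def minmod(r):
    """Most diffusive of the three: phi(r) = max(0, min(1, r))."""
    return max(0.0, min(1.0, r))


def superbee(r):
    """Least diffusive of the three: phi(r) = max(0, min(2r, 1), min(r, 2))."""
    return max(0.0, min(2.0 * r, 1.0), min(r, 2.0))


def sweby(r, beta=1.5):
    """One-parameter family with 1 <= beta <= 2; beta=1.5 is an assumed value."""
    return max(0.0, min(beta * r, 1.0), min(r, beta))


for r in (-1.0, 0.25, 0.5, 1.0, 3.0):
    print(r, minmod(r), sweby(r), superbee(r))
```

In this formulation the Sweby family reduces to minmod at beta = 1 and to Superbee at beta = 2, which is consistent with the abstract placing it between the two in terms of peak recovery and stability.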

    RKDG2 shallow-water solver on non-uniform grids with local time steps: Application to 1D and 2D hydrodynamics

    This paper investigates local time stepping (LTS) for second-order Runge–Kutta discontinuous Galerkin (RKDG2) solutions of the inhomogeneous shallow water equations (SWEs) with source terms on non-uniform meshes. An LTS algorithm, recently designed for homogeneous hyperbolic PDEs, is herein reconsidered and improved in combination with the RKDG2 shallow-flow solver (LTS-RKDG2), including topography and friction source terms as well as wetting and drying. Two LTS-RKDG2 schemes that use 3 and 4 levels of LTS are configured on 1D and/or 2D (quadrilateral) non-uniform meshes that, respectively, adopt 3 and 4 scales of spatial discretization. Selected shallow water benchmark tests are used to verify, assess and compare the LTS-RKDG2 schemes relative to their conventional global time step alternatives (GTS-RKDG2), considering several issues of practical relevance to hydraulic modelling. Results show that the LTS-RKDG2 models could offer (depending on both the mesh setting and the features of the flow) comparable accuracy to the associated GTS-RKDG2 models, with savings in runtime of up to a factor of 2.5 in 1D simulations and 1.6 in 2D simulations.
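
Why local time stepping can save runtime on a non-uniform mesh can be shown with a back-of-the-envelope count of cell updates. The sketch below is not the paper's algorithm: it assumes that time steps double from one LTS level to the next, that a cell's level follows from its size alone, and that every cell update costs the same. The mesh and function names are hypothetical.

```python
def lts_levels(cell_sizes):
    """Level m means the cell may advance with 2**m times the smallest time step."""
    dx_min = min(cell_sizes)
    levels = []
    for dx in cell_sizes:
        level = 0
        while dx_min * 2 ** (level + 1) <= dx:
            level += 1
        levels.append(level)
    return levels


def ideal_update_ratio(cell_sizes):
    """Cell updates per coarse step: global time stepping divided by LTS."""
    levels = lts_levels(cell_sizes)
    top = max(levels)
    gts_updates = len(cell_sizes) * 2 ** top  # every cell uses the smallest step
    lts_updates = sum(2 ** (top - m) for m in levels)
    return gts_updates / lts_updates


# Assumed 1D mesh with three spatial scales (ten coarse, four medium, two fine cells).
sizes = [4.0] * 10 + [2.0] * 4 + [1.0] * 2
print(lts_levels(sizes))
print(ideal_update_ratio(sizes))
```

The ratio is only an idealized upper bound for this toy mesh; it ignores interface synchronization, source terms, and wetting and drying, all of which affect the savings a real solver achieves.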

    Wavelet-based numerical methods adaptive modelling of shallow water flows

    Mesh adaptation techniques are commonly coupled with numerical schemes to improve modelling efficiency and to capture the different physical scales involved in shallow water flow problems. This work designs an adaptive technique that draws on wavelet theory to transform local single-resolution information into multiresolution information, in which this information becomes accessible. The adaptivity of wavelets was first comprehensively tested on an arbitrary function, for which the spatial resolution was adapted from the local solution itself, based on a single user-prescribed parameter. Secondly, the adaptive technique was combined with two standard numerical modelling schemes (i.e. finite volume and discontinuous Galerkin schemes) to produce two wavelet-based adaptive schemes. These schemes are designed for modelling one-dimensional shallow water flows and are referred to as the Haar wavelets finite volume (HWFV) and multiwavelet discontinuous Galerkin (MWDG) schemes. Both adaptive schemes were systematically tested using hydraulic test cases. The results demonstrated that the proposed adaptive technique could serve as a lucid foundation on which to construct holistic and smart adaptive schemes for simulating real shallow water flows.
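
The idea of driving resolution from the solution itself with a single user-prescribed parameter can be sketched with an unnormalized Haar decomposition. This is an illustration under assumptions rather than the HWFV scheme: the averaging convention, the uniform threshold `epsilon` applied at every level, and the sample data are all hypothetical.

```python
def haar_decompose(values):
    """Return the coarsest average and the detail coefficients, finest level first.

    The number of values is assumed to be a power of two.
    """
    details = []
    current = list(values)
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        averages = [(current[i] + current[i + 1]) / 2 for i in pairs]
        diffs = [(current[i] - current[i + 1]) / 2 for i in pairs]
        details.append(diffs)
        current = averages
    return current[0], details


def significant_details(details, epsilon):
    """Flag details whose magnitude exceeds the single user-prescribed threshold."""
    return [[abs(d) > epsilon for d in level] for level in details]


# Assumed sample: flat water depth with a sharp front at the right end.
values = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 5.0, 9.0]
coarse, details = haar_decompose(values)
flags = significant_details(details, epsilon=0.1)
print(coarse)
print(details)
print(flags)
```

Only the coefficients near the front are flagged, so a scheme built on such flags would keep fine resolution there and coarsen the flat region.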