
    Evaluation of an efficient stack-RLE clustering concept for dynamically adaptive grids

    This is the author accepted manuscript. The final version is available from the Society for Industrial and Applied Mathematics via the DOI in this record. Abstract. One approach to tackling the challenge of efficient implementations of parallel PDE simulations on dynamically changing grids is the use of space-filling curves (SFCs). While SFC algorithms have advantageous properties such as low memory requirements and close-to-optimal partitioning approaches with linear complexity, they require efficient communication strategies for maintaining and using the connectivity information, in particular for dynamically changing grids. Our approach uses a sparse communication graph to store the connectivity information and transfers data block-wise. This permits efficient generation of multiple partitions per memory context (denoted by clustering), which, in combination with run-length encoding (RLE), leads directly to elegant solutions for shared, distributed and hybrid parallelization and allows cluster-based optimizations. While previous work focused on specific aspects, in this paper we present a compact overall summary of the stack-RLE clustering approach, complemented by aspects of the vertex-based communication that make the approach easier to understand.
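    The run-length encoding of connectivity information can be illustrated on the boundary of a single cluster. This is a minimal sketch under stated assumptions, not the paper's stack-based implementation: it assumes the boundary edges of a cluster are visited in SFC order and each edge is tagged with the ID of the adjacent cluster; the function names are hypothetical. Consecutive equal IDs collapse into (neighbor_id, count) pairs, and the same pairs tell how to cut the per-edge boundary data into one contiguous block per neighbor.

```python
from itertools import groupby

def rle_encode(neighbor_ids):
    """Compress a sequence of adjacent-cluster IDs into (id, run_length) pairs."""
    return [(cid, sum(1 for _ in run)) for cid, run in groupby(neighbor_ids)]

def rle_decode(pairs):
    """Expand (id, run_length) pairs back into the per-edge ID sequence."""
    out = []
    for cid, count in pairs:
        out.extend([cid] * count)
    return out

def split_boundary_data(values, pairs):
    """Cut per-edge boundary values into one contiguous block per RLE entry,
    so each block can be sent to its neighbor cluster in a single transfer."""
    blocks, pos = [], 0
    for cid, count in pairs:
        blocks.append((cid, values[pos:pos + count]))
        pos += count
    return blocks

# Hypothetical boundary: 3 edges shared with cluster 3, 2 with cluster 7, 4 with cluster 2
edges = [3, 3, 3, 7, 7, 2, 2, 2, 2]
pairs = rle_encode(edges)
print(pairs)  # [(3, 3), (7, 2), (2, 4)]
```

    The encoded list grows with the number of neighbor changes along the boundary rather than with the number of boundary edges.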
The central contribution of this work is the proof that the stack-RLE clustering approach is suitable for efficiently realizing different relevant building blocks of Scientific Computing methodology and real-life CSE applications. We show 95% strong scalability for small-scale scalability benchmarks on 512 cores and weak scalability of over 90% on 8192 cores for finite-volume solvers with the grid structure changing in every time step; optimizations of simulation data backends by writer tasks; comparisons of analytical benchmarks to analyze the adaptivity criteria; and a tsunami simulation as a representative real-world showcase of wave propagation. In this showcase, our approach reduces the overall workload by 95% with parallel fully adaptive mesh refinement and, compared with SFC-ordered regular grid cells, reduces the computation time by a factor of 7.6 with improved results and by a factor of 62.2 with results of similar accuracy with respect to buoy station data. This work was partly supported by the German Research Foundation (DFG) as part of the Transregional Collaborative Research Centre “Invasive Computing” (SFB/TR 89).
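    The scalability figures are easiest to read against the usual definitions of parallel efficiency. The formulas below are the common textbook definitions, not taken from the paper; in particular, the baseline core count p_0 behind the 95% and 90% figures is not stated in the abstract and is an assumption here.

```latex
E_{\mathrm{strong}}(p) = \frac{p_0 \, T(p_0)}{p \, T(p)} \quad \text{(fixed total problem size)},
\qquad
E_{\mathrm{weak}}(p) = \frac{T(p_0)}{T(p)} \quad \text{(fixed problem size per core)}
```

    For example, assuming p_0 = 1, a strong-scaling efficiency of 95% on 512 cores corresponds to a speedup of T(1)/T(512) = 0.95 · 512 ≈ 486.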

    Scalable Algorithms for Parallel Tree-based Adaptive Mesh Refinement with General Element Types

    In this thesis, we develop, discuss and implement algorithms for scalable parallel tree-based adaptive mesh refinement (AMR) using space-filling curves (SFCs). We create AMR software that works independently of the element type used, such as lines, triangles, tetrahedra, quadrilaterals, hexahedra, and prisms. For triangular and tetrahedral elements (simplices) with red-refinement (1:4 in 2D, 1:8 in 3D), we develop a new SFC, the tetrahedral Morton space-filling curve (TM-SFC). Its construction is similar to the Morton index for quadrilaterals/hexahedra, as it is also based on bitwise interleaving the coordinates of a certain vertex of the simplex, the anchor node. Additionally, we interleave a new piece of information, the so-called type. For these simplices, we develop element-local algorithms such as constructing the parent, children, or face-neighbors of a simplex, and show that most of them are constant-time operations independent of the refinement level. With SFC-based partitioning, the mesh elements assigned to one process may not form a face-connected domain. We prove the following upper bounds for the number of face-connected components of segments of the TM-SFC: with a maximum refinement level of L, the number of face-connected components is bounded by 2(L − 1) in 2D and 2L + 1 in 3D. Additionally, we numerically investigate the distribution of lengths of SFC segments. Furthermore, we develop a new approach to partition and repartition a coarse (input) mesh among the processes. Compared with previous methods, it optimizes for fine-mesh load balance and reduces the parallel communication of coarse mesh data. We discuss the coarse mesh repartitioning algorithm and demonstrate that our method repartitions a coarse mesh of 371e9 trees on 917,504 processes (405,000 trees per process) on the Juqueen supercomputer in 1.2 seconds.
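    The Morton-style construction rests on bit interleaving. As a minimal sketch, the code below computes the plain 2D Morton index of a quadrilateral from its anchor coordinates; it deliberately omits the simplex type that the TM-SFC additionally interleaves, and the function names are hypothetical. Coordinates are assumed to be integers in units of the element's own level, i.e. in the range [0, 2^level).

```python
def morton_index_2d(x, y, level):
    """Interleave the bits of the anchor coordinates (x, y).

    Bit k of x lands at position 2k and bit k of y at position 2k + 1,
    so the index visits the 2**level x 2**level cells in Z order.
    """
    index = 0
    for k in range(level):
        index |= ((x >> k) & 1) << (2 * k)
        index |= ((y >> k) & 1) << (2 * k + 1)
    return index

def parent_anchor(x, y):
    """Anchor of the parent cell, in units of the parent's (coarser) level."""
    return x >> 1, y >> 1

# Level-1 cells sorted by their Morton index
cells = sorted(((x, y) for x in range(2) for y in range(2)),
               key=lambda c: morton_index_2d(c[0], c[1], 1))
print(cells)  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

    Dropping the last bit of each coordinate yields the parent anchor, and the parent's index equals the child's index shifted right by two bits, which illustrates why parent computations of this kind need not depend on the refinement level.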
We develop an AMR concept that works independently of the element type, achieving this independence by strictly distinguishing between functions that operate on the whole mesh (high-level) and functions that operate locally on a single element or a small set of elements (low-level). We discuss a new approach to generate and manage ghost elements that fits into our element-type independent approach. We define and describe the necessary low-level algorithms. Our main idea is to compute tree-to-tree face-neighbors of an element via the explicit construction of the element's face as a lower-dimensional element. To optimize the runtime of this method, we enhance the algorithm with a top-down search method from Isaac, Burstedde, Wilcox, and Ghattas, and demonstrate that it speeds up the computation by factors of 10 to 20, achieving runtimes comparable to state-of-the-art implementations with fixed element types. With the ghost algorithm, we build a straightforward ripple version of the 2:1 balance algorithm. This is not an optimized version, but it serves as a feasibility study for our element-type independent approach. We implement all algorithms developed in this thesis in the new AMR library t8code. Our modular approach allows us to reuse existing software, which we demonstrate by using the library p4est for quadrilateral and hexahedral elements. In a concurrent Bachelor's thesis by David Knapp (INS, Bonn), the necessary low-level algorithms for prisms were developed. With t8code, we demonstrate that we can create, adapt, (re-)partition, and balance meshes, as well as create and manage a ghost layer. In various tests, we show excellent strong and weak scaling behavior of our algorithms on up to 917,504 parallel processes on the Juqueen and Mira supercomputers using up to 858e9 mesh elements. We conclude this thesis by demonstrating how an application can be coupled with the AMR routines.
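    The split between high-level mesh functions and low-level element functions can be sketched as an abstract interface. This is a hypothetical Python illustration, not the t8code API: the class and method names are invented, and only two element types (lines and quadrilaterals, each with uniform refinement) are shown. The high-level refine_uniform function touches elements only through the scheme interface, so it works unchanged for both types.

```python
from abc import ABC, abstractmethod

class ElementScheme(ABC):
    """Low-level interface: operations on a single element (hypothetical)."""

    @abstractmethod
    def level(self, elem):
        """Refinement level of the element."""

    @abstractmethod
    def children(self, elem):
        """Children of the element under uniform refinement."""

class LineScheme(ElementScheme):
    """1D lines stored as (level, i); each line has 2 children."""

    def level(self, elem):
        return elem[0]

    def children(self, elem):
        level, i = elem
        return [(level + 1, 2 * i), (level + 1, 2 * i + 1)]

class QuadScheme(ElementScheme):
    """Quadrilaterals stored as (level, x, y); 4 children in Morton order."""

    def level(self, elem):
        return elem[0]

    def children(self, elem):
        level, x, y = elem
        return [(level + 1, 2 * x + dx, 2 * y + dy) for dy in (0, 1) for dx in (0, 1)]

def refine_uniform(scheme, elements, target_level):
    """High-level: refine every element to target_level using only the
    scheme interface, so the same code serves every element type."""
    result = []
    for elem in elements:
        if scheme.level(elem) >= target_level:
            result.append(elem)
        else:
            result.extend(refine_uniform(scheme, scheme.children(elem), target_level))
    return result

print(refine_uniform(LineScheme(), [(0, 0)], 2))  # [(2, 0), (2, 1), (2, 2), (2, 3)]
```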
We implement a finite-volume-based advection solver using t8code and show applications with triangular, quadrilateral, tetrahedral, and hexahedral elements, as well as 2D and 3D hybrid meshes, the latter consisting of tetrahedra, hexahedra, and prisms. Overall, we develop and demonstrate a new simplicial SFC and create fast and scalable tree-based AMR software that offers flexibility and generality not previously available.
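    For readers unfamiliar with finite-volume advection, the core update can be shown on the simplest possible mesh. The sketch below is a generic first-order upwind scheme for u_t + a u_x = 0 on a uniform, periodic 1D grid; it is not the thesis's solver, which runs on adaptive meshes with several element types, and the function name and parameters are assumptions.

```python
def upwind_step(u, a, dx, dt):
    """One explicit first-order upwind finite-volume step for u_t + a u_x = 0
    on a uniform periodic 1D grid; u holds the cell averages."""
    n = len(u)
    # Numerical flux through the right face of cell i, taken from the upwind side
    flux = [a * (u[i] if a >= 0 else u[(i + 1) % n]) for i in range(n)]
    # Cell update: outgoing flux minus incoming flux (flux[-1] wraps around periodically)
    return [u[i] - dt / dx * (flux[i] - flux[i - 1]) for i in range(n)]

print(upwind_step([0, 1, 0, 0], 1.0, 1.0, 1.0))  # [0.0, 0.0, 1.0, 0.0]
```

    With a·dt/dx = 1 the update shifts the cell averages by exactly one cell; in general this explicit scheme is stable only for |a|·dt/dx ≤ 1. Because each face flux is added to one cell and subtracted from its neighbor, the total mass is conserved.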

    Diamond-based models for scientific visualization

    Hierarchical spatial decompositions are a basic modeling tool in a variety of application domains, including scientific visualization, finite element analysis, and shape modeling and analysis. A popular class of such approaches is based on the regular simplex bisection operator, which bisects simplices (e.g. line segments, triangles, tetrahedra) at the midpoint of a predetermined edge. Regular simplex bisection produces adaptive simplicial meshes of high geometric quality, while simplifying the extraction of crack-free, or conforming, approximations to the original dataset. Efficient multiresolution representations for such models have been achieved in 2D and 3D by clustering sets of simplices sharing the same bisection edge into structures called diamonds. In this thesis, we introduce several diamond-based approaches for scientific visualization. We first formalize the notion of diamonds in arbitrary dimensions in terms of two related simplicial decompositions of hypercubes. This enables us to enumerate the vertices, simplices, parents and children of a diamond. In particular, we show that the number of simplices involved in conforming updates is factorial in the dimension, and we group these simplices into a linear number of subclusters that are generated simultaneously. These subclusters form the basis for a compact pointerless representation of conforming meshes generated by regular simplex bisection and for efficiently navigating the topological connectivity of these meshes. Secondly, we introduce the supercube as a high-level primitive on such nested meshes, based on the atomic units within the underlying triangulation grid. We propose the use of supercubes to associate information with coherent subsets of the full hierarchy and demonstrate the effectiveness of such a representation for modeling multiresolution terrain and volumetric datasets.
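    Regular simplex bisection in 2D can be sketched in a few lines. The vertex-ordering convention is an assumption made for this illustration: a triangle is stored as (a, b, c) with the edge from a to c as its predetermined bisection edge, and the children are ordered so that their bisection edges again join their first and last vertices. For the right isosceles triangle used here, this always bisects the hypotenuse, so all leaves have the same area. The two triangles sharing a bisection edge form a 2D diamond.

```python
def midpoint(p, q):
    """Midpoint of two 2D points."""
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

def bisect(tri):
    """Split (a, b, c) at the midpoint m of its bisection edge a-c.
    Each child's bisection edge is again its first-to-last vertex edge."""
    a, b, c = tri
    m = midpoint(a, c)
    return (a, m, b), (c, m, b)

def refine(tri, depth):
    """Uniformly bisect a triangle depth times; returns the leaf triangles."""
    if depth == 0:
        return [tri]
    first, second = bisect(tri)
    return refine(first, depth - 1) + refine(second, depth - 1)

def area(tri):
    (x0, y0), (x1, y1), (x2, y2) = tri
    return abs((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)) / 2

root = ((0, 0), (1, 1), (2, 0))  # bisection edge from (0, 0) to (2, 0)
leaves = refine(root, 4)
print(len(leaves))  # 16
```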
Next, we introduce Isodiamond Hierarchies, a general framework for spatial access structures on a hierarchy of diamonds that exploits the implicit hierarchical and geometric relationships of the diamond model. We use an isodiamond hierarchy to encode irregular updates to a multiresolution isosurface or interval volume in terms of regular updates to diamonds. Finally, we consider nested hypercubic meshes, such as quadtrees, octrees and their higher-dimensional analogues, through the lens of diamond hierarchies. This allows us to determine the relationships involved in generating balanced hypercubic meshes and to propose a compact pointerless representation of such meshes. We also provide a local diamond-based triangulation algorithm to generate high-quality conforming simplicial meshes.
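    The 2:1 balance condition behind balanced hypercubic meshes can be checked directly on a list of quadtree leaves. This brute-force O(n^2) sketch only illustrates the condition; it is not the diamond-based approach of the thesis. Leaves are assumed to be given as hypothetical (level, i, j) triples with integer indices in [0, 2^level), and two leaves count as neighbors only if they share part of an edge (corner contact is ignored).

```python
def leaf_square(leaf, max_level):
    """Lower-left corner and side length of a leaf on the finest grid."""
    level, i, j = leaf
    size = 1 << (max_level - level)
    return i * size, j * size, size

def share_face(a, b, max_level):
    """True if two leaves touch along an edge of positive length."""
    ax, ay, asz = leaf_square(a, max_level)
    bx, by, bsz = leaf_square(b, max_level)
    overlap_x = min(ax + asz, bx + bsz) - max(ax, bx)
    overlap_y = min(ay + asz, by + bsz) - max(ay, by)
    touch_x = ax + asz == bx or bx + bsz == ax
    touch_y = ay + asz == by or by + bsz == ay
    return (touch_x and overlap_y > 0) or (touch_y and overlap_x > 0)

def is_2to1_balanced(leaves):
    """True if face-adjacent leaves differ by at most one refinement level."""
    max_level = max(level for level, _, _ in leaves)
    for k, a in enumerate(leaves):
        for b in leaves[k + 1:]:
            if share_face(a, b, max_level) and abs(a[0] - b[0]) > 1:
                return False
    return True

coarse = [(1, 1, 0), (1, 0, 1), (1, 1, 1)]
fine = [(2, 0, 0), (2, 1, 0), (2, 0, 1), (2, 1, 1)]
print(is_2to1_balanced(coarse + fine))  # True
```

    Refining the leaf (2, 1, 0) once more would place level-3 leaves directly next to the level-1 leaf (1, 1, 0), which the check reports as unbalanced.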

    What I’m doing or try to do... Sierpi(nski) & more

    Cluster-based parallelization strategy for dynamically adaptive grids using the Sierpinski space-filling curve. Content: cluster-based parallel software concept, refinement and coarsening in each time step, cluster-based optimization strategies, run-length encoded communication methods, possible extensions to SFC cuts, first results on tsunami simulations on parallel dynamically adaptive grids, etc.

    Related work:
    * Jörn Behrens and Jens Zimmermann, Parallelizing an unstructured grid generator with a space-filling curve approach, in Euro-Par 2000 Parallel Processing
    * Frank Günther, Miriam Mehl, Markus Pögl, and Christoph Zenger, A cache-aware algorithm for PDEs on hierarchical data structures based on space-filling curves, SIAM Journal on Scientific Computing, 28 (2006)
    * Michael Bader, Kaveh Rahnema, and Csaba Attila Vigh, Memory-efficient Sierpinski-order traversals on dynamically adaptive, recursively structured triangular grids, in Applied Parallel and Scientific Computing
    * Hans Sagan, Space-filling curves, vol. 18, Springer-Verlag New York, 1994
    * David L. George, Augmented Riemann solvers for the shallow water equations over variable topography with steady states and inundation, Journal of Computational Physics
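    The Sierpinski order underlying these talks can be generated by recursive bisection of right isosceles triangles. The sketch below assumes a vertex convention chosen for this illustration: a triangle is stored as (a, b, c), with the hypotenuse from a to b (the curve enters at a and leaves at b) and c the right-angle vertex. It only produces the traversal order; it does not model the stack-based data access of the Sierpi framework, and the function name is hypothetical.

```python
def sierpinski_order(a, b, c, depth):
    """Leaf triangles of a right isosceles triangle in Sierpinski order.

    (a, b) is the hypotenuse and c the right-angle vertex; the curve enters
    the triangle at a and leaves it at b.
    """
    if depth == 0:
        return [(a, b, c)]
    m = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)  # midpoint of the hypotenuse
    # Both children have their right angle at m; the curve runs a -> c -> b.
    return (sierpinski_order(a, c, m, depth - 1)
            + sierpinski_order(c, b, m, depth - 1))

leaves = sierpinski_order((0, 0), (2, 0), (1, 1), 5)
print(len(leaves))  # 32
```

    Consecutive leaves in this order share an edge, so cutting the list into contiguous pieces yields partitions whose cells are connected in sequence.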

    Space-Filling Curves Continued: Sierpinski

    A new massive-splitting parallelization concept using Sierpinski space-filling curves with dynamically adaptive mesh refinement is presented. We developed a new parallel software concept (independent storage of partitions) for shared-memory systems that uses run-length encoded communication and offers several further advantages.

    Related work:
    * Jörn Behrens and Jens Zimmermann, Parallelizing an unstructured grid generator with a space-filling curve approach, in Euro-Par 2000 Parallel Processing
    * Frank Günther, Miriam Mehl, Markus Pögl, and Christoph Zenger, A cache-aware algorithm for PDEs on hierarchical data structures based on space-filling curves, SIAM Journal on Scientific Computing, 28 (2006)
    * Michael Bader, Kaveh Rahnema, and Csaba Attila Vigh, Memory-efficient Sierpinski-order traversals on dynamically adaptive, recursively structured triangular grids, in Applied Parallel and Scientific Computing
    * Hans Sagan, Space-filling curves, vol. 18, Springer-Verlag New York, 1994
    * David L. George, Augmented Riemann solvers for the shallow water equations over variable topography with steady states and inundation, Journal of Computational Physics
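    The idea of independently stored partitions can be illustrated by splitting a contiguous SFC range into clusters. In this sketch the splitting rule (halve any cluster larger than a hypothetical max_cells threshold) is an assumption chosen for simplicity, not the rule used in the talk; each resulting cluster receives its own copy of its cells, so clusters can later be processed independently, for example by different threads.

```python
def split_clusters(cells, max_cells):
    """Recursively halve a contiguous SFC range until every cluster holds
    at most max_cells cells; each cluster owns a separate list (copy)."""
    if max_cells < 1:
        raise ValueError("max_cells must be at least 1")
    if len(cells) <= max_cells:
        return [list(cells)]
    mid = len(cells) // 2
    return split_clusters(cells[:mid], max_cells) + split_clusters(cells[mid:], max_cells)

clusters = split_clusters(list(range(10)), 3)
print(clusters)  # [[0, 1], [2, 3, 4], [5, 6], [7, 8, 9]]
```

    Because the split points stay on the SFC order, concatenating the clusters reproduces the original cell sequence.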

    Sierpi Framework

    In this talk, we present algorithms for efficient parallel shallow-water simulations on dynamically adaptive grids. We allow multiple partitions per memory context with all data stored independently and use a new run-length encoded concept for efficient data exchange. First of all, this allows OpenMP/TBB parallelization as well as optimizations such as reordering and skipping. We also discuss the MPI data migration strategy within this new parallelization concept. The talk was given to Prof. Michael Bader and other members of his group (including Oliver Meister and Kaveh Rahnema) during a visit of guest researchers from Hamburg.

    Related work:
    * Jörn Behrens and Jens Zimmermann, Parallelizing an unstructured grid generator with a space-filling curve approach, in Euro-Par 2000 Parallel Processing
    * Frank Günther, Miriam Mehl, Markus Pögl, and Christoph Zenger, A cache-aware algorithm for PDEs on hierarchical data structures based on space-filling curves, SIAM Journal on Scientific Computing, 28 (2006)
    * Michael Bader, Kaveh Rahnema, and Csaba Attila Vigh, Memory-efficient Sierpinski-order traversals on dynamically adaptive, recursively structured triangular grids, in Applied Parallel and Scientific Computing
    * Hans Sagan, Space-filling curves, vol. 18, Springer-Verlag New York, 1994
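    MPI data migration after repartitioning has to decide which cells move where once the SFC order is re-split. A minimal sketch, assuming each rank owns a single contiguous range of cells in SFC order described by an offset array (the talk allows several partitions per memory context; the names here are hypothetical and no MPI calls are made): intersecting the old and new ranges yields the blocks to send. Because the ranges are contiguous along the curve, every transfer is a single contiguous block.

```python
def balanced_offsets(num_cells, num_ranks):
    """Start offset of each rank's contiguous SFC range, plus the end offset."""
    return [r * num_cells // num_ranks for r in range(num_ranks + 1)]

def migration_plan(old, new):
    """List (src, dst, begin, end) for every cell range whose owner changes.
    old and new are offset arrays of equal length (num_ranks + 1)."""
    plan = []
    for src in range(len(old) - 1):
        for dst in range(len(new) - 1):
            begin = max(old[src], new[dst])
            end = min(old[src + 1], new[dst + 1])
            if src != dst and begin < end:
                plan.append((src, dst, begin, end))
    return plan

old = [0, 2, 8, 10]           # rank 1 owns cells 2..7 and is overloaded
new = balanced_offsets(10, 3)  # [0, 3, 6, 10]
print(migration_plan(old, new))  # [(1, 0, 2, 3), (1, 2, 6, 8)]
```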