
    One machine, one minute, three billion tetrahedra

    This paper presents a new scalable parallelization scheme to generate the 3D Delaunay triangulation of a given set of points. Our first contribution is an efficient serial implementation of the incremental Delaunay insertion algorithm. A simple dedicated data structure, an efficient sorting of the points and an optimized insertion algorithm made it possible to accelerate reference implementations by a factor of three. Our second contribution is a multi-threaded version of the Delaunay kernel that can insert vertices concurrently. Moore curve coordinates are used to partition the point set, avoiding heavy synchronization overheads. Conflicts are managed by modifying the partitions with a simple rescaling of the space-filling curve. The performance of our implementation was measured on three different processors, an Intel Core i7, an Intel Xeon Phi and an AMD EPYC, on which we were able to compute 3 billion tetrahedra in 53 seconds. This corresponds to a generation rate of over 55 million tetrahedra per second. We finally show how this very efficient parallel Delaunay triangulation can be integrated into a Delaunay refinement mesh generator that takes as input the triangulated surface boundary of the volume to mesh.
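
    The idea of partitioning points along a space-filling curve can be illustrated with a small sketch. It assumes a 2D Hilbert curve as a stand-in for the paper's 3D Moore curve (the Moore curve is a closed-loop variant of the Hilbert curve), points on an integer grid, and the hypothetical helpers `hilbert_index` and `partition_points`. It only shows how sorting by curve index and cutting the sorted order into contiguous chunks yields spatially compact partitions; it does not model the authors' conflict handling or curve rescaling.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def hilbert_index(n: int, x: int, y: int) -> int:
    """Distance of cell (x, y) along a 2D Hilbert curve filling an n x n grid.

    n must be a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect the quadrant so the sub-curve has the canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def partition_points(points: List[Point], n: int, parts: int) -> List[List[Point]]:
    """Sort points along the curve and cut that order into `parts` contiguous chunks."""
    ordered = sorted(points, key=lambda p: hilbert_index(n, p[0], p[1]))
    size, extra = divmod(len(ordered), parts)
    chunks: List[List[Point]] = []
    start = 0
    for k in range(parts):
        # The first `extra` chunks take one extra point so sizes differ by at most one.
        end = start + size + (1 if k < extra else 0)
        chunks.append(ordered[start:end])
        start = end
    return chunks
```

    On a full 4 x 4 grid, `partition_points(points, 4, 4)` returns the four 2 x 2 quadrants, because the Hilbert curve visits each quadrant completely before moving to the next. Consecutive points in a chunk are grid neighbours, which is what keeps each thread's work spatially local.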

    Single-Strip Triangulation of Manifolds with Arbitrary Topology

    Triangle strips have been widely used for efficient rendering. Testing whether a given triangulated model can be represented as a single triangle strip is NP-complete, so many heuristics have been proposed to partition models into a few long strips. In this paper, we present a new algorithm for creating a single triangle loop or strip from a triangulated model. Our method applies a dual graph matching algorithm to partition the mesh into cycles, and then merges pairs of cycles by splitting adjacent triangles when necessary. New vertices are introduced at midpoints of edges, and the new triangles thus formed are coplanar with their parent triangles, so the visual fidelity of the geometry is not changed. We prove that the increase in the number of triangles due to this splitting is 50% in the worst case; however, for all models we tested, the increase was less than 2%. We also prove tight bounds on the number of triangles needed for a single-strip representation of a model with holes on its boundary. Our strips can be used not only for efficient rendering but also for other applications, including the generation of space-filling curves on a manifold of arbitrary topology.
    Comment: 12 pages, 10 figures. To appear at Eurographics 200
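
    As background on what a single strip encodes: a strip of k + 2 vertex indices describes k triangles, and each triangle shares an edge with the previous one, so consecutive triangles are adjacent in the mesh's dual graph. The sketch below, using the hypothetical helpers `strip_to_triangles` and `shares_edge`, decodes a strip with the common convention of swapping the first two vertices of every odd triangle to keep a consistent winding. It does not implement the paper's matching-and-merging algorithm.

```python
from typing import List, Tuple

Triangle = Tuple[int, int, int]


def strip_to_triangles(strip: List[int]) -> List[Triangle]:
    """Expand a triangle strip (list of vertex indices) into individual triangles.

    Triangle i uses vertices i, i+1, i+2; odd triangles swap their first two
    vertices so that all triangles keep the same orientation.
    """
    triangles: List[Triangle] = []
    for i in range(len(strip) - 2):
        a, b, c = strip[i], strip[i + 1], strip[i + 2]
        triangles.append((a, b, c) if i % 2 == 0 else (b, a, c))
    return triangles


def shares_edge(t1: Triangle, t2: Triangle) -> bool:
    """True if two triangles have exactly two vertices (one edge) in common."""
    return len(set(t1) & set(t2)) == 2
```

    For example, `strip_to_triangles([0, 1, 2, 3, 4])` yields `[(0, 1, 2), (2, 1, 3), (2, 3, 4)]`: five indices describe three triangles that would otherwise need nine.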

    An Adaptive Mesh MPI Framework for Iterative C++ Programs

    Computational Science and Engineering (CSE) applications often follow the pattern of adaptive mesh applications. An adaptive mesh algorithm starts with a coarse base-level grid covering the entire computational domain. As the computation progresses, individual grid points are tagged for refinement, and the tagged regions are dynamically overlaid with finer grids. Similarly, if the level of refinement in a region is greater than required, that region is replaced with a coarser grid. These refinements proceed recursively. We have developed an object-oriented framework that enables developers of time-stepped adaptive mesh applications to convert their sequential applications to MPI applications in a few easy steps. In this thesis we present our positive experience converting such an application using the framework. In addition to MPI support, the framework handles grid expansion/contraction and load balancing, making the application developer's life easier.
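
    The tag-and-refine cycle described above can be sketched in one dimension. The sketch assumes a hypothetical tagging criterion (a cell is tagged when the sampled solution `f` jumps by more than `tol` across it) and hypothetical helpers `refine` and `adapt_grid`. It shows only recursive refinement from a coarse base grid; coarsening, multiple dimensions and the framework's MPI distribution are not modelled.

```python
from typing import Callable, List, Tuple

Cell = Tuple[float, float, int]  # (left edge, right edge, refinement level)


def refine(f: Callable[[float], float], left: float, right: float,
           tol: float, max_level: int, level: int = 0) -> List[Cell]:
    """Recursively split [left, right] wherever f jumps by more than tol across a cell."""
    # Tag the cell for refinement if the solution varies too much across it.
    tagged = abs(f(right) - f(left)) > tol
    if not tagged or level == max_level:
        return [(left, right, level)]
    mid = 0.5 * (left + right)
    # Overlay the tagged cell with two finer cells and recurse into each.
    return (refine(f, left, mid, tol, max_level, level + 1)
            + refine(f, mid, right, tol, max_level, level + 1))


def adapt_grid(f: Callable[[float], float], base_cells: int,
               tol: float, max_level: int) -> List[Cell]:
    """Start from a uniform coarse grid on [0, 1] and refine each base cell as needed."""
    cells: List[Cell] = []
    for i in range(base_cells):
        cells.extend(refine(f, i / base_cells, (i + 1) / base_cells, tol, max_level))
    return cells
```

    For a step function that jumps at x = 0.3, `adapt_grid(step, 4, 0.5, 3)` keeps the smooth base cells at level 0 and concentrates level-3 cells around the jump, so resolution is spent only where the solution needs it.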

    Performance Evaluation and Optimization of the Unstructured CFD Code UNCLE

    Numerous advances in the computational sciences have made CFD a viable approach to modern-day fluid dynamics problems. Progress in computer performance allows complex flow fields to be solved in practical CPU time. Commodity clusters are also gaining popularity as computational research platforms in various CFD communities. This research focuses on evaluating and enhancing the performance of an in-house, unstructured, 3D CFD code on modern commodity clusters. The fundamental idea is to tune the code to optimize the cache behavior of the cluster nodes and thereby improve code performance. Accordingly, this work discusses various available techniques for data access optimization and describes in detail those that yielded improved code performance. These techniques were tested on various steady, unsteady, laminar and turbulent test cases, and the results are presented. The critical hardware parameters that influenced code performance were identified, and a detailed study of their effect on code performance was conducted; its results are presented. The successful single-node improvements were also efficiently tested on a parallel platform. The modified version of the code was also ported to different hardware architectures with successful results. Loop blocking is established as a predictor of code performance.
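
    Loop blocking (tiling) restructures a loop nest so that it works on small tiles of data that fit in cache, instead of striding across whole rows or columns. The sketch below shows the transformation on a matrix transpose. It is written in Python only to show the loop structure (the interpreter will not exhibit the cache benefit a compiled code would), and `transpose_blocked` and its `block` parameter are illustrative, not taken from UNCLE.

```python
from typing import List

Matrix = List[List[float]]


def transpose_naive(a: Matrix) -> Matrix:
    """Reference transpose: walks down whole columns of `a`."""
    rows, cols = len(a), len(a[0])
    return [[a[i][j] for i in range(rows)] for j in range(cols)]


def transpose_blocked(a: Matrix, block: int = 32) -> Matrix:
    """Transpose a matrix tile by tile.

    The two outer loops step over block x block tiles; the two inner loops
    touch only one tile, so in a compiled code both the source and the
    destination tile can stay resident in cache.
    """
    rows, cols = len(a), len(a[0])
    out = [[0.0] * rows for _ in range(cols)]
    for ii in range(0, rows, block):
        for jj in range(0, cols, block):
            # min() handles ragged tiles at the matrix edges.
            for i in range(ii, min(ii + block, rows)):
                for j in range(jj, min(jj + block, cols)):
                    out[j][i] = a[i][j]
    return out
```

    The blocked version computes exactly the same result as the naive one; only the traversal order changes, which is why the tile size can be tuned to the cache parameters of each node without affecting correctness.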

    Parallel Eulerian-Lagrangian Method with Adaptive Mesh Refinement for Moving Boundary Computation

    Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/106477/1/AIAA2013-370.pd

    Evaluation of an efficient stack-RLE clustering concept for dynamically adaptive grids

    This is the author accepted manuscript. The final version is available from the Society for Industrial and Applied Mathematics via the DOI in this record.
    Abstract. One approach to tackling the challenge of efficient implementations of parallel PDE simulations on dynamically changing grids is the use of space-filling curves (SFC). While SFC algorithms possess advantageous properties such as low memory requirements and close-to-optimal partitioning approaches with linear complexity, they require efficient communication strategies for maintaining and using the connectivity information, in particular for dynamically changing grids. Our approach is to use a sparse communication graph to store the connectivity information and to transfer data block-wise. This permits efficient generation of multiple partitions per memory context (denoted by clustering), which, in combination with run-length encoding (RLE), directly leads to elegant solutions for shared, distributed and hybrid parallelization and allows cluster-based optimizations. While previous work focused on specific aspects, in this paper we present a compact overall summary of the stack-RLE clustering approach, complemented by aspects of the vertex-based communication that make the approach easier to understand.
The central contribution of this work is demonstrating the suitability of the stack-RLE clustering approach for efficiently realizing different, relevant building blocks of Scientific Computing methodology and real-life CSE applications. We show 95% strong scalability for small-scale scalability benchmarks on 512 cores and weak scalability of over 90% on 8192 cores for finite-volume solvers with the grid structure changing in every time step; optimizations of simulation data backends by writer tasks; comparisons of analytical benchmarks to analyze the adaptivity criteria; and a tsunami simulation as a representative real-world showcase of wave propagation. In this showcase, our approach reduces the overall workload by 95% for parallel fully-adaptive mesh refinement and, compared with SFC-ordered regular grid cells, reduces the computation time by a factor of 7.6 with improved results and by a factor of 62.2 with results of similar accuracy, as measured against buoy station data.
This work was partly supported by the German Research Foundation (DFG) as part of the Transregional Collaborative Research Centre “Invasive Computing” (SFB/TR 89).
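
    Run-length encoding is what keeps this connectivity information compact: along a space-filling-curve traversal, consecutive boundary edges tend to belong to the same neighbouring partition, so the sequence of neighbour IDs compresses into a few (neighbour, count) runs, and each run maps to one contiguous block of data to transfer. The sketch below shows only that encoding and block-splitting step, with the hypothetical helpers `rle_encode`, `rle_decode` and `split_blocks`; it is not the paper's stack-based data structure.

```python
from typing import List, Tuple

Run = Tuple[int, int]  # (neighbour partition ID, number of consecutive shared edges)


def rle_encode(neighbors: List[int]) -> List[Run]:
    """Compress a sequence of neighbour IDs into (neighbour, run length) pairs."""
    runs: List[Run] = []
    for nb in neighbors:
        if runs and runs[-1][0] == nb:
            runs[-1] = (nb, runs[-1][1] + 1)
        else:
            runs.append((nb, 1))
    return runs


def rle_decode(runs: List[Run]) -> List[int]:
    """Expand (neighbour, run length) pairs back into the full sequence."""
    return [nb for nb, count in runs for _ in range(count)]


def split_blocks(values: List[float], runs: List[Run]) -> List[Tuple[int, List[float]]]:
    """Cut per-edge data into one contiguous block per run, ready for block-wise transfer."""
    blocks: List[Tuple[int, List[float]]] = []
    start = 0
    for nb, count in runs:
        blocks.append((nb, values[start:start + count]))
        start += count
    return blocks
```

    For the edge sequence `[2, 2, 2, 5, 5, 2, 7, 7, 7, 7]`, `rle_encode` produces four runs, so ten per-edge entries are described by four pairs and exchanged as four blocks rather than ten individual messages.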

    Automatic parallel octree grid generation software with an extensible solver framework and a focus on urban simulation

    The development of an automatic, dynamic, parallel, Cartesian, linear forest-of-octrees grid generator and partial differential equation (PDE) solver framework is presented. This research is bundled into an application written in C++ that uses MPI for distributed parallelism. The application is named paros, which stands for PARallel Octree Solver. In its current implementation, the application provides a 'zeroth' order representation of the target geometry; as such, no cut-cell algorithm, projection method or immersed boundary condition is implemented. In this case, 'zeroth' order means that no geometry is ever exactly represented in the final computational mesh: an octree element is either completely inside the domain or entirely outside of it. Any element that contains or is intersected by a geometry facet is removed from the final mesh, which results in a 'blocky' or 'stepped' geometry representation and simplifies boundary computations. The computational octree mesh creation is completely parallel and automated. The algorithm is dynamic in the sense that the mesh is repartitioned throughout the grid generation process to maintain optimal load balancing during all phases of mesh genesis. A linear octree data structure is used to store the octree mesh elements and is leveraged for optimal load balancing. An additional hierarchical octree is used to significantly improve the performance of algorithms that suffer under this linear storage paradigm. This work focuses on, but is not limited to, applications related to urban simulation, and may be applied to plume/contaminant propagation. Within the PDE solution framework, a cell-centered, incompressible, unsteady Navier-Stokes solver with an energy term to account for thermal buoyancy is implemented and validated using canonical test cases. Turbulence closure is implemented in the form of the Smagorinsky large eddy simulation (LES) model.
The parallel grid generation and solution process is tested on a large-scale cityscape geometry and shown to be robust and efficient. Additionally, an implementation of the compressible Navier-Stokes equations is included within the framework. The framework is extensible, so adding other types of numerical PDE solvers should not be difficult. Other features, including adaptive mesh refinement (AMR) and contaminant transport functionality, are also included.
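
    A linear octree stores only leaf cells, each identified by a key formed by interleaving the bits of its integer coordinates (a Morton, or Z-order, key). Sorting by key lays the cells out along a space-filling curve, and cutting the sorted array into equal pieces gives a load-balanced partition. The sketch below assumes leaves at a single uniform level, an axis-aligned box as a stand-in for the geometry facets, and the hypothetical helpers `morton3`, `overlaps_obstacle` and `build_linear_octree`; the abstract does not describe paros's actual key layout. It also shows the 'zeroth' order rule: every cell the obstacle overlaps is simply dropped.

```python
from typing import List, Tuple

Cell = Tuple[int, int, int]
Box = Tuple[float, float, float]


def morton3(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of x, y, z into a 3D Morton (Z-order) key."""
    key = 0
    for b in range(bits):
        key |= ((x >> b) & 1) << (3 * b)
        key |= ((y >> b) & 1) << (3 * b + 1)
        key |= ((z >> b) & 1) << (3 * b + 2)
    return key


def overlaps_obstacle(cell: Cell, lo: Box, hi: Box) -> bool:
    """True if unit cell [c, c+1]^3 overlaps the box [lo, hi] with positive volume."""
    return all(c < high and c + 1 > low for c, low, high in zip(cell, lo, hi))


def build_linear_octree(n: int, lo: Box, hi: Box, ranks: int) -> List[List[Cell]]:
    """Drop cells hit by the obstacle, sort the rest by Morton key, split across ranks."""
    # 'Zeroth' order geometry: a cell is either kept whole or removed entirely.
    cells = [(x, y, z) for x in range(n) for y in range(n) for z in range(n)
             if not overlaps_obstacle((x, y, z), lo, hi)]
    # The sorted array of leaves is the linear octree.
    cells.sort(key=lambda c: morton3(*c))
    size, extra = divmod(len(cells), ranks)
    parts: List[List[Cell]] = []
    start = 0
    for r in range(ranks):
        end = start + size + (1 if r < extra else 0)
        parts.append(cells[start:end])
        start = end
    return parts
```

    Because the Morton order is fixed by the coordinates alone, repartitioning after cells are added or removed amounts to re-sorting and re-cutting one array, which fits the dynamic load balancing described above.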

    HPCCP/CAS Workshop Proceedings 1998

    This publication is a collection of extended abstracts of presentations given at the HPCCP/CAS (High Performance Computing and Communications Program/Computational Aerosciences Project) Workshop held on August 24-26, 1998, at NASA Ames Research Center, Moffett Field, California. The objective of the Workshop was to bring together the aerospace high performance computing community, consisting of airframe and propulsion companies, independent software vendors, university researchers, and government scientists and engineers. The Workshop was sponsored by the HPCCP Office at NASA Ames Research Center. The Workshop consisted of over 40 presentations, including an overview of NASA's High Performance Computing and Communications Program and the Computational Aerosciences Project; ten sessions of papers representative of the high performance computing research conducted within the Program by the aerospace industry, academia, NASA, and other government laboratories; two panel sessions; and a special presentation by Mr. James Bailey.

    A fault tolerant grid generation technique

    Automatic and parallel mesh generation has been highlighted as a bottleneck for large-scale automated Computational Fluid Dynamics (CFD) analysis. The desire for large-scale automated CFD is driven by the growing computational capabilities of large supercomputers. Unfortunately, as compute clusters grow in size, they also suffer more failures. Left unchecked, the increased frequency of failures may stymie any efforts to fully utilize these machines. This work aims to tackle one component required for automated large-scale engineering analysis by developing a fault-tolerant mesh generator. The mesh generator uses a novel communication layer written using the ZeroMQ transport layer and is made fault tolerant through an integrated in-memory checkpoint and recovery strategy. The benefits of in-memory checkpoints versus traditional on-disk checkpoints are discussed. Relying on in-memory checkpointing, the mesh generator is demonstrated to generate Cartesian meshes in parallel and to continue operating even while the compute cluster it is running on suffers failures. The generator is also shown to be high-performing: it can generate an 8.6 billion element mesh in just over 1 minute while creating multiple in-memory checkpoints.
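
    The in-memory checkpointing idea can be illustrated with a common "buddy" scheme, assumed here for illustration because the abstract does not say where the copies are placed: each process keeps a copy of its state in the memory of a partner process, so a single failed process can be restored from RAM without touching disk. The sketch simulates processes as plain Python objects and uses the hypothetical names `Rank`, `checkpoint`, `fail` and `recover`. It does not use ZeroMQ and does not model real node failures.

```python
import copy
from typing import Dict, List

State = Dict[str, List[int]]


class Rank:
    """One simulated process: its own mesh state plus the checkpoint it holds for a buddy."""

    def __init__(self, rank_id: int, state: State):
        self.rank_id = rank_id
        self.state = state
        self.buddy_copy: State = {}
        self.alive = True


def checkpoint(ranks: List[Rank]) -> None:
    """Rank r stores a deep copy of its state in the memory of rank (r + 1) % P."""
    p = len(ranks)
    for r in ranks:
        ranks[(r.rank_id + 1) % p].buddy_copy = copy.deepcopy(r.state)


def fail(ranks: List[Rank], rank_id: int) -> None:
    """Simulate a crash: the rank loses its own state and the copy it was holding."""
    ranks[rank_id].alive = False
    ranks[rank_id].state = {}
    ranks[rank_id].buddy_copy = {}


def recover(ranks: List[Rank], failed: int) -> None:
    """Restore a failed rank's state from the copy held by its buddy."""
    buddy = ranks[(failed + 1) % len(ranks)]
    if not buddy.alive:
        raise RuntimeError("rank and its buddy both failed; in-memory copy lost")
    ranks[failed].state = copy.deepcopy(buddy.buddy_copy)
    ranks[failed].alive = True
```

    Work done after the last checkpoint is lost on failure and must be recomputed, and a new checkpoint should be taken after recovery because the restored rank no longer holds its predecessor's copy. The scheme survives any single failure, but not the simultaneous loss of a rank and its buddy.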