777 research outputs found

    SPH-EXA: Enhancing the Scalability of SPH codes Via an Exascale-Ready SPH Mini-App

    Numerical simulations of fluids in astrophysics and computational fluid dynamics (CFD) are among the most computationally demanding calculations, in terms of sustained floating-point operations per second, or FLOP/s. It is expected that these numerical simulations will significantly benefit from the future Exascale computing infrastructures, which will perform 10^18 FLOP/s. The performance of SPH codes is, in general, adversely impacted by several factors, such as multiple time-stepping, long-range interactions, and/or boundary conditions. In this work an extensive study of three SPH implementations (SPHYNX, ChaNGa, and XXX) is performed, to gain insights and to expose any limitations and characteristics of the codes. These codes are the starting point of an interdisciplinary co-design project, SPH-EXA, for the development of an Exascale-ready SPH mini-app. We implemented a rotating square patch as a joint test simulation for the three SPH codes and analyzed their performance on a modern HPC system, Piz Daint. The performance profiling and scalability analysis conducted on the three parent codes exposed their performance issues, such as load imbalance, both in MPI and OpenMP. Two-level load balancing has been successfully applied to SPHYNX to overcome its load imbalance. The performance analysis shapes and drives the design of the SPH-EXA mini-app towards the use of efficient parallelization methods, fault-tolerance mechanisms, and load balancing approaches. Comment: arXiv admin note: substantial text overlap with arXiv:1809.0801

    Simulation of a flowing snow avalanche using molecular dynamics

    This paper presents an approach for the modeling and simulation of a flowing snow avalanche, which is formed of dry and liquefied snow that slides down a slope, using molecular dynamics and the discrete element method. A particle system is used as the base method for the simulation, and marching cubes with real-time shaders are employed for rendering. A uniform grid-based neighbor search algorithm is used for collision detection in interparticle and particle-terrain interactions. A mass-spring model of collision resolution is employed to mimic the compressibility of the snow, and particle attraction forces are applied between the particles and the terrain surface. To achieve greater performance, a general-purpose GPU language and multithreaded programming are used for collision detection and resolution. The results are displayed with different combinations of rendering methods for a realistic representation of the flowing avalanche. © TÜBİTAK
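The uniform grid-based neighbor search named in this abstract can be sketched as follows. This is a minimal CPU illustration in plain Python, not the paper's GPU implementation; the function names `build_grid` and `neighbor_pairs` and the choice of cell size equal to the interaction radius are assumptions made for the example.

```python
from collections import defaultdict
from itertools import product
from math import dist, floor


def build_grid(points, cell_size):
    """Bin point indices into cubic cells of edge length cell_size."""
    grid = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(floor(c / cell_size) for c in p)
        grid[key].append(idx)
    return grid


def neighbor_pairs(points, radius):
    """Return sorted index pairs (i, j), i < j, with distance < radius.

    Assumption: the cell size equals the radius, so every neighbor of a
    point lies in its own cell or in one of the adjacent cells.
    """
    if not points:
        return []
    grid = build_grid(points, radius)
    offsets = list(product((-1, 0, 1), repeat=len(points[0])))
    pairs = set()
    for key, members in grid.items():
        for off in offsets:
            other = tuple(k + o for k, o in zip(key, off))
            # .get avoids creating empty cells while iterating the grid
            for j in grid.get(other, ()):
                for i in members:
                    if i < j and dist(points[i], points[j]) < radius:
                        pairs.add((i, j))
    return sorted(pairs)
```

Only the 27 cells around each particle (in 3D) are inspected, so the cost grows roughly linearly with the particle count for evenly spread particles, instead of quadratically as in an all-pairs test.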

    Parallel cloth simulation using OpenMP and CUDA

    The widespread availability of parallel computing architectures has led to research regarding algorithms and techniques that best exploit available parallelism. In addition to the CPU parallelism available, the GPU has emerged as a parallel computational device. The goal of this study was to explore the combined use of CPU and GPU parallelism by developing a hybrid parallel CPU/GPU cloth simulation application. In order to evaluate the benefits of the hybrid approach, the application was first developed in sequential CPU form, followed by a parallel CPU form. The application uses Backward Euler implicit time integration to solve the differential equations of motion associated with the physical system. The Conjugate Gradient (CG) algorithm is used to determine the solution vector for the system of equations formed by the Backward Euler approach. The matrix/vector, vector/vector, and vector/scalar operations required by CG are handled by calls to BLAS level 1 and level 2 functions. In the sequential CPU and parallel CPU versions, the Intel Math Kernel Library implementation of BLAS is used. In the hybrid parallel CPU/GPU version, the Nvidia CUDA-based BLAS implementation (CUBLAS) is used. In the parallel CPU and hybrid implementations, OpenMP directives are used to parallelize the force application loop that traverses the list of forces acting on the system. Runtimes were collected for each version of the application while simulating cloth meshes with particle resolutions of 20x20, 40x40, and 60x60. The performance of each version was compared at each mesh resolution. The level of performance degradation experienced when transitioning to the larger mesh sizes was also determined. The hybrid parallel CPU/GPU implementation yielded the highest frame rate for the 40x40 and 60x60 meshes. The parallel CPU implementation yielded the highest frame rate for the 20x20 mesh. The performance of the hybrid parallel CPU/GPU implementation degraded the least as it transitioned to the two larger mesh sizes. The results of this study will potentially lead to further research regarding the use of GPUs to perform the matrix/vector operations associated with the CG algorithm under more complex cloth simulation scenarios.
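The Conjugate Gradient solve described in this abstract can be sketched as follows. This is a textbook CG for a symmetric positive-definite system in plain Python, not the study's BLAS/CUBLAS-based code; the helpers `dot` and `matvec` stand in for the level 1 and level 2 BLAS calls and, like the function name `conjugate_gradient`, are assumptions made for the example.

```python
def dot(u, v):
    """Vector/vector product (the role of a BLAS level 1 dot call)."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    """Dense matrix/vector product (the role of a BLAS level 2 gemv call)."""
    return [dot(row, x) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite matrix A."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x for the initial guess x = 0
    p = list(r)              # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)          # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old               # keeps directions A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

Each iteration needs one matrix/vector product plus a few vector/vector and vector/scalar updates, which is why the abstract's choice of BLAS backend largely determines the solver's runtime.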

    Parallel packing code for propellant microstructure analysis

    In recent years, packing codes have become a successful alternative to experimental data collection for microstructure investigation of heterogeneous materials. Composite solid rocket propellants are interesting representatives of this category, consisting of a mix of fuel and oxidizer powders embedded in a polymeric binder. Their macroscopic properties depend strictly on their particular microstructure, which influences mechanical, combustion, and physical features. This work addresses algorithm development, validation, and scalability of POLIPack, a parallel packing code based on the Lubachevsky–Stillinger algorithm, developed at the Space Propulsion Laboratory (SPLab) of Politecnico di Milano. The application can reproduce the organization of spheres of any diameter inside a cube with periodic boundaries. In addition to the general code description, the paper identifies a collision condition not addressed by the original Lubachevsky algorithm (here called back impact), introduces a novel post-impact handling that guarantees a minimum separation velocity between particles, and presents a parallelization approach based on the OpenMP shared-memory paradigm. Monomodal and bimodal packs have been compared to experimental data through statistical descriptors and packing maps.
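In a Lubachevsky–Stillinger style packing, spheres move ballistically while their radii grow at a constant rate, and the simulation advances from one predicted collision to the next. The core prediction step can be sketched as follows. This is a generic illustration, not POLIPack's code: it ignores periodic boundaries and the back-impact case, and the function name `collision_time` and its parameters are assumptions made for the example.

```python
from math import sqrt


def collision_time(dr, dv, radius_sum, growth_sum, eps=1e-14):
    """Time until two growing spheres first touch, or None if they never do.

    dr, dv      relative position and velocity (sphere 2 minus sphere 1)
    radius_sum  sum of the two current radii (spheres assumed not overlapping)
    growth_sum  sum of the two radius growth rates (>= 0)

    Contact satisfies |dr + dv t| = radius_sum + growth_sum * t, which
    squares to the quadratic a t^2 + 2 b t + c = 0 solved below.
    """
    a = sum(v * v for v in dv) - growth_sum ** 2
    b = sum(r * v for r, v in zip(dr, dv)) - radius_sum * growth_sum
    c = sum(r * r for r in dr) - radius_sum ** 2
    if abs(a) < eps:                      # degenerate case: linear equation
        return -c / (2.0 * b) if b < 0 else None
    disc = b * b - a * c
    if disc < 0 or (a > 0 and b >= 0):    # paths miss, or the gap never closes
        return None
    return (-b - sqrt(disc)) / a          # earliest non-negative root
```

When the growth rate exceeds the relative speed (`a < 0`), a contact always exists, which is why growing spheres eventually jam even if they start at rest.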

    Simple and Robust Boolean Operations for Triangulated Surfaces

    Boolean operations on geometric models are an essential topic in computational geometry. In this paper, we develop a simple and robust approach to perform Boolean operations on closed and open triangulated surfaces. Our method has two main stages: (1) we first find candidate pairs of intersecting triangles using an octree and then compute the intersection lines for all pairs of triangles with a parallel algorithm; (2) we form closed or open intersection loops, sub-surfaces, and sub-blocks robustly, relying only on the cleaned and updated mesh topology and without coordinate computations for geometric entities. A novel technique that replaces inside/outside classification is also proposed to distinguish the resulting union, subtraction, and intersection. Several examples are given to illustrate the effectiveness of our approach. Comment: Novel method for determining Union, Subtraction and Intersectio