SPH-EXA: Enhancing the Scalability of SPH codes Via an Exascale-Ready SPH Mini-App
Numerical simulations of fluids in astrophysics and computational fluid
dynamics (CFD) are among the most computationally-demanding calculations, in
terms of sustained floating-point operations per second, or FLOP/s. It is
expected that these numerical simulations will significantly benefit from the
future Exascale computing infrastructures, which will perform 10^18 FLOP/s. The
performance of the SPH codes is, in general, adversely impacted by several
factors, such as multiple time-stepping, long-range interactions, and/or
boundary conditions. In this work, an extensive study of three SPH
implementations (SPHYNX, ChaNGa, and XXX) is performed to gain insight into the
codes and to expose their limitations and characteristics. These codes are the
starting point of an interdisciplinary co-design project, SPH-EXA, for the
development of an Exascale-ready SPH mini-app. We implemented a rotating square
patch as a joint test simulation for the three SPH codes and analyzed their
performance on a modern HPC system, Piz Daint. The performance profiling and
scalability analysis conducted on the three parent codes exposed
their performance issues, such as load imbalance at both the MPI and OpenMP levels.
Two-level load balancing has been successfully applied to SPHYNX to overcome
its load imbalance. The performance analysis shapes and drives the design of
the SPH-EXA mini-app towards the use of efficient parallelization methods,
fault-tolerance mechanisms, and load balancing approaches.
Comment: arXiv admin note: substantial text overlap with arXiv:1809.0801
Simulation of a flowing snow avalanche using molecular dynamics
This paper presents an approach for modeling and simulating a flowing snow avalanche, formed of dry and liquefied snow that slides down a slope, using molecular dynamics and the discrete element method. A particle system is used as the base method for the simulation, and marching cubes with real-time shaders are employed for rendering. A uniform grid-based neighbor search algorithm is used for collision detection in interparticle and particle-terrain interactions. A mass-spring model for collision resolution is employed to mimic the compressibility of the snow, and particle attraction forces are applied between the particles and the terrain surface. To achieve greater performance, a general-purpose GPU language and multithreaded programming are used for collision detection and resolution. The results are displayed with different combinations of rendering methods for a realistic representation of the flowing avalanche. © TÜBİTAK
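The uniform grid-based neighbor search mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the paper's GPU implementation: the function names `build_grid` and `neighbor_pairs`, the choice of cell size equal to the interaction radius, and the pure-CPU hashing are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product
import math

def build_grid(positions, cell_size):
    """Hash each particle index into the integer cell that contains it."""
    grid = defaultdict(list)
    for i, p in enumerate(positions):
        key = tuple(int(math.floor(c / cell_size)) for c in p)
        grid[key].append(i)
    return grid

def neighbor_pairs(positions, radius):
    """Return all index pairs (i, j), i < j, closer than `radius`.

    Assumption: cell size equals `radius`, so every neighbor of a
    particle lies in its own cell or one of the 26 adjacent cells.
    """
    grid = build_grid(positions, radius)
    offsets = list(product((-1, 0, 1), repeat=3))
    pairs = set()
    for (cx, cy, cz), members in grid.items():
        for dx, dy, dz in offsets:
            # .get() does not create empty cells in the defaultdict
            other = grid.get((cx + dx, cy + dy, cz + dz))
            if not other:
                continue
            for i in members:
                for j in other:
                    if i < j and math.dist(positions[i], positions[j]) < radius:
                        pairs.add((i, j))
    return sorted(pairs)
```

Only particles in adjacent cells are compared, so the cost grows roughly linearly with particle count for evenly spread particles instead of quadratically as in an all-pairs test.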
Parallel cloth simulation using OpenMP and CUDA
The widespread availability of parallel computing architectures has led to research regarding algorithms and techniques that best exploit available parallelism. In addition to the available CPU parallelism, the GPU has emerged as a parallel computational device. The goal of this study was to explore the combined use of CPU and GPU parallelism by developing a hybrid parallel CPU/GPU cloth simulation application. In order to evaluate the benefits of the hybrid approach, the application was first developed in sequential CPU form, followed by a parallel CPU form. The application uses Backward Euler implicit time integration to solve the differential equations of motion associated with the physical system. The Conjugate Gradient (CG) algorithm is used to determine the solution vector for the system of equations formed by the Backward Euler approach. The matrix/vector, vector/vector, and vector/scalar operations required by CG are handled by calls to BLAS level 1 and level 2 functions. In the sequential CPU and parallel CPU versions, the Intel Math Kernel Library implementation of BLAS is used. In the hybrid parallel CPU/GPU version, the Nvidia CUDA-based BLAS implementation (CUBLAS) is used. In the parallel CPU and hybrid implementations, OpenMP directives are used to parallelize the force application loop that traverses the list of forces acting on the system. Runtimes were collected for each version of the application while simulating cloth meshes with particle resolutions of 20x20, 40x40, and 60x60. The performance of each version was compared at each mesh resolution. The level of performance degradation experienced when transitioning to the larger mesh sizes was also determined. The hybrid parallel CPU/GPU implementation yielded the highest frame rate for the 40x40 and 60x60 meshes. The parallel CPU implementation yielded the highest frame rate for the 20x20 mesh.
The performance of the hybrid parallel CPU/GPU implementation degraded the least as it transitioned to the two larger mesh sizes. The results of this study will potentially lead to further research regarding the use of GPUs to perform the matrix/vector operations associated with the CG algorithm under more complex cloth simulation scenarios.
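The combination of Backward Euler integration and a Conjugate Gradient solve described in this abstract can be sketched in a few lines of pure Python. This is only an illustration under strong assumptions: the forces are taken to be linear springs with a constant stiffness matrix `K`, all particles share one scalar `mass`, and plain lists stand in for the BLAS calls used in the study. The helper names are hypothetical.

```python
def matvec(A, x):
    """Dense matrix-vector product (a BLAS level 2 style operation)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    """Vector dot product (a BLAS level 1 style operation)."""
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite matrix A."""
    x = [0.0] * len(b)
    r = list(b)                 # residual b - A x for the initial guess x = 0
    p = list(r)                 # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

def backward_euler_step(x, v, K, mass, h):
    """One implicit step for mass * a = -K x (assumed linear spring forces).

    Substituting x_new = x + h * v_new into the velocity update gives
    (mass * I + h^2 K) v_new = mass * v - h * K x, which CG solves.
    """
    n = len(x)
    A = [[(mass if i == j else 0.0) + h * h * K[i][j] for j in range(n)]
         for i in range(n)]
    Kx = matvec(K, x)
    b = [mass * vi - h * kxi for vi, kxi in zip(v, Kx)]
    v_new = conjugate_gradient(A, b)
    x_new = [xi + h * vi for xi, vi in zip(x, v_new)]
    return x_new, v_new
```

The system matrix is symmetric positive definite whenever `K` is symmetric positive semidefinite and `mass` is positive, which is the condition CG needs to converge.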
Parallel packing code for propellant microstructure analysis
In recent years, packing codes have become a successful alternative to experimental data collection for microstructure investigation of heterogeneous materials. Composite solid rocket propellants are interesting representatives of this category, consisting of a mix of fuel and oxidizer powders embedded in a polymeric binder. Their macroscopic properties depend strictly on their peculiar microstructure, which influences mechanical, combustion, and physical features. This work addresses algorithm development, validation, and scalability of POLIPack, a parallel packing code based on the Lubachevsky–Stillinger algorithm, developed at the Space Propulsion Laboratory (SPLab) of Politecnico di Milano. The application can reproduce the organization of spheres of any diameter inside a cube with periodic boundaries. In addition to the general code description, the paper identifies a collision condition not addressed by the original Lubachevsky algorithm (here called back impact), introduces a novel post-impact handling that guarantees a minimum separation velocity between particles, and presents a parallelization approach based on the OpenMP shared-memory paradigm. Monomodal and bimodal packs have been compared to experimental data through statistical descriptors and packing maps.
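Event-driven packing codes of the Lubachevsky–Stillinger type repeatedly predict when two moving, growing spheres will next touch. The sketch below shows that prediction in Python under stated assumptions: spheres move with constant velocity and grow at constant radial rates, periodic boundaries are ignored, and the back-impact condition and post-impact handling introduced in the paper are not modeled. The function name `collision_time` and its parameters are hypothetical, not taken from POLIPack.

```python
import math

def collision_time(r1, v1, R1, a1, r2, v2, R2, a2):
    """Earliest t > 0 at which two growing spheres touch, or None.

    r*, v*: center position and velocity (3-tuples)
    R*, a*: current radius and radial growth rate
    Contact condition: |dr + dv * t| = (R1 + R2) + (a1 + a2) * t,
    assuming the spheres do not overlap at t = 0.
    """
    dr = [q - p for p, q in zip(r1, r2)]
    dv = [q - p for p, q in zip(v1, v2)]
    R, a = R1 + R2, a1 + a2
    # Squaring the contact condition gives A t^2 + 2 B t + C = 0
    A = sum(c * c for c in dv) - a * a
    B = sum(p * q for p, q in zip(dr, dv)) - R * a
    C = sum(c * c for c in dr) - R * R
    if abs(A) < 1e-14:
        # Degenerate linear case: 2 B t + C = 0
        return -C / (2.0 * B) if B < 0 else None
    disc = B * B - A * C
    if disc < 0:
        return None                      # the spheres never touch
    t = (-B - math.sqrt(disc)) / A       # first contact for either sign of A
    return t if t > 0 else None
```

In an event-driven loop, the smallest such time over all candidate pairs would determine the next collision to process.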
Simple and Robust Boolean Operations for Triangulated Surfaces
Boolean operations on geometric models are an essential issue in computational
geometry. In this paper, we develop a simple and robust approach to perform
Boolean operations on closed and open triangulated surfaces. Our method has
two main stages: (1) we first find candidate pairs of intersecting triangles
using an octree and then compute the intersection lines for all pairs of
triangles with a parallel algorithm; (2) we robustly form closed or open
intersection loops, sub-surfaces, and sub-blocks using only the cleared and
updated topology of the meshes, without coordinate computations for
geometric entities. A novel technique, used instead of
inside/outside classification, is also proposed to distinguish the resulting
union, subtraction, and intersection. Several examples are given to
illustrate the effectiveness of our approach.
Comment: Novel method for determining Union, Subtraction and Intersectio
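The first stage described in this abstract, finding candidate pairs of intersecting triangles, can be illustrated with a simplified Python sketch. Note the substitution: the paper uses an octree, whereas this sketch uses a brute-force test of axis-aligned bounding boxes (AABBs), which yields a conservative candidate set without the octree's pruning. The function names are hypothetical.

```python
def aabb(tri):
    """Axis-aligned bounding box of a triangle given as three (x, y, z) points."""
    xs, ys, zs = zip(*tri)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))

def boxes_overlap(a, b):
    """True if two boxes overlap (or touch) on every axis."""
    (amin, amax), (bmin, bmax) = a, b
    return all(amin[k] <= bmax[k] and bmin[k] <= amax[k] for k in range(3))

def candidate_pairs(mesh_a, mesh_b):
    """Index pairs (i, j) whose bounding boxes overlap.

    Overlapping boxes are necessary but not sufficient for the triangles
    to intersect, so an exact triangle-triangle test would still follow.
    """
    boxes_b = [aabb(t) for t in mesh_b]
    pairs = []
    for i, tri_a in enumerate(mesh_a):
        box_a = aabb(tri_a)
        for j, box_b in enumerate(boxes_b):
            if boxes_overlap(box_a, box_b):
                pairs.append((i, j))
    return pairs
```

Each candidate pair is independent of the others, which is what makes the subsequent intersection-line computation straightforward to parallelize.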