194 research outputs found
Dynamic task fusion for a block-structured finite volume solver over a dynamically adaptive mesh with local time stepping
Load balancing of generic wave equation solvers over dynamically adaptive meshes with local time stepping is difficult, as the load changes with every time step. Task-based programming promises to mitigate the load balancing problem. We study a Finite Volume code over dynamically adaptive block-structured meshes for two astrophysics simulations, where the patches (blocks) define tasks. They are classified into urgent and low priority tasks. Urgent tasks are algorithmically latency-sensitive. They are processed directly as part of our bulk-synchronous mesh traversals. Non-urgent tasks are held back in an additional task queue on top of the task runtime system. If they lack global side-effects, i.e. do not alter the global solver state, we can generate optimised compute kernels for these tasks. Furthermore, we propose to use the additional queue to merge tasks without side-effects into task assemblies, and to balance out imbalanced bulk-synchronous processing phases.
SFC-based Communication Metadata Encoding for Adaptive Mesh
This volume of the series "Advances in Parallel Computing" contains the proceedings of the International Conference on Parallel Programming, ParCo 2013, held from 10 to 13 September 2013 in Garching, Germany. The conference was hosted by the Technische Universität München (Department of Informatics) and the Leibniz Supercomputing Centre. The present paper studies two adaptive mesh refinement (AMR) codes
whose grids rely on recursive subdivision in combination with space-filling curves
(SFCs). A non-overlapping domain decomposition based upon these SFCs yields
several well-known advantageous properties with respect to communication demands,
balancing, and partition connectivity. However, the administration of the
meta data, i.e. to track which partitions exchange data in which cardinality, is nontrivial
due to the SFC's fractal meandering and the dynamic adaptivity. We introduce
an analysed tree grammar for the meta data that restricts it without loss of
information hierarchically along the subdivision tree and applies run length encoding.
Hence, its meta data memory footprint is very small, and it can be computed
and maintained on-the-fly even for permanently changing grids. It facilitates a fork-join
pattern for shared data parallelism. And it facilitates replicated data parallelism
tackling latency and bandwidth constraints respectively due to communication in
the background and reduces memory requirements by avoiding adjacency information
stored per element. We demonstrate this by means of shared and distributed
parallelized domain decompositions. This work was supported by the German Research Foundation (DFG) as part of the
Transregional Collaborative Research Centre "Invasive Computing" (SFB/TR 89). It is
partially based on work supported by Award No. UK-c0020, made by the King Abdullah
University of Science and Technology (KAUST)
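The run length encoding mentioned in the abstract above can be illustrated with a minimal sketch. It is not the paper's tree grammar: it only assumes that the boundary faces of one partition are listed in SFC order and that each is labelled with the rank of the adjacent partition. The function names and the sample data are hypothetical.

```python
from itertools import groupby


def run_length_encode(neighbour_ranks):
    """Collapse consecutive equal entries into (rank, count) pairs."""
    return [(rank, sum(1 for _ in run)) for rank, run in groupby(neighbour_ranks)]


def run_length_decode(pairs):
    """Expand (rank, count) pairs back into the per-face list."""
    return [rank for rank, count in pairs for _ in range(count)]


# Hypothetical data: boundary faces of one partition in SFC order, each
# labelled with the rank of the partition on the other side of the face.
boundary = [1, 1, 1, 1, 3, 3, 2, 2, 2, 1]
encoded = run_length_encode(boundary)
decoded = run_length_decode(encoded)
```

Because neighbouring faces along the curve tend to border the same partition, the encoded list holds one entry per run rather than one entry per face.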
Generation of initial molecular dynamics configurations in arbitrary geometries and in parallel
A computational pre-processing tool for generating initial configurations of molecules for molecular dynamics simulations in geometries described by a mesh of unstructured arbitrary polyhedra is described. The mesh is divided into separate zones and each can be filled with a single crystal lattice of atoms. Each zone is filled by creating an expanding cube of crystal unit cells, initiated from an anchor point for the lattice. Each unit cell places the appropriate atoms for the user-specified crystal structure and orientation. The cube expands until the entire zone is filled with the lattice; zones with concave and disconnected volumes may be filled. When the mesh is spatially decomposed into portions for distributed parallel processing, each portion may be filled independently, meaning that the entire molecular system never needs to fit onto a single processor, allowing very large systems to be created. The computational time required to fill a zone with molecules scales linearly with the number of cells in the zone for a fixed number of molecules, and better than linearly with the number of molecules for a fixed number of mesh cells. Our tool, molConfig, has been implemented in the open source C++ code OpenFOAM
Computational Fluid Dynamics with Embedded Cut Cells on Graphics Hardware
The advent of general purpose computing on graphics cards has led to significant software speedup in many fields. Designing code for GPUs, however, requires careful consideration of the underlying hardware. This thesis explores the implementation of fluid dynamics simulations featuring embedded cut cells using the CUDA programming platform. We demonstrate efficient generation and handling of geometric surface data in rectilinear computational grids. This is added to a split Euler solver to define piecewise linear cut cells describing solid surfaces in fluid flows. To reduce the memory footprint of embedded boundaries, we present a system of compressed data structures. The software is extended to run on multiple graphics cards and shows good scaling.
Simulating embedded boundaries requires a description of object surfaces. We implement a fast and robust narrow band signed distance field generator for graphics cards based on the Characteristic/Scan Conversion algorithm for stereolithography files. The thesis presents an augmented approach to handle commonly occurring complex configurations and we show that the method is correct for all closed surfaces. We discuss efficient feature construction and work scheduling and demonstrate high-speed distance generation for complex geometries.
At the core of our simulation implementation is a split Euler solver for high-speed flow. We present a one-dimensional method that achieves coalesced memory access and uses shared memory caching to best harness the potential of GPU hardware. Multidimensional simulations use a framework of data transposes to align data with sweep dimensions to maintain optimal memory access. Analysis of the solver shows that compute resources are used efficiently.
The solver is extended to include cut cells describing solid boundaries in the domain. We present a compression and mapping method to reduce the memory footprint of the surface information. The cut cell solver is validated with different flow regimes and we simulate shock wave interaction with complex geometries to demonstrate the stability of the implementation.
We conclude with multi-card parallelisation and analyse existing literature on domain segmentation and GPU communication. We present a system of domain splitting and message passing with overlapping compute and communication streams. A comparison of naive and GPU-aware Open MPI shows the benefits of using CUDA-specific library calls. The complete software pipeline demonstrates good scaling for up to thirty-two cards on a GPU cluster.
Efficient GPU Offloading with OpenMP for a Hyperbolic Finite Volume Solver on Dynamically Adaptive Meshes
We identify and show how to overcome an OpenMP bottleneck in the administration of GPU memory. It arises for a wave equation solver on dynamically adaptive block-structured Cartesian meshes, which keeps all CPU threads busy and allows all of them to offload sets of patches to the GPU. Our studies show that multithreaded, concurrent, non-deterministic access to the GPU leads to performance breakdowns, since the GPU memory bookkeeping as offered through OpenMP's map clause, i.e., the allocation and freeing, becomes another runtime challenge besides expensive data transfer and actual computation. We, therefore, propose to retain the memory management responsibility on the host: A caching mechanism acquires memory on the accelerator for all CPU threads, keeps hold of this memory and hands it out to the offloading threads upon demand. We show that this user-managed, CPU-based memory administration helps us to overcome the GPU memory bookkeeping bottleneck and speeds up the time-to-solution of Finite Volume kernels by more than an order of magnitude.
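The caching idea in the abstract above can be sketched on the host side as a size-keyed buffer pool. This is an illustration under stated assumptions, not the authors' implementation: `BufferPool` and its methods are hypothetical names, and `bytearray` merely stands in for a real accelerator allocation.

```python
import threading
from collections import defaultdict


class BufferPool:
    """Hands out buffers by size and keeps released ones for reuse."""

    def __init__(self, allocate=bytearray):
        self._allocate = allocate        # stand-in for a device allocation
        self._free = defaultdict(list)   # size -> list of idle buffers
        self._lock = threading.Lock()    # many CPU threads may offload at once
        self.allocations = 0             # number of real allocations performed

    def acquire(self, size):
        with self._lock:
            if self._free[size]:
                return self._free[size].pop()   # reuse, no new allocation
            self.allocations += 1
        return self._allocate(size)

    def release(self, buffer):
        # The buffer is kept rather than freed, so a later acquire can reuse it.
        with self._lock:
            self._free[len(buffer)].append(buffer)


pool = BufferPool()
first = pool.acquire(1024)
pool.release(first)
second = pool.acquire(1024)   # served from the cache
```

In this sketch the second request is served without touching the allocator, which is the effect the abstract attributes to host-managed memory administration.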
Study of interpolation methods for high-accuracy computations on overlapping grids
An overset strategy can be an efficient way to keep a high-accuracy discretization by decomposing a complex geometry into topologically simple subdomains. Apart from the grid assembly algorithm, the key point of the overset technique lies in the interpolation processes which ensure the communication between the overlapping grids. The family of explicit Lagrange and optimized interpolation schemes is studied. The a priori interpolation error is analyzed in Fourier space, and combined with the error of the chosen discretization to highlight the modification of the numerical error. When high-accuracy algorithms are used, an optimization of the interpolation coefficients can enhance the resolvability, which can be useful when high-frequency waves or small turbulent scales need to be supported by a grid. For general curvilinear grids in more than one space dimension, a mapping into a computational space followed by a tensorization of 1-D interpolations is preferred to a direct evaluation of the coefficients in the physical domain. A high-order extension of the isoparametric mapping is accurate and robust since it avoids the inversion of a matrix which may be ill-conditioned. A posteriori error analyses indicate that the interpolation stencil size must be tailored to the accuracy of the discretization scheme. For well-discretized wavelengths, the results show that the choice of a stencil smaller than the stencil of the corresponding finite-difference scheme can be acceptable. The gain from optimization in capturing high-frequency phenomena is also underlined. Adding order constraints to the optimization allows an interesting trade-off when a large range of scales is considered. Finally, the ability of the present overset strategy to preserve accuracy is illustrated by the diffraction of an acoustic source by two cylinders, and the generation of acoustic tones in a rotor-stator interaction. Some recommendations are formulated in the closing section.
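As an illustration of the explicit Lagrange interpolation and its Fourier-space error discussed in the abstract above, the following minimal sketch computes the weights for a 1-D stencil and the error made when interpolating a plane wave exp(i k x). The stencil, the target point, and the function names are assumptions chosen for the example, not taken from the paper.

```python
import cmath


def lagrange_weights(nodes, x):
    """Weights w_j such that sum_j w_j * f(nodes[j]) approximates f(x)."""
    weights = []
    for j, xj in enumerate(nodes):
        w = 1.0
        for m, xm in enumerate(nodes):
            if m != j:
                w *= (x - xm) / (xj - xm)
        weights.append(w)
    return weights


def fourier_error(nodes, x, k):
    """|1 - sum_j w_j exp(i k (x_j - x))|: error for the plane wave exp(i k x)."""
    weights = lagrange_weights(nodes, x)
    interpolated = sum(
        w * cmath.exp(1j * k * (xj - x)) for w, xj in zip(weights, nodes)
    )
    return abs(1.0 - interpolated)


# Assumed setup: four-point stencil with unit spacing, target at the midpoint.
nodes = [-1.0, 0.0, 1.0, 2.0]
weights = lagrange_weights(nodes, 0.5)
error_low = fourier_error(nodes, 0.5, 0.1)    # well-resolved wave
error_high = fourier_error(nodes, 0.5, 2.0)   # poorly resolved wave
```

For this stencil the error grows with the wavenumber, which is the behaviour that optimized interpolation coefficients aim to improve at high frequencies.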
Spectral/hp element methods: recent developments, applications, and perspectives
The spectral/hp element method combines the geometric flexibility of the
classical h-type finite element technique with the desirable numerical
properties of spectral methods, employing high-degree piecewise polynomial
basis functions on coarse finite element-type meshes. The spatial approximation
is based upon orthogonal polynomials, such as Legendre or Chebyshev
polynomials, modified to accommodate C0-continuous expansions. Computationally
and theoretically, by increasing the polynomial order p, high-precision
solutions and fast convergence can be obtained and, in particular, under
certain regularity assumptions an exponential reduction in approximation error
between numerical and exact solutions can be achieved. This method has now been
applied in many simulation studies of both fundamental and practical
engineering flows. This paper briefly describes the formulation of the
spectral/hp element method and provides an overview of its application to
computational fluid dynamics. In particular, it focuses on the use of the
spectral/hp element method in transitional flows and ocean engineering.
Finally, some of the major challenges to be overcome in order to use the
spectral/hp element method in more complex science and engineering applications
are discussed
Chaste: a test-driven approach to software development for biological modelling
Chaste ("Cancer, heart and soft-tissue environment") is a software library and a set of test suites for computational simulations in the domain of biology. Current functionality has arisen from modelling in the fields of cancer, cardiac physiology and soft-tissue mechanics. It is released under the LGPL 2.1 licence.
Chaste has been developed using agile programming methods. The project began in 2005 when it was reasoned that the modelling of a variety of physiological phenomena required both a generic mathematical modelling framework, and a generic computational/simulation framework. The Chaste project evolved from the Integrative Biology (IB) e-Science Project, an inter-institutional project aimed at developing a suitable IT infrastructure to support physiome-level computational modelling, with a primary focus on cardiac and cancer modelling
- …