194 research outputs found

    Dynamic task fusion for a block-structured finite volume solver over a dynamically adaptive mesh with local time stepping

    Load balancing of generic wave equation solvers over dynamically adaptive meshes with local time stepping is difficult, as the load changes with every time step. Task-based programming promises to mitigate the load balancing problem. We study a Finite Volume code over dynamically adaptive block-structured meshes for two astrophysics simulations, where the patches (blocks) define tasks. The tasks are classified into urgent and low-priority tasks. Urgent tasks are algorithmically latency-sensitive and are processed directly as part of our bulk-synchronous mesh traversals. Non-urgent tasks are held back in an additional task queue on top of the task runtime system. If they lack global side effects, i.e. do not alter the global solver state, we can generate optimised compute kernels for these tasks. Furthermore, we propose to use the additional queue to merge tasks without side effects into task assemblies, and to balance out imbalanced bulk-synchronous processing phases.
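
    The queueing idea described above can be sketched in a few lines of Python. This is a simplified model, not the authors' implementation: it assumes each task carries two flags (`urgent`, `has_side_effects`) and that fusion simply groups side-effect-free deferred tasks into batches of a fixed size. The names `Task`, `Scheduler` and `max_assembly_size` are hypothetical, and the real task runtime and kernel generation are omitted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    patch_id: int
    urgent: bool            # latency-sensitive: must run inside the mesh traversal
    has_side_effects: bool  # alters global solver state, so it is never fused


@dataclass
class Scheduler:
    max_assembly_size: int = 4
    deferred: List[Task] = field(default_factory=list)
    executed: List[int] = field(default_factory=list)

    def spawn(self, task: Task) -> None:
        """Called during the bulk-synchronous traversal for every patch task."""
        if task.urgent:
            self.executed.append(task.patch_id)  # processed directly
        else:
            self.deferred.append(task)           # held back in the additional queue

    def drain(self) -> List[List[int]]:
        """Empty the additional queue and return the task assemblies to run.

        Side-effect-free tasks are fused into assemblies of at most
        max_assembly_size patches; tasks with side effects run on their own.
        """
        assemblies: List[List[int]] = []
        batch: List[int] = []
        for task in self.deferred:
            if task.has_side_effects:
                assemblies.append([task.patch_id])
                continue
            batch.append(task.patch_id)
            if len(batch) == self.max_assembly_size:
                assemblies.append(batch)
                batch = []
        if batch:
            assemblies.append(batch)
        self.deferred.clear()
        return assemblies


scheduler = Scheduler(max_assembly_size=3)
for i in range(8):
    scheduler.spawn(Task(i, urgent=(i % 4 == 0), has_side_effects=(i == 5)))
assemblies = scheduler.drain()
print(scheduler.executed)  # [0, 4]
print(assemblies)          # [[1, 2, 3], [5], [6, 7]]
```

    Restricting fusion to side-effect-free tasks mirrors the condition stated in the abstract: such tasks can be batched into one assembly without changing the global solver state.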

    SFC-based Communication Metadata Encoding for Adaptive Mesh

    This volume of the series “Advances in Parallel Computing” contains the proceedings of the International Conference on Parallel Programming – ParCo 2013 – held from 10 to 13 September 2013 in Garching, Germany. The conference was hosted by the Technische Universität München (Department of Informatics) and the Leibniz Supercomputing Centre.

    The present paper studies two adaptive mesh refinement (AMR) codes whose grids rely on recursive subdivision in combination with space-filling curves (SFCs). A non-overlapping domain decomposition based upon these SFCs yields several well-known advantages with respect to communication demands, balancing, and partition connectivity. However, administering the metadata, i.e. tracking which partitions exchange data and in which cardinality, is nontrivial due to the SFC’s fractal meandering and the dynamic adaptivity. We introduce an analysed tree grammar for the metadata that restricts it hierarchically along the subdivision tree without loss of information and applies run-length encoding. Hence, the metadata memory footprint is very small, and the metadata can be computed and maintained on the fly even for permanently changing grids. The encoding facilitates a fork-join pattern for shared-memory data parallelism. It also facilitates replicated data parallelism, which tackles latency and bandwidth constraints by running communication in the background, and it reduces memory requirements by avoiding adjacency information stored per element. We demonstrate this by means of shared-memory and distributed-memory parallel domain decompositions.

    This work was supported by the German Research Foundation (DFG) as part of the Transregional Collaborative Research Centre “Invasive Computing” (SFB/TR 89). It is partially based on work supported by Award No. UK-c0020, made by the King Abdullah University of Science and Technology (KAUST).
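
    The run-length encoding step mentioned above can be illustrated in isolation. The Python sketch below assumes a much simpler setting than the paper's tree grammar: a flat list of neighbouring partition ranks, one entry per boundary face visited along the SFC. If consecutive faces along the curve tend to border the same partition, the list contains long runs that compress into (rank, count) pairs, and summing the counts per rank gives the exchange cardinalities. The function names are hypothetical.

```python
from collections import Counter
from itertools import groupby
from typing import Dict, List, Tuple


def rle_encode(ranks: List[int]) -> List[Tuple[int, int]]:
    """Compress a sequence of neighbour ranks into (rank, run_length) pairs."""
    return [(rank, len(list(run))) for rank, run in groupby(ranks)]


def rle_decode(encoded: List[Tuple[int, int]]) -> List[int]:
    """Expand (rank, run_length) pairs back into the original sequence."""
    return [rank for rank, count in encoded for _ in range(count)]


def exchange_cardinality(encoded: List[Tuple[int, int]]) -> Dict[int, int]:
    """Number of boundary faces shared with each neighbouring partition."""
    totals: Counter = Counter()
    for rank, count in encoded:
        totals[rank] += count
    return dict(totals)


faces = [1, 1, 1, 1, 2, 2, 3, 3, 3, 1]
encoded = rle_encode(faces)
print(encoded)                        # [(1, 4), (2, 2), (3, 3), (1, 1)]
print(exchange_cardinality(encoded))  # {1: 5, 2: 2, 3: 3}
```

    The encoding is lossless, so the per-face sequence can always be reconstructed with `rle_decode` when needed.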

    Generation of initial molecular dynamics configurations in arbitrary geometries and in parallel

    A computational pre-processing tool for generating initial configurations of molecules for molecular dynamics simulations in geometries described by a mesh of unstructured arbitrary polyhedra is described. The mesh is divided into separate zones, and each can be filled with a single crystal lattice of atoms. Each zone is filled by creating an expanding cube of crystal unit cells, initiated from an anchor point for the lattice. Each unit cell places the appropriate atoms for the user-specified crystal structure and orientation. The cube expands until the entire zone is filled with the lattice; zones with concave and disconnected volumes may be filled. When the mesh is spatially decomposed into portions for distributed parallel processing, each portion may be filled independently, meaning that the entire molecular system never needs to fit onto a single processor, allowing very large systems to be created. The computational time required to fill a zone with molecules scales linearly with the number of cells in the zone for a fixed number of molecules, and better than linearly with the number of molecules for a fixed number of mesh cells. Our tool, molConfig, has been implemented in the open source C++ code OpenFOAM.
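
    The expanding-cube filling procedure can be sketched as follows. This Python version is an illustration, not molConfig's code: it assumes the zone is given as a point-in-zone predicate plus an axis-aligned bounding box, uses a face-centred cubic basis with lattice constant `a` as an example crystal structure, ignores crystal orientation, and stops expanding once the cube of unit cells covers the bounding box. Testing every candidate atom against the predicate is what lets concave or disconnected zones be filled. `fill_zone` and `in_box` are hypothetical names.

```python
import itertools
import math
from typing import Callable, List, Sequence, Tuple

Vec = Tuple[float, float, float]

# Fractional positions of the 4 atoms in a face-centred cubic unit cell.
FCC_BASIS: List[Vec] = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.0),
                        (0.5, 0.0, 0.5), (0.0, 0.5, 0.5)]


def fill_zone(inside: Callable[[Vec], bool], bbox_min: Vec, bbox_max: Vec,
              anchor: Vec, a: float,
              basis: Sequence[Vec] = FCC_BASIS) -> List[Vec]:
    """Fill a zone with lattice atoms by growing a cube of unit cells around anchor."""
    # Number of shells needed until the cube of unit cells covers the bounding box.
    reach = max(max(abs(hi - c), abs(lo - c))
                for lo, hi, c in zip(bbox_min, bbox_max, anchor))
    n_max = math.ceil(reach / a) + 1
    atoms: List[Vec] = []
    for n in range(n_max + 1):
        for i, j, k in itertools.product(range(-n, n + 1), repeat=3):
            # Skip cells visited in earlier shells; only the new outer shell is processed.
            if max(abs(i), abs(j), abs(k)) != n:
                continue
            for bx, by, bz in basis:
                p = (anchor[0] + (i + bx) * a,
                     anchor[1] + (j + by) * a,
                     anchor[2] + (k + bz) * a)
                if inside(p):  # per-atom test keeps concave/disconnected zones correct
                    atoms.append(p)
    return atoms


# Usage: a 2 x 2 x 2 block of FCC cells in the box [0, 2)^3.
def in_box(p: Vec) -> bool:
    return all(0.0 <= x < 2.0 for x in p)


atoms = fill_zone(in_box, (0.0, 0.0, 0.0), (2.0, 2.0, 2.0),
                  anchor=(0.0, 0.0, 0.0), a=1.0)
print(len(atoms))  # 32
```

    Because each mesh portion only needs its own predicate and bounding box, the same routine could in principle run independently on every processor, matching the parallel use described in the abstract.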

    Efficient GPU Offloading with OpenMP for a Hyperbolic Finite Volume Solver on Dynamically Adaptive Meshes

    We identify and show how to overcome an OpenMP bottleneck in the administration of GPU memory. It arises for a wave equation solver on dynamically adaptive block-structured Cartesian meshes, which keeps all CPU threads busy and allows all of them to offload sets of patches to the GPU. Our studies show that multithreaded, concurrent, non-deterministic access to the GPU leads to performance breakdowns, since the GPU memory bookkeeping as offered through OpenMP’s map clause, i.e., the allocation and freeing, becomes another runtime challenge besides expensive data transfer and actual computation. We therefore propose to retain the memory management responsibility on the host: a caching mechanism acquires memory on the accelerator for all CPU threads, keeps hold of this memory and hands it out to the offloading threads upon demand. We show that this user-managed, CPU-based memory administration helps us to overcome the GPU memory bookkeeping bottleneck and speeds up the time-to-solution of Finite Volume kernels by more than an order of magnitude.
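
    The host-side caching idea can be sketched independently of OpenMP. In the Python sketch below, `device_alloc` is a hypothetical stand-in for a real accelerator allocator (for example `omp_target_alloc` in C/C++). The cache hands out previously acquired buffers of matching size, so repeated offloads by many threads reuse memory instead of allocating and freeing it each time. Keying the free lists by exact byte size is a simplifying assumption, not necessarily the paper's policy; `DeviceMemoryCache` and `offload_worker` are illustrative names.

```python
import threading
from collections import defaultdict
from typing import Callable, DefaultDict, List


class DeviceMemoryCache:
    """Host-managed pool of device buffers shared by all CPU threads."""

    def __init__(self, device_alloc: Callable[[int], object]) -> None:
        self._device_alloc = device_alloc
        self._free: DefaultDict[int, List[object]] = defaultdict(list)
        self._lock = threading.Lock()
        self.device_allocations = 0  # how often the real allocator was called

    def acquire(self, nbytes: int) -> object:
        with self._lock:
            if self._free[nbytes]:
                return self._free[nbytes].pop()  # reuse a cached buffer
            self.device_allocations += 1
        # Cache miss: allocate outside the lock so other threads are not blocked.
        return self._device_alloc(nbytes)

    def release(self, nbytes: int, buffer: object) -> None:
        # Keep the buffer for later reuse instead of freeing it on the device.
        with self._lock:
            self._free[nbytes].append(buffer)


def offload_worker(cache: DeviceMemoryCache, rounds: int) -> None:
    for _ in range(rounds):
        buf = cache.acquire(1024)
        # ... copy patch data in, launch the kernel, copy results out ...
        cache.release(1024, buf)


cache = DeviceMemoryCache(device_alloc=lambda n: bytearray(n))  # fake device memory
threads = [threading.Thread(target=offload_worker, args=(cache, 100)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(cache.device_allocations)  # at most 8: one buffer per thread in flight
```

    Since each worker holds at most one buffer at a time, the real allocator is called at most once per concurrently offloading thread; every further request is served from the cache.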

    Study of interpolation methods for high-accuracy computations on overlapping grids

    An overset strategy can be an efficient way to retain a high-accuracy discretization by decomposing a complex geometry into topologically simple subdomains. Apart from the grid assembly algorithm, the key point of the overset technique lies in the interpolation processes, which ensure the communication between the overlapping grids. The family of explicit Lagrange and optimized interpolation schemes is studied. The a priori interpolation error is analyzed in Fourier space and combined with the error of the chosen discretization to highlight the modification of the numerical error. When high-accuracy algorithms are used, an optimization of the interpolation coefficients can enhance the resolvability, which can be useful when high-frequency waves or small turbulent scales need to be supported by a grid. For general curvilinear grids in more than one space dimension, a mapping to a computational space followed by a tensorization of 1-D interpolations is preferred to a direct evaluation of the coefficients in the physical domain. A high-order extension of the isoparametric mapping is accurate and robust, since it avoids the inversion of a matrix that may be ill-conditioned. A posteriori error analyses indicate that the interpolation stencil size must be tailored to the accuracy of the discretization scheme. For well-discretized wavelengths, the results show that choosing a stencil smaller than that of the corresponding finite-difference scheme can be acceptable. The benefit of optimization for capturing high-frequency phenomena is also underlined. Adding order constraints to the optimization allows an interesting trade-off when a large range of scales is considered. Finally, the ability of the present overset strategy to preserve accuracy is illustrated by the diffraction of an acoustic source by two cylinders and by the generation of acoustic tones in a rotor–stator interaction. Some recommendations are formulated in the closing section.
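
    The tensorization of 1-D Lagrange interpolations mentioned above can be sketched as follows, assuming the receiver point has already been mapped to computational coordinates (ξ, η) of the donor cell and that the stencil nodes are the same in both directions. Only the explicit Lagrange weights are shown; the optimized coefficients studied in the paper are not reproduced. `lagrange_weights` and `interpolate_2d` are hypothetical names.

```python
from typing import List, Sequence


def lagrange_weights(nodes: Sequence[float], x: float) -> List[float]:
    """1-D Lagrange weights l_j(x) = prod_{m != j} (x - x_m) / (x_j - x_m)."""
    weights = []
    for j, xj in enumerate(nodes):
        lj = 1.0
        for m, xm in enumerate(nodes):
            if m != j:
                lj *= (x - xm) / (xj - xm)
        weights.append(lj)
    return weights


def interpolate_2d(nodes: Sequence[float], values: Sequence[Sequence[float]],
                   xi: float, eta: float) -> float:
    """Tensor product of 1-D interpolations: sum_i sum_j w_i(xi) w_j(eta) f_ij."""
    wx = lagrange_weights(nodes, xi)
    wy = lagrange_weights(nodes, eta)
    return sum(wx[i] * wy[j] * values[i][j]
               for i in range(len(nodes)) for j in range(len(nodes)))


# A 4-point stencil (cubic in each direction) reproduces f(x, y) = x^3 + x*y^2 exactly.
nodes = [0.0, 1.0, 2.0, 3.0]
values = [[x**3 + x * y**2 for y in nodes] for x in nodes]
print(interpolate_2d(nodes, values, 1.3, 2.7))  # approximately 11.674, up to rounding
```

    Working in the computational space means only 1-D weights are ever computed, which is the advantage over evaluating coefficients directly in the physical domain that the abstract describes.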

    Spectral/hp element methods: recent developments, applications, and perspectives

    The spectral/hp element method combines the geometric flexibility of the classical h-type finite element technique with the desirable numerical properties of spectral methods, employing high-degree piecewise polynomial basis functions on coarse finite element-type meshes. The spatial approximation is based upon orthogonal polynomials, such as Legendre or Chebyshev polynomials, modified to accommodate C0-continuous expansions. Computationally and theoretically, increasing the polynomial order p yields high-precision solutions and fast convergence; in particular, under certain regularity assumptions, the approximation error between numerical and exact solutions can be reduced exponentially. The method has now been applied in many simulation studies of both fundamental and practical engineering flows. This paper briefly describes the formulation of the spectral/hp element method and provides an overview of its application to computational fluid dynamics. In particular, it focuses on the use of the spectral/hp element method in transitional flows and ocean engineering. Finally, some of the major challenges to be overcome in order to use the spectral/hp element method in more complex science and engineering applications are discussed.
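
    One common way to modify Legendre polynomials for C0-continuous expansions on the reference interval [-1, 1] is sketched below: two linear vertex modes that couple neighbouring elements, plus interior modes P_p(x) - P_{p-2}(x) that vanish at both ends. This choice is an illustrative assumption; the paper's exact basis (for example a Jacobi-polynomial variant or a different normalization) may differ. Legendre values are computed with Bonnet's recursion; `legendre` and `modified_basis` are hypothetical names.

```python
from typing import List


def legendre(n: int, x: float) -> float:
    """Legendre polynomial P_n(x) via Bonnet's recursion."""
    if n == 0:
        return 1.0
    p_prev, p = 1.0, x
    for k in range(1, n):
        # (k + 1) P_{k+1} = (2k + 1) x P_k - k P_{k-1}
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p


def modified_basis(order: int, x: float) -> List[float]:
    """C0 modal basis of polynomial order >= 1 on [-1, 1], evaluated at x.

    Entries 0 and 1 are the vertex modes (1 - x)/2 and (1 + x)/2; entries
    2..order are interior (bubble) modes P_p - P_{p-2}, zero at x = +/-1.
    """
    modes = [(1.0 - x) / 2.0, (1.0 + x) / 2.0]
    for p in range(2, order + 1):
        modes.append(legendre(p, x) - legendre(p - 2, x))
    return modes


print(modified_basis(4, -1.0))  # [1.0, 0.0, 0.0, 0.0, 0.0]
print(modified_basis(4, 1.0))   # [0.0, 1.0, 0.0, 0.0, 0.0]
```

    Because the interior modes vanish at the element ends, only the vertex modes need to be shared between neighbouring elements when assembling a C0-continuous global expansion; raising `order` adds interior modes without touching the coupling.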

    Chaste: a test-driven approach to software development for biological modelling

    Chaste (‘Cancer, heart and soft-tissue environment’) is a software library and a set of test suites for computational simulations in the domain of biology. Current functionality has arisen from modelling in the fields of cancer, cardiac physiology and soft-tissue mechanics. It is released under the LGPL 2.1 licence.

    Chaste has been developed using agile programming methods. The project began in 2005, when it was reasoned that modelling a variety of physiological phenomena required both a generic mathematical modelling framework and a generic computational/simulation framework. The Chaste project evolved from the Integrative Biology (IB) e-Science Project, an inter-institutional project aimed at developing a suitable IT infrastructure to support physiome-level computational modelling, with a primary focus on cardiac and cancer modelling.