
    Optimisation of patch distribution strategies for AMR applications

    As core counts increase in the world's most powerful supercomputers, applications are becoming limited not only by computational power but also by data availability. In the race to exascale, efficient and effective communication policies are key to achieving optimal application performance. Applications using adaptive mesh refinement (AMR) trade communication for computational load balancing, enabling focused computation on specific areas of interest. This class of application is particularly susceptible to the communication performance of the underlying architecture and is inherently difficult to scale efficiently. In this paper we present a study of the effect of patch distribution strategies on the scalability of an AMR code. We demonstrate the significance of patch placement on communication overheads and, by balancing the computation and communication costs of patches, we develop a scheme to optimise the performance of a specific, industry-strength benchmark application.
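
    As an illustration only (the abstract gives no code), the following Python sketch shows one plausible form of the cost-balancing idea described above: patches are greedily assigned to ranks so that an estimated compute load plus a penalty for cutting neighbour links is minimised. All names and the cost model are hypothetical, not the paper's actual scheme.

        # Hypothetical greedy placement of AMR patches onto MPI ranks,
        # trading compute balance against communication cost.
        def place_patches(patches, n_ranks, comm_weight=0.1):
            """patches: iterable of (patch_id, compute_cost, neighbour_ids)."""
            loads = [0.0] * n_ranks
            placement = {}
            # Place expensive patches first so cheap ones fill the gaps.
            for pid, cost, neighbours in sorted(patches, key=lambda p: -p[1]):
                def score(rank):
                    # Penalise separating a patch from already-placed neighbours.
                    cut = sum(1 for n in neighbours
                              if n in placement and placement[n] != rank)
                    return loads[rank] + cost + comm_weight * cut
                best = min(range(n_ranks), key=score)
                placement[pid] = best
                loads[best] += cost
            return placement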

    Dynamic task fusion for a block-structured finite volume solver over a dynamically adaptive mesh with local time stepping

    Load balancing of generic wave equation solvers over dynamically adaptive meshes with local time stepping is difficult, as the load changes with every time step. Task-based programming promises to mitigate the load balancing problem. We study a Finite Volume code over dynamically adaptive block-structured meshes for two astrophysics simulations, where the patches (blocks) define tasks. They are classified into urgent and low-priority tasks. Urgent tasks are algorithmically latency-sensitive. They are processed directly as part of our bulk-synchronous mesh traversals. Non-urgent tasks are held back in an additional task queue on top of the task runtime system. If they lack global side-effects, i.e. do not alter the global solver state, we can generate optimised compute kernels for these tasks. Furthermore, we propose to use the additional queue to merge tasks without side-effects into task assemblies, and to balance out imbalanced bulk-synchronous processing phases.
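
    Purely as a sketch (no code appears in the abstract), the Python below mimics the described split into urgent and held-back tasks, with side-effect-free tasks merged into assemblies so they could be processed by a single optimised kernel. Class and parameter names are invented for illustration.

        # Hypothetical held-back queue: urgent tasks run inside the traversal,
        # side-effect-free tasks are fused into batches ("task assemblies").
        class HeldBackQueue:
            def __init__(self, assembly_size=8):
                self.assemblies = []    # batches of fusible, side-effect-free tasks
                self.sequential = []    # tasks that alter global solver state
                self.assembly_size = assembly_size

            def submit(self, task, urgent=False, side_effect_free=True):
                if urgent:
                    task()  # latency-sensitive: process during the mesh traversal
                elif side_effect_free:
                    if self.assemblies and len(self.assemblies[-1]) < self.assembly_size:
                        self.assemblies[-1].append(task)  # fuse into current assembly
                    else:
                        self.assemblies.append([task])    # start a new assembly
                else:
                    self.sequential.append(task)

            def drain(self):
                # A real runtime would dispatch each assembly as one fused kernel;
                # here the batches are simply executed in order.
                for batch in self.assemblies:
                    for task in batch:
                        task()
                for task in self.sequential:
                    task()
                self.assemblies.clear()
                self.sequential.clear()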

    Survey on Additive Manufacturing, Cloud 3D Printing and Services

    Cloud Manufacturing (CM) is the concept of using manufacturing resources in a service-oriented way over the Internet. Recent developments in Additive Manufacturing (AM) are making it possible to utilise resources ad hoc as replacements for traditional manufacturing resources in case of spontaneous problems in established manufacturing processes. To be of use in these scenarios, AM resources must adhere to a strict principle of transparency and service composition in adherence to the Cloud Computing (CC) paradigm. With this review we provide an overview of CM, AM, and relevant domains, and present the historical development of scientific research in these fields, starting from 2002. Part of this work is also a meta-review on the domain to further detail its development and structure.

    Self-adaptive isogeometric spatial discretisations of the first and second-order forms of the neutron transport equation with dual-weighted residual error measures and diffusion acceleration

    NURBS-based isogeometric analysis (IGA) spatial discretisations and self-adaptive mesh refinement (AMR) algorithms, implemented in a new modern-Fortran code, are developed for the first-order and second-order forms of the neutron transport equation (NTE). These AMR algorithms are shown to be computationally efficient and numerically accurate when compared to standard approaches. IGA methods are very competitive and offer certain unique advantages over standard finite element methods (FEM), not least because the numerical analysis is performed over an exact representation of the underlying geometry, which is generally available from a computer-aided design (CAD) software description. Furthermore, mesh refinement can be performed within the analysis program at run-time, without the need to revisit any ancillary mesh generator. Two error measures are described for the IGA-based AMR algorithms, both of which can be employed in conjunction with energy-dependent meshes. The first heuristically minimises any local contributions to the global discretisation error, as measured in an appropriate user-prescribed norm. The second employs duality arguments to minimise important local contributions to the error as measured in some quantity of interest; this is commonly known as a dual-weighted residual (DWR) error measure, and it demands the solution of both the forward (primal) and the adjoint (dual) NTE. Finally, convergent and stable diffusion acceleration and generalised minimal residual (GMRES) algorithms, compatible with the aforementioned AMR algorithms, are introduced to accelerate the convergence of the within-group self-scattering sources for scattering-dominated problems for the first- and second-order forms of the NTE. A variety of verification benchmark problems are analysed to demonstrate the computational performance and efficiency of these acceleration techniques.
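
    For readers unfamiliar with dual-weighted residuals, the idea can be stated compactly in its standard textbook form (this notation is generic, not quoted from the work above): with transport operator L, source q, discrete primal solution \phi_h, and dual solution \phi^\dagger, the error in a quantity of interest J is estimated element by element as

        J(\phi) - J(\phi_h) \;\approx\; \sum_{K \in \mathcal{T}_h}
            \big\langle q - L\phi_h,\; \phi^\dagger - \phi_h^\dagger \big\rangle_K

    so elements where a large local residual coincides with a large dual weight are the ones marked for refinement.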

    ICASE/LaRC Workshop on Adaptive Grid Methods

    Solution-adaptive grid techniques are essential to the attainment of practical, user-friendly computational fluid dynamics (CFD) applications. In this three-day workshop, experts gathered to describe state-of-the-art methods in solution-adaptive grid refinement, analysis, and implementation; to assess current practice; and to discuss future needs and directions for research. This was accomplished through a series of invited and contributed papers. The workshop focused on a set of two-dimensional test cases designed by the organizers to aid in assessing the current state of development of adaptive grid technology. In addition, a panel of experts from universities, industry, and government research laboratories discussed their views of needs and future directions in this field.

    The Peano software---parallel, automaton-based, dynamically adaptive grid traversals

    We discuss the design decisions, design alternatives, and rationale behind the third generation of Peano, a framework for dynamically adaptive Cartesian meshes derived from spacetrees. Peano ties the mesh traversal to the mesh storage and supports only one element-wise traversal order, resulting from space-filling curves; the user is not free to choose a traversal order herself. The traversal can exploit regular grid subregions and shared-memory as well as distributed-memory systems with almost no modifications to a serial application code. We formalize the software design by means of two interacting automata: one automaton for the multiscale grid traversal and one for the application-specific algorithmic steps. This yields a callback-based programming paradigm. We further sketch the supported application types and the two data storage schemes realized, before we detail high-performance computing aspects and lessons learned. Special emphasis is put on observations regarding the programming idioms and algorithmic concepts used. This transforms our report from a "one way to implement things" code description into a generic discussion and summary of alternatives, rationale, and design decisions to be made for any tree-based adaptive mesh refinement software.
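
    A minimal sketch of the callback-based paradigm described above (not Peano's actual API; Peano is C++ and fixes the order via space-filling curves, whereas this toy simply visits children in stored order):

        # Toy spacetree traversal: the framework owns the traversal order and
        # the application only supplies enter/leave callbacks.
        class Cell:
            def __init__(self, level, index, children=None):
                self.level, self.index = level, index
                self.children = children or []   # empty list => leaf cell

        def traverse(cell, observer):
            observer.enter_cell(cell)            # application-specific step on descent
            for child in cell.children:          # order dictated by the framework
                traverse(child, observer)
            observer.leave_cell(cell)            # application-specific step on ascent

        class PrintObserver:
            def enter_cell(self, cell):
                print("enter", cell.level, cell.index)
            def leave_cell(self, cell):
                print("leave", cell.level, cell.index)

        root = Cell(0, 0, [Cell(1, i) for i in range(4)])
        traverse(root, PrintObserver())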

    The cosmological simulation code GADGET-2

    We discuss the cosmological simulation code GADGET-2, a new massively parallel TreeSPH code capable of following a collisionless fluid with the N-body method and an ideal gas by means of smoothed particle hydrodynamics (SPH). Our implementation of SPH manifestly conserves energy and entropy in regions free of dissipation, while allowing for fully adaptive smoothing lengths. Gravitational forces are computed with a hierarchical multipole expansion, which can optionally be applied in the form of a TreePM algorithm, where only short-range forces are computed with the tree method while long-range forces are determined with Fourier techniques. Time integration is based on a quasi-symplectic scheme where long-range and short-range forces can be integrated with different timesteps. Individual and adaptive short-range timesteps may also be employed. The domain decomposition used in the parallelisation algorithm is based on a space-filling curve, resulting in high flexibility and tree force errors that do not depend on the way the domains are cut. The code is efficient in terms of memory consumption and required communication bandwidth. It has been used to compute the first cosmological N-body simulation with more than 10^10 dark matter particles, reaching a homogeneous spatial dynamic range of 10^5 per dimension in a 3D box. It has also been used to carry out very large cosmological SPH simulations that account for radiative cooling and star formation, reaching total particle numbers of more than 250 million. We present the algorithms used by the code and discuss their accuracy and performance using a number of test problems. GADGET-2 is publicly released to the research community.
    Comment: submitted to MNRAS, 31 pages, 20 figures (reduced resolution), code available at http://www.mpa-garching.mpg.de/gadge
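
    The TreePM force split mentioned above is conventionally written as a Fourier-space decomposition of the potential with a split scale r_s (this is the usual textbook form, not a quotation from the paper):

        \phi_{\mathbf{k}} = \phi_{\mathbf{k}}^{\mathrm{long}} + \phi_{\mathbf{k}}^{\mathrm{short}},
        \qquad
        \phi_{\mathbf{k}}^{\mathrm{long}} = \phi_{\mathbf{k}}\, e^{-\mathbf{k}^2 r_s^2},
        \qquad
        \phi^{\mathrm{short}}(\mathbf{x}) = -G \sum_i \frac{m_i}{r_i}\,
            \operatorname{erfc}\!\left(\frac{r_i}{2 r_s}\right)

    so the long-range part is smooth and cheap to evaluate on a mesh with FFTs, while the complementary short-range part decays quickly and can be handled by the tree walk.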