46 research outputs found

    Towards scalable adaptive mesh refinement on future parallel architectures

    In the march towards exascale, supercomputer architectures are undergoing a significant change. Limited by power consumption and heat dissipation, future supercomputers are likely to be built around a lower-power many-core model. This shift in supercomputer design will require sweeping code changes in order to take advantage of the highly-parallel architectures. Evolving or rewriting legacy applications to perform well on these machines is a significant challenge. Mini-applications, small computer programs that represent the performance characteristics of some larger application, can be used to investigate new programming models and improve the performance of the legacy application by proxy. These applications, being both easy to modify and representative, are essential for establishing a path to move legacy applications into the exascale era. The focus of the work presented in this thesis is the design, development and employment of a new mini-application, CleverLeaf, for shock hydrodynamics with block-structured adaptive mesh refinement (AMR). We report on the development of CleverLeaf, and show how the fresh start provided by a mini-application can be used to develop an application that is flexible, accurate, and easy to employ in the investigation of exascale architectures. We also detail the development of the first reported resident parallel block-structured AMR library for Graphics Processing Units (GPUs). Extending the SAMRAI library using the CUDA programming model, we develop datatypes that store data only in GPU memory, as well as the necessary operators for moving and interpolating data on an adaptive mesh. We show that executing AMR simulations on a GPU is up to 4.8× faster than a CPU, and demonstrate scalability on over 4,000 nodes using a combination of CUDA and MPI.
    Finally, we show how mini-applications can be employed to improve the performance of production applications on existing parallel architectures by selecting the optimal application configuration. Using CleverLeaf, we identify the most appropriate configurations on three contemporary supercomputer architectures. Selecting the best parameters for our application can reduce run-time by up to 82% and reduce memory usage by up to 32%.

    Resident block-structured adaptive mesh refinement on thousands of graphics processing units

    Block-structured adaptive mesh refinement (AMR) is a technique that can be used when solving partial differential equations to reduce the number of cells necessary to achieve the required accuracy in areas of interest. These areas (shock fronts, material interfaces, etc.) are recursively covered with finer mesh patches that are grouped into a hierarchy of refinement levels. Despite the potential for large savings in computational requirements and memory usage without a corresponding reduction in accuracy, AMR adds overhead in managing the mesh hierarchy, adding complex communication and data movement requirements to a simulation. In this paper, we describe the design and implementation of a resident GPU-based AMR library, including: the classes used to manage data on a mesh patch, the routines used for transferring data between GPUs on different nodes, and the data-parallel operators developed to coarsen and refine mesh data. We validate the performance and accuracy of our implementation using three test problems and two architectures: an 8-node cluster, and 4,196 nodes of Oak Ridge National Laboratory’s Titan supercomputer. Our GPU-based AMR hydrodynamics code performs up to 4.87x faster than the CPU-based implementation, and is scalable on 4,196 K20x GPUs using a combination of MPI and CUDA.

    Parallel block structured adaptive mesh refinement on graphics processing units

    Block-structured adaptive mesh refinement is a technique that can be used when solving partial differential equations to reduce the number of zones necessary to achieve the required accuracy in areas of interest. These areas (shock fronts, material interfaces, etc.) are recursively covered with finer mesh patches that are grouped into a hierarchy of refinement levels. Despite the potential for large savings in computational requirements and memory usage without a corresponding reduction in accuracy, AMR adds overhead in managing the mesh hierarchy, adding complex communication and data movement requirements to a simulation. In this paper, we describe the design and implementation of a native GPU-based AMR library, including: the classes used to manage data on a mesh patch, the routines used for transferring data between GPUs on different nodes, and the data-parallel operators developed to coarsen and refine mesh data. We validate the performance and accuracy of our implementation using three test problems and two architectures: an eight-node cluster, and over four thousand nodes of Oak Ridge National Laboratory’s Titan supercomputer. Our GPU-based AMR hydrodynamics code performs up to 4.87x faster than the CPU-based implementation, and has been scaled to over four thousand GPUs using a combination of MPI and CUDA.

    Exploiting spatiotemporal locality for fast call stack traversal

    In the approach to exascale, scalable tools are becoming increasingly necessary to support parallel applications. Evaluating an application’s call stack is a vital technique for a wide variety of profilers and debuggers, and can create a significant performance overhead. In this paper we present a heuristic technique to reduce the overhead of frequent call stack evaluations. We use this technique to estimate the similarity between successive call stacks, removing the need for full call stack traversal and eliminating a significant portion of the performance overhead. We demonstrate this technique applied to a parallel memory tracing toolkit, WMTools, and analyse the performance gains and accuracy.

    Towards portable performance for explicit hydrodynamics codes

    Significantly increasing intra-node parallelism is widely recognised as being a key prerequisite for reaching exascale levels of computational performance. In future exascale systems it is likely that this performance improvement will be realised by increasing the parallelism available in traditional CPU devices and using massively-parallel hardware accelerators. The MPI programming model is starting to reach its scalability limit and is unable to take advantage of hardware accelerators; consequently, HPC centres (such as AWE) will have to decide how to develop their existing applications to best take advantage of future HPC system architectures. This work seeks to evaluate OpenCL as a candidate technology for implementing an alternative hybrid programming model, and whether it is able to deliver improved code portability whilst also maintaining or improving performance. On certain platforms the performance of our OpenCL implementation is within 4% of an optimised native version.

    Capitalising on Diversity: Espousal of Māori Values in the Workplace

    This study investigated the relationship between organisational espousal of cultural group values and organisational commitment and citizenship behaviours. The study focused on Māori employees, and their perceptions of the extent to which their organisation espoused some of the central values of Te Ao Māori (the Māori world), specifically manaakitanga (caring), whakawhanaungatanga (relationships), wairuatanga (spirituality), auahatanga (creativity) and kaitiakitanga (guardianship). Furthermore, the role of identification with the Māori culture was investigated as a potential moderator of the relationship between organisational espousal of each of the Māori values and the outcome variables. The methodology was tested in a sample of 91 Māori employees from Māori-led organisations. The participants completed an anonymous online survey. The data was analysed using moderated hierarchical regression analysis. Organisational espousal of the composite Māori values wairuatanga and whakamana tangata was reciprocated with organisational commitment. Although no significant main effects were found with respect to Māori values and organisational citizenship behaviours, the interaction of identification with Māori culture with Māori values influenced this outcome; those with lower identification with Māori culture, and who also perceived that their organisation did not espouse Māori values, reported lower levels of organisational citizenship behaviours. Taken together, the results suggest that organisations benefit from being aware of, and incorporating, the values of the cultural groups represented in the workforce into their overall practices, as this is manifested in higher commitment and citizenship behaviours among employees.