8 research outputs found

    Performance evaluation of a two-dimensional lattice Boltzmann solver using CUDA and PGAS UPC based parallelisation

    The Unified Parallel C (UPC) language from the Partitioned Global Address Space (PGAS) family unifies the advantages of shared and local memory spaces and offers relatively straightforward code parallelisation on the Central Processing Unit (CPU). In contrast, the Compute Unified Device Architecture (CUDA) development kit provides a tool to make use of the Graphics Processing Unit (GPU). We provide a detailed comparison between these novel techniques through the parallelisation of a two-dimensional lattice Boltzmann method based fluid flow solver. Our comparison between the CUDA and UPC parallelisation takes into account the required conceptual effort, the performance gain, and the limitations of the approaches from the application-oriented developers' point of view. We demonstrated that UPC led to competitive efficiency with the local memory implementation. However, the performance of the shared memory code fell behind our expectations, and we concluded that the investigated UPC compilers could not efficiently treat the shared memory space. The CUDA implementation proved to be more complex than the UPC approach, mainly because of the complicated memory structure of the graphics card, which also makes GPUs suitable for the parallelisation of the lattice Boltzmann method.

    Validation and verification of a 2D lattice Boltzmann solver for incompressible fluid flow

    The lattice Boltzmann method (LBM) is becoming increasingly popular in the fluid mechanics community because it provides a relatively easy implementation for an incompressible fluid flow solver. Furthermore, the particle-based LBM can be applied in microscale flows where the continuum-based Navier-Stokes solvers fail. Here we present the validation and verification of a two-dimensional in-house lattice Boltzmann solver with two different collision models, namely the BGKW and the MRT models [1]. Five different cases were studied, namely: (i) a channel flow was investigated, the results were compared to the analytical solution, and the convergence properties of the collision models were determined; (ii) the lid-driven cavity problem was examined [2] and the flow features and the velocity profiles were compared to existing simulation results at three different Reynolds numbers; (iii) the flow in a backward-facing step geometry was validated against experimental data [3]; (iv) the flow in a sudden expansion geometry was compared to experimental data at two different Reynolds numbers [4]; and finally (v) the flow around a cylinder was studied at higher Reynolds number in the turbulent regime. The first four test cases showed that both the BGKW and the MRT models were capable of giving qualitatively and quantitatively good results for these laminar flow cases. The simulations around a cylinder highlighted that the BGKW model becomes unstable for high Reynolds numbers but the MRT model still remains suitable to capture the turbulent von Karman vortex street. The in-house LBM code has been developed in C and has also been parallelised for GPU architectures using CUDA [5] and for CPU architectures using the Partitioned Global Address Space model with UPC [6].

    Parallel computing 2011, ParCo 2011: book of abstracts

    This book contains the abstracts of the presentations at the conference Parallel Computing 2011, 30 August - 2 September 2011, Ghent, Belgium.

    Model-centric task debugging at scale

    Chapter 1, Introduction, presents state-of-the-art debugging techniques in high-performance computing. Traditional debugging tools suffer from a lack of information from the programming model, which motivated the model-centric debugging approach. Chapter 2, Technical Background: Parallel Programming Models & Tools, introduces the programming models used in the scope of my work. The differences between those models are illustrated, and examples are given for the most popular programming models in HPC. The chapter also describes Temanejo, the toolchain's front-end, which supports application developers in their work. In the following chapter (Chapter 4), Design: Events & Requests in Ayudame, the theory of "task" and "dependency" representation is stated. The chapter includes the design of different information types, which are later used for the communication between a programming model and the model-centric debugging approach. In Chapter 5, Design: Communication Back-end Ayudame, the design of the back-end tool infrastructure is described in detail. This also includes the problems that occurred during the design process and their specific solutions. The concept of a multi-process environment and the use of different programming models at the same time is also part of this chapter. The following chapter (Chapter 6), Instrumentation of Runtime Systems, briefly describes the information exchange between a programming model and the model-centric debugging approach. The different ways of monitoring and controlling an application through its programming model are illustrated. In Chapter 7, Case Study: Performance Debugging, the model-centric debugging approach is used for optimising an application. All necessary optimisation steps are described in detail, with the help of mock-ups. Additionally, a description of the different optimised versions is included in this chapter. The evaluation, done on different hardware architectures, is presented and discussed. This includes not only the behaviour of the versions on different platforms but also architecture-specific issues.

    The readying of applications for heterogeneous computing

    High performance computing is approaching a potentially significant change in architectural design. With pressures on the cost and sheer amount of power, additional architectural features are emerging which require a rethink of the programming models deployed over the last two decades. Today's emerging high performance computing (HPC) systems are maximising performance per unit of power consumed, with the result that the system is made up of a range of different specialised building blocks, each with its own purpose. This heterogeneity is not limited to the hardware components but extends to the mechanisms that exploit the hardware components. These multiple levels of parallelism, instruction sets and memory hierarchies result in truly heterogeneous computing in all aspects of the global system. These emerging architectural solutions will require the software to exploit tremendous amounts of on-node parallelism, and indeed programming models to address this are emerging. In theory, the application developer can design new software using these models to exploit emerging low power architectures. However, in practice, real industrial scale applications last the lifetimes of many architectural generations and therefore require a migration path to these next generation supercomputing platforms. Identifying that migration path is non-trivial: with applications spanning many decades, consisting of many millions of lines of code and multiple scientific algorithms, any changes to the programming model will be extensive and invasive and may turn out to be the incorrect model for the application in question. This makes exploration of these emerging architectures and programming models using the applications themselves problematic. Additionally, the source code of many industrial applications is not available due to either commercial or security sensitivity constraints.
This thesis highlights this problem by assessing current and emerging hardware with an industrial strength code, and demonstrating the issues described. In turn it looks at the methodology of using proxy applications in place of real industry applications, to assess their suitability on the next generation of low power HPC offerings. It shows there are significant benefits to be realised in using proxy applications, in that fundamental issues inhibiting exploration of a particular architecture are easier to identify and hence address. The maturity and performance portability of a number of alternative programming methodologies are evaluated on a number of architectures, and the broader adoption of these proxy applications is highlighted, both within the author's own organisation and across the industry as a whole.

    WTEC Panel Report on International Assessment of Research and Development in Simulation-Based Engineering and Science
