4 research outputs found

    Runtime support for irregular computation in MPI-based applications

    In recent years, an increasing number of applications in domains such as computational chemistry, bioinformatics, nuclear reactor simulation and social network analysis have adopted irregular computation models. Because these applications involve irregular, data-dependent communication patterns and sparse data structures, the traditional parallel programming model and runtime need to be carefully designed and implemented to meet their performance and scalability requirements on large-scale systems. The Message Passing Interface (MPI) is the industry-standard communication library for high-performance computing. However, whether MPI can serve as a suitable programming model and runtime for irregular applications is one of the most debated questions in the community. The goal of this thesis is to investigate the suitability of MPI for irregular applications. The thesis consists of two subtopics. The first subtopic focuses on improving the MPI runtime to support irregular applications from the perspective of scalability and performance. The first three parts of this subtopic focus on MPI one-sided communication. In the first part, we present a thorough survey of current MPI one-sided implementations and illustrate their scalability limitations. In the second part, we propose a new design and implementation of MPI one-sided communication, called ScalaRMA, to address those scalability limitations. The third part focuses on issuing strategies in MPI one-sided communication. We propose an adaptive issuing strategy that chooses between a delayed issuing strategy and an eager issuing strategy in the MPI runtime, based on the current communication volume of the MPI-based application, to achieve high performance. The last part of this subtopic tackles the scalability limitations of the virtual connection (VC) objects in MPI implementations.
We propose a scalable design to reduce the memory consumption of VC objects in the MPI runtime. The second subtopic of this thesis focuses on improving the MPI programming model to better support irregular applications. The traditional two-sided data movement model in the MPI standard, designed for scientific computation, lets the user specify how to move data between processes. However, it provides no interface for flexibly managing the computation, so the user must explicitly manage where the computation is performed. This model is not well suited to irregular applications, which involve irregular and data-dependent communication patterns. In this work, we combine Active Messages (AM), an alternative programming paradigm that is more suitable for irregular computations, with the traditional MPI data movement model, and propose a generalized MPI-interoperable Active Messages framework (MPI-AM). The framework allows MPI-based applications to adopt AMs incrementally, only where necessary, without rewriting the entire application. The framework integrates data movement and computation in the programming model, so that MPI can coordinate computation and communication in a much more flexible manner. In this subtopic, we propose several strategies, including message streaming, buffer management and asynchronous processing, to handle AMs efficiently inside MPI. We also propose subtle correctness semantics for MPI-AM that define how AMs can work correctly with other MPI messages in the system, from the perspectives of memory consistency, concurrency, ordering and atomicity.
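    The Active Messages idea mentioned in the abstract above can be illustrated with a minimal single-process sketch. It is not the MPI-AM interface from the thesis and does not use MPI at all: the names `Process`, `send_am`, `progress` and `accumulate` are hypothetical, and the in-memory inbox merely stands in for the network. It only shows the core idea that a message names a handler, so the computation runs where the data lives.

```python
from collections import deque

# Toy illustration of Active Messages (AM). All names are hypothetical;
# this is not the MPI-AM API and it does not use MPI.


class Process:
    """Stand-in for one rank: local memory, registered handlers, an inbox."""

    def __init__(self, rank):
        self.rank = rank
        self.memory = {}      # local data that handlers operate on
        self.handlers = {}    # handler id -> function(memory, payload)
        self.inbox = deque()  # queued (handler_id, payload) messages

    def register(self, handler_id, fn):
        """Associate a handler id with a function on this rank."""
        self.handlers[handler_id] = fn

    def progress(self):
        """Drain the inbox, running each message's handler on local memory."""
        handled = 0
        while self.inbox:
            handler_id, payload = self.inbox.popleft()
            self.handlers[handler_id](self.memory, payload)
            handled += 1
        return handled


def send_am(target, handler_id, payload):
    """'Send' an active message: the payload travels with a handler id."""
    target.inbox.append((handler_id, payload))


def accumulate(memory, payload):
    """Example handler: add a value to a key in the target's memory."""
    key, value = payload
    memory[key] = memory.get(key, 0) + value


ACCUMULATE = 0
ranks = [Process(r) for r in range(2)]
for p in ranks:
    p.register(ACCUMULATE, accumulate)

# Rank 0 asks rank 1 to update rank 1's own data; the sender moves no data back.
send_am(ranks[1], ACCUMULATE, ("x", 3))
send_am(ranks[1], ACCUMULATE, ("x", 4))
handled = ranks[1].progress()
```

    In this sketch the sender decides what computation to trigger while the target decides when to run it (inside `progress`), which is roughly the flexibility the abstract contrasts with plain two-sided data movement. Ordering, atomicity and buffering, which the thesis treats explicitly, are not modelled here.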

    A novel approach to reduce the computation time for CFD : hybrid LES-RANS modelling on parallel computers

    Large Eddy Simulation (LES) is a method of obtaining high-accuracy computational results for modelling fluid flow. Unfortunately, it is computationally expensive, which limits it to users of large parallel machines. However, the use of LES may over-resolve the problem, because the bulk of the computational domain could be adequately modelled using the Reynolds-averaged (RANS) approach. A study has been undertaken to assess the feasibility, in both accuracy and computational efficiency, of using a parallel computer to solve both LES and RANS type turbulence models on the same domain, for the problem of flow over a circular cylinder at Reynolds number 3900. To do this, the domain was created and then divided into two sub-domains, one for the LES model and one for the kappa-epsilon turbulence model. The hybrid model has been developed specifically for a parallel computing environment, and the user can allocate modelling techniques to processors in a way that enables expansion of the model to any number of processors. Computational experimentation has shown that, in the combined model, the Smagorinsky model can be used to capture the vortex shedding from the cylinder, with the information successfully passed to the kappa-epsilon model for the dissipation of the vortices further downstream. The results have been compared with high-accuracy LES results and with both kappa-epsilon and Smagorinsky LES computations on the same domain. The hybrid models developed compare well with the Smagorinsky model, capturing the vortex shedding with the correct periodicity. Suggestions for future work are made to develop this idea further and to investigate the possibility of using the technology for modelling mixing and fast chemical reactions, based on the more accurate prediction of the turbulence levels in the LES sub-domain.
    EThOS - Electronic Theses Online Service (GB)
