
    Improved Multi-GPU parallelization of a Lagrangian Transport Model

    This report highlights our work on improving GPU parallelization by supporting compute nodes with multiple GPUs. However, since the default support for multiple GPUs in OpenACC is limited[6], the current implementation allows each MPI process to access only a single GPU. Thus, the only way to take full advantage of multi-GPU nodes in the current version is to launch multiple processes, which increases resource contention. We investigated the benefits of having only one process offload to all available GPU devices. Comment: Technical Report

    The ESCAPE project : Energy-efficient Scalable Algorithms for Weather Prediction at Exascale

    In the simulation of complex multi-scale flows arising in weather and climate modelling, one of the biggest challenges is to satisfy strict service requirements in terms of time to solution and to satisfy budgetary constraints in terms of energy to solution, without compromising the accuracy and stability of the application. These simulations require algorithms that minimise the energy footprint along with the time required to produce a solution, maintain the physically required level of accuracy, are numerically stable, and are resilient in case of hardware failure. The European Centre for Medium-Range Weather Forecasts (ECMWF) led the ESCAPE (Energy-efficient Scalable Algorithms for Weather Prediction at Exascale) project, funded by Horizon 2020 (H2020) under the FET-HPC (Future and Emerging Technologies in High Performance Computing) initiative. The goal of ESCAPE was to develop a sustainable strategy to evolve weather and climate prediction models to next-generation computing technologies. The project partners incorporate the expertise of leading European regional forecasting consortia, university research, experienced high-performance computing centres, and hardware vendors. This paper presents an overview of the ESCAPE strategy: (i) identify domain-specific key algorithmic motifs in weather prediction and climate models (which we term Weather & Climate Dwarfs), (ii) categorise them in terms of computational and communication patterns while (iii) adapting them to different hardware architectures with alternative programming models, (iv) analyse the challenges in optimising, and (v) find alternative algorithms for the same scheme. 
The participating weather prediction models are the following: IFS (Integrated Forecasting System); ALARO, a combination of AROME (Application de la Recherche a l'Operationnel a Meso-Echelle) and ALADIN (Aire Limitee Adaptation Dynamique Developpement International); and COSMO-EULAG, a combination of COSMO (Consortium for Small-scale Modeling) and EULAG (Eulerian and semi-Lagrangian fluid solver). For many of the weather and climate dwarfs ESCAPE provides prototype implementations on different hardware architectures (mainly Intel Skylake CPUs, NVIDIA GPUs, Intel Xeon Phi, Optalysys optical processor) with different programming models. The spectral transform dwarf represents a detailed example of the co-design cycle of an ESCAPE dwarf. The dwarf concept has proven to be extremely useful for the rapid prototyping of alternative algorithms and their interaction with hardware; e.g. the use of a domain-specific language (DSL). Manual adaptations have led to substantial accelerations of key algorithms in numerical weather prediction (NWP) but are not a general recipe for the performance portability of complex NWP models. Existing DSLs are found to require further evolution but are promising tools for achieving the latter. Measurements of energy and time to solution suggest that a future focus needs to be on exploiting the simultaneous use of all available resources in hybrid CPU-GPU arrangements

    Electromagnetic Radiation

    The application of electromagnetic radiation is one of the fastest-developing technologies in modern life. In this timely book, the authors comprehensively treat two integrated aspects of electromagnetic radiation: theory and application. It covers a wide scope of practical topics, including medical treatment, telecommunication systems, and radiation effects. The sections are clearly presented and include some state-of-the-art examples, which makes this book an indispensable reference for electromagnetic radiation applications.

    Improving Scientist Productivity, Architecture Portability, and Performance in ParFlow

    Legacy scientific applications represent significant investments by universities, engineers, and researchers and contain valuable implementations of key scientific computations. Over time hardware architectures have changed. Adapting existing code to new architectures is time consuming, expensive, and increases code complexity. The increase in complexity negatively affects the scientific impact of the applications. There is an immediate need to reduce complexity. We propose using abstractions to manage and reduce code complexity, improving scientific impact of applications. This thesis presents a set of abstractions targeting boundary conditions in iterative solvers. Many scientific applications represent physical phenomena as a set of partial differential equations (PDEs). PDEs are structured around steady state and boundary condition equations, starting from initial conditions. The proposed abstractions separate architecture specific implementation details from the primary computation. We use ParFlow to demonstrate the effectiveness of the abstractions. ParFlow is a hydrologic and geoscience application that simulates surface and subsurface water flow. The abstractions have enabled ParFlow developers to successfully add new boundary conditions for the first time in 15 years, and have enabled an experimental OpenMP version of ParFlow that is transparent to computational scientists. This is achieved without requiring expensive rewrites of key computations or major codebase changes; improving developer productivity, enabling hardware portability, and allowing transparent performance optimizations

    Algorithms for Advection on Hybrid Parallel Computers

    Current climate models have a limited ability to increase spatial resolution because numerical stability requires the time step to decrease. I describe initial experiments with two independent but complementary strategies for attacking this time barrier. First, I describe computational experiments exploring the performance improvements from overlapping computation and communication on hybrid parallel computers. My test case is explicit time integration of linear advection with constant uniform velocity in a three-dimensional periodic domain. I present results for Fortran implementations using various combinations of MPI, OpenMP, and CUDA, with and without overlap of computation and communication. Second, I describe a semi-Lagrangian method for tracer transport that is stable for arbitrary Courant numbers, along with a parallel implementation discretized on the cubed sphere. It shows optimal accuracy at Courant numbers of 10-20, more than an order of magnitude higher than explicit methods. Finally, I describe the development and stability analyses of the time integrators and advection methods I used for my experiments. I develop explicit single-step methods with stability up to Courant numbers of one in each dimension, hybrid explicit-implicit methods with stability for arbitrary Courant numbers, and interpolation operators that enable the arbitrary stability of semi-Lagrangian methods.
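
The stability contrast described in this abstract can be illustrated with a minimal sketch. It is not the author's cubed-sphere method: it assumes a one-dimensional periodic grid, constant velocity, and linear interpolation at the departure point, and the function names and the Courant number of 10.5 are hypothetical choices made for illustration. It compares a semi-Lagrangian step with an explicit first-order upwind step at a Courant number well above one.

```python
import math


def semi_lagrangian_step(q, courant):
    """Advance a periodic 1-D field by one step of linear advection.

    courant = u * dt / dx. Each grid point i takes the value found at its
    departure point (i - courant, in grid units), obtained here by linear
    interpolation between the two neighbouring grid points.
    """
    n = len(q)
    out = []
    for i in range(n):
        departure = i - courant      # departure point in grid units
        j = math.floor(departure)    # grid point to the left of it
        w = departure - j            # fractional offset in [0, 1)
        out.append((1.0 - w) * q[j % n] + w * q[(j + 1) % n])
    return out


def upwind_step(q, courant):
    """Explicit first-order upwind step for u > 0 on a periodic grid.

    This scheme is only stable for courant <= 1.
    """
    n = len(q)
    return [q[i] - courant * (q[i] - q[i - 1]) for i in range(n)]


def max_abs(q):
    """Largest absolute value in the field."""
    return max(abs(v) for v in q)


n = 64
q0 = [math.sin(2.0 * math.pi * i / n) for i in range(n)]
courant = 10.5  # hypothetical value, chosen inside the 10-20 range quoted above

q_sl, q_up = q0, q0
for _ in range(50):
    q_sl = semi_lagrangian_step(q_sl, courant)
    q_up = upwind_step(q_up, courant)

print("semi-Lagrangian max |q|:", max_abs(q_sl))
print("explicit upwind max |q|:", max_abs(q_up))
```

Because each semi-Lagrangian update is a weighted average of two existing values with non-negative weights, the field stays bounded whatever the Courant number, whereas the explicit update amplifies the solution at every step once the Courant number exceeds one. Linear interpolation is diffusive, so this sketch shows stability only and says nothing about the accuracy reported in the abstract.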

    Improvements in the Scalability of the NASA Goddard Multiscale Modeling Framework for Hurricane Climate Studies

    Improving our understanding of hurricane inter-annual variability and the impact of climate change (e.g., doubling CO2 and/or global warming) on hurricanes brings both scientific and computational challenges to researchers. As hurricane dynamics involves multiscale interactions among synoptic-scale flows, mesoscale vortices, and small-scale cloud motions, an ideal numerical model suitable for hurricane studies should demonstrate its capabilities in simulating these interactions. The newly developed multiscale modeling framework (MMF, Tao et al., 2007) and the substantial computing power provided by the NASA Columbia supercomputer show promise for pursuing the related studies, as the MMF inherits the advantages of two NASA state-of-the-art modeling components: the GEOS4/fvGCM and 2D GCEs. This article focuses on the computational issues and proposes a revised methodology to improve the MMF's performance and scalability. It is shown that this prototype implementation enables 12-fold performance improvements with 364 CPUs, thereby making it more feasible to study hurricane climate.

    Study of Parallel Programming Models on Computer Clusters with Accelerators

    To reach exascale computing capability, accelerators have become a crucial part of supercomputer design. This work examines the potential of two recent acceleration technologies, the Intel Many Integrated Core (MIC) Architecture and Graphics Processing Units (GPUs). This thesis applies three benchmarks under three different configurations: MPI+CPU, MPI+GPU, and MPI+MIC. The benchmarks comprise an intensely communicating application, a loosely communicating application, and an embarrassingly parallel application. This thesis also carries out a detailed study of the scalability and performance of MIC processors under two programming models, the offload model and the native model, on the Beacon computer cluster. The results show that the relative performance and scalability of GPU and MIC depend on the benchmark. (1) For the embarrassingly parallel case, the GPU-based parallel implementation on the Keeneland computer cluster performs better than the other accelerators. However, the MIC-based parallel implementation scales better than the GPU implementation. The performance of the native model and the offload model on MIC is very close. (2) For the loosely communicating case, the performance on GPU and MIC is very close. The MIC-based parallel implementation still scales strongly when using 120 MIC processors. (3) For the intensely communicating case, the MPI implementations on CPUs and GPUs both scale strongly. GPUs consistently outperform the other accelerators. However, the MIC-based implementation does not scale well. Unlike in the embarrassingly parallel case, the two programming models on MIC perform differently: the native model consistently outperforms the offload model by ~10 times. There is also little performance gain from allocating more MIC processors, because the increased communication cost offsets the gain from the reduced workload on each MIC core.
This work also tests performance and scalability by changing the number of threads on each MIC card from 10 to 60. For the intensely communicating case, the MIC-based offload model behaves differently depending on the thread count. Scalability holds as the number of threads increases from 10 to 30, and the computation time decreases at a smaller rate from 30 to 50 threads. With 60 threads, the computation time increases, because the communication overhead offsets the performance gain when 60 threads are deployed on a single MIC card.

    An analysis of the feasibility and benefits of GPU/multicore acceleration of the Weather Research and Forecasting model

    There is a growing need for ever more accurate climate and weather simulations to be delivered in shorter timescales, in particular, to guard against severe weather events such as hurricanes and heavy rainfall. Due to climate change, the severity and frequency of such events – and thus the economic impact – are set to rise dramatically. Hardware acceleration using graphics processing units (GPUs) or Field-Programmable Gate Arrays (FPGAs) could potentially result in much reduced run times or higher accuracy simulations. In this paper, we present the results of a study of the Weather Research and Forecasting (WRF) model undertaken in order to assess if GPU and multicore acceleration of this type of numerical weather prediction (NWP) code is both feasible and worthwhile. The focus of this paper is on acceleration of code running on a single compute node through offloading of parts of the code to an accelerator such as a GPU. The governing equations set of the WRF model is based on the compressible, non-hydrostatic atmospheric motion with multi-physics processes. We put this work into context by discussing its more general applicability to multi-physics fluid dynamics codes: in many fluid dynamics codes, the numerical schemes of the advection terms are based on finite differences between neighboring cells, similar to the WRF code. For fluid systems including multi-physics processes, there are many calls to these advection routines. This class of numerical codes will benefit from hardware acceleration. We studied the performance of the original code of the WRF model and proposed a simple model for comparing multicore CPU and GPU performance. Based on the results of extensive profiling of representative WRF runs, we focused on the acceleration of the scalar advection module. We discuss the implementation of this module as a data-parallel kernel in both OpenCL and OpenMP. 
We show that our data-parallel kernel version of the scalar advection module runs up to seven times faster on the GPU compared with the original code on the CPU. However, as the data transfer cost between GPU and CPU is very high (as shown by our analysis), there is only a small speed-up (two times) for the fully integrated code. We show that it would be possible to offset the data transfer cost through GPU acceleration of a larger portion of the dynamics code. In order to carry out this research, we also developed an extensible software system for integrating OpenCL code into large Fortran code bases such as WRF. This is one of the main contributions of our work. We discuss the system to show how it allows the replacement of sections of the original codebase with their OpenCL counterparts with minimal changes (literally only a few lines) to the original code. Our final assessment is that, even with current system architectures, accelerating WRF, and hence also other, similar types of multi-physics fluid dynamics codes, by a factor of up to five times is definitely an achievable goal. Accelerating multi-physics fluid dynamics codes, including NWP codes, is vital for their application to weather forecasting, environmental pollution warning, and emergency response to the dispersion of hazardous materials. Implementing hardware acceleration capability for fluid dynamics and NWP codes is a prerequisite for up-to-date and future computer architectures.
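
The gap between the seven-times kernel speed-up and the two-times integrated speed-up can be reasoned about with a simple Amdahl-style estimate. The sketch below is not the performance model proposed in the paper: the function names are hypothetical, and it assumes that both figures refer to the same portion of the run time, so that the difference is attributable to data transfer.

```python
def overall_speedup(kernel_fraction, kernel_speedup, transfer_fraction=0.0):
    """Amdahl-style estimate of the overall speed-up.

    All arguments are relative to the original run time (taken as 1.0):
    kernel_fraction   -- share of the original time spent in the offloaded code
    kernel_speedup    -- speed-up of that code on the accelerator
    transfer_fraction -- extra time spent moving data, as a share of the
                         original run time
    """
    new_time = (
        (1.0 - kernel_fraction)
        + kernel_fraction / kernel_speedup
        + transfer_fraction
    )
    return 1.0 / new_time


def implied_transfer_fraction(kernel_speedup, observed_speedup):
    """Transfer overhead that would explain the observed speed-up.

    Assumes the whole measured region is offloaded (kernel_fraction = 1.0).
    """
    return 1.0 / observed_speedup - 1.0 / kernel_speedup


# Figures quoted in the abstract: 7x for the kernel alone, 2x once integrated.
t = implied_transfer_fraction(kernel_speedup=7.0, observed_speedup=2.0)
print("implied transfer share of original module time:", t)
print("check:", overall_speedup(1.0, 7.0, t))
```

Under these assumptions, data transfer would cost a little over a third of the original module time, more than twice the time the accelerated kernel itself needs. That is consistent with the abstract's argument that offloading a larger portion of the dynamics code, so that data stays on the GPU between kernels, would offset the transfer cost.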