1,558 research outputs found

    Array languages and the N-body problem

    Get PDF
    This paper is a description of the contributions to the SICSA multicore challenge on many body planetary simulation made by a compiler group at the University of Glasgow. Our group is part of the Computer Vision and Graphics research group and we have for some years been developing array compilers because we think these are a good tool both for expressing graphics algorithms and for exploiting the parallelism that computer vision applications require. We shall describe experiments using two languages on two different platforms and we shall compare the performance of these with reference C implementations running on the same platforms. Finally we shall draw conclusions both about the viability of the array language approach as compared to other approaches used in the challenge and also about the strengths and weaknesses of the two, very different, processor architectures we used

    An analysis of the feasibility and benefits of GPU/multicore acceleration of the Weather Research and Forecasting model

    Get PDF
    There is a growing need for ever more accurate climate and weather simulations to be delivered in shorter timescales, in particular, to guard against severe weather events such as hurricanes and heavy rainfall. Due to climate change, the severity and frequency of such events – and thus the economic impact – are set to rise dramatically. Hardware acceleration using graphics processing units (GPUs) or Field-Programmable Gate Arrays (FPGAs) could potentially result in much reduced run times or higher accuracy simulations. In this paper, we present the results of a study of the Weather Research and Forecasting (WRF) model undertaken in order to assess if GPU and multicore acceleration of this type of numerical weather prediction (NWP) code is both feasible and worthwhile. The focus of this paper is on acceleration of code running on a single compute node through offloading of parts of the code to an accelerator such as a GPU. The governing equations set of the WRF model is based on the compressible, non-hydrostatic atmospheric motion with multi-physics processes. We put this work into context by discussing its more general applicability to multi-physics fluid dynamics codes: in many fluid dynamics codes, the numerical schemes of the advection terms are based on finite differences between neighboring cells, similar to the WRF code. For fluid systems including multi-physics processes, there are many calls to these advection routines. This class of numerical codes will benefit from hardware acceleration. We studied the performance of the original code of the WRF model and proposed a simple model for comparing multicore CPU and GPU performance. Based on the results of extensive profiling of representative WRF runs, we focused on the acceleration of the scalar advection module. We discuss the implementation of this module as a data-parallel kernel in both OpenCL and OpenMP. We show that our data-parallel kernel version of the scalar advection module runs up to seven times faster on the GPU compared with the original code on the CPU. However, as the data transfer cost between GPU and CPU is very high (as shown by our analysis), there is only a small speed-up (two times) for the fully integrated code. We show that it would be possible to offset the data transfer cost through GPU acceleration of a larger portion of the dynamics code. In order to carry out this research, we also developed an extensible software system for integrating OpenCL code into large Fortran code bases such as WRF. This is one of the main contributions of our work. We discuss the system to show how it allows the replacement of the sections of the original codebase with their OpenCL counterparts with minimal changes – literally only a few lines – to the original code. Our final assessment is that, even with the current system architectures, accelerating WRF – and hence also other, similar types of multi-physics fluid dynamics codes – with a factor of up to five times is definitely an achievable goal. Accelerating multi-physics fluid dynamics codes including NWP codes is vital for its application to weather forecasting, environmental pollution warning, and emergency response to the dispersion of hazardous materials. Implementing hardware acceleration capability for fluid dynamics and NWP codes is a prerequisite for up-to-date and future computer architectures

    Software Performance Engineering using Virtual Time Program Execution

    Get PDF
    In this thesis we introduce a novel approach to software performance engineering that is based on the execution of code in virtual time. Virtual time execution models the timing-behaviour of unmodified applications by scaling observed method times or replacing them with results acquired from performance model simulation. This facilitates the investigation of "what-if" performance predictions of applications comprising an arbitrary combination of real code and performance models. The ability to analyse code and models in a single framework enables performance testing throughout the software lifecycle, without the need to to extract performance models from code. This is accomplished by forcing thread scheduling decisions to take into account the hypothetical time-scaling or model-based performance specifications of each method. The virtual time execution of I/O operations or multicore targets is also investigated. We explore these ideas using a Virtual EXecution (VEX) framework, which provides performance predictions for multi-threaded applications. The language-independent VEX core is driven by an instrumentation layer that notifies it of thread state changes and method profiling events; it is then up to VEX to control the progress of application threads in virtual time on top of the operating system scheduler. We also describe a Java Instrumentation Environment (JINE), demonstrating the challenges involved in virtual time execution at the JVM level. We evaluate the VEX/JINE tools by executing client-side Java benchmarks in virtual time and identifying the causes of deviations from observed real times. Our results show that VEX and JINE transparently provide predictions for the response time of unmodified applications with typically good accuracy (within 5-10%) and low simulation overheads (25-50% additional time). We conclude this thesis with a case study that shows how models and code can be integrated, thus illustrating our vision on how virtual time execution can support performance testing throughout the software lifecycle

    Discrete event simulation for the purpose of real-time performance evaluation of distributed hardware-in-the-loop simulators for autonomous driving vehicle validation

    Get PDF
    Hardware-in-the-loop test benches are distributed computer systems including software, hardware and networking devices, which require strict real-time guarantees. To guarantee strict real-time of the simulator the performance needs to be evaluated. To evaluate the timing performance a discrete event simulation model is built up. The input modeling is based on measurements from the real system in a prototype phase. The results of the simulation model are validated with measurements from a prototype of the real system. The workload is increased until the streaming source becomes unstable, by either exceeding a certain limit of bytes or exceeding the number of parallel software processes running on the cores of the central processing unit. To evaluate the performance beyond these limits, the discrete event simulation model needs to be enriched by a scheduler and a hardware model. To provide real-time guarantees an analytical model needs to be built up

    Graphite: A Distributed Parallel Simulator for Multicores

    Get PDF
    This paper introduces the open-source Graphite distributed parallel multicore simulator infrastructure. Graphite is designed from the ground up for exploration of future multicore processors containing dozens, hundreds, or even thousands of cores. It provides high performance for fast design space exploration and software development for future processors. Several techniques are used to achieve this performance including: direct execution, multi-machine distribution, analytical modeling, and lax synchronization. Graphite is capable of accelerating simulations by leveraging several machines. It can distribute simulation of an off-the-shelf threaded application across a cluster of commodity Linux machines with no modification to the source code. It does this by providing a single, shared address space and consistent single-process image across machines. Graphite is designed to be a simulation framework, allowing different component models to be easily replaced to either model different architectures or tradeoff accuracy for performance. We evaluate Graphite from a number of perspectives and demonstrate that it can simulate target architectures containing over 1000 cores on ten 8-core servers. Performance scales well as more machines are added with near linear speedup in many cases. Simulation slowdown is as low as 41x versus native execution for some applications. The Graphite infrastructure and existing models will be released as open-source software to allow the community to simulate their own architectures and extend and improve the framework
    corecore