Verifying a Systematic Application to Accelerator Roadmap using Shallow Water Wave Equations

Abstract

With the advent of parallel computing, a number of hardware architectures have become available for data parallel applications. Each architecture is unique with respect to characteristics such as floating point operations per second, memory bandwidth and synchronization costs. Data parallel applications possess inherent parallelism that must be studied so that the hardware that best exploits it can be identified and selected for large-scale implementation. The application considered in this thesis is the numerical solution of the shallow water wave equations using the finite difference method. These equations are a set of partial differential equations that model the propagation of disturbances in water and other incompressible liquids. This application falls into the category of Synchronous Iterative Algorithms (SIAs), and hence the Synchronous Iterative GPGPU Execution (SIGE) model can be applied directly for performance modeling. In the high performance computing community, Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs) have become highly popular architectures. Homogeneous clusters comprising multiple processors, and heterogeneous clusters whose nodes contain both a CPU and a GPU, are the architectures of interest for this thesis. An initial, high-level comparison of the two architectures is performed with regard to the chosen application using a technique known as the Initial Application to Accelerator (A2A) mapping, which ranks the architectures by the execution time they are expected to deliver for large-scale implementation. The subsequent part of the thesis focuses on a low-level abstraction of the application in order to predict its run time accurately using the multi-level SIGE performance-modeling suite. Through this abstraction, performance modeling of the computation and communication portions of the application is undertaken.
The behavior of the computation and communication portions is captured through several instrumented iterations of the application, and regression analysis is performed on the measured execution times. The predicted run time is the sum of the computation and communication run time predictions, and it is validated by executing the application at larger data sizes. The thesis concludes with the pros and cons of applying the A2A fitness model and the low-level abstraction for run time prediction to the chosen application. It also presents a critique of the SIGE model and a Strengths, Weaknesses, Opportunities (SWO) analysis.
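
The prediction step described above can be illustrated with a minimal sketch. It assumes a simple linear relationship between data size and time for both the computation and the communication portions; the thesis does not state which regression form it uses, so the linear fit, the function names `fit_line` and `predict_runtime`, and the timing values are all hypothetical placeholders rather than results from the thesis.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_runtime(comp_model, comm_model, size):
    """Predicted run time = computation prediction + communication prediction."""
    comp_slope, comp_intercept = comp_model
    comm_slope, comm_intercept = comm_model
    comp_time = comp_slope * size + comp_intercept
    comm_time = comm_slope * size + comm_intercept
    return comp_time + comm_time


# Synthetic placeholder timings in seconds (assumption: NOT measured data).
# In practice these would come from instrumented iterations at small sizes.
sizes = [256, 512, 1024, 2048]
comp_times = [2e-3 * n + 0.1 for n in sizes]
comm_times = [5e-4 * n + 0.05 for n in sizes]

comp_model = fit_line(sizes, comp_times)
comm_model = fit_line(sizes, comm_times)

# Extrapolate to a larger data size, as in the validation step.
predicted = predict_runtime(comp_model, comm_model, 4096)
```

Fitting the two portions separately mirrors the abstract's split of run time into computation and communication; a real study would compare `predicted` against a measured run at the larger size and might need a non-linear regression form.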
