    A Graphical User Interface for Composing Parallel Computation in the STAPL Skeleton Library

    Parallel programming is a quickly growing field in computer science. It involves splitting a computation among multiple processors to decrease the run time of programs. The computations assigned to one processor can depend on the results of another computation. This dependence introduces a partial ordering between tasks that requires coordinating the execution of the tasks assigned to each processor. OpenMP and MPI are currently the most heavily utilized approaches, and they require the use of low-level primitives to express even very simple scientific applications. Newer environments, such as STAPL, Charm++, and Chapel, among others, raise the level of abstraction, but the challenge of specifying the flow of data between computations remains. Graphical user interfaces (GUIs) can simplify this task. The purpose of this project is to create a GUI that allows a user to specify a parallel application written in STAPL by composing high-level components and by defining the flow of data between them. The user creates the layout of the code using shapes and lines, which produce the composition on an underlying layer, eliminating the need to write complex composition specifications directly in the code.
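
    To make the kind of "composition specification" the GUI generates concrete, the C++ sketch below composes two map-style stages into a pipeline. It is illustrative only: the make_map and compose helpers are hypothetical stand-ins for skeleton combinators, not the actual STAPL skeletons API, and the sequential execution stands in for the parallel run.

    // Illustrative sketch only: these combinators imitate the general shape of
    // skeleton composition (map + compose) and are NOT the STAPL API.
    #include <algorithm>
    #include <cstdio>
    #include <functional>
    #include <numeric>
    #include <vector>

    // A "skeleton" here is just a function from a vector to a vector.
    using Stage = std::function<std::vector<double>(const std::vector<double>&)>;

    // map skeleton: apply an elementwise work function to every item.
    Stage make_map(std::function<double(double)> f) {
      return [f](const std::vector<double>& in) {
        std::vector<double> out(in.size());
        std::transform(in.begin(), in.end(), out.begin(), f);
        return out;
      };
    }

    // compose skeleton: feed one stage's output into the next; this is the
    // data-flow edge a user would draw as a line between two boxes in the GUI.
    Stage compose(Stage first, Stage second) {
      return [first, second](const std::vector<double>& in) {
        return second(first(in));
      };
    }

    int main() {
      // Two GUI boxes ("square each element", then "scale by 0.5") connected
      // by a line become this compose() call on the underlying layer.
      Stage pipeline = compose(make_map([](double x) { return x * x; }),
                               make_map([](double x) { return 0.5 * x; }));
      std::vector<double> data(8);
      std::iota(data.begin(), data.end(), 1.0);
      std::vector<double> result = pipeline(data);
      std::printf("first element: %f\n", result.front());  // 0.5 * 1^2 = 0.5
      return 0;
    }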

    Interpreting and Visualizing Performance Portability Metrics

    Fast Automatic Heuristic Construction Using Active Learning

    Building effective optimization heuristics is a challenging task which often takes developers several months, if not years, to complete. Predictive modelling has recently emerged as a promising solution, automatically constructing heuristics from training data. However, obtaining this data can take months per platform. This is becoming an ever more critical problem, and if no solution is found we shall be left with out-of-date heuristics which cannot extract the best performance from modern machines. In this work, we present a low-cost predictive modelling approach for automatic heuristic construction which significantly reduces this training overhead. Typically, in supervised learning the training instances to evaluate are selected at random, regardless of how much useful information they carry. This wastes effort on parts of the space that contribute little to the quality of the produced heuristic. Our approach, on the other hand, uses active learning to select and focus only on the most useful training examples. We demonstrate this technique by automatically constructing a model to determine on which device to execute four parallel programs at differing problem dimensions for a representative CPU–GPU based heterogeneous system. Our methodology is remarkably simple and yet effective, making it a strong candidate for wide adoption. At high levels of classification accuracy, the average learning speed-up is 3x compared to the state of the art.
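
    A minimal sketch of the selection loop behind active learning, assuming a hypothetical labelByProfiling() step that stands in for timing a program on both devices. Uncertainty is approximated here by distance to the nearest already-labeled configuration, a simpler criterion than the paper's model, but it shows why far fewer profiling runs are needed than with random sampling.

    // Minimal active-learning sketch (not the paper's exact method): repeatedly
    // pick the unlabeled configuration the model knows least about, pay the
    // cost of profiling it once, and grow the training set one query at a time.
    #include <algorithm>
    #include <cmath>
    #include <cstdio>
    #include <vector>

    struct Sample { double features[2]; int label; };  // e.g. {input size, work per item}

    double dist(const double* a, const double* b) {
      double dx = a[0] - b[0], dy = a[1] - b[1];
      return std::sqrt(dx * dx + dy * dy);
    }

    // Hypothetical stand-in for running the program on CPU and GPU and timing both.
    int labelByProfiling(const double* f) { return f[0] * f[1] > 1e5 ? 1 /*GPU*/ : 0 /*CPU*/; }

    int main() {
      // Pool of candidate configurations (problem dimensions) we could profile.
      std::vector<Sample> pool;
      for (int n = 1; n <= 20; ++n)
        pool.push_back({{n * 100.0, n * 10.0}, -1});

      std::vector<Sample> labeled;        // training set, grown one query at a time
      const int budget = 5;               // far fewer profiling runs than the pool size

      for (int q = 0; q < budget; ++q) {
        // Select the pool point farthest from every labeled point: the one the
        // current training set says least about (first query picks index 0).
        int best = 0; double bestScore = -1.0;
        for (std::size_t i = 0; i < pool.size(); ++i) {
          double nearest = 1e300;
          for (const Sample& s : labeled)
            nearest = std::min(nearest, dist(pool[i].features, s.features));
          if (nearest > bestScore) { bestScore = nearest; best = (int)i; }
        }
        pool[best].label = labelByProfiling(pool[best].features);  // the expensive step
        labeled.push_back(pool[best]);
        pool.erase(pool.begin() + best);
      }
      std::printf("trained on %zu profiled configurations\n", labeled.size());
      return 0;
    }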

    Optimizing the Performance of Directive-based Programming Model for GPGPUs

    Accelerators have been deployed on most major HPC systems and are expected to improve the performance of many applications. Accelerators such as GPUs have immense potential in terms of compute capacity, but programming these devices is a challenge. OpenCL, CUDA and other vendor-specific models for accelerator programming certainly offer high performance, but they are low-level models that demand excellent programming skills; moreover, they are time-consuming to write and debug. In order to simplify GPU programming, several directive-based programming models have been proposed, including HMPP, the PGI accelerator model and OpenACC. OpenACC has now become established as the de facto standard. We evaluate and compare these models using several scientific applications. To study the implementation challenges and the principles and techniques of directive-based models, we built an open-source OpenACC compiler on top of a mainstream compiler framework (OpenUH, a branch of Open64). In this dissertation, we present the techniques required to parallelize and optimize applications ported with the OpenACC programming model. We apply both user-level optimizations in the applications and compiler- and runtime-driven optimizations. The compiler optimization focuses on the parallelization of reduction operations inside nested parallel loops. To fully utilize all GPU resources, we also extend the OpenACC model to support multiple GPUs in a single node. Our application porting experience also revealed the challenge of choosing good loop schedules: the default loop schedule chosen by the compiler may not produce the best performance, so the user has to manually try different loop schedules to improve performance. To address this issue, we developed a locality-aware auto-tuning framework, based on a proposed memory access cost model, that helps the compiler choose optimal loop schedules and guides the user in choosing appropriate ones.
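
    As a concrete illustration of the directive-based style and of the two issues highlighted above, the C++ fragment below (not taken from the dissertation) parallelizes a reduction nested inside a parallel loop with OpenACC and pins an explicit gang/vector schedule; the num_gangs and vector_length values are the kind of hand-tuned guesses the proposed locality-aware auto-tuner is meant to replace.

    // Illustrative OpenACC fragment: per-row reduction nested inside a
    // gang-parallel outer loop, with an explicit (hand-picked) loop schedule.
    #include <cstdio>

    #define N 1024

    int main() {
      static float a[N][N], rowsum[N];
      for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
          a[i][j] = (float)(i + j);

      // Outer loop spread across gangs; the schedule (num_gangs, vector_length)
      // is a guess a user might tune by hand.
      #pragma acc parallel loop gang num_gangs(256) vector_length(128) copyin(a) copyout(rowsum)
      for (int i = 0; i < N; ++i) {
        float sum = 0.0f;
        // Inner loop vectorized with a reduction: the pattern whose
        // parallelization the compiler work described above targets.
        #pragma acc loop vector reduction(+:sum)
        for (int j = 0; j < N; ++j)
          sum += a[i][j];
        rowsum[i] = sum;
      }

      std::printf("rowsum[0] = %f\n", rowsum[0]);
      return 0;
    }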