336 research outputs found

    The Scalability-Efficiency/Maintainability-Portability Trade-off in Simulation Software Engineering: Examples and a Preliminary Systematic Literature Review

    Full text link
    Large-scale simulations play a central role in science and the industry. Several challenges occur when building simulation software, because simulations require complex software developed in a dynamic construction process. That is why simulation software engineering (SSE) is emerging lately as a research focus. The dichotomous trade-off between scalability and efficiency (SE) on the one hand and maintainability and portability (MP) on the other hand is one of the core challenges. We report on the SE/MP trade-off in the context of an ongoing systematic literature review (SLR). After characterizing the issue of the SE/MP trade-off using two examples from our own research, we (1) review the 33 identified articles that assess the trade-off, (2) summarize the proposed solutions for the trade-off, and (3) discuss the findings for SSE and future work. Overall, we see evidence for the SE/MP trade-off and first solution approaches. However, a strong empirical foundation has yet to be established; general quantitative metrics and methods supporting software developers in addressing the trade-off have to be developed. We foresee considerable future work in SSE across scientific communities.Comment: 9 pages, 2 figures. Accepted for presentation at the Fourth International Workshop on Software Engineering for High Performance Computing in Computational Science and Engineering (SEHPCCSE 2016

    GSWO: A Programming Model for GPU-enabled Parallelization of Sliding Window Operations in Image Processing

    Get PDF
    Sliding Window Operations (SWOs) are widely used in image processing applications. They often have to be performed repeatedly across the target image, which can demand significant computing resources when processing large images with large windows. In applications in which real-time performance is essential, running these filters on a CPU often fails to deliver results within an acceptable timeframe. The emergence of sophisticated graphic processing units (GPUs) presents an opportunity to address this challenge. However, GPU programming requires a steep learning curve and is error-prone for novices, so the availability of a tool that can produce a GPU implementation automatically from the original CPU source code can provide an attractive means by which the GPU power can be harnessed effectively. This paper presents a GPUenabled programming model, called GSWO, which can assist GPU novices by converting their SWO-based image processing applications from the original C/C++ source code to CUDA code in a highly automated manner. This model includes a new set of simple SWO pragmas to generate GPU kernels and to support effective GPU memory management. We have implemented this programming model based on a CPU-to-GPU translator (C2GPU). Evaluations have been performed on a number of typical SWO image filters and applications. The experimental results show that the GSWO model is capable of efficiently accelerating these applications, with improved applicability and a speed-up of performance compared to several leading CPU-to- GPU source-to-source translators

    ACC Saturator: Automatic Kernel Optimization for Directive-Based GPU Code

    Full text link
    Automatic code optimization is a complex process that typically involves the application of multiple discrete algorithms that modify the program structure irreversibly. However, the design of these algorithms is often monolithic, and they require repetitive implementation to perform similar analyses due to the lack of cooperation. To address this issue, modern optimization techniques, such as equality saturation, allow for exhaustive term rewriting at various levels of inputs, thereby simplifying compiler design. In this paper, we propose equality saturation to optimize sequential codes utilized in directive-based programming for GPUs. Our approach simultaneously realizes less computation, less memory access, and high memory throughput. Our fully-automated framework constructs single-assignment forms from inputs to be entirely rewritten while keeping dependencies and extracts optimal cases. Through practical benchmarks, we demonstrate a significant performance improvement on several compilers. Furthermore, we highlight the advantages of computational reordering and emphasize the significance of memory-access order for modern GPUs

    Improving Utility of GPU in Accelerating Industrial Applications with User-centred Automatic Code Translation

    Get PDF
    SMEs (Small and medium-sized enterprises), particularly those whose business is focused on developing innovative produces, are limited by a major bottleneck on the speed of computation in many applications. The recent developments in GPUs have been the marked increase in their versatility in many computational areas. But due to the lack of specialist GPU (Graphics processing units) programming skills, the explosion of GPU power has not been fully utilized in general SME applications by inexperienced users. Also, existing automatic CPU-to-GPU code translators are mainly designed for research purposes with poor user interface design and hard-to-use. Little attentions have been paid to the applicability, usability and learnability of these tools for normal users. In this paper, we present an online automated CPU-to-GPU source translation system, (GPSME) for inexperienced users to utilize GPU capability in accelerating general SME applications. This system designs and implements a directive programming model with new kernel generation scheme and memory management hierarchy to optimize its performance. A web-service based interface is designed for inexperienced users to easily and flexibly invoke the automatic resource translator. Our experiments with non-expert GPU users in 4 SMEs reflect that GPSME system can efficiently accelerate real-world applications with at least 4x and have a better applicability, usability and learnability than existing automatic CPU-to-GPU source translators

    Static Graphs for Coding Productivity in OpenACC

    Get PDF
    The main contribution of this work is to increase the coding productivity for GPU programming by using the concept of Static Graphs. To do so, we have combined the new CUDA Graph API with the OpenACC programming model. We use as test cases a well-known and widely used problems in HPC and AI: the Particle Swarm Optimization. We complement the OpenACC functionality with the use of CUDA Graph, achieving accelerations of more than one order of magnitude, and a performance very close to a reference and optimized CUDA code. Finally, we propose a new specification to incorporate the concept of Static Graphs into the OpenACC specification.This project has received funding from the EPEEC project from the European Union’s Horizon 2020 Research and Innovation program under grant agreement No. 801051.Peer ReviewedPostprint (author's final draft
    • …
    corecore