753 research outputs found

    Evaluation of DVFS techniques on modern HPC processors and accelerators for energy-aware applications

    Get PDF
    Energy efficiency is becoming increasingly important for computing systems, in particular for large scale HPC facilities. In this work we evaluate, from an user perspective, the use of Dynamic Voltage and Frequency Scaling (DVFS) techniques, assisted by the power and energy monitoring capabilities of modern processors in order to tune applications for energy efficiency. We run selected kernels and a full HPC application on two high-end processors widely used in the HPC context, namely an NVIDIA K80 GPU and an Intel Haswell CPU. We evaluate the available trade-offs between energy-to-solution and time-to-solution, attempting a function-by-function frequency tuning. We finally estimate the benefits obtainable running the full code on a HPC multi-GPU node, with respect to default clock frequency governors. We instrument our code to accurately monitor power consumption and execution time without the need of any additional hardware, and we enable it to change CPUs and GPUs clock frequencies while running. We analyze our results on the different architectures using a simple energy-performance model, and derive a number of energy saving strategies which can be easily adopted on recent high-end HPC systems for generic applications

    Loo.py: transformation-based code generation for GPUs and CPUs

    Full text link
    Today's highly heterogeneous computing landscape places a burden on programmers wanting to achieve high performance on a reasonably broad cross-section of machines. To do so, computations need to be expressed in many different but mathematically equivalent ways, with, in the worst case, one variant per target machine. Loo.py, a programming system embedded in Python, meets this challenge by defining a data model for array-style computations and a library of transformations that operate on this model. Offering transformations such as loop tiling, vectorization, storage management, unrolling, instruction-level parallelism, change of data layout, and many more, it provides a convenient way to capture, parametrize, and re-unify the growth among code variants. Optional, deep integration with numpy and PyOpenCL provides a convenient computing environment where the transition from prototype to high-performance implementation can occur in a gradual, machine-assisted form
    • …
    corecore