
    Explicit uncore frequency scaling for energy optimisation policies with EAR in Intel architectures

    EAR is an energy management framework that offers three main services: energy accounting, energy control, and energy optimisation. The latter is provided through the EAR runtime library (EARL), a dynamic, transparent, and lightweight runtime library for energy optimisation and control. It implements energy optimisation policies that select the optimal CPU frequency based on runtime application characteristics and policy settings. Since EARL defines a policy API and a plugin mechanism, different policies can be easily evaluated. In this paper we propose and evaluate the use of explicit Uncore Frequency Scaling (explicit UFS) in Intel architectures to increase the energy-saving opportunities in cases where the hardware cannot select the optimal frequency for the Integrated Memory Controller (IMC). We extended the min_energy_to_solution policy to select both the CPU and IMC frequencies, and we executed and evaluated it with several kernels and six real applications. Results showed an average energy saving of 9% with an average time penalty of 3%. In some use cases, explicit UFS yielded up to 8% extra energy savings compared with hardware-managed UFS. This work has been funded by the BSC-Lenovo collaboration agreement.
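    As background for the mechanism the paper builds on, the sketch below shows one common way to perform explicit uncore frequency scaling on Intel processors: writing the uncore ratio-limit model-specific register through the Linux msr driver. It is not taken from EAR or EARL; the MSR address (0x620) and its max/min ratio fields follow publicly documented Intel behaviour, and the device path, required permissions, and the example ratio of 20 (2.0 GHz) are assumptions for illustration only.

        /* Minimal sketch of explicit uncore frequency scaling on Intel CPUs.
         * Assumptions (not from the paper): the msr kernel module is loaded,
         * the process may read/write /dev/cpu/N/msr, and MSR 0x620 holds the
         * maximum uncore ratio in bits 0-6 and the minimum ratio in bits 8-14,
         * both in units of 100 MHz. EAR's own policy code is not shown here. */
        #include <fcntl.h>
        #include <stdint.h>
        #include <stdio.h>
        #include <unistd.h>

        #define MSR_UNCORE_RATIO_LIMIT 0x620

        /* Pin both the minimum and maximum uncore ratio to `ratio`. */
        static int set_uncore_ratio(int cpu, uint64_t ratio)
        {
            char path[64];
            snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);

            int fd = open(path, O_RDWR);
            if (fd < 0) { perror("open msr"); return -1; }

            uint64_t val = (ratio & 0x7f) | ((ratio & 0x7f) << 8); /* max | min */
            ssize_t n = pwrite(fd, &val, sizeof(val), MSR_UNCORE_RATIO_LIMIT);
            close(fd);
            return n == (ssize_t)sizeof(val) ? 0 : -1;
        }

        int main(void)
        {
            /* Example: fix the uncore/IMC frequency seen by CPU 0 to 2.0 GHz. */
            return set_uncore_ratio(0, 20);
        }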

    Covariance tracking: architecture optimizations for embedded systems


    Evaluating directive-based programming models on Wave Propagation Kernels

    HPC systems have become mandatory to tackle the ever-increasing challenges imposed by new exploration areas around the world. The requirement for more HPC resources depends on the complexity of the area under exploration, yet the larger the HPC system, the higher the energy consumption involved. The need to reduce overall power consumption in HPC facilities has led technology vendors to introduce many-core devices and heterogeneous computing into supercomputers, forcing exploration codes to be ported to these new architectures. As the Oil & Gas industry has more than 30 years of legacy code, the effort to adapt it could be huge. To this end, several programming models have emerged; high-level directive-based models such as OpenMP, OpenACC, and OmpSs rely on annotating the code with parallelism directives so that the compiler, rather than the user, decomposes and manages the parallel regions. The results show that it is possible to obtain a parallel code for current heterogeneous HPC architectures by investing only a few hours or days. The obtained speedup is at least an order of magnitude w.r.t. a sequential code. However, we only provide parallelism inside a single computational node; a wider study evaluating the cost of porting and parallelizing across computational nodes is pending. The authors thank Repsol for the permission to publish the present research, carried out at the Repsol-BSC Research Center. This work has received funding from the European Union's Horizon 2020 Programme (2014-2020) and from the Brazilian Ministry of Science, Technology and Innovation through Rede Nacional de Pesquisa (RNP) under the HPC4E Project (www.hpc4e.eu), grant agreement no. 689772.
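    To make the directive-based approach concrete, here is a minimal sketch of a second-order acoustic wave update annotated with an OpenMP directive. It is not the kernel evaluated in the paper; the grid size, point source, and coefficients are arbitrary illustrative choices, and the comment in the code notes where an OpenACC or OmpSs directive would go instead.

        #include <stdlib.h>

        #define NX 512
        #define NY 512
        #define IDX(i, j) ((i) * NY + (j))

        /* One explicit time step of a 2-D second-order acoustic wave equation.
         * With OpenACC the pragma would instead read, e.g.,
         *   #pragma acc parallel loop collapse(2)
         * and OmpSs would use its own task/loop directives; the loop body is
         * unchanged, which is the point of the directive-based approach. */
        static void wave_step(const float *restrict p_prev, const float *restrict p_cur,
                              float *restrict p_next, const float *restrict vel2,
                              float dt2_h2)
        {
        #pragma omp parallel for collapse(2)
            for (int i = 1; i < NX - 1; i++) {
                for (int j = 1; j < NY - 1; j++) {
                    float lap = p_cur[IDX(i - 1, j)] + p_cur[IDX(i + 1, j)]
                              + p_cur[IDX(i, j - 1)] + p_cur[IDX(i, j + 1)]
                              - 4.0f * p_cur[IDX(i, j)];
                    p_next[IDX(i, j)] = 2.0f * p_cur[IDX(i, j)] - p_prev[IDX(i, j)]
                                      + vel2[IDX(i, j)] * dt2_h2 * lap;
                }
            }
        }

        int main(void)
        {
            float *prev = calloc(NX * NY, sizeof(float));
            float *cur  = calloc(NX * NY, sizeof(float));
            float *next = calloc(NX * NY, sizeof(float));
            float *vel2 = calloc(NX * NY, sizeof(float));
            for (int k = 0; k < NX * NY; k++) vel2[k] = 1.0f; /* uniform velocity */
            cur[IDX(NX / 2, NY / 2)] = 1.0f;                  /* point source */

            for (int t = 0; t < 100; t++) {                   /* time-stepping loop */
                wave_step(prev, cur, next, vel2, 0.1f);
                float *tmp = prev; prev = cur; cur = next; next = tmp;
            }
            free(prev); free(cur); free(next); free(vel2);
            return 0;
        }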

    Characterizing Power and Energy Efficiency of Legion Data-Centric Runtime and Applications on Heterogeneous High-Performance Computing Systems

    Traditional parallel programming models require programmers to explicitly specify the parallelism and data movement of the underlying parallel mechanisms. In contrast to this computation-centric style, Legion provides a data-centric programming model for extracting parallelism and data movement. In this chapter, we aim to characterize the power and energy consumption of running HPC applications on Legion. We run benchmark applications on compute nodes equipped with both CPUs and GPUs, and measure the execution time, power consumption, and CPU/GPU utilization. Additionally, we test the message passing interface (MPI) version of these applications and compare the performance and power consumption of high-performance computing (HPC) applications using the computation-centric and data-centric programming models. Experimental results indicate that Legion applications outperform MPI applications in both performance and energy efficiency: Legion applications can be 9.17 times as fast as MPI applications while using only 9.2% of the energy. Legion effectively exploits the heterogeneous architecture and runs application tasks on the GPU. To the best of our knowledge, this is the first study of the power and energy consumption of the Legion programming and runtime infrastructure. Our findings will enable HPC system designers and operators to develop and tune the performance of data-centric HPC applications under power and energy constraints.
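    As an illustration of how GPU-side power and utilization samples such as those reported here can be collected, the sketch below polls NVIDIA's NVML library. It is not the instrumentation used in the chapter; the device index and the 100 ms sampling interval are arbitrary choices.

        /* Sample GPU power and utilization with NVML (build with -lnvidia-ml). */
        #include <nvml.h>
        #include <stdio.h>
        #include <unistd.h>

        int main(void)
        {
            if (nvmlInit() != NVML_SUCCESS) return 1;

            nvmlDevice_t dev;
            if (nvmlDeviceGetHandleByIndex(0, &dev) != NVML_SUCCESS) return 1;

            for (int i = 0; i < 10; i++) {        /* ten 100 ms samples */
                unsigned int mw = 0;              /* power draw in milliwatts */
                nvmlUtilization_t util;
                nvmlDeviceGetPowerUsage(dev, &mw);
                nvmlDeviceGetUtilizationRates(dev, &util);
                printf("power=%.1f W  gpu_util=%u%%  mem_util=%u%%\n",
                       mw / 1000.0, util.gpu, util.memory);
                usleep(100000);
            }

            nvmlShutdown();
            return 0;
        }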

    Optimisation of computational fluid dynamics applications on multicore and manycore architectures

    This thesis presents a number of optimisations used for mapping the underlying computational patterns of finite volume CFD applications onto the architectural features of modern multicore and manycore processors. Their effectiveness and impact are demonstrated in a block-structured code and an unstructured code of a size representative of industrial applications, and across a variety of processor architectures that make up contemporary high-performance computing systems. The importance of vectorization, and the ways in which it can be achieved, is demonstrated in both the structured and unstructured solvers, together with the impact that the underlying data layout can have on performance. The utility of auto-tuning for ensuring performance portability across multiple architectures is demonstrated and used for selecting optimal parameters such as prefetch distances for software prefetching or tile sizes for strip mining/loop tiling. On the manycore architectures, running more than one thread per physical core is found to be crucial for good performance on processors with in-order core designs but not required on out-of-order architectures. For architectures with high-bandwidth memory packages, their exploitation, whether explicit or implicit, is shown to be imperative for best performance. The implementation of all of these optimisations led to application speed-ups ranging between 2.7x and 3x on the multicore CPUs and 5.7x to 24x on the manycore processors.
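    A minimal sketch of two of the optimisations discussed, strip mining/loop tiling and software prefetching with a tunable distance, is shown below. The scale_residual kernel and the TILE and PF_DIST values are illustrative placeholders rather than code from the thesis, where such parameters are chosen by auto-tuning.

        #include <stdlib.h>
        #include <xmmintrin.h>   /* _mm_prefetch */

        #define TILE    64       /* tile size: the kind of parameter chosen by auto-tuning */
        #define PF_DIST 16       /* software prefetch distance: likewise tunable */

        /* Hypothetical per-cell update: divide accumulated fluxes by cell volume.
         * The loop is strip-mined into tiles and the flux array is prefetched
         * PF_DIST iterations ahead of its use. */
        static void scale_residual(const double *restrict flux, const double *restrict vol,
                                   double *restrict res, int n)
        {
            for (int ii = 0; ii < n; ii += TILE) {
                int end = ii + TILE < n ? ii + TILE : n;
                for (int i = ii; i < end; i++) {
                    _mm_prefetch((const char *)&flux[i + PF_DIST], _MM_HINT_T0);
                    res[i] += flux[i] / vol[i];
                }
            }
        }

        int main(void)
        {
            enum { N = 1 << 20 };
            double *flux = malloc(N * sizeof(double));
            double *vol  = malloc(N * sizeof(double));
            double *res  = calloc(N, sizeof(double));
            for (int i = 0; i < N; i++) { flux[i] = 1.0; vol[i] = 2.0; }
            scale_residual(flux, vol, res, N);
            free(flux); free(vol); free(res);
            return 0;
        }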

    A Unified Infrastructure for Monitoring and Tuning the Energy Efficiency of HPC Applications

    High Performance Computing (HPC) has become an indispensable tool for the scientific community to perform simulations on models whose complexity would exceed the limits of a standard computer. An unfortunate trend concerning HPC systems is that their power consumption under highly demanding workloads keeps increasing. To counter this trend, hardware vendors have implemented power saving mechanisms in recent years, which has increased the variability in the power demands of single nodes. These capabilities provide an opportunity to increase the energy efficiency of HPC applications. To use these hardware power saving mechanisms efficiently, their overhead must be analyzed. Furthermore, applications have to be examined for performance and energy efficiency issues, which can give hints for optimizations. This requires an infrastructure that is able to capture both performance and power consumption information concurrently. The mechanisms that such an infrastructure would inherently support could further be used to implement a tool that is able to do both: measure and tune energy efficiency. This thesis targets all steps in this process by making the following contributions: First, I provide a broad overview of different related fields. I list common performance measurement tools, power measurement infrastructures, hardware power saving capabilities, and tuning tools. Second, I lay out a model that can be used to define and describe energy efficiency tuning at program-region scale. This model includes hardware- and software-dependent parameters. Hardware parameters include the runtime overhead and delay for switching power saving mechanisms, as well as a consideration of their scopes and their possible influence on application performance. Thus, in a third step, I present methods to evaluate common power saving mechanisms and list findings for different x86 processors. Software parameters include the performance and power consumption characteristics of applications, as well as the influence of power saving mechanisms on these. To capture software parameters, an infrastructure for measuring performance and power consumption is necessary. With minor additions, the same infrastructure can later be used to tune software and hardware parameters. Thus, I lay out the structure for such an infrastructure and describe common components that are required for measuring and tuning. Based on that, I implement adequate interfaces that extend the functionality of contemporary performance measurement tools. Furthermore, I use these interfaces to conflate performance and power measurements and further process the gathered information for tuning. I conclude this work by demonstrating that the infrastructure can be used to manipulate the power saving mechanisms of contemporary x86 processors and increase the energy efficiency of HPC applications.
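    As a minimal illustration of capturing energy information around a program region (not the infrastructure developed in the thesis), the sketch below reads a package-level RAPL energy counter through the Linux powercap interface before and after a region. It assumes the intel_rapl driver is loaded and that domain intel-rapl:0 corresponds to the package of interest; the busy loop stands in for an instrumented region.

        #include <inttypes.h>
        #include <stdio.h>

        /* Read the cumulative package energy counter (microjoules) exposed by the
         * Linux powercap/intel_rapl driver. The counter periodically wraps around,
         * which a real measurement infrastructure has to account for. */
        static uint64_t read_energy_uj(void)
        {
            FILE *f = fopen("/sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj", "r");
            if (!f) return 0;
            uint64_t uj = 0;
            fscanf(f, "%" SCNu64, &uj);
            fclose(f);
            return uj;
        }

        int main(void)
        {
            uint64_t before = read_energy_uj();

            /* ... an instrumented program region would execute here ... */
            volatile double x = 0.0;
            for (long i = 0; i < 100000000L; i++) x += 1e-9;

            uint64_t after = read_energy_uj();
            printf("region energy: %.3f J\n", (after - before) / 1e6);
            return 0;
        }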