Search CORE

738 research outputs found

Power-aware load balancing of large scale MPI applications

Author: Corbalán González Julita
Etinski Maja
Labarta Mancho Jesús José
Valero Cortés Mateo
Veidenbaum Alex
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009
Field of study

Power consumption is a very important issue for HPC community, both at the level of one application or at the level of whole workload. Load imbalance of a MPI application can be exploited to save CPU energy without penalizing the execution time. An application is load imbalanced when some nodes are assigned more computation than others. The nodes with less computation can be run at lower frequency since otherwise they have to wait for the nodes with more computation blocked in MPI calls. A technique that can be used to reduce the speed is Dynamic Voltage Frequency Scaling (DVFS). Dynamic power dissipation is proportional to the product of the frequency and the square of the supply voltage, while static power is proportional to the supply voltage. Thus decreasing voltage and/or frequency results in power reduction. Furthermore, over-clocking can be applied in some CPUs to reduce overall execution time. This paper investigates the impact of using different gear sets , over-clocking, and application and platform propreties to reduce CPU power. A new algorithm applying DVFS and CPU over-clocking is proposed that reduces execution time while achieving power savings comparable to prior work. The results show that it is possible to save up to 60% of CPU energy in applications with high load imbalance. Our results show that six gear sets achieve, on average, results close to the continuous frequency set that has been used as a baseline.Peer ReviewedPostprint (published version

Crossref

UPCommons. Portal del coneixement obert de la UPC

A Three Step Blind Approach for Improving HPC Systems' Energy Performance

Author: K Choi
KW Cameron
R Bolze
VW Freeh
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013
Field of study

International audienceNowadays, there is no doubt that energy consumption has become a limiting factor in the design and operation of high performance computing (HPC) systems. This is evidenced by the rise of efforts both from the academia and the industry to reduce the energy consumption of those systems. Unlike hardware solutions, software initiatives targeting HPC systems' energy consumption reduction despite their effectiveness are often limited for reasons including: (i) the program specific nature of the solution proposed; (ii) the need of deep understanding of applications at hand; (iii) proposed solutions are often difficult to use by novices and/or are designed for single task environments. This paper propose a three step blind system-wide, application independent, fine-grain, and easy to use (user friendly) methodology for improving energy performance of HPC systems. The methodology typically breaks into phase detection, phase characterization, and phase identification and system reconfiguration. And it is blind in the sense that it does not require any knowledge from users. It relies upon reconfigurable capabilities offered by the majority of HPC subsystems -- including the processor, storage, memory, and communication subsystems -- to reduce the overall energy consumption of the system (excluding network equipments) at runtime. We also present an implementation of our methodology through which we demonstrate its effectiveness via static analyses and experiments using benchmarks representative of HPC workloads

HAL-ENS-LYON

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

INRIA a CCSD electronic archive server

Open Archive Toulouse Archive Ouverte

Hal-Diderot

HAL-Rennes 1

A Workload-Aware, Eco-Friendly Daemon for Cluster Computing

Author: Feng W.
Huang S.
Publication venue
Publication date: 01/01/2008
Field of study

This paper presents an eco-friendly daemon that reduces power consumption while better maintaining high performance via a novel behavioral quantification of workload. Specifically, our behavioral quantification achieves a more accurate workload characterization than previous approaches by inferring "processor stall cycles due to off-chip activities." This quantification, in turn, provides a foundation upon which we construct an interval-based, power-aware, run-time algorithm that is implemented within a system-wide daemon. We then evaluate our power-aware daemon in a cluster-computing environment with the NAS Parallel Benchmarks. The results indicate that our novel behavioral quantification of workload allows our power-aware daemon to more tightly control performance while delivering substantial energy savings

Computer Science Technical Reports @Virginia Tech

CiteSeerX

Power Saving Experiments for Large Scale Global Optimization

Author: Cameron Kirk W.
Cao Zhenwei
Easterling David R.
Feng Wu-Chun
Li Dong
Watson Layne T.
Publication venue
Publication date: 01/01/2009
Field of study

Green computing, an emerging ﬁeld of research that seeks to reduce excess power consumption in high performance computing (HPC), is gaining popularity among researchers. Research in this ﬁeld often relies on simulation or only uses a small cluster, typically 8 or 16 nodes, because of the lack of hardware support. In contrast, System G at Virginia Tech is a 2592 processor supercomputer equipped with power aware components suitable for large scale green computing research. DIRECT is a deterministic global optimization algorithm, implemented in the mathematical software package VTDIRECT95. This paper explores the potential energy savings for the parallel implementation of DIRECT, called pVTdirect, when used with a large scale computational biology application, parameter estimation for a budding yeast cell cycle model, on System G. Two power aware approaches for pVTdirect are developed and compared against the CPUSPEED power saving system tool. The results show that knowledge of the parallel workload of the underlying application is beneficial for power management

Computer Science Technical Reports @Virginia Tech

Phase-Based Application-Driven Hierarchical Power Management on the Single-chip Cloud Computer

Author: Marcelo Cintra
Matthias Gries
Michael Kauschke
Nikolas Ioannou
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2011
Field of study

To improve energy efficiency processors allow for Dy-namic Voltage and Frequency Scaling (DVFS), which enables changing their performance and power consumption on-the-fly. Many-core architectures, such as the Single-chip Cloud Computer (SCC) experimental processor from Intel Labs, have DVFS infrastructures that scale by having many more independent voltage and frequency domains on-die than to-day’s multi-cores. This paper proposes a novel, hierarchical, and transparent client-server power management scheme applicable to such architectures. The scheme tries to minimize energy consump-tion within a performance window taking into consideration not only the local information for cores within frequency do-mains but also information that spans multiple frequency and voltage domains. We implement our proposed hierarchical power control using a novel application-driven phase detection and predic-tion approach for Message Passing Interface (MPI) applica-tions, a natural choice on the SCC with its fast on-chip net-work and its non-coherentmemory hierarchy. This phase pre-dictor operates as the front-end to the hierarchical DVFS con-troller, providing the necessary DVFS scheduling points. Experimental results with SCC hardware show that our ap-proach provides significant improvement of the EnergyDelay Product (EDP) of as much as 27.2%, and 11.4 % on average, with an average increase in execution time of 7.7 % over a baseline version without DVFS. These improvements come from both improved phase prediction accuracy and more ef-fective DVFS control of the domains, compared to existing approaches

CiteSeerX

Crossref

RUNTIME METHODS TO IMPROVE ENERGY EFFICIENCY IN SUPERCOMPUTING APPLICATIONS

Author: Bhalachandra Sridutt
Publication venue: University of North Carolina at Chapel Hill Graduate School
Publication date: 01/01/2018
Field of study

Energy efficiency in supercomputing is critical to limit operating costs and carbon footprints. While the energy efficiency of future supercomputing centers needs to improve at all levels, the energy consumed by the processing units is a large fraction of the total energy consumed by High Performance Computing (HPC) systems. HPC applications use a parallel programming paradigm like the Message Passing Interface (MPI) to coordinate computation and communication among thousands of processors. With dynamically-changing factors both in hardware and software affecting energy usage of processors, there exists a need for power monitoring and regulation at runtime to achieve savings in energy. This dissertation highlights an adaptive runtime framework that enables processors with core-specific power control by dynamically adapting to workload characteristics to reduce power with little or no performance impact. Two opportunities to improve the energy efficiency of processors running MPI applications are identified - computational workload imbalance and waiting on memory. Monitoring of performance and power regulation is performed by the framework transparently within the MPI runtime system, eliminating the need for code changes to MPI applications. The effect of enforcing power limits (capping) on processors is also investigated. Experiments on 32 nodes (1024 cores) show that in presence of workload imbalance, the runtime reduces Central Processing Unit (CPU) frequency on cores not on the critical path, thereby reducing power and hence energy usage without deteriorating performance. Using this runtime, six MPI mini-applications and a full MPI application show an overall 20% decrease in energy use with less than 1% increase in execution time. In addition, the lowering of frequency on non-critical cores reduces run-to-run performance variation and improves performance. For the full application, an average speedup of 11% is seen, while the power is lowered by about 31% for an energy savings of up to 42%. Another experiment on 16 nodes (256 cores) that are power capped also shows performance improvement along with power reduction. Thus, energy optimization can also be a performance optimization. For applications that are limited by memory access times, memory metrics identified facilitate lowering of power by up to 32% without adversely impacting performance.Doctor of Philosoph

Carolina Digital Repository

BSLD threshold driven parallel job scheduling for energy efficient HPC centers

Author: Corbalán González Julita
Etinski Maja
Labarta Mancho Jesús José
Valero Cortés Mateo
Publication venue
Publication date: 01/01/2009
Field of study

Recently, power awareness in high performance computing (HPC) community has increased significantly. While CPU power reduction of HPC applications using Dynamic Voltage Frequency Scaling (DVFS) has been explored thoroughly, CPU power management for large scale parallel systems at system level has left unexplored. In this paper we propose a power-aware parallel job scheduler assuming DVFS enabled clusters. Traditional parallel job schedulers determine when a job will be run, power aware ones should assign CPU frequency which it will be run at. We have introduced two adjustable thresholds in order to enable fine grain energy performance trade-off control. Since our power reduction approach is policy independent it can be added to any parallel job scheduling policy. Furthermore, we have done an analysis of HPC system dimension. Running an application at lower frequency on more processors can be more energy efficient than running it at the highest CPU frequency on less processors. This paper investigates whether having more DVFS enabled processors and same load can lead to better energy efficiency and performance. Five workload logs from systems in production use with up to 9 216 processors are simulated to evaluate the proposed algorithm and the dimensioning problem. Our approach decreases CPU energy by 7%- 18% on average depending on allowed job performance penalty. Applying the same frequency scaling algorithm on 20% larger system, CPU energy needed to execute same load can be decreased by almost 30% while having same or better job performance.Postprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Dynamic Power Management for Reactive Stream Processing on the SCC Tiled Architecture

Author: B Rountree
C Grelck
C Poellabauer
D Gusfield
DW Marquardt
E Bini
G Chen
I Buck
J-J Chen
Jens Knoop
JH Anderson
L Wang
M Bambagini
Michael Zolda
MY Lim
N Ioannou
N Kappiah
Nilesh Karavadara
P Gschwandtner
Q Cai
Raimund Kirner
V Nguyen
VTN Nguyen
Vu Thien Nga Nguyen
W Thies
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016
Field of study

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.Dynamic voltage and frequency scaling} (DVFS) is a means to adjust the computing capacity and power consumption of computing systems to the application demands. DVFS is generally useful to provide a compromise between computing demands and power consumption, especially in the areas of resource-constrained computing systems. Many modern processors support some form of DVFS. In this article we focus on the development of an execution framework that provides light-weight DVFS support for reactive stream-processing systems (RSPS). RSPS are a common form of embedded control systems, operating in direct response to inputs from their environment. At the execution framework we focus on support for many-core scheduling for parallel execution of concurrent programs. We provide a DVFS strategy for RSPS that is simple and lightweight, to be used for dynamic adaptation of the power consumption at runtime. The simplicity of the DVFS strategy became possible by sole focus on the application domain of RSPS. The presented DVFS strategy does not require specific assumptions about the message arrival rate or the underlying scheduling method. While DVFS is a very active field, in contrast to most existing research, our approach works also for platforms like many-core processors, where the power settings typically cannot be controlled individually for each computational unit. We also support dynamic scheduling with variable workload. While many research results are provided with simulators, in our approach we present a parallel execution framework with experiments conducted on real hardware, using the SCC many-core processor. The results of our experimental evaluation confirm that our simple DVFS strategy provides potential for significant energy saving on RSPS.Peer reviewe

Crossref

Springer - Publisher Connector

University of Hertfordshire Research Archive

Evaluation of DVFS techniques on modern HPC processors and accelerators for energy-aware applications

Author: Biferale
Biferale
Biferale
Calore
Calore
Calore
Calore
Calore
Crimi
Dick
Etinski
Ge
Khabi
Lim
Mantovani
Mazouz
Peraza
Sbragaglia
Scagliarini
Succi
Sundriyal
Williams
Wittmann
Publication venue: 'Wiley'
Publication date: 01/01/2017
Field of study

Energy efficiency is becoming increasingly important for computing systems, in particular for large scale HPC facilities. In this work we evaluate, from an user perspective, the use of Dynamic Voltage and Frequency Scaling (DVFS) techniques, assisted by the power and energy monitoring capabilities of modern processors in order to tune applications for energy efficiency. We run selected kernels and a full HPC application on two high-end processors widely used in the HPC context, namely an NVIDIA K80 GPU and an Intel Haswell CPU. We evaluate the available trade-offs between energy-to-solution and time-to-solution, attempting a function-by-function frequency tuning. We finally estimate the benefits obtainable running the full code on a HPC multi-GPU node, with respect to default clock frequency governors. We instrument our code to accurately monitor power consumption and execution time without the need of any additional hardware, and we enable it to change CPUs and GPUs clock frequencies while running. We analyze our results on the different architectures using a simple energy-performance model, and derive a number of energy saving strategies which can be easily adopted on recent high-end HPC systems for generic applications

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università di Ferrara