
    Taming Energy Consumption Variations in Systems Benchmarking

    The past decade witnessed the inclusion of power measurements to evaluate the energy efficiency of software systems, making energy a prime indicator alongside performance. Nevertheless, measuring the energy consumption of a software system remains a tedious task for practitioners. In particular, the energy measurement process may be subject to many variations that hinder the relevance of comparisons. While the state of the art has mostly acknowledged the impact of hardware factors (chip printing process, CPU temperature), this paper investigates the impact of controllable factors on these variations. More specifically, we conduct an empirical study of multiple controllable parameters that one can easily tune to tame the energy consumption variations when benchmarking software systems. To better understand the causes of such variations, we ran more than 1,000 experiments on more than 100 machines with different workloads and configurations. The main factors we studied encompass the experimental protocol, CPU features (C-states, Turbo Boost, core pinning) and generations, as well as the operating system. Our experiments showed that, for some workloads, it is possible to tighten the energy variation by up to 30×. Finally, we summarize our results as guidelines for taming energy consumption variations. We argue that these guidelines are the minimal requirements to be considered prior to any energy efficiency evaluation.
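
    As a minimal illustration of the kind of controls the paper studies, the sketch below pins the benchmark process to a single core and repeats an energy measurement using the package energy counter that Linux exposes through the powercap (RAPL) interface. The sysfs path, core choice, sample count, and workload are illustrative assumptions rather than the paper's protocol, and reading the counter may require elevated privileges on recent kernels.

        import os
        import statistics
        import time

        RAPL = "/sys/class/powercap/intel-rapl:0/energy_uj"  # package-0 energy counter, microjoules

        def read_energy_uj():
            with open(RAPL) as f:
                return int(f.read())

        def workload():
            # Placeholder CPU-bound task standing in for the benchmarked system.
            sum(i * i for i in range(2_000_000))

        # Core pinning, one of the controllable factors studied in the paper.
        os.sched_setaffinity(0, {2})

        samples = []
        for _ in range(30):
            e0 = read_energy_uj()
            workload()
            e1 = read_energy_uj()
            samples.append((e1 - e0) / 1e6)  # joules; counter wrap-around ignored here

        print(f"mean={statistics.mean(samples):.2f} J, stdev={statistics.stdev(samples):.2f} J")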

    Exploring hardware overprovisioning in power-constrained, high performance computing

    An Analysis of Variation Between Cores For Intel Xeon Phi Knights Corner And Xeon Phi Knights Landing

    As we move towards exascale computing, the efficiency of application performance and energy utilization must be optimized by redefining architectural features and application performance analysis. This research analyzes the per-core performance of 8 applications on the Intel Xeon Phi Knights Corner (KNC) and Knights Landing (KNL) to determine whether performance variation between cores can lead to performance and energy improvements. Our results showed that the KNC architecture's cores vary in performance, with faster inner-core performance resulting from memory characteristics and core utilization. They also show that cores 17, 34, and 51 on the KNL architecture perform consistently slower than other cores, with core 0 performing either faster, slower, or within the average performance time of all the cores. A power-performance study was then done using different core configurations on the KNC. The results show that by targeting inner cores for applications that exhibit better inner-core performance, a maximum energy reduction of 16.4% compared to a configuration using all cores was possible with the optimal thread configuration. This energy reduction was achieved along with a 2% reduction in the fastest execution time of the same application. Our results also show how application characteristics lead to different core-variation behavior on the KNC and KNL Xeon Phi architectures.
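
    A simple way to expose this kind of per-core variation is to pin the same kernel to each core in turn and compare timings, as in the sketch below. The kernel, repetition count, and core set are placeholders, not the study's setup.

        import os
        import time

        def kernel():
            # Placeholder compute kernel; the study used 8 real applications.
            sum(i * i for i in range(1_000_000))

        results = {}
        for core in sorted(os.sched_getaffinity(0)):
            os.sched_setaffinity(0, {core})  # pin to one core at a time
            t0 = time.perf_counter()
            for _ in range(5):
                kernel()
            results[core] = (time.perf_counter() - t0) / 5

        fastest = min(results, key=results.get)
        slowest = max(results, key=results.get)
        print(f"fastest core {fastest}: {results[fastest]:.4f}s, "
              f"slowest core {slowest}: {results[slowest]:.4f}s")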

    Coordinating Resource Use in Open Distributed Systems

    In an open distributed system, computational resources are peer-owned and distributed over time and space. The system is open to interactions with its environment, and resources can dynamically join or leave the system, or be discovered at runtime. This dynamicity creates opportunities to carry out computations without statically owned resources, harnessing the collective compute power of the resources connected by the Internet. However, realizing this potential requires efficient and scalable resource discovery, coordination, and control, which present challenges in a dynamic, open environment. In this thesis, I present an approach to address these challenges by separating the functionality concerns of concurrent computations from those of coordinating their resource use, with the purpose of reducing programming complexity and aiding the development of correct, efficient, and resource-aware concurrent programs. As a first step towards effectively coordinating distributed resources, I developed DREAM, a Distributed Resource Estimation and Allocation Model, which enables computations to reason about the future availability of resources. I then developed a fine-grained resource coordination scheme for distributed computations. The coordination scheme integrates DREAM-based resource reasoning into a distributed scheduler, for deciding and enforcing fine-grained resource-use schedules for distributed computations. To control the overhead caused by the coordination, a tuner is implemented that explicitly balances the overhead of the control mechanisms against the extent of control exercised. The effectiveness and performance of the resource coordination approach have been evaluated using a number of case studies. Experimental results show that the approach can effectively schedule computations to support various coordination objectives, such as ensuring Quality-of-Service, power-efficient execution, and dynamic load balancing. The overhead caused by the coordination mechanism is relatively modest, and adjustable through the tuner. In addition, the coordination mechanism does not add extra programming complexity to computations.

    Coordinated power management in heterogeneous processors

    With the end of Dennard scaling, the scaling of device feature size by itself no longer guarantees sustaining the performance improvement predicted by Moore's Law. As industry moves to increasingly small feature sizes, performance scaling will become dominated by the physics of the computing environment, and in particular by the transient behavior of interactions between power delivery, power management, and thermal fields. Consequently, performance scaling must be improved by managing interactions between physical properties, which we refer to as processor physics, and system-level performance metrics, thereby improving the overall efficiency of the system. The industry shift towards heterogeneous computing is in large part motivated by energy efficiency. While such tightly coupled systems benefit from reduced latency and improved performance, they also give rise to new management challenges due to phenomena such as physical asymmetry in thermal and power signatures between the diverse elements and functional asymmetry in performance. Power-performance tradeoffs in heterogeneous processors are determined by coupled behaviors between major components due to (i) on-die integration, (ii) the programming model, and (iii) processor physics. Towards this end, this thesis demonstrates the need for coordinated management of the functional and physical resources of a heterogeneous system across all major compute and memory elements. It shows that the interactions among performance, power delivery, and different types of coupling phenomena are not an artifact of an architecture instance, but are fundamental to the operation of many-core and heterogeneous architectures. Managing such coupling effects is a central focus of this dissertation. This awareness has the potential to exert significant influence over the design of future power and performance management algorithms. The high-level contributions of this thesis are (i) an in-depth examination of the characteristics and performance demands of emerging applications, using hardware measurements and analysis from state-of-the-art heterogeneous processors and high-performance GPUs, (ii) analysis of the effects of processor physics, such as power and thermals, on system-level performance, (iii) identification of a key set of run-time metrics that can be used to manage these effects, and (iv) development and detailed evaluation of online coordinated power management techniques to optimize system-level global metrics in heterogeneous CPU-GPU-memory processors.

    Reliability-oriented resource management for High-Performance Computing

    Reliability is an increasingly pressing issue for High-Performance Computing systems, as failures are a threat to large-scale applications, for which even a single run may incur significant energy and billing costs. Currently, application developers need to address reliability explicitly by integrating application-specific checkpoint/restore mechanisms. However, an application alone cannot exploit system-level knowledge, whereas a system-wide resource management system can. In this paper, we propose a reliability-oriented policy that can significantly increase component reliability by combining the exploitation of checkpoint/restore mechanisms with proactive resource management policies.
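
    For context, the sketch below shows the application-level checkpoint/restore pattern that, as the paper notes, developers currently have to implement themselves. The file name, state layout, and checkpoint interval are illustrative assumptions.

        import os
        import pickle
        import tempfile

        CKPT = "state.ckpt"  # illustrative checkpoint file name

        def save_checkpoint(state, path=CKPT):
            # Write to a temporary file, then rename atomically, so a crash
            # mid-write cannot corrupt the previous checkpoint.
            fd, tmp = tempfile.mkstemp(dir=".")
            with os.fdopen(fd, "wb") as f:
                pickle.dump(state, f)
            os.replace(tmp, path)

        def load_checkpoint(path=CKPT):
            if os.path.exists(path):
                with open(path, "rb") as f:
                    return pickle.load(f)
            return {"step": 0, "accum": 0.0}  # no checkpoint: fresh start

        state = load_checkpoint()
        for step in range(state["step"], 1000):
            state["accum"] += step * 0.001  # stands in for the application's work
            state["step"] = step + 1
            if state["step"] % 100 == 0:  # illustrative checkpoint interval
                save_checkpoint(state)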

    An analytical methodology to derive power models based on hardware and software metrics

    The use of models to predict the power consumption of a system is an appealing alternative to wattmeters, since models avoid hardware costs and are easy to deploy. In this paper, we present an analytical methodology to build models with a reduced number of features in order to estimate power consumption at the node level. We aim at building simple power models by performing a per-component analysis (CPU, memory, network, I/O) through the execution of four standard benchmarks. While they are executed, information from all the available hardware counters and resource utilization metrics provided by the system is collected. Based on correlations among the recorded metrics and their correlation with the instantaneous power, our methodology allows us (i) to identify the significant metrics and (ii) to assign weights to the selected metrics in order to derive reduced models. The reduction also aims at extracting models that are based on a set of hardware counters and utilization metrics that can be obtained simultaneously and, thus, can be gathered and computed online. The utility of our procedure is validated using real-life applications on an Intel Sandy Bridge architecture.
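
    On synthetic data, the two steps of such a methodology might look like the sketch below: metrics are ranked by their correlation with measured power, near-collinear metrics are dropped, and weights for the reduced model are fit by least squares. The thresholds, metric names, and data are illustrative assumptions, not the paper's.

        import numpy as np

        rng = np.random.default_rng(0)
        n = 500
        cpu_util = rng.uniform(0, 1, n)
        mem_bw = rng.uniform(0, 1, n)
        net_io = rng.uniform(0, 1, n)
        cpu_copy = cpu_util + rng.normal(0, 0.02, n)  # nearly redundant with cpu_util
        power = 50 + 40 * cpu_util + 15 * mem_bw + rng.normal(0, 1, n)  # synthetic wattmeter

        metrics = {"cpu_util": cpu_util, "mem_bw": mem_bw,
                   "net_io": net_io, "cpu_copy": cpu_copy}

        # Step (i): keep metrics well correlated with power, skipping any metric
        # nearly collinear with one already selected.
        selected = []
        for name, col in sorted(metrics.items(),
                                key=lambda kv: -abs(np.corrcoef(kv[1], power)[0, 1])):
            if abs(np.corrcoef(col, power)[0, 1]) < 0.3:
                continue
            if all(abs(np.corrcoef(col, metrics[s])[0, 1]) < 0.95 for s in selected):
                selected.append(name)

        # Step (ii): least-squares weights for the reduced model (plus an intercept).
        X = np.column_stack([metrics[s] for s in selected] + [np.ones(n)])
        weights, *_ = np.linalg.lstsq(X, power, rcond=None)
        print(selected, np.round(weights, 2))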

    Modeling and optimization of high-performance many-core systems for energy-efficient and reliable computing

    Thesis (Ph.D.)--Boston University. Many-core systems, ranging from small-scale many-core processors to large-scale high-performance computing (HPC) data centers, have become the main trend in computing system design owing to their potential to deliver higher throughput per watt. However, power densities and temperatures increase following the growth in performance capacity, bringing major challenges in energy efficiency, cooling costs, and reliability. These challenges require a joint assessment of performance, power, and temperature tradeoffs, as well as the design of runtime optimization techniques that monitor and manage the interplay among them. This thesis proposes novel modeling and runtime management techniques that evaluate and optimize the performance, energy, and reliability of many-core systems. We first address the energy and thermal challenges in 3D-stacked many-core processors. 3D processors with stacked DRAM have the potential to dramatically improve performance owing to lower memory access latency and higher bandwidth. However, the performance increase may cause 3D systems to exceed power budgets or create thermal hot spots. In order to provide an accurate analysis and enable the design of efficient management policies, this thesis introduces a simulation framework to jointly analyze performance, power, and temperature for 3D systems. We then propose a runtime optimization policy that maximizes system performance by characterizing application behavior and predicting the operating points that satisfy the power and thermal constraints. Our policy reduces the energy-delay product (EDP) by up to 61.9% compared to existing strategies. Performance, cooling energy, and reliability are also critical aspects in HPC data centers. In addition to causing reliability degradation, high temperatures increase the required cooling energy. Communication cost, on the other hand, has a significant impact on system performance in HPC data centers. This thesis proposes a topology-aware technique that maximizes system reliability by selecting between workload clustering and balancing. Our policy improves system reliability by up to 123.3% compared to existing temperature-balancing approaches. We also introduce a job allocation methodology to simultaneously optimize the communication cost and the cooling energy in a data center. Our policy reduces the cooling cost by 40% compared to cooling-aware and performance-aware policies, while achieving performance comparable to the performance-aware policy.
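
    To make the EDP objective concrete, the sketch below picks, from a handful of hypothetical DVFS operating points, the one that minimizes energy-delay product while respecting a power cap. The candidate points, the cap, and the assumption of perfect frequency scaling are illustrative, not the thesis's model.

        # Hypothetical DVFS operating points: (frequency in GHz, power in W).
        candidates = [(1.2, 35.0), (1.6, 50.0), (2.0, 70.0), (2.4, 95.0)]
        POWER_CAP = 80.0  # watts
        WORK = 2.0e9      # cycles of work in the region of interest

        best = None
        for freq_ghz, power_w in candidates:
            if power_w > POWER_CAP:
                continue  # operating point violates the power budget
            delay = WORK / (freq_ghz * 1e9)  # seconds, assuming perfect frequency scaling
            energy = power_w * delay         # joules
            edp = energy * delay             # the metric the policy minimizes
            if best is None or edp < best[1]:
                best = ((freq_ghz, power_w), edp)

        print(f"chosen point: {best[0]}, EDP = {best[1]:.4f} J*s")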

    Understanding microarchitectural effects on the performance of parallel applications

    Over the past several years, classical multiprocessor systems have evolved into multicores, which tightly integrate multiple CPU cores on a single die or package. This technological shift leads to the sharing of microarchitectural resources between the individual cores, which has direct implications for the performance of parallel applications. It consequently makes understanding and tuning these applications significantly harder, on top of the already complex issues of parallel programming. In this work, we empirically analyze various microarchitectural effects on the performance of parallel applications through repeatable experiments. We show their importance, beyond the effects described by Amdahl's law and synchronization or communication considerations. In addition to the classification of shared resources into storage and bandwidth resources by Abel et al. [1], we also view the physical temperature and power budget as shared resources. Dynamic Voltage and Frequency Scaling (DVFS) over a wide range is needed to meet these constraints in multicores, making it a very important factor for performance today. Our work aims at a better understanding of performance-limiting factors in high-performance multicores; it shall serve as a basis for avoiding them and for finding solutions to tune parallel applications.
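
    The interplay the authors point to can be illustrated with a toy model: ideal Amdahl's-law speedup next to a variant in which a shared power budget forces per-core frequency down as cores are added. The parallel fraction and the frequency-scaling exponent below are made-up assumptions for illustration only.

        def amdahl(p, n):
            # Ideal speedup with parallel fraction p on n cores.
            return 1.0 / ((1 - p) + p / n)

        def power_limited(p, n, alpha=0.15):
            # Toy assumption: the shared power budget scales per-core
            # frequency by n**-alpha as more cores become active.
            return amdahl(p, n) * n ** -alpha

        for n in (1, 4, 16, 64):
            print(f"n={n:3d}  ideal={amdahl(0.95, n):6.2f}  "
                  f"power-limited={power_limited(0.95, n):6.2f}")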