Energy Measurements of High Performance Computing Systems: From Instrumentation to Analysis

Author: Ilsche Thomas
Publication date: 31/07/2020

Energy efficiency is a major criterion for computing in general and High Performance Computing in particular. When optimizing for energy efficiency, it is essential to measure the underlying metric: energy consumption. To fully leverage energy measurements, their quality needs to be well-understood. To that end, this thesis provides a rigorous evaluation of various energy measurement techniques. I demonstrate how the deliberate selection of instrumentation points, sensors, and analog processing schemes can enhance the temporal and spatial resolution while preserving a well-known accuracy. Further, I evaluate a scalable energy measurement solution for production HPC systems and address its shortcomings. Such high-resolution and large-scale measurements present challenges regarding the management of large volumes of generated metric data. I address these challenges with a scalable infrastructure for collecting, storing, and analyzing metric data. With this infrastructure, I also introduce a novel persistent storage scheme for metric time series data, which allows efficient queries for aggregate timelines. To ensure that it satisfies the demanding requirements for scalable power measurements, I conduct an extensive performance evaluation and describe a productive deployment of the infrastructure. Finally, I describe different approaches and practical examples of analyses based on energy measurement data. In particular, I focus on the combination of energy measurements and application performance traces. However, interweaving fine-grained power recordings and application events requires accurately synchronized timestamps on both sides. To overcome this obstacle, I develop a resilient and automated technique for time synchronization, which utilizes cross-correlation of a specifically influenced power measurement signal.
Ultimately, this careful combination of sophisticated energy measurements and application performance traces yields a detailed insight into application and system energy efficiency at full-scale HPC systems and down to millisecond-range regions.

Table of contents:
1 Introduction
2 Background and Related Work
  2.1 Basic Concepts of Energy Measurements
    2.1.1 Basics of Metrology
    2.1.2 Measuring Voltage, Current, and Power
    2.1.3 Measurement Signal Conditioning and Analog-to-Digital Conversion
  2.2 Power Measurements for Computing Systems
    2.2.1 Measuring Compute Nodes using External Power Meters
    2.2.2 Custom Solutions for Measuring Compute Node Power
    2.2.3 Measurement Solutions of System Integrators
    2.2.4 CPU Energy Counters
    2.2.5 Using Models to Determine Energy Consumption
  2.3 Processing of Power Measurement Data
    2.3.1 Time Series Databases
    2.3.2 Data Center Monitoring Systems
  2.4 Influences on the Energy Consumption of Computing Systems
    2.4.1 Processor Power Consumption Breakdown
    2.4.2 Energy-Efficient Hardware Configuration
  2.5 HPC Performance and Energy Analysis
    2.5.1 Performance Analysis Techniques
    2.5.2 HPC Performance Analysis Tools
    2.5.3 Combining Application and Power Measurements
  2.6 Conclusion
3 Evaluating and Improving Energy Measurements
  3.1 Description of the Systems Under Test
  3.2 Instrumentation Points and Measurement Sensors
    3.2.1 Analog Measurement at Voltage Regulators
    3.2.2 Instrumentation with Hall Effect Transducers
    3.2.3 Modular Instrumentation of DC Consumers
    3.2.4 Optimal Wiring for Shunt-Based Measurements
    3.2.5 Node-Level Instrumentation for HPC Systems
  3.3 Analog Signal Conditioning and Analog-to-Digital Conversion
    3.3.1 Signal Amplification
    3.3.2 Analog Filtering and Analog-To-Digital Conversion
    3.3.3 Integrated Solutions for High-Resolution Measurement
  3.4 Accuracy Evaluation and Calibration
    3.4.1 Synthetic Workloads for Evaluating Power Measurements
    3.4.2 Improving and Evaluating the Accuracy of a Single-Node Measuring System
    3.4.3 Absolute Accuracy Evaluation of a Many-Node Measuring System
  3.5 Evaluating Temporal Granularity and Energy Correctness
    3.5.1 Measurement Signal Bandwidth at Different Instrumentation Points
    3.5.2 Retaining Energy Correctness During Digital Processing
  3.6 Evaluating CPU Energy Counters
    3.6.1 Energy Readouts with RAPL
    3.6.2 Methodology
    3.6.3 RAPL on Intel Sandy Bridge-EP
    3.6.4 RAPL on Intel Haswell-EP and Skylake-SP
  3.7 Conclusion
4 A Scalable Infrastructure for Processing Power Measurement Data
  4.1 Requirements for Power Measurement Data Processing
  4.2 Concepts and Implementation of Measurement Data Management
    4.2.1 Message-Based Communication between Agents
    4.2.2 Protocols
    4.2.3 Application Programming Interfaces
    4.2.4 Efficient Metric Time Series Storage and Retrieval
    4.2.5 Hierarchical Timeline Aggregation
  4.3 Performance Evaluation
    4.3.1 Benchmark Hardware Specifications
    4.3.2 Throughput in Symmetric Configuration with Replication
    4.3.3 Throughput with Many Data Sources and Single Consumers
    4.3.4 Temporary Storage in Message Queues
    4.3.5 Persistent Metric Time Series Request Performance
    4.3.6 Performance Comparison with Contemporary Time Series Storage Solutions
    4.3.7 Practical Usage of MetricQ
  4.4 Conclusion
5 Energy Efficiency Analysis
  5.1 General Energy Efficiency Analysis Scenarios
    5.1.1 Live Visualization of Power Measurements
    5.1.2 Visualization of Long-Term Measurements
    5.1.3 Integration in Application Performance Traces
    5.1.4 Graphical Analysis of Application Power Traces
  5.2 Correlating Power Measurements with Application Events
    5.2.1 Challenges for Time Synchronization of Power Measurements
    5.2.2 Reliable Automatic Time Synchronization with Correlation Sequences
    5.2.3 Creating a Correlation Signal on a Power Measurement Channel
    5.2.4 Processing the Correlation Signal and Measured Power Values
    5.2.5 Common Oversampling of the Correlation Signals at Different Rates
    5.2.6 Evaluation of Correlation and Time Synchronization
  5.3 Use Cases for Application Power Traces
    5.3.1 Analyzing Complex Power Anomalies
    5.3.2 Quantifying C-State Transitions
    5.3.3 Measuring the Dynamic Power Consumption of HPC Applications
  5.4 Conclusion
6 Summary and Outloo
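The abstract above mentions time synchronization through cross-correlation of a deliberately influenced power signal. The following is only a minimal sketch of that general idea, not the thesis's implementation: the function name `cross_correlation_offset`, the on/off load pattern, and the power levels are all hypothetical, and both signals are assumed to share one sampling rate.

```python
def cross_correlation_offset(reference, measured, max_lag):
    """Return the lag (in samples) at which `measured` best matches `reference`.

    A positive result means the pattern appears later in `measured`.
    """
    def mean_centered(values):
        mean = sum(values) / len(values)
        return [v - mean for v in values]

    ref = mean_centered(reference)
    mea = mean_centered(measured)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Sum of products over the region where both signals overlap.
        score = 0.0
        for i, r in enumerate(ref):
            j = i + lag
            if 0 <= j < len(mea):
                score += r * mea[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Hypothetical data: a known on/off load pattern, and a power recording
# (in watts) that shows the same pattern three samples later.
pattern = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1]
delay = 3
idle_power, load_power = 50.0, 120.0
measured = [idle_power] * delay + [
    idle_power + (load_power - idle_power) * p for p in pattern
]

offset = cross_correlation_offset(pattern, measured, max_lag=5)
print(offset)  # 3
```

The recovered offset can then be used to shift the power samples onto the application's time base. A real recording would also contain noise and differing sampling rates, which this sketch does not address.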

Technische Universität Dresden: Qucosa

HAEC-SIM: A Simulation Framework for Highly Adaptive Energy-Efficient Computing Platforms

Author: Bielert Mario
Ciorba Florina M.
Feldhoff Kim
Ilsche Thomas
Nagel Wolfgang E.
Publication venue: 'European Alliance for Innovation n.o.'
Publication date: 01/01/2015

This work presents a new trace-based parallel discrete event simulation framework designed for predicting the behavior of a novel computing platform running energy-aware parallel applications. Discrete event traces capture the runtime behavior of parallel applications on existing systems and form the basis for the simulation. The simulation framework processes the events of the input trace by applying simulation models that modify event properties. Thus, the output is again a set of event traces that describe the predicted application behavior on the simulated target platform. Both input and simulated traces can be visualized and analyzed with established tools. The modular design of the framework enables the simulation of different aspects such as temporal performance and energy efficiency by applying distinct simulation models, e.g.: (i) a performance model for communication that allows evaluating the target communication topology and link properties; (ii) an energy model for computations that is based on measurements of current hardware. We showcase the potential of this simulation by simulating the execution of benchmark applications to explore design alternatives of highly adaptive and energy-efficient computing applications and platforms.
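The trace-in, trace-out principle described in this abstract can be illustrated with a small sketch. This is not HAEC-SIM code: the `Event` type, the `communication_model` function, and the `speedup` parameter are hypothetical, and the trace is assumed to be a single sequential stream of back-to-back events.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Event:
    start: float     # seconds since trace begin
    duration: float  # seconds
    kind: str        # "compute" or "comm" (hypothetical categories)


def communication_model(event, speedup):
    """Hypothetical model: communication is `speedup` times faster on the target."""
    if event.kind == "comm":
        return replace(event, duration=event.duration / speedup)
    return event


def simulate(trace, model, **params):
    """Apply `model` to every event and re-time the events back to back."""
    simulated, clock = [], 0.0
    for event in trace:
        new_event = replace(model(event, **params), start=clock)
        simulated.append(new_event)
        clock += new_event.duration
    return simulated


input_trace = [
    Event(0.0, 2.0, "compute"),
    Event(2.0, 1.0, "comm"),
    Event(3.0, 2.0, "compute"),
]
output_trace = simulate(input_trace, communication_model, speedup=2.0)
for event in output_trace:
    print(event)
```

Because the output has the same shape as the input, further models (for example, a hypothetical energy model) could be chained in the same way, and the result could be inspected with the same tools as the original trace.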

Crossref

edoc

Directory of Open Access Journals

Design Evaluation of a Performance Analysis Trace Repository

Author: Grunzke Richard
Hartmann Volker
Ilsche Thomas
Jejkal Thomas
Knüpfer Andreas
Nagel Wolfgang E.
Neumann Maximilian
Stotzka Rainer
Publication venue: Elsevier
Publication date: 01/01/2017

Crossref

KITopen

Extending the Functionality of Score-P through Plugins: Interfaces and Use Cases

Author: Hackenberg Daniel
Ilsche Thomas
Nagel Wolfgang E.
Schuchart Joseph
Schöne Robert
Tschüter Ronny
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Performance measurement and runtime tuning tools are both vital in the HPC software ecosystem and use similar techniques: the analyzed application is interrupted at specific events and information on the current system state is gathered to be either recorded or used for tuning. One of the established performance measurement tools is Score-P. It supports numerous HPC platforms and parallel programming paradigms. To extend Score-P with support for different back-ends, create a common framework for measurement and tuning of HPC applications, and to enable the re-use of common software components such as implemented instrumentation techniques, this paper makes the following contributions: (I) We describe the Score-P metric plugin interface, which enables programmers to augment the event stream with metric data from supplementary data sources that are otherwise not accessible for Score-P. (II) We introduce the flexible Score-P substrate plugin interface that can be used for custom processing of the event stream according to the specific requirements of either measurement, analysis, or runtime tuning tasks. (III) We provide examples for both interfaces that extend Score-P’s functionality for monitoring and tuning purposes.

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Crossref

Technische Universität Dresden: Qucosa

Verification of Resilient Communication Models for the Simulation of a Highly Adaptive Energy-Efficient Computer

Author: Bielert Mario
Ciorba Florina M.
Feldhoff Kim
Franz Elke
Ilsche Thomas
Nagel Wolfgang E.
Pfennig Stefan
Publication venue: 27th ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC 2015)
Publication date: 01/01/2015

Delivering high performance in an energy-efficient manner is of great importance in conducting research in computational sciences and in daily use of technology. From a computing perspective, a novel concept (the HAEC Box) has been proposed that utilizes innovative ideas of optical and wireless chip-to-chip communication to allow a new level of runtime adaptivity for future computers, which is required to achieve high performance and energy efficiency. HAEC-SIM is an integrated simulation environment designed for the study of the performance and energy costs of the HAEC Box running communication-intensive applications. In this work, we conduct a verification of the implementation of three resilient communication models in HAEC-SIM. The verification involves two NAS Parallel Benchmarks and their simulated execution on a 3D torus system with 16x16x16 nodes with Infiniband links. The simulation results are consistent with those of an independent implementation. Thus, the HAEC-SIM based simulations are accurate in this regard.
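For readers unfamiliar with the topology mentioned above, the sketch below computes the minimal hop count between two nodes of a 16x16x16 3D torus. It illustrates the topology only and does not represent the paper's communication models; the function name `torus_hops` and the coordinate convention are assumptions.

```python
def torus_hops(a, b, dims=(16, 16, 16)):
    """Minimal number of hops between nodes `a` and `b` in a torus.

    Each dimension wraps around, so the distance along one axis is the
    shorter of the direct path and the path across the wrap-around link.
    """
    return sum(
        min(abs(x - y), size - abs(x - y))
        for x, y, size in zip(a, b, dims)
    )


print(torus_hops((0, 0, 0), (15, 8, 1)))  # 1 + 8 + 1 = 10
print(torus_hops((0, 0, 0), (8, 8, 8)))   # farthest node: 24
```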

edoc

Energy-Efficient Databases Using Sweet Spot Frequencies

Author: Aßmann Uwe
Cardoso Jorge
Götz Sebastian
Ilsche Thomas
Kissinger Thomas
Lehner Wolfgang
Nagel Wolfgang E.
Schill Alexander
Spillner Josef
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 12/01/2023

Database management systems (DBMS) are typically tuned for high performance and scalability. Nevertheless, carbon footprint and energy efficiency are becoming increasingly important concerns. Unfortunately, existing studies mainly present theoretical contributions but fall short of proposing practical techniques that administrators or query optimizers could use to increase the energy efficiency of the DBMS. Thus, this paper explores the effect of so-called sweet spots, which are energy-efficient CPU frequencies, on the energy required to execute queries. From our findings, we derive the Sweet Spot Technique, which relies on identifying energy-efficient sweet spots and the optimal number of threads that minimizes energy consumption for a query or an entire database workload. The technique is simple and has a practical implementation leading to energy savings of up to 50% compared to using the nominal frequency and maximum number of threads.
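The selection step described in this abstract, picking the frequency and thread count that minimize energy, can be sketched as follows. This is an assumption-laden illustration rather than the paper's implementation: `find_sweet_spot` is a hypothetical name, the measurement values are invented, and energy is approximated as runtime multiplied by average power.

```python
def find_sweet_spot(measurements):
    """Return the (frequency_ghz, threads) pair with the lowest energy.

    `measurements` maps (frequency_ghz, threads) to
    (runtime_seconds, average_power_watts).
    """
    def energy_joules(item):
        _, (runtime_s, avg_power_w) = item
        return runtime_s * avg_power_w

    config, _ = min(measurements.items(), key=energy_joules)
    return config


# Made-up measurements for one query; not taken from the paper.
measurements = {
    (1.2, 4): (20.0, 40.0),   # 800 J
    (1.8, 4): (14.0, 50.0),   # 700 J
    (2.6, 4): (10.0, 80.0),   # 800 J
    (2.6, 8): (7.0, 120.0),   # 840 J
}

sweet_spot = find_sweet_spot(measurements)
print(sweet_spot)  # (1.8, 4)
```

In this invented data set, the fastest configuration is not the most energy-efficient one, which is the kind of trade-off a sweet spot is meant to capture.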

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa

Dynamic Fine-Grained Scheduling for Energy-Efficient Main-Memory Queries

Author: Ailamaki Anastasia
Ilsche Thomas
Kissinger Thomas
Lehner Wolfgang
Liarou Erietta
Porobic Danica
Psaroudakis Iraklis
Tözün Pinar
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014

Power and cooling costs are some of the highest costs in data centers today, which makes improvements in energy efficiency crucial. Energy efficiency is also a major design point for chips that power whole ranges of computing devices. One important goal in this area is energy proportionality, arguing that the system's power consumption should be proportional to its performance. Currently, a major trend among server processors, which stems from the design of chips for mobile devices, is the inclusion of advanced power management techniques, such as dynamic voltage-frequency scaling, clock gating, and turbo modes. A lot of recent work on energy efficiency of database management systems is focused on coarse-grained power management at the granularity of multiple machines and whole queries. These techniques, however, cannot efficiently adapt to the frequently fluctuating behavior of contemporary workloads. In this paper, we argue that databases should employ a fine-grained approach by dynamically scheduling tasks using precise hardware models. These models can be produced by calibrating operators under different combinations of scheduling policies, parallelism, and memory access strategies. The models can be employed at run-time for dynamic scheduling and power management in order to improve the overall energy efficiency. We experimentally show that energy efficiency can be improved by up to 4x for fundamental memory-intensive database operations, such as scans.

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Energy Measurements of High Performance Computing Systems: From Instrumentation to Analysis

Author: Ilsche Thomas
Publication date: 31/07/2020

HSSS - Hochschulschriftenserver der SLUB