5,578 research outputs found
Systematic energy characterization of CMP/SMT processor systems via automated micro-benchmarks
Microprocessor-based systems today are composed of multi-core, multi-threaded processors with complex cache hierarchies and gigabytes of main memory. Accurate characterization of such a system, through predictive pre-silicon modeling and/or diagnostic postsilicon measurement based analysis are increasingly cumbersome and error prone. This is especially true of energy-related characterization studies. In this paper, we take the position that automated micro-benchmarks generated with particular objectives in mind hold the key to obtaining accurate energy-related characterization. As such, we first present a flexible micro-benchmark generation framework (MicroProbe) that is used to probe complex multi-core/multi-threaded systems with a variety and range of energy-related queries in mind. We then present experimental results centered around an
IBM POWER7 CMP/SMT system to demonstrate how the systematically generated micro-benchmarks can be used to answer three
specific queries: (a) How to project application-specific (and if needed, phase-specific) power consumption with component-wise breakdowns? (b) How to measure energy-per-instruction (EPI) values for the target machine? (c) How to bound the worst-case (maximum) power consumption in order to determine safe, but practical (i.e. affordable) packaging or cooling solutions? The solution approaches to the above problems are all new. Hardware measurement
based analysis shows superior power projection accuracy (with error margins of less than 2.3% across SPEC CPU2006) as well as max-power stressing capability (with 10.7% increase in processor power over the very worst-case power seen during the execution of SPEC CPU2006 applications).Peer ReviewedPostprint (author’s final draft
On the accuracy and usefulness of analytic energy models for contemporary multicore processors
This paper presents refinements to the execution-cache-memory performance
model and a previously published power model for multicore processors. The
combination of both enables a very accurate prediction of performance and
energy consumption of contemporary multicore processors as a function of
relevant parameters such as number of active cores as well as core and Uncore
frequencies. Model validation is performed on the Sandy Bridge-EP and
Broadwell-EP microarchitectures. Production-related variations in chip quality
are demonstrated through a statistical analysis of the fit parameters obtained
on one hundred Broadwell-EP CPUs of the same model. Insights from the models
are used to explain the performance- and energy-related behavior of the
processors for scalable as well as saturating (i.e., memory-bound) codes. In
the process we demonstrate the models' capability to identify optimal operating
points with respect to highest performance, lowest energy-to-solution, and
lowest energy-delay product and identify a set of best practices for
energy-efficient execution
A Survey on Compiler Autotuning using Machine Learning
Since the mid-1990s, researchers have been trying to use machine-learning
based approaches to solve a number of different compiler optimization problems.
These techniques primarily enhance the quality of the obtained results and,
more importantly, make it feasible to tackle two main compiler optimization
problems: optimization selection (choosing which optimizations to apply) and
phase-ordering (choosing the order of applying optimizations). The compiler
optimization space continues to grow due to the advancement of applications,
increasing number of compiler optimizations, and new target architectures.
Generic optimization passes in compilers cannot fully leverage newly introduced
optimizations and, therefore, cannot keep up with the pace of increasing
options. This survey summarizes and classifies the recent advances in using
machine learning for the compiler optimization field, particularly on the two
major problems of (1) selecting the best optimizations and (2) the
phase-ordering of optimizations. The survey highlights the approaches taken so
far, the obtained results, the fine-grain classification among different
approaches and finally, the influential papers of the field.Comment: version 5.0 (updated on September 2018)- Preprint Version For our
Accepted Journal @ ACM CSUR 2018 (42 pages) - This survey will be updated
quarterly here (Send me your new published papers to be added in the
subsequent version) History: Received November 2016; Revised August 2017;
Revised February 2018; Accepted March 2018
Energy consumption in networks on chip : efficiency and scaling
Computer architecture design is in a new era where performance is increased by replicating processing cores on a chip rather than making CPUs larger and faster. This design strategy is motivated by the superior energy efficiency of the multi-core architecture compared to the traditional monolithic CPU. If the trend continues as expected, the number of cores on a chip is predicted to grow exponentially over time as the density of transistors on a die increases. A major challenge to the efficiency of multi-core chips is the energy used for communication among cores over a Network on Chip (NoC). As the number of cores increases, this energy also increases, imposing serious constraints on design and performance of both applications and architectures. Therefore, understanding the impact of different design choices on NoC power and energy consumption is crucial to the success of the multi- and many-core designs. This dissertation proposes methods for modeling and optimizing energy consumption in multi- and many-core chips, with special focus on the energy used for communication on the NoC. We present a number of tools and models to optimize energy consumption and model its scaling behavior as the number of cores increases. We use synthetic traffic patterns and full system simulations to test and validate our methods. Finally, we take a step back and look at the evolution of computer hardware in the last 40 years and, using a scaling theory from biology, present a predictive theory for power-performance scaling in microprocessor systems
Worst-case energy consumption: A new challenge for battery-powered critical devices
The number of devices connected to the IoT is on the rise, reaching hundreds of billions in the next years. Many devices will implement some type of critical functionality, for instance in the medical market. Energy awareness is mandatory in the design of IoT devices because of their huge impact on worldwide energy consumption and the fact that many of them are battery powered. Critical IoT devices further require addressing new energy-related challenges. On the one hand, factoring in the impact of energy-solutions on device's performance, providing evidence of adherence to domain-specific safety standards. On the other hand, deriving safe worst-case energy consumption (WCEC) estimates is a fundamental building block to ensure the system can continuously operate under a pre-established set of power/energy caps, safely delivering its critical functionality. We analyze for the first time the impact that different hardware physical parameters have on both model-based and measurement-based WCEC modeling, for which we also show the main challenges they face compared to chip manufacturers' current practice for energy modeling and validation. Under the set of constraints that emanate from how certain physical parameters can be actually modeled, we show that measurement-based WCEC is a promising way forward for WCEC estimation.This work has been partially supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under grant TIN2015- 65316-P and the HiPEAC Network of Excellence. Jaume Abella has been partially supported by the MINECO under Ramon y Cajal postdoctoral fellowship number RYC-2013-14717. Carles Hernndez is jointly funded by the MINECO and FEDER funds through grant TIN2014-60404-JIN.Peer ReviewedPostprint (author's final draft
Measuring and Analyzing Energy Consumption of the Data Center
Data centers are continuously expanding, so does the energy consumed to power their infrastructure. Server is the major component of data center’s computer rooms, which runs the most intensive computational workloads and stores the data. Server is responsible for more than a quarter of the total energy consumption of data center. This thesis is focused on analyzing and predicting the energy consumption of the server. Three major components are considered in our study; the processor, the access memory and the network interface controller. We collect data from these components and analyze them using linear regression Lasso model with non-negative coefficients. A power model is proposed for predicting energy consumption at the system-level. The model takes as input CPU cycles and data Translation Lookaside Buffer loads, and predicts the energy consumption of the server with 5.33% median error regardless of its workload
Predictable GPUs Frequency Scaling for Energy and Performance
Dynamic voltage and frequency scaling (DVFS) is an important solution to balance performance and energy consumption, and hardware vendors provide management libraries that allow the programmer to change both memory and core frequencies. The possibility to manually set these frequencies is a great opportunity for application tuning, which can focus on the best applicationdependent setting. However, this task is not straightforward because of the large set of possible configurations and because of the multi-objective nature of the problem, which minimizes energy consumption and maximizes performance.
This paper proposes a method to predict the best core and memory frequency configurations on GPUs for an input OpenCL kernel. Our modeling approach, based on machine learning, first predicts speedup and normalized energy over the default frequency configuration. Then, it combines the two models into a multi-objective one that predicts a Pareto-set of frequency configurations. The approach uses static code features, is built on a set of carefully designed microbenchmarks, and can predict the best frequency settings of a new kernel without executing it. Test results show that our modeling approach is very accurate on predicting extrema points and Pareto set for ten out of twelve test benchmarks, and discover frequency configurations that dominate the default configuration in either energy or performance.DFG, 360291326, CELERITY: Innovative Modellierung fĂĽr Skalierbare Verteilte Laufzeitsystem
- …