1,152 research outputs found
Run-Time Power Modelling in Embedded GPUs with Dynamic Voltage and Frequency Scaling
This paper investigates the application of a robust CPU-based power modelling
methodology that performs an automatic search of explanatory events derived
from performance counters to embedded GPUs. A 64-bit Tegra TX1 SoC is
configured with DVFS enabled and multiple CUDA benchmarks are used to train and
test models optimized for each frequency and voltage point. These optimized
models are then compared with a simpler unified model that uses a single set of
model coefficients for all frequency and voltage points of interest. To obtain
this unified model, a number of experiments are conducted to extract
information on idle, clock and static power to derive power usage from a single
reference equation. The results show that the unified model offers competitive
accuracy with an average 5\% error with four explanatory variables on the test
data set and it is capable to correctly predict the impact of voltage,
frequency and temperature on power consumption. This model could be used to
replace direct power measurements when these are not available due to hardware
limitations or worst-case analysis in emulation platforms
LEGaTO: first steps towards energy-efficient toolset for heterogeneous computing
LEGaTO is a three-year EU H2020 project which started in December 2017. The LEGaTO project will leverage task-based programming models to provide a software ecosystem for Made-in-Europe heterogeneous hardware composed of CPUs, GPUs, FPGAs and dataflow engines. The aim is to attain one order of magnitude energy savings from the edge to the converged cloud/HPC.Peer ReviewedPostprint (author's final draft
Performance and Power Analysis of HPC Workloads on Heterogenous Multi-Node Clusters
Performance analysis tools allow application developers to identify and characterize the inefficiencies that cause performance degradation in their codes, allowing for application optimizations. Due to the increasing interest in the High Performance Computing (HPC) community towards energy-efficiency issues, it is of paramount importance to be able to correlate performance and power figures within the same profiling and analysis tools. For this reason, we present a performance and energy-efficiency study aimed at demonstrating how a single tool can be used to collect most of the relevant metrics. In particular, we show how the same analysis techniques can be applicable on different architectures, analyzing the same HPC application on a high-end and a low-power cluster. The former cluster embeds Intel Haswell CPUs and NVIDIA K80 GPUs, while the latter is made up of NVIDIA Jetson TX1 boards, each hosting an Arm Cortex-A57 CPU and an NVIDIA Tegra X1 Maxwell GPU.The research leading to these results has received funding from the European Community’s Seventh Framework Programme [FP7/2007-2013] and Horizon 2020 under the Mont-Blanc projects [17], grant agreements n. 288777, 610402 and 671697. E.C. was partially founded by “Contributo 5 per mille assegnato all’Università degli Studi di Ferrara-dichiarazione dei redditi dell’anno 2014”. We thank the University of Ferrara and INFN Ferrara for the access to the COKA Cluster. We warmly thank the BSC tools group, supporting us for the smooth integration and test of our setup within Extrae and Paraver.Peer ReviewedPostprint (published version
A survey on run-time power monitors at the edge
Effectively managing energy and power consumption is crucial to the success of the design of any computing system, helping mitigate the efficiency obstacles given by the downsizing of the systems while also being a valuable step towards achieving green and sustainable computing. The quality of energy and power management is strongly affected by the prompt availability of reliable and accurate information regarding the power consumption for the different parts composing the target monitored system. At the same time, effective energy and power management are even more critical within the field of devices at the edge, which exponentially proliferated within the past decade with the digital revolution brought by the Internet of things. This manuscript aims to provide a comprehensive conceptual framework to classify the different approaches to implementing run-time power monitors for edge devices that appeared in literature, leading the reader toward the solutions that best fit their application needs and the requirements and constraints of their target computing platforms. Run-time power monitors at the edge are analyzed according to both the power modeling and monitoring implementation aspects, identifying specific quality metrics for both in order to create a consistent and detailed taxonomy that encompasses the vast existing literature and provides a sound reference to the interested reader
A survey of emerging architectural techniques for improving cache energy consumption
The search goes on for another ground breaking phenomenon to reduce the ever-increasing disparity between the CPU performance and storage. There are encouraging breakthroughs in enhancing CPU performance through fabrication technologies and changes in chip designs but not as much luck has been struck with regards to the computer storage resulting in material negative system performance. A lot of research effort has been put on finding techniques that can improve the energy efficiency of cache architectures. This work is a survey of energy saving techniques which are grouped on whether they save the dynamic energy, leakage energy or both. Needless to mention, the aim of this work is to compile a quick reference guide of energy saving techniques from 2013 to 2016 for engineers, researchers and students
Exceeding Conservative Limits: A Consolidated Analysis on Modern Hardware Margins
Modern large-scale computing systems (data centers, supercomputers, cloud and
edge setups and high-end cyber-physical systems) employ heterogeneous
architectures that consist of multicore CPUs, general-purpose many-core GPUs,
and programmable FPGAs. The effective utilization of these architectures poses
several challenges, among which a primary one is power consumption. Voltage
reduction is one of the most efficient methods to reduce power consumption of a
chip. With the galloping adoption of hardware accelerators (i.e., GPUs and
FPGAs) in large datacenters and other large-scale computing infrastructures, a
comprehensive evaluation of the safe voltage reduction levels for each
different chip can be employed for efficient reduction of the total power. We
present a survey of recent studies in voltage margins reduction at the system
level for modern CPUs, GPUs and FPGAs. The pessimistic voltage guardbands
inserted by the silicon vendors can be exploited in all devices for significant
power savings. On average, voltage reduction can reach 12% in multicore CPUs,
20% in manycore GPUs and 39% in FPGAs.Comment: Accepted for publication in IEEE Transactions on Device and Materials
Reliabilit
- …