Search CORE

157 research outputs found

EClass: An execution classification approach to improving the energy-efficiency of software via machine learning

Author: Chan WK
Kan EYY
Tse TH
Publication venue: 'Elsevier BV'
Publication date: 01/01/2012
Field of study

Energy efficiency at the software level has gained much attention in the past decade. This paper presents a performance-aware frequency assignment algorithm for reducing processor energy consumption using Dynamic Voltage and Frequency Scaling (DVFS). Existing energy-saving techniques often rely on simplified predictions or domain knowledge to extract energy savings for specialized software (such as multimedia or mobile applications) or hardware (such as NPU or sensor nodes). We present an innovative framework, known as EClass, for general-purpose DVFS processors by recognizing short and repetitive utilization patterns efficiently using machine learning. Our algorithm is lightweight and can save up to 52.9% of the energy consumption compared with the classical PAST algorithm. It achieves an average savings of 9.1% when compared with an existing online learning algorithm that also utilizes the statistics from the current execution only. We have simulated the algorithms on a cycle-accurate power simulator. Experimental results show that EClass can effectively save energy for real life applications that exhibit mixed CPU utilization patterns during executions. Our research challenges an assumption among previous work in the research community that a simple and efficient heuristic should be used to adjust the processor frequency online. Our empirical result shows that the use of an advanced algorithm such as machine learning can not only compensate for the energy needed to run such an algorithm, but also outperforms prior techniques based on the above assumption. © 2011 Elsevier Inc. All rights reserved.postprin

HKU Scholars Hub

Exploiting Performance Counters to Predict and Improve Energy Performance of HPC Systems

Author: Da Costa Georges
Lefèvre Laurent
Pierson Jean-Marc
Stolf Patricia
Tsafack Chetsa Ghislain Landry
Publication venue: 'Elsevier BV'
Publication date: 08/08/2013
Field of study

International audienceHardware monitoring through performance counters is available on almost all modern processors. Although these counters are originally designed for performance tuning, they have also been used for evaluating power consumption. We propose two approaches for modelling and understanding the behaviour of high performance computing (HPC) systems relying on hardware monitoring counters. We evaluate the effectiveness of our system modelling approach considering both optimising the energy usage of HPC systems and predicting HPC applications' energy consumption as target objectives. Although hardware monitoring counters are used for modelling the system, other methods -- including partial phase recognition and cross platform energy prediction -- are used for energy optimisation and prediction. Experimental results for energy prediction demonstrate that we can accurately predict the peak energy consumption of an application on a target platform; whereas, results for energy optimisation indicate that with no a priori knowledge of workloads sharing the platform we can save up to 24\% of the overall HPC system's energy consumption under benchmarks and real-life workloads

HAL-ENS-LYON

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1

Portable, scalable, per-core power estimation for intelligent resource management

Author: Bhadauria M
Cesati M
Gioiosa R
Goel B
McKee SA
Singh K
Publication venue: IEEE
Publication date: 01/01/2010
Field of study

Performance, power, and temperature are now all first-order design constraints. Balancing power efficiency, thermal constraints, and performance requires some means to convey data about real-time power consumption and temperature to intelligent resource managers. Resource managers can use this information to meet performance goals, maintain power budgets, and obey thermal constraints. Unfortunately, obtaining the required machine introspection is challenging. Most current chips provide no support for per-core power monitoring, and when support exists, it is not exposed to software. We present a methodology for deriving per-core power models using sampled performance counter values and temperature sensor readings. We develop application-independent models for four different (four- to eight-core) platforms, validate their accuracy, and show how they can be used to guide scheduling decisions in power-aware resource managers. Model overhead is negligible, and estimations exhibit 1.1%-5.2% per-suite median error on the NAS, SPEC OMP, and SPEC 2006 benchmarks (and 1.2%-4.4% overall)

Chalmers Research

Energy-efficient processor design using multiple clock domains with dynamic voltage and frequency scaling

Author: Balasubramonian Rajeev
Semeraro Greg
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2002
Field of study

Journal ArticleAs clock frequency increases and feature size decreases, clock distribution and wire delays present a growing challenge to the designers of singly-clocked, globally synchronous systems. We describe an alternative approach, which we call a Multiple Clock Domain (MCD) processor in which the chip is divided into several (coarse-grained) clock domains, within which independent voltage and frequency scaling can be performed. Boundaries between domains are chosen to exploit existing queues, thereby minimizing inter-domain synchronization costs. We propose four clock domains, corresponding to the front end (including LI instruction cache), integer units, floating point units, and load-store units (including Ll data cache and L2 cache). We evaluate this design using a simulation infrastructure based on SimpleScalar and Wattch. In an attempt to quantify potential energy savings independent of any particular on-line control strategy, we use of-line analysis of traces from a single-speed run of each of our benchmark applications to identify profitable reconfiguration points for a subsequent dynamic scaling run. Dynamic runs incorporate a detailed model of inter-domain synchronization delays, with latencies for intra-domain scaling similar to the whole-chip scaling latencies of Intel XScale and Transmeta LongRun technologies. Using applications from the MediaBench, Olden, and SPEC2000 benchmark suites, we obtain an average energy-delay product improvement of 20% with MCD compared to a modest 3% savings from voltage scaling a single clock and voltage system

Power Characterisation of Shared-Memory HPC Systems

Author: Balladini Javier
De Giusti Armando Eduardo
De Giusti Armando Eduardo
Luque Fadón Emilio
Naiouf Marcelo
Pesado Patricia Mabel
Rexachs del Rosario Dolores
Rucci Enzo
Simari Guillermo Ricardo
Suppi Remo
Publication venue: Editorial de la Universidad Nacional de La Plata (EDULP)
Publication date: 16/09/2019
Field of study

Energy consumption has become one of the greatest challenges in the field of High Performance Computing (HPC). Besides its impact on the environment, energy is a limiting factor for the HPC. Keeping the power consumption of a system below a threshold is one of the great problems; and power prediction can help to solve it. The power characterisation can be used to know the power behaviour of the system under study, and to be a support to reach the power prediction. Furthermore, it could be used to design power-aware application programs. In this article we propose a methodology to characterise the power consumption of shared-memory HPC systems. Our proposed methodology involves the finding of influence factors on power consumed by the systems. It is similar to previous works, but we propose an in-deep approach that can help us to get a better power characterisation of the system. We apply our methodology to characterise an Intel server platform and the results show that we can find a more extended set of influence factors on power consumption.Red de Universidades con Carreras en Informática (RedUNCI

Power Characterisation of Shared-Memory HPC Systems

Author: Balladini Javier
De Giusti Armando Eduardo
De Giusti Armando Eduardo
Luque Fadón Emilio
Naiouf Marcelo
Pesado Patricia Mabel
Rexachs del Rosario Dolores
Rucci Enzo
Simari Guillermo Ricardo
Suppi Remo
Publication venue: Editorial de la Universidad Nacional de La Plata (EDULP)
Publication date: 31/03/2013
Field of study

Network Processors and Next Generation Networks: Design, Applications, and Perspectives

Author: VITUCCI FABIO
Publication venue: 'Pisa University Press'
Publication date: 11/04/2008
Field of study

Network Processors (NPs) are hardware platforms born as appealing solutions for packet processing devices in networking applications. Nowadays, a plethora of solutions exists, with no agreement on a common architecture. Each vendor has proposed its specific solution and no official standard still exists. The common features of all proposals are a hierarchy of processors, with a general purpose processor and several units specialized for packet processing, a series of memory devices with different sizes and latencies, a low-level programmability. The target is a platform for networking applications with low time to market and high time in market, thanks to a high flexibility and a programmability simpler than that of ASICs, for example. After about ten years since the "birth" of network processors, this research activity wants to make an analytical balance of their development and usage. Many authoritative opinions suggest that NPs have been "outdated" by multicore or manycore systems, which provide general purpose environments and some specialized cores. The main reasons of these negative opinions are the hard programmability of NPs, which often requires the knowledge of private microcode, or the excessive architectural limits, such as reduced memories and minimal instruction store. Our research shows that Network Processors can be appealing for different applications in networking area, and many interesting solutions can be obtained, which present very high performance, outscoring current solutions. However, the issues of hard programming and remarkable limits exist, and they could be alleviated only by providing almost a comprehensive programming environment and a proper design in terms of processing and memory resources. More e cient solutions can be surely provided, but the experience of network processors has produced an important legacy in developing packet processing engines. In this work, we have realized many devices for networking purposes based on NP platform, in order to understand the complexity of programming, the flexibility of design, the complexity of tasks that can be implemented, the maximum depth of packet processing, the performance of such devices, the real usefulness of NPs in network devices. All these features have been accurately analyzed and will be illustrated in this thesis. Many remarkable results have been obtained, which confirm the Network Processors as appealing solutions for network devices. Moreover, the research on NPs have lead us to analyze and solve more general issues, related for instance to multiprocessor systems or to processors with no big available memory. In particular, the latter issue lead us to design many interesting data structures for set representation and membership query, which are based on randomized techniques and allow for big memory savings

Automatic Performance Setting for Dynamic Voltage Scaling

Author: Flautner Krisztián
Mudge Trevor N.
Reinhardt Steve
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2001
Field of study

The emphasis on processors that are both low power and high performance has resulted in the incorporation of dynamic voltage scaling into processor designs. This feature allows one to make fine granularity tradeoffs between power use and performance, provided there is a mechanism in the OS to control that tradeoff. In this paper, we describe a novel software approach to automatically controlling dynamic voltage scaling in order to optimize energy use. Our mechanism is implemented in the Linux kernel and requires no modification of user programs. Unlike previous automated approaches, our method works equally well with irregular and multiprogrammed workloads. Moreover, it has the ability to ensure that the quality of interactive performance is within user specified parameters. Our experiments show that as a result of our algorithm, processor energy savings of as much as 75% can be achieved with only a minimal impact on the user experience.Peer Reviewedhttp://deepblue.lib.umich.edu/bitstream/2027.42/41391/1/11276_2004_Article_5091297.pd

CiteSeerX