Phase-based tuning: better utilized performance asymmetric multicores

Author: Tyler Sondag
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2011

The latest trend towards performance asymmetry among cores on a single chip of a multicore processor is posing new challenges. For effective utilization of these performance-asymmetric multicore processors, code sections of a program must be assigned to cores such that the resource needs of each code section closely match the resources available at its assigned core. Determining this assignment manually is tedious and error-prone, and it significantly complicates software development. To solve this problem, this thesis describes a transparent and fully automatic process called phase-based tuning, which adapts an application to effectively utilize performance-asymmetric multicores. The basic idea behind this technique is to statically compute groups of program segments that are expected to behave similarly at runtime. Then, at runtime, the behavior of a few code segments is used to infer the behavior and preferred core assignment of all similar code segments with low overhead. Compared to the stock Linux scheduler, on systems that are asymmetric with respect to clock frequency, a 36% average process speedup is observed while maintaining fairness, with negligible overhead.

A key component of phase-based tuning is grouping program segments with similar behavior. The importance of the various similarity metrics is likely to differ for each target asymmetric multicore processor. Determining groups using too many metrics may result in a grouping that differentiates between program segments based on properties that are irrelevant for the target machine. Using too few metrics may cause relevant metrics to be ignored, so that segments with different behavior are considered similar. Therefore, to solve this problem and enable phase-based tuning for a wide range of performance-asymmetric multicores, this thesis also describes a new technique called lazy grouping. Lazy grouping statically (at compile and install times) groups program segments that are expected to have similar behavior. The basic idea is to combine extensive compile-time analysis with intelligent group assignment at install time, when the target system is known. Across a wide range of machines, lazy grouping is shown to be more than 90% accurate for nearly all target machines and asymmetric multicores.

Digital Repository @ Iowa State University (ISU)
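
As a rough illustration of the two ideas in the abstract above (static grouping of similar segments, then runtime inference from a few representatives), the Python sketch below groups segments using only the metrics selected for a target machine and times just the first segment of each group on a fast and a slow core. Everything here is an assumption for illustration: the metric names, the bucket quantization, the `threshold` speedup cutoff, the toy timing model, and the helper names `lazy_group` and `PhaseTuner` are hypothetical; the thesis's actual compile-time analysis and grouping criteria are not reproduced.

```python
from typing import Callable, Dict, List, Tuple

# Static profile of one program segment: metric name -> value in [0, 1].
# Metric names ("mem_ops", "fp_ops") are hypothetical examples.
Profile = Dict[str, float]
GroupKey = Tuple[int, ...]


def lazy_group(profiles: Dict[str, Profile], relevant: List[str],
               bucket: float = 0.25) -> Dict[str, GroupKey]:
    """Install-time grouping (sketch): key each segment only on the metrics
    assumed relevant to the target machine, quantized into buckets of width
    `bucket`. Segments with equal keys are treated as behaving alike."""
    return {seg: tuple(int(p[m] // bucket) for m in relevant)
            for seg, p in profiles.items()}


class PhaseTuner:
    """Runtime inference (sketch) for a machine with one "fast" and one "slow"
    core type: the first segment of each group is timed on both core types,
    and that decision is reused for every other segment in the same group."""

    def __init__(self, groups: Dict[str, GroupKey],
                 measure: Callable[[str, str], float], threshold: float = 1.3):
        self.groups = groups
        self.measure = measure        # (segment, core type) -> execution time
        self.threshold = threshold    # hypothetical minimum fast-core speedup
        self.choice: Dict[GroupKey, str] = {}
        self.measurements = 0

    def core_for(self, seg: str) -> str:
        key = self.groups[seg]
        if key not in self.choice:
            t_fast = self.measure(seg, "fast")
            t_slow = self.measure(seg, "slow")
            self.measurements += 2
            # Reserve the fast core for groups that actually benefit from it.
            speedup = t_slow / t_fast
            self.choice[key] = "fast" if speedup >= self.threshold else "slow"
        return self.choice[key]


# Toy example: three segments with made-up static metrics.
profiles = {
    "loop_a": {"mem_ops": 0.70, "fp_ops": 0.10},
    "loop_b": {"mem_ops": 0.72, "fp_ops": 0.40},
    "loop_c": {"mem_ops": 0.10, "fp_ops": 0.05},
}


def toy_measure(seg: str, core: str) -> float:
    # Toy timing model: the fast core runs at 2x the clock, but the
    # memory-bound fraction of a segment does not speed up.
    mem = profiles[seg]["mem_ops"]
    return 1.0 if core == "slow" else mem + (1.0 - mem) / 2.0


# Assume only memory behavior matters on a frequency-asymmetric machine.
groups = lazy_group(profiles, relevant=["mem_ops"])
tuner = PhaseTuner(groups, toy_measure)
assignment = {seg: tuner.core_for(seg) for seg in profiles}
```

Here `loop_b` inherits the core choice measured for `loop_a` without being timed itself. Passing more metrics to `lazy_group` splits groups more finely (adding `fp_ops` separates `loop_a` from `loop_b`), which mirrors the trade-off between too many and too few similarity metrics described in the abstract.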

Integrated CPU and L2 cache voltage scaling using machine learning

Authors: Alexandre Ferreira, Bruce Childers, Cosmin Rusu, Daniel Mossé, Frank Liberato, Nevine AbouGhazaleh, Rami Melhem, Ruibin Xu
Publication venue: Association for Computing Machinery (ACM)

Crossref

Integrated CPU and L2 Cache Voltage Scaling using Machine Learning

Authors: Alexandre Ferreira, Bruce Childers, Cosmin Rusu, Daniel Mossé, Frank Liberato, Nevine AbouGhazaleh, Rami Melhem, Ruibin Xu

Embedded systems serve an emerging and diverse set of applications. As a result, more computational and storage capabilities are added to accommodate ever more demanding applications. Unfortunately, adding more resources typically comes at the expense of higher energy costs. New chip designs with Multiple Clock Domains (MCD) open the opportunity for fine-grain power management within the processor chip. When combined with dynamic voltage scaling (DVS), MCD lets us control the voltage and power of each domain independently. Significant power and energy improvements have been shown for MCD designs compared with managing a single voltage domain for the whole chip, as in traditional chips with global DVS. In this paper, we propose PACSL, a Power-Aware Compiler-based approach using Supervised Learning. PACSL automatically derives an integrated CPU-core and on-chip L2 cache DVS policy tailored to a specific system and workload. Our approach uses supervised machine learning to discover a policy that relies on monitoring a few performance counters. We present our approach, detailing the role of a compiler in constructing a custom power management policy. We also discuss some implementation issues associated with our technique. We show that PACSL improves on traditional power management techniques used in general MCD chips. Our technique saves 22% on average (up to 46%) in energy-delay product over a DVS technique that applies independent DVS decisions in each domain. Compared to no power management, our technique improves energy-delay product by 26% on average (up to 64%).

CiteSeerX
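
The supervised-learning idea in the PACSL abstract above (measure candidate settings for the CPU and L2 clock domains, then learn a mapping from a few performance counters to a setting) can be sketched in Python as follows, together with the energy-delay product (EDP) arithmetic behind the reported savings: a 22% EDP saving means the new policy's EDP is 0.78 times the baseline's. This is a hedged sketch under stated assumptions: the counter choice, the setting labels, the training numbers, and the 1-nearest-neighbor learner are hypothetical stand-ins, not PACSL's actual learner or data, and the compiler's role in PACSL is not modeled.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A DVS setting for the two clock domains: (CPU-core level, L2-cache level).
# The level names are hypothetical placeholders.
Setting = Tuple[str, str]


def edp(energy: float, delay: float) -> float:
    """Energy-delay product; lower is better."""
    return energy * delay


def edp_saving_pct(new_edp: float, baseline_edp: float) -> float:
    """Percent EDP reduction of a policy relative to a baseline."""
    return 100.0 * (1.0 - new_edp / baseline_edp)


def best_setting(results: Dict[Setting, Tuple[float, float]]) -> Setting:
    """Labeling step: for one training interval, pick the (CPU, L2) setting
    whose measured (energy, delay) pair has the lowest EDP."""
    return min(results, key=lambda s: edp(*results[s]))


class NearestNeighborPolicy:
    """Stand-in supervised learner (1-nearest-neighbor). This is NOT claimed
    to be the learner PACSL uses; it only shows a counters -> setting map."""

    def __init__(self) -> None:
        self.samples: List[Tuple[Sequence[float], Setting]] = []

    def fit(self, X: List[Sequence[float]],
            y: List[Setting]) -> "NearestNeighborPolicy":
        self.samples = list(zip(X, y))
        return self

    def predict(self, counters: Sequence[float]) -> Setting:
        # Return the label of the training sample closest in counter space.
        _, label = min(self.samples, key=lambda s: math.dist(s[0], counters))
        return label


# Made-up training data; counters are (L2 miss rate, IPC), both hypothetical.
# Each interval maps a setting to its measured (energy, delay), normalized.
train = [
    ((0.30, 0.4), {("cpu_high", "l2_high"): (1.00, 1.00),
                   ("cpu_low", "l2_high"): (0.60, 1.10),
                   ("cpu_high", "l2_low"): (0.80, 1.40),
                   ("cpu_low", "l2_low"): (0.50, 1.50)}),
    ((0.02, 1.8), {("cpu_high", "l2_high"): (1.00, 1.00),
                   ("cpu_low", "l2_high"): (0.70, 1.60),
                   ("cpu_high", "l2_low"): (0.85, 1.02),
                   ("cpu_low", "l2_low"): (0.60, 1.70)}),
]
X = [counters for counters, _ in train]
y = [best_setting(results) for _, results in train]
policy = NearestNeighborPolicy().fit(X, y)
```

With these toy numbers, the memory-bound interval (high L2 miss rate) is labeled with a low CPU level and a high L2 level, and the compute-bound interval with the opposite, so `policy.predict((0.25, 0.5))` returns `("cpu_low", "l2_high")`. Choosing the two domain levels jointly, rather than one domain at a time, is the point of an integrated policy.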