Search CORE


Explicit uncore frequency scaling for energy optimisation policies with EAR in Intel architectures

Authors: Alonso Jane Lluís; Aneas Gómez Jordi; Corbalán González Julita; Vidal Teruel Oriol
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2021

EAR is an energy management framework that offers three main services: energy accounting, energy control and energy optimisation. The latter is provided by the EAR runtime library (EARL), a dynamic, transparent and lightweight runtime library that provides energy optimisation and control. It implements energy optimisation policies that select the optimal CPU frequency based on runtime application characteristics and policy settings. Because EARL defines a policy API and a plugin mechanism, different policies can be evaluated easily. In this paper we propose and evaluate the use of explicit Uncore Frequency Scaling (explicit UFS) in Intel architectures to increase energy-saving opportunities where the hardware cannot select the optimal frequency for the Integrated Memory Controller (IMC). We extended the min_energy_to_solution policy to select both the CPU and IMC frequencies, and we evaluated it with a set of kernels and six real applications. Results showed an average energy saving of 9% with an average time penalty of 3%. In some use cases, explicit UFS achieved up to 8% more energy savings than hardware UFS. This work has been funded by the BSC-Lenovo collaboration agreement. Peer Reviewed. Postprint (author's final draft)

UPCommons. Open Knowledge Portal of the UPC

Correlating workload behavior with core voltage variability on modern x86 architectures with machine learning

Author: Κανελλής Κωνσταντίνος Γ.
Publication date: 01/01/2018

University of Thessaly Institutional Repository


Machine Learning for Architectural Design Space Exploration and Resource Control

Author: Penney Drew D.
Publication venue: Oregon State University

Machine learning has enabled significant advancements in diverse fields, yet, with a few exceptions, it has had limited impact on computer architecture. Recent work, however, has begun to explore its broader application to design, optimization, and simulation. Notably, machine-learning-based strategies often surpass prior state-of-the-art analytical, heuristic, and human-expert approaches. This thesis first reviews existing work applying machine learning to architecture, ranging from simulation and run-time optimization to the design of individual components such as the memory system, branch predictors, networks-on-chip, and GPUs. Next, the thesis presents a novel deep-reinforcement-learning framework for design space exploration. Finally, the thesis introduces an innovative strategy for resource optimization with multiple co-scheduled workloads. Taken together, these works point to a promising future for machine-learning-based architectural design.

ScholarsArchive@OSU

Enabling Hyperscale Web Services

Author: Sriraman Akshitha
Publication date: 01/01/2021

Modern web services such as social media, online messaging, web search, video streaming, and online banking often support billions of users, requiring data centers that scale to hundreds of thousands of servers, i.e., hyperscale. In fact, the world continues to expect hyperscale computing to drive more futuristic applications such as virtual reality, self-driving cars, conversational AI, and the Internet of Things. This dissertation presents technologies that will enable tomorrow’s web services to meet the world’s expectations. The key challenge in enabling hyperscale web services arises from two important trends. First, over the past few years, there has been a radical shift in hyperscale computing due to an unprecedented growth in data, users, and web service software functionality. Second, modern hardware can no longer support this growth in hyperscale trends due to a decline in hardware performance scaling. To enable this new hyperscale era, hardware architects must become more aware of hyperscale software needs and software researchers can no longer expect unlimited hardware performance scaling. In short, systems researchers can no longer follow the traditional approach of building each layer of the systems stack separately. Instead, they must rethink the synergy between the software and hardware worlds from the ground up. This dissertation establishes such a synergy to enable futuristic hyperscale web services. This dissertation bridges the software and hardware worlds, demonstrating the importance of that bridge in realizing efficient hyperscale web services via solutions that span the systems stack. The specific goal is to design software that is aware of new hardware constraints and architect hardware that efficiently supports new hyperscale software requirements. 
This dissertation spans two broad thrusts: (1) a software and (2) a hardware thrust to analyze the complex hyperscale design space and use insights from these analyses to design efficient cross-stack solutions for hyperscale computation. In the software thrust, this dissertation contributes uSuite, the first open-source benchmark suite of web services built with a new hyperscale software paradigm, that is used in academia and industry to study hyperscale behaviors. Next, this dissertation uses uSuite to study software threading implications in light of today’s hardware reality, identifying new insights in the age-old research area of software threading. Driven by these insights, this dissertation demonstrates how threading models must be redesigned at hyperscale by presenting an automated approach and tool, uTune, that makes intelligent run-time threading decisions. In the hardware thrust, this dissertation architects both commodity and custom hardware to efficiently support hyperscale software requirements. First, this dissertation characterizes commodity hardware’s shortcomings, revealing insights that influenced commercial CPU designs. Based on these insights, this dissertation presents an approach and tool, SoftSKU, that enables cheap commodity hardware to efficiently support new hyperscale software paradigms, improving the efficiency of real-world web services that serve billions of users, saving millions of dollars, and meaningfully reducing the global carbon footprint. This dissertation also presents a hardware-software co-design, uNotify, that redesigns commodity hardware with minimal modifications by using existing hardware mechanisms more intelligently to overcome new hyperscale overheads. Next, this dissertation characterizes how custom hardware must be designed at hyperscale, resulting in industry-academia benchmarking efforts, commercial hardware changes, and improved software development. 
Based on this characterization’s insights, this dissertation presents Accelerometer, an analytical model that estimates gains from hardware customization. Multiple hyperscale enterprises and hardware vendors use Accelerometer to make well-informed hardware decisions. PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/169802/1/akshitha_1.pd

Deep Blue Documents at the University of Michigan

Power Bounded Computing on Current & Emerging HPC Systems

Author: Zou Pengfei
Publication venue: Clemson University Libraries
Publication date: 01/05/2020

Power has become a critical constraint on the evolution of large-scale High Performance Computing (HPC) systems and commercial data centers. For physical, technical, and economic reasons, this constraint spans almost every level of computing technology, from IC chips all the way up to data centers. To cope with this reality, it is necessary to understand how available or permissible power affects the design and performance of emerging computer systems. For this reason, we propose power bounded computing and corresponding technologies to optimize performance on HPC systems with limited power budgets. This dissertation has multiple research objectives, centered on understanding the interaction between performance, power bounds, and a hierarchical power management strategy. First, we develop heuristics and application-aware power allocation methods to improve application performance on a single node. Second, we develop algorithms to coordinate power across the nodes and components of a cluster based on application characteristics and the power budget. Third, we investigate performance interference induced by hardware and power contention, and propose a contention-aware job scheduling scheme that maximizes system throughput under given power budgets on node-sharing systems. Fourth, we extend this work to GPU-accelerated systems and workloads and develop an online dynamic performance and power approach that meets both performance requirements and power-efficiency goals. Power bounded computing improves performance scalability and power efficiency and decreases the operating costs of HPC systems and data centers. This dissertation opens up several new avenues of research in power bounded computing to address the power challenges of HPC systems. The proposed power and resource management techniques provide new directions and guidelines for green exascale computing and other computing systems.

Clemson University: TigerPrints

Energy Concerns with HPC Systems and Applications

Authors: Dokladal Petr; Mesri Youssef; Nana Roblex; Tadonki Claude
Publication date: 31/08/2023

For various reasons, including those related to climate change, energy has become a critical concern in all relevant activities and technical designs. For computing activities specifically, the problem is exacerbated by the emergence and pervasiveness of so-called intelligent devices. On the application side, we point out the special topic of Artificial Intelligence, which clearly needs efficient computing support in order to succeed in its purpose of being a ubiquitous assistant. There are two main contexts where energy is a top-priority concern: embedded computing and supercomputing. For the former, power consumption is critical because the amount of energy available to the devices is limited. For the latter, dissipated heat is a serious source of failure, and the financial cost of energy is likely to be a significant part of the maintenance budget. On a single computer, the problem is commonly framed in terms of electrical power consumption. In this paper, written in the form of a survey, we depict the landscape of energy concerns in computing activities from both the hardware and the software standpoints. Comment: 20 pages

arXiv.org e-Print Archive


QoS-aware mechanisms for improving cost-efficiency of datacenters

Author: Zhu Haishan
Publication date: 22/01/2021

Warehouse Scale Computers (WSCs) promise high cost-efficiency by amortizing power, cooling, and management overheads. WSCs today host a large variety of jobs with two broad categories of performance requirements: latency-critical (LC) and best-effort (BE). Ideally, to fully utilize all hardware resources, WSC operators could simply fill all the nodes with computing jobs. Unfortunately, because colocated jobs contend for shared resources, systems with high loads often experience performance degradation, which negatively impacts the Quality of Service (QoS) of LC jobs. In practice, service providers usually over-provision resources to avoid any interference with LC jobs, leading to significant resource inefficiencies. In this dissertation, I explore opportunities across different system-abstraction layers to improve the cost-efficiency of datacenters by increasing the resource utilization of WSCs with little or no impact on the performance of LC jobs. The dissertation has three main components. First, I explore opportunities to improve the throughput of multicore systems by reducing the performance variation of LC jobs. The main insight is that by reshaping the latency distribution curve, the performance headroom of LC jobs can be effectively converted into improved BE throughput. I develop, implement, and evaluate a runtime system that achieves this goal with existing hardware, leveraging the cache partitioning, per-core frequency scaling, and thread masking of server processors. Evaluation results show that the proposed solution enables 30% higher system throughput than solutions proposed in prior work while maintaining at least as good QoS for LC jobs. Second, I study resource contention in near-future heterogeneous memory architectures (HMAs). This study is motivated by recent developments in non-volatile memory (NVM) technologies, which enable higher storage density at the cost of lower performance.
To understand the performance and QoS impact of HMAs, I design and implement a performance emulator in the Linux kernel that runs unmodified workloads with high accuracy, low overhead, and complete transparency. I further propose and evaluate multiple data and resource management QoS mechanisms, such as locality-aware page admission, occupancy management, and write buffer jailing. Third, I focus on accelerated machine learning (ML) systems. By profiling the performance of production workloads and accelerators, I show that accelerated ML tasks are highly sensitive to main memory interference due to fine-grained interaction between CPU and accelerator tasks. As a result, memory resource contention can significantly decrease the performance and efficiency gains of accelerators. I propose a runtime system that leverages existing hardware capabilities and show 17% higher system efficiency compared to previous approaches. This study further exposes opportunities for future processor architectures. Electrical and Computer Engineering

Texas ScholarWorks


High-fidelity error injection and acceleration techniques

Author: Chang Chun-Kai
Publication date: 20/07/2021

As technology scales down, the likelihood of hardware errors that silently corrupt the results of applications is increasing. Evaluating the resilience of applications against hardware errors is thus of significant concern. Current evaluation techniques based on error injection are either low-fidelity or inefficient in their use of computing resources. This dissertation demonstrates that sophisticated integration of injectors across abstraction layers and novel sampling algorithms can significantly improve both fidelity and efficiency. Specifically, this dissertation describes an open-source instruction-level error injector that generates high-fidelity hardware errors due to particle strikes and voltage droops. Two acceleration techniques, nested Monte Carlo and Injection-Point Overprovisioning, are proposed to speed up error injection campaigns by one to two orders of magnitude. This dissertation also answers the question of when high fidelity is needed to evaluate the impact of hardware errors on applications and the effectiveness of error detectors. Electrical and Computer Engineering

Texas ScholarWorks

Dependable Embedded Systems

Authors: Dutt Nikil; Henkel Jörg
Publication venue: Springer Science and Business Media LLC

This Open Access book introduces readers to many new techniques for enhancing and optimizing reliability in embedded systems, which have emerged particularly within the last five years. It introduces the most prominent reliability concerns from today’s point of view and briefly recapitulates the progress in the community so far. Unlike other books that focus on a single abstraction level, such as the circuit level or the system level alone, this book addresses reliability challenges across different levels, from the physical level all the way up to the system level (cross-layer approaches). The book aims to demonstrate how new hardware/software co-design solutions can be proposed to effectively mitigate reliability degradation such as transistor aging, processor variation, temperature effects, and soft errors. It provides readers with the latest insights into novel cross-layer methods and models with respect to the dependability of embedded systems; describes cross-layer approaches that can improve reliability through techniques that are proactively designed with respect to techniques at other layers; and explains run-time adaptation and concepts and means of self-organization for achieving error resiliency in complex, future many-core systems.

OAPEN Library