Unified Datacenter Power Management Considering On-Chip and Air Temperature Constraints
Current approaches to datacenter power management (workload scheduling, CPU speed control, etc.) focus primarily on keeping
the air temperature surrounding servers within the manufacturer-specified constraint. This is problematic because several CPUs may still violate the on-chip thermal constraint, leading to reliability loss. The primary objective of this work is
to develop a unified approach to datacenter power optimization (by controlling CPU speeds) that accounts for the silicon-level
temperature of VLSI components such as CPUs, the air temperature that directly affects the reliability of other devices
such as disks, and the performance delivered. Our algorithm follows a two-step approach: optimally solving a convex
approximation that assigns continuous frequency values to all CPUs, followed by a discretization step that legalizes the assigned frequencies. The experimental results indicate that our method guarantees that both on-chip CPU and off-chip air temperatures remain within their constraints. In contrast, the traditional approach of
constraining only air temperature either results in on-chip temperature violations on about 40% of the CPUs, or requires 42% more power to pull CPU temperatures back within the constraint by increasing HVAC cooling. NSF grant CCF 093786
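The abstract does not describe how the discretization step works. As a minimal sketch only, assuming that lowering a CPU's frequency never raises its temperature, one conservative way to legalize continuous frequencies is to round each one down to the nearest available level. The function name `legalize` and the frequency levels below are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right


def legalize(continuous_freqs, levels):
    """Round each continuous frequency down to the nearest allowed level.

    Hypothetical illustration: the paper's actual legalization step is not
    specified in the abstract.
    """
    levels = sorted(levels)
    legal = []
    for f in continuous_freqs:
        i = bisect_right(levels, f)  # number of levels that are <= f
        # If f is below every level, fall back to the lowest level.
        # Note that this fallback rounds *up*, so it is not conservative.
        legal.append(levels[max(i - 1, 0)])
    return legal


# Assumed frequency levels in GHz, for illustration only.
levels_ghz = [1.2, 1.6, 2.0, 2.4]
print(legalize([2.31, 1.6, 0.9, 2.9], levels_ghz))
```

Rounding down trades some performance for safety. A real legalization step would presumably re-check the thermal and performance constraints after rounding.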
A unified model for holistic power usage in cloud datacenter servers
Cloud datacenters are compute facilities formed of hundreds or thousands of heterogeneous servers that require significant power to operate effectively. Servers are composed of multiple interacting sub-systems, including applications, microelectronic processors, and cooling, which reflect their respective power profiles via different parameters. What is presently unknown is how to accurately model the holistic power usage of the entire server when all these sub-systems are included together. This becomes increasingly challenging when considering diverse utilization patterns, server hardware characteristics, air and liquid cooling techniques, and, importantly, quantifying the non-electrical energy cost imposed by cooling operation. The challenge arises from the multi-disciplinary expertise required to study server operation holistically. This work provides a unified model for capturing holistic power usage within Cloud datacenter servers. Constructed through controlled laboratory experiments, the model captures the relationship of server power usage between software, hardware, and cooling, agnostic of architecture and cooling type (air and liquid). An exciting prospect is the ability to quantify the amount of non-electrical power consumed through cooling, allowing for more realistic and accurate server power profiles. This work represents the first empirically supported analysis and modeling of holistic power usage for Cloud datacenter servers, and bridges a significant gap between computer science and mechanical engineering research. Model validation through experiments demonstrates an average standard error of 3% for server power usage within both air- and liquid-cooled environments.
Power Management Techniques for Data Centers: A Survey
With the growing use of the internet and exponential growth in the amount of data to be
stored and processed (known as 'big data'), the size of data centers has
greatly increased. This, however, has resulted in a significant increase in the
power consumption of data centers. For this reason, managing the power
consumption of data centers has become essential. In this paper, we highlight
the need for energy efficiency in data centers and survey several
recent architectural techniques designed for power management of data centers.
We also present a classification of these techniques based on their
characteristics. This paper aims to provide insights into the techniques for
improving the energy efficiency of data centers and to encourage designers to
invent novel solutions for managing the large power dissipation of data
centers.
Comment: Keywords: Data Centers, Power Management, Low-power Design, Energy
Efficiency, Green Computing, DVFS, Server Consolidation
Resource Management Algorithms for Computing Hardware Design and Operations: From Circuits to Systems
The complexity of computation hardware has increased at an unprecedented rate over the last few decades. At the chip level, we have entered the era of multi/many-core processors made of billions of transistors. With a transistor budget of this scale, many functions are integrated into a single chip. As such, chips today consist of many heterogeneous cores with intensive interaction among them. At the circuit level, with the end of Dennard scaling, continuously shrinking process technology has imposed a grand challenge on power density. Circuit variation further exacerbates the problem by consuming a substantial time margin. At the system level, the rise of Warehouse Scale Computers and Data Centers has put resource management into a new perspective. The ability to dynamically provision computation resources in these gigantic systems is crucial to their performance. In this thesis, three different resource management algorithms are discussed. The first algorithm assigns adaptivity resources to circuit blocks with a constraint on the overhead. The adaptivity improves the resilience of the circuit to variation in a cost-effective way. The second algorithm manages the link bandwidth resource in application-specific Networks-on-Chip. Quality-of-Service is guaranteed for time-critical traffic in the algorithm, with an emphasis on power. The third algorithm manages the computation resources of the data center while guarding against ill states of the system. Q-learning is employed to meet the dynamic nature of the system, and Linear Temporal Logic is leveraged as a tool to describe temporal constraints. All three algorithms are evaluated by various experiments. The experimental results are compared to several previous works and show the advantage of our methods.
A generic learning multi-agent-system approach for spatio-temporal-, thermal- and energy-aware scheduling
This paper proposes an agent-based approach to the scheduling of jobs in data centers under thermal constraints. The model encompasses both temporal and spatial aspects of the temperature evolution using a unified model, taking into account the dynamics of heat production and dissipation. Agents coordinate to eventually move jobs to the most suitable place and to dynamically adapt the frequency settings of the nodes to the best combination. Several objectives of the agents are compared under different circumstances by an extensive set of experiments.
A Survey of FPGA Optimization Methods for Data Center Energy Efficiency
This article provides a survey of academic literature about field-programmable
gate arrays (FPGAs) and their utilization for energy efficiency
acceleration in data centers. The goal is to critically present the existing
FPGA energy optimization techniques and discuss how they can be applied to such
systems. To do so, the article explores current energy trends and their
projection to the future with particular attention to the requirements set out
by the European Code of Conduct for Data Center Energy Efficiency. The article
then proposes a complete analysis of over ten years of research in energy
optimization techniques, classifying them by purpose, method of application,
and impacts on the sources of consumption. Finally, we conclude with the
challenges and possible innovations we expect for this sector.
Comment: Accepted for publication in IEEE Transactions on Sustainable
Computing
Energy Concerns with HPC Systems and Applications
For various reasons, including those related to climate change, energy
has become a critical concern in all relevant activities and technical designs.
For the specific case of computer activities, the problem is exacerbated by
the emergence and pervasiveness of so-called intelligent devices.
From the application side, we point out the special topic of Artificial
Intelligence, which clearly needs efficient computing support in order to
succeed in its purpose of being a ubiquitous assistant. There are mainly
two contexts where energy is one of the top-priority concerns:
embedded computing and supercomputing. For the former, power consumption
is critical because the amount of energy that is available to the devices is
limited. For the latter, the heat dissipated is a serious source of failure and
the financial cost related to energy is likely to be a significant part of the
maintenance budget. On a single computer, the problem is commonly considered
through the electrical power consumption. In this paper, written in the form of a
survey, we depict the landscape of energy concerns in computer activities, both
from the hardware and the software standpoints.
Comment: 20 pages
Thermal Energy Storage for Datacenters with Phase Change Materials
Datacenters, vast warehouses containing millions of servers that run the internet and the cloud, have experienced double-digit growth for almost two decades. Datacenters cost hundreds of millions of dollars, with the largest now exceeding a billion dollars each, and consume enormous amounts of power: over 2% of all electricity in the US, projected to increase up to 10% by 2030.
The impact of such high compute density, with thousands of individual compute nodes packed together in a small space, is heat: every watt of power used by servers must be removed from the datacenter. This requires active cooling. Air cooling is by far the most common, with an air conditioner or other form of heat exchanger cooling air in the datacenter room and then transporting heat outside the facility to a heat exchanger or similar fixture. Such a system is simple, common, and functional, but inherently inefficient due to the nature of datacenter workloads.
Datacenters primarily serve user-facing workloads; that is, the user requests a search or sends an email, and their query prompts load in the datacenter. The query is handled locally, on a relative geographic scale, to provide a low response time and a positive user experience. This necessitates globally distributed datacenter capacity, but also creates a diurnal load pattern whereby datacenters are most heavily loaded during the peak hours, when users in their region of service are awake and active online, versus the off hours, when users are offline or asleep and query requests are low. Because datacenter infrastructure must be provisioned for peak load, servers, power distribution, and cooling infrastructure are significantly underutilized most of the time.
This dissertation investigates the cooling needs of datacenters and proposes to decouple the work and cooling needs. Specifically, we hypothesize that by storing thermal energy we can reshape the thermal profile of a datacenter to better balance cooling load throughout the day. We call this technique Thermal Time Shifting (TTS). First, we discuss how phase change materials (PCMs) enable TTS and evaluate the potential use scenarios of placing a small amount of PCM inside servers for thermal energy storage. Next, we dive deeper into the potential of thermal energy storage and propose Virtual Melting Temperatures (VMT), a technique that uses active job placement to control the melting and cooling of PCM, enabling a much greater degree of control over the behavior of the thermal profile. Finally, we propose and evaluate Thermal Gradient Transfer (TGT), a technique that uses direct water cooling to move heat straight from CPUs and GPUs to the wax for wider applicability and greater peak cooling load reduction.
PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/147726/1/skachm_1.pdf
Description of skachm_1.pdf: Restricted to UM users only
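To make the idea of reshaping a diurnal thermal profile concrete, the following is a toy sketch, not the dissertation's model. It assumes a hypothetical thermal buffer that absorbs any heat above a fixed threshold until its capacity is exhausted, and releases stored heat whenever the load drops below the threshold. The function name, the threshold, the capacity, and the load trace are all invented for illustration, and units are arbitrary (heat per time step).

```python
def shift_cooling_load(heat, threshold, capacity):
    """Return the per-step cooling load when a thermal buffer is present.

    Toy model (an assumption, not the dissertation's method): heat above
    `threshold` is absorbed by the buffer until `capacity` is reached;
    below `threshold`, stored heat is released up to the available headroom.
    """
    stored = 0.0
    cooling = []
    for h in heat:
        if h > threshold:
            absorbed = min(h - threshold, capacity - stored)
            stored += absorbed
            cooling.append(h - absorbed)
        else:
            released = min(threshold - h, stored)
            stored -= released
            cooling.append(h + released)
    return cooling


# Hypothetical diurnal heat trace with a midday peak.
heat = [2, 2, 6, 8, 6, 2, 2, 2]
shifted = shift_cooling_load(heat, threshold=4, capacity=6)
print(max(heat), max(shifted))
```

In this toy trace the total heat removed is unchanged, but the peak that the cooling system must handle is lower. That lower peak is the kind of effect the abstract attributes to thermal energy storage.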