289 research outputs found

    Unified Datacenter Power Management Considering On-Chip and Air Temperature Constraints

    Current approaches to datacenter power management (workload scheduling, CPU speed control, etc.) focus primarily on keeping the air temperature surrounding servers within the manufacturer-specified constraint. This is problematic since several CPUs may still violate the on-chip thermal constraint, leading to reliability loss. The primary objective of this work is to develop a unified approach to datacenter power optimization (by controlling the CPU speeds) that accounts for the silicon-level temperature of VLSI components such as CPUs, the air temperature that directly impacts the reliability of other devices such as disks, and the performance delivered. Our algorithm follows a two-step approach: optimally solving a convex approximation that assigns continuous frequency values to all CPUs, followed by a discretization step that legalizes the assigned frequencies. The experimental results indicate that our method guarantees that both on-chip CPU and off-chip air temperature stay within their constraints. In contrast, the traditional approach of constraining only air temperature results in on-chip temperature violations on about 40% of the CPUs, or requires 42% more power to pull the CPU temperature back within the constraint by increasing HVAC cooling. NSF grant CCF 093786
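The discretization (legalization) step mentioned in the abstract above can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes that each continuous frequency is simply rounded down to the nearest supported level, and the function name `legalize` and the frequency levels are hypothetical.

```python
from bisect import bisect_right


def legalize(continuous_freqs, levels):
    """Map each continuous frequency to a supported discrete level.

    Assumption (not from the paper): round down to the nearest level,
    and clamp to the lowest level when the value is below every level.
    """
    levels = sorted(levels)
    legal = []
    for f in continuous_freqs:
        # Index of the largest level that is <= f, or -1 if none exists.
        idx = bisect_right(levels, f) - 1
        legal.append(levels[max(idx, 0)])
    return legal


# Hypothetical frequencies in GHz, for illustration only.
print(legalize([2.35, 1.0, 3.2, 0.5], [1.2, 1.8, 2.4, 3.0]))
```

Rounding down is a conservative choice: if temperature and power do not decrease as frequency drops, a continuous solution that satisfied the constraints stays feasible after rounding. The clamp to the lowest level is the one case where the frequency is raised, so it would need a separate feasibility check.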

    A unified model for holistic power usage in cloud datacenter servers

    Cloud datacenters are compute facilities formed by hundreds to thousands of heterogeneous servers that require significant power to operate effectively. Servers are composed of multiple interacting sub-systems, including applications, microelectronic processors, and cooling, which reflect their respective power profiles via different parameters. What is presently unknown is how to accurately model the holistic power usage of the entire server when all these sub-systems are included together. This becomes increasingly challenging when considering diverse utilization patterns, server hardware characteristics, air and liquid cooling techniques, and, importantly, quantifying the non-electrical energy cost imposed by cooling operation. The challenge arises from the multi-disciplinary expertise required to study server operation holistically. This work provides a unified model for capturing holistic power usage within Cloud datacenter servers. Constructed through controlled laboratory experiments, the model captures the relationship of server power usage between software, hardware, and cooling, agnostic of architecture and cooling type (air and liquid). An exciting prospect is the ability to quantify the amount of non-electrical power consumed through cooling, allowing for more realistic and accurate server power profiles. This work represents the first empirically supported analysis and modeling of holistic power usage for Cloud datacenter servers, and bridges a significant gap between computer science and mechanical engineering research. Model validation through experiments demonstrates an average standard error of 3% for server power usage within both air and liquid cooled environments.

    Power Management Techniques for Data Centers: A Survey

    With the growing use of the internet and exponential growth in the amount of data to be stored and processed (known as 'big data'), the size of data centers has greatly increased. This, however, has resulted in a significant increase in the power consumption of data centers. For this reason, managing the power consumption of data centers has become essential. In this paper, we highlight the need to achieve energy efficiency in data centers and survey several recent architectural techniques designed for power management of data centers. We also present a classification of these techniques based on their characteristics. This paper aims to provide insights into techniques for improving the energy efficiency of data centers and to encourage designers to invent novel solutions for managing the large power dissipation of data centers. Comment: Keywords: Data Centers, Power Management, Low-power Design, Energy Efficiency, Green Computing, DVFS, Server Consolidation

    Resource Management Algorithms for Computing Hardware Design and Operations: From Circuits to Systems

    The complexity of computation hardware has increased at an unprecedented rate over the last few decades. At the chip level, we have entered the era of multi/many-core processors made of billions of transistors. With a transistor budget of this scale, many functions are integrated into a single chip. As such, chips today consist of many heterogeneous cores with intensive interaction among them. At the circuit level, with the end of Dennard scaling, continuously shrinking process technology has imposed a grand challenge on power density. Circuit variation has further exacerbated the problem by consuming a substantial timing margin. At the system level, the rise of Warehouse Scale Computers and Data Centers has put resource management into a new perspective. The ability to dynamically provision computation resources in these gigantic systems is crucial to their performance. In this thesis, three different resource management algorithms are discussed. The first algorithm assigns adaptivity resources to circuit blocks with a constraint on the overhead. The adaptivity improves the resilience of the circuit to variation in a cost-effective way. The second algorithm manages the link bandwidth resource in application-specific Networks-on-Chip. Quality-of-Service is guaranteed for time-critical traffic in the algorithm, with an emphasis on power. The third algorithm manages the computation resources of the data center while guarding against ill states of the system. Q-learning is employed to meet the dynamic nature of the system, and Linear Temporal Logic is leveraged as a tool to describe temporal constraints. All three algorithms are evaluated by various experiments. The experimental results are compared to several previous works and show the advantage of our methods.

    A generic learning multi-agent-system approach for spatio-temporal-, thermal- and energy-aware scheduling

    This paper proposes an agent-based approach to the scheduling of jobs in data centers under thermal constraints. The model encompasses both temporal and spatial aspects of the temperature evolution using a unified model, taking into account the dynamics of heat production and dissipation. Agents coordinate to eventually move jobs to the most suitable place and to dynamically adapt the frequency settings of the nodes to the best combination. Several objectives of the agents are compared under different circumstances in an extensive set of experiments.

    A Survey of FPGA Optimization Methods for Data Center Energy Efficiency

    This article provides a survey of the academic literature on field-programmable gate arrays (FPGAs) and their use for energy-efficient acceleration in data centers. The goal is to critically present the existing FPGA energy optimization techniques and discuss how they can be applied to such systems. To do so, the article explores current energy trends and their projection into the future, with particular attention to the requirements set out by the European Code of Conduct for Data Center Energy Efficiency. The article then proposes a complete analysis of over ten years of research in energy optimization techniques, classifying them by purpose, method of application, and impact on the sources of consumption. Finally, we conclude with the challenges and possible innovations we expect for this sector. Comment: Accepted for publication in IEEE Transactions on Sustainable Computing

    Energy Concerns with HPC Systems and Applications

    For various reasons, including those related to climate change, energy has become a critical concern in all relevant activities and technical designs. For the specific case of computer activities, the problem is exacerbated by the emergence and pervasiveness of so-called intelligent devices. On the application side, we point out the special topic of Artificial Intelligence, which clearly needs efficient computing support in order to succeed in its purpose of being a ubiquitous assistant. There are mainly two contexts where energy is one of the top-priority concerns: embedded computing and supercomputing. For the former, power consumption is critical because the amount of energy available to the devices is limited. For the latter, the heat dissipated is a serious source of failure, and the financial cost related to energy is likely to be a significant part of the maintenance budget. On a single computer, the problem is commonly considered through the electrical power consumption. In this paper, written in the form of a survey, we depict the landscape of energy concerns in computer activities, from both the hardware and the software standpoints. Comment: 20 pages

    Thermal Energy Storage for Datacenters with Phase Change Materials

    Datacenters, vast warehouses containing millions of servers that run the internet and the cloud, have experienced double-digit growth for almost two decades. Datacenters cost hundreds of millions of dollars, with the largest now exceeding a billion dollars each, and consume enormous amounts of power: over 2% of all electricity in the US, projected to increase up to 10% by 2030. The impact of such high compute density, with thousands of individual compute nodes packed together in a small space, is heat: every watt of power used by servers must be removed from the datacenter. This requires active cooling. Air cooling is by far the most common, with an air conditioner or other form of heat exchanger cooling air in the datacenter room and then transporting heat outside the facility to a heat exchanger or similar fixture. Such a system is simple, common, and functional, but inherently inefficient due to the nature of datacenter workloads. Datacenters primarily serve user-facing workloads; that is, the user requests a search or sends an email, and their query prompts load in the datacenter. The query is handled locally, on a relative geographic scale, to provide a low response time and a positive user experience. This necessitates globally distributed datacenter capacity, but also creates a diurnal load pattern whereby datacenters are most heavily loaded during the peak hours, when users in their region of service are awake and active online, versus the off hours, when users are offline or asleep and query requests are low. Because datacenter infrastructure must be provisioned for peak load, servers, power distribution, and cooling infrastructure are significantly underutilized most of the time. This dissertation investigates the cooling needs of datacenters and proposes to decouple the work and cooling needs. Specifically, we hypothesize that by storing thermal energy we can reshape the thermal profile of a datacenter to better balance cooling load throughout the day. We call this technique Thermal Time Shifting (TTS). First, we discuss how phase change materials (PCMs) enable TTS and evaluate the potential use scenarios of placing a small amount of PCM inside servers for thermal energy storage. Next, we dive deeper into the potential of thermal energy storage and propose Virtual Melting Temperatures (VMT), a technique that uses active job placement to control the melting and cooling of PCM to enable a much greater degree of control over the behavior of the thermal profile. Finally, we propose and evaluate Thermal Gradient Transfer (TGT), a technique that uses direct water cooling to move heat straight from CPUs and GPUs to the wax for wider applicability and greater peak cooling load reduction. PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/147726/1/skachm_1.pdf (restricted to UM users only)
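The idea of reshaping the cooling load with stored thermal energy can be illustrated with a toy calculation. This is a sketch under strong assumptions, not the dissertation's model: the phase change material is abstracted to an ideal heat buffer with a fixed capacity, and the function name `shift_cooling_load`, the hourly loads, and the units are all hypothetical.

```python
def shift_cooling_load(heat_load, cooling_cap, storage_capacity):
    """Return the per-hour cooling load after buffering heat in storage.

    heat_load: heat produced in each hour (arbitrary energy units)
    cooling_cap: target ceiling for heat removed by active cooling per hour
    storage_capacity: maximum heat the storage can hold (idealized buffer)
    """
    stored = 0.0
    cooling = []
    for heat in heat_load:
        if heat > cooling_cap:
            # Peak hour: storage absorbs the excess, up to its free capacity.
            absorbed = min(heat - cooling_cap, storage_capacity - stored)
            stored += absorbed
            cooling.append(heat - absorbed)
        else:
            # Off-peak hour: spare cooling headroom drains the storage.
            released = min(cooling_cap - heat, stored)
            stored -= released
            cooling.append(heat + released)
    return cooling


# Hypothetical diurnal pattern: two peak hours surrounded by quiet hours.
load = [2, 2, 6, 6, 2, 2]
shifted = shift_cooling_load(load, cooling_cap=4, storage_capacity=3)
print(max(load), max(shifted))
```

In this toy case the total heat removed is unchanged, but part of the peak is deferred to the quieter hours that follow. Once the buffer is full, any remaining excess must still be cooled immediately, which is why the shifted peak can exceed the target ceiling.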