57 research outputs found

    Dynamic Thermal and Power Management: From Computers to Buildings

    Get PDF
    Thermal and power management have become increasingly important for both computing and physical systems. Computing systems from real-time embedded systems to data centers require effective thermal and power management to prevent overheating and save energy. In the mean time, as a major consumer of energy buildings face challenges to reduce the energy consumption for air conditioning while maintaining comfort of occupants. In this dissertation we investigate dynamic thermal and power management for computer systems and buildings. (1) We present thermal control under utilization bound (TCUB), a novel control-theoretic thermal management algorithm designed for single core real-time embedded systems. A salient feature of TCUB is to maintain both desired processor temperature and real-time performance. (2) To address unique challenges posed by multicore processors, we develop the real-time multicore thermal control (RT-MTC) algorithm. RT-MTC employs a feedback control loop to enforce the desired temperature and CPU utilization of the multicore platform via dynamic frequency and voltage scaling. (3) We research dynamic thermal management for real-time services running on server clusters. We develop the control-theoretic thermal balancing (CTB) to dynamically balance temperature of servers via distributing clients\u27 service requests to servers. Next, (4) we propose CloudPowerCap, a power cap management system for virtualized cloud computing infrastructure. The novelty of CloudPowerCap lies in an integrated approach to coordinate power budget management and resource management in a cloud computing environment. Finally we expand our research to physical environment by exploring several fundamental problems of thermal and power management on buildings. We analyze spatial and temporal data acquired from an real-world auditorium instrumented by a multi-modal sensor network. We propose a data mining technique to determine the appropriate number and location of temperature sensors for estimating the spatiotemporal temperature distribution of the auditorium. Furthermore, we explore the potential energy savings that can be achieved through occupancy-based HVAC scheduling based on real occupancy data of the auditorium

    Power, Energy, and Thermal Management for Clustered Manycores

    Get PDF
    Efficient and effective system-level power, energy, and thermal management are very important issues in modern computing systems, for which clustered architectures with multiple voltage islands are an expected compromise between global and per-core DVFS. In this dissertation, we focus on two of the most relevant problems for such architectures, specifically, optimizing performance under power/thermal constraints, and minimizing energy under performance constraints

    ADAPTIVE POWER MANAGEMENT FOR COMPUTERS AND MOBILE DEVICES

    Get PDF
    Power consumption has become a major concern in the design of computing systems today. High power consumption increases cooling cost, degrades the system reliability and also reduces the battery life in portable devices. Modern computing/communication devices support multiple power modes which enable power and performance tradeoff. Dynamic power management (DPM), dynamic voltage and frequency scaling (DVFS), and dynamic task migration for workload consolidation are system level power reduction techniques widely used during runtime. In the first part of the dissertation, we concentrate on the dynamic power management of the personal computer and server platform where the DPM, DVFS and task migrations techniques are proved to be highly effective. A hierarchical energy management framework is assumed, where task migration is applied at the upper level to improve server utilization and energy efficiency, and DPM/DVFS is applied at the lower level to manage the power mode of individual processor. This work focuses on estimating the performance impact of workload consolidation and searching for optimal DPM/DVFS that adapts to the changing workload. Machine learning based modeling and reinforcement learning based policy optimization techniques are investigated. Mobile computing has been weaved into everyday lives to a great extend in recent years. Compared to traditional personal computer and server environment, the mobile computing environment is obviously more context-rich and the usage of mobile computing device is clearly imprinted with user\u27s personal signature. The ability to learn such signature enables immense potential in workload prediction and energy or battery life management. In the second part of the dissertation, we present two mobile device power management techniques which take advantage of the context-rich characteristics of mobile platform and make adaptive energy management decisions based on different user behavior. We firstly investigate the user battery usage behavior modeling and apply the model directly for battery energy management. The first technique aims at maximizing the quality of service (QoS) while keeping the risk of battery depletion below a given threshold. The second technique is an user-aware streaming strategies for energy efficient smartphone video playback applications (e.g. YouTube) that minimizes the sleep and wake penalty of cellular module and at the same time avoid the energy waste from excessive downloading. Runtime power and thermal management has attracted substantial interests in multi-core distributed embedded systems. Fast performance evaluation is an essential step in the research of distributed power and thermal management. In last part of the dissertation, we present an FPGA based emulator of multi-core distributed embedded system designed to support the research in runtime power/thermal management. Hardware and software supports are provided to carry out basic power/thermal management actions including inter-core or inter-FPGA communications, runtime temperature monitoring and dynamic frequency scaling

    Thermal-aware resource allocation in earliest deadline first using fluid scheduling

    Get PDF
    Thermal issues in microprocessors have become a major design constraint because of their adverse effects on the reliability, performance and cost of the system. This article proposes an improvement in earliest deadline first, a uni-processor scheduling algorithm, without compromising its optimality in order to reduce the thermal peaks and variations. This is done by introducing a factor of fairness to earliest deadline first algorithm, which introduces idle intervals during execution and allows uniform distribution of workload over the time. The technique notably lowers the number of context switches when compare with the previous thermal-aware scheduling algorithm based on the same amount of fairness. Although, the algorithm is proposed for uni-processor environment, it is also applicable to partitioned scheduling in multi-processor environment, which primarily converts the multi-processor scheduling problem to a set of uni-processor scheduling problem and thereafter uses a uni-processor scheduling technique for scheduling. The simulation results show that the proposed approach reduces up to 5% of the temperature peaks and variations in a uni-processor environment while reduces up to 7% and 6% of the temperature spatial gradient and the average temperature in multi-processor environment, respectively

    The Thermal-Constrained Real-Time Systems Design on Multi-Core Platforms -- An Analytical Approach

    Get PDF
    Over the past decades, the shrinking transistor size enabled more transistors to be integrated into an IC chip, to achieve higher and higher computing performances. However, the semiconductor industry is now reaching a saturation point of Moore’s Law largely due to soaring power consumption and heat dissipation, among other factors. High chip temperature not only significantly increases packing/cooling cost, degrades system performance and reliability, but also increases the energy consumption and even damages the chip permanently. Although designing 2D and even 3D multi-core processors helps to lower the power/thermal barrier for single-core architectures by exploring the thread/process level parallelism, the higher power density and longer heat removal path has made the thermal problem substantially more challenging, surpassing the heat dissipation capability of traditional cooling mechanisms such as cooling fan, heat sink, heat spread, etc., in the design of new generations of computing systems. As a result, dynamic thermal management (DTM), i.e. to control the thermal behavior by dynamically varying computing performance and workload allocation on an IC chip, has been well-recognized as an effective strategy to deal with the thermal challenges. Over the past decades, the shrinking transistor size, benefited from the advancement of IC technology, enabled more transistors to be integrated into an IC chip, to achieve higher and higher computing performances. However, the semiconductor industry is now reaching a saturation point of Moore’s Law largely due to soaring power consumption and heat dissipation, among other factors. High chip temperature not only significantly increases packing/cooling cost, degrades system performance and reliability, but also increases the energy consumption and even damages the chip permanently. Although designing 2D and even 3D multi-core processors helps to lower the power/thermal barrier for single-core architectures by exploring the thread/process level parallelism, the higher power density and longer heat removal path has made the thermal problem substantially more challenging, surpassing the heat dissipation capability of traditional cooling mechanisms such as cooling fan, heat sink, heat spread, etc., in the design of new generations of computing systems. As a result, dynamic thermal management (DTM), i.e. to control the thermal behavior by dynamically varying computing performance and workload allocation on an IC chip, has been well-recognized as an effective strategy to deal with the thermal challenges. Different from many existing DTM heuristics that are based on simple intuitions, we seek to address the thermal problems through a rigorous analytical approach, to achieve the high predictability requirement in real-time system design. In this regard, we have made a number of important contributions. First, we develop a series of lemmas and theorems that are general enough to uncover the fundamental principles and characteristics with regard to the thermal model, peak temperature identification and peak temperature reduction, which are key to thermal-constrained real-time computer system design. Second, we develop a design-time frequency and voltage oscillating approach on multi-core platforms, which can greatly enhance the system throughput and its service capacity. Third, different from the traditional workload balancing approach, we develop a thermal-balancing approach that can substantially improve the energy efficiency and task partitioning feasibility, especially when the system utilization is high or with a tight temperature constraint. The significance of our research is that, not only can our proposed algorithms on throughput maximization and energy conservation outperform existing work significantly as demonstrated in our extensive experimental results, the theoretical results in our research are very general and can greatly benefit other thermal-related research

    Energy-Efficient and Reliable Computing in Dark Silicon Era

    Get PDF
    Dark silicon denotes the phenomenon that, due to thermal and power constraints, the fraction of transistors that can operate at full frequency is decreasing in each technology generation. Moore’s law and Dennard scaling had been backed and coupled appropriately for five decades to bring commensurate exponential performance via single core and later muti-core design. However, recalculating Dennard scaling for recent small technology sizes shows that current ongoing multi-core growth is demanding exponential thermal design power to achieve linear performance increase. This process hits a power wall where raises the amount of dark or dim silicon on future multi/many-core chips more and more. Furthermore, from another perspective, by increasing the number of transistors on the area of a single chip and susceptibility to internal defects alongside aging phenomena, which also is exacerbated by high chip thermal density, monitoring and managing the chip reliability before and after its activation is becoming a necessity. The proposed approaches and experimental investigations in this thesis focus on two main tracks: 1) power awareness and 2) reliability awareness in dark silicon era, where later these two tracks will combine together. In the first track, the main goal is to increase the level of returns in terms of main important features in chip design, such as performance and throughput, while maximum power limit is honored. In fact, we show that by managing the power while having dark silicon, all the traditional benefits that could be achieved by proceeding in Moore’s law can be also achieved in the dark silicon era, however, with a lower amount. Via the track of reliability awareness in dark silicon era, we show that dark silicon can be considered as an opportunity to be exploited for different instances of benefits, namely life-time increase and online testing. We discuss how dark silicon can be exploited to guarantee the system lifetime to be above a certain target value and, furthermore, how dark silicon can be exploited to apply low cost non-intrusive online testing on the cores. After the demonstration of power and reliability awareness while having dark silicon, two approaches will be discussed as the case study where the power and reliability awareness are combined together. The first approach demonstrates how chip reliability can be used as a supplementary metric for power-reliability management. While the second approach provides a trade-off between workload performance and system reliability by simultaneously honoring the given power budget and target reliability

    Dynamic Thermal Management for Microprocessors

    Get PDF
    In deep submicron era, thermal hot spots and large temperature gradients significantly impact system reliability, performance, cost and leakage power. Dynamic thermal management techniques are designed to tackle the problems and control the chip temperature as well as power consumption. They refer to those techniques which enable the chip to autonomously modify the task execution and power dissipation characteristics so that lower-cost cooling solutions could be adopted while still guaranteeing safe temperature regulation. As long as the temperature is regulated, the system reliability can be improved, leakage power can be reduced and cooling system lifetime can be extended significantly. Multimedia applications are expected to form the largest portion of workload in general purpose PC and portable devices. The ever-increasing computation intensity of multimedia applications elevates the processor temperature and consequently impairs the reliability and performance of the system. In this thesis, we propose to perform dynamic thermal management using reinforcement learning algorithm for multimedia applications. The presented learning model does not need any prior knowledge of the workload information or the system thermal and power characteristics. It learns the temperature change and workload switching patterns by observing the temperature sensor and event counters on the processor, and finds the management policy that provides good performance-thermal tradeoff during the runtime. As the system complexity increases, it is more and more difficult to perform thermal management in a centralized manner because of state explosion and the overhead of monitoring the entire chip. In this thesis, we present a framework for distributed thermal management in many-core systems where balanced thermal profile can be achieved by proactive task migration among neighboring cores. The framework has a low cost agent residing in each core that observes the local workload and temperature and communicates with its nearest neighbor for task migration and exchange. By choosing only those migration requests that will result in balanced workload without generating thermal emergency, the presented framework maintains workload balance across the system and avoids unnecessary migration. Experimental results show that, our distributed management policy achieves almost the same performance as a global management policy when the tasks are initially randomly distributed. Compared with existing proactive task migration technique, our approach generates less hotspot, less migration overhead with negligible performance overhead. Temperature affects the leakage power and cooling power. In this thesis, we address the impact of task allocation on a processor\u27s leakage power and cooling fan power. Although the leakage power is determined by the average die temperature and the fan power is determined by the peak temperature, our analysis shows that the overall power can be minimized if a task allocation with minimum peak temperature is adopted together with an intelligent fan speed adjustment technique that finds the optimal tradeoff between fan power and leakage power. We further present a multi-agent distributed task migration technique that searches for the best task allocation during runtime. By choosing only those migration requests that will result chip maximum temperature reduction, the presented framework achieves large fan power savings as well as overall power reduction

    Predictive Thermal Management for Energy-Efficient Execution of Concurrent Applications on Heterogeneous Multicores

    Get PDF
    Current multicore platforms contain different types of cores, organized in clusters (e.g., ARM's big.LITTLE). These platforms deal with concurrently executing applications, having varying workload profiles and performance requirements. Runtime management is imperative for adapting to such performance requirements and workload variabilities and to increase energy and temperature efficiency. Temperature has also become a critical parameter since it affects reliability, power consumption, and performance and, hence, must be managed. This paper proposes an accurate temperature prediction scheme coupled with a runtime energy management approach to proactively avoid exceeding temperature thresholds while maintaining performance targets. Experiments show up to 20% energy savings while maintaining high-temperature averages and peaks below the threshold. Compared with state-of-the-art temperature predictors, this paper predicts 35% faster and reduces the mean absolute error from 3.25 to 1.15 °C for the evaluated applications' scenarios
    • …
    corecore