7 research outputs found

    Attaining Single-Chip, High-Performance Computing Through 3D Systems with Active Cooling

    Get PDF
    Three-dimensional (3D) stacking is an attractive method for designing large manycore chips as it provides high transistor integration densities, improves manufacturing yield due to smaller chip area, reduces wirelength and capacitance, and enables heterogeneous integration of different technologies on the same chip. Stacking, however, significantly increases the thermal resistivity and the on-chip temperatures. In fact, temperature is among the major manufacturing challenges for 3D design. Active cooling, where the chip is cooled through the liquid flowing in built-in microchannels or through a cold plate, has emerged as a viable cooling alternative for high-performance 3D manycore systems. Liquid cooling is more efficient in removing the heat in comparison to conventional cooling methods with heat sinks and fans. Nevertheless, the dynamically changing nature of workloads running on manycore systems require runtime techniques to enable energy-efficient, reliable, and high-performance operation of liquid-cooled 3D systems. This article focuses on the benefits and the challenges of 3D design, and discusses novel techniques to integrate predictive cooling control with chip-level thermal management methods such as job scheduling and voltage frequency scaling. The key message is that 3D liquid-cooled systems with intelligent runtime management provide an energy-efficient solution to designing future single-chip high-performance manycore architectures

    3D-ICE: a Compact Thermal Model for Early-Stage Design of Liquid-Cooled ICs

    Get PDF
    Liquid-cooling using microchannel heat sinks etched on silicon dies is seen as a promising solution to the rising heat fluxes in two-dimensional and stacked three-dimensional integrated circuits. Development of such devices requires accurate and fast thermal simulators suitable for early-stage design. To this end, we present 3D-ICE, a compact transient thermal model (CTTM), for liquid-cooled ICs. 3D-ICE was first advanced by incorporating the 4-resistor model based CTTM (4RM-based CTTM). It was enhanced to speed up simulations and to include complex heat sink geometries such as pin fins using the new 2 resistor model (2RM-based CTTM). In this paper, we extend the 3D-ICE model to include liquid-cooled ICs with multi-port cavities, i.e., cavities with more than one inlet and one outlet ports, and non-straight microchannels. Simulation studies using a realistic 3D multiprocessor system-on-chip (MPSoC) with a 4-port microchannel cavity highlight the impact of using 4-port cavity on temperature and also demonstrate the superior performance of 2RM-based CTTM compared to 4RM-based CTTM. We also present an extensive review of existing literature and the derivation of the 3D-ICE model, creating a comprehensive study of liquid-cooled ICs and their thermal simulation from the perspective of computer systems design. Finally, the accuracy of 3D-ICE has been evaluated against measurements from a real liquid-cooled 3D IC, which is the first such validation of a simulator of this genre. Results show strong agreement (average error<10%), demonstrating that 3D-ICE is an effective tool for early-stage thermal-aware design of liquid-cooled 2D/3D ICs

    Modeling and optimization of high-performance many-core systems for energy-efficient and reliable computing

    Full text link
    Thesis (Ph.D.)--Boston UniversityMany-core systems, ranging from small-scale many-core processors to large-scale high performance computing (HPC) data centers, have become the main trend in computing system design owing to their potential to deliver higher throughput per watt. However, power densities and temperatures increase following the growth in the performance capacity, and bring major challenges in energy efficiency, cooling costs, and reliability. These challenges require a joint assessment of performance, power, and temperature tradeoffs as well as the design of runtime optimization techniques that monitor and manage the interplay among them. This thesis proposes novel modeling and runtime management techniques that evaluate and optimize the performance, energy, and reliability of many-core systems. We first address the energy and thermal challenges in 3D-stacked many-core processors. 3D processors with stacked DRAM have the potential to dramatically improve performance owing to lower memory access latency and higher bandwidth. However, the performance increase may cause 3D systems to exceed the power budgets or create thermal hot spots. In order to provide an accurate analysis and enable the design of efficient management policies, this thesis introduces a simulation framework to jointly analyze performance, power, and temperature for 3D systems. We then propose a runtime optimization policy that maximizes the system performance by characterizing the application behavior and predicting the operating points that satisfy the power and thermal constraints. Our policy reduces the energy-delay product (EDP) by up to 61.9% compared to existing strategies. Performance, cooling energy, and reliability are also critical aspects in HPC data centers. In addition to causing reliability degradation, high temperatures increase the required cooling energy. Communication cost, on the other hand, has a significant impact on system performance in HPC data centers. This thesis proposes a topology-aware technique that maximizes system reliability by selecting between workload clustering and balancing. Our policy improves the system reliability by up to 123.3% compared to existing temperature balancing approaches. We also introduce a job allocation methodology to simultaneously optimize the communication cost and the cooling energy in a data center. Our policy reduces the cooling cost by 40% compared to cooling-aware and performance-aware policies, while achieving comparable performance to performance-aware policy

    Attaining Single-Chip, High-Performance Computing through 3D Systems with Active Cooling

    No full text

    Power-Thermal Modeling and Control of Energy-Efficient Servers and Datacenters

    Get PDF
    Recently, the energy-efficiency constraints have become the dominant limiting factor for datacenters due to their unprecedented increase of growing size and electrical power demands. In this chapter we explain the power and thermal modeling and control solutions which can play a key role to reduce the power consumption of datacenters considering time-varying workload characteristics while maintaining the performance requirements and the maximum temperature constraints. We first explain simple-yet-accurate power and temperature models for computing servers, and then, extend the model to cover computing servers and cooling infrastructure of datacenters. Second, we present the power and thermal management solutions for servers manipulating various control knobs such as voltage and frequency of servers, workload allocation, and even cooling capability, especially, flow rate of liquid cooled servers). Finally, we present the solution to minimize the server clusters of datacenters by proposing a solution which judiciously allocates virtual machines to servers considering their correlation, and then, the joint optimization solution which enables to minimize the total energy consumption of datacenters with hybrid cooling architecture (including the computing servers and the cooling infrastructure of datacenters)

    Improving processor efficiency through thermal modeling and runtime management of hybrid cooling strategies

    Full text link
    One of the main challenges in building future high performance systems is the ability to maintain safe on-chip temperatures in presence of high power densities. Handling such high power densities necessitates novel cooling solutions that are significantly more efficient than their existing counterparts. A number of advanced cooling methods have been proposed to address the temperature problem in processors. However, tradeoffs exist between performance, cost, and efficiency of those cooling methods, and these tradeoffs depend on the target system properties. Hence, a single cooling solution satisfying optimum conditions for any arbitrary system does not exist. This thesis claims that in order to reach exascale computing, a dramatic improvement in energy efficiency is needed, and achieving this improvement requires a temperature-centric co-design of the cooling and computing subsystems. Such co-design requires detailed system-level thermal modeling, design-time optimization, and runtime management techniques that are aware of the underlying processor architecture and application requirements. To this end, this thesis first proposes compact thermal modeling methods to characterize the complex thermal behavior of cutting-edge cooling solutions, mainly Phase Change Material (PCM)-based cooling, liquid cooling, and thermoelectric cooling (TEC), as well as hybrid designs involving a combination of these. The proposed models are modular and they enable fast and accurate exploration of a large design space. Comparisons against multi-physics simulations and measurements on testbeds validate the accuracy of our models (resulting in less than 1C error on average) and demonstrate significant reductions in simulation time (up to four orders of magnitude shorter simulation times). This thesis then introduces temperature-aware optimization techniques to maximize energy efficiency of a given system as a whole (including computing and cooling energy). The proposed optimization techniques approach the temperature problem from various angles, tackling major sources of inefficiency. One important angle is to understand the application power and performance characteristics and to design management techniques to match them. For workloads that require short bursts of intense parallel computation, we propose using PCM-based cooling in cooperation with a novel Adaptive Sprinting technique. By tracking the PCM state and incorporating this information during runtime decisions, Adaptive Sprinting utilizes the PCM heat storage capability more efficiently, achieving 29\% performance improvement compared to existing sprinting policies. In addition to the application characteristics, high heterogeneity in on-chip heat distribution is an important factor affecting efficiency. Hot spots occur on different locations of the chip with varying intensities; thus, designing a uniform cooling solution to handle worst-case hot spots significantly reduces the cooling efficiency. The hybrid cooling techniques proposed as part of this thesis address this issue by combining the strengths of different cooling methods and localizing the cooling effort over hot spots. Specifically, the thesis introduces LoCool, a cooling system optimizer that minimizes cooling power under temperature constraints for hybrid-cooled systems using TECs and liquid cooling. Finally, the scope of this work is not limited to existing advanced cooling solutions, but it also extends to emerging technologies and their potential benefits and tradeoffs. One such technology is integrated flow cell array, where fuel cells are pumped through microchannels, providing both cooling and on-chip power generation. This thesis explores a broad range of design parameters including maximum chip temperature, leakage power, and generated power for flow cell arrays in order to maximize the benefits of integrating this technology with computing systems. Through thermal modeling and runtime management techniques, and by exploring the design space of emerging cooling solutions, this thesis provides significant improvements in processor energy efficiency.2018-07-09T00:00:00
    corecore