Abstract-An experimental investigation of data center cooling and computational energy efficiency improvement through advanced thermal management was performed. A chiller-less data center liquid cooling system was developed that transfers the heat generated by computer systems to the outdoor ambient environment while eliminating the need for energy-intensive vapor-compression refrigeration. This liquid cooling system utilizes a direct-attach cold-plate approach that enables the use of warm water at a temperature a few degrees above outdoor ambient to achieve lower chip junction temperatures than refrigerated air. Using this approach, we demonstrated a cooling energy reduction of over 90% and a computational energy reduction of up to 14% compared to traditional refrigerated air-cooled data centers. To enable future computational efficiency improvements through high-density 3-D chip stacking, we developed a 3-D compatible chip-embedded two-phase liquid cooling technology in which a dielectric coolant is pumped through microscale cavities to provide thermal management of chips within the stack. In two-phase cooling, liquid is converted to vapor, which increases the capacity to remove heat, while the dielectric fluid enables integration with chip electrical interconnects. A test vehicle simulating an eight-core microprocessor was fabricated with embedded cooling channels. Results demonstrate that this volumetrically efficient cooling solution, compatible with 3-D chip stacks, can manage three times the core power density of today's high-power processors while maintaining the device temperature well within limits.
industrial sector and the creation of an information network supported by data centers throughout the world. Data centers vary in size and power usage. The ten largest "mega data centers" reported by the Top Data Centers website [1] have a size of 1-10 million square feet and power usage between 80 and 150 MW. A significant fraction of the data center energy usage is the cooling energy required to remove the heat generated by computer hardware (e.g., microprocessors and memory) while meeting operating temperature margins in the data center. The computer semiconductor chips in data centers operate at temperatures well above the outdoor ambient environment. Reducing the thermal impedance between the chips and the coolant enables the use of the outdoor ambient environment to provide effective cooling. This eliminates the need for energy-intensive refrigeration, which can use up to 30% of data center energy.
The computer systems in data centers are also facing new technology challenges as transistors reach atomic dimensions. Scaling that had followed Moore's Law (a doubling of the number of transistors on a chip every two years) is slowing down and will soon reach an end. To continue scaling computer system performance, the IT industry is exploring new technologies, including increased packaging integration density through 3-D stacking of chips. However, high-power 3-D chip stacks require new cooling solutions to address significant thermal challenges. The development of chip-embedded cooling that utilizes two-phase flow boiling of dielectric fluid through microscale cavities can provide a cooling solution compatible with 3-D chip stack interconnects, fabrication, and assembly.
II. IT GROWTH AND DATA CENTER ENERGY USAGE
The statistics of Internet use available online [2] report that, as of August 2016, there were 3.4 billion Internet users worldwide, which is greater than 40% of the world's population of 7.4 billion. The reported amount of electronic information processed every 60 s [2] includes: 150 million email messages, 3.3 million searches on Google, 7.7 million videos viewed on YouTube, and 0.44 million "tweets" sent on Twitter.
To support the flow of information on the Internet, many data centers that house computer systems required to process and communicate information have been constructed throughout the world. In 2010, there were roughly 33 million computer servers installed worldwide. In the U.S. alone, about 12 million computer servers were installed, of which 97% were volume servers and 3% were midrange and high-end servers [3] . Data centers house thousands of server racks, with each rack typically consuming 10-30 kW of power. A study from the Lawrence Berkeley National Laboratory [4] reported that in 2005, data center power usage amounted to 1.2% and 0.8% of the total U.S. and worldwide energy consumption, respectively. From 2005 to 2013, energy use by data centers and their supporting infrastructure was reported to have increased to over 2% of total U.S. energy [5] , [6] . By 2020, the energy use is expected to increase further by over 40% [5] . Data center efficiency improvement requires addressing the energy usage of both the computer systems and cooling infrastructure of these megascale facilities. To that end, thermal solutions for data centers must take a holistic view from the nanometer semiconductor device scale to the kilometer data center scale.
III. DATA CENTER TRADITIONAL COOLING
Data centers have traditionally used energy-intensive compressor-based cooling to provide cold air to cool the IT equipment. In these air-cooled data centers, the IT equipment (servers) uses roughly 50% of the total data center power, and the corresponding cooling energy usage is ∼25%-30% [7] . In other words, in a traditional air-cooled data center, cooling 1 W of server power requires ∼0.5 W of cooling power. This is an important baseline metric to quantify the cooling inefficiencies in a traditional air-cooled data center. Fig. 1 shows a facility-level schematic of a data center traditional cooling system that is used to transfer heat from the server exhaust air to the ambient outdoor air. Racks of IT equipment are arranged in rows to form aisles in which two rows of racks face each other at their inlets. The servers inside each rack use forced air cooling via fans inside each server, and thus the racks of servers require a continuous and reliable supply of cool air for their operation. This cool air is supplied by the computer room air conditioning (CRAC) units, which operate by receiving chilled water through underfloor pipes from a chiller. The cool air enters the data center room via perforated floor tiles, passes through the racks, where it is heated, and is then circulated to the intake of the room CRAC units, which again cool the heated air and blow it back into the underfloor plenum.
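The baseline metric above follows directly from the two power fractions quoted from [7]. A minimal sketch of that arithmetic is given below; the function name is hypothetical and is introduced only for illustration.

```python
def cooling_watts_per_it_watt(it_fraction, cooling_fraction):
    """Return cooling power per watt of IT power.

    Both arguments are fractions of total data center power.
    """
    if not 0.0 < it_fraction <= 1.0:
        raise ValueError("it_fraction must be in (0, 1]")
    if not 0.0 <= cooling_fraction <= 1.0:
        raise ValueError("cooling_fraction must be in [0, 1]")
    return cooling_fraction / it_fraction


# Fractions quoted in the text: IT ~50%, cooling ~25%-30% of total power.
low = cooling_watts_per_it_watt(0.50, 0.25)
high = cooling_watts_per_it_watt(0.50, 0.30)
print(f"{low:.2f} to {high:.2f} W of cooling per W of IT power")
```

The lower end of the range corresponds to the ∼0.5 W per watt of server power used as the baseline.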
The American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) sets environmental classes for the operation of IT equipment in data centers. The ASHRAE class A1 specifies an allowed inlet air temperature range of 15°C-32°C [8] . The lower temperatures are achieved through the use of a refrigeration chiller plant that operates on a vapor compression cycle and consumes a significant fraction of the cooling energy (∼50%) to provide subambient temperatures. Under new ASHRAE Data Center Environmental Classes A3 and A4, the allowed temperature for inlet air temperatures in data centers has expanded to maximum temperatures of 40°C and 45°C, respectively [8] . This eliminates the need for chillers/compressors for a large fraction of the year, thereby saving a significant amount of cooling power. Raising the data center temperature, however, has several disadvantages, which will be detailed later.
The daisy-chained heat exchange loops existing in a typical air-cooled data center are shown in Fig. 1 . The data center cooling loop consists of CRAC units placed inside the data center to extract heat from the IT equipment by circulation of air cooled by chilled water provided from the refrigeration chiller plant. The refrigerant chiller plant loop rejects the heat into a condenser water loop via the refrigeration chiller condenser heat exchanger. A condenser pump circulates water between the refrigeration chiller plant and the evaporative cooling tower. The evaporative cooling tower uses forced air movement and water evaporation to extract heat from the condenser water loop and transfer it into the outside ambient environment.
In this "standard" facility cooling design, the primary cooling energy consumption components (percentage of cooling power usage) include [9] : 1) CRAC blowers (∼33%); 2) building chilled water (BCW) pumps (∼10%); 3) refrigeration chiller compressors (∼50%); 4) condenser water pumps (∼3%); and 5) cooling tower blowers (∼4%). It should be noted that the energy used by the fans inside the servers is considered IT equipment power and not cooling energy consumption. Some of our recent server characterization work [10] , [11] has shown that in typical air-cooled servers running processor-intensive workloads, the processor power can range from 50% to 80% of the total server power, while the fan power consumption can vary from 5% to 20% of the total server power. Fan power is a strong function of the server ambient temperature and processor junction temperature. Fig. 2 shows the measured variation of server fan power and processor junction temperature with server ambient temperature.
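The component shares listed above can be applied to any total cooling load. The short sketch below does so for a hypothetical 250-kW cooling load; the load value and the function name are assumptions chosen for illustration, while the shares are the approximate values quoted from [9].

```python
# Approximate shares of cooling power quoted in the text from [9].
COOLING_SHARES = {
    "CRAC blowers": 0.33,
    "BCW pumps": 0.10,
    "chiller compressors": 0.50,
    "condenser water pumps": 0.03,
    "cooling tower blowers": 0.04,
}


def cooling_breakdown(total_cooling_kw):
    """Split a total cooling load (kW) across the listed components."""
    return {name: share * total_cooling_kw
            for name, share in COOLING_SHARES.items()}


# Hypothetical 250-kW cooling load, used only to illustrate the split.
for name, kw in cooling_breakdown(250.0).items():
    print(f"{name:>22s}: {kw:6.1f} kW")
```

With these shares, the chiller compressors and CRAC blowers together account for roughly 83% of the cooling power.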
For server nominal ambient temperatures of 26°C-27°C, server fan power consumption was observed to be ∼8% of server (IT) power. As mentioned earlier, to provide the air ambient temperature for ASHRAE class A1 in the server room requires ∼50% of IT power. Using this normalization, a typical 100-W processor would require ∼62 W for cooling.
As the server ambient air temperature increases, the processor junction temperature (T j ) increases, and the margin between the processor operating T j and the maximum allowable T j is reduced. Note that exceeding the maximum allowable processor T j would result in the processor shutting down to prevent thermal damage to the chip. To account for this reduced margin, chip cooling is increased by ramping up the fan speed, which increases air flow and reduces heat sink thermal resistance. This results in a corresponding increase in server fan power and, therefore, in total server (IT) power. For example, an increase in server ambient temperature from 27°C to 31°C forces the fan power to increase by ∼50% to provide additional cooling with a slightly lower chip temperature. An additional rise in server ambient air temperature forces additional increases in server fan power to maintain the processor operating T j until the fan power reaches a maximum of ∼20% of the total server power. In addition, the processor subthreshold leakage power is a function of chip temperature and typically increases by ∼3% for every 10°C rise in chip temperature.
Hence, as servers operate in higher data center temperatures, the server power will increase due to the higher server fan power and the server processor leakage power. In the case of increasing data center temperature from 27°C to 37°C, as shown in Fig. 2 , the server fan power increases by 12% of the server power, and the server processor leakage power is expected to increase ∼3%, resulting in an estimated ∼14% increase in the total server power. This server power increase reduces the benefit of the cooling power savings achieved by increasing the data center temperature [10] , [12] .
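One way to see how a 12% fan power increase and a ∼3% leakage increase combine into ∼14% is to weight the leakage term by the processor's share of server power (50%-80%, as noted earlier in this section). This weighting is an assumption made for illustration and is not a calculation stated in the text; the function name is hypothetical.

```python
def server_power_increase(fan_increase, leakage_increase, processor_share):
    """Estimate the fractional rise in total server power.

    fan_increase     -- fan power rise as a fraction of server power
    leakage_increase -- leakage rise as a fraction of processor power
    processor_share  -- processor power as a fraction of server power
                        (assumed weighting, not given in the source)
    """
    return fan_increase + leakage_increase * processor_share


# 27 C -> 37 C case from the text, with processor shares of 50% and 80%.
low = server_power_increase(0.12, 0.03, 0.50)
high = server_power_increase(0.12, 0.03, 0.80)
print(f"estimated increase: {low:.1%} to {high:.1%}")
```

Under this assumption, the estimate falls between about 13.5% and 14.4%, which is consistent with the ∼14% figure.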
This increase in server IT power consumption is important to consider with increasing data center temperatures from ASHRAE A1 to the higher temperature A3 and A4 classes. It should also be noted that the increased IT power would appear to improve the data center power usage effectiveness metric defined as the total energy/IT energy.
In addition, there are server reliability issues at increased temperatures that are beyond the scope of this paper.
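The effect on power usage effectiveness (PUE) noted above can be illustrated with a short sketch. The 100-kW IT load and 50-kW overhead are hypothetical values chosen only to show the direction of the change; the 14% IT power increase is the estimate from this section.

```python
def pue(it_kw, overhead_kw):
    """Power usage effectiveness: total facility power / IT power."""
    if it_kw <= 0:
        raise ValueError("it_kw must be positive")
    return (it_kw + overhead_kw) / it_kw


# Hypothetical facility: 100 kW of IT load, 50 kW of cooling and other overhead.
baseline_it, overhead = 100.0, 50.0
# Same overhead, but IT power up 14% because of fan and leakage power.
warm_it = baseline_it * 1.14

pue_baseline = pue(baseline_it, overhead)
pue_warm = pue(warm_it, overhead)
total_baseline = baseline_it + overhead
total_warm = warm_it + overhead
print(f"PUE {pue_baseline:.3f} -> {pue_warm:.3f}, "
      f"total power {total_baseline:.0f} kW -> {total_warm:.0f} kW")
```

The PUE value drops even though total power rises, which is why PUE alone can overstate the benefit of raising the data center temperature.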
IV. ADVANCED DATA CENTER LIQUID COOLING
Several factors contribute to the inefficiency and excessive energy consumption of data center traditional cooling designs, including:
1) inefficiency of using air (which has low thermal conductivity and heat capacity) as a coolant; 2) large thermal resistance between electronic components (e.g., chips) and the coolant (air); 3) use of subambient temperature air and water that requires energy-intensive chillers; and 4) use of daisy-chained cooling loops, which adds inefficiencies at each heat exchanger.
In this section, we describe the development of more advanced energy-efficient cooling designs such as dual-enclosure liquid cooling (DELC) technology [13] and advanced thermal metal interface [14] , [15] . These technologies provide a cooling solution in which a liquid coolant, at temperatures higher than the outdoor ambient year-round, can effectively cool the data center servers. This eliminates the need for energy-intensive cooling systems such as refrigeration-based chillers. These advanced thermal approaches can be placed into two general categories: first, reducing the thermal resistance from the junction to the coolant; second, transferring heat from the servers/racks to the outdoor ambient environment while minimizing energy usage.
A. Thermal Resistance
The electronic packaging of semiconductor chips provides both electrical connections to the chip and a thermal structure to remove the heat generated by the chip.
A typical air-cooled server would incorporate a microprocessor chip package as shown in Fig. 3(a), which shows a microprocessor mounted on a package substrate to provide electric connections for power and signals. The heat from the chip is removed by attachment of a thermally conductive lid and air-cooled heat sink that are thermally coupled to the backside of the chip with thermal interface materials (TIMs). The temperature rise within the chip is a function of the thermal resistance between the active semiconductor devices of the microprocessor and the coolant being utilized to remove the generated heat. The thermal resistance path, as shown in Fig. 4, includes: 1) the silicon chip, which also acts as a heat spreader; 2) a first thermal interface material (TIM1) between the silicon and the package lid; 3) the lid, which also acts as a heat spreader; 4) a second thermal interface material (TIM2) between the lid and air-cooled heat sink; and 5) the heat sink to ambient air. The cooling performance and energy efficiency of a server can be significantly improved by utilizing liquid cooling to remove heat from high-power components within the server. In this case, as shown in Fig. 3(b), for a microprocessor chip package, the air-cooled heat sink is replaced with a liquid-cooled cold-plate while keeping the same package lid and both TIM1 and TIM2. Similarly, other high-power components in a server, such as memory modules, graphics cards, and ASICs, can also be liquid-cooled. A recent study [11] comparing a typical air-cooled server to a hybrid air-/liquid-cooled server, which incorporates liquid cooling for the processors and memory and air cooling for the remaining components, highlights the benefits of liquid cooling in terms of improved device temperatures and server power consumption savings.
The results are summarized in Table I , which showed that the liquid-cooled server, cooled by 40°C water, uses about 3% less power than is used by its air-cooled counterpart cooled by 22°C air. Furthermore, if cooled by 25°C water, the hybrid server uses ∼14% less power, which could lead to over 2 kW in power savings per rack of 40 servers. In summary, we found that "warm water cooling is more efficient than cool air cooling."
In a more advanced liquid-cooled microprocessor package, as illustrated in Fig. 3(c), the lid and both TIMs can be replaced with a single high-performance liquid metal TIM (LMTIM) [14] that has a thermal conductivity an order of magnitude higher than commonly used TIMs. In this direct-attach approach, the cold-plate is attached directly to the backside of the die, which eliminates the thermal resistance of the lid and TIM2, significantly lowering the thermal resistance of the path to the coolant. The LMTIM is composed of an indium-gallium-tin alloy that is liquid at all normal operating temperatures. Fig. 5 compares the lidded cold-plate shown in Fig. 3(b) with the direct-attach cold-plate with LMTIM shown in Fig. 3(c) for a 100-W processor versus coolant temperature [15] . The results showed an estimated chip temperature improvement of nearly 5°C over a range of coolant temperatures. This would allow an increase in coolant temperature of 5°C to achieve the same chip junction temperature.
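The trade-off between thermal resistance and allowable coolant temperature can be sketched with a simple lumped model, T j = T coolant + P × R. In the example below, the 5°C improvement at 100 W is taken from the text, while the lidded-package resistance of 0.20°C/W and the function names are hypothetical values chosen for illustration.

```python
def resistance_reduction(delta_t_c, power_w):
    """Equivalent drop in junction-to-coolant resistance (C/W)."""
    return delta_t_c / power_w


def junction_temperature(coolant_c, power_w, resistance_c_per_w):
    """Lumped model: junction temperature = coolant + power * resistance."""
    return coolant_c + power_w * resistance_c_per_w


power_w = 100.0
# ~5 C improvement at 100 W (from the text) implies ~0.05 C/W lower resistance.
delta_r = resistance_reduction(5.0, power_w)

r_lidded = 0.20                 # hypothetical lidded-package resistance, C/W
r_direct = r_lidded - delta_r   # direct-attach resistance under that assumption

tj_lidded = junction_temperature(40.0, power_w, r_lidded)
tj_direct = junction_temperature(45.0, power_w, r_direct)
print(f"lidded at 40 C coolant: {tj_lidded:.1f} C")
print(f"direct-attach at 45 C coolant: {tj_direct:.1f} C")
```

In this model, the direct-attach package reaches the same junction temperature with coolant that is 5°C warmer, which is the margin that makes year-round ambient heat rejection easier.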
B. Energy-Efficient Heat Rejection to Outdoor Ambient
The improvement in thermal resistance of the chip package described in Section IV-A reduces the temperature difference between the coolant and the chip junction. This allows the coolant temperature to be above the outdoor ambient temperature. In this section, we describe a system designed to use the ambient environment for cooling. Fig. 6 illustrates a DELC system, which is an energy-efficient method of rejecting the heat from the servers/racks directly to the outdoor ambient [13] . This system uses two liquid coolant loops, an internal water loop and an external water/ethylene glycol (antifreeze) loop, to facilitate above-ambient cooling of the IT equipment. The internal water loop is routed through the servers inside a sealed equipment rack to extract the heat directly from the processors and memory modules via cold-plates. The remaining heat from other server components (disk drives, voltage regulators, etc.) and rack components (power distribution units, switches, etc.) is removed by recirculating air through an air-to-liquid heat exchanger housed inside the sealed rack, which transfers the heat to the internal water loop. The airflow within the rack is provided by the server fans, which were reduced in number because more than half the heat is removed by cold-plates. Thus, all the heat from the servers is transferred to the internal water loop within the sealed rack. This heat is then transferred to the external liquid loop via a liquid-to-liquid heat exchanger and subsequently rejected to the outside environment through a dry cooler. In contrast to the traditional air-cooling approach, the DELC system uses only three powered cooling devices: the external dry-cooler fans, the external loop pump, and the internal loop pump. This results in significant cooling energy savings.
A recent study [16] reported on a long-term demonstration of this system in which it was operated for over two months, from May to June, in New York. The ambient temperature ranged from 4.7°C to 35.8°C with an average of 21.6°C. Results show that this system consumed on average 3.5% of IT power to reject the heat to the outside ambient while keeping the server component temperatures within specification. This is a reduction of over 90% compared to data center traditional air-cooling systems. The anticipated benefits of such energy-centric configurations are significant. When compared to a traditional midsized 10-MW data center, which typically uses 25% of its total data center energy consumption for cooling, this technology could potentially enable cost savings of roughly $800 000 to $2 200 000/year (assuming electricity costs of 4¢ to 11¢ per kilowatt-hour) through the reduction in electrical energy usage while also eliminating water usage of up to 240 000 gal/day [17] .
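The quoted cost range can be reproduced with a short calculation. The sketch below assumes continuous operation (8760 h/year) and a cooling energy reduction of exactly 90%; both are simplifying assumptions, and the function name is hypothetical.

```python
HOURS_PER_YEAR = 8760  # assumes continuous, year-round operation


def annual_cooling_savings_usd(total_mw, cooling_fraction, reduction,
                               usd_per_kwh):
    """Annual electricity cost avoided by reducing cooling power."""
    saved_kw = total_mw * 1000.0 * cooling_fraction * reduction
    return saved_kw * HOURS_PER_YEAR * usd_per_kwh


# 10-MW facility, 25% of energy used for cooling, 90% cooling reduction.
low = annual_cooling_savings_usd(10.0, 0.25, 0.90, 0.04)
high = annual_cooling_savings_usd(10.0, 0.25, 0.90, 0.11)
print(f"${low:,.0f} to ${high:,.0f} per year")
```

Under these assumptions, the avoided cooling power is 2.25 MW, and the annual savings come to roughly $0.79 million to $2.17 million, in line with the range stated above.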
Overall, this DELC system offers the following key advantages. First, the system rejects heat from the data center to the outside ambient without exposing the equipment to outside air, which contains contaminants that can degrade IT equipment reliability. Second, the dry cooler rejects the heat to the outside air through a radiator and fan system that operates without consuming external water. Third, having a reduced number of server fans coupled with advanced thermal packaging reduces the IT power consumption by up to 14% [10] . Fourth, this chiller-less system uses significantly lower cooling energy (3.5% of the reduced IT power [16] ) to reject the heat to the outdoor ambient. Note that since the servers used in this paper were retrofitted with liquid cooling, the maximum outdoor ambient temperature at which the DELC system can operate without any degradation in server performance or any long-term reliability issues is 45°C [13] .
V. COMPUTER SYSTEM SCALING
At the core of improving data center energy efficiency is the continuous scaling of IT server performance per watt. The GREEN 500 list [18] that tracks supercomputer performance reports the most efficient supercomputers process over 6.6 billion floating point operations per second per watt of power. To put this in perspective, this performance is close to a trillion times greater than the first commercial computer introduced in the 1950s.
The driver for IT system performance growth in the last 50 years has been the scaling of transistors to smaller dimensions. This scaling has enabled the number of transistors on a chip to follow Moore's law projection of doubling approximately every two years. As shown in Fig. 7 , the number of transistors on a chip has increased more than a million times from 1975 to 2015 [19] . However, as transistor scaling approaches atomic dimensions, the scaling has slowed [20] .
A path for continuously increasing system performance as Moore's law slows and reaches the end is the implementation of 3-D or "stacked" chips to bring more compute and memory resources closer together, reducing latency and input/output power. However, cooling stacked chips with more than one high-power layer utilizing conventional single-sided heat sink approaches becomes difficult or impossible due to the thermal resistance associated with multiple intervening chip layers as shown in Fig. 8 .
The IT industry has developed 2.5-D high-density packages as shown in Fig. 9(a) , in which a silicon substrate provides high-density wiring between a graphics processor and stacked memory, which increases packaging integration but allows cooling with traditional external heat-sinks [21] . However, to realize the full performance improvements of an integrated 3-D chip stack structure with high-bandwidth interconnects including high-power microprocessors and accelerators and memory as shown in Fig. 9(b) , new cooling technologies that can provide cooling within the chip stack would be required.
VI. TWO-PHASE COOLING FOR COMPUTATIONAL EFFICIENCY
A solution to address the challenges of cooling 3-D chip stack structures is embedded cooling, where coolant is flowed between the stacked high-power chips. Embedded cooling adds a new set of constraints to the system, including the constraints on available channel dimensions and selection of coolant. Channel heights are limited by fabrication processes and electrical performance requirements of the electrical links between and through the stack layers. For example, parallel channels with height and width that meet the electrical requirements would create substantial pressure drop when using a single-phase liquid to flow across the 20 mm or more length of large processor dies. Use of the most conventional liquid coolant, water (which is conductive), is problematic due to both the need to fully isolate it from any voltage in the electrical interconnects and the significant losses associated with transmitting very high-frequency signals in proximity to it.
An approach to mitigate these concerns is embedded two-phase cooling using flow boiling of dielectric coolant in channels between the stacked high-power active layers [22] . Two-phase flow boiling has long been proposed as a potential method for cooling high-performance computer systems [23] , [24] . A large body of work investigating and developing technologies appropriate for cooling electronics with two-phase flow boiling in parallel micro/mini-channels exists [25] . Nevertheless, potential flow instabilities, low latent heat and low thermal conductivity of hydrofluorocarbon dielectric coolants, and high flow resistance along relatively long parallel channels have resulted in critical heat fluxes (CHFs) limited to a range of about 40 [26] to 255 W/cm 2 [27] .
A significantly different approach to embedded cooling in 3-D chip stacks is to replace long parallel channels with a central fluid manifold and radially expanding channels that exit at the edges of the die. Similar radial arrangements of microchannels have been demonstrated in previous studies of cooling plates or heat exchangers carrying single-phase gas [28] , [29] or liquid [30] , [31] , where this arrangement showed specific pressure drop and flow stability benefits when operated in a two-phase mode. Recent numerical studies [32] , [33] also reveal that this two-phase cooling approach could achieve better energy efficiency and maximum CHF from the resulting reduced flow path [34] . A thermal test vehicle (TTV) incorporating this approach [35] - [37] is described in Section VI-A.
A. Thermal Test Vehicle
The TTV, illustrated in Fig. 10 , was designed to reasonably mimic the power map of an eight-core microprocessor while providing the ability to operate at powers in excess of those expected in next-generation microprocessors. This enables the study of the capability limits associated with the underlying approach and configuration.
The TTV was fabricated by bonding two silicon dies of size 2 × 2 cm 2 that are identified as a thermal die and a manifold die in Fig. 10(a) . The thermal die front side (bottom surface) includes: 1) metal thin film background heaters; 2) eight "core" heaters (3.6 × 4.8 mm 2 ) intended to mimic microprocessor cores; and 3) 16 "hot spot" 200 × 200 µm 2 heaters to mimic hotspot effects. Fig. 10(b) shows a representative power map in which the blue zone highlights the background heater regions, light blue zones highlight the core regions, and red zones indicate the hot-spot regions. The thermal die front side also includes 37 resistive temperature detectors (RTDs) arrayed throughout the 4-cm 2 die area as shown in Fig. 10(c) . The thermal die backside (top surface) has etched orifices and radial channels as shown in Fig. 10(d) . The etched cooling structure has radial quadrant symmetry and includes a central inlet region and, per quadrant, five inlet orifices and five radial expanding channels (18° sections each). These etched features are 120 µm deep and include a pin fin array of 80-µm-diameter pins on a 200-µm pitch to simulate through-silicon via (TSV) structures in a stacked die arrangement. The inlet orifice to each of the five radial channels per quadrant was designed to have a width appropriate to the maximum expected power to be removed by the coolant flowing in that channel relative to the total power. As designed, the pressure drop associated with the inlet orifices was expected to be a major fraction of the total pressure drop across the TTV so as to passively attain the required flow distribution. This pressure drop is predominantly due to the area contraction ratio and is dependent on the width of the orifices. Furthermore, such a high pressure drop allows the coolant to flash (undergo subcooled boiling) right across the orifices. The radial channels are further subdivided into two branches at a radius of 4 mm and four subbranches at 7 mm.
Since the power map is highly nonuniform, the branching structure helps minimize the flow biasing toward one flow guide wall within a given 18° radial channel segment.
The thermal die backside (top surface) was bonded to a manifold die that has a central hole to provide a fluidic path and creates the top walls of the radial expanding channels. The die pair was flip-chip-bonded to a ceramic substrate module. A copper manifold lid was bonded to the manifold die and ceramic substrate with solder to provide leak-tight fluidic inlet and outlet connections to the upstream coolant delivery and the downstream condensation system.
B. Test Setup
The TTV was mounted on a test card to provide electrical connections to the heaters and signal connections to measure the RTD sensor temperatures. Fluidic connections were made between the TTV and a test cooling system to supply coolant. The dielectric coolant (trans-1,3,3,3-tetrafluoroprop-1-ene, commonly referred to as R1234ze) was selected after a detailed evaluation of thermodynamic and transport properties of several single-component refrigerants [34] . This coolant has a negligible ozone depletion potential and a very low global warming potential. A schematic of this test system is shown in Fig. 11 . Both absolute pressure at the TTV inlet and the differential pressure across the TTV were measured and recorded along with inlet and outlet temperature, RTD temperature data, and coolant mass flow rate. Note that in this test system, the TTV outlet pressure as well as the reservoir temperature are constant for a fixed chiller set-point. Thus, the inlet pressure as well as the inlet subcooling increases with an increase in pressure drop across the TTV.
In this paper, the RTD temperature data T RTD are not equivalent to actual microprocessor junction temperatures T j , as there are insulating layers between the RTDs and the silicon die that do not exist in a real-world processor. This additional thermal resistance can be substantial at very high heat flux conditions, and thus, it was subtracted from the experimental RTD results to report the results as an equivalent junction temperature, T j .
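The correction described above amounts to subtracting a flux-dependent temperature drop from each RTD reading. A minimal sketch is given below; the insulating-layer resistance of 0.02°C·cm 2 /W, the sample reading, and the function name are all hypothetical, since the actual value is not stated here.

```python
def equivalent_junction_temp(t_rtd_c, heat_flux_w_cm2,
                             r_insulation_c_cm2_per_w):
    """Remove the insulating-layer temperature drop from an RTD reading.

    t_rtd_c                  -- measured RTD temperature (C)
    heat_flux_w_cm2          -- local heat flux at the sensor (W/cm^2)
    r_insulation_c_cm2_per_w -- area-specific resistance of the insulating
                                layers (C*cm^2/W); value is an assumption
    """
    return t_rtd_c - heat_flux_w_cm2 * r_insulation_c_cm2_per_w


# Hypothetical reading: 65 C at 350 W/cm^2 with an assumed 0.02 C*cm^2/W layer.
t_j = equivalent_junction_temp(65.0, 350.0, 0.02)
print(f"equivalent junction temperature: {t_j:.1f} C")
```

Because the subtracted term scales with heat flux, the correction is small at low power and becomes substantial only at the highest flux levels.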
C. Key Results
The TTV was installed into the test system (Fig. 11) , and refrigerant was flowed through it. The initial test vehicles for which data are reported here showed some variability in behavior from quadrant to quadrant, requiring collection of data on subsets of quadrants in order to determine the thermal performance limits of that subset. Fig. 12 shows an interpolated visualization of the temperature profile of a TTV with two diagonal quadrants powered at 355 W/cm 2 with 20 W/cm 2 background heat flux (∼300 W). This profile takes data from the 37 RTD sensors and fits a spatial temperature map to the sensor data. This power density is 2-3× that of today's high-power processor core heat densities. Fig. 13 shows quadrant average core sensor data versus core heat flux value for this pair of quadrants. In collecting these data, subcooling at the device inlet was held to 7.0°C ± 0.4°C at an inlet temperature of about 32°C for the highest power case. For these data, the TTV refrigerant mass flow rate was 15 kg/h, and at the highest power level, the average exit vapor quality and pressure drop were observed to be ∼38% and 320 kPa, respectively. However, as only the inlet mass flow rate is measured, the exact flow through the individual quadrants for which the data are shown cannot be determined precisely. Note that the thermal resistance line shown in Fig. 13 begins to turn upward around 300 W/cm 2 , but an average junction temperature of under 60°C was maintained up to 350 W/cm 2 . Unlike most single-phase systems, this two-phase system exhibits flux-dependent thermal resistance behavior typical of two-phase systems. At low power densities, the heat must generally travel through a thicker liquid film to get to the vapor/liquid interface, resulting in an increased thermal resistance. As power rises, this film generally thins, reducing the thermal resistance. Eventually, early stages of dry-out begin to appear, causing the thermal resistance to increase. Fig. 14 shows the junction temperatures for each sensor located within the powered core areas of the two powered quadrants. Early stages of dry-out are observed at a range of power densities, but the temperatures observed prior to dry-out are tightly bunched, with sensors located further downstream generally cooler. This is as expected in a two-phase system where the saturation temperature drops with decreasing pressure along the channel. Fig. 15 shows the absolute pressures observed as power to the two diagonal powered cores was increased on a separate data set. The pressure drop increase is not strongly dependent on the power, which limits the impact of power input on flow imbalance between powered and unpowered quadrants to a few percent.
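The reported exit vapor quality can be checked against a simple energy balance, x = (Q − ṁ c p ΔT sub ) / (ṁ h fg ). The sketch below uses the power, flow rate, and subcooling quoted above; the liquid specific heat (1400 J/kg·K) and latent heat (160 kJ/kg) are rounded, assumed property values for R1234ze near the test conditions and are not taken from this paper. The balance also neglects heat losses and the change in saturation state along the channel.

```python
def exit_vapor_quality(power_w, mass_flow_kg_h, subcooling_k,
                       cp_j_kg_k, h_fg_j_kg):
    """Estimate exit vapor quality from a steady-state energy balance."""
    m_dot = mass_flow_kg_h / 3600.0                   # kg/s
    sensible_w = m_dot * cp_j_kg_k * subcooling_k     # heat to reach saturation
    latent_w = power_w - sensible_w                   # heat left for boiling
    if latent_w <= 0.0:
        return 0.0                                    # coolant stays subcooled
    return min(latent_w / (m_dot * h_fg_j_kg), 1.0)


# Test conditions from the text; cp and h_fg are assumed round values.
x_exit = exit_vapor_quality(power_w=300.0, mass_flow_kg_h=15.0,
                            subcooling_k=7.0, cp_j_kg_k=1400.0,
                            h_fg_j_kg=160e3)
print(f"estimated exit vapor quality: {x_exit:.2f}")
```

With these assumed properties, the estimate is roughly 0.39, close to the ∼38% observed, which suggests that most of the applied power is carried away as latent heat.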
While the data show effective cooling for core power densities of ∼350 W/cm², the behavior of small localized high-power areas, or hot spots, is of additional interest. To investigate hot-spot behavior, eight separate hot spots of 200 µm × 200 µm within the two powered quadrants were powered with increasing heat flux at two different coolant flow rates. In Fig. 16, the average hot-spot T_j is plotted as a function of hot-spot heat flux. The hot-spot T_j increases linearly with increasing hot-spot heat flux, with well-controlled temperature up to 2 kW/cm². The average hot-spot T_j is 0.5°C-1.2°C lower at a flow rate of 15 kg/h than at 10 kg/h. This indicates that the peak junction temperatures do not change significantly with the coolant flow rate unless a dry-out or partial dry-out condition is encountered. A likely reason is that the liquid film is similar at these flow rates, resulting in roughly the same heat transfer rates. This highlights a key nonlinear behavior of two-phase cooling configurations: higher flow rates do not necessarily result in lower junction temperatures.
In summary, TTVs were constructed to study the thermal behavior of potential stacked 3-D high-power processors, using novel radial expanding channels and pin fields cooled with a two-phase dielectric fluid. Of particular interest was the behavior [temperature rise and effective CHF (e-CHF)] of power-generating features that mimic the "core" areas of a high-power processor. For this configuration, the operational e-CHF is defined as the heat flux at which T_j rises 30°C above the inlet fluid temperature. In this paper, the e-CHF was found to be ∼350 W/cm², which is significantly higher than the values observed for longer parallel channels. Hot-spot power densities in excess of 2 kW/cm² were also effectively cooled. The work indicates that this approach would be well suited to cooling stacked 3-D high-power processor dies.
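The e-CHF criterion above (a junction temperature rise of 30°C over the inlet fluid temperature) can be applied to a measured heat-flux sweep by interpolation. The following is a minimal sketch; the function name and the sample sweep are hypothetical and do not reproduce the data in Figs. 13 and 14.

```python
def effective_chf(heat_flux, t_junction, t_inlet, rise_limit=30.0):
    """Heat flux at which (T_j - T_inlet) first reaches rise_limit.

    Uses linear interpolation between adjacent sweep points and returns
    None if the limit is never crossed within the sweep.
    """
    points = list(zip(heat_flux, t_junction))
    for (q0, t0), (q1, t1) in zip(points, points[1:]):
        r0, r1 = t0 - t_inlet, t1 - t_inlet
        if r0 < rise_limit <= r1:
            return q0 + (rise_limit - r0) * (q1 - q0) / (r1 - r0)
    return None


# HYPOTHETICAL sweep, for illustration only (W/cm^2 and deg C).
q = [100.0, 200.0, 300.0, 350.0, 400.0]
tj = [45.0, 50.0, 56.0, 61.0, 72.0]
t_in = 32.0

e_chf = effective_chf(q, tj, t_in)
print(f"e-CHF estimate: {e_chf:.0f} W/cm^2")
```

Defining the limit as a temperature rise rather than a true critical heat flux gives an operational threshold that can be read directly from sensor data without driving the device into full dry-out.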
VII. 3-D CO-DESIGN
In this section, we will describe the interdependencies and tradeoffs between thermal, mechanical, electrical, and computational performance through an example 3-D chip stack having embedded two-phase dielectric cooling. Although this work primarily addresses one example 3-D configuration, many of the conclusions will apply to other 3-D configurations utilizing embedded cooling.
The 3-D straw-man configuration, as illustrated in Fig. 9(b), includes a processor chip, an accelerator chip, and four 3-D memory stacks joined to the accelerator chip. The dielectric refrigerant would be passed through a micro-pin field between the processor and accelerator stacks. This 3-D straw-man is one example of a heterogeneous 3-D system that could not be realized with traditional air heat sinks or liquid cooling cold-plates, where the heat is removed through the top of the stack (in this case, the top chip of memory stacks). In this example, the high thermal resistance from the processor and accelerator chips to the heat sink, due to multiple sets of high-thermal-resistance interconnect layers in the memory stacks and accelerator chip, would result in unacceptably high temperatures in the high-power layers farther from the heat sink.
This configuration includes a micro-pin interconnect between two thin (∼75 µm) chips with TSVs in a back-to-back orientation. The back-to-back chip orientation enables a low-resistance thermal path between the high-power circuits on both the processor and accelerator chips to the two-phase dielectric coolant flowing between those two chips. The lower power memory stacks are joined to the accelerator chip in a face-to-face orientation with a fine-pitch micro-pillar interconnect. This enables a high-bandwidth interface between the memory stacks and the accelerator chip. The heat from the memory stacks is removed through the accelerator chip into the two-phase cooling channels. The assembled module includes a center inlet between the memory stacks connecting to a center inlet hole in the accelerator chip, through which the dielectric fluid is fed into the radial expanding channels between the processor and accelerator chips. The total height of this high-power 3-D chip stack with embedded cooling is less than the thickness of a current microprocessor die. This novel, radially symmetric cooling configuration and its performance have been described in Section VI.
A detailed evaluation was performed to determine the feasibility of such a 3-D structure. Below is a summary.
A. Thermal Performance
We evaluated the tradeoffs among fluid channel dimensions, CHF criteria, and pressure drop to determine the range of pin spacing options for different heat flux criteria. For example, increasing fluid channel dimensions to support higher power densities reduces the number of interconnects available for chip-to-chip power delivery and electrical communication.
Initial thermal analysis results were combined with process compatibility evaluations, mechanical modeling, and electrical modeling to determine an ideal set of technology parameters for the proposed 3-D straw-man. A thermal analysis using a reduced-physics model [38] with a cooling channel height of 150 µm was completed for the 3-D straw-man. In this reduced-physics approach, the heat conduction in the solid domains (processor, accelerator, and memory stack) was computed in 3-D, while the cooling channels including the pin-field/radial channels were approximated as porous media and solved in 2-D (radial and azimuthal). The modeling of a quadrant of the 3-D stack structure included example power maps representative of a high-performance processor, an accelerator chip, and eight-layer 3-D memory stacks. The total quadrant power was ∼95 W with a maximum power density of ∼270 W/cm² in hot-spot regions of roughly 1 mm². The results, as shown in Fig. 17, indicate acceptable temperature in all layers of the 3-D stack, with the highest temperature in the chip with the highest power density (the processor chip).
B. Mechanical Stability
Mechanical models of the 3-D straw-man included discrete pin geometry, chip stack and package details, Cu pin fillet, and Cu yielding. The models predict thermal stresses created during fabrication as well as stress associated with operation. Modeling results suggest that the coefficient of thermal expansion (CTE) mismatch between an organic laminate package and the chip stack can result in high stresses in the chip-to-chip Cu pillars during fabrication if there are high-temperature excursions after the chip stack to laminate join is completed. Explicit modeling of the pillar geometry as well as accurate modeling of Cu yielding are required for accurate prediction of stability of the structure. Optimization of pillar size and pitch, consideration of CTE mismatch between chip and package, and thermal excursions during fabrication are important considerations for mechanical stability. Mechanical modeling completed on the 3-D straw-man after refinement of the technology and process parameters indicated acceptable stress levels throughout the stack.
C. Electrical Modeling/Power Delivery
In the 3-D straw-man, the highest power chip is placed closest to the package, providing the best power delivery (reduced IR drop and I²R loss) and leading to an ideal performance/power ratio. This configuration would not be possible without embedded cooling. The total current does not exceed what can be reliably delivered through the C4 and 3-D interconnects at the nominal operating voltage. In addition, simulation of the IR drop indicates less than 20 mV of supply-voltage drop through the stack and accelerator chip power distribution network, supporting robust, reliable power delivery.
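The two power-delivery quantities named above follow directly from Ohm's law: the supply droop is V = IR and the resistive loss is P = I²R. The sketch below checks a lumped delivery path against the 20 mV figure quoted in the text; the current and effective resistance are hypothetical placeholders, since the actual values are not given here.

```python
IR_DROP_BUDGET_V = 0.020  # 20 mV figure quoted in the text


def ir_drop_v(current_a, resistance_ohm):
    """Supply-voltage drop across a lumped resistive delivery path."""
    return current_a * resistance_ohm


def i2r_loss_w(current_a, resistance_ohm):
    """Power dissipated in the same lumped path."""
    return current_a ** 2 * resistance_ohm


# HYPOTHETICAL values for illustration only.
current_a = 100.0          # total supply current
resistance_ohm = 0.15e-3   # effective path resistance (0.15 milliohm)

drop = ir_drop_v(current_a, resistance_ohm)
loss = i2r_loss_w(current_a, resistance_ohm)
within_budget = drop < IR_DROP_BUDGET_V
print(f"IR drop = {drop * 1e3:.1f} mV, I2R loss = {loss:.2f} W, "
      f"within budget: {within_budget}")
```

Because the loss grows with the square of the current, placing the highest-current chip on the shortest, lowest-resistance path gives the largest reduction in both droop and wasted power.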
D. Electrical Modeling/Signal Integrity
Signal integrity analysis through the chip-to-chip connections in the presence of a refrigerant surrounding the connections was also completed. First, the electrical properties of the fluid and vapor phases of the refrigerant were measured. Next, those results were fit to a causal multipoint Debye model, and then a canonical interdie signal traverse whose geometry represented 3-D chip-to-chip connections was simulated. Finally, eye diagrams for the links using a pseudorandom 25-GB/s bit stream with four nearest-neighbor aggressors were simulated. The results indicate good signal integrity, with the link loss being only marginally (below 1%) impacted by having the refrigerant in direct contact with the interdie connections for the data rates envisioned for this application.
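For readers unfamiliar with Debye fitting, the sketch below evaluates the standard multi-term Debye relaxation form, ε(ω) = ε∞ + Σ Δε_k / (1 + jωτ_k), which is assumed here to be the kind of model the text refers to. This form is causal by construction. The pole strengths and relaxation times are hypothetical placeholders, not the fitted refrigerant parameters, which are not reported in this section.

```python
import math


def debye_permittivity(freq_hz, eps_inf, poles):
    """Complex relative permittivity of a multi-term Debye model.

    poles is a list of (delta_eps, tau_s) pairs; each term contributes
    delta_eps / (1 + j*omega*tau_s).
    """
    omega = 2.0 * math.pi * freq_hz
    return eps_inf + sum(d_eps / (1.0 + 1j * omega * tau)
                         for d_eps, tau in poles)


def loss_tangent(eps):
    """tan(delta) = eps'' / eps', with the convention eps = eps' - j*eps''."""
    return -eps.imag / eps.real


# HYPOTHETICAL placeholder parameters, not the paper's fit.
EPS_INF = 2.0
POLES = [(1.0, 1.0e-11), (0.5, 1.0e-12)]

eps_dc = debye_permittivity(0.0, EPS_INF, POLES)
eps_rf = debye_permittivity(12.5e9, EPS_INF, POLES)  # arbitrary test frequency
print(f"static permittivity: {eps_dc.real:.2f}")
print(f"loss tangent at 12.5 GHz: {loss_tangent(eps_rf):.3f}")
```

A fit of this kind yields a frequency-dependent permittivity and loss tangent that can be supplied to a channel simulator for both the liquid and vapor phases.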
E. Computational Modeling/System Performance Simulations
Computational modeling focused on exploring the benefits of increased bandwidth and low latency in a two-chip 3-D stack versus two chips in a side-by-side configuration. Different execution phases from a set of benchmarks were simulated, where the first phase is sequential (usually an initialization part at the beginning of the execution) and is followed by one or more compute-intensive loops where the main processing takes place. For the 3-D to 2-D comparison, we assumed that the application (kernel) runs on the main processor and that each iteration of the main loop(s) is offloaded to the accelerator for execution along with its input data. Our hypothesis for the 3-D case was that the chip-to-chip bus can operate at no less than double the speed assumed for the 2-D case and with two times the number of lanes. As a result, the bus for the proposed 3-D straw-man can sustain a bandwidth four times higher than the 2-D baseline. Results showed that the higher the utilization of the chip-to-chip bus, the larger the benefits of a 3-D approach versus a 2-D approach, achieving around 2× speedup for 70% bus utilization. Through such detailed analysis, we can guide the co-design of embedded cooling technology development to provide compatibility with 3-D chip-stacking system requirements.
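The reported trend can be compared against a simple Amdahl-style estimate. This is not the simulation method used in this work; it is a back-of-envelope sketch that assumes bus utilization equals the fraction of 2-D baseline run time that is bus-bound and that this fraction shrinks in proportion to bus bandwidth.

```python
def bus_limited_speedup(bus_utilization, bandwidth_ratio):
    """Amdahl-style speedup when only the bus-bound fraction accelerates.

    bus_utilization: fraction of baseline run time spent bus-bound (0..1).
    bandwidth_ratio: 3-D bus bandwidth divided by 2-D bus bandwidth.
    """
    return 1.0 / ((1.0 - bus_utilization) + bus_utilization / bandwidth_ratio)


# From the text: double the bus speed and double the lane count.
bandwidth_ratio = 2.0 * 2.0

speedup = bus_limited_speedup(0.70, bandwidth_ratio)
print(f"estimated speedup at 70% bus utilization: {speedup:.2f}x")
```

Under these assumptions the estimate is about 2.1×, in line with the roughly 2× speedup reported for 70% bus utilization, and the same expression shows the benefit shrinking toward 1× as bus utilization falls.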
VIII. CONCLUSION
The development of advanced thermal solutions is critical to the continued improvement of computing performance and efficiency. To that end, experimental investigation of an energy-efficient liquid cooling system for data centers was performed. This design utilized a closed-loop liquid cooling system to transport the heat from computer servers to the outdoor ambient environment, eliminating the need for the energy-intensive compressor-based refrigeration used in traditional air-cooled data centers. This hybrid air-liquid cooling system utilized cold-plates to remove the heat generated by the processors and memory and an air-to-liquid heat exchanger that removed heat from other components by recirculating air. The system was operated over two months from May to June in Poughkeepsie, NY, USA, and demonstrated a 90% reduction in cooling energy compared to traditional air-cooled data centers. The liquid cooling system reduced chip junction operating temperatures and allowed removal of server fans, which reduced the IT energy by up to 14%.
Transistor scaling, which has historically driven the improvements in performance, is reaching a physical limit. The development of high-density packaging of heterogeneous computing elements provides a path to continue and accelerate improvements in computer performance. However, to realize the full performance improvements of an integrated 3-D chip stack structure with high-bandwidth interconnects, new cooling technologies that can provide cooling within the chip stack would be required. To address such cooling challenges, chip-embedded two-phase dielectric liquid cooling was experimentally demonstrated. The use of dielectric coolants provides a cooling solution compatible with 3-D chip stack electrical interconnects. A TTV having 120-µm-deep radially expanding cooling microchannels with pin-fin arrays representing 3-D chip stack interconnects was fabricated to simulate the power map of an eight-core microprocessor. This vehicle was operated at a core power density of 350 W/cm² and a hot-spot power density of 2000 W/cm² while achieving a chip junction temperature of roughly 65°C. Results demonstrate that this volumetrically efficient cooling solution compatible with 3-D chip stacks can manage three times the core power density of today's high-power processor while maintaining the device temperature well within limits.
The advanced thermal solutions provide three major benefits to computer efficiency. First, integration of liquid cooling to the chip reduces chip junction temperature and leakage power, reducing the energy per computation. Second, the integration of liquid cooling to the chip reduces the thermal resistance between the chip and the coolant, allowing coolant temperatures above the outdoor ambient temperature and thus eliminating subambient energy-intensive cooling requirements. Third, the integration of chip stack embedded liquid cooling provides a path to high-bandwidth 3-D chip stacking of heterogeneous components, which has the potential to accelerate computational performance improvements.
These thermal solutions have the opportunity to impact the energy efficiency of mega-scale data centers that are consuming over 2% of U.S. electricity and will continue to grow as the acquisition, processing, and transmission of information continues to expand.
