Through-Silicon Via (TSV) based 3D integration is a promising technology for increasing the performance of FPGAs by achieving shorter global wire length and higher logic density. However, 3D FPGAs also suffer from severe thermal problems due to the increase in power density and thermal resistance. Moreover, past work has shown that leakage power can account for 40% of the total power at current technology nodes and that leakage power increases non-linearly with temperature. This intensifies the thermal problem in 3D FPGAs, and more aggressive cooling methods such as micro-channel-based fluidic cooling are required to fully exploit their benefits. The interaction between micro-channel heat sink design and the performance of a 3D FPGA is complicated, and a comprehensive approach is required to identify the optimal design of 3D FPGAs subject to thermo-electrical constraints. In this work, we propose an analysis framework for 3D FPGAs embedded with micro-channel-based fluidic cooling to study the impact of channel density on cooling and performance. Based on our simulation results, we provide guidelines for designing 3D FPGAs embedded with micro-channel cooling and identify the optimal design for each benchmark. Compared to naive 3D FPGA designs that use a fixed heat sink design, the optimal design identified using our framework can improve the operating frequency and energy efficiency by up to 80.3% and 124.0%, respectively.
INTRODUCTION AND MOTIVATION
With continued dimensional scaling, the non-recurring engineering cost of designing cell-based Application Specific Integrated Circuits (ASICs) has escalated enormously. Compared to ASICs, FPGAs have the advantages of higher reusability, a simpler design cycle and faster time-to-market. On the other hand, FPGAs suffer from tremendous programming overheads. According to [12], programmable elements can account for 90% of the total footprint area, 80% of the total path delay and a large portion of the power consumption. These overheads enlarge the performance gap between FPGAs and ASICs [9] and have become the main limitation on the growth of the FPGA market. 3D integration technology provides an opportunity to increase the performance of FPGAs. This emerging technology stacks multiple dies on top of each other and uses Through-Silicon Vias (TSVs) for inter-die connection, which achieves shorter global wire length and higher transistor density. However, 3D integration is not a panacea and it introduces a number of new challenges: (1) the relatively large TSV dimensions increase the footprint area, (2) switch boxes that provide inter-layer connections use more transistors, resulting in extra power and delay, and (3) the stacked structure exacerbates the thermal problem by increasing both the power density and the thermal resistance.
Recently, several significant studies have explored 3D integration in FPGAs. The authors of [6] and [15] studied the design of switch box topology in 3D FPGAs given that the number of TSVs is limited in 3D ICs. Pangracious et al. [14] proposed using a tree-based interconnect topology to build 3D FPGAs in order to reduce the area overhead and improve logic density. Besides wafer-stacked 3D FPGAs, monolithic 3D FPGAs have also been studied [12, 11] and, according to the reported results, achieve better performance. Meanwhile, placement and routing (P&R) tools for 3D FPGAs have also been developed, among which TPR [3] and 3D-Meander [24] are the most popular.
Unfortunately, despite this body of work, thermal issues in 3D FPGAs remain almost unexplored. Conventionally, 2D FPGAs dissipate a small amount of power due to their low frequency and low logic density, so they can be sufficiently cooled by an air-based heat sink. With 3D integration, the power density of FPGAs increases due to the improvement in frequency and logic density, while the heat removal capacity of conventional air-based cooling does not improve accordingly. This leads to a rise in on-chip temperature. The authors of [6] characterized the temperature of a 3D FPGA based on the Virtex-4 architecture and observed a 2.5x increase in peak temperature when the number of layers increases from one to four.
FPGA '16, February 21-23, 2016
If the on-chip temperature exceeds the temperature limit, the operating frequency should be scaled down to make the design thermally feasible.
Thermal problems can be further exacerbated when leakage power is taken into consideration. With geometric and supply voltage scaling, leakage power has become a major contributor to the total power dissipation in FPGAs. Several studies [4, 26] demonstrate that leakage power can account for as much as 40% of the total power in FPGAs at sub-100 nm technology nodes. Moreover, leakage power increases non-linearly with temperature [20] (as illustrated in Figure 1). The positive feedback between temperature and leakage power will lead to thermal runaway if 3D FPGAs are not sufficiently cooled. In order to fix the thermal problem and fully exploit the benefits of 3D FPGAs, more aggressive cooling approaches, such as micro-channel-based fluidic cooling, should be applied. Micro-Channel-Based Fluidic Cooling (MC Cooling) comprises micro-channels etched into the silicon substrate of each layer in a 3D IC. Fluid coolant (usually de-ionized water) is pumped into the micro-channels and forms a distributed heat sink. This kind of micro-fluidic cooling is more efficient than air-cooling because it takes heat away directly from adjacent regions instead of merely removing heat from the top of the chip. According to [8, 27], single-phase micro-fluidic cooling can provide a heat removal capacity as high as 700 W/cm², and two-phase cooling is even more efficient [7]. Recently, researchers built a micro-channel cooling system in a 28 nm Altera FPGA and achieved a much lower chip temperature than with air cooling [18]. Despite these benefits, MC Cooling comes with overheads:
(1) TSVs cannot be built through micro-channels, which results in a trade-off between micro-channel density and TSV-based vertical bandwidth; (2) extra energy is required to pump the coolant, although the pumping power is quite low according to previous work [28, 16, 13].
The application of micro-channel cooling in 3D ASICs has been widely studied for years [17, 22], and a 2D FPGA with liquid cooling was reported recently [18]. However, the design-time optimization of 3D FPGAs with micro-channel cooling has not been investigated. Applying MC Cooling in 3D FPGAs is far more complicated than in 3D ASICs. On one hand, reducing micro-channel density worsens cooling, and the resulting temperature increase may make a 3D FPGA thermally infeasible. On the other hand, due to the trade-off between channel density and vertical bandwidth, decreasing channel density increases vertical bandwidth, which does not necessarily reduce the delay of 3D FPGAs. Although larger vertical bandwidth improves the latency of 3D nets, it also causes congestion in the 2D routing channels around TSVs because routing resources are limited, and this may increase the delay of the whole system in some cases. As for power dissipation, although increasing vertical bandwidth may reduce the number of switch boxes used, the average number of transistors per switch box increases since more transistors are needed to support vertical connections, which raises the leakage power of a single switch box. Moreover, the dynamic power is determined by the operating frequency, which is bounded by the routing delay. Hence the impact of channel density on the power dissipation of 3D FPGAs is also non-monotonic. Therefore, the relationship between cooling, operating frequency and power is complicated, and an optimal design cannot be found with a simple model.
A comprehensive analysis approach is required to study the impact of the design of micro-channel heat sink on the 3D FPGA performance. Guidelines for designing 3D FPGAs embedded with micro-channel-based fluidic cooling are also necessary.
The contributions of our work are as follows: (1) We propose an analysis framework to study the impact of micro-channel density on the performance and power of 3D FPGAs. (2) Based on the simulation results of our framework, we provide guidelines for designing 3D FPGAs embedded with MC Cooling. (3) Our framework sweeps different micro-channel densities and identifies the optimal physical design of the micro-channel heat sink and the 3D FPGA. Compared to naive designs using a fixed micro-channel density, our framework can improve the operating frequency and energy efficiency by up to 80.3% and 124%, respectively.
MODELING OF FPGAS
FPGAs are integrated circuits that can be programmed by customers. Based on the type of global routing architecture, FPGAs can be categorized into hierarchical style and island style [9]. In island-style FPGAs, Configurable Logic Blocks (CLBs) and I/O blocks are arranged in a two-dimensional mesh, and routing fabrics (including Connection Boxes (CBs), Switch Boxes (SBs) and routing channels) are evenly distributed throughout the mesh. Figure 2 illustrates the architecture of an island-style FPGA. As the figure shows, an island-style FPGA has a regular pattern and can be regarded as an array of identical tiles [9]. Each tile contains a CLB with its adjacent routing fabrics. Due to this regular pattern, island-style FPGAs exhibit a number of desirable properties, including efficient connection between CLBs and routing tracks and high scalability. These properties make island style the most commonly used architecture in modern SRAM-based FPGAs [2]. In this work we use 2D island-style FPGAs as the baseline and stack several identical FPGA chips vertically to form a 3D FPGA.
3D FPGA Model
In our work, a 3D FPGA is modeled by vertically stacking a number of identical 2D island-style FPGA chips, as illustrated in Figure 3. Some of the switch boxes in the 2D FPGAs are extended to support inter-layer connections; these are called 3D switch boxes (3D-SBs), while the others are 2D switch boxes (2D-SBs). Inter-layer connection is realized with TSVs, which are fabricated between two 3D-SBs placed on adjacent layers. In our model, we assume that each 3D-SB connects to the same number of TSVs.
A primary concern about 3D FPGAs is the design of switch boxes. In this work, the so-called "subset-style" switch boxes [3, 24] are employed, although they can be replaced with any other advanced switch box architecture [6]. The topology of subset-style switch boxes is illustrated in Figure 4. In a switch box, an incoming routing track from one side can connect to the routing tracks on the other sides that have the same ID number. Therefore, the flexibility (Fs) of a 2D-SB equals three, while a 3D-SB has Fs = 5 (including two TSVs connecting to the upper and lower layers, respectively). Consequently, if both the vertical and horizontal routing bandwidths in 3D FPGAs are equal to Wroute, the number of transistors used in a 2D-SB is 6 × Wroute while the number of transistors in a 3D-SB is 15 × Wroute (one switch per track for each pair of the four sides of a 2D-SB, or of the six sides of a 3D-SB). In practice, however, since TSVs occupy much more silicon area than 2D routing channels, we have to limit the number of TSVs connected to each 3D-SB (usually the number is smaller than the width of a 2D routing channel). Another concern is the electrical characterization of TSVs.
In this work, we use the following equations [19] to calculate the resistance and capacitance of a TSV:
(1)
In the equations, ρm is the resistivity of the metal filling in a TSV, rvia is the radius of the metal filling region in the TSV, tox is the thickness of the oxide layer around the TSV and ϵr is the relative permittivity of SiO2.
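The role of these parameters can be illustrated with a minimal sketch. It assumes the widely used cylindrical-conductor resistance and coaxial oxide-capacitance expressions, which may differ in detail from the equations of [19]; the TSV length `l_via` and all numeric values are hypothetical and chosen only for illustration.

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity in F/m


def tsv_resistance(rho_m, l_via, r_via):
    """Assumed model: resistance of a cylindrical metal fill (ohms)."""
    return rho_m * l_via / (math.pi * r_via ** 2)


def tsv_capacitance(eps_r, l_via, r_via, t_ox):
    """Assumed model: coaxial capacitance across the oxide liner (farads)."""
    return 2.0 * math.pi * EPS0 * eps_r * l_via / math.log((r_via + t_ox) / r_via)


# Hypothetical dimensions: copper fill, 50 um long, 2.5 um radius, 0.1 um oxide.
r = tsv_resistance(rho_m=1.68e-8, l_via=50e-6, r_via=2.5e-6)
c = tsv_capacitance(eps_r=3.9, l_via=50e-6, r_via=2.5e-6, t_ox=0.1e-6)
print(f"R = {r * 1e3:.1f} mOhm, C = {c * 1e15:.1f} fF")
```

Under these assumed expressions, a thicker oxide lowers the capacitance and a larger radius lowers the resistance, at the cost of more silicon area per TSV.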
Area Model of 3D FPGAs
Calculating temperature requires an accurate area model of the 3D FPGA. In this work, we follow the methodology introduced in [5] to determine the area of 3D FPGAs. According to [5], FPGA area is modeled as a number of "minimum width transistor areas". The "minimum width transistor area" is the layout area of the smallest transistor, plus the minimum spacing to another transistor above it and to its right. In order to calculate the number of minimum width transistor areas, we first determine the number and size of the transistors used in each programmable block (i.e. CLB, CB and SB). The schematic of each programmable block in our model is similar to the one described in [5]. For transistors (3) where DSMWT stands for the "Drive Strength of Minimum Width Transistor". Following this, we can calculate the number of minimum width transistor areas for each programmable block, and the "real" area is computed by multiplying this number by the area of a minimum width transistor at a given technology node. Note that up to this stage, TSVs have not been considered. In our model, TSVs are assumed to be built between two 3D-SBs placed on adjacent layers. Therefore, the total silicon area occupied by the TSVs connected to a 3D-SB is added to the area of that 3D-SB. In order to simplify the analysis, we extend the "tile" of an island-style FPGA introduced in Section 2 to the context of 3D FPGAs. A tile is thus categorized as a 3D-tile or a 2D-tile according to the type of switch box it contains. This is illustrated in Figure 3. The area of a tile is the total area of all the programmable blocks included in the tile.
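The transistor-counting step can be sketched as follows. This is a minimal sketch that assumes the sizing rule commonly quoted from the area model of [5], in which a transistor of drive strength x occupies 0.5 + x / (2 × DSMWT) minimum width transistor areas; the block composition, the unit area and the TSV area below are hypothetical values.

```python
def min_width_areas(drive_strength, dsmwt=1.0):
    """Assumed sizing rule: areas occupied by one transistor."""
    return 0.5 + drive_strength / (2.0 * dsmwt)


def block_area(drive_strengths, unit_area_um2, tsv_area_um2=0.0):
    """Return (minimum width transistor areas, silicon area in um^2).

    tsv_area_um2 is the extra area of the TSVs attached to a 3D-SB;
    it stays at zero for CLBs, CBs and 2D-SBs.
    """
    units = sum(min_width_areas(d) for d in drive_strengths)
    return units, units * unit_area_um2 + tsv_area_um2


# Hypothetical 3D-SB fragment: ten minimum-size switches and two 4x buffers.
units, area = block_area([1.0] * 10 + [4.0] * 2, unit_area_um2=0.5,
                         tsv_area_um2=100.0)
print(units, area)
```

Keeping the TSV area separate from the transistor count mirrors the text: the count alone scales the static power, while the silicon area feeds the thermal model.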
Characterizing The Impact of Micro-Channel Density In 3D FPGAs
The structure of 3D FPGAs embedded with micro-channel-based fluidic cooling is illustrated in Figure 5. The micro-channel structure is characterized by a set of physical parameters: channel width (W_ch), silicon thickness between channels (W_wall), channel height (H_ch) and channel length (L_ch). Usually, the length of the channel is equal to the length of the FPGA chip (L_ch = L_chip). TSVs of 3D FPGAs cannot be built through micro-channels, resulting in a trade-off between vertical bandwidth and micro-channel density. In this work, we assume that the silicon thickness between channels is an integer multiple of the channel width (W_wall = λW_ch, λ = 1, 2, 3, ...). Our work studies how the channel density affects the cooling capacity and performance of 3D FPGAs. Because a 3D FPGA is modeled as an array of tiles, we capture the impact of micro-channel density through the distribution of the different types of tiles in the 3D FPGA. To do this, we initially set every tile to be a 3D-tile. We then change some 3D-tiles to 2D-tiles so that micro-channels can be allocated, since micro-channels can only be placed below 2D-tiles due to their conflict with TSVs. Note that we assume a 2D-tile occupies the same area as a 3D-tile in a single layout, even though they may require different areas. After mapping the micro-channel distribution to the 3D FPGA architecture, we can characterize different micro-channel densities by changing the distribution of 3D and 2D tiles in the FPGA. The distribution of the different types of tiles in turn influences placement and routing in 3D FPGAs.
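The mapping from channel spacing to tile types can be sketched as below. This is a minimal sketch under two simplifying assumptions that the text does not state: each tile column is one channel width wide, and channels repeat with a period of λ + 1 columns. Under these assumptions the fraction of 3D-tiles is λ / (λ + 1), which is consistent with the 50% to 92% range reported for the sweep of λ from 1 to 11.

```python
def tile_pattern(n_columns, lam):
    """Tile type per column: '2D' above a channel, '3D' above a silicon wall."""
    return ["2D" if col % (lam + 1) == 0 else "3D" for col in range(n_columns)]


def fraction_3d(lam):
    """Fraction of tiles that keep TSVs when W_wall = lam * W_ch."""
    return lam / (lam + 1)


for lam in (1, 2, 11):
    print(lam, tile_pattern(6, lam), f"{100 * fraction_3d(lam):.0f}% 3D-tiles")
```

A larger λ leaves more columns available for TSVs (higher vertical bandwidth) and fewer columns for channels (lower cooling capacity), which is the trade-off explored in the rest of the paper.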
ANALYSIS APPROACH FOR 3D FPGAS WITH MICRO-CHANNEL COOLING
In this section, we introduce the method we use to study the impact of micro-channel density on the performance, power and energy efficiency of 3D FPGAs. In this work, we use the operating frequency to represent performance. Energy efficiency is similar to the inverse of the energy-delay product (EDP) and is defined by the following equation:
Figure 6 illustrates the flow chart of our analysis framework. The kernel of the framework is extended from TPR [3]. A given circuit (in BLIF format) is fed into T-Vpack (which is part of VPR [5], a P&R tool for 2D FPGAs) and ACE2.0 [10] (an activity estimation tool) to generate the CLB-level netlist and the transition density, respectively. Then, the CLB-level netlist, the FPGA architecture and the distribution of micro-channels are fed into TPR for placement and routing. TPR also calculates the final delay of the circuit based on its embedded delay model. Following this, we take the placement file, the routing file, the delay and the transition density as inputs to our power model to calculate the power dissipation profile. The power profile is used to calculate the temperature profile, which is then fed back into the power model to update the leakage power. This "power-thermal" loop is necessary because of the positive feedback between leakage power and temperature. The loop stops when the change in the temperature profile becomes negligible.
Background of TPR
TPR is one of the most popular P&R tools for 3D FPGAs. Originally, TPR could only support 3D FPGA architectures with fully-vertical inter-connection. We extend the tool so that it supports different types of switch boxes (i.e. 2D-SB and 3D-SB). In this subsection, we briefly introduce the principles of TPR. More detailed descriptions can be found in [3]. TPR takes the CLB-level netlist and the 3D FPGA architecture as inputs. In this work, we add the micro-channel distribution as another input. In the first stage, the netlist is partitioned into different layers to minimize the total number of TSVs. Following this, TPR performs timing-driven partitioning-based placement successively from the top layer to the bottom layer. After the position of each CLB is determined, the tool uses the Pathfinder Negotiated Congestion-delay algorithm [5] to assign routing elements to each net based on the distribution of routing resources. After routing is finished, the delay of the circuit is calculated with the Elmore delay model. Finally, TPR outputs the placement file, the routing file and the delay, which are used in the following stages.
Power Model
In order to compute the temperature and evaluate the energy efficiency of 3D FPGAs, we first calculate the power dissipation profile. Since a 3D FPGA is modeled as an array of 3D and 2D tiles, we assume that power is uniform within each tile. The power dissipation profile is characterized by the spatial distribution of tile power. The power dissipation of 3D FPGAs is the sum of dynamic power and static power (leakage power). The methodology for modeling the two types of power is introduced as follows.
Dynamic Power
Dynamic power is generated by signal transitions. Signal transitions repeatedly charge and discharge capacitors, and this forms the most significant contribution to the dynamic power in 3D FPGAs. This kind of dynamic power can be modeled using the following equation:
where Ci, Vi, Di and fi are the total capacitance, swing voltage, signal transition density and operating frequency of source i. Another component of dynamic power is the short-circuit power, which is also caused by signal switching. According to [21], while short-circuit power accounts for a higher percentage of the total power in a CLB, it contributes less than 10% of the power in the interconnect of an FPGA. In order to capture both components of dynamic power while avoiding complex computation, we model the dynamic power in two independent parts: Interconnect Power and CLB Power.
Interconnect Power: The calculation of interconnect power takes two steps. First, we compute the power of each routing segment (including horizontal routing segments on each layer and TSVs). A segment connects two terminals (a terminal is an input/output pin of a CLB or a pin of an SB). The capacitor-charging-based dynamic power of segment i is calculated as follows:
where Ci is the sum of input/output capacitance of the two terminals and the distributed capacitance of the segment; Di is the signal transition density of the net to which the segment belongs and its value is calculated by ACE2.0 [10] ; fi is the operating frequency of the system and Vi is simply taken as the supply voltage. Second, we project the power of each segment to the tile because the power of each tile is what really matters. In order to do this, we divide the segment power equally into two parts, and add one part to the tile containing one terminal and add the other part to the tile containing the other terminal. Note that, it is possible that both terminals of a segment reside in one tile. In this case, the whole power of the segment is assigned to this tile.
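The two steps above can be sketched as follows. This is a minimal sketch, assuming the common form P = 0.5 × C × V² × D × f for the capacitor-charging power (the exact coefficient used in the paper is not recoverable from the text); the tile keys, the segment records and all numbers are hypothetical.

```python
from collections import defaultdict


def segment_power(c_total, v_dd, density, freq):
    """Assumed form of the capacitor-charging power of one segment (watts)."""
    return 0.5 * c_total * v_dd ** 2 * density * freq


def project_to_tiles(segments, v_dd, freq):
    """Split each segment's power equally between its two terminal tiles.

    If both terminals lie in the same tile, both halves land on that tile,
    so the tile receives the whole segment power.
    """
    tile_power = defaultdict(float)
    for seg in segments:
        p = segment_power(seg["c"], v_dd, seg["density"], freq)
        tile_a, tile_b = seg["tiles"]
        tile_power[tile_a] += p / 2
        tile_power[tile_b] += p / 2
    return dict(tile_power)


# Hypothetical segments; tiles are keyed by (layer, x, y).
segments = [
    {"c": 1e-13, "density": 0.5, "tiles": ((0, 1, 1), (0, 2, 1))},   # 2D wire
    {"c": 2e-13, "density": 0.25, "tiles": ((0, 2, 1), (0, 2, 1))},  # same tile
]
tile_power = project_to_tiles(segments, v_dd=1.0, freq=1e8)
print(tile_power)
```

A TSV segment is handled the same way; its two terminals simply carry different layer indices.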
CLB Power: According to our FPGA model, a CLB contains LUTs, flip-flops, multiplexers, buffers and memory cells. Modeling the dynamic power of a CLB analytically is difficult due to the complex local routing within the CLB. In our work, we use a simulation-based method to model the CLB power. First, we calculate the average dynamic power of an active CLB. We use SPICE to simulate each component in a CLB with random input vector pairs at a certain frequency. The simulation gives the dynamic power of each component for every pair of input vectors. The dynamic power of each component is then taken as the average over all of its input vector pairs, under the assumption that each pair of input vectors has the same probability of occurrence. The dynamic power of a CLB (Power_CLB,0) is the sum of the power of all the components constituting the CLB. Next, we take Power_CLB,0 as the input and scale it with the real operating frequency to obtain the real dynamic power of an active CLB. Finally, this power is added to the tile that contains the CLB.
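The averaging and scaling steps can be sketched as follows. This is a minimal sketch that assumes the reference CLB power scales linearly with frequency, which is the usual behaviour of switching power but is not stated explicitly in the text; the component names and sample values are hypothetical stand-ins for SPICE results.

```python
def clb_reference_power(component_samples):
    """Sum of per-component averages over equally likely input vector pairs."""
    return sum(sum(s) / len(s) for s in component_samples.values())


def clb_power(p_ref, f_ref, f_real):
    """Assumed linear scaling of the reference power to the real frequency."""
    return p_ref * f_real / f_ref


# Hypothetical per-pair dynamic power in watts, simulated at f_ref = 100 MHz.
samples = {
    "lut": [2e-6, 4e-6, 6e-6],
    "flip_flop": [1e-6, 3e-6],
    "mux": [1e-6],
}
p_ref = clb_reference_power(samples)
p_real = clb_power(p_ref, f_ref=100e6, f_real=250e6)
print(p_ref, p_real)
```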
Static Power
Static power is caused by leakage current in FPGAs and can be categorized into two types: gate leakage and source-to-drain leakage. Detailed modeling of the two types of static power is difficult and inaccurate at small technology nodes. Therefore, in our work, we take the experimental value from [1] and calculate the leakage power at different temperatures with an analytical extrapolation function. According to [1], at 85 °C, a typical high-performance Stratix III device has 3 W of static power for 300K logic blocks (a logic block is similar to a tile in our model). Since the static power is proportional to the number of logic blocks, we can calculate the static power of a single logic block, which is around 10 µW. This value is then taken as the static power of a 2D-tile at 85 °C in our 3D FPGA model. The static power of a 3D-tile is γ × 10 µW, where γ is a scaling parameter equal to the ratio of the number of minimum width transistor areas (without considering the area occupied by TSVs) between a 3D-tile and a 2D-tile. Following this, we use the following equation [20] to extrapolate the static power for different temperatures:
This yields a look-up table of static power at different temperatures, which is then fed into the simulation framework.
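The construction of this table can be sketched as follows. The per-tile value follows directly from the figures quoted from [1]; the temperature dependence, however, is passed in as a function, and the exponential used in the example is only a hypothetical placeholder, not the extrapolation function of [20]. The value of γ is also hypothetical.

```python
import math


def tile_static_power_85c(total_static_w=3.0, n_blocks=300_000):
    """Static power of one 2D-tile at 85 C from the device-level figure."""
    return total_static_w / n_blocks


def build_static_lut(p85, gamma, scale, temps_c):
    """Table of static power per tile type; scale(85) must equal 1."""
    return {
        t: {"2D": p85 * scale(t), "3D": gamma * p85 * scale(t)}
        for t in temps_c
    }


def placeholder_scale(t_c):
    """Hypothetical temperature scaling with a made-up coefficient."""
    return math.exp(0.02 * (t_c - 85.0))


p85 = tile_static_power_85c()
lut = build_static_lut(p85, gamma=1.5, scale=placeholder_scale,
                       temps_c=range(25, 126, 10))
print(f"{p85 * 1e6:.1f} uW per 2D-tile at 85 C")
```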
Thermal Model
The thermal behavior of 3D FPGAs can be characterized with a thermal resistance-capacitance (RC) network, which resembles an electrical RC network. To build it, a 3D FPGA is divided into grids and each grid is represented by a node in the thermal RC network. In this network, the voltage of each node represents the average temperature of the grid, while the power of the grid is represented by a current source connected to the node. The resistance connecting each adjacent pair of nodes represents a heat transfer path, and the thermal capacitance represents the ability to store heat. In this work, we are mostly interested in steady-state thermal behavior. Therefore we focus primarily on the resistance network. Our resistance network is the same as the one introduced in [23], which is illustrated in Figure 7. The cooling performance of micro-channels is captured by three resistors: Rcond, Rconv and Rheat. Rcond and Rconv represent the heat transfer between silicon and fluid by conduction and convection, respectively. Rheat represents the heat transfer within the fluid. A more detailed definition of the resistance network can be found in [23].
After we get the profile of the total power dissipation, we compute the steady-state temperature profile by solving G · T = P, where G is the thermal conductance matrix determined by the thermal resistance network, T is the temperature profile and P is the power profile. During the simulation process, when a new temperature profile is calculated, the static power of each tile is updated according to the new temperature. Then the temperature is recalculated based on the updated total power dissipation. This is illustrated in Figure 6. The loop stops once the temperature has converged.
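The power-thermal loop can be sketched as follows. This is a minimal sketch, not the framework's implementation: it assumes the steady-state system is solved for the temperature rise above ambient, uses a small pure-Python linear solver, and relies on a hypothetical two-node conductance matrix and a hypothetical leakage function.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def power_thermal_loop(g, p_dyn, static_at, t_amb=25.0, tol=1e-3, max_iter=100):
    """Iterate power -> temperature -> leakage until temperatures settle."""
    temps = [t_amb] * len(p_dyn)
    for _ in range(max_iter):
        power = [pd + static_at(t) for pd, t in zip(p_dyn, temps)]
        new_temps = [t_amb + rise for rise in solve(g, power)]
        if max(abs(a - b) for a, b in zip(new_temps, temps)) < tol:
            return new_temps
        temps = new_temps
    raise RuntimeError("no convergence: possible thermal runaway")


# Hypothetical two-node network (W/K) and leakage model (W).
g = [[0.7, -0.2], [-0.2, 0.3]]
temps = power_thermal_loop(g, p_dyn=[1.0, 0.5],
                           static_at=lambda t: 0.1 * math.exp(0.02 * (t - 85.0)))
print(temps)
```

The iteration cap doubles as a simple runaway detector: if leakage grows faster than the network can remove heat, the temperatures never settle and the loop reports failure instead of returning a result.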
SIMULATION AND DISCUSSION
In this section we describe our simulation and discuss the results. First, we introduce the simulation setup. Next, we use our analysis framework to explore 12 benchmarks from the Microelectronics Center of North Carolina (MCNC) benchmark suite and study (1) the different characteristics of air-cooling and micro-channel-based fluidic cooling and (2) the impact of micro-channel density in a 3D FPGA embedded with micro-channel-based fluidic cooling.
Setup of the Simulation
During our design space exploration, we study how the cooling method and the micro-channel density influence the electrical properties of a 3D FPGA. By "electrical properties", we mean the operating frequency (Freq.), Power per CLB (PCLB) and Energy Efficiency (E.E.), where Energy Efficiency is calculated with Equation 4. We first run the thermo-electrical simulation on fully-vertical-connected 3D FPGAs (100% of the tiles are 3D-tiles) with air-cooling. Following this, we apply micro-channel-based fluidic cooling in 3D FPGAs and sweep the micro-channel density. We change the micro-channel density by increasing the silicon thickness between channels in multiples of the channel width (Figure 5). We define a new parameter, Micro-Channel Pitch (MC_pitch), as MC_pitch = W_wall / W_ch to describe the micro-channel density. The vertical bandwidth is maximized for each micro-channel density. Because the benchmark sizes are limited, we sweep MC_pitch from 1 to 11 in steps of 1, which means that the percentage of 3D-tiles among all tiles ranges from 50% to 92%. For the fully-vertical-connected case, only air-cooling is applied due to the lack of space for micro-channels, while for the other cases a hybrid cooling scheme (air-cooling and micro-channel-based fluidic cooling) is used. In this work, we assume that the temperature limit is 85 °C [23]. If the maximum temperature exceeds this limit, the design is regarded as thermally infeasible. In this case, we scale down the operating frequency to fix the thermal problem. Other simulation variables for 3D FPGAs embedded with micro-channel-based fluidic cooling are summarized as follows:
(1) 3D FPGAs stack up to four functional layers.
(2) Each 3D-SB placed on the i-th layer has four TSVs [25].
The dimension of a single micro-channel is set to 50 µm × 100 µm (width × height). This yields a pressure drop of less than 1 × 10^5 Pa and a pumping power below 1 W according to the model in [23]. Other channel dimensions may have a different impact on cooling performance. As illustrated by [23], increasing the channel width on one hand reduces the pressure drop, thereby leading to a smaller thermal resistance, but on the other hand decreases the coverage of micro-channels, thereby degrading the heat removal rate. Since the main focus of this paper is the impact of micro-channel density on the performance of 3D FPGAs, we fix the channel dimension during our simulation, although our algorithm works for other channel dimensions as well. Table 1 lists the rest of the important electrical and thermal variables employed in this work. Table 4 and Table 5 show the results of some benchmarks on 3D FPGAs before and after scaling down the operating frequency, respectively. Because of the page limit, only seven of the twelve benchmarks are shown in the tables. The tables report the peak temperature (MaxT) as well as the normalized operating frequency (Freq.) and energy efficiency (E.E.) for each benchmark. In Sections 4.2 and 4.3, we discuss the results in detail.
Fully-Vertical-Connected 3D FPGAs
In fully-vertical-connected 3D FPGAs, all tiles are 3D-tiles, so micro-channels cannot be allocated and only air-cooling, which removes heat from the top of the 3D FPGA, is applied during the simulation. Figure 8(a) shows the maximum peak temperature over all benchmarks as well as the normalized operating frequency, power per CLB and energy efficiency for each number of stacked layers. For each benchmark, operating frequency, power per CLB and energy efficiency are normalized to their counterparts when the benchmark is realized on a 2D FPGA. The normalized values are then averaged over all the benchmarks to obtain the results shown in Figure 8(a). As illustrated by this figure, the peak temperature of some benchmarks exceeds T_limit = 85 °C even when only 2 functional layers are stacked. In addition, the peak temperature increases dramatically with the number of layers in a 3D FPGA. Even though operating frequency and energy efficiency can be improved by stacking more layers in a 3D FPGA, these benefits cannot be realized with a thermally infeasible design. In order to fix the thermal problem, we scale down the operating frequency of each thermally infeasible design. According to the results illustrated in Figure 8(b), due to the thermal restriction, increasing the number of layers may lead to the degradation of frequency and energy efficiency. Power per CLB also decreases because of the reduction in operating frequency. This degradation can be reduced by using micro-channel-based fluidic cooling. The micro-channel structure imposes new constraints on 3D FPGA design, and we study the impact of micro-channel density on the electrical properties of 3D FPGAs in Section 4.3.
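The frequency scale-down step can be sketched as follows. The text does not state how the reduced frequency is searched for, so bisection is an assumption here; the sketch also assumes that peak temperature rises monotonically with frequency. The callable `peak_temp` and the linear toy model are hypothetical stand-ins for a full power-thermal simulation.

```python
def scale_frequency(peak_temp, f_max, t_limit=85.0, rel_tol=1e-3):
    """Return the highest frequency (within rel_tol * f_max) that meets t_limit.

    peak_temp(f) is assumed to increase with f. If the design is infeasible
    even near zero frequency (leakage alone is too high), 0.0 is returned.
    """
    if peak_temp(f_max) <= t_limit:
        return f_max  # already thermally feasible, no scaling needed
    lo, hi = 0.0, f_max
    while hi - lo > rel_tol * f_max:
        mid = 0.5 * (lo + hi)
        if peak_temp(mid) <= t_limit:
            lo = mid
        else:
            hi = mid
    return lo


def toy_peak_temp(f_mhz):
    """Hypothetical toy model: 45 C at zero frequency plus 0.2 C per MHz."""
    return 45.0 + 0.2 * f_mhz


f_scaled = scale_frequency(toy_peak_temp, f_max=300.0)
print(f_scaled, toy_peak_temp(f_scaled))
```

Returning the lower bracket guarantees that the reported frequency is itself feasible, at the cost of being slightly conservative.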
3D FPGAs with Micro-channel-based Fluidic Cooling
In this scenario, some 3D-tiles are replaced by 2D-tiles, thus a number of micro-channels can be allocated below 2D-tiles. We will discuss the impact of micro-channel density on the cooling and performance of 3D FPGAs in this section.
Before Fixing Thermal Problem
The trends of operating frequency (Freq.), power per CLB (PCLB) and energy efficiency (E.E.) with increasing micro-channel pitch for different numbers of stacked layers are shown in Figure 9(a)-(c). Note that increasing the micro-channel pitch reduces the micro-channel density, thereby reducing the cooling capacity. Each of the three electrical properties is normalized in the same way as described in Section 4.2. In each figure, we also show the maximum peak temperature over all benchmarks. According to Figure 9(a), the frequency increases as we increase the number of layers in a 3D FPGA. However, for a given number of layers, the change of frequency with decreasing micro-channel density is not monotonic. In our work, reducing micro-channel density corresponds to an increase in vertical bandwidth. This non-monotonic relationship between frequency and vertical bandwidth was also observed in [24]. It can be caused by the following two factors:
(1) Although increasing vertical bandwidth improves the latency of 3D nets, it can also cause congestion in the 2D routing channels around the 3D-SBs, which may increase the delay of other nets. (2) The distribution of routing resources changes when we alter the micro-channel density, which forces the router to find completely new paths to connect CLBs. Since each routing channel has tracks of different lengths, these paths may vary significantly in their RC parameters, which are essential in calculating delay. Since both power density and energy efficiency depend on the operating frequency, they also exhibit non-monotonic behavior, as shown in Figure 9(b)-(c). Another interesting observation concerns power per CLB (PCLB). According to Figure 9(b), for a given number of layers, power per CLB tends to increase as the micro-channel density is reduced. This can be explained as follows: reducing micro-channel density leads to a lower mass flow rate of coolant, which reduces the cooling capacity of the micro-channel heat sink and increases the on-chip temperature; the increased on-chip temperature in turn raises the leakage power, thus increasing the power dissipation of each CLB. On the other hand, for a fixed micro-channel density, the power dissipation per CLB increases when another functional layer is added. This behavior is determined by three factors: (1) operating frequency may increase with the number of layers, which increases the dynamic power per CLB; (2) increasing the number of stacked layers enables us to realize a given benchmark using a smaller number of CLBs; (3) increasing the number of layers raises the on-chip temperature, thereby increasing the leakage power per CLB due to the positive feedback between temperature and leakage power.
Since higher power per CLB may imply larger power dissipation, and energy efficiency is inversely related to power, energy efficiency tends to decrease as the micro-channel pitch increases. As illustrated in Figure 9(c), this tendency becomes more pronounced as the number of layers increases. For a fixed micro-channel density, however, energy efficiency increases with the number of layers, since the gain in operating frequency dominates.
After Fixing Thermal Problem
According to the results shown in Figure 9, although micro-channel-based fluidic cooling can significantly reduce the on-chip temperature, thermal violations can still occur if the micro-channel density is too small. For these cases, we scale down the operating frequency to fix the thermal problem; the results are shown in Figure 10(a)-(c). The figures illustrate that the peak temperatures of all designs are below T_limit and that the curves of the electrical properties are pulled down at the high-micro-channel-pitch end due to the reduction in operating frequency. According to Figure 10(a), the drop in frequency becomes sharper and occurs earlier when more layers are stacked in a 3D FPGA. This is because increasing the number of layers intensifies the thermal problem, so the frequency must be scaled down to a much lower level in order to fix it. The degradation of frequency in turn reduces the power dissipation per CLB and the energy efficiency. From the figures, we derive the following guidelines for designing 3D FPGAs embedded with micro-channel-based fluidic cooling:
(1) the performance of 3D FPGAs with higher vertical bandwidth will be "locked" due to the thermal restriction;
(2) there is an "unlocked window" close to the lower-vertical-bandwidth end where we can find the optimal design of a 3D FPGA;
(3) the unlocked window shrinks as the number of layers increases;
(4) the architecture of the optimal design differs for different numbers of layers.
Our analysis framework also identifies the optimal physical architecture of 3D FPGAs with different numbers of stacked layers for each benchmark. To evaluate the optimal designs, we compare their operating frequency and energy efficiency against those of the designs with the largest micro-channel density (Best Cooling) and the smallest micro-channel density (Best Bandwidth), respectively.
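The frequency scaling step and the selection of an optimal design described above could be sketched as follows. This is only an illustration under stated assumptions: the text does not say how the scaled frequency is found, so the bisection search, the function names `scale_frequency` and `pick_optimal`, and the requirement that peak temperature be non-decreasing in frequency are all hypothetical choices made here.

```python
def scale_frequency(f_max, peak_temp, t_limit, iters=50):
    """Largest frequency <= f_max whose peak temperature is <= t_limit.

    peak_temp is a caller-supplied function f -> temperature, assumed to be
    non-decreasing in f. Returns 0.0 if even f = 0 violates the limit.
    """
    if peak_temp(f_max) <= t_limit:
        return f_max  # no thermal violation, so no scaling is needed
    lo, hi = 0.0, f_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if peak_temp(mid) <= t_limit:
            lo = mid  # mid is thermally safe, try a higher frequency
        else:
            hi = mid  # mid violates the limit, back off
    return lo

def pick_optimal(designs, t_limit):
    """designs: list of (pitch, f_max, peak_temp) tuples, one per candidate.

    Returns (best, scaled), where scaled lists (pitch, thermally safe
    frequency) for every candidate and best is the entry with the highest
    safe frequency.
    """
    scaled = [(pitch, scale_frequency(f_max, temp_fn, t_limit))
              for pitch, f_max, temp_fn in designs]
    best = max(scaled, key=lambda entry: entry[1])
    return best, scaled
```

A candidate with a large pitch may have the highest unconstrained frequency and still lose once it is scaled down to meet the temperature limit, which is the "locked" behavior noted in guideline (1).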
The improvement is averaged over all the benchmarks and the results are shown in Table 2 and Table 3. According to the tables, the improvement with respect to the Best Bandwidth Design is generally larger than that with respect to the Best Cooling Design. This is because the frequency of the Best Bandwidth Design has already been scaled down to fix the thermal problem, which degrades its performance and energy efficiency. It should be noted that the optimal design varies across benchmarks and numbers of layers. The optimal designs for some benchmarks are shown in bold type in Table 5.
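As a small worked illustration of the averaging, the sketch below computes a percentage improvement per benchmark and then takes the arithmetic mean. The exact averaging convention is not stated in the text, so the mean of per-benchmark percentages is an assumption, and the function name and input values are hypothetical placeholders rather than results from this work.

```python
def mean_improvement(optimal, baseline):
    """Mean percentage improvement of optimal over baseline.

    optimal and baseline are equal-length lists holding one metric value
    (for example frequency or energy efficiency) per benchmark.
    """
    if not optimal or len(optimal) != len(baseline):
        raise ValueError("need two non-empty lists of equal length")
    gains = [100.0 * (o - b) / b for o, b in zip(optimal, baseline)]
    return sum(gains) / len(gains)

# Placeholder values only: two benchmarks improving by 50% and 100%.
example = mean_improvement([1.5, 2.0], [1.0, 1.0])
```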
CONCLUSION
In this work, we studied the thermo-electrical interaction in 3D FPGAs. Our results demonstrate that micro-channel-based fluidic cooling is necessary to realize the true potential of 3D integration technology. We proposed an analysis framework to study the impact of micro-channel density on performance and to identify the optimal 3D FPGA design for each benchmark. Based on the simulation results, we proposed guidelines for designing the physical architecture of 3D FPGAs embedded with micro-channel-based fluidic cooling. Comparison with the Best Cooling Design and the Best Bandwidth Design shows that the optimal design identified by our framework can improve operating frequency and energy efficiency by up to 80.3% and 124.0%, respectively.
