Abstract: Three-dimensional integrated circuits (3D ICs) are a promising solution for multi-core processors, but internal heat removal remains a critical challenge. Interlayer cooling addresses this problem and also expands the floorplan design space of microprocessors in 3D ICs. This work proposes a floorplanner for multi-core processors in 3D ICs with interlayer cooling that combines a greedy method with particle swarm optimization. The results show that, in a design with 3 active device layers, the maximum temperature and the maximum temperature gradient are reduced by 10.3°C and 9.2°C respectively, compared with the baseline design.
Introduction
As integrated circuit technology develops, the process feature size keeps shrinking and is approaching its physical limit, according to the ITRS roadmap [1]. To overcome the long wire delay of 2D processors, three-dimensional integrated circuits (3D ICs) have been introduced; they also extend the capabilities of a single processor [2], so the performance of microprocessors can keep improving. However, a higher power density in the processor means a higher internal temperature, and excessive temperature degrades both performance and reliability. As the temperature increases, the mobility of carriers decreases and the resistance of metal wires increases, both of which reduce the speed of the processor [3].
Reliability is related to temperature even more directly. The time to failure of integrated circuits can be modeled as proportional to $e^{E_a/kT}$, where $T$ is the absolute temperature, $k$ is Boltzmann's constant, and $E_a$ is the activation energy of the failure mechanism accelerated by the increased temperature [4]. Furthermore, the electromigration of metal wires is influenced by the temperature gradient, which can also reduce the reliability of integrated circuits [5]. In 3D ICs the thermal problem is a greater concern: the power density increases significantly compared with 2D ICs because of the reduced distances between units, and the thermal conductivity of the bonding and insulation layers between the silicon layers is much lower than that of silicon and metal [6]. External heat removal solutions, such as heat sinks with fans or even fluid cooling, have little influence on the internal temperature gradient [7].
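To make the exponential dependence concrete, the following minimal sketch compares the Arrhenius factor $e^{E_a/kT}$ at two temperatures. The activation energy of 0.7 eV and the example temperatures are illustrative assumptions, not values taken from this work, and the function names are hypothetical. Only the ratio of lifetimes is meaningful here, because the proportionality constant is left out.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann's constant in eV/K


def arrhenius_factor(temp_c: float, activation_energy_ev: float) -> float:
    """Return exp(Ea / kT) for a temperature given in degrees Celsius."""
    temp_k = temp_c + 273.15
    return math.exp(activation_energy_ev / (BOLTZMANN_EV_PER_K * temp_k))


def lifetime_ratio(cool_c: float, hot_c: float,
                   activation_energy_ev: float = 0.7) -> float:
    """Ratio of time to failure at cool_c to time to failure at hot_c.

    The default activation energy of 0.7 eV is an assumed example value.
    """
    return (arrhenius_factor(cool_c, activation_energy_ev)
            / arrhenius_factor(hot_c, activation_energy_ev))


if __name__ == "__main__":
    # Assumed example: how much longer a part lasts at 100 C than at 110 C.
    print(f"lifetime ratio: {lifetime_ratio(100.0, 110.0):.2f}")
```

Under these assumptions, a reduction of 10°C near 100°C already extends the modeled lifetime by well over half, which is why the maximum temperature is treated as a primary objective.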
Interlayer cooling removes the internal heat flux of the processor effectively and is so far the only feasible heat removal solution for multiple active layers in 3D ICs [8, 9]. In this scheme, the coolant passes through fluid cavities between the active layers and removes the internal heat flux, as Fig. 1 shows. Floorplanning algorithms were originally designed to minimize chip area, whereas thermal-aware floorplanning algorithms place units according to their thermal characteristics, in order to reduce and even out the internal temperature of the processor through thermal diffusion [10]. The units are classified as hot or cool, treated as heat sources and heat sinks respectively, and arranged by multi-objective algorithms, linear programming or simulated annealing [4]. In an interlayer cooling system, the distribution of the heat removal coefficient is also influenced by the relative location of units and fluid cavities.
This work proposes a thermal-aware floorplanner with an iterative floorplanning algorithm based on a greedy method and particle swarm optimization (PSO), to address the thermal problem in multi-core 3D ICs with interlayer cooling. It balances the maximum temperature, the temperature gradient and the total wire length in a design with 3 active device layers.
Design flow
The floorplanner proposed in this work is designed to reduce the temperature of multi-core 3D ICs, especially the internal temperature. For efficiency, the algorithm is divided into two phases: one for initialization and one for optimization. The main design flow is shown in Fig. 2.
To obtain the chip temperature under different floorplan solutions, the algorithm uses a simplified thermal model derived from Ref. [4]. The chip is divided into cubical thermal cells, as Fig. 3 shows. The thermal resistances in the 6 directions and the thermal capacitance of a thermal cell are calculated by the following expressions:
where $k_{Si}$ is the thermal conductivity of silicon and $sc_{Si}$ is the specific heat capacity per unit volume of silicon. These cells constitute the thermal RC grid of the chip, and the temperature of each cell is obtained through an iterative method.
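The expressions themselves are not reproduced here. As a rough illustration, the sketch below assumes the usual conduction form: a centre-to-face resistance of half the cell length divided by $k_{Si}$ times the face area, and a capacitance of $sc_{Si}$ times the cell volume. The material constants are typical textbook values chosen for the example, and `ThermalCell` and `relax_chain` are hypothetical names. The relaxation routine is a steady-state Gauss-Seidel sweep over a one-dimensional chain, which is a much smaller problem than the full RC grid and only shows the iterative idea.

```python
from dataclasses import dataclass

K_SI = 130.0    # W/(m*K): assumed, typical room-temperature value for silicon
SC_SI = 1.63e6  # J/(m^3*K): assumed volumetric heat capacity of silicon


@dataclass(frozen=True)
class ThermalCell:
    """One cuboid cell of the thermal grid; edge lengths in metres."""
    dx: float
    dy: float
    dz: float

    def resistances(self) -> dict:
        """Centre-to-face conduction resistance (K/W) in the 6 directions."""
        rx = (self.dx / 2) / (K_SI * self.dy * self.dz)
        ry = (self.dy / 2) / (K_SI * self.dx * self.dz)
        rz = (self.dz / 2) / (K_SI * self.dx * self.dy)
        return {"west": rx, "east": rx, "south": ry, "north": ry,
                "bottom": rz, "top": rz}

    def capacitance(self) -> float:
        """Thermal capacitance (J/K) of the cell volume."""
        return SC_SI * self.dx * self.dy * self.dz


def relax_chain(powers, r_link, r_sink, t_sink, sweeps=500):
    """Gauss-Seidel steady-state temperatures of a 1-D chain of cells.

    Each cell dissipates powers[i] watts, is tied to its neighbours through
    r_link and to the coolant at t_sink through r_sink (all hypothetical).
    """
    temps = [t_sink] * len(powers)
    for _ in range(sweeps):
        for i, power in enumerate(powers):
            neighbours = [temps[j] for j in (i - 1, i + 1)
                          if 0 <= j < len(temps)]
            heat_in = (power + t_sink / r_sink
                       + sum(t / r_link for t in neighbours))
            conductance = 1 / r_sink + len(neighbours) / r_link
            temps[i] = heat_in / conductance
    return temps
```

For example, `relax_chain([0.0, 1.0, 0.0], r_link=50.0, r_sink=200.0, t_sink=27.0)` yields a chain whose middle cell is the hottest, with heat spreading to both neighbours before reaching the coolant.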
Initializing floorplan
The first phase initializes the floorplan according to the characteristics of the units, such as power density and critical level.
For hot units such as cores, the primary target is to find the coolest position, that is, one far from other hot units and with a better heat removal coefficient from the interlayer cooling system. The remaining units, such as last level caches (LLCs) and Switch-buses, are arranged close to the units they connect to, which reduces the total wire length and lets them act as heat sinks that lower the internal temperature.
The units to be arranged are selected from a list of candidate units according to their power density. All units are classified into 3 types: cores, LLCs and Switch-buses. For fairness, units of the same type have the same power density in this phase, following the homogeneous baseline design.
Therefore, this phase is divided into 3 sub-phases: 1. Hot units (cores) seek the coolest positions; 2. Medium units (LLCs) approach the connected units already arranged; 3. The coolest units (Switch-buses) seek the minimum total wire length.
In the first sub-phase, the cores seek the coolest positions one by one, in the manner of a greedy method. The coolest position is one as far as possible from the cores already arranged and as close as possible to the coolant inlet, where the heat removal coefficient is better. When a new core is added to the floorplan, several candidate floorplans are evaluated and the one with the minimum temperature and gradient is selected.
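The greedy core placement described above can be sketched as follows. This is a simplified illustration, not the floorplanner itself: instead of evaluating each candidate with the thermal model, it scores a slot by a geometric proxy, namely the distance to the nearest core already placed minus a penalty for distance from the coolant inlet. The inlet location (the edge x = 0), the weight `inlet_weight` and the function name are assumptions made for the example.

```python
import math


def place_cores_greedily(slots, n_cores, inlet_weight=0.5):
    """Pick n_cores positions from slots, one at a time.

    slots: list of (x, y) candidate centre positions; the coolant inlet is
    assumed to lie along x = 0, so a smaller x means better heat removal.
    """
    if n_cores > len(slots):
        raise ValueError("not enough candidate slots for the cores")
    placed = []
    free = list(slots)
    for _ in range(n_cores):
        def score(slot):
            # Distance to the nearest core already placed (0 for the first).
            spread = min((math.dist(slot, other) for other in placed),
                         default=0.0)
            # Reward spreading out, penalize distance from the inlet.
            return spread - inlet_weight * slot[0]

        best = max(free, key=score)
        placed.append(best)
        free.remove(best)
    return placed


if __name__ == "__main__":
    grid = [(x, y) for x in range(4) for y in range(3)]
    print(place_cores_greedily(grid, 4))
```

Because each core is fixed before the next one is considered, the result depends on the placement order, which is the limitation the optimization phase is meant to address.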
Then the LLCs are added to the floorplan one by one. Because their power density is lower than that of the cores, they are treated as heat sinks. The primary goal of this sub-phase is to minimize the distance from each LLC to its corresponding core, arranged in the previous sub-phase, in order to reduce the total wire length.
Switch-buses are added to the floorplan last. Like the LLCs, they are treated as heat sinks, and they are arranged carefully to reduce the total wire length, because each Switch-bus is connected to a group of LLCs. The processes in the latter two sub-phases are similar to a linear programming method.
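A minimal sketch of the latter two sub-phases is given below. It uses a plain nearest-free-slot rule with Manhattan distance as the wire-length estimate, which is a stand-in for the linear-programming-like procedure and not a reproduction of it. The one-LLC-per-core pairing, the slot representation and the function names are assumptions for illustration.

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def place_llcs(core_positions, free_slots):
    """Give each core one LLC in the nearest slot that is still free.

    Returns (assignment, remaining) where assignment maps core -> LLC slot.
    """
    free = list(free_slots)
    if len(core_positions) > len(free):
        raise ValueError("not enough free slots for the LLCs")
    assignment = {}
    for core in core_positions:
        slot = min(free, key=lambda s: manhattan(s, core))
        assignment[core] = slot
        free.remove(slot)
    return assignment, free


def place_switch_bus(llc_group, free_slots):
    """Choose the free slot with the smallest total distance to a LLC group."""
    return min(free_slots,
               key=lambda s: sum(manhattan(s, llc) for llc in llc_group))


if __name__ == "__main__":
    cores = [(0, 0), (3, 0)]
    slots = [(1, 0), (2, 0), (1, 1), (2, 1)]
    llcs, remaining = place_llcs(cores, slots)
    bus = place_switch_bus(list(llcs.values()), remaining)
    print(llcs, bus)
```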
Optimizing floorplan
The floorplan produced by the first phase is limited by its sequential construction, so the design space has not been explored sufficiently.
Particle swarm optimization is a widely used optimization method. It iteratively tries to improve a candidate solution with regard to a given measure of quality [11]. The units move in the search space under 4 candidate trends:
• The prior move trend from the last step;
• Moving away from the closest neighbour;
• Moving closer to the global best position;
• Moving closer to the best position in the unit's own history.
The best position means the position with the lowest temperature. In this phase, the move trend ($V_i$) of a candidate unit is calculated by the equation:
where $V_{i-1}$ is the prior move trend of the unit, $P_{current}$ is the current position of the unit, and $P_{closest\ neighbour}$, $P_{global\ best}$ and $P_{history\ best}$ are the positions of the closest neighbour, the global best position and the best position in the unit's own history, respectively. $w$ is the weight of the prior move trend, $c_1$, $c_2$ and $c_3$ are the corresponding learning factors, and $r_1$, $r_2$ and $r_3$ are random numbers in [0, 1].
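One plausible reading of this update, sketched below, combines the four trends linearly: the weighted prior trend, a repulsion term pointing away from the closest neighbour, and two attraction terms pointing toward the global and historical best positions. The exact form, the default values of `w`, `c1`, `c2` and `c3`, and the function name are assumptions for illustration. Note that `random.random()` draws from [0, 1), a negligible difference from the closed interval stated above.

```python
import random


def update_move_trend(v_prev, p_current, p_neighbour, p_global_best,
                      p_history_best, w=0.7, c1=1.0, c2=1.5, c3=1.5,
                      rng=random):
    """Return the new move trend V_i of one unit as a tuple.

    Assumed form, applied per coordinate:
      V_i = w * V_{i-1}
            + c1 * r1 * (P_current - P_closest_neighbour)   # move away
            + c2 * r2 * (P_global_best - P_current)         # move toward
            + c3 * r3 * (P_history_best - P_current)        # move toward
    """
    r1, r2, r3 = rng.random(), rng.random(), rng.random()
    return tuple(
        w * v + c1 * r1 * (pc - pn) + c2 * r2 * (pg - pc) + c3 * r3 * (ph - pc)
        for v, pc, pn, pg, ph in zip(v_prev, p_current, p_neighbour,
                                     p_global_best, p_history_best)
    )


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the example repeatable
    v = update_move_trend((0.0, 0.0), (2.0, 2.0), (3.0, 2.0),
                          (0.0, 0.0), (1.0, 1.0), rng=rng)
    new_position = tuple(p + dv for p, dv in zip((2.0, 2.0), v))
    print(v, new_position)
```

In a full optimizer the new position would also be snapped to a legal, non-overlapping slot before the thermal model evaluates it; that step is omitted here.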
The final floorplan solution, with the lowest maximum temperature and maximum temperature gradient, is obtained by an iterative process similar to simulated annealing.
Experimental results
The floorplan designs in our work are evaluated on the 3D Interlayer Cooling Emulator (3D-ICE) simulation platform, which simulates the transient thermal behaviour of 3D IC structures with interlayer cooling and supports thermal analysis of 3D ICs in the early design stages of a processor [9]. In the baseline design, 36 cores are distributed over 3 active layers, with 12 cores per layer and the same floorplan on each layer, as Fig. 4 shows. The total chip area is fixed at 10 × 10 mm², because minimizing chip area is not the primary goal of this work.
The 3D stack structure used in our work is shown in Fig. 1(b). The chip starts with a 10 µm thick PCB layer at the bottom, followed by the first micro-channel cooling layer, which is 100 µm thick. A 50 µm thick silicon substrate layer is placed above the micro-channel layer and below the bottom active device layer, which is 2 µm thick. A 10 µm thick BEOL (Back End Of Line) layer for bonding lies between the bottom active device layer and the second micro-channel cooling layer. The second and third groups of micro-channel cooling layer, silicon substrate layer, active device layer and BEOL layer are then placed above. At the top of the chip, a top micro-channel layer and a silicon layer complete the stack.
In this work, we use power information from ESESC, a fast microprocessor architecture simulator [12]. For a fair comparison with the baseline design, all cores run the same benchmark, with similar, medium-level power. Fig. 5 shows the temperature maps of the 3 layers of the baseline design, where the green arrows indicate the directions of coolant flow. The average layer temperatures are 65.85°C, 76.17°C and 79.06°C from the top to the bottom layer respectively; the maximum temperature of the whole chip is 122.85°C, in the bottom active layer, and the maximum temperature gradient is 46.4°C, in the same layer. The floorplans produced by our floorplanner are shown in Fig. 6 and Fig. 7. The 3 panels of Fig. 6 show the designs of the 3 active device layers after the initialization phase; the initialized floorplan achieved by the first phase of our […] The final, further optimized result is shown in Fig. 7: the average layer temperatures are 67.69°C, 73.63°C and 73.19°C respectively, the maximum temperature is 112.55°C and the maximum temperature gradient is 37.2°C, which are 10.3°C and 9.2°C lower than in the baseline design, respectively.
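The reported reductions follow directly from the figures quoted above. The short check below recomputes them, along with the per-layer change in average temperature. It assumes that the optimized layer averages are listed in the same top-to-bottom order as the baseline ones; the dictionary keys and the function name are hypothetical.

```python
# Values quoted in the text, in degrees Celsius.
BASELINE = {"max_temp": 122.85, "max_gradient": 46.4,
            "layer_avg": (65.85, 76.17, 79.06)}
OPTIMIZED = {"max_temp": 112.55, "max_gradient": 37.2,
             "layer_avg": (67.69, 73.63, 73.19)}


def reductions(before, after):
    """Return baseline minus optimized for each metric, rounded to 0.01 C."""
    return {
        "max_temp": round(before["max_temp"] - after["max_temp"], 2),
        "max_gradient": round(before["max_gradient"] - after["max_gradient"], 2),
        "layer_avg": tuple(round(b - a, 2) for b, a in
                           zip(before["layer_avg"], after["layer_avg"])),
    }


if __name__ == "__main__":
    print(reductions(BASELINE, OPTIMIZED))
```

Under the assumed layer ordering, the top-layer average rises by 1.84°C while the other two layers fall by 2.54°C and 5.87°C, which suggests that the optimization trades a slightly warmer top layer for a more even distribution across the stack.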
Conclusion
This work proposes a thermal-aware floorplanner to address the thermal problem in multi-core 3D ICs with interlayer cooling. The floorplanner works in two phases: the first phase builds an initial floorplan through a greedy method and a linear programming method, and the second phase optimizes the design by particle swarm optimization, balancing temperature, temperature gradient and total wire length. Experimental results for a 36-core design with 3 active layers, simulated in 3D-ICE, show that the maximum temperature is reduced by 10.3°C and the temperature gradient by 9.2°C compared with the baseline design.
