Abstract. A novel 3D Tree-based Multilevel FPGA architecture that unifies two unidirectional programmable interconnection networks is presented in this paper. In a Tree-based architecture, the interconnects are arranged in a multilevel network, with the logic blocks placed at different Tree levels using a Butterfly-Fat-Tree network topology. Developing the 2D physical layout of a Tree-based multilevel interconnect network is a major challenge for Tree-based FPGAs. We discuss a 3D interconnect network technology that leverages Through Silicon Vias (TSVs) to redistribute the Tree interconnects into multiple silicon layers, based on network delay and thermal considerations. The impact of TSVs and the resulting performance improvement of the 3D Tree-based FPGA are analyzed, and an optimized physical design technology leveraging TSVs, Thermal TSVs (TTSVs), and thermal analysis is presented. Compared to the 3D Mesh-based FPGA, the 3D Tree-based FPGA design reduces the number of TSVs by 29%, and a performance improvement of 53% was recorded in our place and route experiments.
Introduction
Three-dimensional (3D) integration is a promising technology for manufacturing high-density, high-performance 3D FPGAs. 3D integration involves stacking multiple silicon wafers interconnected with Through Silicon Vias (TSVs); this vertical stacking of multiple chips reduces interconnect delays and increases overall integration density. Advances in 3D integration and vertical interconnect (TSV) technologies are gaining momentum and have become a critical interest of the semiconductor community. A Field Programmable Gate Array (FPGA) is a flexible and reusable architecture with a symmetrical array of logic blocks interconnected by routing resources. To support growing demands, FPGAs must be built with higher logic density and larger interconnection networks. In such large FPGA systems, 3D integration technology, with TSVs for inter-layer communication, is emerging as an effective solution to mitigate the impact of increasing interconnect delays.
For the past several years, industry and research institutions have conducted major studies on 3D FPGA design and integration. A survey of existing design methods and tools for 3D integration is presented in [2], and details of existing 3D manufacturing technologies are presented in [3]. In [4], a 3D place and route tool (TPR) was presented to investigate the wire length and delay associated with 3D Mesh-based FPGAs. A VPR-based tool [10] is used to support the implementation of applications on such a device. TPR [4] is flexible in choosing the number of vertical channels relative to horizontal channels; however, it assumes that all switch blocks are 3D, which may leave a large number of TSV resources unused and increase manufacturing cost. Furthermore, TPR assumes that TSVs are electrically equivalent to horizontal channel wires, with their number matching the horizontal channel width. In 3D integration technology, TSVs are much thicker than horizontal wires [9], which makes this assumption impractical.
A design framework with an architecture-level exploration methodology for 3D FPGAs was presented in [6]. It includes an additional feature to explore the distribution of vertical interconnects; however, this leads to intermittent use of 2D and 3D switch blocks, which may cause a number of design and manufacturing issues. A dynamically reconfigurable 3D Mesh-based FPGA was presented in [7]; it consists of three physical layers: a logic block and local interconnect layer, a routing layer, and a memory layer. More recently, the authors of [8] analyzed the performance benefits of a monolithically stacked 3D Mesh-based FPGA. However, they relied on very fine-grained TSV 3D integration, which allowed them to stack the configuration memory on top of the FPGA layers.
A fundamental understanding of the electrical, mechanical, and thermal properties of vertical interconnects (TSVs) is essential for the successful physical design of TSV-based 3D ICs [9]. The major challenges facing 3D integration technology today are high inter-layer temperature and the limited number of TSVs. We propose architecture-level solutions to optimize both the TSV count and the inter-layer temperature. A detailed thermal analysis of a heterogeneous Mesh-based FPGA is discussed in [13], considering functional units such as LBs, BRAMs, and DSP units. However, the programmable interconnect network of an FPGA also consumes a significant amount of power. In contrast, the methodology we propose considers the power consumption of logic blocks and interconnect sections separately to investigate temperature variations in the FPGA. The rest of the paper is organized as follows. Section 2 presents the Tree-based MFPGA, and Section 3 elaborates the 3D MFPGA exploration methodology. Section 4 discusses the performance analysis experiments, and Section 5 illustrates the TSV count optimization. Section 6 explains the thermal analysis of the 3D MFPGA, and Section 7 concludes the paper.
Tree-based Multilevel FPGA Interconnect Organization
Mesh is the most studied and most widely used industrial topology. A considerable amount of research work [1] and many industrial applications have been built on the Mesh architecture. Mesh is a regular island-style structure with an array of logic blocks that have input pins on each side. A new re-programmable Tree-based Multilevel FPGA architecture was proposed in [11]. The main motivation for the Tree-based FPGA architecture is to achieve the best performance by balancing interconnect and logic block utilization, with logic blocks and routing resources sparsely partitioned into a multilevel clustered structure [12]. In a Tree-based FPGA architecture, the LBs (Logic Blocks) are grouped into clusters located at different levels of the Tree. Each cluster contains a switch block that connects its local LBs, and each switch block is divided into MSBs (Mini Switch Blocks). The Tree-based FPGA architecture unifies two unidirectional interconnection networks, upward and downward, using a Butterfly-Fat-Tree topology to connect Downward MSBs (DMSBs) and Upward MSBs (UMSBs) to LB inputs and outputs. Figure 1 illustrates a 2-level Tree-based architecture. The number of DMSBs in a cluster located at level ℓ is equal to the number of inputs of a cluster located at level ℓ − 1. The upward UMSB network connects LB outputs to the DMSBs at each level. As illustrated in Figure 1, the UMSBs allow LB outputs to reach a large number of DMSBs and reduce fanout on feedback lines. The number of UMSBs in a cluster located at level ℓ is equal to the number of outputs of a cluster located at level ℓ − 1. UMSBs are organized so that LBs belonging to the same "owner cluster" reach exactly the same set of DMSBs at each level. Thus, positions inside the same cluster are equivalent, and LBs can negotiate with their siblings about the use of a larger number of DMSBs depending on their fanout.
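The two counting rules above (DMSBs per level-ℓ cluster equal the inputs of a level-(ℓ − 1) cluster, UMSBs equal its outputs) can be captured in a small sketch. The `LevelSpec` container and `msb_counts` function are hypothetical names for illustration, and the pin counts used with them are placeholders rather than values from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LevelSpec:
    """Pin counts of one cluster at a given Tree level (hypothetical container)."""
    level: int
    n_in: int   # inputs of a cluster at this level
    n_out: int  # outputs of a cluster at this level


def msb_counts(levels: list[LevelSpec]) -> dict[int, tuple[int, int]]:
    """Return {level: (num_DMSBs, num_UMSBs)} for every cluster level >= 1.

    Applies the rule from the text: a level-l cluster has as many DMSBs as a
    level-(l-1) cluster has inputs, and as many UMSBs as it has outputs.
    Levels must be contiguous starting at 0 (level 0 is a single LB).
    """
    by_level = {spec.level: spec for spec in levels}
    counts = {}
    for lvl in sorted(by_level):
        if lvl == 0:
            continue  # level 0 holds the LBs themselves, no switch block
        child = by_level[lvl - 1]
        counts[lvl] = (child.n_in, child.n_out)
    return counts
```

For instance, with an LB of 4 inputs and 1 output at level 0, every level-1 cluster gets 4 DMSBs and 1 UMSB.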
The input and output pads are grouped into specific clusters and are connected to UMSBs and DMSBs, respectively, as presented in Figure 1. Thus, all input and output pads can reach any LB of the architecture.
Exploration Methodology for 3D Tree-Based FPGA
The proposed methodology for the design and exploration of 3D Tree-based FPGA architectures is illustrated in Figure 2. The HDL generator produces VHDL code using a hierarchical approach that partitions the design into smaller sections, implements them separately, and assembles them in the final design phase. The VHDL code is generated from the architecture description file and is used directly for design evaluation and analysis. The thermal model [16] is used to extract the thermal profile of the multi-layer chip based on layout geometrical features and the power consumption of the functional units. The 3D Tree-based FPGA evaluation module includes a top-down recursive partitioning tool. The routing algorithm implemented is PathFinder [12], an iterative, negotiation-based approach. The physical design experiments are based on layouts generated in ST Micro's 130nm technology node. Mentor's SPICE-accurate circuit simulator Eldo is used to estimate the wire delay and power consumption of switches and interconnection networks at different Tree levels. Designing the Tree interconnect is a major challenge for Tree-based FPGAs, and a special layout methodology is used to maintain the hierarchy of the Tree-based FPGA. We propose two ways to organize Tree-based Multilevel FPGA layouts.
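The negotiation idea behind PathFinder [12] can be illustrated with a simplified sketch. It is not the authors' router: it assumes two-terminal nets, a unit base cost and capacity 1 for every routing node, a present-congestion factor that grows multiplicatively each iteration, and an additive history cost on overused nodes. All function and parameter names are hypothetical.

```python
import heapq
import itertools


def pathfinder(graph, nets, capacity=1, max_iters=50,
               pres_fac=0.5, pres_growth=2.0, hist_fac=1.0):
    """Negotiated-congestion routing in the spirit of PathFinder (simplified).

    graph: dict node -> list of successor nodes (routing-resource graph);
           every node, including sinks, must appear as a key.
    nets:  dict net name -> (source, sink); two-terminal nets only.
    Returns (routes, iterations), where routes maps net -> list of nodes.
    """
    occupancy = {n: 0 for n in graph}
    history = {n: 0.0 for n in graph}
    routes = {}

    def node_cost(n):
        # (base + history) * present-congestion penalty if this net also uses n
        overuse_if_taken = max(0, occupancy[n] + 1 - capacity)
        return (1.0 + history[n]) * (1.0 + pres_fac * overuse_if_taken)

    def shortest_path(src, dst):
        tie = itertools.count()  # tie-breaker so the heap never compares nodes
        heap = [(0.0, next(tie), src)]
        best, prev = {src: 0.0}, {}
        while heap:
            cost, _, u = heapq.heappop(heap)
            if u == dst:
                break
            if cost > best[u]:
                continue  # stale heap entry
            for v in graph[u]:
                new_cost = cost + node_cost(v)
                if new_cost < best.get(v, float("inf")):
                    best[v], prev[v] = new_cost, u
                    heapq.heappush(heap, (new_cost, next(tie), v))
        if dst not in best:
            raise ValueError(f"no path from {src} to {dst}")
        path = [dst]
        while path[-1] != src:
            path.append(prev[path[-1]])
        return path[::-1]

    for iteration in range(1, max_iters + 1):
        for net, (src, dst) in nets.items():
            for n in routes.get(net, [])[1:]:  # rip up this net (source not counted)
                occupancy[n] -= 1
            routes[net] = shortest_path(src, dst)
            for n in routes[net][1:]:
                occupancy[n] += 1
        overused = [n for n, occ in occupancy.items() if occ > capacity]
        if not overused:
            return routes, iteration
        for n in overused:  # history cost makes persistently congested nodes pricier
            history[n] += hist_fac * (occupancy[n] - capacity)
        pres_fac *= pres_growth
    raise RuntimeError("routing did not converge")
```

In a toy graph where two nets first share node `A`, the rising congestion cost pushes the net that has an alternative onto the detour, while the net with no other option keeps `A`:

```python
graph = {"s1": ["A"], "s2": ["A", "B"], "A": ["t1", "t2"],
         "B": ["C"], "C": ["t2"], "t1": [], "t2": []}
routes, iters = pathfinder(graph, {"n1": ("s1", "t1"), "n2": ("s2", "t2")})
```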
2-Dimensional Tile-Based Multilevel FPGA Design
The physical design experiments revealed that wire length increases exponentially as the Tree grows to higher levels. This is a major disadvantage of the Tree-based architecture compared to Mesh, where the largest wiring distance is fixed. The layout experiments were performed on layouts generated in the STMicroelectronics 130nm technology node. The 2D Tile-based layout illustrated in Figure 3 was developed to spread congestion and wire density over the MFPGA surface. However, for larger circuits this layout is not comparable to industrial Mesh layouts in terms of speed and performance, because of the longer wires at higher levels. Nevertheless, this design can be used to build compact FPGAs and Mesh-of-Tree-based FPGAs [12].
3-Dimensional Tree-Based Multilevel FPGA Design
To mitigate the wire length issue of the 2D Tile-based layout, we designed a new Tree-based 2D layout with 3D adaptability, illustrated in Figure 4. The interconnect in the Tree-based layout is organized to bring together every cluster and its corresponding interconnect, forming level-wise sections that enable the study of a 2-layer 3D Tree-based FPGA. Figure 4 shows Tree levels 0 to 3 [17]. This layout allows the Tree interconnect to be redistributed at a certain level, called the break point level, based on wire delay estimates from the timing characterization and on thermal analysis data. This is not possible with the 2D Tile-based layout because of its tiled and rearranged Tree interconnection format.
The subpath timing characterization is performed for both the 2D Tile-based and the Tree-based layouts, using layouts generated in ST Micro's 130nm technology node. The maximum wire length at each level is extracted from the layout, and Mentor's SPICE-accurate circuit simulator Eldo is used to investigate delay and power consumption. Accurate ST 130nm transistor-level technology models were used to investigate switch [...] Figure 5, where the upward, downward and feedback interconnection networks are marked. We performed delay estimation and power consumption analysis on all three interconnection networks. Figure 6 illustrates the upward delay measured up to 7 levels of the Tree-based architecture; similar delays were measured for the other networks. The interconnect delay investigation substantiates the exponential increase in wire delay as the Tree grows to higher levels. Based on the measured delay and thermal data, the 2D Tree-based layout is redistributed into 2 silicon layers at a higher interconnect Tree level called the break point level. In this study, the interconnect network is partitioned between levels 3 and 4, where the average delay rises above 2 ns, to form a two-layer 3D Tree-based Multilevel FPGA. To illustrate the design process, a seven-level architecture is presented in Figure 7, with the break point shown between levels 3 and 4. Inter-layer communication is realized with Through Silicon Vias (TSVs), and the electrical characterization, electrical model, and parasitic components of each TSV were derived using the TSV interconnect model presented in [15].
hal-00872757, version 1 - 15 Oct 2013
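The delay side of the break point decision can be sketched as a threshold scan over per-level average delays. This is only our reading of the rule (split just below the first level whose average delay exceeds 2 ns); it ignores the thermal data that the paper also weighs, and `choose_break_point` is a hypothetical helper.

```python
def choose_break_point(avg_delay_ns, threshold_ns=2.0):
    """Return the level pair (l - 1, l) at which to split the Tree into 2 layers.

    avg_delay_ns: list where index l holds the average interconnect delay (ns)
    measured at Tree level l. The split is placed just below the first level
    whose delay exceeds the threshold. Returns None if no level exceeds it.
    """
    for level, delay in enumerate(avg_delay_ns):
        if delay > threshold_ns:
            if level == 0:
                raise ValueError("threshold already exceeded at level 0")
            return (level - 1, level)
    return None
```

The placeholder list `[0.2, 0.4, 0.8, 1.6, 3.2, 6.4, 12.8]`, a made-up doubling sequence and not the measurements of Figure 6, yields a split between levels 3 and 4.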
The interconnect lengths of the levels above the break point level, used for the 3D Tree-based FPGA timing characterization, were extracted from the redesigned floorplan shown in Figure 7. The 2-layer 3D Tree-based FPGA architecture presented in Figure 7 is used for experimentation and comparison. As the Tree grows to higher levels, 3D MFPGAs with more than two layers can be designed.
3D MFPGA Experimental Evaluation
To evaluate the performance of the proposed 3D MFPGA architecture, we place and route the largest MCNC benchmark circuits and compare the results with the 3D Mesh-based FPGA architecture [6]. The netlist is partitioned into Tree-based clusters, and each cluster is randomly assigned a position inside its owner cluster. The iterative, negotiation-based PathFinder approach [12] is used to implement the placement and routing algorithm; it can handle any graph representing the interconnection routing resources. The 3D routing tool was adapted to analyze the performance of the 2D layout and of the 2-layer 3D Tree-based layout with TSV interconnections. The detailed performance analysis of the 2D and 3D designs is presented in Table 1, for a Tree-based FPGA architecture with Tree levels 0 to 6 and arity 4. The critical path delay comparison shows that both small and large designs perform better in the 3D implementation of the Tree-based FPGA than in its 2D counterpart, with an average speed improvement of 68.7% over our 2D design. This performance gain comes from the reduced wire delay at the higher levels of the Tree interconnect, which are rearranged into the 2-layer 3D chip with the Tree interconnection between levels 3 and 4 realized using TSVs. Similarly, the comparison with the 3D Mesh-based FPGA [6] (32% gain) shows that the 3D Tree-based FPGA outperforms it on all benchmarks, with an overall performance gain of 53% recorded in the experiment.
Vertical Interconnect (TSV) Optimization
To make the 3D Tree-based Multilevel FPGA more effective in terms of design and manufacturing, it is essential to minimize the TSV count. The vertical interconnect optimization is performed using Rent's parameter p, defined for the architecture as follows:

$IO = c \cdot m^{\ell p}$    (1)

where ℓ is the Tree level, m is the cluster arity, c is the number of in/out pins of an LB, and IO is the number of in/out pins of a cluster located at level ℓ.
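Reading Rent's rule for a Tree cluster as IO = c · m^(ℓp) (a level-ℓ cluster groups m^ℓ LBs with c pins each, so this is the usual Rent form; treat it as our reconstruction), the pin budget per level can be computed directly. The ceiling rounding and the function name `rent_io` are our own choices.

```python
import math


def rent_io(level, arity, c, p):
    """In/out pins of a level-`level` cluster under Rent's rule: IO = c * m**(level*p).

    A level-l cluster contains arity**l LBs, each with c in/out pins.
    p = 1 keeps every LB pin visible at the cluster boundary; smaller p
    means fewer pins (and hence fewer vertical links at a break point level).
    Rounded up because pin counts are integers.
    """
    return math.ceil(c * arity ** (level * p))
```

With a hypothetical LB of 5 pins and arity 4, a level-3 cluster exposes 320 pins at p = 1 but only 105 at p = 0.73, which shows why lowering p at the break point level reduces the TSV budget.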
A Rent's-parameter-based vertical interconnect minimization program, developed on the 3D Tree-based router, uses a binary search to find the smallest number of vertical interconnects needed to implement each MCNC netlist. The optimization program considers a fixed architecture level, in this case the break point level, with different p values to estimate the minimum TSV requirement for a particular netlist. An example of a two-level Tree-based MFPGA with p = 0.73 is illustrated in Figure 8, in which a 27% reduction in the interconnect requirement is achieved. The optimization of the Tree-based interconnect network based on Rent's parameter proceeds as follows.
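The binary search step can be sketched independently of the router. Here `is_routable(n)` is a hypothetical stand-in for a full 3D route attempt with n vertical links at the break point level; the search is only valid if routability is monotone in n (if n TSVs suffice, any larger count also does), which is an assumption of this sketch.

```python
from typing import Callable


def min_tsv_count(is_routable: Callable[[int], bool], upper: int) -> int:
    """Smallest TSV count in [0, upper] for which the netlist still routes.

    is_routable(n): hypothetical hook that runs the router with n TSVs.
    upper: a count known to be sufficient, e.g. the p = 1 pin count.
    """
    if not is_routable(upper):
        raise ValueError("netlist does not route even with the upper bound")
    lo, hi = 0, upper  # invariant: the answer lies in [lo, hi] and hi routes
    while lo < hi:
        mid = (lo + hi) // 2
        if is_routable(mid):
            hi = mid      # mid works: answer is mid or smaller
        else:
            lo = mid + 1  # mid fails: answer is larger
    return lo
```

Each probe is a full routing run, so the logarithmic number of calls is what makes per-netlist optimization practical.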
The Tree-Based Multilevel FPGA Interconnect Network Model
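In the downward interconnection network, a cluster located at level ℓ contains $N_{in}(\ell-1)$ DMSBs, each with k outputs and $\frac{N_{in}(\ell) + k N_{out}(\ell-1)}{N_{in}(\ell-1)}$ inputs. Since DMSBs are full crossbar devices, the total number of downward switches in a level-ℓ cluster is $k\,(N_{in}(\ell) + k N_{out}(\ell-1))$. In the upward interconnection network, every cluster at level ℓ contains $N_{out}(\ell-1)$ UMSBs with k inputs and k outputs. UMSBs are also full crossbar devices, giving $k^{2} N_{out}(\ell-1)$ switches in a level-ℓ cluster. Since there are $N/k^{\ell}$ clusters at each level ℓ, where N is the number of LBs, the total number of switches in the Tree network can be calculated by Equation 2:

$N_{sw} = \sum_{\ell=1}^{L} \frac{N}{k^{\ell}} \left[ k\,(N_{in}(\ell) + k N_{out}(\ell-1)) + k^{2} N_{out}(\ell-1) \right]$    (2)

where L is the top level of the Tree.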
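The switch count can be checked numerically by summing the per-cluster downward count k(N_in(ℓ) + kN_out(ℓ − 1)) and upward count k²N_out(ℓ − 1) over the N/k^ℓ clusters of each level. The sketch assumes a complete Tree with N = k^L LBs and sums levels 1 to L (our reading of the total); the pin counts in the example are hypothetical and could instead come from Rent's rule.

```python
def tree_switch_count(n_lbs, k, n_in, n_out):
    """Total crossbar switches in a Tree-based interconnect network.

    n_lbs: number of LBs N, assumed to equal k**L for top level L
    k:     cluster arity (DMSB output count, UMSB input/output count)
    n_in, n_out: lists indexed by level l = 0..L with cluster inputs/outputs
    """
    top = len(n_in) - 1
    if k ** top != n_lbs:
        raise ValueError("expects N = k**L LBs in a complete Tree")
    total = 0
    for lvl in range(1, top + 1):
        clusters = n_lbs // k ** lvl
        down = k * (n_in[lvl] + k * n_out[lvl - 1])  # N_in(l-1) full-crossbar DMSBs
        up = k * k * n_out[lvl - 1]                  # N_out(l-1) k x k UMSBs
        total += clusters * (down + up)
    return total
```

For a 16-LB, arity-4 Tree with hypothetical pin counts `n_in = [4, 12, 20]` and `n_out = [1, 4, 8]`, the four level-1 clusters contribute 320 switches and the top cluster 208, for 528 in total.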
The effectiveness of the TSV optimizer was evaluated using the 16 largest circuits of the MCNC benchmark suite. During the optimization process, each netlist is passed through the 3D-router-based TSV optimizer to find the minimum number of TSVs required to implement the function on the 2-layer 3D Tree-based Multilevel FPGA. The advantage of this type of optimization is that it provides a realistic count of the TSVs required for each
3D Tree-Based Multilevel FPGA Thermal Analysis
Thermal analysis of FPGA architectures is essential because power dissipation and leakage are expected to increase as technology scales below the 100nm node [9, 13]. The absence of effective heat removal solutions may lead to performance and reliability degradation of the 3D chip, and effective thermal conduction among the layers of the 3D chip is essential to maintain the performance of the 2-layer 3D Tree-based Multilevel FPGA. The thermal model used in this work is similar to the model presented in [16] and considers the temperature-dependent thermal conductivity of silicon. In this work, a first-order dependence of these parameters on temperature around 300 K is assumed. The 3D thermal model is modified to include the effective thermal conductivity of the thermal interface material (TIM) through which the vertical interconnections (TSVs) pass. The TIM layer is a thermally inactive layer used to attach layers 1 and 2 on top of each other. However, the effective thermal conductivity of the TIM increases because of the Cu TSVs running from layer 1 to layer 2. The effective thermal conductivity of the TIM and active layer 2 is calculated using Equation 3. TSV density is computed from the number of TSVs (optimized count), TSV dimensions, and pitch constraints [15] between TSVs of layers 1 and 2.
$k_{eff} = k_{Cu} \times A_{TSV} + k_{th} \times (A_{BreakPointLevel} - A_{TSV})$    (3)
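Equation 3 can be evaluated from the TSV count, TSV diameter, and pitch. In this sketch the areas are taken as fractions of the break point level area (our assumption), which turns the equation into an area-weighted average of copper and TIM conductivity. The default conductivities (about 400 W/m·K for copper, 0.25 W/m·K for the TIM) are illustrative placeholders, and the square pitch-by-pitch keep-out cell per TSV is a simplified stand-in for the pitch constraints of [15].

```python
import math


def effective_tim_conductivity(n_tsv, diameter_um, pitch_um, bp_area_um2,
                               k_cu=400.0, k_th=0.25):
    """Equation 3 with areas normalized by the break point level area.

    n_tsv: number of TSVs crossing the TIM (e.g. the optimized count)
    diameter_um, pitch_um: TSV diameter and minimum pitch (micrometres)
    bp_area_um2: area of the break point level section (square micrometres)
    k_cu, k_th: copper and TIM conductivities in W/m.K (placeholder values)
    """
    # Simplified pitch constraint: each TSV occupies a pitch x pitch cell.
    if n_tsv * pitch_um ** 2 > bp_area_um2:
        raise ValueError("TSVs do not fit at this pitch")
    copper_area = n_tsv * math.pi * (diameter_um / 2) ** 2  # total Cu cross-section
    frac = copper_area / bp_area_um2                        # TSV area fraction
    return k_cu * frac + k_th * (1.0 - frac)
```

For 100 TSVs of 5 µm diameter at 10 µm pitch in a 1 mm² section, the copper fraction is about 0.2%, yet it roughly quadruples the effective conductivity of the placeholder TIM (about 1.03 W/m·K versus 0.25 W/m·K).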
Another feature included in the 3D thermal model is the placement of additional blocks of thermal TSVs at specific hotspot locations to redistribute heat from hotspots to coldspots. The thermal profiles of layers 1 and 2 are presented in Figure 9. The maximum temperature of the 2-layer 3D MFPGA chip without thermal TSVs (TTSVs) was 371 °C and the average temperature was 361 °C. Adding 2% TTSVs at the hotspot location balanced the temperature of the chip: the maximum measured temperature dropped to 341 °C and the average temperature to 335 °C. The 3D experimental chip had only 2 active layers and 1 TIM layer, which explains the dramatic improvement in temperature; the improvement will vary with the number of layers in the chip.
Conclusion
We have demonstrated that the 3D Tree-based Multilevel FPGA provides significant advantages over the 3D Mesh-based FPGA, improving performance by 53% and reducing the TSV count by 29%. We also addressed the 2D physical design issues of the Tree-based Multilevel interconnect architecture and demonstrated our alternative 3D physical design solutions. However, 3D integration increases the inter-layer temperature: our 3D experimental setup indicates that the peak temperature of the 2-layer 3D chip increased to 371 °C. Placing blocks of thermal TSVs at specified locations helped the 3D FPGA balance the temperature uniformly across the layers of the 3D chip.
