Abstract-In the era of post-device scaling, three-dimensional (3-D) integration is a promising solution to meet the performance, power, and cost requirements of modern applications such as IoT, high-performance computing, and cyber-physical systems. A novel design automation flow, compatible with static timing analysis (STA), is proposed for exploring the timing and power of 3-D ICs. Among the different types of vertical interconnects, this work considers TSVs, modeled as RC wires. The flow enables design space exploration and optimization with existing timing and power analysis tools, e.g., PrimeTime and PrimeTime PX. The design experience is similar to a 2-D design flow, as an open-source 3-D placer is used only to place the cells across multiple tiers. Application of the flow to different benchmark circuits shows that, even with no optimization effort, a two-tier 3-D stack produced by the flow achieves up to 14.6% average power reduction, 18.7% performance improvement, and 49% footprint reduction as compared to the 2-D design for a specific circuit.
I. INTRODUCTION
Three-dimensional integration has been introduced as a promising candidate to improve power, cost, and performance in the era of post-device scaling [1], [2]. Various 3-D stacking technologies have been proposed, including through-silicon-via (TSV) [3], monolithic stacking [4], and AC coupling [5]. The 3-D technology considered in this work is based on TSVs, which allow several tiers to be stacked with high-density vertical channels.
Over the past few years, several methods and techniques that address specific steps of the design process have been developed. Examples of 3-D floorplanning and placement techniques that decrease the wirelength of 3-D circuits include [6], [7]. Integrating these methods into a design flow, however, is not straightforward, and consequently there is no complete, open-source design flow that supports the design of TSV-based 3-D ICs. In addition to physical design techniques, numerous TSV models have been proposed [8], [9] due to the diverse fabrication processes and bonding styles for 3-D circuits. This manufacturing diversity makes design space exploration a primary requirement. The 3-D test circuits published to date have been designed mainly with in-house flows that are not available to the research community. Furthermore, the existing flows emphasize the behaviour of inter-tier nets at the interfaces of circuit blocks, without providing timing and power information for the entire 3-D stack. This situation not only limits the potential for exploiting 3-D integration but also leaves no reference framework within which physical design tools for 3-D circuits can be evaluated.
¹ The flow will be released to allow other groups to explore the effects of 3-D integration on their circuits.
To address these issues, a design flow compatible with static timing analysis (STA) is introduced in this work. The flow begins with HDL synthesis, which produces a netlist of logic gates as in a 2-D design flow. The subsequent stages differ from the conventional 2-D flow. An academic 3-D physical design tool [7] partitions the circuit among the tiers, generating a network of gates for each tier. A conventional 2-D placer then places each tier. Because 2-D tools cannot manage more than one tier simultaneously, several new steps are introduced after placement. As a result, the timing and power of 3-D systems can be evaluated as in a standard 2-D flow.
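The stage order described above can be captured as a small data structure, which makes explicit which stages run once for the whole design and which run once per tier. This is a minimal sketch, assuming a hypothetical `Stage` record and `expand` helper; tool names are taken from the text, and "unspecified" marks stages for which the text does not name a tool. It only enumerates jobs and does not invoke any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    tool: str       # tool named in the text, or "unspecified" where the text names none
    per_tier: bool  # True if the stage is executed once for every tier


# Hypothetical encoding of the stage order described in the text.
FLOW = [
    Stage("HDL synthesis", "unspecified", False),
    Stage("tier partitioning and TSV planning", "3D-Craft", False),
    Stage("placement and routing", "Encounter", True),
    Stage("clock network import", "unspecified", True),
    Stage("SPEF merging", "custom scripts", False),
    Stage("equivalence check", "Formality", False),
    Stage("timing and power analysis", "PrimeTime / PrimeTime PX", False),
]


def expand(flow, num_tiers):
    """Unroll per-tier stages into one job per tier while keeping the global order."""
    jobs = []
    for stage in flow:
        if stage.per_tier:
            jobs.extend(f"{stage.name} [tier {t}] ({stage.tool})" for t in range(num_tiers))
        else:
            jobs.append(f"{stage.name} ({stage.tool})")
    return jobs


if __name__ == "__main__":
    for job in expand(FLOW, num_tiers=2):
        print(job)
```

Adding a tier only multiplies the per-tier stages; the partitioning, merging, and analysis stages stay global, which mirrors why the 2-D tools can be reused tier by tier.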
Succinctly, the proposed 3-D IC flow contributes by offering:
• A fully automated flow from HDL description to post-layout timing and power analysis based on a combination of 2-D tools and an open-source academic floorplanner.
• Investigation of different TSV technologies and bonding styles for 3-D circuits.
• Design space exploration without relying on wirelength or interconnect distribution models but on full timing and power information across the 3-D stack.
• A baseline platform where the performance of 3-D physical design tools can be evaluated by seamlessly performing STA and power analysis on the 3-D circuits produced by these tools.
The rest of the paper is organized as follows. Section II discusses related work on design tools and flows for 3-D ICs. Section III presents the proposed STA compatible design flow for 3-D ICs. Section IV demonstrates design space exploration of timing and power by applying the proposed flow to several benchmark circuits. Section V concludes the work.
II. RELATED WORK
Several 3-D prototype circuits have been published in the past few years [10], [11], [12]. The design of these circuits combines existing 2-D tools with considerable design effort to adapt these tools so that more than one tier can be managed. In addition, these approaches were developed for specific circuits, a specific TSV technology, and specific bonding styles. An early 3-D design flow was developed at NCSU [13], providing a process design kit for multitier circuits based on Cadence tools [14]. However, important steps of the flow, such as 3-D placement, were not supported. Recently, Lim has developed a complete flow for monolithic circuits [4]. Since monolithic circuits do not use TSVs, adjusting commercial 2-D tools is sufficient to build a design flow for this type of 3-D circuit, because the monolithic inter-tier vias can be treated as common metal contacts. In contrast, TSV-based 3-D circuits cannot be managed by standard 2-D tools. In addition, to enable design exploration of this emerging technology, a design flow should support the integration of different TSV technologies and their electrical models.
Several design methodologies and related tools for specific steps of the 3-D IC design process, such as floorplanning [15] and placement [6], have been developed. Few of these tools, however, have been released for public use. One of these 3-D IC tools has been developed by the EDA lab of UCLA [7]. [...] step is the same as in a general 2-D flow. The steps of the flow described in Sections III-A to III-C are specific to 3-D ICs: these sections introduce the integration of TSV models, the framework that includes the added stages for the 3-D flow, and timing and power analysis, respectively.
A. Models of TSVs
To provide accurate timing and power simulations of 3-D circuits, appropriate TSV models are required. A compact TSV model consists of the layout-exchange-format (LEF) file of the TSV and its RC characteristics. The floorplanner 3D-Craft, along with the technology library freePDK45 3-D [14], provides templates to create the TSV LEF files for the targeted TSV technology. There is no limitation on the TSV model used: any model that describes the electrical characteristics of the TSV (e.g., R, C, or L) can be integrated into the flow. TSV models parameterized by physical characteristics, such as the TSV diameter, can also be employed through a look-up table. In this work, the ITRS roadmap for vertical interconnects [16] was used to produce the library of TSV technologies supported by the flow. Moreover, closed-form expressions from [8] are utilized to determine the impedance characteristics of the TSVs. The resistance and capacitance of the different TSVs are listed in Table I.
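The idea of a parameterized TSV model behind a look-up table can be sketched as follows. This is a minimal sketch, not the closed-form expressions of [8]: it swaps in the textbook first-order model of a copper cylinder (R = ρh/(πr²)) and a coaxial oxide liner (C = 2πε₀ε_ox·h/ln((r+t_ox)/r)), ignoring depletion capacitance, inductance, and frequency effects. The function names and all numeric inputs are hypothetical.

```python
import bisect
import math

RHO_CU = 1.68e-8   # copper resistivity in ohm*m (approximate room-temperature value)
EPS0 = 8.854e-12   # vacuum permittivity in F/m
EPS_OX = 3.9       # relative permittivity of the SiO2 liner


def tsv_rc(diameter, height, t_ox):
    """First-order TSV resistance (ohm) and liner capacitance (F); all lengths in metres."""
    r = diameter / 2.0
    resistance = RHO_CU * height / (math.pi * r * r)
    capacitance = 2.0 * math.pi * EPS0 * EPS_OX * height / math.log((r + t_ox) / r)
    return resistance, capacitance


def build_lut(diameters, height, t_ox):
    """Tabulate (diameter, R, C) rows, sorted by diameter."""
    return sorted((d, *tsv_rc(d, height, t_ox)) for d in diameters)


def lookup(lut, diameter):
    """Linearly interpolate R and C for a diameter inside the tabulated range."""
    keys = [row[0] for row in lut]
    if not keys[0] <= diameter <= keys[-1]:
        raise ValueError("diameter outside the tabulated range")
    i = bisect.bisect_left(keys, diameter)
    if keys[i] == diameter:
        return lut[i][1], lut[i][2]
    (d0, r0, c0), (d1, r1, c1) = lut[i - 1], lut[i]
    w = (diameter - d0) / (d1 - d0)
    return r0 + w * (r1 - r0), c0 + w * (c1 - c0)


if __name__ == "__main__":
    # Hypothetical geometry: 50 um tall TSV with a 0.1 um oxide liner.
    lut = build_lut([2e-6, 5e-6, 10e-6], height=50e-6, t_ox=0.1e-6)
    r, c = lookup(lut, 3.5e-6)
    print(f"R = {r * 1e3:.1f} mOhm, C = {c * 1e15:.1f} fF")
```

Keeping the model behind `tsv_rc` means a different closed-form model, such as the one from [8], could replace it without changing the table or the interpolation.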
B. 3-D Framework
The following stage is the 3-D framework, which creates a physical design for the synthesized netlist. The framework includes the steps of 3-D floorplanning, placement and routing (PnR) for each tier, integration of the clock distribution network (CDN), and merging of the standard-parasitic-exchange-format (SPEF) files of the tiers.
Given the synthesized netlist, the 3-D floorplanner generates a floorplan for each tier based on the selected TSV technology. The partitioning of the netlist among the tiers is performed by 3D-Craft [7], which generates a DEF file and a gate-level netlist in a Verilog file (.v) for each tier. From the DEF files, the framework takes as input only the locations of the pins and TSVs. The reason is that 3D-Craft is not fully compatible with libraries provided by foundries, which often results in placements with some overlap among standard cells. Moreover, the framework aims to use a commercial placer and router for each tier, to ensure that the circuits are designed with mature EDA tools for the majority of the stages. The following steps are placement and routing for each tier. These steps are similar to a conventional 2-D flow, with some additional information such as pre-defined pins and pseudo-cells for the TSVs. The framework utilizes Encounter [17] to place and route each tier. The framework also manages the information on the connections between tiers, such as the naming convention of pins, the access points of each tier, and the global connections across multiple tiers.
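Reading only the pin and TSV locations from a DEF file, as described above, can be done with a few regular expressions over the `PINS` and `COMPONENTS` sections. This is a minimal sketch, assuming that TSVs appear as components whose cell (macro) name starts with a hypothetical prefix `TSV` and that every relevant statement carries a `PLACED` or `FIXED` location; it is not the framework's actual parser and ignores other DEF constructs.

```python
import re

# One "- name ... + PLACED|FIXED ( x y ) orient ;" statement; captures name, fields, x, y.
STMT = re.compile(r"-\s+(\S+)([^;]*?)\+\s*(?:PLACED|FIXED)\s*\(\s*(-?\d+)\s+(-?\d+)\s*\)[^;]*;")


def section(def_text, keyword):
    """Return the body of a 'KEYWORD n ; ... END KEYWORD' section, or '' if absent."""
    m = re.search(rf"^\s*{keyword}\s+\d+\s*;(.*?)^\s*END\s+{keyword}", def_text, re.S | re.M)
    return m.group(1) if m else ""


def pin_locations(def_text):
    """Map each placed I/O pin to its (x, y) location in DEF database units."""
    return {m.group(1): (int(m.group(3)), int(m.group(4)))
            for m in STMT.finditer(section(def_text, "PINS"))}


def tsv_locations(def_text, model_prefix="TSV"):
    """Map each TSV component (identified by its cell-name prefix) to its (x, y) location."""
    locs = {}
    for m in STMT.finditer(section(def_text, "COMPONENTS")):
        fields = m.group(2).split()
        if fields and fields[0].startswith(model_prefix):
            locs[m.group(1)] = (int(m.group(3)), int(m.group(4)))
    return locs


# Hypothetical DEF fragment for one tier.
SAMPLE_DEF = """VERSION 5.8 ;
DESIGN tier0 ;
UNITS DISTANCE MICRONS 1000 ;
COMPONENTS 2 ;
- u1 NAND2X1 + PLACED ( 500 600 ) N ;
- tsv_0 TSV_5um + FIXED ( 1000 2000 ) N ;
END COMPONENTS
PINS 1 ;
- clk + NET clk + DIRECTION INPUT + USE SIGNAL
  + LAYER metal3 ( -70 0 ) ( 70 140 )
  + PLACED ( 0 1500 ) N ;
END PINS
END DESIGN
"""

if __name__ == "__main__":
    print(pin_locations(SAMPLE_DEF), tsv_locations(SAMPLE_DEF))
```

Standard-cell placements from the DEF file are deliberately discarded, consistent with the text: only the pins and TSVs are passed on as fixed constraints to the per-tier placer.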
The subsequent steps merge the clock network and its cells into the placed and routed tiers (indicated by the dashed dark grey rectangle in Fig. 1). The current framework, however, does not synthesize a CDN. Instead, a CDN from a 2-D design is imported, and the corresponding CDN cells are assigned to each tier. Again, to generalize the use of the framework, CDNs from other academic tools [18] with a compatible standard file format can be imported.
A crucial step of the framework is to merge the SPEF files of the tiers into a global SPEF file and to perform a design equivalence check across the tiers with the Formality tool [19]. Scripts have been developed for merging the SPEF files. This task requires the resistance and capacitance of the TSVs and the pin names of each tier, which are collected in the previous steps of the framework. The key idea of merging the SPEF files is that a net is either routed within one tier or spans multiple tiers (a 3-D net). In the first case, the parasitic impedance of the net is provided by the output of the PnR tool. For a 3-D net, the RC sections within each tier need to be unified into one RC tree that models the electrical characteristics of the entire net. The RC sections are linked by removing the pseudo-TSV cells and adding the RC characteristics of the interconnects for the chosen TSV technology. The resulting RC network for an example net is depicted in Fig. 2. The routing information for the portion of a 3-D net in each tier is contained in the SPEF files and is used to create an undirected graph of the nodes of the net in that tier.
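Building the per-tier undirected graph of a net from its SPEF data can be sketched by reading the `*RES` block of a `*D_NET` entry. This is a minimal sketch, assuming one resistor per node pair, resistances already in ohms, and node names used as they appear (name-map indices such as `*12:3` are not resolved); `parse_res_section` is a hypothetical helper, not one of the scripts mentioned in the text.

```python
from collections import defaultdict


def parse_res_section(dnet_text):
    """Return an undirected adjacency map {node: {neighbor: ohms}} from a SPEF *D_NET entry."""
    graph = defaultdict(dict)
    in_res = False
    for line in dnet_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0].startswith("*"):
            # A new keyword (*CONN, *CAP, *RES, *END, ...) ends the previous block.
            in_res = tokens[0] == "*RES"
            continue
        if in_res and len(tokens) >= 4:
            _, node_a, node_b, ohms = tokens[:4]
            graph[node_a][node_b] = graph[node_b][node_a] = float(ohms)
    return dict(graph)


# Hypothetical tier-0 portion of a 3-D net that ends at a pseudo-TSV pin.
SAMPLE_DNET = """*D_NET n1 0.35
*CONN
*I u1:Z O
*I tsv_0:A I
*CAP
1 n1:1 0.10
*RES
1 u1:Z n1:1 1.5
2 n1:1 tsv_0:A 2.0
*END
"""

if __name__ == "__main__":
    print(parse_res_section(SAMPLE_DNET))
```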
Starting from the tier where the driver is placed, a breadth-first search is employed to iteratively construct the RC tree of an inter-tier net until all the branches of the net within that tier are added to the tree. This process is repeated for each tier that the net spans, where the TSV location of each branch of the net is taken as the root of the RC tree in that tier. Note that multiple disconnected RC trees can be formed in a tier for a 3-D net, as the net may have more than one TSV connection to an adjacent tier. The process terminates when all the sinks of the net have been added to the tree and all of the TSV cells have been replaced by the appropriate RC sections. In addition, an equivalence check is performed with Formality to ensure that, after the several steps of the 3-D flow, the resulting netlist is equivalent to the initially synthesized netlist.
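The breadth-first construction described above can be sketched over a union graph in which every node is keyed by its tier and each pair of pseudo-TSV pins is joined by the TSV resistance. This is a minimal sketch, assuming tree-structured routing (any loop edges are dropped from the spanning tree), resistors only (capacitances would be attached per node), and hypothetical names for the function and data layout.

```python
from collections import deque


def build_rc_tree(tier_graphs, tsv_links, driver, sinks):
    """Breadth-first construction of one RC tree for a 3-D net.

    tier_graphs: {tier: {node: {neighbor: ohms}}}, one undirected graph per tier.
    tsv_links:   [((tier_a, pin_a), (tier_b, pin_b), ohms)], one entry per TSV that
                 replaces a pair of pseudo-TSV pins in adjacent tiers.
    driver, sinks: nodes given as (tier, name).
    Returns the tree as a list of (parent, child, ohms) branches.
    """
    # Union graph keyed by (tier, node), so equal node names in different tiers stay distinct.
    adj = {}
    for tier, graph in tier_graphs.items():
        for a, nbrs in graph.items():
            for b, ohms in nbrs.items():
                adj.setdefault((tier, a), {})[(tier, b)] = ohms
    # Stitch the tiers together with the TSV resistance in place of the pseudo-TSV cells.
    for end_a, end_b, ohms in tsv_links:
        adj.setdefault(end_a, {})[end_b] = ohms
        adj.setdefault(end_b, {})[end_a] = ohms

    parent = {driver: None}
    tree = []
    queue = deque([driver])
    while queue:
        node = queue.popleft()
        for nbr, ohms in adj.get(node, {}).items():
            if nbr not in parent:  # first visit: this edge becomes a branch of the tree
                parent[nbr] = node
                tree.append((node, nbr, ohms))
                queue.append(nbr)

    missing = set(sinks) - set(parent)
    if missing:
        raise ValueError(f"sinks not reached: {sorted(missing)}")
    return tree


if __name__ == "__main__":
    # Hypothetical two-tier net: driver u1 in tier 0, sinks u2 and u3 in tier 1.
    tiers = {
        0: {"u1:Z": {"n1:1": 1.5}, "n1:1": {"u1:Z": 1.5, "tsv_0:A": 2.0}, "tsv_0:A": {"n1:1": 2.0}},
        1: {"tsv_0:Z": {"n1:2": 1.0}, "n1:2": {"tsv_0:Z": 1.0, "u2:A": 0.5, "u3:A": 0.7},
            "u2:A": {"n1:2": 0.5}, "u3:A": {"n1:2": 0.7}},
    }
    links = [((0, "tsv_0:A"), (1, "tsv_0:Z"), 0.05)]
    for branch in build_rc_tree(tiers, links, (0, "u1:Z"), {(1, "u2:A"), (1, "u3:A")}):
        print(branch)
```

Because each tier is entered through its TSV landing node, the portion of the tree in that tier is effectively rooted at the TSV, and several TSVs into the same tier simply give several such roots.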
C. STA and Power
The purpose of the proposed flow is to provide an accurate exploration of 3-D circuits. Our 3-D flow enables the use of commercial tools such as PrimeTime and PrimeTime PX [20] to evaluate the speed and power of 3-D ICs. Other timing and power analysis tools can also be used, since the merged parasitics are written in the standard SPEF format, which many timing analysis tools accept.
The built-in functionalities of timing and power analysis tools provide deeper insight into how 3-D integration affects the performance and power of ICs. Because our flow supports full timing analysis of the entire 3-D stack, rather than only the delay of the longest (inter-tier) path, setup and hold violations can be observed and corrected. In addition, the paths can be grouped into categories according to their functionality. This grouping enables monitoring the timing bottlenecks of a circuit and how they are affected by 3-D integration. Moreover, as the proposed flow supports STA for 3-D ICs, back-annotated circuit simulation with application-specific testbenches is also available. This capability enables both average and time-based power analysis of 3-D circuits with workloads from different applications.
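The setup and hold checks mentioned above follow the standard STA formulation, shown here only to make explicit where the TSV parasitics enter; the notation is an assumption and is not taken from the original work. With clock period $T_{\mathrm{clk}}$, clock skew $\delta$ (capture clock arrival minus launch clock arrival), clock-to-Q delay $t_{\mathrm{cq}}$, and setup and hold times $t_{\mathrm{setup}}$ and $t_{\mathrm{hold}}$, a path with maximum and minimum data delays $d_{\max}$ and $d_{\min}$ has

```latex
\begin{aligned}
\mathrm{slack}_{\mathrm{setup}} &= \left(T_{\mathrm{clk}} + \delta\right) - \left(t_{\mathrm{cq}}^{\max} + d_{\max} + t_{\mathrm{setup}}\right),\\
\mathrm{slack}_{\mathrm{hold}}  &= \left(t_{\mathrm{cq}}^{\min} + d_{\min}\right) - \left(t_{\mathrm{hold}} + \delta\right),\\
d &= d_{\mathrm{gates}} + \sum_{k \in \mathrm{tiers}} d_{\mathrm{wire},k} + \sum_{j \in \mathrm{TSVs}} d_{\mathrm{TSV},j}.
\end{aligned}
```

For an inter-tier path, the TSV terms in $d$ are computed from the merged SPEF, so a large TSV capacitance can reduce the setup slack even when the wire terms shrink, while a faster 3-D path can reduce the hold slack.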
IV. RESULTS AND DISCUSSION
This section presents timing and power results for several benchmark circuits obtained with a typical 2-D flow and with the proposed STA compatible 3-D flow. Table II lists the benchmark circuits and their number of cells; all results presented in this section are based on these circuits.
TABLE II. BENCHMARK CIRCUITS
Circuit | Source | Technology | # Cells
B04 | [21] | 45nm G* | 317
B19 | [21] | 45nm G* | 66,117
B20 | [21] | 45nm G* | 12,110
AVA | [22] | 45nm G* | 12,275
LDPC | [23] | 65nm LP* | 67,003
* TSMC technologies
A. Physical Characteristics of 2-D and 3-D designs
Table III [...] column of the table. The 3-D flow assumes two tiers with via-first TSVs as vertical channels. Both flows are run with minimum optimization effort for each circuit to ensure a fair comparison. This choice is due to the lack of optimization objectives in 3D-Craft other than wirelength, whereas the PnR step in each tier by the 2-D tools is supplemented by optimization techniques with several design objectives. In general, the total area of a circuit produced by the 3-D flow can be larger than that of a planar implementation due to the area of the TSVs, as also pointed out in [24]. For some benchmark circuits, the 3-D flow decreases the wirelength of the design; in others, the wirelength slightly increases due to the TSVs. For example, LDPC is a wire-dominant circuit; thus, several long paths are substituted by TSVs, resulting in a significant wirelength reduction of 26.6%. In contrast, B19 consists of four modules without long connections, so substituting TSVs for nets within a module yields only a slight wirelength reduction. As the flow reports the timing and power information of all paths, specific design parameters of the circuit can be adapted to meet the performance and power requirements. Some examples of such design space exploration are presented in the following subsections.
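The observation that a 3-D implementation can use more total silicon while occupying a smaller footprint can be checked with simple arithmetic. This is a minimal sketch with hypothetical numbers (not taken from Table III), assuming a hypothetical helper `compare_areas`, equal placement utilization in every tier, and, conservatively, that every tier reserves area for all TSVs.

```python
def compare_areas(cell_area, tier_shares, num_tsvs, tsv_area, utilization):
    """Return (2-D footprint, 3-D footprint, 3-D total silicon area) in the unit of cell_area.

    Assumption: every tier reserves area for all TSVs (landing pads or keep-out zones),
    which is a deliberately conservative simplification.
    """
    footprint_2d = cell_area / utilization
    tier_areas = [(cell_area * share + num_tsvs * tsv_area) / utilization for share in tier_shares]
    return footprint_2d, max(tier_areas), sum(tier_areas)


if __name__ == "__main__":
    # Hypothetical design: 10,000 um^2 of cells split evenly, 100 TSVs of 10 um^2 each.
    f2d, f3d, total3d = compare_areas(10_000, [0.5, 0.5], 100, 10, 0.7)
    print(f"footprint reduction: {100 * (1 - f3d / f2d):.1f}%")
    print(f"total silicon increase: {100 * (total3d / f2d - 1):.1f}%")
```

With these inputs the footprint shrinks by 40% while the total silicon area grows by 20%; the TSV count and TSV area determine how far the footprint reduction falls short of the ideal 50% for two tiers.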
B. Timing Analysis of 3-D Circuits
Most previous works evaluating 3-D circuits report the longest delay [25] as a performance metric or use wirelength as an optimization objective. This approach, however, offers limited insight into how TSVs affect the timing of 3-D ICs. The proposed flow provides detailed timing information. Table IV lists the supported clock period of the benchmark circuits implemented in one tier (2-D) and in two tiers (3-D). Circuit B19 exhibits an interesting timing behavior: although the wirelength of the 3-D design is smaller than that of the 2-D design, the clock frequency is slightly lower. This result indicates that wirelength analysis alone is not sufficiently accurate for timing analysis or performance optimization. Moreover, to further enhance the timing of a 3-D circuit, considering only the path with the longest delay is not practical, as this path may be a false path. The proposed 3-D flow is fully compatible with STA, such that users can analyze and improve the timing of the design as in a conventional 2-D flow. Built-in functions of PrimeTime provide in-depth understanding of the effect of TSVs on the delay of internal paths. To facilitate the timing analysis, the paths are grouped into input-to-register (in2reg), register-to-register (reg2reg), and register-to-output (reg2out) paths during the configuration of PrimeTime. Fig. 3 shows a breakdown of the path delays for the benchmark circuits. The following observations can be made for each circuit.
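The path grouping described above (configured in PrimeTime itself through its `group_path` command) can be mimicked outside the tool to compare the worst delay per group between the 2-D and 3-D designs. This is a minimal sketch, assuming path reports have already been reduced to hypothetical `TimingPath` records with a start-point type, an end-point type, and a delay.

```python
from dataclasses import dataclass


@dataclass
class TimingPath:
    startpoint: str  # "input" (primary input port) or "register"
    endpoint: str    # "register" or "output" (primary output port)
    delay_ns: float


def group_of(path):
    """Classify a path as in2reg, reg2reg, reg2out, or other."""
    if path.startpoint == "input" and path.endpoint == "register":
        return "in2reg"
    if path.startpoint == "register" and path.endpoint == "register":
        return "reg2reg"
    if path.startpoint == "register" and path.endpoint == "output":
        return "reg2out"
    return "other"  # e.g. purely combinational input-to-output paths


def worst_delay_per_group(paths):
    """Largest delay observed in each path group."""
    worst = {}
    for p in paths:
        g = group_of(p)
        worst[g] = max(worst.get(g, float("-inf")), p.delay_ns)
    return worst


def relative_change(worst_2d, worst_3d):
    """Percentage change of the worst delay per group (negative means faster in 3-D)."""
    return {g: 100.0 * (worst_3d[g] - worst_2d[g]) / worst_2d[g] for g in worst_2d if g in worst_3d}


if __name__ == "__main__":
    # Hypothetical path reports; not data from Fig. 3.
    paths_2d = [TimingPath("input", "register", 0.9), TimingPath("register", "register", 2.0)]
    paths_3d = [TimingPath("input", "register", 0.8), TimingPath("register", "register", 1.5)]
    print(relative_change(worst_delay_per_group(paths_2d), worst_delay_per_group(paths_3d)))
```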
• For B04, 3-D integration mainly affects the reg2reg paths, which dominate the delay of the design, whereas the in2reg and reg2out paths exhibit a negligible decrease in delay.
• For B19 and B20, 3-D integration offers only marginal improvements. Recalling that the wirelength of both circuits is still reduced in 3-D, this observation suggests that the performance of these circuits is dominated by the logic gate delay, and that the inserted TSVs degrade performance enough to cancel most of the benefit of the shorter wires.
• AVA and LDPC demonstrate an improvement in the delay of all path groups in 3-D, which indicates that wire-dominant designs exhibit better performance in 3-D.
To better understand the timing behaviour of 3-D circuits, the flow supports monitoring the delay of all inter-tier paths and evaluating the delay overhead of the TSVs. Figs. 4(a) and 4(b) depict the timing slack histograms of the inter-tier nets for circuits B04 and LDPC, respectively. For B04, the parasitic capacitance of the TSVs and the long wires needed to route between distant TSVs increase the delay of the paths as compared with the 2-D design of this circuit. Therefore, changing the bonding style is an alternative way to improve the delay. Moreover, timing violations of specific paths are difficult to determine with earlier 3-D flows based on wirelength models, as only the delay of long paths is considered. For LDPC, the benefits of 3-D integration are shown in Fig. 4(b), where the timing slack of the inter-tier paths greatly increases as long wires are substituted with TSVs.
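A slack histogram of inter-tier paths, like the one discussed above, only requires binning the slack values exported from the timing tool. This is a minimal sketch, assuming the slacks of the inter-tier paths are already available as a list of floats in nanoseconds; the helper names and the sample values are hypothetical.

```python
import math
from collections import Counter


def slack_histogram(slacks_ns, bin_ns):
    """Count slacks per bin; bin k covers [k * bin_ns, (k + 1) * bin_ns)."""
    return dict(sorted(Counter(math.floor(s / bin_ns) for s in slacks_ns).items()))


def count_violations(slacks_ns):
    """Number of paths with negative slack."""
    return sum(1 for s in slacks_ns if s < 0)


def render(hist, bin_ns):
    """Print a text histogram, one row per non-empty bin."""
    for k, n in hist.items():
        print(f"[{k * bin_ns:+.2f}, {(k + 1) * bin_ns:+.2f}) ns {'#' * n}")


if __name__ == "__main__":
    # Hypothetical inter-tier path slacks in ns; not data from Fig. 4.
    slacks = [-0.05, 0.02, 0.08, 0.15, 0.17, 0.31]
    render(slack_histogram(slacks, 0.1), 0.1)
    print("violations:", count_violations(slacks))
```

Comparing such histograms for the 2-D and 3-D versions of a circuit shows whether the TSVs shift the whole distribution or only create a few violating paths, which a single longest-delay metric cannot reveal.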
C. Power Analysis of 3-D Circuits
Power analysis of 3-D ICs with our flow is demonstrated in this subsection. The proposed flow extends previous analyses, which rely on first-order models that combine the power information of logic gates with wirelength [25], to advanced 2-D commercial tools, where power is analyzed in both average and time-based modes. This extension is implemented by merging the extracted RC of each tier within a 3-D circuit and adding the appropriate RC of the vertical interconnects. For a fair comparison between the power of 2-D and 3-D circuits, the same clock frequency is used in both cases for all explored circuits.
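Why adding the TSV capacitance to a net can raise its power even when the net's wire becomes shorter follows from the switching-power expression. This is a minimal sketch, assuming the convention that the toggle rate TR counts transitions per clock cycle (so each transition dissipates ½CV²) and using hypothetical capacitances, frequency, and supply voltage; it covers only switching power, not the internal and leakage components reported by the power tool.

```python
def switching_power_w(toggle_rate, freq_hz, cap_f, vdd_v):
    """Switching power of one net: 0.5 * TR * f * C * Vdd^2.

    TR is taken as transitions per clock cycle, so every transition dissipates 0.5 * C * Vdd^2.
    """
    return 0.5 * toggle_rate * freq_hz * cap_f * vdd_v ** 2


if __name__ == "__main__":
    # Hypothetical capacitances in farads; none of these values come from the paper.
    pin_cap = 5e-15
    wire_2d, wire_3d, tsv_cap = 20e-15, 8e-15, 40e-15
    p_2d = switching_power_w(0.2, 1e9, wire_2d + pin_cap, 1.0)
    p_3d = switching_power_w(0.2, 1e9, wire_3d + pin_cap + tsv_cap, 1.0)
    print(f"2-D: {p_2d * 1e6:.2f} uW, 3-D: {p_3d * 1e6:.2f} uW")
```

In this example the wire capacitance drops from 20 fF to 8 fF, yet the 40 fF TSV more than doubles the net's switching power, which is the kind of trade-off that per-net RC back-annotation exposes and a wirelength-only model hides.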
For the average power analysis, the toggle rate of all nets and cells within each circuit is set to 20%. The average power consumed by the circuits and its breakdown into the different power components are listed in Table V. With 3-D integration, the power of B04 and B20 increases by 67% and 34%, respectively, which may appear counterintuitive. For B04, this increase is due to the large effect of the TSVs on the area and total wirelength of the circuit (see Table III). The increase in wirelength leads to a larger interconnect capacitance, thereby increasing power. With the proposed flow, different bonding styles can be explored to mitigate this increase in power. For example, face-to-face (F2F) bonding employs no TSVs and thus incurs no TSV area overhead. For circuit B20, on the other hand, the total wirelength is reduced in 3-D, but the power increases. This behaviour demonstrates the non-negligible effect of TSV capacitance on the total power. Different TSV technologies can be used for this circuit to reduce the capacitance of the vertical interconnects. Circuits B19, AVA, and LDPC consume less power in 3-D than in 2-D, by 5%, 3.5%, and 14.6%, respectively. These power gains can be further enhanced by removing and/or downsizing cells, and our flow facilitates this optimization step: by combining the power analysis with the performance analysis discussed in Section IV-B, power-intensive and non-timing-critical cells, which are candidates for downsizing, can be identified.
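Combining the power and timing reports to find downsizing candidates, as suggested above, amounts to filtering out timing-critical cells and ranking the rest by power. This is a minimal sketch, assuming per-cell power and worst-slack values have already been extracted into hypothetical `CellReport` records; the slack margin and the fraction kept are user choices, not values from the paper.

```python
from dataclasses import dataclass


@dataclass
class CellReport:
    name: str
    power_uw: float        # total power attributed to the cell, in microwatts
    worst_slack_ns: float  # worst slack of any timing path through the cell, in ns


def downsizing_candidates(cells, min_slack_ns, top_fraction):
    """Return the most power-hungry cells whose worst slack exceeds min_slack_ns.

    Cells with slack at or below the margin are treated as timing critical and skipped;
    the remaining cells are ranked by power and the top fraction is returned.
    """
    non_critical = [c for c in cells if c.worst_slack_ns > min_slack_ns]
    if not non_critical:
        return []
    non_critical.sort(key=lambda c: c.power_uw, reverse=True)
    keep = max(1, int(len(non_critical) * top_fraction))
    return [c.name for c in non_critical[:keep]]


if __name__ == "__main__":
    # Hypothetical per-cell reports; not data from Table V.
    cells = [CellReport("u1", 12.0, 0.40), CellReport("u2", 30.0, 0.02),
             CellReport("u3", 25.0, 0.35), CellReport("u4", 5.0, 0.50)]
    print(downsizing_candidates(cells, min_slack_ns=0.1, top_fraction=0.5))
```

The most power-hungry cell (`u2` in the example) is excluded because its slack is below the margin, illustrating why power and timing information must be combined before resizing.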
A distinct advantage of this flow is that the power of 3-D circuits can be monitored on application-specific benchmarks with time-based power analysis. To the best of our knowledge, this is the first work that demonstrates a flow compatible with this mode while considering the electrical and physical characteristics of the vertical interconnects. The power consumed by the AVA and LDPC circuits on real-time tasks is listed in Table VI. The total power in time-based mode is lower than in average mode (see Table V), as the switching activity of nets and cells is derived from application-specific testbenches rather than from the uniform toggle rate assumed for all nets and cells in average mode. Consequently, the power in these two modes can differ significantly. For example, for circuit AVA, the average power analysis indicates that three-dimensional integration reduces power. However, the power from executing [...] This case study demonstrates a 10% reduction in power for the 3-D circuit as compared to the 2-D circuit. This decrease is due to the fact that the LDPC circuit is wire-dominant and this testbench toggles a large portion of the nets. In addition, the peak power is reduced by 11.6%. Considering the importance of thermal and reliability issues in 3-D ICs due to increased power densities, accurate power analysis is critical to avoid expensive overdesign. Our flow addresses this requirement by enabling multi-mode power analysis.
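The average and peak reductions reported for time-based analysis can be derived from the power traces produced by the tool. This is a minimal sketch, assuming both traces are sampled at the same, equal time intervals over the same workload; the helper names and the sample values are hypothetical and are not the data behind Table VI.

```python
def trace_stats(trace_mw):
    """Average and peak of a power trace sampled at equal time intervals."""
    if not trace_mw:
        raise ValueError("empty power trace")
    return sum(trace_mw) / len(trace_mw), max(trace_mw)


def reduction_pct(before, after):
    """Percentage reduction from before to after (positive means lower power)."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    # Hypothetical per-interval power samples (mW) for the same workload.
    trace_2d = [4.1, 6.8, 9.5, 7.2, 5.0]
    trace_3d = [3.8, 6.1, 8.4, 6.5, 4.6]
    (avg_2d, peak_2d), (avg_3d, peak_3d) = trace_stats(trace_2d), trace_stats(trace_3d)
    print(f"average: {reduction_pct(avg_2d, avg_3d):.1f}% lower, "
          f"peak: {reduction_pct(peak_2d, peak_3d):.1f}% lower")
```

Peak and average can move by different amounts because the peak depends on the few intervals in which the workload toggles the most nets, which is why the two metrics are reported separately.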
V. CONCLUSION
In this work, a 3-D flow is proposed that enables timing and power analysis primarily with commercial tools such as PrimeTime and PrimeTime PX. To achieve this, a framework is implemented that offers design exploration of 3-D ICs while considering different bonding styles and TSV technologies for multi-tier implementations. A fully automated 3-D flow integrating both 3-D academic and 2-D commercial tools is presented, in which the built-in functionalities of the 2-D tools are enabled, providing significant insight into how 3-D integration affects circuit performance and power. Results on specific benchmark circuits designed with this flow demonstrate that circuit performance is improved by up to 18.7% as compared to the 2-D counterpart. Moreover, 3-D circuits exhibit up to 14.6% less average power than in 2-D. To the best of our knowledge, this is the first work that demonstrates a flow compatible with time-based power analysis on application-specific testbenches while considering the electrical and physical characteristics of vertical interconnects. Results show a 10% decrease in power when a real-time task is executed on a 3-D circuit. In addition, an 11.6% reduction in peak power is observed. Future work includes 3-D design optimizations, such as resizing cells, and the integration of synthesized 3-D clock networks.
