The concept of 3D chip stacks has been advocated by both industry and academia for many years, and hailed as one of the most promising approaches to meet ever-increasing demands for performance, functionality and power consumption going forward. However, a multitude of challenges has thus far obstructed large-scale transition from "classical" 2D chips to stacked 3D chips. We survey major design challenges for 3D chip stacks with particular focus on their implications for physical design. We also derive requirements for advances in design automation, such as the need for a unified workflow. Finally, we outline current promising solutions as well as areas needing further research and development.
INTRODUCTION
3D chip stacks offer the potential to meet current and future requirements of electronic circuits, such as for performance, functionality, and power consumption. Two design paradigms in particular, namely "More Moore" (shrinking device nodes) and "More-than-Moore" (heterogeneous integration), advocate 3D chip stacks [1] (Fig. 1). Despite significant projected benefits over 2D chips, the overall adoption of 3D stacks still lags behind expectations. What are the reasons for the current lack of industrial adoption, and what are the specific implications for physical design?
Successful adoption of 3D chip stacks requires addressing different classical and novel challenges which simultaneously affect the manufacturing processes, design practices and physical design tools. If not properly addressed, the fairly complex challenges may render 3D chip stacks commercially unviable, despite recent advances in manufacturing yield and cost reduction. Physical design automation partially meets these challenges at present, but further efforts are needed to fully exploit the potential of 3D chip stacks and to facilitate their commercial breakthrough.
ISPD'16, April 03-06, 2016, Santa Rosa, CA, USA
Figure 1: The well-known "More Moore" trend is slowly but surely reaching its limits for CMOS technology. "More than Moore", i.e., exploiting heterogeneous integration, has been identified as a novel, important direction. 3D chip stacks offer the potential to meet both trends at the same time.
In this paper, we will first review the diversity of stacking options for 3D chips and formulate related high-level design challenges. Then, along with further needs for design automation, we discuss the following classical challenges and related promising solutions:
• Power delivery and thermal management;
• Clock delivery;
• Partitioning and floorplanning;
• Placement;
• Routability estimation and routing; and
• Testing.
It is important to understand that these challenges are much more complex for 3D chip stacking, and that existing design tools for 2D chips cannot be directly applied.
We also elaborate on novel challenges that are currently (at least partially) addressed, but which require further efforts, such as:
• Pathfinding and design-space exploration;
• Chip-package co-design; and
• Multi-physics simulation and verification.
Our paper concludes with promising trends for physical design of 3D stacks, which meet the identified open challenges, and whose application can ease the adoption of 3D stacks.
Figure 2: Implementation options for 3D chip stacks (panels: Package Stacking, Interposer Stack, TSV-based 3D IC, Monolithic 3D IC; arrows: Connectivity and Integration Density, Need for Advances in Design Automation). Originating with package stacking, 3D integration has evolved through interposer stacks (also known as "2.5D integration") towards TSV-based and monolithic 3D ICs. Advances in design automation are typically called for with the latter two options but are also required for modern, large-scale interposer stacks.
scope of application, with distinctive benefits and drawbacks as well as requirements for design and manufacturing processes [1, 2] . In the following, related key aspects are reviewed and design challenges are outlined.
Options for 3D Chip Stacking
Package stacking, based on wire bonding and/or flip-chip bonding, has been widely adopted and thus is not reviewed here.
TSV-based 3D ICs initially attracted the most attention and effort; many academic/industrial prototypes/products nowadays are based on TSV technology [3-7]. TSVs are metal plugs (copper or tungsten) that penetrate whole dies, which are stacked and bonded, to interconnect those dies. Different stacking scenarios are applicable, such as face-to-back die-to-wafer stacking [2].
Depending on the TSV type, different challenges arise: via-first and via-middle TSVs obstruct the device layer and result in placement obstacles; via-last TSVs obstruct the device layer and the metal layers, resulting in placement and routing obstacles. Due to their relatively large size and intrusive character, TSVs cannot be deployed excessively and/or arbitrarily but have to be optimized in count and arrangement [8, 9] . Note that TSVs do not scale at the same rate as transistors, thus the TSV-to-cell mismatch will remain for future nodes and may even increase [10] .
Interposer stacks are a cost-efficient but only seemingly straightforward option for 2.5D or 3D integration [11-13]. Mainly predesigned dies are stacked in lateral and/or vertical fashion on an interposer, where redistribution layers and TSVs realize the connectivity. Interposers are typically fabricated as passive carriers, but may also embed components like decaps or even "glue logic" [11]. The concept of interposer stacks supports various applications and is thus widely acknowledged in the industry. Notable products include the Xilinx Virtex-7 FPGA family [14] and a GLOBALFOUNDRIES prototype with two ARM Cortex-A9 chips [15].
The design of interposer stacks is obstructed by the lack of dedicated design tools [12]. For example, routing highly interconnected interposers and the related design of large-scale NoCs require further research [16]. Other challenges, such as simulation and verification of signal integrity across an interposer stack, have been recently addressed [17], but require further integration efforts [18].
Monolithic 3D ICs have recently gained more attention [19], thanks to advances in processing technology. The key feature of monolithic integration is that active layers are sequentially manufactured into one stack rather than bonded using separate dies. Due to their relatively small vias, monolithic 3D ICs enable fine-grain transistor-level integration, which is especially sought-after for high-density logic integration [19].
As for design challenges, routing becomes notably more complex due to high congestion along with increased delays [19]. Novel approaches covering partitioning, placement and routing for monolithic integration are required; such a holistic approach has recently been proposed in [20]. Furthermore, thermal properties differ from "classical" TSV-based 3D ICs: on the one hand, the small vias are not as effective as TSVs for conducting heat towards the heatsink; on the other hand, monolithic stacks do not exhibit "thermal barriers" in the form of bonding layers. Thus, the thermal coupling within monolithic stacks is larger and more uniform than for TSV-based 3D ICs. This, in turn, calls for dedicated thermal management [21].
High-Level Design Challenges
Exploring and selecting the most suitable chip-stacking options for a particular design is much more complex than handling the decisions required for classical 2D-IC design. Given an abstract design description in early planning phases, one has to consider the following challenges, among others:
• How can intellectual property (IP) blocks, pre-designed modules or even whole dies be effectively reused in the stack?

Most of these challenges are interacting, and any respective decision does impact the overall performance, reliability and cost of the 3D chip stack. It is apparent that solving such a convoluted set of challenges requires sophisticated design know-how, EDA capabilities and well-defined project structures. Specifically, methodologies and tools are required that enable (i) an accurate (yet fast) exploration of the design space and (ii) rapid prototyping/evaluation of different stacking configurations (see Sec. 4).
CLASSICAL DESIGN CHALLENGES: AGGRAVATED BUT SOLVABLE
Classical challenges such as power delivery, floorplanning and placement become much more complex when designing 3D chip stacks instead of 2D chips. Existing methodologies and tools for 2D-chip design cannot be directly applied and are extendible only to some degree. In recent years, however, many efforts in academia and industry have been undertaken which render those classical challenges still aggravated but solvable. In the following, key challenges and promising solutions are discussed.
Power Delivery and Thermal Management
High integration density, a key benefit of 3D stacking, notably complicates both power delivery and thermal management. A stack of d dies/layers potentially exhibits a d-fold power density compared to 2D chips. This implies that approximately d× the power has to be delivered through the stack, without excessive static and dynamic voltage drop, despite the notable parasitics of 3D power-distribution networks (PDNs) [22]. At the same time, approximately d× the heat has to be removed from the stack. Solutions for power delivery and thermal management are discussed next.
Arrangement of TSVs: For power/ground (PG) TSVs, a distributed, irregular topology is superior to both regularly placed (single) TSVs and clustered TSVs in order to limit power noise [22, 23]. Proper planning of TSVs may also serve power delivery and thermal management at the same time. For example, aligned PG-TSV stacks may simultaneously increase heat dissipation and reduce power noise [23, 24]. Regular signal TSVs can be similarly arranged into (possibly aligned) TSV clusters for improved heat conduction without excessive routing overheads [25-27].
Design of PDN and PG grids: A promising (yet expensive) PDN architecture is the "multi-story power delivery" [28], where several power supplies (e.g., one per die) distribute the load more effectively compared to a single power supply. As for decaps, their allocation has to be carefully investigated to manage their complex impact on the power-noise distribution in 3D chip stacks [29].
The design of PG grids should generally account for the overall 3D PDN, not only for the dies each grid is attached to [24, 30] .
Low-power design: Consumed power and, in turn, generated heat can be reduced by deploying low-power circuitry. In this context, sub-/near-threshold circuitry for 3D chip stacks has gained attention; detailed studies and prototypes are available [31, 32]. Arrangement of multiple voltage domains in order to reduce power and temperature has been recently proposed as well [33].
Thermal-aware design: Spreading high-power modules in an orchestrated manner, e.g., during thermal-aware floorplanning [27, 34-36] or placement [37], is a simple but effective measure. Along with high temperature comes a strong impact on transistor behaviour; thermal-aware design is thus also important for reliability.
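To illustrate why spreading high-power modules matters, the following minimal sketch accumulates the power of all modules in the stack into a shared 2D bin grid and reports the peak bin as a crude hotspot proxy. This is an illustrative assumption, not a thermal model from the cited works: it ignores vertical distance to the heatsink and lateral heat spreading, and the function and parameter names are hypothetical.

```python
def stacked_power_map(modules, die_size, bins):
    """Sum the power of modules from all dies into one bins x bins grid.

    modules: list of (x, y, width, height, power); the die index is
    irrelevant here because all dies are projected onto one footprint.
    """
    step = die_size / bins
    grid = [[0.0] * bins for _ in range(bins)]
    for x, y, w, h, power in modules:
        density = power / (w * h)
        for i in range(bins):
            for j in range(bins):
                # Overlap of the module with bin (i, j) in x and y.
                ox = max(0.0, min(x + w, (i + 1) * step) - max(x, i * step))
                oy = max(0.0, min(y + h, (j + 1) * step) - max(y, j * step))
                grid[i][j] += density * ox * oy
    return grid


def peak_power(grid):
    """Largest accumulated power in any bin (hotspot proxy)."""
    return max(max(row) for row in grid)


# Two 1x1 modules of 2 W each, placed on different dies of a 2x2 footprint.
aligned = stacked_power_map([(0, 0, 1, 1, 2.0), (0, 0, 1, 1, 2.0)], 2.0, 2)
spread = stacked_power_map([(0, 0, 1, 1, 2.0), (1, 1, 1, 1, 2.0)], 2.0, 2)
```

With vertically aligned modules the peak bin carries the power of both dies, whereas the spread arrangement halves it; a thermal-aware floorplanner or placer would use a far more accurate model, but the trend is the same.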
Technological approaches: In order to limit package impedance and external current supply, one can bring voltage converters closer to the logic by "sandwiching" dedicated converter dies into the 3D stack [38] . To reduce power noise, classical CMOS decaps and/or metal-insulator-metal (MIM) decaps can be deployed within each die [39] or even in dedicated "decap dies" as well [30] . As for thermal paths, internal paths (i.e., paths across and within dies) as well as external paths (i.e., paths to the heatsink and the package) can be improved by introducing micro-fluidic channels [40] .
Clock Delivery
The key challenge when designing 3D clock networks is to ensure reliable, uniform and high-speed delivery of clock signals to all sinks which are spread across different dies. Besides thermal management, this is one of the main challenges for "real logic-on-logic stacking" and has many implications for physical design. These concerns, along with promising solutions, are discussed next.
Redundancy and arrangement of TSVs: Since the advent of TSV technology, their reliability has been a major concern: failure of any clock or signal TSV would render the whole stack defective. Thus, different redundancy architectures have been proposed in general [41, 42] and for clock networks in particular [43, 44].
The optimized assignment of clock signals to TSV arrays (which are generally more practical than separate TSVs [8]) has also been proposed for the design of reliable and low-power clock networks [45]. Another flow [46] accounts for PG- and signal-TSV obstacles during placement of clock TSVs, but neglects their co-optimization while arranging those different types of TSVs.
Holistic and variation-aware design: The design of clock networks is highly coupled to most physical-design stages like placement, routing and timing optimization. All stages should account for the 3D clock network's properties, e.g., arrangement of clock TSVs and clock-trees on individual dies. Capabilities for tuning the network and to provide feedback to other stages are also needed.
Another requirement for clock delivery is the analysis of timing variations and parameter variations, which are a general concern for 3D chip stacks with their inter-die variations. These variations can be mitigated by implementing multiple clock domains [47]. Note that variations are exacerbated in interposer stacks and TSV-based 3D ICs, mainly due to thermo-mechanical stress [48].
Clock-network synthesis: Minimizing the clock skew is a key objective for clock-network synthesis. It is considered by most previous studies such as [43-45, 49-51]. Variation-aware timing simulation [49, 52], statistical clock-skew models [53] or post-silicon techniques such as body biasing [50] have been applied as well.
As for network topologies, it has been shown that H-trees are superior to classical grid networks in terms of power and mean skew but may suffer from larger maximum skew due to variations [52]. Other proven approaches such as the method of means and medians (MMM) and deferred-merge embedding (DME) have been successfully extended as well [51]. The related approach in [51] allows one to effectively trade off TSV count and power consumption, to manage slew variations and to insert buffers, all within low runtime.
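For readers unfamiliar with MMM, the following minimal sketch shows the classical 2D idea only: sinks are recursively split at the median (alternating between x and y), and each tapping point is placed at the center of mass of its sink set. It is not the 3D extension of [51]; TSV handling, buffering and slew control are omitted, and all function names are hypothetical.

```python
def center_of_mass(sinks):
    """Mean position of a list of (x, y) sinks."""
    n = len(sinks)
    return (sum(x for x, _ in sinks) / n, sum(y for _, y in sinks) / n)


def mmm(sinks, cut_x=True):
    """Build a clock-tree topology as nested (point, left, right) tuples."""
    if len(sinks) == 1:
        return (sinks[0], None, None)
    axis = 0 if cut_x else 1
    ordered = sorted(sinks, key=lambda p: p[axis])
    mid = len(ordered) // 2  # median split into two halves
    return (
        center_of_mass(ordered),           # tapping point of this subtree
        mmm(ordered[:mid], not cut_x),     # alternate the cut direction
        mmm(ordered[mid:], not cut_x),
    )


def wirelength(tree):
    """Total Manhattan wirelength from each tapping point to its children."""
    (x, y), left, right = tree
    total = 0.0
    for child in (left, right):
        if child is not None:
            (cx, cy), _, _ = child
            total += abs(x - cx) + abs(y - cy) + wirelength(child)
    return total


tree = mmm([(0, 0), (0, 4), (4, 0), (4, 4)])
```

For the four corner sinks above, the root tapping point lies at the center (2.0, 2.0) and the resulting topology is an H-like tree with a total wirelength of 12.0.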
Testability and yield management: Stacking only known-good dies is essential in order to manage the overall yield of 3D chips. This stacking paradigm requires the clock networks to be tailored for pre-bond testability, which can be achieved by encapsulating separate, fully functional clock networks into individual dies. These separate networks should be subsequently linked by multiple TSVs to improve wirelength and clocking performance [43].
Partitioning and Floorplanning
Traditionally, the main objectives for partitioning are to (i) minimize the cuts/connections between partitions and to (ii) reduce the overall design complexity by means of divide-and-conquer. For 3D stacks, however, the objective (i) has to be carefully adapted. For example, min-cut partitioning is not practical since it cannot account for the stacking order of partitioned layers/dies which, in turn, impacts the overall number of vertical interconnects [56] . Furthermore, depending on other, more pressing criteria such as thermal management, the minimization of cuts/connections may not be the primary concern.
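The dependence on stacking order can be made concrete with a small sketch. It assumes a face-to-back TSV stack in which a net needs one vertical interconnect per die boundary between its lowest and highest die; the function name and the toy netlist are hypothetical.

```python
def vertical_interconnects(nets, partition_of, stacking_order):
    """Count vertical interconnects for a given stacking order.

    nets: list of lists of module names
    partition_of: dict mapping module name -> partition id
    stacking_order: partition ids from bottom die to top die
    """
    level = {part: i for i, part in enumerate(stacking_order)}
    total = 0
    for net in nets:
        levels = [level[partition_of[m]] for m in net]
        # One interconnect per crossed die boundary (assumption).
        total += max(levels) - min(levels)
    return total


partition_of = {"a": "A", "b": "B", "c": "C"}
# Seven cut nets in total: five between A and C, one A-B, one B-C.
nets = [["a", "c"]] * 5 + [["a", "b"], ["b", "c"]]

abc = vertical_interconnects(nets, partition_of, ["A", "B", "C"])
acb = vertical_interconnects(nets, partition_of, ["A", "C", "B"])
```

Both orders have the same cut size of seven nets, yet the order A, B, C needs 12 vertical interconnects while A, C, B needs only 8, which is exactly the information a plain min-cut objective cannot see.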
In general, partitioning tools need to become more technology-aware in order to properly account for different stacking scenarios and their implications. For example, monolithic stacks are vertically interconnected by very small vias which can (and should) be employed in large numbers, to enable high-performance and low-power transistor-level integration [19]. Nevertheless, related partitioning must account for routing congestion [20]. For TSV-based 3D ICs, it depends on the particular dies' stacking interface whether interconnects should be limited: a face-to-back (or back-to-back) interface requires TSVs which may need to be restricted, while a face-to-face interface allows for a large number of interconnects, comparable to monolithic stacks. Practical approaches for 3D partitioning must also address other design concerns such as power-density constraints [57] or performance-aware logic splitting [6].
Floorplanning determines the module arrangement within (nowadays mainly pre-fixed) die outlines. For 3D chip stacks, the module arrangement is interdependent across the whole stack, rendering 3D floorplanning much more complex than 2D floorplanning. Existing floorplanning methodologies already address key challenges such as thermal management [27, 34-36], co-arrangement of modules and TSVs [59] or planning of system-level buses [27, 36, 60]. Still, 3D floorplanning is a highly technology-dependent and iterative process; fast, accurate and configurable design evaluation is currently targeted but still needs to be enhanced [61, 62]. So far, only a few tools offer high-level, multi-objective exploration, which is needed for microarchitecture-focused 3D design [55, 63]. Furthermore, splitting and folding modules across multiple dies, to potentially reduce power and wirelength even more, also has to be explored during floorplanning (and during placement, see below). Related tools/flows are available [54, 55, 64, 65], but they tend to neglect properties of stacking interfaces and/or apply stochastic optimization, whose results are largely irreproducible and unsteady.
Placement
As with floorplanning, placement for 3D chips becomes much more challenging. The main objectives are (i) wirelength minimization, (ii) thermal management, and (iii) vertical-interconnect-aware placement or co-placement of cells and vertical interconnects (applies for TSV-based 3D ICs). Placement may further account for specific properties of 3D stacks, e.g., thermo-mechanical stress induced by TSVs (which can even be exploited for timing [66]). There are three categories of placement approaches, discussed next.
Folding-based placement reuses 2D placement solutions/tools and derives a 3D placement by folding modules and applying local refinements [67] . The main benefit is that proven 2D placers can be applied; the disadvantage is the limited consideration of implications arising from die stacking, e.g., increased routing congestion. Recall that folding has been recently advocated for 3D floorplanning as well; the potential of orchestrated folding for floorplanning and placement, however, has not been addressed yet.
Analytic placement is based on numerical analysis of non-linear equations which encode objectives such as minimization of wirelength and cell overlap. In the context of 3D chips, analytic placement is either tailored as 2.5D or 3D placement: the 2.5D approach encodes cell-die assignments as fixed variables, but still accounts for any interdependencies of cells across different dies [37, 66, 68-70]; the 3D approach sets cell-die assignments as flexible variables to be optimized as part of the overall placement solution [37, 71, 72].
Analytic placement suggests itself for placement of 3D chips: cells are distributed optimally ("only" with respect to the tailored equations) throughout the 2.5D/3D domain. However, it has some drawbacks: its complexity calls for techniques such as clustering and coarsening [72], which may lower the final design quality; some formulations (e.g., Huber functions) offer only local smoothness, which requires further efforts for global smoothing across the third dimension [37]; and any issue/objective not readily accounted for within the placement equations (such as thermo-mechanical stress induced by TSVs) may be notably exacerbated and difficult to post-process. Thus, analytic placement is usually not applied stand-alone but rather as an integrated stage (for global placement), as described next.
Hierarchical placement typically applies three stages: global placement, legalization and detailed placement. For 3D chip stacks, most hierarchical placers invoke partitioning and/or floorplanning before actual placement [37, 66, 68-70, 73]. This way, related efforts for module arrangement are acknowledged and cells are preassigned to particular dies (but not necessarily fixed to them). This, in turn, helps to improve the convergence of 3D placement. Note that hierarchical placement is not restrictive in terms of applicable placement techniques. For example, Kim et al. [69] apply force-directed (2.5D/3D) global placement, Goplen et al. [73] promote 3D recursive-bisection placement, and Lu et al. [71] propose (globally smooth) electrostatics-based 3D placement.
For placement of TSV-based 3D ICs, different thermal- and TSV-aware 3D placers have been proposed [68, 70, 73]. In this context, the co-placement of cells and TSVs has proven beneficial for thermal management [68, 70] as well as for reducing wirelength [69] and critical delays [66]. In order to avoid potentially opposing and/or unsteady TSV arrangement across partitioning, floorplanning and placement, a closer integration of those stages with dedicated feedback loops to orchestrate TSV (co-)placement is required.
Routability Estimation and Routing
Routability estimation provides fast but limited local and global predictions of routing supply and demand, without invoking actual routing. This is particularly relevant when designing 3D stacks in order to fully exploit their potential for massive interconnectivity and to enable high-performance and low-power 3D chips. The key challenge for both routability estimation and routing of 3D stacks arises from the 3D arrangement of nets, i.e., nets span more than one die, which requires accounting for vertical interconnects. Related EDA techniques are reviewed next.
Analytic estimation is mainly based on simple geometric and/or global estimates and, thus, not effective for controlling physical-design stages, e.g., to avoid local routing congestion. Nevertheless, analytic estimates are helpful for fast design evaluation; the well-known half-perimeter wirelength (HPWL) is such an estimate. However, note that the HPWL should be determined step-wise in 3D chips to achieve reasonable accuracy; all partial nets (i.e., net segments encapsulated in separate dies/layers) are to be independently estimated and subsequently summed up [8]. Kim et al. [74] proposed an interconnect model which is TSV-aware and emulates optimal buffer insertion. Thus, this model helps to efficiently evaluate wirelength, delay and power consumption of 3D chips.
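The step-wise HPWL estimate can be sketched as follows. The sketch assumes that TSV positions of a net are already known and that each TSV contributes a landing point to both dies it connects; the data layout and function names are hypothetical and not taken from [8].

```python
from collections import defaultdict


def hpwl_2d(points):
    """Half-perimeter of the bounding box of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def hpwl_3d(pins, tsvs):
    """Sum of per-die HPWLs of the partial nets of one 3D net.

    pins: list of (x, y, die)
    tsvs: list of (x, y, lower_die); a TSV links lower_die and lower_die + 1
    """
    per_die = defaultdict(list)
    for x, y, die in pins:
        per_die[die].append((x, y))
    for x, y, lower in tsvs:
        # The TSV acts as a pin of the partial net on both adjacent dies.
        per_die[lower].append((x, y))
        per_die[lower + 1].append((x, y))
    return sum(hpwl_2d(points) for points in per_die.values())


pins = [(0, 0, 0), (10, 0, 0), (10, 10, 1)]
tsvs = [(5, 5, 0)]
stepwise = hpwl_3d(pins, tsvs)
projected = hpwl_2d([(x, y) for x, y, _ in pins])
```

In this example the projected bounding box of all pins yields 20, whereas the step-wise estimate yields 25 because the detour via the TSV is accounted for on each die.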
Heuristic estimation applies computationally-inexpensive probabilistic models to capture local routing demand or congestion. Models for 3D chips have to account for vertical interconnects, along with their capacities, distribution and potential blockages. Fischbach et al. [75] extended the well-known concept of routing graphs which may serve as generic data structure or "backbone" for any model of choice. Depending on the desired accuracy-runtime trade-off, their extended graph encodes each die's multiple metal layers as one merged graph layer or as separate layers. In any case, the stacking and interconnects technology defines the graph's vertical capacities. Panth et al. [20] use such a 3D routing graph along with simplified construction of 3D rectilinear Steiner trees for efficient routability estimation in monolithic 3D chips.
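A 3D routing graph of this kind can be sketched as a plain edge-capacity dictionary. The sketch below is a simplification under stated assumptions, not the data structure of [75]: it uses a uniform grid, ignores preferred routing directions and blockages, and all names and capacity values are hypothetical.

```python
def build_routing_graph(nx, ny, dies, layers_per_die, merged,
                        lateral_cap, via_cap, inter_die_cap):
    """Return {(node_u, node_v): capacity} for nodes (x, y, graph_layer).

    merged=True folds each die's metal layers into one graph layer whose
    lateral capacity is the sum over the folded metal layers.
    """
    per_die = 1 if merged else layers_per_die
    cap = lateral_cap * layers_per_die if merged else lateral_cap
    graph_layers = dies * per_die
    edges = {}
    for z in range(graph_layers):
        for x in range(nx):
            for y in range(ny):
                if x + 1 < nx:
                    edges[((x, y, z), (x + 1, y, z))] = cap
                if y + 1 < ny:
                    edges[((x, y, z), (x, y + 1, z))] = cap
                if z + 1 < graph_layers:
                    # Vertical capacity is set by the technology: regular
                    # vias inside a die, TSVs or monolithic vias between dies.
                    crosses_die = (z + 1) % per_die == 0
                    edges[((x, y, z), (x, y, z + 1))] = (
                        inter_die_cap if crosses_die else via_cap
                    )
    return edges


coarse = build_routing_graph(2, 2, 2, 3, True, 10, 8, 2)
fine = build_routing_graph(2, 2, 2, 3, False, 10, 8, 2)
```

The merged variant has 12 edges and the layer-accurate variant 44 for the same 2x2 grid and two dies, which reflects the accuracy-runtime trade-off mentioned above; a low inter-die capacity would model TSVs, a high one monolithic vias.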
Global routing for 3D chips has already been implicitly addressed by conventional methodologies due to the routing layers' "3D arrangement" in 2D chips. Nevertheless, dedicated global routers have been proposed for TSV-based 3D ICs to account for routing obstacles (induced by TSVs) and to facilitate thermal management [26, 76-79]. Most of these routers construct Steiner or minimum spanning trees, use thermal-RC-network analysis, and (re)arrange signal TSVs, possibly along with dummy thermal TSVs and/or thermal wires. In general, accounting for signal, PG, clock and thermal TSVs along with their networks, all competing for placement and routing area, poses a major design challenge [80]. Thus, global routing (along with partitioning, floorplanning and placement) has to implement an orchestrated (re)arrangement of TSVs to enable design closure.
As for global routing of interposer stacks, early work by Minz et al. [81] focused on delay-, congestion- and crosstalk-aware redistribution of pins and nets to properly access and utilize the interposer's routing channels. In order to reduce TSV usage, Wang et al. [82] developed a 3D NoC with shared vertical links, and Foroutan et al. [83] optimized the assignment of such shared links in "vertically-partially-connected 3D NoCs". Loh et al. [16] outlined open challenges for large-scale and heterogeneous 3D NoCs, such as the synchronization of routers across sub-networks with potentially different topologies.
Detailed routing is not fundamentally different for 3D stacks; its task remains routing the net segments within metal layers (of individual dies). Assuming that multiple types of interconnects and obstacles may be modeled, available detailed routers can be reused.
Testing
As indicated when discussing testability of clock networks (see Sec. 3.2), stacking known-good dies is essential for 3D chips. Further testing approaches of 3D chip stacks are described next.
Fault models and at-speed testing have been proposed for both interposer stacks [84-86] and TSV-based 3D ICs [43, 87, 88]. These studies are typically focused on specific scenarios. For example, Taouil et al. [87] developed a methodology tailored for testing of memory-on-logic stacks and Deutsch et al. [88] applied thermo-mechanical-stress-aware generation of test patterns for TSV-based 3D ICs. Offering a more holistic approach, Wang et al. [84] enabled unified testing of wires, microbumps and TSVs for interposer stacks, with low overhead and compliance with the IEEE 1149.1 standard. Agrawal et al. [89] proposed an efficient heuristic for test-flow selection which can be applied for different configurations of 3D chip stacks; when stacking up to three dies, it is notably more efficient than an optimal approach while remaining competitive in solution quality.
Since Design-for-Test (DfT) architectures require access to all modules within 3D chip stacks, such architectures should be based on well-defined components and testing interfaces (see Sec. 5).
Needs for Advances in Design Automation
We discussed how classical challenges become much more complex when designing 3D chip stacks, and we reviewed promising and available solutions. There is nevertheless a need to further improve "classical" design automation, with particular focus on:
• Synchronized arrangement of all types of TSVs during floorplanning, placement, design of clock networks and routing;
• Perpetual consideration of variations, mainly induced by chip stacking in general and/or by thermo-mechanical stress around TSVs in particular;
• Technology- and stacking-aware design planning and evaluation, in particular for partitioning and floorplanning;
• Measures for 3D design-space exploration, e.g., by exploring options for module folding during partitioning and/or floorplanning, or by system-level arrangement of components and dies within the 3D stack; and
• System-level design of global interconnects. This requires optimizing buses and vertical interconnects (mainly relevant for TSVs) along with design objectives/constraints for, e.g., bandwidth, power consumption, delay and signal integrity.
Note that some of these needs (e.g., 3D design-space exploration) have recently been addressed to some degree by high-level design automation (see Sec. 4).
As outlined in Fig. 3 , EDA tools should be unified into a workflow in order to meet (3D-specific) classical as well as novel design challenges. This calls for, among others, extensive code modularization, the design/use of internal/external APIs and for file formats tailored for exchanging 3D-chip designs (see Sec. 5 for the latter).
NOVEL DESIGN CHALLENGES AND EMERGING SOLUTIONS
With the aforementioned classical design challenges of 3D chip stacks being addressed at least to some degree by today's tools, the focus for EDA R&D is shifting towards more high- and system-level challenges such as simulation of multi-physics effects. These challenges are especially pressing for heterogeneous stacks, comprising components such as MEMS or photonics along with "regular" digital modules. We outline promising approaches and trends suitable to address such novel challenges next.
Pathfinding and Design Exploration
Due to the complex and diversified nature of 3D chip stacks, the automation of their high-level design is extremely challenging. Novel tools are required to explore (at an early stage) the complex impact arising from any decision/step taken in the design flow on criteria such as functionality and power consumption. Pathfinding is addressing this need by stepwise passing down high-level descriptions within customized EDA suites. Feedback loops are essential for this purpose, to pass down specifications (e.g., physical constraints) as well as to annotate findings back (e.g., simulation results). Pathfinding flows should cover the following three stages:
1. System-Level Design: A high-level description (e.g., in SystemC) is generated and/or iteratively adapted. Technology and stacking configurations have to be already considered, e.g., stacking impacts the options for high-level partitioning.
2. Component Design: Components are abstract design modules such as RTL modules; they serve as "bridging" parts between system design and physical design. Components must be modularized in order to encapsulate the high-level design and to enable design reuse for different 3D stacks.
3. Physical-Design Prototyping: The components are then fed to (customized) physical-design stages. In contrast to classical physical design, the focus lies initially on fast design exploration, i.e., to obtain "coarse layouts". The components are also to be annotated (e.g., with their power consumption) to facilitate guidance of subsequent pathfinding iterations.
Pathfinding for 3D chip stacks has been initially proposed in 2009 by Milojevic et al. [90] . They applied automated synthesis of RTL modules and physical-design prototyping using feedback loops. More recently, Martin et al. [18] proposed to encapsulate TSV arrays using parameterized multi-port modules. They do so in order to feed 3D chip stacks containing such and other components to electromagnetic solvers for analysis of crosstalk across the whole stack. Yazdani and Park [91] automated and optimized the arrangement of buffers, Cu pillars and package bumps for Wide I/O memory integration [92] in interposer stacks. Priyadarshi et al. [93] proposed transaction-level-based and thermal-aware pathfinding, complementing previous RTL-based approaches.
Besides these mainly academic efforts, commercial tool chains have become available as well, for example Sphinx 3D Path Finder (used in [18]) or Mentor Graphics Xpedition Path Finder Suite [94].
Chip-Package Co-Design
The diverse nature of 3D chip stacking, along with its manifold technology configurations, calls for co-design of chip and package. Ideally, the whole 3D stack should be designed within one EDA suite. However, this is not necessarily practical due to several reasons such as:
1. The design of complex 3D chip stacks is typically the effort of a large team, with several engineers (potentially working in separate companies and/or different locations) being responsible for different parts of the whole stack.
2. Existing design know-how and EDA tools are mainly focused on individual parts or technologies, not whole 3D stacks.
3. The multitude of interfaces in 3D stacks requires, in terms of both physical and functional design, measures for design orchestration and/or a unified database; this is difficult to establish for heterogeneous design environments.
As illustrated in Fig. 3, it is more practical to introduce and agree on a common workflow, possibly with dedicated points for data handover and feedback iterations between different design parties.
An elaborate example of co-designing a mixed-signal 3D stack is given by Cederström [95]. Multiple EDA tools were leveraged, applied by different engineers, linked via custom scripts, and configured to exchange design data. A representation of the whole 3D stack, usable in such a heterogeneous design environment, is called for (see Sec. 5 for a related, novel solution). Cederström stresses that TSVs are physically part of the interposer but functionally part of the transceivers (in the ICs). This split required multiple iterations for design closure, along with notable manual effort. Cederström hence proposes that the top metal layers, pins and bumps of the ICs be managed by packaging engineers. This way, the iterations for designing the package and the transceiver-TSV links are eased and IC designers are involved more effectively.
Analysing TSVs more efficiently for their impact on whole 3D chip stacks has been successfully pursued for some time; the related modeling and simulation techniques (see "medium" design phases, Sec. 4.3) can also be extended for chip-package co-design [96] .
Pathfinding tools may be helpful for co-design as well; for example, Mentor Graphics Xpedition Path Finder [94] is tailored for chip-package design with features such as a "virtual die model" (to encapsulate details of IC design) or the generation of pin arrays.
Multi-Physics Simulation and Verification
Simulation is an integral part of design automation while verification is traditionally separated from the actual physical design. Because of strong coupling of the different physical domains (especially but not exclusively occurring in complex, heterogeneous stacks), simulation and verification for 3D chip stacks should account for multi-physics effects (Fig. 4). Such a framework has been proposed by Schneider et al. [97]. They advocate tailoring simulation techniques for each design level; they further promote the (re)use of parameterizable models.
Next, we review measures for simulation and verification of 3D chip stacks across different design levels.
At transistor level, i.e., the lowest level of abstraction, very detailed simulation models are required; they must capture all active and passive devices along with their physical properties. For the thermal and mechanical domains, models are implemented as fine-grain meshes which are fed to finite element/difference analysis, e.g., using tools from ANSYS or COMSOL. These models are also used to verify the reliability and performance of 3D interconnects in detail, e.g., for TSV arrays [98] [99] [100]. For the electrical domain, well-known techniques such as SPICE simulations, LVS or ERC verification are applied as well, but with additional consideration of 3D interconnects and multi-physics effects [101]. Some tools (e.g., Mentor Calibre) have also been tailored for this need.
For "medium" design phases, such as placement and routing, models are more abstract and capture the system behaviour. Here, the main challenge for simulation is to survey the complex, multi-physics coupling between individual components and to evaluate the overall system behaviour, all in relatively short runtime. Models are often derived from detailed simulations of small components, e.g., single IP blocks or a few TSVs. These "characterizing simulations for base components" are independent of the actual design process and can be conducted in advance by experienced engineers, who then encapsulate their findings, typically into parameterizable modules or analytical models. A prominent example is the simulation of TSV arrays, which have been successfully modeled using manifold approaches: as multi-port components along with S-parameters for simulation of signal coupling [17, 18, 102]; as MIMO channels to regulate equalization of such coupling [103]; via superposition of the TSVs' stress components to capture their impact on power and/or timing [48, 96]; and via superposition of stress components to determine thermo-mechanical stress itself [9, 96, 100]. The automated generation of such models, also for components other than TSV arrays, remains an open challenge [104].
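The superposition approach mentioned above can be illustrated with a short Python sketch. It rests on assumptions that are not taken from the cited works: each TSV is assumed to contribute a Lamé-type 2D stress field outside its body, with radial stress K(R/r)^2 and tangential stress -K(R/r)^2, and the contributions of all TSVs are assumed to add linearly. The function names, the signed magnitude `k_mpa` and the geometry values are hypothetical.

```python
def tsv_stress(x, y, cx, cy, radius_um, k_mpa):
    """Cartesian stress (sxx, syy, sxy) in MPa at point (x, y) caused by a
    single TSV centred at (cx, cy).

    Assumed model: sigma_rr = k * (R/r)^2 and sigma_tt = -k * (R/r)^2,
    valid only outside the TSV body.
    """
    dx, dy = x - cx, y - cy
    r2 = dx * dx + dy * dy
    if r2 < radius_um ** 2:
        raise ValueError("evaluation point lies inside the TSV")
    scale = k_mpa * radius_um ** 2 / r2
    # cos(2*theta) and sin(2*theta) expressed without trigonometric calls
    cos2t = (dx * dx - dy * dy) / r2
    sin2t = 2.0 * dx * dy / r2
    # Polar-to-Cartesian rotation of the assumed stress tensor
    return scale * cos2t, -scale * cos2t, scale * sin2t


def superposed_stress(x, y, tsv_centres, radius_um, k_mpa):
    """Sum the per-TSV stress tensors component-wise (linear superposition)."""
    sxx = syy = sxy = 0.0
    for cx, cy in tsv_centres:
        a, b, c = tsv_stress(x, y, cx, cy, radius_um, k_mpa)
        sxx += a
        syy += b
        sxy += c
    return sxx, syy, sxy


if __name__ == "__main__":
    # Two TSVs placed symmetrically around the evaluation point at the origin
    print(superposed_stress(0.0, 0.0, [(-10.0, 0.0), (10.0, 0.0)],
                            radius_um=2.5, k_mpa=100.0))
```

Because the per-TSV field is characterized once and then reused, evaluating a whole array costs only one summation per point; this is what makes such models attractive for placement-stage analysis, at the price of neglecting nonlinear interaction between closely spaced TSVs.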
High-level design renders simulation challenging due to its abstract nature; 3D chips are typically described as a chip-package stack in SystemC, VHDL or Verilog, or even in custom languages. As with pathfinding (Sec. 4.1), simulations are initially approximative (and rely on design experience); more accurate simulations become feasible once subsequent design/simulation stages have fed their findings back. In general, models have to be scalable to support 3D stacks with diverse, multi-scale geometries [102]. Regarding needs for EDA tools, unified data handling of high-level descriptions is sought after, to ease streaming the 3D-chip configuration to and from different simulation and design tools [105]. A flexible configuration or even the generation of PDKs is required to streamline verification at this chip-package system level [105]; Cibrario et al. [106] released such a PDK generator for the (currently widely adopted) 3D stacking technologies from CEA-LETI.
PROMOTING 3D CHIPS VIA DESIGN STANDARDS AND FILE FORMATS
Design standards and file formats for seamless data exchange are important measures to achieve the advances required in design automation for 3D chip stacks more efficiently. As for standardization, recent progress covers memory integration and testing:
• The JESD229 standard [92], commonly known as Wide I/O, considers one to four memory dies stacked on top of a controller die, with four to eight 128-bit-wide memory channels. The standard further covers details on AC and DC characteristics, packages, and bump assignments. First Wide I/O stacks and design flows are available [12, 91, 96, 107].
• JESD235A [108], better known as High Bandwidth Memory (HBM), is an alternative standard for memory integration and has been adopted by industry, e.g., by Hynix [109] and Nvidia [110].
• Yet another standard, the Hybrid Memory Cube (HMC) [3] , has recently gained more attention as well. It is backed by Altera, Micron, Open-Silicon, Samsung and Xilinx.
• As for testing, standards are currently under development; IEEE P1838 evolves as the most promising option [111] . The therein proposed wrapper circuitry enables controllability and observability at the die boundaries. Besides the novel wrapper, existing test facilities are considered for reuse whenever possible: IEEE 1149.x for test access, IEEE 1500 for die test, and IEEE P1687 for internal debugging.
These standardization efforts are crucial for adoption of 3D chip stacks since they facilitate first commercial products (i.e., memory stacks) and are streamlining DfT, which is a pressing concern from an industrial perspective. However, further aspects such as thermal management and power delivery (along with related analysis/verification) also call for standardization efforts [102]. As indicated before, another essential need is the seamless data exchange between different parties. Heinig and Fischbach [112] proposed an assembly design kit (ADK) for this purpose, which is analogous to the well-known PDK but tailored for 3D chip stacks. An ADK serves the exchange of both design data and manufacturing requirements. Using input from all parties, it encodes details of manufacturing steps and procedures, material properties, geometrical descriptions of the stack along with its components and interconnects, as well as assembly rules such as minimal pitches. In addition, an ADK contains customized data converters to "translate" any 3D-stack design into appropriate settings/commands for the manufacturing equipment. Ferguson and Ramadan (Mentor Graphics) also advocate the concept of ADKs [113]. They highlight its potential for simplified verification and sign-off procedures, especially for interfaces between dies and/or the package.
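To make the idea of an ADK more tangible, the following Python sketch shows how material properties and one assembly rule (a minimal bump pitch) might be held in a single exchangeable record and checked against a bump layout. It is purely illustrative and does not reflect the actual ADK format of [112] or [113]; the class names, field names and numeric values are all hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Bump:
    """One interconnect bump in the stack geometry (hypothetical record)."""
    name: str
    x_um: float
    y_um: float


@dataclass
class AssemblyDesignKit:
    """Toy ADK holding one assembly rule plus material properties."""
    min_bump_pitch_um: float                       # assembly rule
    materials: dict = field(default_factory=dict)  # material name -> properties

    def pitch_violations(self, bumps):
        """Return name pairs of bumps whose centre-to-centre distance is
        below the minimal pitch defined by this kit."""
        violations = []
        for i, a in enumerate(bumps):
            for b in bumps[i + 1:]:
                dist = math.hypot(a.x_um - b.x_um, a.y_um - b.y_um)
                if dist < self.min_bump_pitch_um:
                    violations.append((a.name, b.name))
        return violations


if __name__ == "__main__":
    adk = AssemblyDesignKit(
        min_bump_pitch_um=40.0,
        materials={"underfill": {"cte_ppm_per_k": 30.0}},  # placeholder value
    )
    layout = [Bump("a", 0.0, 0.0), Bump("b", 30.0, 0.0), Bump("c", 100.0, 0.0)]
    print(adk.pitch_violations(layout))
```

Keeping rules and geometry in one shared record means that every design party can run the same check on its own part of the stack before handing data over.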
SUMMARY
In this paper, we discuss the most relevant aspects of automating the physical-design process for 3D chip stacks. We highlight how this process becomes increasingly difficult and demanding as compared to well-engineered design automation for 2D chips, and we review the state of the art for such 3D-chip design automation.
Classical challenges such as thermal management, clock delivery and placement have also become notably more complex, and have seen much academic and industrial R&D in the past. Novel, 3D-specific challenges, such as pathfinding or multi-physics simulation, have undergone similar progress recently. However, further problems remain, such as the system-level design of global interconnects and stacking-aware partitioning. Other challenges to be addressed are more technology-dependent, e.g., the synchronized arrangement of all different types of TSVs. Finally, the automation and data management for high-level design is of particular importance; such measures are essential for initial design exploration, collaborative and distributed design teamwork, and final design closure.
We conclude that design automation for 3D chip stacks is challenging and not yet fully solved, though more and more sophisticated EDA solutions are being proposed. Recent efforts such as ADKs and pathfinding tools, among others, help to streamline the design process and, thus, render 3D chip stacks more applicable. 
ACKNOWLEDGMENTS
