Abstract-
I. INTRODUCTION
Transistor-level 3-D integration is considered the most promising direction for replacing 2-D CMOS because of its density and performance benefits. Our proposal for transistor-level 3-D integration, the stacked horizontal nanowire based 3-D IC fabric (SN3D), is built on stacked horizontal nanowires and uses architected features for circuit functionality, interconnection, and thermal management. We previously reported large gains with SN3D designs [1] [2] [3]. In this paper, we focus on cost and present a detailed comparison with 2-D CMOS, TSV-based 3-D CMOS, and monolithic 3-D integration.

Fig. 1 shows the core components of the fabric and an overview of a circuit mapped onto the SN3D fabric. Stacked, suspended horizontal nanowires (Fig. 1A), which are prefabricated and pre-doped, are the building blocks. Architected fabric components: Gate-All-Around junctionless nanowire FETs (Fig. 1E) are used for input/output signals and to carry excess routing to the top metal layers, which maximizes routability in the SN3D fabric.

Fig. 2 presents a conceptual view of the four integration approaches (2D, 3D, M3D and SN3D) and illustrates how their interconnection hierarchies differ. In 2-D CMOS, the active devices sit on the substrate at the bottom and all routing is carried out in the top metal layers (Fig. 2A). The active device layers and metal routing are similar in the conventional 3-D integration approaches (3-D and M3D), whereas 3-D packaging and 3-D routing (Fig. 2B and 2C) are achieved through stacked prefabricated dies and perforated, thick inter-die vias (TSVs for 3-D and MIVs for M3D), respectively. Here, die stacking is limited (typically to two layers), and relatively few vertical interconnects replace the global interconnects. SN3D, in contrast, is an integrated device-interconnect fabric (Fig. 2D and 1F), in which the active devices are implemented in a stacked array of NWs and the 3-D interconnects are formed through architected fabric interconnect features.
These fabric interconnect features replace the metal interconnects across the range from local to global interconnect lengths (Fig. 2D). Thus, SN3D offers dense, fine-grained packing of both devices and interconnects, yielding large reductions in footprint and interconnect length. This implies that circuits implemented in the SN3D fabric offer increased functionality and improved performance at reduced power consumption [1] [2] [3]. However, the benefits of any IC fabrication approach ultimately have to translate into cost benefits, or at least into an acceptable cost trade-off, for the approach to be adopted in mainstream use or research. This paper presents the benefits of SN3D implementation in terms of cost. The following section presents the cost model used to analyze the SN3D fabric.
II. COST MODEL
The cost model used for 2-D CMOS, TSV-based 3-D IC, monolithic 3-D and SN3D is illustrated in Fig. 3. Because SN3D follows a new fabrication approach for which no prior cost data exist, the early cost estimation model is built on prior estimates of the major cost-dictating aspects: interconnect, die area, complexity of process steps, and metal layer requirement. First, the die area and the interconnect requirement are estimated from the SN3D design rules, guidelines and circuit design principles developed (Fig. 3); the 3-D interconnects replacing the metal interconnects are also expressed quantitatively. Next, the estimated interconnects and area are input to the metal layer estimation algorithm, which calculates the number of metal layers required for the design considered. Finally, the estimated die area and metal layer count serve as the independent variables in the cost function, in which the complexity and total number of the different fabrication process steps required are parameterized as proportionality parameters. Note that the cost model is applicable to all four fabrication approaches considered, because the underlying fabrication processes are the same for all of them; the model therefore enables a fair relative comparison of the SN3D cost estimates with the rest. To that end, the next sections present the schemes for interconnect, die area and metal layer estimation.
A. Interconnect Estimation for SN3D
Interconnect projection for a given placement of logic gates is an important input for early estimation of wiring space, metal layer requirement, delay, power dissipation, and cost. This section therefore describes the hierarchical partitioning and placement strategy considered for the SN3D fabric and estimates the interconnect density for circuits implemented in SN3D. The interconnect estimates obtained here are used in later sections to estimate metal layers and cost (Fig. 3).

For a hierarchically partitioned system (as shown in Fig. 4) implemented in SN3D, the number of metal interconnect terminals ($T$) required by a module containing $N$ blocks is given by Rent's [6] empirical expression as

$$T = k N^{p} \qquad (1)$$

where $k$ is the average number of terminals (input and output) per block (Fig. 4), and $p$ is Rent's exponent, which specifies the complexity of the design considered. The total number of interconnects ($I_{total}$) required for a circuit design is proportional to the number of terminals available and is given by the expression [7]

$$I_{total} = \alpha k N \left(1 - N^{p-1}\right) \qquad (2)$$

where $\alpha$ is the fraction of the on-chip terminals that are sink terminals; it is related to the average fanout, $f.o.$, as $\alpha = f.o./(f.o.+1)$.

In the SN3D approach, a significant number of these interconnects are replaced by fabric interconnects, which reduces the interconnect routing on the top metal layers. This reduction is expressed quantitatively next: a modified Rent's expression and an interconnect density function are formulated for SN3D. The method is adopted from [10], which suits the SN3D fabric well. Consider a circuit design with $N$ gates, as represented in Fig. 4; the gates can be distributed equally over $n$ nanowire transistor layers (we consider $n = 10$ in this paper) in the SN3D fabric. This partitioning scheme is represented in Fig. 5A, where each layer incorporates $N/n$ gates. The number of terminals emanating from each layer ($T_i$) due to such a partition is given by Rent's expression as

$$T_i = k \left(\frac{N}{n}\right)^{p} \qquad (3)$$

where $i$ is the layer number and $n$ is the total number of layers. Summing $T_i$ over all the layers (Fig. 5A) gives the total number of terminals $T_{SN3D}$ available in SN3D:

$$T_{SN3D} = \sum_{i=1}^{n} T_i = n k \left(\frac{N}{n}\right)^{p} \qquad (4)$$

Next, Fig. 5B represents the partitioning and placement scheme in the SN3D fabric, in which the total emanating terminals ($T_{SN3D}$) are used in two distinct ways: the 3-D interconnect features specific to SN3D, such as CG, CC and HB (Fig. 1 and 2D), consume $T_{fab}$ terminals, and the metal layer interconnects consume the $T$ terminals of eq. (1). Hence, $T_{SN3D}$ comprises (Fig. 5B)

$$T_{SN3D} = T_{fab} + T \qquad (5)$$

Now, from eq. (1), eq. (4) and eq. (5),

$$T_{fab} = n k \left(\frac{N}{n}\right)^{p} - k N^{p} = k N^{p} \left(n^{1-p} - 1\right)$$

From this, the average number of terminals consumed by the fabric's 3-D interconnect features per nanowire layer is

$$T_{fab,i} = \frac{T_{fab}}{n} = k \left(\frac{N}{n}\right)^{p} \left(1 - n^{p-1}\right)$$

Similarly, the number of terminals contributing to top metal layer interconnects per nanowire layer ($T_{M,i}$) is the difference between the total number of terminals per layer ($T_i$) and the layer's fabric interconnect terminals ($T_{fab,i}$):

$$T_{M,i} = T_i - T_{fab,i} = k \left(\frac{N}{n}\right)^{p} n^{p-1}$$

Comparing $T_{fab,i}$ and $T_{M,i}$ with their Rent's equivalent for each layer (eq. (3)), the effective average numbers of terminals per block consumed by fabric interconnects and by metal interconnects are

$$k_{SN3D,fab} = k \left(1 - n^{p-1}\right) \quad \text{and} \quad k_{SN3D,M} = k\, n^{p-1} \qquad (6)$$

Now, the total number of fabric interconnects can be expressed quantitatively by replacing $k$ with $k_{SN3D,fab}$ in eq. (2):

$$I_{fab} = \alpha\, k_{SN3D,fab}\, N \left(1 - N^{p-1}\right) = \alpha k \left(1 - n^{p-1}\right) N \left(1 - N^{p-1}\right)$$
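As a numerical illustration of the terminal bookkeeping above, the following Python sketch assumes the usual form of Rent's rule, T = k N^p, applies it per layer to N/n gates, and splits the per-block terminal count into a fabric share k(1 - n^(p-1)) and a metal share k n^(p-1). The function names and the example values of k, p, N and fanout are hypothetical and are not taken from the paper; only n = 10 layers comes from the text.

```python
def rent_terminals(k, blocks, p):
    """Rent's rule: terminals of a module containing `blocks` blocks."""
    return k * blocks ** p


def total_interconnects(alpha, k, blocks, p):
    """Total interconnect count, alpha * k * N * (1 - N**(p - 1))."""
    return alpha * k * blocks * (1.0 - blocks ** (p - 1.0))


def sn3d_terminal_split(k, blocks, p, layers):
    """Split SN3D terminals into fabric and metal shares."""
    t_layer = rent_terminals(k, blocks / layers, p)  # terminals of one layer
    t_sn3d = layers * t_layer                        # summed over all layers
    t_metal = rent_terminals(k, blocks, p)           # left for metal routing
    t_fab = t_sn3d - t_metal                         # consumed by fabric features
    shrink = layers ** (p - 1.0)                     # n**(p-1), below 1 when p < 1
    return {
        "t_layer": t_layer,
        "t_fab_per_layer": t_fab / layers,
        "t_metal_per_layer": t_layer - t_fab / layers,
        "k_fab": k * (1.0 - shrink),
        "k_metal": k * shrink,
    }


if __name__ == "__main__":
    # Hypothetical design values (assumptions); LAYERS = 10 follows the text.
    K, P, N, LAYERS, FANOUT = 4.0, 0.6, 1_000_000, 10, 3.0
    alpha = FANOUT / (FANOUT + 1.0)
    split = sn3d_terminal_split(K, N, P, LAYERS)
    i_fab = total_interconnects(alpha, split["k_fab"], N, P)
    print(split)
    print("fabric interconnects:", i_fab)
```

The two effective terminal counts always sum to k, so any terminal not absorbed by the fabric features is still accounted for in the metal routing estimate.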
Similarly, the metal interconnects of different lengths can be calculated by replacing $k$ with $k_{SN3D,M}$ in an interconnect estimation model. The interconnect density function $i(\ell)$ of this model depends on the variables $N$, $p$, $\ell$ and $k$, which capture the characteristics of the logic design considered and of the silicon floor architecture: $N$ reflects the size of the logic design, i.e., the number of logic blocks in the circuit considered; $p$ is Rent's exponent, which defines the complexity of the circuit considered; $\ell$ is the Manhattan distance, which depends on the architecture of the silicon floor; and $k$ is the average number of terminals per block in the design. As discussed for eq. (6), the effect of the fabric interconnects on the metal interconnects is expressed through the reduction in $k_{SN3D,M}$. Thus, employing $k_{SN3D,M}$ for $k$, the modified expression of $i(\ell)$ for SN3D is

Region I: $1 \le \ell < \sqrt{N}$

$$i(\ell) = \frac{\alpha\, k_{SN3D,M}}{2}\, \Gamma \left(\frac{\ell^{3}}{3} - 2\sqrt{N}\,\ell^{2} + 2N\ell\right) \ell^{2p-4}$$

Region II: $\sqrt{N} \le \ell < 2\sqrt{N}$

$$i(\ell) = \frac{\alpha\, k_{SN3D,M}}{6}\, \Gamma \left(2\sqrt{N} - \ell\right)^{3} \ell^{2p-4}$$

where $\alpha$ is the fraction of the on-chip terminals that are sink terminals, related to the average fanout, $f.o.$, as $\alpha = f.o./(f.o.+1)$. The normalization factor $\Gamma$ is given as

$$\Gamma = \frac{2N\left(1 - N^{p-1}\right)}{-N^{p}\,\dfrac{1 + 2p - 2^{2p-1}}{p(2p-1)(p-1)(2p-3)} - \dfrac{1}{6p} + \dfrac{2\sqrt{N}}{2p-1} - \dfrac{N}{p-1}}$$

The impact of the fabric interconnects is depicted pictorially in Fig. 7: the quantity of 3-D interconnects is given by $I_{fab}$ (eq. (6)), represented in the figure by the interconnect lines among the blocks, and the reduced metal layer interconnect count is given by $I_{M,SN3D}$ (eq. (9)), represented as metal lines on the top. The reduction in total interconnects $I_{M,SN3D}$ over $I_{M,2D}$ is 54% (this can be enhanced further by exploring efficient routing options in the SN3D fabric). This reduction lowers the average interconnect length and the total wiring requirement of the chip, and consequently alleviates the metal layer requirement, wire delays, power loss and cost.
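The following sketch illustrates how the reduced terminal count propagates into the wire-length distribution. It assumes that the interconnect estimation model is the standard two-region stochastic wire-length distribution associated with Rent's rule, with lengths measured in gate pitches; that choice, the function names, and the values N = 10^4, k = 4, f.o. = 3 and p = 0.66 are assumptions rather than values from the paper. The exponent p = 0.66 is picked only because, with n = 10 layers, it gives n^(p-1) of about 0.46, close to the 54% reduction quoted above.

```python
import math


def gamma_norm(gates, p):
    """Normalization factor so the density integrates to the total count."""
    root = math.sqrt(gates)
    denom = (
        -gates ** p * (1 + 2 * p - 2 ** (2 * p - 1))
        / (p * (2 * p - 1) * (p - 1) * (2 * p - 3))
        - 1 / (6 * p)
        + 2 * root / (2 * p - 1)
        - gates / (p - 1)
    )
    return 2 * gates * (1 - gates ** (p - 1)) / denom


def density(length, gates, p, k, alpha):
    """Interconnect density i(l) for a length l in gate pitches."""
    root = math.sqrt(gates)
    scale = alpha * k * gamma_norm(gates, p) * length ** (2 * p - 4)
    if 1 <= length < root:  # Region I
        poly = length ** 3 / 3 - 2 * root * length ** 2 + 2 * gates * length
        return scale / 2 * poly
    if root <= length <= 2 * root:  # Region II
        return scale / 6 * (2 * root - length) ** 3
    return 0.0


def metal_terminals_per_block(k, layers, p):
    """Effective k left for metal routing in SN3D: k * n**(p - 1)."""
    return k * layers ** (p - 1)


if __name__ == "__main__":
    # All values below are illustrative assumptions except LAYERS = 10.
    GATES, K, P, LAYERS, FANOUT = 10_000, 4.0, 0.66, 10, 3.0
    alpha = FANOUT / (FANOUT + 1)
    k_metal = metal_terminals_per_block(K, LAYERS, P)
    for length in (1, 10, 100, 150):
        reduced = density(length, GATES, P, k_metal, alpha)
        baseline = density(length, GATES, P, K, alpha)
        # The density is linear in k, so the fractional reduction
        # is the same at every length.
        print(length, round(1 - reduced / baseline, 3))
```

Because the density is linear in k, substituting the metal share of k scales the whole distribution by n^(p-1), which is why a single percentage can summarize the reduction in metal interconnects under these assumptions.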
B. Die Area Estimation
Die area for gate-limited designs [11] is proportional to the total number of gates in the design. Accordingly, it can be formulated for 2D, 3D and M3D as

$$A_{D} = N A_{g}$$

where $N$ is the total number of gates in the design and $A_{g}$ is the average footprint area of each gate, which is an empirical constant. The average gate area in 2-D CMOS, $A_{g,2D}$, is taken from [11] and is $3125\lambda^{2}$. Correspondingly, supposing an efficient 2-layer 3-D integration, the average gate area for TSVs, $A_{g,3D}$, is estimated from [11], and $A_{g,M3D}$ for a transistor-level monolithic 3-D design is expressed from eq. (6) (monolithic 3-D is limited to a 2-tier design). Similarly, the die area for SN3D is given by

$$A_{D,SN3D} = N A_{g,SN3D}$$

The average gate area $A_{g,SN3D}$ for SN3D is calculated as $432\lambda^{2}$ based on our previous SN3D layout designs [1] [2] [3]. Fig. 8 shows a single-gate layout [2] in SN3D, which consumes the footprint of only one NW transistor (the other transistors are implemented in subsequent nanowires on the layers below) plus the footprint of the fabric vias and HBs. Note that the block-out area for fabric vias is included in the gate area of SN3D. Solving the die area expressions given above, Fig. 9 shows the die area estimates for the 2-D CMOS, 3-D, M3D and SN3D approaches for designs of 5, 10 and 20 million logic gates. As the figure shows, SN3D achieves a large reduction in area compared with the other approaches. These area estimates are used for metal layer estimation and cost projection in subsequent sections.
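The gate-limited area estimate is a single multiplication, and a short Python sketch makes the scale of the difference concrete. It uses only the two gate areas stated in the text (3125 λ² for 2-D CMOS and 432 λ² for SN3D) and reports areas in units of λ²; the 3-D and M3D gate areas are omitted because their values are not given here. The function and constant names are hypothetical.

```python
# Average gate footprint in units of lambda^2, as stated in the text.
GATE_AREA_LAMBDA2 = {"2D": 3125.0, "SN3D": 432.0}


def die_area(gates, gate_area):
    """Gate-limited die area: gate count times average gate footprint."""
    return gates * gate_area


if __name__ == "__main__":
    for gates in (5_000_000, 10_000_000, 20_000_000):
        a_2d = die_area(gates, GATE_AREA_LAMBDA2["2D"])
        a_sn3d = die_area(gates, GATE_AREA_LAMBDA2["SN3D"])
        print(f"{gates:>10d} gates: 2D {a_2d:.3e}, SN3D {a_sn3d:.3e}, "
              f"ratio {a_2d / a_sn3d:.2f}")
```

Since both areas scale linearly with the gate count, the 2-D to SN3D area ratio in this model is fixed by the two gate areas alone (3125/432, roughly 7.2) and does not depend on design size.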
C. Metal layer Estimation
The metal layer estimation algorithm [19] [11] is based on iterative bottom-up placement of the interconnect distribution onto the available routing area in successive metal layers. The total interconnect routing length available on metal layer $m$ is given by [19]

$$L_{avail,m} = \frac{e_{m}\left(A_{D,SN3D} - A_{vias,m}\right)}{w_{m}}$$

where $A_{D,SN3D}$ is the SN3D die area, $m$ indicates the metal layer count, $e_{m}$ is the layer's routing efficiency, $A_{vias,m}$ is the layer's total via block-out area, and $w_{m}$ is the layer's metal wire pitch. $A_{vias,m}$ can be calculated by determining the total number of vias passing through metal layer $m$. Each interconnect from a fanout terminal of a gate that is routed on the metal layers requires two vias: one to receive the signal from the active devices or lower metal layers, and the other to pass the signal to active devices or other metal layers. Considering metal layer $m$, each fanout interconnect routed in the metal layers above $m$ therefore requires 2 vias passing through the current layer, so the number of vias passing through a given metal layer is twice the number of fanout interconnects routed on the metal layers above it:

$$A_{vias,m} = 2\,\chi\, A_{v,m} \left(N\, f.o. - I(\ell_{m})\right)$$

where $A_{v,m}$ is the block-out area of a single via on layer $m$, $N$ is the total gate count, $f.o.$ is the average fanout per gate, and $I(\ell_{m})$ is the cumulative interconnect density, which gives the total number of interconnects routed up to the current layer $m$. Here, $N\, f.o.$ is the total number of fanout interconnects in the design; thus, $(N\, f.o. - I(\ell_{m}))$ gives the total number of fanout interconnects routed above the current metal layer $m$. $\chi$ is the factor that accounts for the fraction of interconnects shared within a common net [19]. Once all the interconnects in the interconnect distribution have been placed, the routing algorithm terminates, and the value of $m$ gives the estimated number of metal layers required for the design considered. Table 1 presents the results of metal layer estimation for the 2D, 3D, M3D and SN3D approaches for designs of 5, 10 and 20 million logic gates. SN3D shows a 2.5x reduction in metal layer requirement, which reduces the cost of metal layers, as discussed in the next section.

Projecting the constituent costs of a particular fabrication approach requires both prior cost data from the foundry and the available design-phase information. Because SN3D is a new fabrication approach with no previous cost data from a foundry, we have developed a cost model that captures the available design-phase information and presents a forecast proportional to the actual cost, thereby enabling a faithful comparison of the SN3D cost estimates with the 2D, 3D and M3D approaches. The design-phase information, namely die area ($A_{D}$), metal layers ($m$) and the fabrication process steps required, serves as the determinants. Accordingly, the general expression for die cost and metal cost can be written as

$$C_{die} + C_{metal} = \left(\beta_{die}\, A_{D}\right) + \left(\beta_{metal}\, m\, A_{D}\right)$$

where $A_{D}$ and $m$ (estimated in the previous sections) are the variables that depend on the physical design of the logic, and $\beta_{die}$ and $\beta_{metal}$ are proportionality constants parameterized according to the process steps involved in the fabrication method. $\beta_{die}$ is proportional to the process steps involved in fabricating the active devices on the die, and $\beta_{metal}$ is proportional to the process steps involved in fabricating the metal layers on top of the die. These parameters are formulated next for the different fabrication approaches.
The sequential processes involved in any IC fabrication can be classified into five major process steps: photolithography, diffusion, deposition, etching, and implantation. These process steps can be parameterized in terms of an arbitrary cost. Let $c$ be the arbitrary cost that a unit area of silicon chip incurs when subjected once to each of the five processes. This is conveyed in Fig. 11A, where a unit area of substrate is subjected to one photolithography, one diffusion, one implantation, one deposition and one etching process step. Thus, $c$ can be expressed in terms of the constituent cost constants of photolithography $c_{PH}$, diffusion $c_{DF}$, deposition $c_{DP}$, etching $c_{ET}$ and implantation $c_{IM}$:

$$c = c_{PH} + c_{DF} + c_{ET} + c_{DP} + c_{IM}$$

Fig. 11C, taken from [12], depicts the relative cost of these major process steps in the semiconductor industry. From these relative cost statistics (Fig. 11C), the constituent cost constants can be expressed as fractions of $c$:

$$c_{PH} = 0.32\,c; \quad c_{DF} = 0.22\,c; \quad c_{ET} = 0.18\,c; \quad c_{DP} = 0.16\,c; \quad c_{IM} = 0.12\,c$$

Next, the number of different process steps is determined from the process sequence [2] and, for a unit area of a metal layer, from [18]. Substituting these process step counts into the general cost expression, the final cost expressions formulated for 2D, 3D, M3D and SN3D are

$$C_{2D} = 6.26\,c\,A_{D,2D} + 2\,c\,m_{2D}\,A_{D,2D} + C_{cooling}$$

$$C_{3D} = 7.26\,c\,A_{D,3D} + 2\,c\,m_{3D}\,A_{D,3D} + C_{bonding} + C_{cooling}$$

$$C_{M3D} = 7.26\,c\,A_{D,M3D} + 2\,c\,m_{M3D}\,A_{D,M3D} + C_{bonding} + C_{cooling}$$

$$C_{SN3D} = 26.54\,c\,A_{D,SN3D} + 2\,c\,m_{SN3D}\,A_{D,SN3D} + C_{cooling}$$

where 6.26, 7.26, 7.26 and 26.54 are the die process parameters ($\beta_{die}$) calculated for 2D, 3D, M3D and SN3D, respectively, whereas the metal layer process parameter $\beta_{metal}$ is the same (i.e., 2) for all the approaches because the metal layer process steps are identical in all of them. $A_{D,2D}$, $A_{D,3D}$, $A_{D,M3D}$ and $A_{D,SN3D}$ are the die areas estimated in Section B for 2D, 3D, M3D and SN3D, respectively, and the metal layers required ($m$) in the different approaches are as estimated in Section C (Table 1). By evaluating these expressions, the SN3D cost is estimated and compared with the conventional fabrication approaches. Such a cost estimate enables a fair relative cost comparison among the different fabrication approaches.

Fig. 12 shows the cost estimates evaluated for the four integration approaches along with their component costs. The SN3D cost estimate shows large savings for the following reasons. First, although the number of process steps increases in SN3D because of the voluminous packing of devices (reflected in the process constant 26.54), the large reduction in die area reduces the die cost; moreover, as noted in Table 2 and Fig. 11C, SN3D shifts the processing to relatively cheaper deposition and etching steps, compared with the extreme lithography steps needed in the sub-nanometric regime in conventional approaches. Second, the reduction in metal interconnects and metal layer requirement lowers the metal layer cost for SN3D. Third, SN3D incurs no bonding cost (Fig. 12), whereas 3D and M3D incur significant bonding cost. The bonding costs presented are estimated from the relative cost data in [11]. Fourth, the cooling cost is proportional to the temperature of the chip; 3D and M3D require relatively high cooling cost because of the extreme temperatures in stacked dies and the scarcity of heat escape paths, whereas SN3D, owing to its fine-grained thermal management scheme, heats up less [20] than the other three and therefore incurs a low cooling cost.
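The cost expressions of this section can be evaluated with a few lines of Python. The sketch below takes the die process parameters (6.26, 7.26, 7.26, 26.54), the metal process parameter (2) and the relative process cost fractions from the text; it assumes that the metal term scales with the number of metal layers times the die area, and it treats bonding and cooling as externally supplied values. The function name, argument names and the example inputs are hypothetical.

```python
# Relative cost of each process step as a fraction of the unit cost c.
PROCESS_COST_FRACTIONS = {
    "photolithography": 0.32,
    "diffusion": 0.22,
    "etching": 0.18,
    "deposition": 0.16,
    "implantation": 0.12,
}

# Die process parameters per approach and the shared metal parameter.
DIE_PROCESS_PARAM = {"2D": 6.26, "3D": 7.26, "M3D": 7.26, "SN3D": 26.54}
METAL_PROCESS_PARAM = 2.0


def fabrication_cost(approach, die_area, metal_layers, c=1.0,
                     bonding=0.0, cooling=0.0):
    """Die cost + metal cost + bonding + cooling, in units of c * area."""
    die_cost = DIE_PROCESS_PARAM[approach] * c * die_area
    # Assumption: metal cost grows with layer count times die area.
    metal_cost = METAL_PROCESS_PARAM * c * metal_layers * die_area
    return die_cost + metal_cost + bonding + cooling


if __name__ == "__main__":
    # Illustrative inputs only: areas per gate in lambda^2 from the text,
    # metal layer counts chosen arbitrarily for the example.
    print("2D  :", fabrication_cost("2D", 3125.0, metal_layers=10))
    print("SN3D:", fabrication_cost("SN3D", 432.0, metal_layers=4))
```

On a per-gate basis the die term alone is 26.54 × 432 ≈ 11,465 for SN3D against 6.26 × 3125 ≈ 19,563 for 2-D CMOS (in units of cλ²), which shows how the smaller footprint can outweigh the larger process constant.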
As shown in Fig. 12, SN3D costs xx% and xx% less than 3D and M3D, respectively, which makes it a promising 3-D integration direction for research.
