Performance Analysis of 3-D Monolithic Integrated Circuits by Bobba, Shashikanth et al.
 
 
Shashikanth Bobba (1), Ashutosh Chakraborty (2), Olivier Thomas (3), Perrine Batude (3), Vasilis F. Pavlidis (1), and Giovanni De Micheli (1)
(1) Integrated Systems Laboratory (LSI), EPFL, Switzerland
(2) Department of Electrical & Computer Engineering, University of Texas at Austin, USA
(3) CEA-LETI/MINATEC, 17 rue de Martyrs, 38000 Grenoble, France
 
 
Abstract 
3-D monolithic integration (3DMI), also termed sequential
integration, is a promising technology for future gigascale circuits.
Because the device layers are processed sequentially, the vertical
contacts are similar in size to conventional contacts, unlike parallel
3-D integration with through-silicon vias (TSVs). Such small contacts
allow active layers to be stacked so that 3-D circuits can be
integrated at fine granularity. This paper extends the idea of
constructing standard cells across two active layers, forming 3-D
cells, to reduce the overall area and interconnect wirelength of a
circuit. To demonstrate the effect of 3DMI technology on these
parameters, two communication blocks are evaluated: a low-density
parity-check (LDPC) decoder as an example of an interconnect-dominated
circuit, and a data encryption standard (DES) block as an example of a
gate-dominated circuit. Employing 3-D cells in the conventional design
flow reduces wirelength by more than 10% for both circuits (in
wirelength-driven placement mode). Under timing-driven placement,
however, only a slight reduction in delay (1.6%) is observed for the
LDPC decoder, whereas a considerable delay reduction (14.22%) is
achieved for the DES block.
Keywords  
3D, Monolithic Integration, Cell Design. 
 
1. Introduction 
      3-D integration provides an effective platform for realizing
future gigascale circuits by integrating multiple layers of active
devices on a single 3-D chip [1, 2]. 3-D fabrication technologies
can be broadly classified into two groups according to the
integration scheme: a) 3-D parallel integration (or TSV-based
technology), in which each active layer, along with its
interconnect metal layers, is fabricated separately and
subsequently stacked using TSVs [3, 4], and b) 3-D monolithic
integration, in which the stacked transistors are grown
sequentially on the same wafer. TSV manufacturing technologies
are expensive in terms of cost, yield, and area. For example, the
TSV pitch is typically 5 µm to 10 µm [11], compared to the 100 nm
contact dimensions offered by 3DMI technology [9].
Contact Email: shashikanth.bobba@epfl.ch

    The performance of ICs in advanced technology nodes is
dominated by the interconnect delay [18]. In 3-D integration, the
benefits in terms of wirelength, latency, and power depend on
the granularity level at which the circuit is partitioned [8]. In 
the case of TSV technology, due to low precision of the 
alignment capability of the equipment and the relatively large 
size of TSVs, circuit integration at transistor/gate level is not 
feasible. Consequently, 3DMI is a promising choice for ultra-
high density 3-D circuits. The cross-section view of a 3D 
monolithic die with two active layers is illustrated in Figure 1. 
In 3-D monolithic integration, the top transistor layers are
processed sequentially on the lower transistor layers. Because the
lithography levels of the top transistors are aligned after the
new top active layer has been bonded, the alignment precision is
limited only by the performance of the stepper (for example,
3σ = 10 nm for 45 nm node equipment [5]). To date, 3-D contact
dimensions of ~100 nm have been demonstrated [6]. In 3-D parallel
integration (or 3D-TSV integration), by contrast, two wafers are
stacked after each wafer has been processed individually, so the
achievable alignment precision is much coarser
(3σ = 1 µm [11]).
 
 
[Figure 1 labels: Si substrate; N-type devices; P-type devices; 3D contact; Metal 1 (copper); Metal 2; Metal 9; more active layers, as in TSV technology.]
 
 
Figure 1. Cross-section view of a 3D monolithic die with 2 active 
layers. 
 
 
Achieving a small 3-D contact pitch is an important asset of
monolithic integration. However, growing a high-quality top FET
at temperatures low enough to preserve the electrical
characteristics of the bottom FETs and metal interconnections is a
challenging task. To achieve reasonable performance for the
top transistors, a thermal budget with a maximum temperature of
600°C-650°C is needed. Recently, Batude et al. have
demonstrated top and bottom transistors with similar
characteristics [9]. Advancements in this technology have been
demonstrated for both memory [6] and logic applications [10].
This paper extends the idea of folding standard cells across
multiple active layers, thereby forming these cells in 3-D [9]. In
this study, we consider two active layers, wherein the pull-down
network (PDN) of complementary logic (comprising n-type
devices) is realized in the bottom active layer and the pull-up
network (PUN) is laid out in the top active layer. A traditional
ASIC design flow is used in our experimental setup, in which the
impact of 3DMI technology on two case studies, an LDPC decoder and
a DES block, is investigated.
The rest of the paper is organized as follows. In Section 2, we 
present the planar to 3-D transformation of a standard cell. In 
Section 3, we explain the experimental setup and showcase 
various benchmark circuits, with quantified benefits. Finally, we 
conclude in Section 4. 
 
2. Planar to 3-D Transformation of a 
Standard Cell 
    In this section, we present layout partitioning at standard
cell granularity. Standard cells implement a pre-defined logic
function (for example, AND gates, OR gates, and flip-flops) and
have a fixed height but varying widths. The structure of a typical
standard cell laid out in 2-D is shown in Figure 2a. The power and
ground rails are located at the top and bottom of the cell. The
active region, of height HACT, is where the transistors are
fabricated. The space between the two diffusion regions is called
the diffusion gap region, where we place the input pins. Since 3DMI
technology offers multiple active layers adjacent to each other,
the layout of the standard cell can be folded across multiple
layers [9].

[Figure 2 labels: p-diffusion; n-diffusion; polysilicon; I/O pins; p-diffusion (top active layer) over n-diffusion (bottom active layer); power rail; ground rail; planar to 3-D transformation; dimensions HPdiff, HIO, and HNdiff.]

Figure 2. A two-input standard cell: (a) typical cell in 2-D (planar)
configuration and (b) cell designed in 3-D by realizing the PUN on the
top active layer and the PDN in the bottom active layer.

For instance, as illustrated in Figure 2b, p-type devices (forming
the PUN) are realized on the top active layer and n-type devices
(forming the PDN) at the bottom active layer. Since the PUN
is typically larger than the PDN, the active region height for a 
3-D cell (HACT3D) is limited by the height of the P-diffusion 
(HPdiff). The active region height of a 3-D library is given by 
the following equation, when mapped directly from a 2-D 
library: 
 
        HACT3D  =  HACT2D - HNdiff  + HIO                      (1) 
 
    In the above example, the reduction in the height of the 3-D
cell is due to the removal of the N-diffusion region from the cell
footprint. Moreover, the space needed for input-output (I/O) pins
may increase slightly in the 3-D layout, because the design rules
must be obeyed given the close proximity of the wide power rails.
Table I compares the standard cell height of existing 2-D (planar)
standard cell libraries before and after the cell transformation.
We have benchmarked three cell libraries at the 45 nm and 65 nm
technology nodes. A Virtuoso snapshot of a D flip-flop at the 65 nm
technology node is depicted in Figure 3.
 
Table I. Normalized height of existing standard cell libraries before 
and after cell transformation 
 
 
Cell Height     45 nm Nangate    45 nm commercial    65 nm commercial
                Library          library             library
Planar (2-D)    100 %            100 %               100 %
3-D library     71.43 %          71.61 %             69.05 %
 
 
 
 
 
 
 
Figure 3. Virtuoso snap-shot of a D-flipflop built in 3-D at 65 nm 
technology node. 
 
 
      With local stacking, every cell is spread across two active
layers (forming a 3-D cell), which yields a ~29% reduction in the
standard cell height. One of the primary advantages of this
transformation is its ease of integration with the conventional
design flow, as the design effort consists only of developing the
3-D cell library. Hence, circuits can be realized with these new
3-D cells using the existing 2-D physical design tools without any
modifications.
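
The ~29% figure can be cross-checked against Table I. The short sketch below simply subtracts each normalized 3-D height from the planar reference and averages the result; the dictionary keys and variable names are chosen here for illustration only.

```python
# Normalized 3-D cell heights from Table I (planar height = 100 %).
heights_3d_percent = {
    "45 nm Nangate": 71.43,
    "45 nm commercial": 71.61,
    "65 nm commercial": 69.05,
}


def height_reduction(height_3d_percent: float) -> float:
    """Return the cell-height reduction in percent of the planar height."""
    return 100.0 - height_3d_percent


reductions = {lib: height_reduction(h) for lib, h in heights_3d_percent.items()}
mean_reduction = sum(reductions.values()) / len(reductions)

for lib, reduction in reductions.items():
    print(f"{lib}: {reduction:.2f} % shorter than the planar cell")
print(f"mean reduction: {mean_reduction:.1f} %")
```

The per-library reductions range from roughly 28% to 31%, which is consistent with the ~29% average quoted in the text.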
       Batude et al. [9] have demonstrated the area gain obtained by
designing the cells in 3-D. However, their study is limited to
cell design. We argue that place-and-route information plays a
vital role in determining the impact of 3DMI technology. In
this work, we employ the 3-D cell library in the conventional
RTL-to-GDSII design flow to benchmark various circuits. The
original contribution of our work is to study the impact of
interconnect and, in the process, to identify the circuits that
gain the most in performance from 3DMI technology.
 
3. Experimental Setup and Results 
 
    In this section, we focus on the implementation details and
the experimental setup. The standard cells in the Nangate
Open Cell Library [13] are mapped to their 3-D equivalents
(Section 2) by changing the physical attributes of the
cells. For instance, the height of the cells is reduced by 30%
without modifying any width. The size of the I/O pins is kept the
same as in the 2-D cell, while their location is altered. This
approximation is valid because traditional placement tools take
the LEF (Library Exchange Format) file as the input for placing
the cells. Since the driving strength of the gates is not altered,
the same delay characteristics are assumed as in the planar case;
the main difference arises from the intra-cell parasitics, which
have a relatively small impact on the overall delay of the cell.
Accurate library characterization is considered part of our
future work.
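
The height scaling described above can be sketched as a small text transformation on a LEF file. This is only an illustration and not the script used in this work: the function name `scale_cell_heights`, the default factor of 0.7, and the sample macro are assumptions. The sketch rewrites every `SIZE <width> BY <height> ;` statement and leaves pin and obstruction geometry untouched, so the relocation of I/O pins mentioned in the text would still have to be handled separately.

```python
import re

# LEF describes cell and site dimensions as "SIZE <width> BY <height> ;".
SIZE_RE = re.compile(r"(SIZE\s+)([0-9.]+)(\s+BY\s+)([0-9.]+)(\s*;)")


def scale_cell_heights(lef_text, height_factor=0.7):
    """Scale the height in every SIZE statement; widths are left unchanged."""

    def _scale(match):
        new_height = float(match.group(4)) * height_factor
        return (
            f"{match.group(1)}{match.group(2)}{match.group(3)}"
            f"{new_height:.4g}{match.group(5)}"
        )

    return SIZE_RE.sub(_scale, lef_text)


# Hypothetical LEF fragment for a single inverter cell (illustrative values).
sample_lef = """MACRO INV_X1
  CLASS CORE ;
  SIZE 0.38 BY 1.4 ;
END INV_X1
"""

scaled_lef = scale_cell_heights(sample_lef)
print(scaled_lef)
```

Because only the abstract view of the library changes, the placer sees shorter rows while the netlist and timing library stay the same, which matches the approximation stated above.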
 
3.1. Benchmark Circuits 
    To quantify the benefit of 3DMI technology, two benchmark
circuits are synthesized with the Nangate 45 nm Open Cell
Library [13] before and after the local-stacking
transformation. A low-density parity-check (LDPC) decoder and
a data encryption standard (DES) block from OpenCores [7] are
considered in our analysis. The specifications of these
blocks are listed in Table II. In LDPC decoders, interconnect
plays a dominant role, as half of the total wires traverse the chip
from one end to the other. Furthermore, the I/O pin count is high.
The DES block falls at the other end of the spectrum: its I/O pin
count is limited and it is not interconnect dominated. To study the
effect of interconnect (after place and route) on the performance
of the circuits, the following tools are employed.
      Synopsys Design Compiler (A-2007.12-SP4) [14] is used for
mapping the RTL of the benchmarks onto the target 3-D standard
cell library. Cadence SOC Encounter (v8.1) [15] is used as the
physical synthesis engine to generate the virtual seed placement
in timing-driven mode. Timing analysis is performed with
Synopsys PrimeTime (D-2009.12-SP2) using the capacitance
table of the NCSU design kit at the 45 nm technology node [12].
 
3.2. Results and Discussion 
    The benchmark circuits used to quantify the benefit of 3-D
cells in the context of 3DMI technology are reported in Table II.
The Dmin of a circuit indicates the minimum delay achievable if no
changes to the circuit netlist are allowed during placement. Note
that Dmin sets the starting seed value for timing optimization. We
performed optimization in two configurations: in the first mode,
wirelength-driven placement is performed, and in the second mode,
timing-driven optimization is applied together with in-place
optimization, which performs various optimization tasks such as
buffer insertion, gate sizing, and cell replication.
    Experimental results are summarized in Table II, which reports
the total wirelength, total power, and critical path delay of the
benchmarks after placement in each of the two modes described
above. All numbers are reported using Cadence Encounter (EDI)
v8.1 (2010 release). The power numbers include all components of
the power dissipation, namely leakage and switching power.
 
 
Table II. Wirelength, delay, and power results of LDPC decoder 
and DES block when realized with 2-D (planar) and 3-D standard 
cells. 
 
 
Circuit           Objective            Wirelength driven      Timing driven + in-place opt.
                                       2D         3D          2D         3D
LDPC              Wirelength           15.4E+5    13.8E+5     18.3E+5    16.0E+5
  #Nets  = 48K    Circuit delay (ns)   8.503      7.312       2.461      2.421
  #Cells = 44K    Power (mW)           1201       1105        1554       1461
  #Pins  = 4100
  Dmin   = 6.904
DES               Wirelength           5.84E+5    5.08E+5     6.71E+5    5.81E+5
  #Nets  = 59K    Circuit delay (ns)   3.316      3.518       1.132      0.971
  #Cells = 56K    Power (mW)           536.5      526.8       620.2      608.2
  #Pins  = 298
  Dmin   = 2.532
 
 
In wirelength-driven placement mode, the primary objective
is to reduce the overall wirelength of the design. Table II shows
that the LDPC decoder is an interconnect-dominated circuit: in the
planar case, its total wirelength is roughly 2.6 times that of the
DES block, although its cell and net counts are about 20% lower.
Both benchmark circuits show a consistent wirelength reduction:
13.01% for the DES block and 10.39% for the LDPC decoder. The
percentage improvements in performance and area are plotted in
Figure 4.
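
The wirelength figures quoted above follow directly from the wirelength-driven columns of Table II. The sketch below recomputes them; the wirelength-per-net value is an extra illustrative metric introduced here (not reported in the paper) to make the interconnect dominance of the LDPC decoder visible, and all names are chosen for this example.

```python
# Values copied from Table II (wirelength-driven placement columns).
table_ii = {
    "LDPC": {"nets": 48_000, "cells": 44_000, "wl_2d": 15.4e5, "wl_3d": 13.8e5},
    "DES": {"nets": 59_000, "cells": 56_000, "wl_2d": 5.84e5, "wl_3d": 5.08e5},
}


def percent_reduction(planar: float, stacked: float) -> float:
    """Relative improvement of the 3-D result over the planar one, in percent."""
    return 100.0 * (planar - stacked) / planar


for name, row in table_ii.items():
    reduction = percent_reduction(row["wl_2d"], row["wl_3d"])
    per_net = row["wl_2d"] / row["nets"]  # illustrative metric, not from the paper
    print(f"{name}: wirelength reduction {reduction:.2f} %, "
          f"planar wirelength per net {per_net:.1f}")

# Ratio of planar wirelengths between the two benchmarks.
wl_ratio = table_ii["LDPC"]["wl_2d"] / table_ii["DES"]["wl_2d"]
print(f"LDPC/DES planar wirelength ratio: {wl_ratio:.2f}")
```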
 
 
 
 
Figure 4. Performance improvement in area, wirelength, and delay 
of LDPC decoder and DES block realized with 3-D cells when 
compared to the respective planar implementations. 
Note that delay should not be compared between the 2-D and 3-D
implementations under wirelength-driven placement, because timing
is not an optimization objective in that mode. Hence, care should
be taken when analysing the data from Table II.
    In the timing-driven placement with in-place optimization
mode, the placer is free to apply any synthesis or timing
optimization transform to the netlist to improve timing. For
this set of experiments, a very stringent timing requirement is
set to maximize the timing improvement. In this manner, we can test
the best performance that each implementation can provide.
Compared to the 2-D case, the 3-D cell library reduces the
critical path delay by 14.22% for the DES block and by
1.63% for the LDPC decoder.
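
The delay improvements can be reproduced from the timing-driven columns of Table II. The sketch below applies the same relative-reduction formula to delay and, for completeness, to power; the power percentages are derived here from the table and are not quoted in the text. Variable and function names are chosen for this example.

```python
# Values copied from Table II (timing-driven + in-place optimization columns).
timing_driven = {
    "LDPC": {"delay_2d": 2.461, "delay_3d": 2.421,
             "power_2d": 1554.0, "power_3d": 1461.0},
    "DES": {"delay_2d": 1.132, "delay_3d": 0.971,
            "power_2d": 620.2, "power_3d": 608.2},
}


def percent_reduction(planar: float, stacked: float) -> float:
    """Relative improvement of the 3-D result over the planar one, in percent."""
    return 100.0 * (planar - stacked) / planar


delay_gain = {}
power_gain = {}
for name, row in timing_driven.items():
    delay_gain[name] = percent_reduction(row["delay_2d"], row["delay_3d"])
    power_gain[name] = percent_reduction(row["power_2d"], row["power_3d"])
    print(f"{name}: delay reduction {delay_gain[name]:.2f} %, "
          f"power reduction {power_gain[name]:.2f} %")
```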
    In summary, designs with 3-D standard cells are effective in
reducing the overall area. However, the reduction of the total die
area comes at the cost of a more expensive technology, as two
active layers are employed. It can be inferred from the
experiments that the proposed design approach is more effective
for circuits that are not interconnect dominated. This result
differs from similar studies of conventional TSV-based 3-D
technology [16, 17]. Employing 3-D cells for interconnect-dominated
circuits leads to more congestion, and since the same back-end
metal lines are used, only limited delay gains can be achieved. On
the other hand, the approach is very effective for gate-delay
dominated circuits, as the overall wirelength is reduced, thereby
reducing the delay.
 
4. Conclusions 
3DMI technology, offering 3-D contacts with sizes on the
order of ~100 nm, is an effective vehicle for future gigascale
circuits. In this work, we support the idea of realizing compact
standard cells designed in 3-D (employing two active layers) as
the building blocks for an ASIC design flow. This paper has
demonstrated the impact of 3DMI on two different benchmark
circuits: an LDPC decoder as a case study for interconnect-dominated
circuits and a DES block representing a gate-dominated
circuit. 3DMI technology with 3-D standard cells provides a
significant reduction in total wirelength (>10%) and overall
circuit footprint (~30%) for both circuits. When timing-driven
placement with in-place optimization is applied, the
proposed approach reduces the critical path delay of the DES
block by 14%. For the LDPC decoder, the improvement in delay is
considerably smaller (1.6%) owing to congestion issues.
The analysis presented in this paper explores the scope of
3DMI technology for ASIC design where the building blocks
(standard cells) are designed in 3-D. However, we envisage
higher performance and area gains when new design
methodologies are applied. For instance, in this 3-D cell
approach, the number of neighbors seen by the placement tool
remains the same as in the planar case. One way to exploit the
advantage of 3-D integration is to keep the cells planar and place
them in two different layers during the placement phase, thereby
doubling the number of neighboring cells. However, none of the
existing physical design tools can perform this task, and new CAD
tools need to be developed [19]. These issues will be the focus of
our future work.
 
Acknowledgments 
This work is primarily funded by the European grant: ERC-
2009-AdG-246810, and partly supported by the ST-IBM-LETI 
alliance program.  
References 
[1] K. Banerjee et al., "3-D ICs: a novel chip design for improving
deep-submicrometer interconnect performance and systems-on-chip
integration," Proceedings of the IEEE, vol. 89, no. 5, pp. 602-633,
May 2001.
[2] V. Pavlidis and E. Friedman, Three-Dimensional Integrated
Circuit Design. Morgan Kaufmann, 2009.
[3] S. J. Koester, A. M. Young, R. R. Yu, S. Purushothaman,
K.-N. Chen, D. C. La Tulipe, N. Rana, L. Shi, M. R. Wordeman, and
E. J. Sprogis, "Wafer-level 3D integration technology," IBM
Journal of Research and Development, vol. 52, no. 6, pp. 583-597,
Nov. 2008.
[4] N. Sillon, A. Astier, H. Boutry, L. Di Cioccio, D. Henry, and
P. Leduc, "Enabling technologies for 3D integration: From
packaging miniaturization to advanced stacked ICs," IEEE
International Electron Devices Meeting (IEDM), pp. 1-4,
15-17 Dec. 2008.
[5] www.itrs.net/Links/2009ITRS/2009Chapters_2009_Tables
[6] S.-M. Jung et al., "High Speed and Highly Cost effective 72M bit
density S3 SRAM Technology with Doubly Stacked Si Layers,
Peripheral only CoSix layers and Tungsten Shunt W/L Scheme for
Standalone and Embedded Memory," Proc. VLSI Tech., pp. 82-83,
2007.
[7] www.opencores.org
[8] G. H. Loh, Y. Xie, and B. Black, "Processor Design in
3D Die-Stacking Technologies," IEEE Micro, vol. 27, no. 3,
pp. 31-48, 2007.
[9] P. Batude et al., "GeOI and SOI 3D monolithic cell integrations
for high density applications," Symposium on VLSI Technology,
pp. 166-167, 16-18 June 2009.
[10] P. Batude et al., "Advances in 3D CMOS sequential
integration," IEEE International Electron Devices Meeting (IEDM),
pp. 1-4, 7-9 Dec. 2009.
[11] MIT 3D Design Kits, version 3DEM.
[12] NCSU FreePDK45 Design Kit.
[13] "Nangate 45nm Library." http://www.nangate.com/.
[14] "Synopsys Design Compiler."
[15] "SOC Encounter tool."
[16] K. Puttaswamy and G. H. Loh, "The impact of 3-dimensional
integration on the design of arithmetic units," IEEE
International Symposium on Circuits and Systems (ISCAS), 4 pp.,
2006.
[17] K. Puttaswamy and G. H. Loh, "Scalability of 3D-Integrated
Arithmetic Units in High-Performance Microprocessors," 44th
ACM/IEEE Design Automation Conference (DAC), pp. 622-625,
4-8 June 2007.
[18] R. H. Havemann and J. A. Hutchby, "High-performance
interconnects: an integration overview," Proceedings of the IEEE,
vol. 89, no. 5, pp. 586-601, May 2001.
[19] S. Das, A. Chandrakasan, and R. Reif, "Design tools for 3-D
integrated circuits," Asia and South Pacific Design Automation
Conference (ASP-DAC), pp. 53-56, 21-24 Jan. 2003.
 
