Exploration of 2D EDA tool impact on the 3D MPSoC architectures performance by Jabbar, Mohamad Hairol et al.
 Exploration of 2D EDA Tool Impact on the 3D MPSoC Architectures 
Performance 
 
Mohamad Hairol Jabbar1,2,3, Abir M’zah3, Omar Hammami3, Dominique Houzet2 
1Department of Computer Engineering, FKEE, UTHM, Johor, Malaysia 
2GIPSA-Lab, Saint Martin D’Heres, France 
3ENSTA Paristech, France 
E-mail: 1hairol@uthm.edu.my, 2abir.mzah@ensta-paristech.fr, 3omar.hammami@ensta-paristech.fr, 
4dominique.houzet@gipsa-lab.grenoble-inp.fr 
 
Abstract 
The need for higher performance devices to enable more 
complex applications continues to drive the growth of 
electronic design especially in the mobile markets. 3D 
integration is one of the feasible technologies to increase the 
system’s performance and device integration by stacking 
multiple dies interconnected using through silicon vias 
(TSV). NoC-based Multiprocessor System on Chip 
(MPSoC) architecture has become the primary technology to 
provide higher performance to support more complex 
applications. In this paper, we perform an exploration and 
analysis of 2D EDA tool parameters impact on the 3D 
MPSoC architectures (3D Mesh MPSoC and heterogeneous 
3D MPSoC stacking) performance in terms of timing and 
power characteristics. Exploration results show that the 2D 
EDA tool parameters have a stronger impact on timing 
performance than on power consumption. Furthermore, the 
heterogeneous 3D MPSoC architecture has a smaller footprint 
area, higher speed and lower power consumption than the 
3D Mesh MPSoC for the same number of processing elements, 
suggesting that it is the better design approach given the 
limited capability of 2D EDA tools for 3D design. 
Keywords 
3D IC, EDA tool, Exploration, MPSoC, NoC, Physical 
design 
1. Introduction 
ITRS [1] projected that the number of processing cores 
will increase in the near future. 3D integration has 
become an alternative technology for continuing to produce 
higher performance electronic devices by stacking multiple 
dies or wafers interconnected using through silicon vias 
(TSVs). For future manycore architectures based on Network 
on Chip (NoC), 3D IC technology is important because it 
provides advantages that are not available through 
traditional 2D design methods, such as higher memory 
bandwidth [2] and higher inter-core communication 
performance through vertical connections [3]. Design space 
exploration helps designers evaluate different architectural 
implementation possibilities before they are implemented in 
real hardware. It is particularly important for 3D 
architectures, where the goal is to choose the architectural 
candidate with the most performance gain. 
As there are no design implementation tools for 3D IC 
design to date, we want to examine how much the use of a 
2D EDA tool affects the performance of a 3D architecture. 
A deep understanding of how much performance is affected 
by different EDA tool parameters, as well as by different 
3D architecture implementations, is essential for finding 
the architectural candidate that fully benefits from 3D IC 
technology. This work extends our previous work in [4] 
with additional results and analysis comparing two 
different 3D MPSoC architectures. 
The contributions of this work can be summarized as 
follows: 
1. Analyze the impact of 2D EDA tool parameters on 
the timing and power characteristics of 3D 
Multiprocessor System on Chip (MPSoC) 
architectures. 
2. Perform physical implementation analysis of 
different 3D MPSoC architectures (which have 
different critical path locations), showing the 
advantages of the heterogeneous 3D MPSoC 
architecture over the 3D Mesh MPSoC, 
especially when a 2D Electronic Design 
Automation (EDA) tool is used to design and 
optimize its performance. 
This paper is organized as follows. Section 2 reviews 
some of the previous works on the heterogeneous 3D 
stacking to justify our work. Section 3 describes the 
Tezzaron 3D IC technology used in this work followed by 
exploration methodology in section 4. Section 5 presents the 
3D MPSoC architectures to be used for the exploration. 
Section 6 presents the experimental results for different 
performance metrics (timing slack and power consumption) 
and finally we conclude the work with directions for future 
works. 
2. Related works 
A limited number of works have been reported with 
regard to the design space exploration of 3D architecture. 
System level design space exploration for 3D architecture is 
proposed by [5], enabling exploration of different stacking 
and partitioning schemes and their effect on 
performance, power and temperature. Another design space 
exploration for 3D stacked architecture is presented in [6] 
[7], focusing on different 3D packaging solutions with logic 
and memory integration. Design space exploration of 3D 
architecture focusing on microprocessor and memory 
architecture is presented in [8]. Our previous work on 3D 
design space exploration is limited to a single tier using a 
simple architecture [9], whereas this work uses a more 
complex architecture implemented in a two-tier 3D technology. 
978-1-4799-1314-5/13/$31.00 ©2013 IEEE, 5th Asia Symposium on Quality Electronic Design
3D heterogeneous architectures have been studied by 
many researchers, but mostly through analysis based on 
software simulation results. The most common approach to 
heterogeneous 3D stacking is memory-on-logic stacking, 
primarily to achieve higher memory bandwidth from the 
large number of vertical interconnections. In [10], the 
authors designed and implemented a memory-on-logic 
architecture for a 64-core processor in which the data 
memory of each core is placed in another layer on top of 
its logic layer. The instruction memory is placed in the 
logic layer so that each core can have the maximum data 
memory size. To achieve maximum memory bandwidth, the 
processor core is designed specifically to consume memory 
bandwidth at every cycle from the 3D stacked memory by 
allocating one slot for the memory instruction. However, 
they do not use a NoC for communication, because of the 
stable, predictable and regular communication pattern of 
their data-parallel application. Instead, they use a 
buffer-based architecture that lets each processor 
communicate with its neighboring blocks. In [11], a 
heterogeneous memory-on-memory architecture is studied, 
stacking an SRAM cache with logic in a layer on the 3D 
DRAM layer, with the aim of optimizing both performance 
and energy efficiency. Folding the DRAM bank layers into 
4 layers that share the same TSV bus to the logic layers 
reduces the energy spent transferring entire row signals. 
Another work on heterogeneous stacking is [12], which 
stacks heterogeneous DRAM layers on processor layers. 
Its performance analysis uses software simulation based on 
modified CACTI and M5 simulators for full system 
simulation with a multicore processor. 
This study measures the impact of 2D EDA tool parameters 
on the performance of 3D MPSoC architectures. Several 
placement and routing options in the SoC Encounter place 
and route tool have been chosen, and their impact on the 
timing slack and power consumption of the 3D MPSoC 
architectures has been evaluated. Because no 3D design 
tools capable of 3D synthesis, 3D placement, 3D CTS and 
3D routing are available, designing with 2D EDA tools is 
the only solution for the time being. The aim of this study 
is to analyze how 2D EDA tools affect the overall 3D 
architecture performance, an issue that would not arise 
with a true 3D design tool. We have extended our previous 
work in [9] by integrating a complete 3D design exploration 
flow to get more accurate results and analysis. 
In contrast to the previously reported works, we build 
on the work in [10] to further investigate the performance 
of heterogeneous 3D stacking for NoC-based MPSoC 
architecture, with a slight modification that makes the 
implementation more realistic by considering the router and 
processor areas of fabricated designs. In particular, a part 
of the processor component is placed in the same layer as 
the NoC to fill the area left empty because the NoC is 
smaller than the processor, as detailed later in this paper. 
Using two-tier Tezzaron technology, we carried out the 
physical design implementation of the heterogeneous 3D 
stacking architecture and compared its performance with the 
2D architecture from an architectural point of view. This 
study provides additional architectural exploration beyond 
the homogeneous stacking of 3D NoC architectures that we 
have done previously, as well as a design implementation 
analysis of the GALS-style architecture in 3D technology. 
3. 3D technology 
The 3D integration technology used in this work is from 
Tezzaron [13] and uses TSVs for peripheral IOs. The 
two-tier 3D stacking is based on wafer-to-wafer, 
face-to-face bonding with a via-first approach, as 
explained in our previous paper [14]. 
4. Exploration configurations 
In this section, we explain the EDA tool parameters and 
the design flow used in the exploration. 
4.1. Parameters exploration 
We explore the placement and routing options of SoC 
Encounter listed in Table 1. We focus on the timing and 
power optimization options of the 2D EDA tool to study how 
this 2D optimization process affects the 3D MPSoC 
architecture performance in terms of timing slack and power 
consumption. The number of options is kept small because 
of limited time: each exploration run for each tier 
requires about 4-5 hours. 
4.2. Exploration flow 
Figure 1 shows the design flow used in this work to 
explore the placement and routing options of the place and 
route tool. Synopsys Design Compiler was used for logic 
synthesis, while Cadence SoC Encounter was used for place 
and route of both tiers, which were run in parallel during 
the exploration. 3D timing analysis and power analysis were 
performed on the routed netlists of both tiers using the 
Synopsys PrimeTime and PrimePower tools. The design space 
exploration is driven by a combination of Shell and TCL 
scripts in a Linux environment that automatically modify 
the EDA tool options at each exploration iteration. 
Completing all the exploration options took several days 
of run time. 
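
The iteration described above can be sketched as a small driver loop. This is only an illustration in Python (the flow in this work uses Shell and TCL scripts); the callables `run_tier` and `analyze_3d` and the file names are hypothetical stand-ins for the place and route step and for the 3D timing and power analysis, and no real tool is invoked.

```python
from concurrent.futures import ThreadPoolExecutor

TIERS = ("top", "bottom")


def explore(configs, run_tier, analyze_3d):
    """Run each configuration on both tiers in parallel, then analyze jointly.

    configs    : iterable of option tuples (one per exploration point)
    run_tier   : hypothetical callable (tier, config) -> routed netlist path
    analyze_3d : hypothetical callable ({tier: netlist}) -> dict of metrics
    """
    results = []
    for design_id, config in enumerate(configs, start=1):
        # Place and route of the two tiers are independent, so run them together.
        with ThreadPoolExecutor(max_workers=len(TIERS)) as pool:
            futures = {tier: pool.submit(run_tier, tier, config) for tier in TIERS}
            netlists = {tier: future.result() for tier, future in futures.items()}
        # Timing and power are evaluated on both routed netlists at once.
        metrics = analyze_3d(netlists)
        results.append({"design_id": design_id, "config": config, **metrics})
    return results


# Stand-ins so the sketch runs without any EDA tool installed.
def fake_run_tier(tier, config):
    return f"{tier}_routed.v"


def fake_analyze_3d(netlists):
    return {"netlists": sorted(netlists.values())}


demo = explore([(False, False, False, 5), (True, True, True, 10)],
               fake_run_tier, fake_analyze_3d)
for row in demo:
    print(row)
```

In a real flow the two stand-in functions would launch the tool scripts with the modified options and parse the reported slack and power.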
 
Figure 1: Exploration flow 
(Flowchart. Box labels recovered from the figure: RTL design 
and partitioning; logic synthesis with timing budgeting; 
then, for each of the two tiers, design import / bumps 
assignment, placement, optimization, CTS, optimization, 
routing, optimization; 3D timing analysis; 3D power 
analysis; decision "Complete exploration?", with NO leading 
to placement and routing options modification for each tier 
and YES leading to exploration analysis.) 
 
Table 1: Exploration parameters 

Design   Placement options       Routing options
ID       Timing     Power        Timing     Route Timing
         Driven     Driven       Driven     Driven Effort
1        False      False        False      5
2        False      False        False      10
3        False      False        True       5
4        False      False        True       10
5        False      True         False      5
6        False      True         False      10
7        False      True         True       5
8        False      True         True       10
9        True       False        False      5
10       True       False        False      10
11       True       False        True       5
12       True       False        True       10
13       True       True         False      5
14       True       True         False      10
15       True       True         True       5
16       True       True         True       10
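
The 16 design points of Table 1 are the full Cartesian product of three Boolean options and two effort levels. A minimal Python sketch that reproduces the table in the same ID order is given here; the field names are descriptive labels for the table columns chosen for this illustration, not the option names of the place and route tool.

```python
from itertools import product

# Descriptive labels for the Table 1 columns (assumed names, not tool options).
FIELDS = ("place_timing_driven", "place_power_driven",
          "route_timing_driven", "route_timing_effort")


def exploration_table():
    """Return {design_id: options} for all 16 combinations, IDs starting at 1."""
    # product() varies the rightmost option fastest, which matches Table 1.
    combos = product((False, True), (False, True), (False, True), (5, 10))
    return {design_id: dict(zip(FIELDS, values))
            for design_id, values in enumerate(combos, start=1)}


table = exploration_table()
for design_id, options in table.items():
    print(design_id, options)
```

Generating the list this way keeps the design IDs and the option values consistent when the exploration is extended with further options.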
 
 
 
5. 3D MPSoC architecture 
In this section, we present the 3D MPSoC architecture 
implementations whose performance is compared. Both MPSoC 
architectures are based on the NoC (a router and a network 
interface unit, NIU) and the Openfire processor, which have 
been described in detail in [15]. 
5.1. 3D Mesh MPSoC 
In this architecture, the 3D NoC is implemented on two 
tiers, each with identical blocks, as shown in Figure 2 and 
Figure 3. This is the straightforward extension of the 2D 
Mesh NoC architecture, where we take a copy of a tile (a 
router and an NIU) and put it on top of another tile. 
Compared with the 2D Mesh NoC, this architecture has about 
50% less footprint area. This 4x2x2 mesh NoC is based on a 
3D router architecture with vertical links for inter-tier 
connections between routers. The vertical links improve 
latency by reducing the network diameter from six to five 
hops. From the implementation perspective, this 
architecture has both 2D and 3D critical paths. The 3D 
critical paths are in the NoC, whose links run from the 
bottom layer to the top layer, while the 2D critical paths 
are in the processor, since each processor is placed 
entirely within a single layer. 
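
The hop counts quoted above can be checked with a short calculation. The sketch below assumes that the 2D baseline is a 4x4 mesh with the same 16 routers (the baseline dimensions are not stated in the text, but 4x4 is consistent with a diameter of six hops) and that routing follows minimal paths, so the distance between two routers is their Manhattan distance.

```python
from itertools import product


def mesh_diameter(dims):
    """Diameter of a mesh in hops: sum of (size - 1) over all dimensions."""
    return sum(size - 1 for size in dims)


def mesh_diameter_brute_force(dims):
    """Cross-check: largest Manhattan distance over all pairs of routers."""
    nodes = list(product(*(range(size) for size in dims)))
    return max(sum(abs(a - b) for a, b in zip(u, v))
               for u in nodes for v in nodes)


flat = (4, 4)        # assumed 2D baseline with 16 routers
stacked = (4, 2, 2)  # the 4x2x2 3D mesh of this section

print(mesh_diameter(flat), mesh_diameter(stacked))  # prints: 6 5
```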
 
 
Figure 2: Block diagram of 3D Mesh MPSoC 
 
Figure 3: 3D Mesh MPSoC routed layout 
5.2. Heterogeneous 3D MPSoC 
The partitioning method for the heterogeneous 3D MPSoC 
architecture is shown in Figure 4, and the layouts are shown 
in Figure 5 and Figure 6. It separates the NoC architecture 
from the processor architecture into different layers so 
that the two can be optimized independently. Since the NoC 
is smaller than the processor, based on the real 
implementations in [16] [17], we place the instruction 
memory on the top layer to balance the area of both tiers. 
Vertical connections are made from the NIU to the data 
memory and from the processor to the instruction memory. In 
contrast with the 3D Mesh MPSoC architecture, this 
architecture has only 2D critical paths, for both the 
processor and the NoC. It can therefore demonstrate the 
benefit of keeping critical paths in 2D when designing a 3D 
MPSoC architecture, taking advantage of the 2D optimization 
capability of the 2D EDA tool. Comparing both 3D MPSoC 
architectures in Table 2, the differences in core footprint 
area and total core area are not large. However, the 
heterogeneous 3D MPSoC architecture has about 5 times as 
many microbumps as the 3D Mesh MPSoC because of the 
vertical signals from the NIU and processor to the 
memories. We use clock period constraints of 3 ns for the 
NoC and 10 ns for the processor. 
 
 
Figure 4: Heterogeneous 3D MPSoC partitioning 
 
 
Figure 5: Top tier routed layout for heterogeneous 3D 
MPSoC (a) floorplan (b) routed layout 
 
(Layout figure annotations: dimensions 3910 um and 3190 um; 
panels (a) and (b); block labels IMEM and 2D ROUTER.) 
Figure 6: Bottom tier routed layout for heterogeneous 3D 
MPSoC (a) floorplan (b) routed layout 
 
Table 2: 3D MPSoC architectures summary for the 
exploration 

Parameters                   3D Mesh MPSoC   Heterogeneous 3D MPSoC
Core footprint area (mm^2)   10.58           10.40
Total core area (mm^2)       21.16           20.80
Total microbumps             595             3011
Microbumps per tile          74              188
NoC clock period             3 ns            3 ns
Processor clock period       10 ns           10 ns
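
The comparison drawn from Table 2 can be reproduced with a few lines of arithmetic. The sketch below only restates the table values; the dictionary keys are names chosen for this illustration, and the clock frequencies are simply the reciprocals of the clock period constraints.

```python
# Values copied from Table 2.
mesh = {"footprint_mm2": 10.58, "total_area_mm2": 21.16, "microbumps": 595}
hetero = {"footprint_mm2": 10.40, "total_area_mm2": 20.80, "microbumps": 3011}


def compare(mesh, hetero):
    """Derive the ratios discussed in the text from the Table 2 values."""
    return {
        # Two stacked tiers: total core area is twice the footprint.
        "mesh_area_over_footprint": mesh["total_area_mm2"] / mesh["footprint_mm2"],
        "hetero_area_over_footprint": hetero["total_area_mm2"] / hetero["footprint_mm2"],
        # Footprint difference relative to the 3D Mesh MPSoC, in percent.
        "footprint_saving_pct": 100.0 * (mesh["footprint_mm2"] - hetero["footprint_mm2"])
                                / mesh["footprint_mm2"],
        # Microbump count of the heterogeneous design relative to the mesh.
        "microbump_ratio": hetero["microbumps"] / mesh["microbumps"],
    }


summary = compare(mesh, hetero)
noc_mhz = 1000.0 / 3.0          # 3 ns clock period
processor_mhz = 1000.0 / 10.0   # 10 ns clock period

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
print(f"NoC clock: {noc_mhz:.1f} MHz, processor clock: {processor_mhz:.1f} MHz")
```

The footprint difference comes out below 2%, while the microbump ratio is slightly above 5, in line with the statements in the text.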
6. Exploration results 
In this section we discuss the exploration results based on 
physical design metrics which are processor timing slack, 
NoC timing slack and power consumption. 
6.1. Processor timing slack (WNS)  
For the processor clock, the exploration results are 
shown in Figure 7 and Figure 8 for the 3D Mesh MPSoC and 
the heterogeneous 3D MPSoC respectively. The difference 
between the highest and the lowest timing slack is about 
2.9% for the 3D Mesh MPSoC, while it is reduced to 1.6% 
for the heterogeneous 3D MPSoC. Looking at the range of 
timing slack values in both graphs (y-axis), the timing 
slack is clearly much lower for the heterogeneous 3D MPSoC 
(maximum slack 0.16 ns) than for the 3D Mesh MPSoC 
(maximum slack 0.4 ns). The reason is that the tile 
structure of the heterogeneous 3D MPSoC has been simplified 
(compare the layouts of both 3D MPSoC architectures) by the 
partitioning approach, which moves the NoC architecture to 
the other tier (top tier). In contrast, the 3D Mesh MPSoC 
has higher placement and routing density in its tile 
structure, which contains the 3D router, NIU and processor 
components, and this higher complexity makes the design 
more difficult for the place and route tool (NanoRoute in 
SoC Encounter) to route. In general, it can be concluded 
that 2D EDA tool options have a positive impact on the 2D 
timing performance of the 3D MPSoC architecture. In 
addition, the heterogeneous 3D MPSoC architecture has 
better timing performance than the 3D Mesh MPSoC. 
 
 
Figure 7: Processor timing slack (WNS) for 3D Mesh 
MPSoC 
 
 
Figure 8: Processor timing slack (WNS) for heterogeneous 
3D MPSoC 
6.2. NoC timing slack (WNS) 
The results for NoC timing slack are shown in Figure 9 
and Figure 10 for the 3D Mesh MPSoC and the heterogeneous 
3D MPSoC respectively. For the 3D Mesh MPSoC, the 
difference between the highest and the lowest slack is 
about 13%, but it is lower for the heterogeneous 3D MPSoC 
(about 7%), a reduction of 6 percentage points. For the 3D 
Mesh MPSoC, Exploration ID 15 shows the worst slack even 
though the timing-driven placement and timing-driven 
routing options have been used. This result suggests that 
the placement and routing options do not affect the 3D 
timing performance (the 3D Mesh MPSoC has 3D critical paths 
for the NoC). Looking at the range of timing slack values 
(y-axis) in both graphs, the heterogeneous 3D MPSoC 
architecture clearly has a lower slack distribution 
(maximum slack 0.3 ns) than the 3D Mesh MPSoC (maximum 
slack 1.75 ns). The reason for this large reduction is that 
the heterogeneous 3D MPSoC architecture has 2D critical 
paths, so the tool is able to optimize it better by 
treating it as a normal 2D design. Moreover, the simplified 
tile structure on the top tier (NoC architecture) also 
contributes to this timing improvement, as explained for 
the processor timing slack. In general, it can be concluded 
that 2D EDA tool options have a negative impact on the 3D 
timing performance of the 3D MPSoC architecture. 
Additionally, the heterogeneous 3D MPSoC architecture has 
better timing performance than the 3D Mesh MPSoC 
architecture. 
 
 
Figure 9: NoC timing slack (WNS) for the 3D Mesh 
MPSoC 
 
 
Figure 10: NoC timing slack (WNS) for the heterogeneous 
3D MPSoC 
6.3 Power consumption 
The results for 3D power consumption are shown in 
Figure 11 and Figure 12 for the 3D Mesh MPSoC and the 
heterogeneous 3D MPSoC respectively. From these figures, 
it is clear that the 3D power consumption of both 3D MPSoC 
architectures does not vary much: the difference between 
the highest and the lowest value in each graph is about 
40 mW. Using the power-driven placement option reduces the 
total 3D power consumption, as shown in ID5-ID8 and 
ID14-ID15, while using the timing-driven and power-driven 
placement options together produces the worst power 
consumption compared with the other options for the 3D 
Mesh MPSoC. Considering the average power consumption in 
both graphs, the heterogeneous 3D MPSoC architecture has 
lower power than the 3D Mesh MPSoC (about 60 mW or 3% 
lower). In general, it can be concluded that 2D EDA tool 
options have no large impact on the power characteristics 
of 3D MPSoC architectures. 
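
The figures quoted above allow a rough estimate of how small the power variation is in relative terms. The sketch below assumes that the 3% refers to the average power of the 3D Mesh MPSoC; because both inputs are stated as "about", the results are order-of-magnitude values only.

```python
def implied_total_mw(delta_mw, delta_pct):
    """Total power implied by an absolute difference and its percentage."""
    return delta_mw / (delta_pct / 100.0)


# "about 60 mW or 3% lower" implies an average power of roughly 2000 mW.
average_mesh_mw = implied_total_mw(60.0, 3.0)

# The 40 mW spread across the 16 design points, relative to that average.
spread_pct = 100.0 * 40.0 / average_mesh_mw

print(f"implied average power: {average_mesh_mw:.0f} mW")
print(f"spread across options: {spread_pct:.1f} %")
```

Under this assumption the spread across all explored options is about 2% of the total power, which is consistent with the conclusion that the tool options have little effect on power.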
 
 
Figure 11: Power consumption for 3D Mesh MPSoC 
 
 
Figure 12: Power consumption for heterogeneous 3D 
MPSoC architecture 
7. Conclusion 
In this paper, we have presented a design space 
exploration of the impact of a 2D EDA tool on 3D MPSoC 
architectures by analyzing the effect of different 
placement and routing options on the final 3D MPSoC 
architecture performance in terms of timing and power 
characteristics. Results showed that timing slack for both 
processor and NoC varied more than power consumption and 
total wirelength, owing to the exploration of the 
timing-driven options in the place and route tool. 
Furthermore, to benefit from 3D technology while fully 
utilizing the capability of state-of-the-art 2D EDA tools 
for designing 3D architectures, keeping critical paths on 
2D paths rather than 3D paths in the target 3D architecture 
is one possible design approach until a real 3D-aware 
design tool is commercially available. 
References 
 
[1] ITRS, “ITRS Report,” 2001. [Online]. Available: 
http://www.itrs.net. 
[2] P. Jacob, A. Zia, O. Erdogan, P. M. Belemjian, J.-
W. Kim, M. Chu, R. P. Kraft, J. F. Mcdonald, and 
K. Bernstein, “Mitigating Memory Wall Effects in 
High-Clock-Rate and Multicore CMOS 3-D 
Processor Memory Stacks,” Proceedings of the 
IEEE, vol. 97, no. 1, pp. 108–122, 2009. 
[3] A. W. Topol, D. C. La Tulipe, L. Shi, D. J. Frank, 
K. Bernstein, S. E. Steen, A. Kumar, G. U. Singco, 
A. M. Young, K. W. Guarini, and M. Ieong, “Three-
dimensional integrated circuits,” IBM Journal of 
Research and Development, vol. 50, no. 4.5, pp. 
491–506, 2006. 
[4] M. H. Jabbar, A. Mzah, O. Hammami, and D. 
Houzet, “3D MPSoC Design Using 2D EDA Tools: 
Analysis of Parameters,” in DATE 2013 Workshop: 
3D Integration - Application, Technology, Design, 
Automation and Test, 2013, pp. 1–2. 
[5] S. Priyadarshi, J. Hu, W. H. Choi, S. Melamed, X. 
Chen, W. R. Davis, and P. D. Franzon, “Pathfinder 
3D: A flow for system-level design space 
exploration,” in 3D Systems Integration Conference 
(3DIC), 2011 IEEE International, 2012, pp. 1–8. 
[6] D. Milojevic, T. E. Carlson, K. Croes, R. Radojcic, 
D. F. Ragett, D. Seynhaeve, F. Angiolini, G. Van der 
Plas, and P. Marchal, “Automated Pathfinding tool 
chain for 3D-stacked integrated circuits: Practical 
case study,” in 3D System Integration, 2009. 3DIC 
2009. IEEE International Conference on, 2009, pp. 
1–6. 
[7] D. Milojevic, R. Radojcic, R. Carpenter, and P. 
Marchal, “Pathfinding: A design methodology for 
fast exploration and optimisation of 3D-stacked 
integrated circuits,” in System-on-Chip, 2009. SOC 
2009. International Symposium on, 2009, pp. 118–
123. 
[8] Y. Xie, G. H. Loh, B. Black, and K. Bernstein, 
“Design Space Exploration for 3D Architectures,” J. 
Emerg. Technol. Comput. Syst., vol. 2, no. 2, pp. 
65–103, Apr. 2006. 
 [9] A. M’zah, O. Hammami, and J. Mouine, “The 
Impact of EDA Tools in 3D IC Design Space 
Exploration: A Case Study,” in DATE 2012 
Workshop: 3D Integration - Application, 
Technology, Design, Automation and Test, 2012. 
[10] M. B. Healy, K. Athikulwongse, R. Goel, M. 
Hossain, D. H. Kim, Y.-J. Lee, D. L. Lewis, T.-W. 
Lin, C. Liu, M. Jung, B. Ouellette, M. Pathak, H. 
Sane, G. Shen, D. H. Woo, X. Zhao, G. H. Loh, H.-
H. S. Lee, and S. K. Lim, “Design and analysis of 
3D-MAPS: A many-core 3D processor with stacked 
memory,” in Custom Integrated Circuits Conference 
(CICC), 2010 IEEE, 2010, pp. 1–4. 
[11] D. H. Woo, N. H. Seong, and H.-H. S. Lee, 
“Heterogeneous die stacking of SRAM row cache 
and 3-D DRAM: An empirical design evaluation,” in 
Circuits and Systems (MWSCAS), 2011 IEEE 54th 
International Midwest Symposium on, 2011, pp. 1–
4. 
[12] H. Sun, J. Liu, R. S. Anigundi, N. Zheng, J.-Q. Lu, 
K. Rose, and T. Zhang, “Design of 3D DRAM and 
Its Application in 3D Integrated Multi-Core 
Computing Systems,” Design & Test of Computers, 
IEEE, vol. 26, no. 5, pp. 36–47, 2009. 
[13] R. S. Patti, “Three-Dimensional Integrated Circuits 
and the Future of System-on-Chip Designs,” 
Proceedings of the IEEE, vol. 94, no. 6, pp. 1214–
1224, 2006. 
[14] M. H. Jabbar, D. Houzet, and O. Hammami, “3D 
multiprocessor with 3D NoC architecture based on 
Tezzaron technology,” in 3D Systems Integration 
Conference (3DIC), 2011 IEEE International, 2012, 
pp. 1–5. 
[15] O. Hammami, A. Mzah, M. H. Jabbar, and D. 
Houzet, “3D IC Implementation for MPSOC 
Architectures: Mesh and Butterfly Based NoC,” in 
Quality Electronic Design (ASQED), 2012 4th Asia 
Symposium on, 2012, pp. 169–173. 
[16] S. R. Vangal, J. Howard, G. Ruhl, S. Dighe, H. 
Wilson, J. Tschanz, D. Finan, A. Singh, T. Jacob, S. 
Jain, V. Erraguntla, C. Roberts, Y. Hoskote, N. 
Borkar, and S. Borkar, “An 80-Tile Sub-100-W 
TeraFLOPS Processor in 65-nm CMOS,” Solid-
State Circuits, IEEE Journal of, vol. 43, no. 1, pp. 
29–41, 2008. 
[17] P. Salihundam, S. Jain, T. Jacob, S. Kumar, V. 
Erraguntla, Y. Hoskote, S. Vangal, G. Ruhl, and N. 
Borkar, “A 2 Tb/s 6 x 4 Mesh Network for a Single-
Chip Cloud Computer With DVFS in 45 nm 
CMOS,” Solid-State Circuits, IEEE Journal of, vol. 
46, no. 4, pp. 757–766, 2011.  
 
