Early selection of system implementation choice among SoC, SoP and 3-D integration. by Weerasekera, Roshan et al.
Early Selection of System Implementation
Choice among SoC, SoP and 3-D Integration
Roshan Weerasekera, Li-Rong Zheng
ECS/ICT/KTH,
ELECTRUM 229,
164 40 Kista, Sweden.
Email: {roshan,lirong}@imit.kth.se
Dinesh Pamunuwa
Centre for Microsystems Engineering
Lancaster University





164 40 Kista, Sweden.
Email: hannu@imit.kth.se
Abstract—Recently there has been a tendency to shift from
planar single-chip SoC solutions to alternative options such
as tiled silicon and single-level embedded modules, as well
as 3-D integration, so designers are confronted with several
system design options. To get a true improvement in
performance, a very careful analysis using detailed
models at different hierarchical levels is crucial. In this
work, we present a cohesive analysis of the technological,
cost and performance trade-offs for implementing digital
and mixed-mode systems, considering the choices between
2-D and 3-D integration and their ramifications.
I. INTRODUCTION
As consumer demand continues unabated for products that
keep getting smaller and lighter and that offer more
functionality and performance for less power, experimental
electronic system implementation technologies are
migrating towards 3-D solutions [1]. However, even as
designers are presented with an extra spatial dimension,
the complexity of the layout and the architectural trade-
offs also increase. To get a true improvement in perfor-
mance, a very careful analysis using detailed models at
different hierarchical levels is crucial. Even though several
previous works have addressed this issue [2][3][4], they
mostly concentrate on isolated model development, or
target some specific type of system. In this work, we
collate existing models from the literature, and modify
them and also derive new models as necessary. The
main contribution of this paper is in developing a generic
methodology for performance and cost estimations of
3D systems that can be modified for different applica-
tions, and a comprehensive set of estimation models as
building blocks. We also use this methodology to provide
detailed estimates for two applications that showcase the
potential benefits of 3D integration.
Previous works that addressed cost and performance
trade-offs include [2] and [3]; in the latter, Liu et al. discuss
the mapping from 2-D to 3-D under the constraints of
performance, cost and temperature. However, they omit many
3-D technological details. The authors of [4] describe a
yield and cost model for 3-D stacked chips with particular
emphasis on how the yield is affected by the number of
through-hole vias.
3-D integration techniques can be basically catego-
rized into two major schemes: Folding and Stacking. In
folding, a planar assembly with flexible substrate is folded
into several layers in order to form a very compact shape.
In this approach the interconnect length is longer than
in the stacked approach described below, but a very
compact size can be achieved. Stacking can be done
at the chip level with either chip-to-chip (C2C), Package-
on-Package (PoP) or MCM-to-MCM bonding using epoxy
or glues and creating electrical connections by wire-
bonding techniques. As an alternative to chip stacking,
3-D integration can be performed at the wafer-level too.
Different blocks can be processed on separate wafers,
and they can be interconnected vertically using through-
hole vias (THV) or through-Si vias (TSV) to form global
communication links. Wafer-level integration (WLI) can
be performed in two ways: entire wafers can be bonded
together before dicing (an approach hereinafter termed
3D-W2W), or known good dies (KGDs) can be bonded on top
of a host wafer containing other KGD sites (termed 3D-D2W) [5].
In this analysis, we concentrate on stacking method-
ologies and compare between 3D-SiP, 3D-D2W and 3D-
W2W technologies.
The rest of the paper is organized as follows; first,
we present our methodology for cost and performance
estimation, including all models; and then in Section III,
we discuss the cost and performance issues for two
different applications in detail. We end with a discussion
and our conclusions.
II. COST AND PERFORMANCE ESTIMATION MODELS
A. Yield and Cost Analysis
The yield of a bare silicon die, Ydie, depends on
electrical defects on each mask layer in the fabrication
process and the total area of the chip. As given by [6], a







where D0 is the average electrical defect density, S is
the shape factor of (what is assumed to be) the Gamma
Authorized licensed use limited to: Lancaster University Library. Downloaded on November 30, 2009 at 09:19 from IEEE Xplore.  Restrictions apply. 
distribution of electrical defect density, N is the number
of mask layers, and A is the chip area. If not provided by
the IP vendor the area of a digital module implemented
in some target technology can be estimated in a straight-
forward manner, using gate information and technology
scaling. However, the area of an analog chip depends
not only on the number of transistors and their sizes (in
practice, minimum size transistors are not used in analog
circuits), but also the circuit architecture.
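As a rough illustration of how the quantities D0, S, N and A interact, the sketch below uses a negative-binomial (Gamma-mixed Poisson) yield per mask layer and multiplies the layers together. This is an assumption made for illustration only: the expression taken from [6] is not reproduced in this text, and the function name, units and the independence of mask layers are all hypothetical choices.

```python
def die_yield(area_cm2, d0_per_cm2, shape, n_layers):
    """Illustrative defect-limited die yield.

    Assumed form (not necessarily the paper's): each mask layer has a
    negative-binomial yield (1 + A*D0/S)**(-S), and the n_layers layers
    are treated as independent and identical.
    """
    per_layer = (1.0 + area_cm2 * d0_per_cm2 / shape) ** (-shape)
    return per_layer ** n_layers


if __name__ == "__main__":
    # Hypothetical numbers: 1 cm^2 die, D0 = 0.05 defects/cm^2 per layer,
    # shape factor S = 2, 11 mask layers.
    print(die_yield(1.0, 0.05, 2.0, 11))
```

Under this assumed form the yield falls monotonically with chip area, which is why partitioning a large die into smaller ones can raise the yield of each piece.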
The core area (Acore) occupying the transistors and
their interconnects can either be interconnect-capacity








where Ng is the total number of gates, Ag
is the gate area, and dg is the gate dimension.




Rm is the average interconnect length, which can be
determined from Donath’s model [7]. When it comes to
packaging the core, the I/Os to be connected
to the outside must be arranged around the periphery
and may require a larger perimeter than dictated by the
core area in order to facilitate their placement according












where Pp is the peripheral in-line pad pitch and Np is the
total number of I/O pads.
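The pad-limited case can be sketched as follows. The sketch assumes a square die with the Np pads spread evenly over its four edges, so each edge must be at least Np·Pp/4 long; this is a common simplification and not necessarily the exact expression used here. The function names and the example numbers are hypothetical.

```python
def pad_limited_area(n_pads, pad_pitch_mm):
    """Area (mm^2) of the smallest square die whose perimeter holds
    n_pads in-line pads at the given pitch (assumed: pads evenly
    distributed over four equal edges)."""
    edge_mm = n_pads * pad_pitch_mm / 4.0
    return edge_mm * edge_mm


def chip_area(core_area_mm2, n_pads, pad_pitch_mm):
    """Chip area is whichever is larger: the core or the pad ring."""
    return max(core_area_mm2, pad_limited_area(n_pads, pad_pitch_mm))


if __name__ == "__main__":
    # Hypothetical example: 160 pads at 0.125 mm pitch need a 5 mm edge.
    print(chip_area(10.0, 160, 0.125))  # pad-limited
    print(chip_area(40.0, 160, 0.125))  # core-limited
```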
When Ndie is the number of dies that can be processed





The yield after each testing process depends on the
fault coverage level (Fc) of the testing process, and is
$Y_d^{(1-F_c)}$. The cumulative cost per die at the end of each





where C1,i−1 is the accumulated cost of all the steps up
to but not including the present step and Ci is the cost
of the present step.
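The two ideas above, yield after test and cost accumulation, can be sketched in a few lines. The first function follows the Yd^(1−Fc) expression given in the text. The second is an assumption: it uses the widely used "yielded cost" form, in which the money spent so far is divided by the yield of the present step, since the cost equation itself is not reproduced in this text. All names and numbers are hypothetical.

```python
def yield_after_test(incoming_yield, fault_coverage):
    """Yield of the parts that pass a test with fault coverage Fc,
    following Y_d ** (1 - Fc) from the text."""
    return incoming_yield ** (1.0 - fault_coverage)


def cumulative_cost(prev_cost, step_cost, step_yield):
    """Assumed yielded-cost form: cost accumulated up to the previous
    step (C_{1,i-1}) plus the cost of the present step (C_i), spread
    over the fraction of parts that survive the present step."""
    return (prev_cost + step_cost) / step_yield


if __name__ == "__main__":
    # Perfect fault coverage removes every bad die; zero coverage removes none.
    print(yield_after_test(0.81, 0.5))
    print(cumulative_cost(10.0, 2.0, 0.75))
```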
The package type is assumed to be a peripheral I/O
single chip plastic package and its cost is calculated
using a price vs pin count assumption as in [9].
B. Interconnect Performance Models
1) On-Chip Wire Delay: Typically the delay over a
global on-chip wire is RC dominated. The delay with a
capacitive load, CL, connected at the far end comprises
the driver delay and the distributed wire delay:





Fig. 1. Delay models for intra- and inter-chip interconnections: (a) intra-chip wire model; (b) inter-chip wire model.
Parameter On-Chip Off-Chip
Physical
W (nm) 290 15










ON-CHIP AND OFF-CHIP WIRE PARAMETERS
where Cd is the driver drain-diffusion capacitance. There-
fore the propagation delay on the on-chip wire, as shown
in Figure 1(a), is the sum of cascaded buffer delay (tdrv)
and the Elmore delay of the RC wire: tintra = tdrv + trc.
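A minimal sketch of the RC part of this delay is given below. It assumes the usual lumped-driver, distributed-wire form with the common 0.7 and 0.4 fitting coefficients for the 50% delay; the exact expression of this paper is not reproduced in the text, so the formula, the driver resistance parameter `r_drv` and the argument names are illustrative assumptions. Here rw and cw are the wire resistance and capacitance per unit length, as in (9).

```python
def on_chip_wire_delay(r_drv, c_d, r_w, c_w, length, c_load):
    """Elmore-style 50% delay (s) of a driver plus distributed RC wire.

    Assumed form:
      0.7 * R_drv * (C_d + c_w*L + C_L)    driver charging everything
    + 0.4 * r_w * c_w * L**2               distributed wire
    + 0.7 * r_w * L * C_L                  wire resistance into the load
    """
    driver_term = 0.7 * r_drv * (c_d + c_w * length + c_load)
    wire_term = 0.4 * r_w * c_w * length ** 2 + 0.7 * r_w * length * c_load
    return driver_term + wire_term


if __name__ == "__main__":
    # Hypothetical values: 1 kOhm driver, 1 fF drain, 2 fF load,
    # 1 Ohm/um and 0.2 fF/um wire, 1000 um long.
    print(on_chip_wire_delay(1e3, 1e-15, 1.0, 0.2e-15, 1000.0, 2e-15))
```

The quadratic dependence on wire length in the second term is what makes long global wires RC dominated.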
2) Off-Chip Wire Delay: For the inter-chip communication
link shown in Figure 1(b), the following delay
expression can be derived [10]:












$\ldots\; Z_0 (C_d + C_{pad} + C_{bnd} + 0.5\,C_L) + \frac{L_{bnd}}{Z_0} + r_w L\,(C_{pad} + C_{bnd} + C_L)] + 0.4\, r_w c_w L^2$, (9)
where Cpad is the capacitance of the pad, and Cbnd and Lbnd
are the capacitance and the inductance of the bond wire.
Finally, the total delay for the inter-chip communication
link is the summation of cascaded driver delay (tdrv) and
the RLC-wire delay (tRLC): tinter = tdrv + tRLC .
III. TRADEOFF ANALYSIS FOR THE CASE STUDIES
To make the comparison, we begin by selecting two
mixed-signal systems. The first system is a wireless sensor
node, which contains a 2 Mb DRAM, and an ASIC and a
microprocessor with gate counts of 500k and 300k, respectively.
It also contains an analog/RF block occupying an area
of 2 mm2. Finally, it contains a MEMS sensor with
an area of 1 mm2. The second system is a 3G mobile
terminal. We consider an architecture similar to the first
one but with a larger memory of 128 Mb DRAM, and
a CMOS image sensor with a pixel size of 1.75 µm ×
1.75 µm, and resolution of 8 Megapixel [11]. Further, in
the analysis, we consider the ASIC and Microprocessor
together as a single logic block. For all the integration
schemes, the underlying manufacturing process is a 65
nm, 11-metal, CMOS process with a wafer diameter of
300 mm and a lower-level wire pitch of 136 nm. We
also assume peripheral in-line pad arrangement and wire
bond packaging. The worst-case delay for 2-D systems is
estimated diagonally from chip edge to chip edge, while
for 3-D systems it is estimated from one edge of the bottom
chip to the opposite edge of the topmost chip.
Based on the manufacturers' data, the power density
for the constituent sub-modules in our case studies can
be estimated. The power density for a DRAM is estimated
to be 0.02 W/mm2 [12], and for a logic block,
0.12 W/mm2 [13]. A CMOS image sensor has an average
power density of 0.016 W/mm2. The power dissipation of
the MEMS sensor is assumed to be 50 mW, and that for
the Analog/RF block, 500 mW.
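To show how a power density turns into a block power, the sketch below works through the CMOS image sensor of the mobile terminal. It assumes that "8 Megapixel" means 8 × 10^6 pixels and that only the pixel array contributes to the area (no periphery circuits); both are simplifying assumptions, and the function names are hypothetical.

```python
PIXEL_PITCH_UM = 1.75          # pixel size from the text (1.75 um x 1.75 um)
N_PIXELS = 8e6                 # assumed meaning of "8 Megapixel"
SENSOR_POWER_DENSITY = 0.016   # W/mm^2, from the text


def pixel_array_area_mm2(pitch_um, n_pixels):
    """Area of the pixel array alone; 1 um^2 = 1e-6 mm^2."""
    return n_pixels * pitch_um ** 2 * 1e-6


def block_power_w(area_mm2, power_density_w_per_mm2):
    """Power of a block as area times average power density."""
    return area_mm2 * power_density_w_per_mm2


if __name__ == "__main__":
    area = pixel_array_area_mm2(PIXEL_PITCH_UM, N_PIXELS)
    print(area)                                       # about 24.5 mm^2
    print(block_power_w(area, SENSOR_POWER_DENSITY))  # about 0.39 W
```

The same area-times-density product applies to the DRAM and logic blocks once their areas have been estimated.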
For the stacked arrangement, we assume that the
logic block is close to the heat sink and other blocks
are in the following order: DRAM, Analog/RF block, and
MEMS/CMOS Image sensor.
A. Monolithic SoC
The integration of mixed-signal systems on a single die
merges several technologies, such as logic, memory and
analog/RF, and this results in increased process
complexity and a change in area. For example, merging logic
circuits with memory results in a lower circuit density,
and hence a larger circuit area, than their logic-only or
memory-only counterparts. The area of a single chip
implementation is estimated as stipulated in [2]. The total
cost for an SoC implementation is given in (38).
B. 2D-SoP
In the 2D-SoP implementation, we assume that four
chips (DRAM, RF, Logic and MEMS/Image Sensor) are
assembled as a multi chip module (MCM). Hence, the
cost of implementing the MCM includes the total cost
for each chip including the testing, the assembly cost,
the substrate cost, the rework cost, and finally the MCM
test cost and packaging cost. The SoP can provide some
reworking capability whereas SoC and wafer-level 3-D
integration do not. If one rework cycle is assumed for SoP,
the yield in assembly is improved from Ya to (2− Ya)Ya.
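The rework expression follows from a short argument, sketched here under the assumption that a reworked assembly succeeds with the same probability Ya as a first attempt and that the two attempts are independent. An assembly is good either on the first pass, or after failing once and being reworked successfully:

```latex
\begin{align}
Y_a' &= Y_a + (1 - Y_a)\,Y_a \\
     &= (2 - Y_a)\,Y_a
\end{align}
```

For an illustrative value of Ya = 0.90, one rework cycle gives 0.90 + 0.10 × 0.90 = 0.99.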












C. 3D-SiP
A 3D-SiP implementation is similar to the SoP package
integration, except that the SiP implementation integrates
dies on top of each other vertically. The cost formula
is the same, but the MCM substrate area is reduced
compared to the 2D-SoP implementation.
D. 3D-WLP
The yield of each 3D-implementation method is the






where Y2D is the yield of the 2D process (fabrication yield),
and Ya is the yield of the 3D assembly process, which
accounts for the loss due to assembly. In the case of D2W
stacking, the die yield after KGD testing should be
considered. So, the overall yields for implementing our
target system with the 3D-W2W and 3D-D2W methods are as follows [14]:
$Y_{3D,w2w} = Y_{dram}\, Y_{logic}\, Y_{rf}\, Y_{other}\, Y_a^{3}$ (44)











The total costs for 3-D wafer-level integration are given
in equations (40) and (41). Due to the limitations of
wafer-level processing, reworking is not possible.
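The W2W yield in (44) is a plain product: every layer must be good, and each of the three bonding steps that join the four layers must succeed. A minimal sketch follows; generalising the exponent to "number of layers minus one" is an assumption, and the yield values in the example are hypothetical rather than taken from the case studies.

```python
from math import prod


def w2w_stack_yield(layer_yields, assembly_yield):
    """Overall yield of a wafer-to-wafer stack, following (44):
    product of the untested layer yields times Ya once per bond.
    Assumes one bonding step between each pair of adjacent layers."""
    n_bonds = len(layer_yields) - 1
    return prod(layer_yields) * assembly_yield ** n_bonds


if __name__ == "__main__":
    # Hypothetical yields for DRAM, logic, RF and sensor layers.
    print(w2w_stack_yield([0.99, 0.97, 0.99, 0.99], 0.99))
```

Because no layer can be screened before bonding in W2W, a single low-yield layer pulls down the whole stack, which is the motivation for KGD testing in the D2W flow.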
IV. DISCUSSION
Results for our case studies are shown in Table II. 3-D
integration clearly provides more compact designs than its
2-D planar counterparts. Of the 3-D methods, 3D-WLI has
lower interconnect delays than both 2-D implementations,
whereas 3D-SiP does not. 3D-SiP and 2D-SoP implementations
are more or less equal in implementation cost, but 3D-SiP
has a lower interconnect delay. Where the wireless sensor
node is concerned, the SoC solution is the least expensive
choice, while wafer-level 3-D integration provides lower
area and higher performance. An SoC solution is therefore
the best option for such low-memory applications when cost
dominates. For high-performance systems, however, 3D-WLI
is the best choice, although it is more expensive.
The scenario is different when it comes to a mobile
terminal. In this case, the overall chip area is 4.25 times
larger than that of the wireless sensor node. Here the SiP
implementation has a lower delay than the 3D-WLI
technologies (168.34 ps versus 259.63 ps in Table II), due
to the very long RC wires. Also, the single-chip solution
has a very low yield (0.56), and all the other
implementation methods show a lower cost than the
single-chip implementation. 3D-SiP seems to be the best
design choice for low cost and high performance in this case.
The case studies show that when the system size
becomes very small, the thermal resistance becomes high and
the temperature rise in the topmost chip becomes
unacceptable. Hence, extra cooling solutions such as thermal
vias, which occupy some area, or very thin layers have to be
used in the system implementation.


















































Case: Wireless Sensor Node
Parameter         Single Chip   2D-SoP   3D-SiP   3D-W2W   3D-D2W
Normalized Area   1.00          3.92     0.78     0.71     0.71
Overall Yield     0.95          0.98     0.98     0.92     0.94
Normalized Cost   1.00          4.11     4.04     1.14     2.96
Delay (ps)        127.37        176.36   148.33   83.9     83.9
∆T (°C)           39.16         12.39    52.8     312.74   312.74

Case: 3G Mobile Terminal
Parameter         Single Chip   2D-SoP   3D-SiP   3D-W2W   3D-D2W
Normalized Area   1.00          1.94     0.75     0.71     0.71
Overall Yield     0.56          0.98     0.98     0.71     0.94
Normalized Cost   1.00          0.40     0.40     0.38     0.33
Delay (ps)        317.88        205.37   168.34   259.63   259.63
∆T (°C)           26.38         14.67    36.9     73.96    73.96

TABLE II
RESULTS OF COST PERFORMANCE ANALYSIS FOR CASE-STUDIES. NOTE THAT ∆T = T_top layer − T_ambient.
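One way to use such a table for early selection is to filter the options by delay and temperature limits and then pick the cheapest survivor. The sketch below does this with the cost, delay and ∆T values of Table II. The selection rule, the function name and the limits used in the example are illustrative assumptions and not part of the methodology described in this paper.

```python
# Table II entries: (normalized cost, delay in ps, temperature rise in deg C)
TABLE_II = {
    "Wireless Sensor Node": {
        "Single Chip": (1.00, 127.37, 39.16),
        "2D-SoP": (4.11, 176.36, 12.39),
        "3D-SiP": (4.04, 148.33, 52.8),
        "3D-W2W": (1.14, 83.9, 312.74),
        "3D-D2W": (2.96, 83.9, 312.74),
    },
    "3G Mobile Terminal": {
        "Single Chip": (1.00, 317.88, 26.38),
        "2D-SoP": (0.40, 205.37, 14.67),
        "3D-SiP": (0.40, 168.34, 36.9),
        "3D-W2W": (0.38, 259.63, 73.96),
        "3D-D2W": (0.33, 259.63, 73.96),
    },
}


def select_implementation(case, max_delay_ps=float("inf"),
                          max_delta_t=float("inf")):
    """Return the cheapest option meeting both limits (ties broken by
    lower delay), or None if no option is feasible."""
    feasible = [
        (cost, delay, name)
        for name, (cost, delay, delta_t) in TABLE_II[case].items()
        if delay <= max_delay_ps and delta_t <= max_delta_t
    ]
    return min(feasible)[2] if feasible else None


if __name__ == "__main__":
    # Hypothetical thermal budgets of 60 and 40 deg C.
    print(select_implementation("Wireless Sensor Node", max_delta_t=60.0))
    print(select_implementation("3G Mobile Terminal", max_delta_t=40.0))
```

With these hypothetical thermal budgets the rule picks the single chip for the sensor node and 3D-SiP for the mobile terminal; relaxing or tightening the limits changes the outcome, which is the point of exploring the trade-off early.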
V. CONCLUSION
In this paper, we developed detailed yield and
quantitative cost models, and a quantitative performance
metric for 3-D integration. Further, we derived simple yet
useful thermal models for 2-D and 3-D integrated circuits.
The overall methodology is suitable for early analysis
in system explorations for future nanoscale electronic
systems. Using two contemporary mixed-signal systems as
examples, we demonstrated the methodology for different
implementations and conclude that the implementation
strategy must be carefully selected depending
on the circuit complexity, since otherwise the move to 3-D
may have a detrimental effect. Design choices made early in
the design cycle have a significant impact throughout the
design and production lifecycles, and the models and
methodology presented in this article can be an important
aid in making them.
REFERENCES
[1] The International Technology Roadmap for Semiconductors
(ITRS), 2005. [Online]. Available: http://www.itrs.net
[2] M. Shen, L.-R. Zheng, and H. Tenhunen, “Cost and performance
analysis for mixed-signal system implementation: System-on-
chip or system-on-package,” Electronics Packaging Manufacturing,
IEEE Journal of, vol. 25, no. 4, pp. 262–272, October 2002.
[3] C. Liu, J.-H. Chen, R. Manohar, and S. Tiwari, “Mapping system-
on-chip designs from 2-d to 3-d ics,” in Circuits and Systems, 2005.
ISCAS 2005. IEEE International Symposium on, 2005, pp. 2939–
2942 Vol. 3.
[4] P. Mercier, S. Singh, K. Iniewski, B. Moore, and P. O’Shea, “Yield
and cost modeling for 3d chip stack technologies,” in IEEE Custom
Integrated Circuits Conference 2006, September 2006, pp. 357–360.
[5] T. Fukushima, Y. Yamada, H. Kikuchi, and M. Koyanagi,
“New three-dimensional integration technology using chip-to-wafer
bonding to achieve ultimate super-chip integration,” Japanese
Journal of Applied Physics, vol. 45, no. 4B, pp. 3030–3035, 2006.
[6] A. George, J. Krusius, and R. Granitz, “Packaging alternatives
to large silicon chips: tiled silicon on mcm and pwb substrates,”
Components, Packaging, and Manufacturing Technology, Part B:
Advanced Packaging, IEEE Transactions on [see also Compo-
nents, Hybrids, and Manufacturing Technology, IEEE Transactions
on], vol. 19, no. 4, pp. 699–708, 1996.
[7] W. Donath, “Placement and average interconnection lengths of
computer logic,” Circuits and Systems, IEEE Transactions on,
vol. 26, no. 4, pp. 272–277, 1979.
[8] P. A. Sandborn and H. Moreno, Conceptual Design of Multichip
Modules and Systems. Kluwer Academic Publishers, 1994.
[9] D. Ragan, P. Sandborn, and P. Stoaks, “A detailed cost model
for concurrent use with hardware/software co-design,” in Design
Automation Conference, 2002. Proceedings of IEEE/ACM, 2002,
pp. 269–274.
[10] G. Sai-Halasz, “Performance trends in high-end processors,” Pro-
ceedings of the IEEE, vol. 83, no. 1, pp. 20–36, 1995.
[11] Micron CMOS Image Sensor Part Catalog, March 2007. [Online].
Available: http://www.micron.com
[12] Micron 128MB SDRAM Part Catalog, 2007. [Online]. Available:
http://www.micron.com
[13] ARM Cortex-A8 Processor Product Brief, March 2007. [Online].
Available: http://www.arm.com
[14] Y. Deng and W. P. Maly, “2.5-dimensional vlsi system integration,”
Very Large Scale Integration (VLSI) Systems, IEEE Transactions
on, vol. 13, no. 6, pp. 668–677, June 2005.
