TSV placement optimization for liquid cooled 3D-ICs with emerging NVMs by Mohanram, Sundararaman




This Thesis is brought to you for free and open access by the Thesis/Dissertation Collections at RIT Scholar Works. It has been accepted for inclusion
in Theses by an authorized administrator of RIT Scholar Works. For more information, please contact ritscholarworks@rit.edu.
Recommended Citation
Mohanram, Sundararaman, "TSV placement optimization for liquid cooled 3D-ICs with emerging NVMs" (2013). Thesis. Rochester
Institute of Technology. Accessed from
TSV Placement Optimization for Liquid Cooled
3D-ICs with Emerging NVMs
by
Sundararaman Mohanram





Department of Computer Engineering
Kate Gleason College of Engineering




Dr. Dhireesha Kudithipudi, Associate Professor
Thesis Advisor, Department of Computer Engineering
Dr. Satish Kandlikar, Professor
Committee Member, Department of Mechanical Engineering
Dr. Sonia Lopez Alarcon, Assistant Professor
Committee Member, Department of Computer Engineering
Thesis Release Permission Form
Rochester Institute of Technology
Kate Gleason College of Engineering
Title:
TSV Placement Optimization for Liquid Cooled 3D-ICs with Emerging NVMs
I, Sundararaman Mohanram, hereby grant permission to the Wallace Memorial Library





To my family, for their constant love and support
Acknowledgments
I would like to thank my thesis advisor Dr. Dhireesha Kudithipudi for her guidance,
encouragement and support during the course of my thesis work. I am grateful to Dr.
Satish Kandlikar and Dr. Sonia Lopez Alarcon for taking time out of their busy schedules
to serve as committee members. Finally, I would like to thank my colleagues David
Brenner, Cory Merkel and Ganesh Khedkar from the Nanocomputing Research Lab for their
help and support during my thesis work.
Abstract
Three dimensional integrated circuits (3D-ICs) are a promising solution to the performance
bottleneck in planar integrated circuits. One of the salient features of 3D-ICs is their ability
to integrate heterogeneous technologies such as emerging non-volatile memories (NVMs)
in a single chip. However, thermal management in 3D-ICs is a significant challenge, owing
to the high heat flux (∼250 W/cm²). Several research groups have focused on either
run-time or design-time mechanisms to reduce the heat flux, without considering 3D-ICs
with heterogeneous stacks. The goal of this work is to achieve a balanced thermal gradient
in 3D-ICs while reducing the peak temperatures. In this research, placement algorithms
for design-time optimization and the choice of appropriate cooling mechanisms for run-time
modulation of temperature are proposed. Specifically, for design time, an architectural
framework is proposed that introduces a weight-based simulated annealing (WSA) algorithm
for thermal-aware placement of through silicon vias (TSVs) with inter-tier liquid cooling. In
addition, a run-time simulation framework is developed to analyze the thermal and
performance impact of emerging NVMs such as RRAM, PCRAM and STTRAM, integrated as a
dedicated stack in 3D-MPSoCs with inter-tier liquid cooling. Experimental results of the
WSA algorithm on the MCNC’91 and GSRC benchmarks demonstrate up to 11 K reduction in
the average temperature across the 3D-IC chip. In addition, power density arrangement in
WSA improved the thermal uniformity by 5%. Furthermore, simulation results of PARSEC
benchmarks with an NVM L2 cache demonstrate a temperature reduction of 12.5 K (RRAM)
compared to SRAM in 3D-ICs. In particular, RRAM proves to be a thermally efficient
replacement for SRAM, with 34% lower energy delay product (EDP) and 9.7 K average
temperature reduction.
Nomenclature
1T1R One transistor one resistor
3D-IC Three dimensional integrated circuit
3D-MPSoC Three dimensional multiple processor system on-chip
CMOS Complementary metal oxide semiconductor
CPD Cumulative power distribution
CTTM Compact transient thermal model
DA Density arrangement
DVFS Dynamic voltage frequency scaling
EDP Energy delay product
FP Footprint
HPD Highest power density
HPWL Half perimeter wire length
MLC Multi-level cell
NVM Non-volatile memory
PCRAM Phase change random access memory
PDA Power density arrangement
RRAM Resistive random access memory
SA Simulated annealing
SRAM Static random access memory
STTRAM Spin-transfer torque random access memory
TSV Through silicon vias
WSA Weight-based simulated annealing
Contents
Dedication
Acknowledgments
Abstract
1 Introduction and Background
1.1 Three Dimensional Integrated Circuits
1.2 Design-Time TSV Placement
1.2.1 Through Silicon Vias (TSVs)
1.2.2 Meta-Heuristic Algorithms
1.2.3 Thermal-Aware TSV Placement
1.3 Liquid Cooling
1.4 Emerging Non-Volatile Memories
1.5 Contributions
2 Related Work and Contributions
2.1 Thermal-aware Placement of TSVs
2.2 Liquid Cooling
2.3 Emerging Non-Volatile Memories
2.4 Summary
3 Weight-Based Simulated Annealing (WSA)
3.1 Description of WSA Algorithm
3.2 Weight-Based TSV Planning
3.3 Cost Function
3.4 Power Density Arrangement (PDA)
3.5 Interconnect Length (Lwire)
3.6 TSV Rearrangement
3.7 Summary
4 Design-Time and Run-time Simulation of 3D-ICs
4.1 Design-time Simulation of Thermal-aware Placement of TSVs
4.1.1 Framework
4.1.2 Simulation Methodology
4.1.3 MCNC’91 and GSRC Benchmarks
4.2 Investigation of Thermal Performance of NVMs in 3D-MPSoCs
4.2.1 Characteristics of NVMs
4.2.2 Architecture of NVMs
4.2.3 Framework
4.2.4 Simulation Methodology
4.2.5 PARSEC Benchmarks
4.3 Summary
5 Result and Analysis
5.1 Design-Time Optimization of TSV Placement
5.1.1 Effect of Different Parameters of WSA on Area, Interconnect length and Temperature
5.2 Investigation of Emerging NVMs
5.2.1 CASE STUDY: Ferret and Blackscholes
5.3 Summary
6 Conclusions and Future work
6.1 Conclusions
6.2 Future Work
Bibliography
List of Tables
4.1 Floorplan and Thermal analysis parameters used in the simulation of 3D-ICs
4.2 MCNC’91 and GSRC benchmark attributes used in the simulation of 3D-ICs
4.3 Physical Characteristics of NVMs
4.4 Cell size of SRAM and 1T1R NVM cell
4.5 Architectural description of 3D MPSoC used for the investigation of emerging NVMs
4.6 Parameters used in thermal simulation of emerging NVMs in 3D-MPSoC
4.7 PARSEC benchmarks used for the investigation of emerging NVMs in 3D-MPSoCs
List of Figures
1.1 A 3-tier liquid cooled 3D-IC where inter-tier connections are achieved using TSVs
1.2 A 3-tier 3D-IC where inter-tier networks are connected using TSVs
1.3 A 2-tier liquid cooled 3D-IC implemented using micro-channels, a pump and a heat exchanger
1.4 A metal-insulator-metal RRAM cell structure. The switching occurs by the drift of oxygen vacancies between high and low oxygen vacancy concentration regions.
1.5 A chalcogenide PCRAM cell. The switching between crystalline and amorphous states occurs due to the application of heat by the heater.
1.6 A magnetic tunnel junction of an STTRAM cell. The switching is achieved by passing a high current through the free layer.
2.1 Design-flow of TSV placement used to reduce the peak temperature in 3D-ICs adapted from [18]
3.1 Flowchart of Weight-Based Simulated Annealing (WSA) used for the TSV placement in 3D-ICs
3.2 Flowchart of weight-based planning in WSA algorithm
3.3 Interconnect Length Calculation using (a) HPWL (b) TSV height and TSV to pins (TSV-PH)
4.1 Design-time TSV placement framework
4.2 Compact Transient Thermal Model (CTTM) (a) solid thermal cell (b) liquid thermal cell
4.3 Stack configuration of 3-tier 3D-IC used in TSV placement
4.4 Generic 1T1R Architecture Representation
4.5 Generic NVM L2 Cache Organization adapted from [25]
4.6 PCRAM SET and RESET operation adapted from [25]
4.7 Design-flow of Run time simulation framework used for the investigation of emerging NVMs in 3D-MPSoCs
4.8 Stack configuration of 3D-IC used for the investigation of thermal impact of emerging NVMs
5.1 Comparison of area of 3D-ICs obtained using SA and WSA algorithm in the TSV placement
5.2 Comparison of interconnect length measurement in 3D-ICs using HPWL and TSV-PH
5.3 Comparison of interconnect length of 3D-ICs before and after TSV rearrangement
5.4 Comparison of average temperature of 3D-ICs with and without Liquid Cooling
5.5 Temperature profile of 3D-ICs with ami33 benchmark comparing (a) without liquid cooling (b) with liquid cooling (c) PDA with liquid cooling
5.6 Effect of different parameters of WSA algorithm on area, interconnect length and temperature of 3D-ICs
5.7 EDP of L2 cache (normalized to SRAM EDP) simulating different PARSEC benchmarks for 4-tier 3D-MPSoCs
5.8 Average temperature of SRAM and NVMs L2 cache in 4-tier 3D-MPSoCs for different PARSEC benchmarks
5.9 Average temperature of 3D-MPSoCs with air-cooling for different PARSEC benchmarks (a) 2-tier (b) 4-tier
5.10 Average temperature of 4-tier 3D-MPSoC with liquid-cooling for different PARSEC benchmarks
5.11 Average temperature of 3D-MPSoC with different cache size (based on NVM density) simulating different PARSEC benchmarks (a) EDP (b) overall average temperature
5.12 Number of Read, Write and Miss in 4-tier 3D-MPSoC (a) Ferret (b) Blackscholes
5.13 Average temperature of L2 cache in 4-tier 3D-MPSoC for ferret and blackscholes benchmarks
5.14 Temperature profile of 3D-MPSoC simulating blackscholes benchmark (a) SRAM L2 cache (b) RRAM L2 cache (c) PCRAM L2 cache (d) STTRAM L2 cache
5.15 Temperature profile of 3D-MPSoC simulating ferret benchmark (a) SRAM



Chapter 1
Introduction and Background
Over the past few decades, the demand for high performance and improved functionality in
planar integrated circuits has been met by scaling down transistor sizes. As transistors
scaled down, however, interconnects became a limiting factor in the performance of planar
integrated circuits. Three dimensional integrated circuits (3D-ICs) are emerging as a
promising solution to this performance bottleneck, providing higher bandwidth through
shorter interconnect lengths. In addition, 3D-ICs offer improved functionality, reduced
power consumption and a reduced form factor compared to planar integrated circuits.
1.1 Three Dimensional Integrated Circuits
3D-ICs stack multiple dies vertically as shown in Figure 1.1. Each die consists of an active
layer placed on the silicon substrate and a metal layer for interconnection within the die.
The dies are bonded using dielectric glues and interconnected using through silicon vias
(TSVs). TSVs use micro-bumps for interconnection in the bonding layer between the dies.
Figure 1.1 shows a 3D-IC with a liquid cooling mechanism integrated between the tiers. A
fluid pipe runs vertically through the stack¹ to supply coolant between the tiers. All the
dies are vertically stacked on the package substrate.
¹Stack and tiers are used interchangeably throughout this document. Stack denotes the vertical arrangement of dies in 3D-ICs. Tiers
Figure 1.1: A 3-tier liquid cooled 3D-IC where inter-tier connections are achieved using TSVs
Vertically stacked 3D-ICs offer a number of advantages compared to planar integrated
circuits. 3D-ICs provide higher performance and reduced power consumption by shortening
the interconnect length between functional units. In addition, 3D-ICs support heterogeneous
integration of different technologies such as emerging non-volatile memories (NVMs), RF,
MEMS, analog and optical systems [64] in a single chip. 3D-ICs have high package density
and a small form factor, with reduced footprint and weight [67]. Moreover, 3D-ICs reduce
the cost of large designs (more than 100 M gates) compared to planar integrated circuits
[24]. For large designs, the reduction in the cost of metal layers outweighs the cost of the
increased die area of 3D-ICs.
Despite these advantages, one of the major problems in 3D-ICs is high chip temperature
[54], due to the increased power density (∼250 W/cm²) per unit surface area of the stack.
The high temperature is caused by the vertical stacking of active devices, which raises the
power density, and by the poor thermal conductivity of the bonding dielectric used between
the tiers of 3D-ICs. For instance, the thermal conductivity of the epoxy used as the dielectric
between the tiers is very low
(1.7 W/mK) compared to the thermal conductivity of copper (400 W/mK) and silicon




power density of corresponding 2D die (where N is the number of stacks and the dies are
assumed to be homogeneous with equal power density) [32]. The power density can be
further exacerbated with heterogeneous integration.
High temperature affects the performance, power, reliability and life span of 3D-ICs
[19]. At high temperature, the speed of the transistors is reduced due to the degradation
in carrier mobility. The performance of clock buffers degrades (by 1.2%-1.32% for
every 10 K increase [64]) and the resistivity of the metal interconnects increases with
temperature. The leakage power of 3D-ICs also varies linearly with temperature; for
instance, every 30 K increase in temperature increases the leakage power by 30% [62].
Moreover, the reliability of 3D-ICs depends exponentially on temperature. For example, the
mean time to failure is reduced by a factor of 10 with every 30 K rise in temperature [63]. It
is also estimated that a 10%-15% increase in temperature causes a 50% reduction in the life
span of the device. Furthermore, hotspots can permanently damage 3D-ICs.
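The two rules of thumb quoted above can be turned into simple scaling factors. The first is leakage rising by 30% per 30 K, modeled linearly as the text states. The second is MTTF falling tenfold per 30 K. The sketch below is a back-of-the-envelope illustration that assumes both trends hold over the temperature range of interest. The function names and parameters are hypothetical.

```python
def mttf_factor(delta_t_k, kelvin_per_decade=30.0):
    """Relative mean time to failure after a temperature change of delta_t_k,
    assuming MTTF drops 10x for every `kelvin_per_decade` K of heating."""
    return 10.0 ** (-delta_t_k / kelvin_per_decade)


def leakage_factor(delta_t_k, fraction_per_k=0.30 / 30.0):
    """Relative leakage power, assuming the linear trend of +30 % per 30 K."""
    return 1.0 + fraction_per_k * delta_t_k


print(mttf_factor(30.0), leakage_factor(30.0))    # 30 K hotter
print(mttf_factor(-12.5), leakage_factor(-12.5))  # 12.5 K cooler
```

Under these assumptions, a 30 K rise cuts the MTTF to one tenth and raises leakage by 30%. A 12.5 K reduction lengthens the MTTF by a factor of about 2.6 and lowers leakage by 12.5%.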
Therefore, thermal management plays a pivotal role in controlling the temperature of
3D-ICs. Design-time and run-time thermal management techniques are used for mitigat-
ing the high heat dissipation in 3D-ICs [57]. Design-time thermal management aims at
achieving a thermal-aware 3D-IC floorplan design using methods such as floorplanning,
TSV and thermal-via placement [16, 17]. Run-time thermal management involves continuous
monitoring and control of the temperature during operation. Methods such as task
scheduling, task migration and dynamic voltage frequency scaling (DVFS), and cooling
mechanisms such as air cooling and liquid cooling, are used for run-time thermal
management [54]. In addition, the temperature of 3D-ICs can be reduced by exploiting
heterogeneous integration and the inherent characteristics of emerging non-volatile
memories (NVMs). Emerging NVMs such as resistive random access memory (RRAM),
phase change random access memory (PCRAM) and spin-transfer torque random access
memory (STTRAM), which have low static power, can be integrated with 3D-ICs to reduce
the overall power density and thereby the associated thermal gradient. In this work, the
focus is on using design-time thermal management techniques for controlling the temperature
in 3D-ICs. A detailed overview of these techniques is given below.
1.2 Design-Time TSV Placement
Design-time TSV placement exploits the heat-dissipating capability of TSVs to reduce
the temperature of the 3D-IC stack. TSVs, with their vertical interconnection and high thermal
conductivity, are capable of dissipating heat from the tiers of 3D-ICs. The characteristics,
types and thermal-aware placement of TSVs are discussed in this section.
1.2.1 Through Silicon Vias (TSVs)
TSVs are vertical interconnects placed between the tiers in 3D-ICs. A 3-tier 3D-IC whose
inter-tier networks are connected using TSVs is shown in Figure 1.2. Polysilicon, copper
and tungsten are the most commonly used TSV materials [56]. As shown in Figure 1.2, a
dielectric oxide isolates the TSV from the silicon substrate, and micro-bumps connect the
TSVs in the bonding layer between the tiers.
Based on their purpose, TSVs are classified into power, signal and thermal vias. TSVs
used for power distribution in the 3D-IC stack are called power TSVs, while TSVs used
for inter-tier networks are called signal TSVs. Power TSVs are larger than signal TSVs in
order to reduce the voltage drop and meet the current density requirements [33]. For
instance, Lee et al. [37] used 40 µm power TSVs and 10 µm signal TSVs in a 45 nm
technology. In addition, TSVs used specifically for heat dissipation are called
thermal vias. The size of the thermal vias depends on the implementation and maximum
temperature of the 3D-ICs. It is estimated that thermal vias consume 10% - 20% of the chip
























Figure 1.2: A 3-tier 3D-IC where inter-tier networks are connected using TSVs
the fabrication, TSVs are classified as via-first or via-last. In the via-first approach, TSVs
are fabricated before or during the bonding of dies, whereas in the via-last approach, TSVs
are fabricated after the bonding of dies. The size of the TSVs differs between via-first and
via-last technology because of the aspect ratio requirement imposed by the thickness of the
wafer used in each technology [71]. For instance, TSV dimensions range from 1 µm in
via-first to 90 µm in via-last technology [71].
These TSVs occupy a large area between the functional units in 3D-ICs. For instance,
90000 signal TSVs of 5 µm dimension occupy an area equal to one million gates of 1.5
µm [34]. In addition, TSVs induce stress in the surrounding regions due to the difference
in the coefficient of thermal expansion between TSVs and the silicon substrate. Hence,
TSVs require a minimum distance (pitch) from other TSVs and functional units. The
pitch of the TSVs ranges from 10 µm in via-first to 200 µm in via-last technologies [65].
Furthermore, the cost of fabricating TSVs is very high with current technologies [42].
Hence, the placement of TSVs is extremely important in reducing the temperature of
3D-ICs. Meta-heuristic algorithms offer good floorplanning solutions for the placement of
TSVs.
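The sketch below estimates the silicon area reserved by a given number of TSVs, to make the area argument concrete. It makes two simplifying assumptions. Without a pitch, each TSV is treated as a square whose side equals its diameter. With a pitch, each TSV is assumed to reserve a pitch × pitch cell of a regular grid. The 10 µm pitch in the usage line is simply the lower end of the range quoted above. The function name is hypothetical.

```python
def tsv_area_mm2(n_tsv, diameter_um, pitch_um=None):
    """Silicon area (mm^2) reserved by n_tsv TSVs.

    Without a pitch, each TSV is approximated by a square of side diameter_um;
    with a pitch, each TSV is assumed to reserve a pitch_um x pitch_um cell
    of a regular grid (keep-out zone included)."""
    side_um = diameter_um if pitch_um is None else pitch_um
    return n_tsv * side_um ** 2 / 1e6  # 1 mm^2 = 1e6 um^2


print(tsv_area_mm2(90_000, 5.0))                 # via footprint only
print(tsv_area_mm2(90_000, 5.0, pitch_um=10.0))  # including a 10 um pitch
```

For 90000 TSVs of 5 µm, these give 2.25 mm² and 9.0 mm², respectively. The pitch, rather than the via itself, dominates the area cost.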
1.2.2 Meta-Heuristic Algorithms
Meta-heuristic algorithms use an iterative process to optimize a combinatorial problem and
improve the solution through successive iterations. In practice, combinatorial problems
such as arrangement, grouping, ordering or selection [51] have a very large solution space.
An exhaustive search for the optimal solution to these problems requires an extremely
large amount of time and is sometimes infeasible [55]. Hence, meta-heuristic algorithms
use randomness in the search process to find an optimal or near-optimal solution to
combinatorial problems. Meta-heuristic algorithms are not problem specific and contain
mechanisms to escape from local optima in search of the global optimum. Some of the
most commonly used meta-heuristic algorithms are simulated annealing, genetic algorithms,
tabu search, ant colony optimization and iterated local search [55].
Simulated annealing (SA) is a simple and widely used meta-heuristic algorithm for finding
a global optimum in a large solution space. SA is modeled on the annealing of metals, which
settle into a low-energy equilibrium when slowly cooled from high temperatures [55]. The
pseudo code for the SA algorithm is shown in Algorithm 1 [2]. The SA algorithm maintains
current (S), temporary (Stemp) and best (Sbest) solutions. At each iteration, the current
solution is generated by a random assignment (perturbation) of the temporary solution. The
temporary solution is replaced by the current solution when the current solution has a
lower cost or, with a probability that depends on the cost difference and the annealing
temperature T, even when it has a higher cost; this allows the search to escape from local
minima. The best solution stores the lowest-cost solution found so far. The algorithm
iterates until T, which is lowered at every iteration, falls below a threshold; the best
solution then gives the optimized result of the combinatorial problem.
SA has a shorter simulation time and uses fewer computational resources than other
general optimization methods [1]. In addition, SA is capable of scaling with the problem
size [7]. However, the SA heuristic has to be tuned for fast and efficient convergence to
the global optimum [26].
Algorithm 1 Pseudo code for the Simulated Annealing algorithm
initialize S, Stemp, Sbest, T
while T > threshold do
    S = random-assignment(Stemp)
    if cost(S) < cost(Stemp) then
        Stemp = S
    else if exp(−(cost(S) − cost(Stemp)) / T) > random() then
        Stemp = S
    end if
    if cost(Stemp) < cost(Sbest) then
        Sbest = Stemp
    end if
    T = α × T    (cooling rate 0 < α < 1)
end while
return Sbest
The TSV placement is a large solution-space problem in which meta-heuristic algorithms
such as SA are used to narrow the search down to a global optimum solution. The TSV
placement is a multi-objective problem: in addition to temperature, other parameters of
the chip such as area and interconnect length must be considered in the optimization.
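The Python code below is a minimal, runnable sketch of the loop structure of Algorithm 1, applied to a toy TSV placement problem. Everything problem-specific is an illustrative assumption:

- the 6×6 power map with two hypothetical hotspots,
- the cost function, which is the power-weighted Manhattan distance from each cell to its nearest TSV,
- the move that relocates a single TSV,
- the geometric cooling parameters t0, alpha and t_min.

This is not the WSA algorithm or the cost function proposed in Chapter 3.

```python
import math
import random


def simulated_annealing(initial, cost, random_assignment,
                        t0=10.0, alpha=0.999, t_min=1e-3, seed=0):
    """Minimize `cost` following the loop of Algorithm 1.

    s_temp is the accepted solution, s the random candidate and s_best the
    lowest-cost solution seen so far."""
    rng = random.Random(seed)
    s_temp = s_best = initial
    c_temp = c_best = cost(initial)
    t = t0
    while t > t_min:                          # stop when T falls below threshold
        s = random_assignment(s_temp, rng)    # perturb the accepted solution
        c = cost(s)
        delta = c - c_temp
        # Always accept an improvement; accept a worse candidate with the
        # Metropolis probability exp(-delta / T) to escape local minima.
        if delta < 0 or math.exp(-delta / t) > rng.random():
            s_temp, c_temp = s, c
        if c_temp < c_best:
            s_best, c_best = s_temp, c_temp
        t *= alpha                            # geometric cooling schedule
    return s_best


# --- Hypothetical toy problem: place 3 thermal TSVs on a 6x6 grid ---
GRID = 6
POWER = {(x, y): 1.0 for x in range(GRID) for y in range(GRID)}
POWER[(1, 1)] = POWER[(1, 2)] = 8.0   # hypothetical hotspot
POWER[(4, 4)] = 10.0                  # hypothetical hotspot


def placement_cost(tsvs):
    """Power-weighted Manhattan distance from every cell to its nearest TSV."""
    return sum(p * min(abs(x - tx) + abs(y - ty) for tx, ty in tsvs)
               for (x, y), p in POWER.items())


def move_one_tsv(tsvs, rng):
    """Relocate one randomly chosen TSV to a currently unoccupied cell."""
    new = list(tsvs)
    free = [cell for cell in POWER if cell not in new]
    new[rng.randrange(len(new))] = rng.choice(free)
    return tuple(sorted(new))


start = ((0, 0), (0, 5), (5, 0))
best = simulated_annealing(start, placement_cost, move_one_tsv)
print(best, placement_cost(start), placement_cost(best))
```

Caching the cost of the accepted and best solutions keeps this version at one cost evaluation per iteration. In a real placer, that evaluation would be a floorplan and thermal analysis, which usually dominates the run time.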
1.2.3 Thermal-Aware TSV Placement
Thermal-aware placement of TSVs uses a meta-heuristic algorithm to place the TSVs between
the functional units (macro-blocks) such that the overall arrangement reduces the
temperature. The meta-heuristic algorithm iteratively places TSVs near potential
high-temperature regions (hotspots).
TSV placement begins with the placement of macro-blocks in the tiers of 3D-ICs. After
the placement of macro-blocks, TSVs are placed to form the inter-tier networks. The
meta-heuristic algorithm iterates through placements of TSVs and evaluates each floorplan
using a cost function. The cost function evaluates the TSV placement by considering
multiple objectives such as area, interconnect length and temperature. The meta-heuristic
algorithm iterates for a specified number of iterations, analyzing the temperature of the
floorplan. When the temperature of the floorplan reaches the threshold, the best solution in
the meta-heuristic algorithm gives the optimized TSV placement.
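A multi-objective evaluation like the one described above is often written as a weighted sum of normalized objectives. The sketch below assumes that form. The Floorplan fields, the weights and the reference floorplan used for normalization are hypothetical. It is not the cost function used by WSA in this thesis.

```python
from dataclasses import dataclass


@dataclass
class Floorplan:
    area_mm2: float        # chip footprint
    wirelength_mm: float   # total interconnect length
    peak_temp_k: float     # peak temperature from thermal simulation


def weighted_cost(fp, ref, w_area=0.4, w_wire=0.3, w_temp=0.3):
    """Weighted sum of objectives, each normalized to a reference floorplan so
    that quantities with different units can be combined (lower is better)."""
    return (w_area * fp.area_mm2 / ref.area_mm2
            + w_wire * fp.wirelength_mm / ref.wirelength_mm
            + w_temp * fp.peak_temp_k / ref.peak_temp_k)


ref = Floorplan(area_mm2=100.0, wirelength_mm=500.0, peak_temp_k=360.0)
cooler = Floorplan(area_mm2=104.0, wirelength_mm=510.0, peak_temp_k=345.0)
print(weighted_cost(ref, ref), weighted_cost(cooler, ref))
print(weighted_cost(cooler, ref, w_area=0.2, w_wire=0.2, w_temp=0.6))
```

The weights set the trade-off between objectives. With the default weights, the cooler but larger candidate scores about 1.01 and so looks worse than the reference. With the temperature weight raised to 0.6, it scores about 0.99.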
Although thermal-aware placement of TSVs minimizes the temperature of 3D-ICs, the number
of TSVs is limited by their large area and fabrication cost. Hence, inter-tier liquid cooling
is considered at design time to achieve a greater temperature reduction in 3D-ICs.
1.3 Liquid Cooling
Liquid cooling is a cooling mechanism in which a fluid coolant is passed through
microchannels or pin-fins integrated between the tiers of 3D-ICs. Various cooling
mechanisms have been considered to remove the heat generated in the 3D-IC stack.
Conventional heat sinks and micro-channel cold plates are inadequate for dissipating the
large amount of heat generated in 3D-ICs [61]. While TSVs are limited in number due to their
fabrication cost and large area, inter-tier liquid cooling is an efficient cooling mechanism
capable of removing the high heat dissipated in a 3D-IC stack [61].
A generic schematic of a liquid cooling system used for a 2-tier 3D-IC is shown in
Figure 1.3 [58]. The heat sinks are embedded between the tiers of the 3D-IC stack. The
fluid coolant moves through the heat sink and absorbs the heat from the tiers. The liquid
cooling system needs one or more fluid pumps to control the pressure of the fluid flowing
through the heat sink in the 3D-ICs. The heated coolant is cooled by passing it through the
heat exchanger and is then recirculated through the heat sink by the fluid pump.
The amount of heat removed from the tiers depends on the incoming temperature of







Figure 1.3: A 2-tier liquid cooled 3D-IC implemented using micro-channels, a pump and a heat exchanger
coolant. By managing the above three parameters, heat fluxes as high as 3.9 kW/cm³ can
be extracted from the tiers of the 3D-IC stack [53]. However, a significant amount of
energy is spent in the fluid pump and heat exchanger to control the pressure and
temperature of the coolant, respectively. Hence, liquid cooling is used together with other
temperature reduction methods in order to maintain the energy efficiency of 3D-ICs. The
temperature in 3D-ICs can also be reduced by exploiting heterogeneous integration and the
inherent characteristics of emerging NVMs, which are discussed in detail in the next section.
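Two textbook relations explain both the dependence of heat removal on coolant conditions and the pumping cost mentioned above:

- the steady-state energy balance Q = ṁ · c_p · (T_out − T_in),
- the ideal hydraulic pumping power P = Δp · V̇.

The sketch below assumes single-phase water with constant properties and ignores pump efficiency. The constants are approximate, and the heat load, flow rates and pressure drop in the usage lines are illustrative, not values from this thesis.

```python
WATER_DENSITY = 1000.0   # kg/m^3, assumed constant
WATER_CP = 4186.0        # J/(kg*K), assumed constant


def ml_per_min_to_m3_per_s(flow_ml_per_min):
    return flow_ml_per_min * 1e-6 / 60.0


def coolant_temp_rise(heat_w, flow_ml_per_min):
    """Outlet-minus-inlet coolant temperature (K) from Q = m_dot * c_p * dT."""
    m_dot = WATER_DENSITY * ml_per_min_to_m3_per_s(flow_ml_per_min)  # kg/s
    return heat_w / (m_dot * WATER_CP)


def pump_power_w(pressure_drop_pa, flow_ml_per_min):
    """Ideal hydraulic pumping power P = dp * V_dot (pump efficiency ignored)."""
    return pressure_drop_pa * ml_per_min_to_m3_per_s(flow_ml_per_min)


print(coolant_temp_rise(100.0, 100.0), coolant_temp_rise(100.0, 200.0))
print(pump_power_w(1e5, 100.0))
```

Consider a hypothetical 100 W heat load at 100 mL/min. The coolant warms by about 14.3 K, and doubling the flow halves that rise. Since T_out = T_in + ΔT, a lower inlet temperature lowers every downstream temperature by the same amount. Pushing 100 mL/min through a 1 bar pressure drop costs about 0.17 W before pump losses.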
1.4 Emerging Non-Volatile Memories
Heterogeneous integration in 3D-ICs enables novel technologies such as emerging NVMs to
be integrated in a single chip. Emerging memory technologies such as RRAM, PCRAM and
STTRAM have been actively researched in recent years [70, 59, 6]. Compared to SRAM,
these emerging technologies are non-volatile, requiring little or no power to maintain the
stored state. Due to their reduced static power consumption, these NVMs can be used to
reduce the power density and thereby the overall thermal gradient of the 3D-IC stack.
Semiconductor companies such as HP, Toshiba and Samsung are exploring the possibility
of replacing conventional SRAM on-chip memory with these emerging NVMs in the near
future [66].










Figure 1.4: A metal-insulator-metal RRAM cell structure. The switching occurs by the drift of oxygen
vacancies between high and low oxygen vacancy concentration regions.
switching in thin-film memristors to store data in high and low resistance states. The
schematic of an RRAM cell is shown in Figure 1.4. An RRAM memory cell consists of an oxide
layer sandwiched between two metal electrodes, forming a metal-insulator-metal (MIM)
structure. The oxide layer is divided into regions of high and low oxygen vacancy
concentration. The switching process is described by the drift of oxygen vacancies in the
oxide region. Several metal oxides are used for the fabrication of RRAM, such as hafnium
oxide (HfOx), titanium oxide (TiOx), copper oxide (CuOx), aluminium oxide (AlOx) and nickel
oxide (NiOx) [68].
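The linear ion-drift memristor model is a common idealization for illustrating the vacancy-drift mechanism described above. It is not necessarily the RRAM model used in this thesis. In this model, the film of thickness D is treated as two resistors in series:

- a low-resistance, oxygen-vacancy-rich region of width w,
- a high-resistance region filling the remainder.

The boundary between them drifts with the current as dw/dt = μ_v · R_on / D · i(t). The parameter values below (R_on, R_off, D, μ_v, the sinusoidal drive and the initial state) are illustrative assumptions.

```python
import math


def linear_drift_memristance(v_amp=1.0, freq_hz=1.0, r_on=100.0, r_off=16e3,
                             d_m=10e-9, mu_v=1e-14, x0=0.1, steps=10_000):
    """Euler integration of the linear ion-drift memristor model over one
    period of a sinusoidal voltage. x = w / D is the normalized width of the
    low-resistance (vacancy-rich) region. Returns the memristance samples."""
    dt = 1.0 / (freq_hz * steps)
    x = x0
    memristance = []
    for k in range(steps):
        v = v_amp * math.sin(2.0 * math.pi * freq_hz * k * dt)
        m = r_on * x + r_off * (1.0 - x)        # two regions in series
        i = v / m
        x += mu_v * r_on / d_m ** 2 * i * dt    # dx/dt = mu_v * R_on / D^2 * i
        x = min(max(x, 0.0), 1.0)               # boundary stays inside the film
        memristance.append(m)
    return memristance


ms = linear_drift_memristance()
print(ms[0], min(ms), ms[len(ms) // 2])
```

During the positive half-cycle, vacancies drift so that the low-resistance region widens and the resistance falls. The negative half-cycle reverses the drift. This opposite-polarity switching is how the cell toggles between its low and high resistance states.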
PCRAM switches between crystalline and amorphous state in phase change material
















Figure 1.5: A chalcogenide PCRAM cell. The switching between crystalline and amorphous states occurs due
to the application of heat by the heater.
MOSFET access device is shown in Figure 1.5. The phase change material used is a
chalcogenide glass (Ge2Sb2Te5), an alloy of germanium, antimony and tellurium [30]. The
crystalline and amorphous phases show low and high resistivity, respectively. A heater is
placed below the PCM to apply the heat for switching. The crystalline state is achieved by
heating the PCM above its crystallization temperature, while the amorphous state is
achieved by heating and then quenching the PCM.
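The SET and RESET behavior described above can be summarized as a simple rule on the heater pulse:

- RESET: a short pulse melts the material and then quenches it, leaving it amorphous.
- SET: a longer pulse holds it above the crystallization temperature, so it crystallizes.

The sketch below encodes that rule for a GST-like chalcogenide cell. The temperature thresholds and the minimum SET duration are approximate, hypothetical values chosen for illustration, not device parameters from this thesis.

```python
from enum import Enum


class Phase(Enum):
    CRYSTALLINE = "low resistance"
    AMORPHOUS = "high resistance"


# Hypothetical, approximate values for a GST-like chalcogenide cell.
T_CRYST_K = 430.0        # crystallization temperature
T_MELT_K = 900.0         # melting temperature
MIN_SET_TIME_NS = 100.0  # time needed above T_CRYST to crystallize


def apply_pulse(phase, peak_temp_k, duration_ns):
    """Return the phase after a heater pulse (simplified thermal model)."""
    if peak_temp_k >= T_MELT_K and duration_ns < MIN_SET_TIME_NS:
        return Phase.AMORPHOUS    # RESET: melt, then quench with a short pulse
    if peak_temp_k >= T_CRYST_K and duration_ns >= MIN_SET_TIME_NS:
        return Phase.CRYSTALLINE  # SET: stay above T_CRYST long enough
    return phase                  # read pulse or too little heat: unchanged


state = Phase.AMORPHOUS
state = apply_pulse(state, peak_temp_k=600.0, duration_ns=150.0)  # SET
print(state)   # Phase.CRYSTALLINE
state = apply_pulse(state, peak_temp_k=950.0, duration_ns=20.0)   # RESET
print(state)   # Phase.AMORPHOUS
```

A read pulse stays below the crystallization temperature, so the final branch leaves the stored phase untouched.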




Figure 1.6: A magnetic tunnel junction of an STTRAM cell. The switching is achieved by passing a high current
through the free layer.
junction (MTJ). The schematic of STTRAM is shown in Figure 1.6. An MTJ consists of two
ferromagnetic layers separated by a dielectric layer. The ferromagnetic layer with a fixed
magnetization direction forms the hard layer, while the ferromagnetic layer whose
magnetization direction can be changed by passing a sufficient current through it is called
the free layer. The relative magnetization of the hard and free layers determines the
resistance of the MTJ.
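The sketch below shows how the relative magnetization sets the MTJ resistance and how a read can tell the two states apart. It uses the tunnel magnetoresistance ratio, TMR = (R_AP − R_P) / R_P, where R_P and R_AP are the parallel and antiparallel resistances. The following are illustrative assumptions:

- a mid-point reference resistance,
- the bit assignment (antiparallel read as 1),
- the 2 kΩ / 100% TMR values.

```python
def mtj_resistance(r_parallel, tmr, parallel):
    """MTJ resistance: R_P when the free and hard layers are parallel,
    R_AP = R_P * (1 + TMR) when they are antiparallel."""
    return r_parallel if parallel else r_parallel * (1.0 + tmr)


def sense(resistance, r_parallel, tmr):
    """Compare against a reference midway between R_P and R_AP.
    The bit assignment (antiparallel -> 1) is an arbitrary convention here."""
    r_ref = r_parallel * (1.0 + tmr / 2.0)
    return 1 if resistance > r_ref else 0


r_p, tmr = 2000.0, 1.0   # hypothetical: 2 kOhm parallel state, 100 % TMR
print(sense(mtj_resistance(r_p, tmr, parallel=True), r_p, tmr),
      sense(mtj_resistance(r_p, tmr, parallel=False), r_p, tmr))  # 0 1
```

A larger TMR widens the gap between the two resistance levels and makes sensing more robust.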
These emerging NVMs offer higher density than conventional SRAM. In addition, they have
better thermal performance than conventional SRAM. Furthermore, they are compatible with
integration in CMOS technology. Despite these advantages, these NVMs are characterized by
higher read/write latencies than conventional SRAM. Hence, the performance impact of these
NVMs should be analyzed together with their thermal impact in 3D-ICs.
1.5 Contributions
The goal of this thesis is to achieve a uniform temperature profile in 3D-ICs while
reducing the peak temperatures. The specific contributions of this thesis are:
• A WSA algorithm is proposed to optimize the TSV placement for a balanced thermal
gradient in 3D-ICs with inter-tier liquid cooling.
• The performance and thermal impact of 3D-MPSoCs with emerging NVMs are investigated
and compared with SRAMs. In addition, the impact of liquid cooling in reducing the
temperature of 3D-ICs at run-time is analyzed.
• Design-time and run-time thermal simulation frameworks are custom-developed in
this work for investigating different 3D-IC configurations.
The rest of this document is organized as follows. Related work is discussed in Chapter 2. Chapter 3 describes the proposed WSA algorithm used for TSV placement. The architecture and characteristics of NVMs are discussed in Chapter 4. The simulation framework and methodology used for TSV placement and for investigating the thermal profile of NVMs are presented in Chapter 5. Simulation results and analysis are presented in Chapter 6. Conclusions and future work are presented in Chapter 7.
Chapter 2
Related Work and Contributions
Thermal management is a significant constraint in 3D-ICs owing to the high temperatures of the stacked chip. Many research groups have focused on design-time placement optimizations and on integrating cooling solutions to reduce the temperature of 3D-ICs. In addition, emerging NVMs can be integrated into 3D-ICs to reduce the temperature of the stack. This chapter discusses the related work on these design-time techniques in detail.
2.1 Thermal-aware Placement of TSVs
Several groups have explored design-time thermal management in 3D-ICs through TSV placement. Approaches such as partitioning-based methods and multi-level placement have been combined with meta-heuristic algorithms to place TSVs. These meta-heuristic algorithms use a multi-objective cost function to optimize parameters such as area, interconnect length and number of TSVs in addition to temperature.
Cong et al. [17] developed a multi-level TSV planning framework with an integrated adaptive lumped resistive thermal model. The design flow of the multi-level TSV planning framework is shown in Figure 2.1. The framework was tested using the MCNC’91 and GSRC benchmark suites. The macro-blocks from the benchmarks are initially assigned to the tiers of the 3D-IC through floorplanning. TSVs are then formed from the inter-tier networks and distributed between the macro-blocks. A maze search algorithm, which iterates through TSV distribution, assignment and adjustment between the macro-blocks, is used for TSV placement. The peak temperature of the 3D-IC is reduced to 85°C with 80% fewer thermal vias, at the cost of a 2% increase in interconnect length.
Figure 2.1: Design flow of TSV placement used to reduce the peak temperature in 3D-ICs, adapted from [18]
Goplen et al. [27] proposed a multi-level analytical and partition-based TSV placement approach to explore the trade-off between interconnect length, via count and temperature. The placement of TSVs is carried out in three stages: global placement, coarse legalization and detailed legalization. Global placement performs the initial placement of TSVs and macro-blocks in the 3D-IC with the objective of minimizing interconnect length, via count and temperature. Coarse legalization then shifts, moves and swaps TSVs and macro-blocks to improve the objective function. Finally, detailed legalization removes overlaps by placing each TSV in the nearest free space that has minimal impact on the objective function. The temperature of the 3D-IC is reduced by 20%, at the cost of a 1% increase in interconnect length and 10% more TSVs.
The meta-heuristic algorithms used in the above works rely on randomness to determine the positions of TSVs. In addition, they use half-perimeter wire length (HPWL), which ignores the height and position of the TSVs when calculating interconnect length. Furthermore, although the number of TSVs and thermal vias is restricted by their large area and fabrication cost, the above works do not combine TSV placement with any cooling mechanism such as liquid cooling.
In this work, a WSA algorithm that replaces the randomness with a weight constraint is proposed. In addition, a new interconnect length calculation, TSV-PH, which considers the height and position of TSVs, is introduced. A power density estimate is also added to the cost function to achieve a balanced thermal gradient throughout the 3D-IC stack. Furthermore, the effect of liquid cooling combined with TSV placement is analyzed.
2.2 Liquid Cooling
Several research groups have focused on implementing liquid cooling in 3D-ICs to reduce peak temperatures. Lee et al. [37] analyzed the thermal effect of liquid cooling on a 3D-IC stack using the ISPD 2006 benchmark suite. In addition, they evaluated the physical, electrical and thermo-mechanical requirements of power, signal and thermal TSVs with liquid cooling. They suggested that 2.5% of the routing area is occupied by power TSVs and 50% by micro-channel liquid cooling, leaving the remaining 47.5% for the placement of TSVs. They also reduced the peak temperature of the 3D-IC to 85°C using micro-channel liquid cooling with a coolant temperature of 20°C and a 70 kPa pressure drop. Later, Lee et al. [38] focused on reliability analysis and optimization of various TSV design parameters with inter-tier liquid cooling.
On the other hand, Sridhar et al. [61] proposed a flexible compact transient thermal model (CTTM) for inter-tier liquid cooling using micro-channels and enhanced cavity pin-fins. The CTTM offers a significant speed-up in simulation time with a small error.
Although the above works focused on reducing the peak temperature using liquid cooling, they do not treat TSVs as heat-dissipating elements in the liquid-cooled stack. In this work, the heat dissipation of TSVs is considered together with liquid cooling. In addition, the thermal impact of emerging NVMs in 3D-ICs is analyzed with inter-tier liquid cooling.
2.3 Emerging Non-Volatile Memories
Several research groups have focused on optimizing the energy and read/write latencies of emerging NVMs. Yoon et al. [70] used a large PCRAM last-level cache to reduce off-chip traffic with little effect on power consumption. They explored different cache hierarchies using SRAM, DRAM and PCRAM and concluded that a large PCRAM last-level cache is very efficient for cache-friendly and memory-intensive applications. Smullen et al. [59] explored the trade-off between energy-delay product (EDP) and non-volatility in STTRAM caches. The cell area is reduced to minimize the write energy of the STTRAM cell, and a cache model is developed to explore the trade-off between non-volatility, latency and energy. They achieved more than a 70% reduction in EDP by using STTRAM L2 and L3 caches.
On the other hand, Brenner et al. [6] from our research group evaluated run-time thermal management policies in 3D-MPSoCs with RRAM and SRAM L2 caches. They investigated the thermal and performance impact of RRAM and SRAM in 3D-MPSoCs, and also studied the thermal effect of liquid and air cooling. They concluded that RRAM-based caches lowered the overall maximum temperature by 24 K compared to SRAM-based caches in a 3D-MPSoC with inter-tier liquid cooling.
Most of the above works focus on improving the functionality, performance or energy of NVMs. A detailed analysis and comparison of the performance and thermal impact of NVMs has not been addressed so far. In this work, the performance, power and thermal impact of emerging NVMs such as RRAM, PCRAM and STTRAM in 3D-MPSoCs are analyzed and compared with conventional SRAM.
2.4 Summary
The meta-heuristic algorithms used for TSV placement in previous works rely on randomness in the placement of TSVs. In addition, these works use HPWL, which ignores the TSV height and position when calculating interconnect length. Furthermore, a comprehensive analysis of the performance and thermal impact of emerging NVMs in 3D-MPSoCs has not been addressed so far. In this work, a WSA algorithm that replaces the randomness of the SA algorithm with a weight constraint is proposed. A holistic interconnect length calculation that considers the position and height of TSVs is introduced in the cost function of the WSA algorithm. The performance and thermal characteristics of emerging NVMs in 3D-MPSoCs are also analyzed in detail. Furthermore, the thermal impact of liquid cooling in 3D-ICs is analyzed.
Chapter 3
Weight-Based Simulated Annealing (WSA)
The WSA algorithm uses a weight constraint for the placement of TSVs in 3D-ICs. The simulated annealing (SA) algorithm used in previous works [16, 41] places TSVs with a degree of randomness. WSA replaces this randomness with a weight constraint in order to reduce the free space created when TSVs are inserted into the floorplan. This chapter discusses the WSA algorithm proposed in this work for TSV placement.
3.1 Description of WSA Algorithm
The inputs and constraints of the WSA algorithm for the TSV placement problem are described by the following statement.
Problem Statement: The input to the WSA algorithm is a set of macro-blocks $M_i$ with width $W_i$, height $H_i$ and power $P_i$. The macro-blocks are interconnected by the networks $N_j$. The TSVs are identified from the inter-tier networks and grouped together into via groups $V_l$, where a via group represents a cluster of TSVs. The placement of via groups is subject to two constraints: a) via groups must not overlap with each other; and b) via groups must be enclosed within the footprint of the chip. The goal of TSV placement is to find an optimized floorplan with minimum footprint $FP$, interconnect length $L_{wire}$ and power density arrangement $PDA$.
The general flow of WSA is shown in Figure 3.1. The placement of the macro-blocks in the 3D-IC is given as an input to the WSA algorithm. TSVs are identified from the inter-tier networks and grouped together as via groups, which are then placed between the macro-blocks using the WSA algorithm to achieve an optimized floorplan. The WSA algorithm iterates over perturbation of the macro-blocks and weight-based planning of the via groups to achieve a minimum-cost floorplan. In each iteration, the floorplan is evaluated using a cost function $F_{cost}$ after the via groups are placed.
The WSA algorithm keeps track of a temporary and a best solution to narrow down the optimum TSV-based floorplan. The temporary floorplan is updated whenever weight-based TSV planning finds a lower cost. In addition, the temporary floorplan is occasionally updated to a higher-cost solution, with a probability based on the incremental cost, to escape from local minima. The best floorplan stores the lowest-cost solution over all WSA iterations. The WSA algorithm runs for a specified number of iterations and checks whether the current temperature of the floorplan is below the threshold temperature.
The WSA approach can be described as

$$\begin{aligned}
\text{minimize} \quad & F_{best} \\
\text{subject to} \quad & T < T_{threshold} \\
& F_{best} \le (F_1, F_2, \ldots, F_n)
\end{aligned} \tag{3.1}$$

where $T$ and $T_{threshold}$ represent the current and threshold temperatures of the 3D-IC, respectively, $F_{best}$ is the best floorplan, and $F_1, F_2, \ldots, F_n$ are the new floorplans generated during the iterations.
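The accept/reject behaviour described above follows the standard simulated-annealing pattern. The Python sketch below is a minimal illustration under stated assumptions, not the WSA implementation: the floorplan is an abstract state, and `perturb`, `cost` and `peak_temp` are hypothetical caller-supplied callbacks (weight-based via placement would live inside `perturb`). The initial temperature `t0` and the geometric cooling factor `alpha` are assumed values; only the 400-iteration default is taken from the simulation setup.

```python
import math
import random
from typing import Callable, TypeVar

S = TypeVar("S")  # abstract floorplan state (assumption)


def anneal(initial: S,
           perturb: Callable[[S, random.Random], S],
           cost: Callable[[S], float],
           peak_temp: Callable[[S], float],
           t_threshold: float,
           iterations: int = 400,
           t0: float = 1.0,
           alpha: float = 0.98,
           seed: int = 0) -> S:
    """Simulated-annealing skeleton with a temporary and a best solution.

    Downhill moves are always accepted; uphill moves are accepted with
    probability exp(-dC / T) so the search can leave local minima.
    The best solution is only replaced by a cheaper one that also meets
    the thermal constraint T < T_threshold of Eq. (3.1).
    """
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    temp = t0
    for _ in range(iterations):
        candidate = perturb(current, rng)
        cand_cost = cost(candidate)
        delta = cand_cost - current_cost
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, cand_cost  # temporary solution
        if current_cost < best_cost and peak_temp(current) < t_threshold:
            best, best_cost = current, current_cost      # best solution
        temp *= alpha  # geometric cooling schedule (assumed)
    return best
```

As a toy usage, `anneal(0, lambda s, r: s + r.choice((-1, 1)), lambda s: (s - 3) ** 2, lambda s: 0.0, 85.0)` searches integer states for the cost minimum at 3. Note that in this sketch the initial state is returned unchanged if no cheaper state ever satisfies the thermal threshold.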
3.2 Weight-Based TSV Planning
The general flow of weight-based TSV planning is shown in Figure 3.2. Weight-based TSV planning modifies either the aspect ratio or the position of the via groups in the floorplan. The weight used in placing the via groups is calculated from the free space created by changing the position of the via groups.
Figure 3.1: Flowchart of Weight-Based Simulated Annealing (WSA) used for the TSV placement in 3D-ICs
The weight of via group $v_l$ is calculated as

$$W_l = \sum_{j} \left[(x_l - X_j) \cdot (y_l - Y_j)\right] \tag{3.2}$$

where $W_l$ denotes the weight of via group $v_l$, $x_l$ and $y_l$ are the coordinates of the via group, and $X_j$ and $Y_j$ are the coordinates of the pins in the macro-blocks.
Figure 3.2: Flowchart of weight-based planning in WSA algorithm
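Equation (3.2) can be evaluated directly. The sketch below computes $W_l$ for a via group and, purely for illustration, ranks candidate positions by the magnitude of their weight; this selection rule and the helper names `via_group_weight` and `best_position` are assumptions, since the text does not state how candidate positions are compared.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def via_group_weight(via_xy: Point, pins: Iterable[Point]) -> float:
    """W_l from Eq. (3.2): sum over pins j of (x_l - X_j) * (y_l - Y_j)."""
    x_l, y_l = via_xy
    return sum((x_l - X_j) * (y_l - Y_j) for X_j, Y_j in pins)


def best_position(candidates: Iterable[Point], pins: List[Point]) -> Point:
    """Hypothetical selection rule: pick the candidate whose weight has the
    smallest magnitude (the text does not specify this rule)."""
    return min(candidates, key=lambda p: abs(via_group_weight(p, pins)))
```

For a via group at (2, 3) connected to pins at (0, 0) and (1, 1), the weight is 2·3 + 1·2 = 8.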
After the weight constraint is satisfied, WSA checks for overlap between the via groups. Via groups $v_l$ and $v_m$ do not overlap if at least one of the following conditions is satisfied; they overlap only if none of them is satisfied [40]:

$$\begin{aligned}
x_l + w_l &\le x_m, & x_l &\ge x_m + w_m \\
y_l + h_l &\le y_m, & y_l &\ge y_m + h_m
\end{aligned} \tag{3.3}$$

where $w_l$, $w_m$ and $h_l$, $h_m$ represent the widths and heights of via groups $v_l$ and $v_m$, respectively. The via groups are enclosed within the boundary of the chip by the following constraints [40]
$$\begin{aligned}
x_l &\ge 0, & x_l + w_l &\le FP_{width} \\
y_l &\ge 0, & y_l + h_l &\le FP_{height}
\end{aligned} \tag{3.4}$$
where $FP_{width}$ and $FP_{height}$ are the width and height of the 3D-IC footprint, respectively. When all the above constraints are satisfied, the via group positions are updated in all tiers of the 3D-IC. The cost function used to evaluate the placement is discussed in the next section.
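The two legality checks of Equations (3.3) and (3.4) map directly onto rectangle tests. In this sketch (the `ViaGroup` type and the function names are hypothetical), a via group is an axis-aligned rectangle given by its lower-left corner, width and height; two groups that only touch along an edge are treated as non-overlapping, which follows from the non-strict inequalities in Equation (3.3).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ViaGroup:
    x: float  # lower-left corner
    y: float
    w: float  # width
    h: float  # height


def overlaps(a: ViaGroup, b: ViaGroup) -> bool:
    """Eq. (3.3): the groups are disjoint if at least one separating
    condition holds, so they overlap only when none of them holds."""
    separated = (a.x + a.w <= b.x or a.x >= b.x + b.w or
                 a.y + a.h <= b.y or a.y >= b.y + b.h)
    return not separated


def inside_chip(v: ViaGroup, fp_width: float, fp_height: float) -> bool:
    """Eq. (3.4): the via group must lie within the chip footprint."""
    return (v.x >= 0 and v.x + v.w <= fp_width and
            v.y >= 0 and v.y + v.h <= fp_height)


def placement_is_legal(groups: List[ViaGroup], fp_width: float,
                       fp_height: float) -> bool:
    """All groups inside the footprint and no pair overlapping."""
    if not all(inside_chip(g, fp_width, fp_height) for g in groups):
        return False
    return not any(overlaps(groups[i], groups[j])
                   for i in range(len(groups))
                   for j in range(i + 1, len(groups)))
```

The pairwise check is quadratic in the number of via groups, which is acceptable for the small group counts per tier implied by clustering up to 10 TSVs per group.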
3.3 Cost Function
The WSA algorithm uses a cost function to evaluate the TSV-based floorplan:

$$cost = \alpha \cdot FP + \beta \cdot L_{wire} + \gamma \cdot PDA \tag{3.5}$$

where $\alpha$, $\beta$ and $\gamma$ are weighting factors, chosen such that all the parameters contribute equally to the cost function. $L_{wire}$ is the interconnect length, calculated as the sum of the inter-layer and intra-layer wire lengths. $PDA$ denotes the power density arrangement, which promotes a uniform power density distribution across all tiers of the 3D-IC. $FP$ is the footprint of the 3D-IC stack, calculated as

$$FP = W_{max} \cdot H_{max} \tag{3.6}$$

where $W_{max}$ and $H_{max}$ are the maximum width and height among all the tiers, respectively. $PDA$ and $L_{wire}$ are discussed in the following sections.
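Equations (3.5) and (3.6) are simple to evaluate once the three terms are known. The text states that the weights are chosen so that all terms contribute equally but does not give the procedure; the sketch below assumes one common choice, scaling each term by the reciprocal of its value in the initial floorplan so that each term contributes one third of the initial cost. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def footprint(tier_dims: Sequence[Tuple[float, float]]) -> float:
    """Eq. (3.6): FP = W_max * H_max over all tiers given as (width, height)."""
    w_max = max(w for w, _ in tier_dims)
    h_max = max(h for _, h in tier_dims)
    return w_max * h_max


def wsa_cost(fp: float, l_wire: float, pda: float,
             alpha: float, beta: float, gamma: float) -> float:
    """Eq. (3.5): weighted sum of footprint, wire length and PDA."""
    return alpha * fp + beta * l_wire + gamma * pda


def equal_contribution_weights(fp0: float, l0: float,
                               pda0: float) -> Tuple[float, float, float]:
    """Assumed normalization: each term contributes 1/3 of the initial cost."""
    return 1.0 / (3.0 * fp0), 1.0 / (3.0 * l0), 1.0 / (3.0 * pda0)
```

With tiers of 4×5, 6×3 and 5×5, the footprint is 6·5 = 30 because the widest and tallest tiers need not be the same tier.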
3.4 Power Density Arrangement (PDA)
PDA is added to the cost function to achieve a uniform temperature profile through the arrangement of macro-blocks in the 3D-IC stack. The power density arrangement is calculated as

$$PDA = \delta \cdot DA + \varepsilon \cdot HPD + \zeta \cdot CPD \tag{3.7}$$

where $\delta$, $\varepsilon$ and $\zeta$ are weighting factors.
The density arrangement $DA$ describes the arrangement of macro-blocks based on their power density. At the inlet of the heat sink, the coolant is at its lowest temperature, so the temperature difference between the coolant and the macro-blocks is largest there. As the liquid flows through the chip, its temperature rises as it absorbs heat from the tiers. Hence, the coolant absorbs more heat near the inlet than in any other region [53]. $DA$ quantifies the density of high-power macro-blocks near the inlet.
$CPD$ represents the cumulative power distribution. It ensures that high-power functional units are surrounded by low-power functional units to maintain a uniform temperature profile. The tiers of the 3D-IC stack are divided into cubic grid cells, and $CPD$ is the maximum value of the power density grid, where each value is calculated as the sum of the current grid value and its neighbouring grid values.
$HPD$ is the highest power density. It denotes the highest-power cell in the grid, which contributes to the maximum temperature of the 3D-IC stack.
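The grid-based terms of Equation (3.7) can be sketched as follows. Assumptions: each tier's power density is a 2D list of grid values, the neighbourhood used for $CPD$ is the 8-connected (3×3) window, and $DA$ is passed in as a precomputed value because its exact formula is not given here. The example grids contrast a clustered and a spread-out arrangement of the same four high-power cells.

```python
from typing import List

Grid = List[List[float]]


def highest_power_density(grid: Grid) -> float:
    """HPD: the largest single grid value."""
    return max(max(row) for row in grid)


def cumulative_power_distribution(grid: Grid) -> float:
    """CPD: max over cells of (cell value + neighbouring values),
    using an assumed 8-connected (3x3) neighbourhood clipped at the edges."""
    rows, cols = len(grid), len(grid[0])
    best = float("-inf")
    for r in range(rows):
        for c in range(cols):
            window_sum = sum(grid[rr][cc]
                             for rr in range(max(0, r - 1), min(rows, r + 2))
                             for cc in range(max(0, c - 1), min(cols, c + 2)))
            best = max(best, window_sum)
    return best


def pda(da: float, hpd: float, cpd: float,
        delta: float, epsilon: float, zeta: float) -> float:
    """Eq. (3.7)."""
    return delta * da + epsilon * hpd + zeta * cpd


# Four high-power cells, clustered in a corner vs. spread to the corners.
CLUSTERED = [[5.0, 5.0, 0.0, 0.0],
             [5.0, 5.0, 0.0, 0.0],
             [0.0, 0.0, 0.0, 0.0],
             [0.0, 0.0, 0.0, 0.0]]
SPREAD = [[5.0, 0.0, 0.0, 5.0],
          [0.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0, 0.0],
          [5.0, 0.0, 0.0, 5.0]]
```

For both grids $HPD$ is 5, but $CPD$ is 20 for the clustered arrangement and 5 for the spread one, which is how the term penalizes placing high-power units next to each other.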
3.5 Interconnect Length (Lwire)
Prior work [29, 16, 43, 11] on TSV placement used half-perimeter wire length (HPWL) to calculate the interconnect length in 3D-ICs. The interconnect length calculation using HPWL is shown in Figure 3.3(a). HPWL is the sum of the width and height of the bounding box enclosing the farthest-placed macro-blocks in the tier. HPWL does not consider the impact of TSV position and height when calculating the interconnect length. As a result, the interconnect length estimated from the placement of macro-blocks in the floorplanning stage may underestimate the actual interconnect length.
Figure 3.3: Interconnect Length Calculation using (a) HPWL (b) TSV height and TSV to pins (TSV-PH)
network to the TSVs. Furthermore, the interconnect length also accounts for the height of the TSVs between the tiers of the 3D-IC. The interconnect length calculation with TSV-PH is given by

$$\ldots \left[(x_l - P_{xn}) + (y_l - P_{yn})\right] \tag{3.8}$$
where $l$, $m$ and $n$ index the TSVs, the tier position of the TSV and the pins connected to the TSV, respectively. $k$ and $q$ are the layers interconnected by the TSV, and $P_{xn}$ and $P_{yn}$ are the coordinates of the pins connected to the TSV.
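The difference between HPWL and a TSV-aware length is easy to see on a two-pin net whose pins sit on different tiers. The sketch below is an illustration in the spirit of TSV-PH, not a reproduction of Equation (3.8): it assumes absolute (Manhattan) distances from the TSV to each pin and a uniform vertical pitch per tier crossed, and the function names are hypothetical.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def hpwl(pins: Iterable[Point]) -> float:
    """Half-perimeter of the bounding box enclosing a net's pins."""
    xs, ys = zip(*pins)
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def tsv_ph_length(tsv_xy: Point, pins: Iterable[Point],
                  tier_from: int, tier_to: int, tier_pitch: float) -> float:
    """Illustrative TSV-aware length for one TSV: Manhattan distance from the
    TSV to each connected pin plus the vertical TSV height. Absolute values
    and a uniform tier pitch are assumptions."""
    x_l, y_l = tsv_xy
    planar = sum(abs(x_l - px) + abs(y_l - py) for px, py in pins)
    height = abs(tier_to - tier_from) * tier_pitch
    return planar + height
```

For pins at (0, 0) and (10, 0) on tiers 0 and 1, HPWL is 10 wherever the TSV is placed, whereas `tsv_ph_length` with a 50 µm tier pitch gives 60 for a TSV at (5, 0) and 140 for a TSV at (50, 0).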
3.6 TSV Rearrangement
The interconnect length of the optimized floorplan is further reduced by interchanging TSVs between via groups. Algorithm 2 shows the pseudo code for TSV rearrangement.
The interconnect length is calculated for each TSV with respect to all the via groups connected to the same tiers. Based on this estimate, the TSVs are compared between the via groups. If a lower interconnect length is identified for TSVs connected to the same tiers, the TSVs are rearranged between the via groups.
Algorithm 2: Pseudo code for TSV rearrangement used in the optimized 3D-IC floorplan
for all i in TSVs do
    for all j in via groups do
        ...
for all i in TSVs do
    for all j in via groups do
        if (i not in j) and (i and j connect the same tiers) then
            ...
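The rearrangement step can be sketched as greedy pairwise swapping. This is a hypothetical reconstruction from the description above, not the code behind Algorithm 2: a swap exchanges two TSVs that connect the same pair of tiers but sit in different via groups, and it is kept only if their combined Manhattan distance to their pins decreases. Swapping, rather than moving, keeps every via group at its original size. All names are illustrative.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def planar_length(pos: Point, pins: List[Point]) -> float:
    """Manhattan distance from a via-group position to each pin of a TSV."""
    return sum(abs(pos[0] - px) + abs(pos[1] - py) for px, py in pins)


def rearrange(assign: Dict[str, str],
              group_pos: Dict[str, Point],
              tsv_pins: Dict[str, List[Point]],
              tsv_tiers: Dict[str, Tuple[int, int]]) -> Dict[str, str]:
    """Greedy pairwise TSV swapping between via groups.

    assign    : TSV name -> via-group name (initial assignment)
    group_pos : via-group name -> (x, y) position in the floorplan
    tsv_pins  : TSV name -> pins it connects
    tsv_tiers : TSV name -> (k, q) pair of tiers it connects
    Returns the improved TSV -> via-group assignment.
    """
    where = dict(assign)
    improved = True
    while improved:  # terminates: every accepted swap strictly lowers the total
        improved = False
        for a in where:
            for b in where:
                ga, gb = where[a], where[b]
                # Only exchange TSVs in different groups that link the same tiers.
                if ga == gb or tsv_tiers[a] != tsv_tiers[b]:
                    continue
                before = (planar_length(group_pos[ga], tsv_pins[a]) +
                          planar_length(group_pos[gb], tsv_pins[b]))
                after = (planar_length(group_pos[gb], tsv_pins[a]) +
                         planar_length(group_pos[ga], tsv_pins[b]))
                if after < before:
                    where[a], where[b] = gb, ga
                    improved = True
    return where
```

In the test case, two TSVs between tiers 0 and 1 start in the group far from their pins and are swapped, while a TSV between tiers 1 and 2 is never considered for exchange with them.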
3.7 Summary
This chapter presents the WSA algorithm for thermal-aware placement of TSVs. The WSA algorithm produces an optimized TSV placement with minimized interconnect length, area and temperature. A power density arrangement is used in the cost function of the WSA algorithm to promote a uniform temperature profile in 3D-ICs. A new interconnect length calculation that takes into account the TSV position and TSV height is introduced in




Chapter 4
Design-Time and Run-Time Simulation of 3D-ICs
This chapter presents the design-time and run-time thermal simulation frameworks developed for investigating different 3D-IC configurations, together with the parameters and assumptions of the 3D-ICs used in these frameworks.
4.1 Design-time Simulation of Thermal-aware Placement of TSVs
The design-time framework incorporates the WSA algorithm for TSV placement in 3D-ICs with inter-tier liquid cooling. The framework and simulation methodology used for TSV placement are discussed in this section.
4.1.1 Framework
The general flow of the design-time TSV placement framework is shown in Figure 4.1. The framework (developed in C++) integrates a customized floorplanning tool (3DFP) [29] and a thermal simulator (3D-ICE) [60]. It uses the macro-blocks, TSV definitions, netlist and design information to generate an optimized TSV-placed floorplan for 3D-ICs. The initial placement of macro-blocks is obtained in 3DFP using a multi-objective cost function that optimizes area and interconnect length in the 3D-IC stack. 3DFP uses the SA algorithm for floorplanning and a B-tree algorithm for enclosing the macro-blocks in the 3D-IC stack. 3DFP is modified to incorporate the WSA algorithm for TSV placement, and a TSV rearrangement module is added as a wrapper around 3DFP to rearrange TSVs between via groups in the optimized floorplan. In addition, 3D-ICE is integrated into 3DFP as a customized software thermal library for thermal analysis of the optimized 3D-IC floorplan.


















Figure 4.1: Design-time TSV placement framework
The parameters used in the thermal analysis are listed in Table 4.1. The floorplan and stack configuration of the 3D-IC generated by 3DFP are given as inputs to 3D-ICE for thermal analysis. 3D-ICE uses a compact transient thermal model (CTTM) to simulate the thermal profile of 3D-ICs and solves the resulting system of equations using SuperLU [20], a sparse linear system solver.

Table 4.1: Floorplan and thermal analysis parameters used in the simulation of 3D-ICs
Parameter                           Value
TSV size                            20 µm × 20 µm
Number of layers                    3
Silicon thickness                   50 µm
Layer thickness                     2 µm
Pin-fin cavity height               100 µm
Pin-fin diameter                    50 µm
Pin-fin pitch                       100 µm
Silicon thermal conductivity        130 W/m·K
Coolant volumetric heat capacity    4.172 MJ/m³·K
Coolant incoming temperature        300 K
Pin distribution                    inline
Average coolant Darcy velocity      1.1066 m/s

The CTTM for the solids and liquids used in the simulation is shown in Figure 4.2(a) and 4.2(b) [60], respectively. The six resistances of each thermal cell represent heat conduction in their respective directions, and the capacitance represents the heat stored in the thermal cell [60]. The temperature-controlled heat source in the liquid thermal cell models the convective heat transfer to the liquid flowing between the tiers of the 3D-IC.
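The resistor network behind a thermal-cell model can be illustrated with a much simpler steady-state case. The sketch below is not the 3D-ICE CTTM: each cell has four lateral conductances instead of the six of a 3D cell, the capacitances are dropped because only the steady state is solved, and the liquid is reduced to a fixed-temperature sink attached to every cell. A dense `numpy.linalg.solve` keeps the example short; at realistic grid sizes a sparse factorization such as SciPy's `scipy.sparse.linalg.splu`, which wraps SuperLU, would be the natural choice. The conductance values are hypothetical.

```python
import numpy as np


def steady_state_temperatures(power: np.ndarray, g_lat: float,
                              g_cool: float, t_cool: float) -> np.ndarray:
    """Solve G @ T = b for one 2D tier of thermal cells (steady state).

    power  : (ny, nx) array of heat injected into each cell [W]
    g_lat  : conductance between laterally adjacent cells [W/K]
    g_cool : conductance from each cell to the coolant [W/K]
    t_cool : coolant temperature [K]
    """
    ny, nx = power.shape
    n = ny * nx
    G = np.zeros((n, n))
    # Right-hand side: injected power plus the coolant boundary term.
    b = power.astype(float).ravel() + g_cool * t_cool

    def idx(r: int, c: int) -> int:
        return r * nx + c

    for r in range(ny):
        for c in range(nx):
            i = idx(r, c)
            G[i, i] += g_cool  # convective path to the liquid
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < ny and 0 <= cc < nx:
                    # Lateral conduction between cell i and its neighbour.
                    G[i, i] += g_lat
                    G[i, idx(rr, cc)] -= g_lat
    return np.linalg.solve(G, b).reshape(ny, nx)
```

With a uniform 1 mW per cell and a 0.1 mW/K path to a 300 K coolant, every cell settles at 310 K; raising the power of the centre cell makes it the hottest cell, which is the behaviour the HPD and CPD terms of the cost function try to avoid.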
4.1.2 Simulation Methodology
The 3-tier 3D-IC configuration used for TSV placement is shown in Figure 4.3. The architecture was tested using the MCNC’91 (ami33, ami49, hp and xerox) and GSRC (ncpu and ncpu2) benchmarks. The initial placement of the macro-blocks is obtained using the SA algorithm with operations such as swap, move, rotate, inter-layer swap and inter-layer move. This placement is given as an input to the WSA algorithm for TSV placement. The TSVs are identified from the inter-tier networks of the initial placement and grouped into via groups of at most 10 TSVs each.
Figure 4.2: Compact Transient Thermal Model (CTTM) (a) solid thermal cell (b) liquid thermal cell
The via groups are placed in the floorplan using the WSA algorithm discussed in the previous chapter. The WSA algorithm uses a maximum of 15 perturbation moves and 400 iterations for the optimized placement of TSVs.
Thermal analysis of the 3D-IC floorplan is performed after every fixed number of WSA iterations. The stack configuration used for the thermal analysis of 3D-ICs is shown in Figure 4.3. Steady-state analysis is performed on the 3D-IC with enhanced cavity pin-fin liquid cooling. The initial temperature of the thermal simulation is set to 300 K, and a thermal cell of 20 µm × 20 µm is used in the thermal analysis.
The area, interconnect length and temperature of the 3D-ICs obtained from the simulations are analyzed to investigate the efficiency of the WSA algorithm. Additionally, the thermal impact of liquid cooling in 3D-ICs with TSV placement is analyzed. MCNC’91 and










Figure 4.3: Stack configuration of 3-tier 3D-IC used in TSV placement
4.1.3 MCNC’91 and GSRC Benchmarks
The Microelectronics Center of North Carolina (MCNC) benchmark suite [35] was published at the MCNC’91 workshop on logic synthesis. The MCNC’91 benchmarks (ami33, ami49, hp and xerox) are collected from industry and range from simple to advanced circuits. The Gigascale Systems Research Center (GSRC) released similar CPU benchmarks (ncpu and ncpu2), which differ in macro-block size and power consumption. Both the MCNC’91 and GSRC benchmarks contain the definitions of the macro-blocks, a description of the pins connecting the macro-blocks, and the networks between the macro-blocks. These benchmarks are given as inputs to the 3DFP tool in YAL file format. The macro-block and netlist details of the MCNC’91 and GSRC benchmarks are listed in Table 4.2.
The MCNC’91 and GSRC benchmarks are used in the design-time TSV placement framework to achieve an optimized 3D-IC floorplan. In addition to TSV placement, emerging NVMs are integrated into the 3D-IC design to reduce the temperature. The investigation of these NVMs is discussed in the next section.








4.2 Investigation of Thermal Performance of NVMs in 3D-MPSoCs
This section presents the characteristics of the different NVMs used in this work, as well as the architecture, framework and simulation methodology used for their investigation.
4.2.1 Characteristics of NVMs
Table 4.3 summarizes the characteristics of NVMs based on data collected from [69, 21, 45, 25, 48, 46, 14, 12]. RRAM provides high storage capacity due to its small cell size. In addition, RRAM offers a low operating voltage and multi-level cell (MLC) storage. Macronix demonstrated 3-bit/cell and 2-bit/cell RRAM operation at 40 nm [15], achieving an endurance of $10^{3}$ cycles with an operating voltage of 0.4 V. In addition, RRAM has a simple metal-insulator-metal structure that is easier to fabricate than the multiple magnetic layers of STTRAM or the combination of differently sized components, such as heaters, in PCRAM [10]. Unity Semiconductor demonstrated the largest test array, a 0.13 µm 64 MB multi-layered RRAM using conductive metal oxide technology [10, 13]. Furthermore, RRAM can withstand very high temperatures. For instance, Lee et al. [36] demonstrated RRAM withstanding temperatures up to 200°C, with an endurance of more than $10^{6}$ cycles and 10 years of data retention at 200°C. However, RRAM has limited endurance ($10^{5}$-$10^{10}$ cycles).
PCRAM has low static power and supports MLC storage. IBM, Qimonda and Macronix jointly demonstrated 4-bit/cell and 2-bit/cell PCRAM operation [48]. They tested a 10×10 array of 4-bit/cell for $3\times10^{9}$ read cycles and a 32 kB page of 2-bit/cell for $10^{9}$ read cycles. In addition, Samsung demonstrated a 512 MB PCRAM array with a write endurance of $10^{5}$ cycles and a data retention time of 10 years at 85°C [49]. The thermal performance of PCRAM is limited by its crystallization temperature, since switching is achieved by heating the PCM [10]. For instance, Pellizzer et al. [52] demonstrated PCRAM thermal performance of 110°C with 10 years of data retention.

Table 4.3: Physical Characteristics of NVMs
Property                    RRAM                  PCRAM                STTRAM
1T1R cell size              20F²                  36F²                 50.67F²
Density                     high                  medium               low
Static power                very low              very low             very low
Dynamic power               low                   high                 very high
Endurance                   $10^{5}$-$10^{10}$    $10^{5}$-$10^{9}$    > $10^{16}$
Retention (at 85°C)         10+ years             10+ years            10+ years
Multi-level cell storage    3-bit/cell            4-bit/cell           2-bit/cell
Thermal performance         high                  low                  medium
STTRAM has a very high endurance of more than $10^{16}$ cycles compared to RRAM and PCRAM. In addition, STTRAM has low static power and supports MLC storage. For instance, Seagate Technology demonstrated 2-bit/cell STTRAM operation using an MgO-based MTJ [46]. Furthermore, STTRAM has a faster access time than RRAM and PCRAM [23]. Hitachi and Tohoku University jointly demonstrated a 1.8 V, 0.2 µm, 2 Mb STTRAM using an MgO tunneling barrier, with a cell efficiency of 40% [31]. Moreover, STTRAM has good thermal performance among the NVMs. For instance, Ono et al. [50] demonstrated STTRAM withstanding a temperature of 150°C with 10 years of data retention. However, STTRAM has a large cell size, which leads to lower storage capacity than RRAM and PCRAM.
Although PCRAM and STTRAM have low leakage power, their dynamic power consumption is high. PCRAM consumes high dynamic energy due to the high current required to switch between the crystalline and amorphous states [9]. In addition, PCRAM requires a minimum distance between cells to avoid undesirable heating from neighbouring cells (thermal crosstalk). STTRAM, in turn, requires a high current to change the magnetization direction of the free layer during a write operation [72]. On the other hand, RRAM consumes very low dynamic energy compared to PCRAM and STTRAM [50]. Despite these challenges, advantages such as low static power, better thermal performance than SRAM and the possibility of MLC storage make NVMs a thermally efficient replacement for SRAM in this work. The architecture used in this work to integrate these NVMs into 3D-MPSoCs is discussed in the following section.
4.2.2 Architecture of NVMs
NVMs such as PCRAM, STTRAM and RRAM use a one-transistor one-resistor (1T1R) structure for a one-bit memory cell. The generic 1T1R architecture used for the NVM L2 cache block is shown in Figure 4.4. Each one-bit NVM cell is composed of a single NMOS transistor and a single NVM element.
The one-bit cell sizes of the 6T SRAM and 1T1R NVMs used in this work are listed in Table 4.4. RRAM has the smallest 1T1R cell size of 20F², a ∼7X improvement over the 146F² SRAM cell, while PCRAM and STTRAM offer ∼4X and ∼3X improvements over SRAM, respectively. The small cell sizes of these NVMs make it possible to integrate a higher cache density (compared to SRAM) in 3D-ICs.
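The density improvements quoted above follow from the 1T1R cell areas in Table 4.3 and the 146F² SRAM cell; the short calculation below reproduces them as area ratios.

```python
SRAM_6T_F2 = 146.0  # 6T SRAM cell area in F^2
NVM_1T1R_F2 = {"RRAM": 20.0, "PCRAM": 36.0, "STTRAM": 50.67}


def density_gain(cell_f2: float, baseline_f2: float = SRAM_6T_F2) -> float:
    """Number of NVM cells that fit in the area of one 6T SRAM cell."""
    return baseline_f2 / cell_f2


gains = {name: round(density_gain(area), 1) for name, area in NVM_1T1R_F2.items()}
print(gains)  # {'RRAM': 7.3, 'PCRAM': 4.1, 'STTRAM': 2.9}
```

The ratios of about 7.3X, 4.1X and 2.9X correspond to the ∼7X, ∼4X and ∼3X figures above. They describe raw cell area only; peripheral circuitry such as sense amplifiers and decoders is not included.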
The L2 cache is organized in three hierarchical levels: bank, mat and sub-array. The organization of the L2 cache is shown in Figure 4.5 [25]. The bank is the top-most memory











































Memory     Cell size
6T SRAM    146F²
RRAM       20F²
PCRAM      36F²
STTRAM     ∼51F²
Table 4.4: Cell size of SRAM and 1T1R NVM cell
blocks. The mat blocks are interconnected through an H-tree structure. Each mat block consists of multiple sub-arrays, and each sub-array consists of a 1T1R NVM array (as shown in Figure 4.4) and peripheral circuitry such as input/output drivers, a row decoder, a column multiplexer and sense amplifiers.
The read and write operations differ among the NVMs. During both read and write operations, the word line (WL) of the selected memory cell is held high, as shown in Figure 4.4. The following sections describe the read and write operations of each NVM.
Figure 4.5: Generic NVM L2 Cache Organization adapted from [25]
Read and Write Operation of RRAM
An RRAM 1T1R cell stores logic '0' and logic '1' in the low-resistance state (LRS) and high-resistance state (HRS), respectively [22]. During a read operation, a small voltage is applied on the bit line (BL) of the 1T1R RRAM cell. The read voltage is kept small to ensure that the read operation does not disturb the value stored in the cell. A current or voltage sense amplifier detects the state of the RRAM cell [6].
During a write operation, a positive voltage is applied on the BL to set the RRAM cell to the high-resistance state, whereas a negative voltage is applied on the BL to set it to the low-resistance state [47].
Read and Write Operation of PCRAM
PCRAM represents logic '1' and logic '0' by the crystalline and amorphous states of the PCM, respectively [8]. A read operation applies a small positive voltage on the BL, and a current sense amplifier senses the data in the PCRAM cell. To switch between the crystalline and amorphous states, the PCM is heated by passing an electric current through it. To write the amorphous state, a high current is passed to heat the PCM above its melting temperature, after which it is quenched abruptly. To write the crystalline state, a moderate current is passed for a longer duration to heat the PCM above its crystallization temperature. The temperature and pulse duration for the crystalline and amorphous states are shown in Figure 4.6 [25].
Figure 4.6: PCRAM SET and RESET operation adapted from [25]
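The SET/RESET conditions above can be summarized as a small state-transition rule. This is a qualitative sketch, not a device model: the melting temperature, crystallization temperature and minimum SET duration are assumed, roughly GST-like values chosen only for illustration, and pulses that meet neither condition (such as a read) are treated as leaving the state unchanged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pulse:
    peak_temp_c: float  # peak PCM temperature reached during the pulse
    duration_ns: float  # time spent near the peak temperature
    quench: bool        # abrupt cool-down after the pulse


# Illustrative thresholds (assumed values, not taken from this work).
T_MELT_C = 600.0
T_CRYST_C = 150.0
MIN_SET_NS = 100.0

# Logic encoding used in the text: crystalline -> '1', amorphous -> '0'.
PHASE_TO_BIT = {"crystalline": 1, "amorphous": 0}


def resulting_phase(p: Pulse, current: str) -> str:
    """Phase of the PCM after applying pulse p to a cell in phase `current`."""
    if p.peak_temp_c >= T_MELT_C and p.quench:
        return "amorphous"    # RESET: melt, then quench
    if T_CRYST_C <= p.peak_temp_c < T_MELT_C and p.duration_ns >= MIN_SET_NS:
        return "crystalline"  # SET: hold above T_cryst long enough to crystallize
    return current            # other pulses (e.g. a read) leave the state unchanged
```

A short, hot, quenched pulse therefore writes logic '0', while a longer, cooler pulse writes logic '1', matching the high-current RESET and moderate-current SET operations described above.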
Read and Write Operations of STTRAM
STTRAM uses the parallel and anti-parallel states to represent logic '0' and logic '1', respectively [3]. A read operation applies a very small negative voltage across the BL and the source line (SL). The current through the MTJ is determined by its resistance, and a current sense amplifier senses this current to identify the state of the MTJ. During a write operation, a positive or negative voltage is applied on the BL, depending on the value to be written, and the resulting current through the cell writes the data.
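All three NVMs are read the same way at the circuit level: a small read voltage produces a current that depends on the cell resistance, and a current sense amplifier compares it with a reference. The sketch below captures that shared logic together with the per-technology bit encodings described above. The resistance, voltage and reference-current values in the test are hypothetical, the access transistor is modeled as a fixed series resistance, and only the magnitude of the (possibly negative) read voltage is used.

```python
# Logic value stored in the low-resistance state of each technology,
# following the encodings described above.
LOW_R_BIT = {
    "RRAM": 0,    # LRS -> '0', HRS -> '1'
    "PCRAM": 1,   # crystalline (low R) -> '1', amorphous (high R) -> '0'
    "STTRAM": 0,  # parallel (low R) -> '0', anti-parallel (high R) -> '1'
}


def sense(tech: str, r_cell: float, v_read: float, r_access: float,
          i_ref: float) -> int:
    """Current-sense read of a 1T1R cell.

    The read voltage drives I = |V| / (R_cell + R_access) through the cell and
    its access transistor; a current above i_ref indicates low resistance.
    """
    i_read = abs(v_read) / (r_cell + r_access)
    low_resistance = i_read > i_ref
    return LOW_R_BIT[tech] if low_resistance else 1 - LOW_R_BIT[tech]
```

Keeping the encoding in a table makes the difference between the technologies explicit: the same sensed "low resistance" result means '0' for RRAM and STTRAM but '1' for PCRAM.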
A run-time simulation framework that supports the above NVM L2 cache architectures in 3D-MPSoCs is developed in this work. The following section discusses this framework in detail.
4.2.3 Framework
The run-time simulation framework integrates different NVMs as the L2 cache in 3D-MPSoCs. The framework uses the architectural description and benchmark characteristics as inputs to simulate the power, performance and thermal profiles of 3D-MPSoCs. The general flow of the run-time thermal and performance simulation framework developed in this work is shown in Figure 4.7. Gem5 [5] is used as the architectural simulation platform for the proposed 3D-MPSoC architecture. The architectural configuration and cache access latencies used in the simulation are listed in Table 4.5. The cache access latencies of SRAM and NVMs are generated using the performance and energy simulator NVSim [25].

Figure 4.7: Design-flow of Run time simulation framework used for the investigation of emerging NVMs in
3D-MPSoCs
The McPAT power simulator [39] models dynamic power, sub-threshold leakage and gate leakage power for the 3D-MPSoC architecture. The NVM power tracer calculates the NVM L2 cache power using the power values obtained from McPAT and NVSim.
3D-ICE uses its compact transient thermal model (CTTM) to generate the thermal profile of 3D-MPSoCs with inter-tier liquid cooling. The properties of the liquid coolant and the stack materials used for the thermal simulation are shown in Table 4.6. The entire flow of the run-time simulation framework is controlled through a Python script.
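The text does not give the formula used by the NVM power tracer. A minimal sketch of the kind of calculation such a tracer could perform is shown below, assuming that average L2 power is the leakage power plus the dynamic energy of all reads and writes spread over the execution time. The class and function names and all numbers are hypothetical; per-access energies and leakage would come from a circuit model such as NVSim, and access counts from the architectural simulation.

```python
from dataclasses import dataclass


@dataclass
class CacheEnergyModel:
    """Per-access energies and leakage (hypothetical values for illustration)."""
    read_energy_nj: float   # dynamic energy per read access (nJ)
    write_energy_nj: float  # dynamic energy per write access (nJ)
    leakage_mw: float       # static (leakage) power of the whole cache (mW)


def l2_average_power_w(model: CacheEnergyModel, reads: int, writes: int,
                       exec_time_s: float) -> float:
    """Average L2 power = leakage + (reads*E_read + writes*E_write) / execution time."""
    dynamic_energy_j = (reads * model.read_energy_nj
                        + writes * model.write_energy_nj) * 1e-9
    return model.leakage_mw * 1e-3 + dynamic_energy_j / exec_time_s


def merge_power_map(core_power_w: dict, l2_power_w: dict) -> dict:
    """Combine per-block powers (W) into one map keyed by floorplan block name."""
    power = dict(core_power_w)  # e.g. core powers from a McPAT-style model
    power.update(l2_power_w)    # L2 cache blocks computed with l2_average_power_w
    return power


model = CacheEnergyModel(read_energy_nj=1.0, write_energy_nj=2.0, leakage_mw=10.0)
p_l2 = l2_average_power_w(model, reads=1_000_000, writes=500_000, exec_time_s=0.1)
# 10 mW leakage + (2 mJ of dynamic energy / 0.1 s) = 0.01 W + 0.02 W
print(merge_power_map({"Core0": 2.5}, {"Cache0": p_l2}))
```

Because write energy enters as a separate term, this form also shows why caches with expensive writes are sensitive to the number of write accesses in a benchmark.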
Parameters                    Value
Number of cores               8, 16
Voltage                       1.25 V
L1 instruction cache          16 KB
L1 data cache                 8 KB
L1 I and D cache line size    64 B
Clock                         1.4 GHz
L2 cache size                 4 MB (8-core), 8 MB (16-core)
L2 cache access latency       33.05 ns (SRAM), 72.4 ns (RRAM), 61.4 ns (STTRAM), 77.31 ns (PCRAM)
L2 cache line size            64 B
Issue width                   2
Issue                         Out of order
Functional units              2 IntAlu, 2 IntMulDiv, 1 FpAlu, 1 FpMultDiv and 1 load/store
Main memory                   1 GB SDRAM
Table 4.5: Architectural description of 3D MPSoC used for the investigation of emerging NVMs
Parameters                                 Value
Silicon thickness                          50 µm
Layer thickness                            2 µm
Metal layer thickness                      10 µm
Heat transfer coefficient (air cooling)    10^-7 W/(µm²·K)
Liquid coolant flow rate                   42 ml/min
Liquid coolant incoming temperature        300 K
Liquid coolant volumetric heat capacity    4.172 J/(cm³·K)
Table 4.6: Parameters used in thermal simulation of emerging NVMs in 3D-MPSoC
4.2.4 Simulation Methodology
The architecture used in this work is loosely based on 65 nm UltraSPARC T2 processor.
The 2-tier and 4-tier configurations are used for thermal analysis of NVMs in 3D-MPSoCs.
The 2-tier configuration which has 8-cores and 4 L2 cache blocks is shown in Figure 4.8.
While, the 4-tier configuration (with 16-cores and 8 L2 cache blocks) is assembled in a sim-
ilar manner. The multi-threaded PARSEC benchmark is used as the target workload for the
3D-MPSoC architecture. The eight-threaded out-of-order CPU model is used for the full
system simulation (Linux OS) of the target workload in 3D-MPSoC architecture. A 4 MB
cache block is used for SRAM and NVMs for the purpose of comparison in 3D-MPSoCs.
41
In addition, equivalent high density NVM L2 cache size (32 MB RRAM, 16 MB PCRAM
and 16 MB STTRAM) relative to 4 MB SRAM L2 cache are also studied.
[Figure 4.8 floorplan labels: Cache 2, Cache 3; Core 0, Core 1, Core 4, Core 5; Core 2, Core 3, Core 6, Core 7; Crossbar]
Figure 4.8: Stack configuration of 3D-IC used for the investigation of thermal impact of emerging NVMs
from the simulation of NVSim. The configuration parameters used in the NVSim simulation are derived from [21, 45, 69]. The power values calculated by the NVM power tracer are merged with the floorplan (shown in Figure 4.8) and given as an input to the thermal simulation. 3D-ICE generates the thermal profile of 3D-MPSoCs using the stack configuration shown in Figure 4.8. Micro-channel liquid cooling is chosen because it scales with the number of tiers in 3D-MPSoCs.
The efficiency of the NVM L2 caches, which have high latency but low power, is evaluated using the energy delay product (EDP). In addition, the temperature profiles of these NVM L2 caches are analyzed and compared to study their overall thermal impact in 3D-MPSoCs.
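The EDP comparison can be made concrete with a short calculation. This sketch assumes EDP is computed as energy multiplied by delay, with energy equal to average power times execution time; the text does not specify exactly which energy and delay terms were used, and the numbers below are hypothetical, not results from this work.

```python
def energy_delay_product(avg_power_w: float, exec_time_s: float) -> float:
    """EDP = energy x delay, with energy = average power x execution time (J*s)."""
    energy_j = avg_power_w * exec_time_s
    return energy_j * exec_time_s


def percent_lower(edp_a: float, edp_b: float) -> float:
    """How much lower edp_a is than edp_b, as a percentage of edp_b."""
    return (edp_b - edp_a) / edp_b * 100.0


# Hypothetical numbers: a slower but lower-power cache can still win on EDP.
fast_leaky = energy_delay_product(avg_power_w=1.0, exec_time_s=1.0)   # 1.0 J*s
slow_frugal = energy_delay_product(avg_power_w=0.5, exec_time_s=1.2)  # about 0.72 J*s
print(round(percent_lower(slow_frugal, fast_leaky), 1))  # 28.0
```

Because delay enters EDP quadratically when energy is itself power times time, a higher-latency NVM only wins if its power saving outweighs the squared slowdown, which is the trade-off the EDP metric is meant to capture.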




The multi-threaded PARSEC benchmark suite [4] was jointly developed by Intel and Princeton University. The PARSEC benchmarks are derived from real-world applications in domains such as computer vision, engineering and animation. The PARSEC benchmarks are cross-compiled and executed on Linux using gem5 on the proposed architecture. PARSEC offers six input sizes: test, simdev, simsmall, simmedium, simlarge and native. The simsmall input is used in this work as it is suited for microarchitectural studies. The parallelization, working set, application domain and data sharing of the PARSEC benchmarks used in this work are shown in Table 4.7.




Benchmark      Application        Parallelization   Working set   Data sharing
bodytrack      Computer vision    data-parallel     medium        high
fluidanimate   Animation          data-parallel     large         low
ferret         Similarity search  pipeline          unbounded     high
facesim        Animation          data-parallel     large         low
freqmine       Data mining        data-parallel     unbounded     medium
Table 4.7: PARSEC benchmarks used for the investigation of emerging NVMs in 3D-MPSoCs
4.3 Summary
The frameworks used for the simulation of different 3D-IC configurations are presented
in this chapter. The design-time TSV placement framework incorporates the WSA algorithm to achieve an optimized floorplan with placed TSVs. In addition, the run-time simulation framework is




This chapter presents a detailed discussion of the results obtained by running the WSA-based TSV placement optimization on different benchmarks. In addition, the thermal profiles of emerging NVMs used as L2 caches are studied extensively while running different PARSEC benchmarks.
5.1 Design-Time Optimization of TSV Placement
The WSA algorithm is simulated and tested for TSV floorplanning using the simulation framework presented in the previous chapter. This section presents the impact of TSV placement using the WSA algorithm on interconnect length, area and temperature. All results for the MCNC’91 and GSRC benchmarks are averaged over 50 independent simulations.
The WSA algorithm replaces the randomness in the SA algorithm with a weight constraint. The area of the 3D-IC floorplan achieved using the WSA and SA algorithms is compared in Figure 5.1. The WSA algorithm is more efficient than the SA algorithm in optimizing the area of 3D-ICs, reducing the area by 2% to 16% across the benchmarks. The decrease in area using the WSA algorithm depends on the number of TSVs and the free space created while inserting the TSVs. Furthermore, the reduction in area scales with the size and number of macro-blocks in 3D-ICs.
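For reference, the classical SA baseline that WSA modifies can be sketched as below. The text does not specify how WSA's weight constraint replaces SA's randomness, so this shows only standard simulated annealing: the Metropolis acceptance rule with geometric cooling. The weighted-sum cost over area, wirelength and temperature, the weights, and the cooling parameters are assumptions for illustration, not the thesis's WSA cost function or schedule.

```python
import math
import random


def weighted_cost(area: float, wirelength: float, temperature: float,
                  w_area: float = 1.0, w_wl: float = 1.0, w_temp: float = 1.0) -> float:
    """Assumed weighted-sum floorplan cost over normalized objectives."""
    return w_area * area + w_wl * wirelength + w_temp * temperature


def metropolis_accept(delta_cost: float, temperature: float, rng: random.Random) -> bool:
    """Classical SA rule: accept improvements; accept worse moves with prob exp(-delta/T)."""
    if delta_cost <= 0:
        return True
    return rng.random() < math.exp(-delta_cost / temperature)


def anneal(initial, neighbor, evaluate, t0=1.0, alpha=0.95,
           moves_per_temp=100, t_min=1e-3, seed=0):
    """Generic SA loop with geometric cooling; returns the best solution seen."""
    rng = random.Random(seed)
    current, current_cost = initial, evaluate(initial)
    best, best_cost = current, current_cost
    t = t0
    while t > t_min:
        for _ in range(moves_per_temp):
            candidate = neighbor(current, rng)
            candidate_cost = evaluate(candidate)
            if metropolis_accept(candidate_cost - current_cost, t, rng):
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= alpha
    return best, best_cost


# Toy usage: minimize (x - 3)^2 over the integers with +/-1 moves.
best_x, best_c = anneal(0, lambda x, rng: x + rng.choice((-1, 1)), lambda x: (x - 3) ** 2)
print(best_x, best_c)  # 3 0
```

In a floorplanner, `neighbor` would perturb a floorplan representation (for example, swapping or moving macro-blocks and via groups) and `evaluate` would call a cost such as `weighted_cost`; the random acceptance of uphill moves is the element the text says WSA replaces.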
Figure 5.1: Comparison of area of 3D-ICs obtained using SA and WSA algorithm in the TSV placement
A new interconnect length calculation (TSV-PH) is introduced in the cost function of the WSA algorithm for TSV placement. Figure 5.2 shows the difference between the interconnect lengths calculated using HPWL and TSV-PH. In general, TSV-PH improved the estimation of interconnect length across the MCNC’91 and GSRC benchmarks. Maximum and minimum differences of 1053 mm and 148 mm are observed between HPWL and TSV-PH. HPWL ignores the TSV height, the position of the TSVs and the number of pins in the interconnect length calculation. This results in underestimated interconnect lengths and thereby affects the accuracy of the optimized floorplan.
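The limitation of HPWL can be seen on a small example. The sketch below computes standard HPWL and compares it with a hypothetical TSV-aware star estimate. The star model, where each pin is wired to the net's TSV with a Manhattan segment and the TSV height is added once per tier crossed, is an assumption for illustration and is not the thesis's TSV-PH formula.

```python
from typing import Sequence, Tuple

Pin = Tuple[float, float, int]  # (x in um, y in um, tier index)


def hpwl(pins: Sequence[Pin]) -> float:
    """Half-perimeter wirelength: half the perimeter of the pins' 2D bounding box."""
    xs = [x for x, _, _ in pins]
    ys = [y for _, y, _ in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def tsv_star_length(pins: Sequence[Pin], tsv_xy: Tuple[float, float],
                    tsv_height_um: float) -> float:
    """Hypothetical TSV-aware estimate (not the thesis's TSV-PH formula).

    Every pin is wired to the net's TSV with a Manhattan segment, and the TSV
    contributes its height once per tier boundary that the net crosses.
    """
    tx, ty = tsv_xy
    lateral = sum(abs(x - tx) + abs(y - ty) for x, y, _ in pins)
    tiers = [tier for _, _, tier in pins]
    vertical = (max(tiers) - min(tiers)) * tsv_height_um
    return lateral + vertical


net = [(0.0, 0.0, 0), (100.0, 0.0, 1), (0.0, 100.0, 0)]
print(hpwl(net))                                 # 200.0
print(tsv_star_length(net, (50.0, 0.0), 50.0))   # 300.0
print(tsv_star_length(net, (300.0, 0.0), 50.0))  # 950.0
```

HPWL returns 200 regardless of where the TSV sits, how tall it is, or how many pins share the bounding box, whereas the TSV-aware estimate changes with all three. The star model tends to overestimate multi-pin nets compared with a Steiner tree, so it is only a contrast case.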
After the optimized floorplan is obtained using the WSA algorithm, the TSVs are rearranged between the via groups to reduce the interconnect length without affecting the area and temperature. The interconnect length before and after TSV rearrangement is compared in Figure 5.3. In general, TSV rearrangement reduced the interconnect length across the benchmarks, with a maximum reduction of 33%. The decrease in interconnect length using TSV rearrangement depends on the number of inter-tier nets and the distance between the TSVs and macro-blocks in the 3D-IC floorplan. For instance, a 5% decrease in interconnect length is observed in hp due to its fewer interconnects, and a 33% decrease is observed in xerox due to its large number of interconnects.
Figure 5.2: Comparison of interconnect length measurement in 3D-ICs using HPWL and TSV-PH
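TSV rearrangement can be viewed as an assignment problem: the via-group positions are fixed, and only the mapping of inter-tier nets to TSV slots changes. The exhaustive sketch below illustrates this under stated assumptions. It assumes all slots span the same tiers, so only the lateral Manhattan distance from each pin to its TSV matters. The function names and coordinates are hypothetical, and this is not necessarily the procedure used in this work.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def manhattan(a: Point, b: Point) -> float:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def assignment_cost(nets: Sequence[Sequence[Point]], slots: Sequence[Point],
                    assignment: Sequence[int]) -> float:
    """Total lateral wire from every pin of net i to its TSV slot slots[assignment[i]]."""
    return sum(manhattan(pin, slots[s])
               for net, s in zip(nets, assignment) for pin in net)


def rearrange_tsvs(nets, slots) -> Tuple[List[int], float]:
    """Exhaustively reassign nets to fixed TSV slots (needs len(slots) >= len(nets)).

    Slot positions never move, so area and the thermal map are unchanged;
    only which net uses which TSV changes.
    """
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(slots)), len(nets)):
        cost = assignment_cost(nets, slots, perm)
        if cost < best_cost:
            best, best_cost = list(perm), cost
    return best, best_cost


nets = [[(0.0, 0.0), (10.0, 0.0)], [(100.0, 0.0), (110.0, 0.0)]]
slots = [(105.0, 0.0), (5.0, 0.0)]
print(assignment_cost(nets, slots, [0, 1]))  # 400.0
print(rearrange_tsvs(nets, slots))           # ([1, 0], 20.0)
```

Exhaustive search grows factorially, so it only suits a handful of nets per via group. For larger instances, the same cost matrix can be solved with the Hungarian algorithm, for example SciPy's `scipy.optimize.linear_sum_assignment`.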
Liquid cooling is implemented together with TSV placement to reduce the temperature of 3D-ICs. The difference between the average temperature of 3D-ICs with and without liquid cooling is shown in Figure 5.4. As expected, liquid cooling reduced the temperature across the benchmarks. The ami33 and ncpu2 benchmarks both show the largest reduction in temperature (8 K) with liquid cooling. The decrease in temperature depends on the flow rate of the coolant, the temperature of the coolant and the placement of the macro-blocks.
A case study on ami33 is performed to investigate the thermal impact of liquid cooling and of the PDA in the WSA algorithm. Figures 5.5(a), 5.5(b) and 5.5(c) show the temperature profile of the 3D-IC stack for the ami33 benchmark without liquid cooling, with liquid cooling, and with liquid cooling and PDA, respectively. From Figure 5.5(a) and Figure 5.5(b), it is observed that liquid cooling significantly reduced the temperature of the functional units and the number of hotspots in the 3D-IC stack. A temperature reduction of 8 K is observed with liquid cooling. In addition, the temperature profile in Figure 5.5(c) is more uniform, with fewer hotspots, when PDA is used in the WSA algorithm. The average temperature of the 3D-IC is reduced by a further 3 K using PDA in the WSA algorithm.
Figure 5.3: Comparison of interconnect length of 3D-ICs before and after TSV rearrangement
Figure 5.4: Comparison of average temperature of 3D-ICs with and without Liquid Cooling
(a) Without liquid cooling
(b) With liquid cooling and without PDA
(c) With liquid cooling and PDA
Figure 5.5: Temperature profile of 3D-ICs with ami33 benchmark comparing (a) without liquid cooling (b) with liquid cooling (c) PDA with liquid cooling
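The comparisons above rely on a few summary statistics of a thermal map: average temperature, spread between maximum and minimum, uniformity, and hotspot count. A minimal sketch of how these could be computed from one tier's temperature grid is shown below. It assumes a hotspot is any grid cell above a chosen threshold; the text does not define its hotspot criterion, and the grid values here are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def thermal_metrics(grid: Sequence[Sequence[float]], hotspot_k: float) -> Dict[str, float]:
    """Summarize a 2D temperature map (K): average, max-min spread, std dev, hotspots."""
    cells = [t for row in grid for t in row]
    return {
        "average_k": mean(cells),
        "max_minus_min_k": max(cells) - min(cells),
        "std_dev_k": pstdev(cells),  # lower means a more uniform profile
        "hotspots": sum(1 for t in cells if t > hotspot_k),  # cells above threshold
    }


tier = [[330.0, 335.0], [340.0, 355.0]]
m = thermal_metrics(tier, hotspot_k=345.0)
print(m["average_k"], m["max_minus_min_k"], m["hotspots"])  # 340.0 25.0 1
```

Reporting the spread and standard deviation alongside the average helps separate improvements that lower the whole profile, such as liquid cooling, from ones that mainly flatten it, such as the uniformity effect attributed to PDA.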
From the above results, it is observed that different parameters of the WSA algorithm, such as the weight constraint, PDA, TSV-PH and TSV rearrangement, have an impact on the optimized 3D-IC floorplan. A comprehensive analysis of the impact of these parameters on the interconnect length, area and temperature of 3D-ICs is presented in the next section.
5.1.1 Effect of Different Parameters of WSA on Area, Interconnect Length and Temperature
The impact of different parameters of the WSA algorithm on the area, interconnect length and temperature of 3D-ICs is shown in Figure 5.6. The weight constraint of the WSA algorithm reduces the area and, thereby, the interconnect length of 3D-ICs. However, the temperature of 3D-ICs increases because the weight constraint places the macro-blocks close to each other. On the other hand, the PDA of the WSA algorithm improves the temperature uniformity across the 3D-IC stack. Since PDA controls the placement of the macro-blocks, the interconnect length and area may increase or decrease depending on the placement.

Figure 5.6: Effect of different parameters of WSA algorithm on area, interconnect length and temperature of
3D-ICs
height to improve the interconnect length in 3D-ICs. TSV-PH provides a more accurate interconnect length measurement for TSV placement and thereby improves the optimization of TSV placement. On the other hand, TSV rearrangement decreases the interconnect length by rearranging the TSVs between the via groups. TSV rearrangement does not affect the area or the temperature, because the positions of the via groups and macro-blocks are unchanged during the rearrangement. The integration of the above optimization parameters in the WSA algorithm increases the simulation time for TSV placement.
From the above results, it is observed that the proposed WSA algorithm reduced the overall area, interconnect length and temperature of the 3D-IC design. In addition to optimizing TSV placement, emerging NVMs are integrated into 3D-ICs to reduce their temperature. The performance and thermal impact of emerging NVMs are presented and analyzed in the next section.
5.2 Investigation of Emerging NVMs
The run-time framework that supports different NVMs in 3D-ICs is simulated using the PARSEC benchmark suite to investigate the performance and thermal impact of NVMs in 3D-ICs. In this section, the EDP and temperature profiles of these NVMs in 3D-ICs are analyzed and compared with those of SRAM.

Figure 5.8: Average temperature of SRAM and NVMs L2 cache in 4-tier 3D-MPSoCs for different PARSEC
benchmarks
is shown in Figure 5.7. It is observed that the SRAM L2 cache has a higher EDP than the NVMs due to its high static power consumption. Among the NVMs, RRAM has a lower EDP than PCRAM and STTRAM, which is attributed to the low static power and dynamic energy of RRAM. Although STTRAM has the shortest access time among the NVMs, its high dynamic energy results in a higher EDP than the other NVMs. A maximum EDP difference of 18% is observed between RRAM and PCRAM, and of 34% between PCRAM and STTRAM.
The average temperature of the SRAM and NVM L2 caches in the 4-tier 3D-MPSoC is shown in Figure 5.8. The STTRAM L2 cache has the highest temperature among the NVMs, which is attributed to the high dynamic energy of STTRAM. Specifically, in benchmarks with many cache accesses, such as ferret, the STTRAM L2 cache shows a high temperature, with a difference of 1.4 K relative to SRAM. On the other hand, the RRAM L2 cache has a lower temperature than SRAM and the other NVMs. For instance, RRAM L2 cache has

Figure 5.9: Average temperature of 3D-MPSoCs with air-cooling for different PARSEC benchmarks (a) 2-tier
(b) 4-tier
to STTRAM.
The above temperature reduction in the NVM L2 caches affects the overall average temperature of 3D-MPSoCs. The overall temperature of the 2-tier and 4-tier 3D-MPSoCs with air cooling for the different NVM L2 caches is shown in Figure 5.9(a) and 5.9(b), respectively. In the simulations, the NVM L2 caches lowered the overall average temperature of both the 2-tier and 4-tier 3D-MPSoCs, and the temperature reduction is larger in the 4-tier than in the 2-tier 3D-MPSoC. For instance, RRAM reduced the temperature by up to 2.5 K in the 2-tier and by up to 12.5 K in the 4-tier 3D-MPSoC. The temperature reduction is attributed to the low static power consumption of these NVMs. However, the overall average temperature of the 4-tier 3D-MPSoC is still high, requiring additional cooling mechanisms with their associated cost. Such cooling mechanisms consume a considerable amount of energy in the pump and heat exchanger.
Liquid cooling is implemented in the 3D-MPSoC with NVM L2 cache to reduce these high temperatures. The overall average temperature of the 4-tier 3D-MPSoC with liquid cooling is shown in Figure 5.10. Liquid cooling reduced the overall average temperature of 4-tier 3D-MPSoCs, with a maximum reduction of 37 K. The decrease in temperature depends on the flow rate of the coolant, the placement of the functional units and the incoming temperature of the liquid coolant.
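The dependence on flow rate and inlet temperature follows from a steady-state energy balance on the coolant: the bulk outlet temperature is T_out = T_in + P / (C_v · V̇). The first-order sketch below uses the flow rate, inlet temperature and volumetric heat capacity from Table 4.6. It assumes that all dissipated heat is absorbed by the coolant and that 42 ml/min is the total flow through the stack. The 20 W heat load is hypothetical, and this is not a substitute for the detailed micro-channel model in 3D-ICE.

```python
def coolant_outlet_temp_k(heat_w: float, flow_ml_per_min: float = 42.0,
                          inlet_k: float = 300.0,
                          vol_heat_capacity_j_per_cm3_k: float = 4.172) -> float:
    """Bulk outlet temperature from a steady-state energy balance.

    Assumes all heat_w is absorbed by the coolant and 1 ml == 1 cm^3:
    T_out = T_in + P / (C_v * V_dot)
    """
    flow_cm3_per_s = flow_ml_per_min / 60.0
    capacity_rate_w_per_k = vol_heat_capacity_j_per_cm3_k * flow_cm3_per_s
    return inlet_k + heat_w / capacity_rate_w_per_k


# Hypothetical 20 W stack: doubling the flow rate halves the coolant temperature rise.
rise_42 = coolant_outlet_temp_k(20.0) - 300.0
rise_84 = coolant_outlet_temp_k(20.0, flow_ml_per_min=84.0) - 300.0
print(round(rise_42, 2), round(rise_84, 2))  # 6.85 3.42
```

The balance only bounds the coolant's own temperature rise. Junction temperatures also depend on the convective resistance between the channel walls and the coolant, which is why the placement of the functional units still matters.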
Emerging NVMs have higher density than conventional SRAM. The thermal impact of these high-density NVMs is analyzed in 3D-MPSoCs, using L2 cache sizes of 32 MB RRAM, 16 MB PCRAM and 16 MB STTRAM as equivalents of a 4 MB SRAM. The EDP of the 4-tier 3D-MPSoC (with L2 cache size based on NVM density) is shown in Figure 5.11(a). The EDP of the NVMs varies with the number of write accesses in the benchmark. For benchmarks with fewer cache accesses, such as blackscholes, PCRAM has a lower EDP than RRAM and STTRAM. This is due to the increased leakage power of RRAM and the high dynamic energy consumption of STTRAM. On the other hand, for benchmarks with many write accesses, such as ferret, RRAM has a lower EDP than PCRAM and STTRAM. The high write accesses in these benchmarks resulted

Figure 5.10: Average temperature of 4-tier 3D-MPSoC with liquid-cooling for different PARSEC benchmarks
The overall average temperature of 4-tier 3D-MPSoCs with L2 cache size based on NVM density is shown in Figure 5.11(b). In general, SRAM has a higher average temperature than RRAM and PCRAM. Maximum differences of 6 K between RRAM and SRAM and of 9 K between PCRAM and SRAM are observed. The difference between SRAM and STTRAM varies with the number of write accesses in the benchmarks. Among the NVMs, PCRAM has a lower temperature than RRAM and STTRAM, owing to the increased leakage power consumption of the RRAM L2 cache. A maximum difference of 3.7 K is observed between RRAM and PCRAM. The temperature difference between RRAM and STTRAM varies with the number of write accesses to the L2 cache.
PCRAM and STTRAM have high dynamic energy due to the high write current required for switching. Hence, the energy consumed by PCRAM and STTRAM depends on the number of write accesses in the benchmarks. A case study on ferret (many write accesses) and blackscholes (fewer write accesses) is performed in the following section to study the




Figure 5.11: EDP and average temperature of 3D-MPSoC with different cache size (based on NVM density) simulating different PARSEC benchmarks (a) EDP (b) overall average temperature
5.2.1 CASE STUDY: Ferret and Blackscholes
The normalized numbers of L2 cache reads, writes and misses in the 4-tier 3D-MPSoC for the ferret and blackscholes benchmarks are shown in Figure 5.12. The ferret bench-

Figure 5.12: Number of Read, Write and Miss in 4-tier 3D-MPSoC (a) Ferret (b) Blackscholes

Figure 5.13: Average temperature of L2 cache in 4-tier 3D-MPSoC for ferret and blackscholes benchmarks
Figure 5.12, it is observed that the ferret benchmark has a higher number of write accesses to the L2 cache. Furthermore, the execution times of ferret and blackscholes differ on the proposed 3D-MPSoC architecture. For instance, with the RRAM L2 cache, blackscholes takes 0.05 s and ferret takes 0.5 s.
The average temperature of the SRAM and NVM L2 caches in 4-tier 3D-MPSoCs for the ferret and blackscholes benchmarks is shown in Figure 5.13. The simulations show that the temperature difference between the NVMs is larger for the ferret benchmark because of its higher number of write accesses. A difference of 2.7 K is observed between RRAM and PCRAM, and of 7 K between PCRAM and STTRAM. On the other hand, blackscholes, with fewer write accesses, shows a smaller temperature variation between the NVMs: the temperature difference is 0.6 K between RRAM and PCRAM and 1.36 K between PCRAM and STTRAM.
The temperature profiles of SRAM and NVMs in the 3D-MPSoC with air cooling for the blackscholes and ferret benchmarks are shown in Figure 5.14 and Figure 5.15, respectively. For the blackscholes benchmark, 3D-MPSoCs with NVMs show reduced temperatures with fewer hotspots. The small variation between the NVMs is due to the fewer write accesses in the blackscholes benchmark. On the other hand, the variation in temperature and number of hotspots in 3D-ICs with NVMs is very high for the ferret benchmark. For instance, RRAM, owing to its low dynamic and static energy consumption, shows reduced temperatures and fewer hotspots, while STTRAM, with its high dynamic energy, shows high temperatures with a large number of hotspots.
On the whole, NVMs can outperform SRAM caches, with better temperature profiles and lower static power consumption. Specifically, RRAM demonstrated a low EDP and average temperature among the NVMs. Additionally, RRAM offers high density with little impact on EDP and temperature. The temperature reduction with RRAM is further enhanced by implementing liquid cooling in 3D-ICs.
(a) 3D-MPSoC with SRAM L2 cache
(b) 3D-MPSoC with RRAM L2 cache
(c) 3D-MPSoC with PCRAM L2 cache
(d) 3D-MPSoC with STTRAM L2 cache
Figure 5.14: Temperature profile of 3D-MPSoC simulating blackscholes benchmark (a) SRAM L2 cache (b)
RRAM L2 cache (c) PCRAM L2 cache (d) STTRAM L2 cache
(a) 3D-MPSoC with SRAM L2 cache
(b) 3D-MPSoC with RRAM L2 cache
(c) 3D-MPSoC with PCRAM L2 cache
(d) 3D-MPSoC with STTRAM L2 cache
Figure 5.15: Temperature profile of 3D-MPSoC simulating ferret benchmark (a) SRAM L2 cache (b) RRAM
L2 cache (c) PCRAM L2 cache (d) STTRAM L2 cache
5.3 Summary
The proposed WSA algorithm for TSV placement reduced the area, interconnect length and temperature of 3D-ICs by up to 16%, 33% and 8 K, respectively, on the MCNC’91 and GSRC benchmark suites. In addition, integrating emerging NVMs reduced the temperature of 3D-ICs by up to 12.5 K (RRAM) compared to SRAM. In particular, RRAM proved to be a thermally efficient replacement for SRAM, with 34% lower EDP and a 9.5 K average reduction in temperature among the NVMs.
Chapter 6
Conclusions and Future work
6.1 Conclusions
The conclusions of this thesis are as follows:
• Design-time TSV placement using the proposed WSA algorithm reduced the area, interconnect length and temperature by up to 16%, 32% and 8 K, respectively.
• PDA used in the WSA algorithm reduces the difference between the maximum and minimum temperatures by 5% and improves the uniformity (1%-5%) across the 3D-IC stack.
• TSV-PH introduced in the WSA algorithm improved the interconnect length estimation compared to HPWL by considering TSV position and height. TSV-PH improved the estimate by up to 1053 mm compared to HPWL.
• Investigation of emerging NVMs showed that RRAM achieves a maximum reduction of 12.5 K in average temperature compared to other NVMs. In addition, a liquid-cooled RRAM L2 cache reduced the average temperature of 3D-MPSoCs by up to 41 K.
• RRAM proved to be a thermally efficient replacement for SRAM cache, with 34% lower EDP compared to the other NVMs. In addition, RRAM provides a higher-density 32 MB L2 cache compared to 16 MB PCRAM and STTRAM, with at most a 3.7 K increase in temperature.
6.2 Future Work
Future work could explore thermal via placement with different cooling mechanisms integrated in 3D-ICs. In addition, the thermal impact of different materials for thermal vias, such as carbon nanotubes and copper, can be investigated. Furthermore, the performance of different meta-heuristic algorithms with different micro-channel models can be analyzed to improve the convergence rate and reduce the execution time of TSV placement with inter-tier liquid cooling.
The thermal and performance characteristics of different NVMs are investigated in this work. The modeling, simulation and analysis of the multi-level cell (MLC) storage property of NVMs can also be investigated. In addition, the thermal and performance impact of NVMs can be analyzed by extending NVMs to the L3 cache and on-chip main memory. Furthermore, the performance




[1] P. Bangert. Optimization for industrial problems. Springer, 2012.
[2] D. Bertsimas and J. Tsitsiklis. Simulated annealing. Statistical Science, pages 10–15,
1993.
[3] Xiuyuan Bi, Hai Li, and Jae-Joon Kim. Analysis and optimization of thermal effect
on STT-RAM based 3-D stacked cache design. In VLSI (ISVLSI), 2012 IEEE Computer
Society Annual Symposium on, pages 374 –379, aug. 2012.
[4] Christian Bienia, Sanjeev Kumar, Jaswinder Pal Singh, and Kai Li. The PARSEC
benchmark suite: characterization and architectural implications. In Proceedings of the 17th
international conference on Parallel architectures and compilation techniques, PACT
’08, pages 72–81, New York, NY, USA, 2008. ACM.
[5] Nathan Binkert, Bradford Beckmann, Gabriel Black, Steven K. Reinhardt, Ali Saidi,
Arkaprava Basu, Joel Hestness, Derek R. Hower, Tushar Krishna, Somayeh Sar-
dashti, Rathijit Sen, Korey Sewell, Muhammad Shoaib, Nilay Vaish, Mark D. Hill,
and David A. Wood. The gem5 simulator. SIGARCH Comput. Archit. News, 39(2):1–
7, August 2011.
[6] David Brenner, Cory Merkel, and Dhireesha Kudithipudi. Design-time performance
evaluation of thermal management policies for SRAM and RRAM based 3D MPSoCs. In
Proceedings of the great lakes symposium on VLSI, GLSVLSI ’12, pages 177–182,
New York, NY, USA, 2012. ACM.
[7] M. Čepin. Assessment of Power System Reliability: Methods and Applications.
Springer, 2011.
[8] Huan-Lin Chang, Hung-Chih Chang, Shang-Chi Yang, Hsi-Chun Tsai, Hsuan-Chih
Li, and C.W. Liu. Improved SPICE macromodel of phase change random access
memory. In VLSI Design, Automation and Test, 2009. VLSI-DAT ’09. International
Symposium on, pages 134 –137, april 2009.
[9] Meng-Fan Chang, Pi-Feng Chiu, and Shyh-Shyuan Sheu. Circuit design challenges in
embedded memory and resistive RAM (RRAM) for mobile SoC and 3D-IC. In Proceedings
of the 16th Asia and South Pacific Design Automation Conference, ASPDAC ’11,
pages 197–203, Piscataway, NJ, USA, 2011. IEEE Press.
[10] FrederickT. Chen, HengYuan Lee, YuSheng Chen, YenYa Hsu, LiJie Zhang, Pang-
Shiu Chen, WeiSu Chen, PeiYi Gu, WenHsing Liu, SuMin Wang, ChenHan Tsai,
ShyhShyuan Sheu, MingJinn Tsai, and Ru Huang. Resistance switching for RRAM
applications. Science China Information Sciences, 54:1073–1086, 2011.
[11] Yibo Chen, E. Kursun, D. Motschman, C. Johnson, and Yuan Xie. Analysis and
mitigation of lateral thermal blockage effect of through-silicon-via in 3D IC designs.
In Low Power Electronics and Design (ISLPED) 2011 International Symposium on,
pages 397 –402, aug. 2011.
[12] C.H. Cheng, C.Y. Tsai, A. Chin, and F.S. Yeh. High performance ultra-low energy
RRAM with good retention and endurance. In Electron Devices Meeting (IEDM), 2010
IEEE International, pages 19.4.1 –19.4.4, dec. 2010.
[13] C.J. Chevallier, Chang Hua Siau, S.F. Lim, S.R. Namala, M. Matsuoka, B.L. Bateman,
and D. Rinerson. A 0.13 µm 64 Mb multi-layered conductive metal-oxide memory.
In Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2010 IEEE
International, pages 260 –261, feb. 2010.
[14] W.C. Chien, Y.C. Chen, K.P. Chang, E.K. Lai, Y.D. Yao, P. Lin, J. Gong, S.C. Tsai,
S.H. Hsieh, C.F. Chen, K.Y. Hsieh, R. Liu, and Chih-Yuan Lu. Multi-level operation
of fully CMOS compatible WOx resistive random access memory (RRAM). In Memory
Workshop, 2009. IMW ’09. IEEE International, pages 1 –2, may 2009.
[15] Wei-Chih Chien, Ming-Hsiu Lee, Feng-Ming Lee, Yu-Yu Lin, Hsiang-Lan Lung,
Kuang-Yeu Hsieh, and Chih-Yuan Lu. Multi-level 40nm WOx resistive memory with
excellent reliability. In Electron Devices Meeting (IEDM), 2011 IEEE International,
pages 31.5.1 –31.5.4, dec. 2011.
[16] J. Cong, Jie Wei, and Yan Zhang. A thermal-driven floorplanning algorithm for 3D ICs.
In Computer Aided Design, 2004. ICCAD-2004. IEEE/ACM International Conference
on, pages 306 – 313, nov. 2004.
[17] J. Cong and Yan Zhang. Thermal-driven multilevel routing for 3D ICs. In Design
Automation Conference, 2005. Proceedings of the ASP-DAC 2005. Asia and South
Pacific, volume 1, pages 121 – 126 Vol. 1, jan. 2005.
[18] A.K. Coskun, J.L. Ayala, D. Atienza, and T.S. Rosing. Modeling and dynamic
management of 3D multicore systems with liquid cooling. In Very Large Scale Integration
(VLSI-SoC), 2009 17th IFIP International Conference on, pages 35 –40, oct. 2009.
[19] D. Cuesta, J.L. Risco-Martin, J.L. Ayala, and D. Atienza. 3D thermal-aware
floorplanner for many-core single-chip systems. In Test Workshop (LATW), 2011 12th Latin
American, pages 1 –6, march 2011.
[20] Timothy A. Davis, Ekanathan Palamadai Natarajan, and Ansoft Inc. Algorithm 8xx:
KLU, a direct sparse solver for circuit simulation problems. 2010.
[21] G. De Sandre, L. Bettini, A. Pirola, L. Marmonier, M. Pasotti, M. Borghi, P. Mat-
tavelli, P. Zuliani, L. Scotti, G. Mastracchio, F. Bedeschi, R. Gastaldi, and R. Bez.
A 90nm 4Mb embedded phase-change memory with 1.2V 12ns read access time and
1MB/s write throughput. In Solid-State Circuits Conference Digest of Technical
Papers (ISSCC), 2010 IEEE International, pages 268 –269, feb. 2010.
[22] R. Dong, Q. Wang, L. D. Chen, D. S. Shang, T. L. Chen, X. M. Li, and W. Q. Zhang.
Retention behavior of the electric-pulse-induced reversible resistance change effect in
Ag-La0.7Ca0.3MnO3-Pt sandwiches. Applied Physics Letters, 86(17):172107 –172107–3,
apr 2005.
[23] Xiangyu Dong, Xiaoxia Wu, Guangyu Sun, Yuan Xie, H. Li, and Yiran Chen. Circuit
and microarchitecture evaluation of 3D stacking magnetic RAM (MRAM) as a
universal memory replacement. In Design Automation Conference, 2008. DAC 2008. 45th
ACM/IEEE, pages 554 –559, june 2008.
[24] Xiangyu Dong and Yuan Xie. System-level cost analysis and design exploration for
three-dimensional integrated circuits (3D ICs). In Design Automation Conference,
2009. ASP-DAC 2009. Asia and South Pacific, pages 234 –241, jan. 2009.
[25] Xiangyu Dong, Cong Xu, Yuan Xie, and N.P. Jouppi. NVSim: A circuit-level
performance, energy, and area model for emerging nonvolatile memory. Computer-Aided
Design of Integrated Circuits and Systems, IEEE Transactions on, 31(7):994 –1007,
july 2012.
[26] F. Frantz, L. Labrak, and I. O’Connor. 3D-IC floorplanning: Applying
meta-optimization to improve performance. In VLSI and System-on-Chip (VLSI-SoC), 2011
IEEE/IFIP 19th International Conference on, pages 404 –409, oct. 2011.
[27] B. Goplen and S. Sapatnekar. Placement of 3D ICs with thermal and interlayer via
considerations. In Design Automation Conference, 2007. DAC ’07. 44th ACM/IEEE,
pages 626 –631, june 2007.
[28] Brent Goplen and Sachin Sapatnekar. Thermal via placement in 3D ICs. In Proceedings
of the 2005 international symposium on Physical design, ISPD ’05, pages 167–174,
New York, NY, USA, 2005. ACM.
[29] W.-L. Hung, G.M. Link, Yuan Xie, N. Vijaykrishnan, and M.J. Irwin. Interconnect and
thermal-aware floorplanning for 3D microprocessors. In Quality Electronic Design,
2006. ISQED ’06. 7th International Symposium on, pages 6 pp. –104, march 2006.
[30] Yongsoo Joo, Dimin Niu, Xiangyu Dong, Guangyu Sun, Naehyuck Chang, and Yuan
Xie. Energy- and endurance-aware design of phase change memory caches. In Pro-
ceedings of the Conference on Design, Automation and Test in Europe, DATE ’10,
pages 136–141, 3001 Leuven, Belgium, 2010. European Design and Automation
Association.
[31] T. Kawahara, R. Takemura, K. Miura, J. Hayakawa, S. Ikeda, Young Min Lee,
R. Sasaki, Y. Goto, K. Ito, T. Meguro, F. Matsukura, H. Takahashi, H. Matsuoka, and
H. Ohno. 2 Mb SPRAM (spin-transfer torque RAM) with bit-by-bit bi-directional current
write and parallelizing-direction current read. Solid-State Circuits, IEEE Journal of,
43(1):109 –120, jan. 2008.
[32] N.H. Khan, S.M. Alam, and S. Hassoun. System-level comparison of power delivery
design for 2D and 3D ICs. In 3D System Integration, 2009. 3DIC 2009. IEEE
International Conference on, pages 1 –7, sept. 2009.
[33] N.H. Khan, S. Reda, and S. Hassoun. Early estimation of TSV area for power delivery
in 3-D integrated circuits. In 3D Systems Integration Conference (3DIC), 2010 IEEE
International, pages 1 –6, nov. 2010.
[34] Dae Hyun Kim, S. Mukhopadhyay, and Sung Kyu Lim. TSV-aware interconnect length
and power prediction for 3D stacked ICs. In Interconnect Technology Conference,
2009. IITC 2009. IEEE International, pages 26 –28, june 2009.
[35] Krzysztof Koźmiński. Benchmarks for layout synthesis: evolution and current status. In
Proceedings of the 28th ACM/IEEE Design Automation Conference, DAC ’91, pages
265–270, New York, NY, USA, 1991. ACM.
[36] H.Y. Lee, P.S. Chen, T.Y. Wu, Y.S. Chen, C.C. Wang, P.J. Tzeng, C.H. Lin, F. Chen,
C.H. Lien, and M.-J. Tsai. Low power and high speed bipolar switching with a thin
reactive Ti buffer layer in robust HfO2 based RRAM. In Electron Devices Meeting, 2008.
IEDM 2008. IEEE International, pages 1 –4, dec. 2008.
[37] Young-Joon Lee and Sung Kyu Lim. Co-optimization of signal, power, and thermal
distribution networks for 3D ICs. In Advanced Packaging and Systems Symposium,
2008. EDAPS 2008. Electrical Design of, pages 163 –166, dec. 2008.
[38] Young-Joon Lee and Sung Kyu Lim. Routing optimization of multi-modal
interconnects in 3D ICs. In Electronic Components and Technology Conference, 2009. ECTC
2009. 59th, pages 32 –39, may 2009.
[39] Sheng Li, Jung Ho Ahn, R.D. Strong, J.B. Brockman, D.M. Tullsen, and N.P. Jouppi.
McPAT: An integrated power, area, and timing modeling framework for multicore
and manycore architectures. In Microarchitecture, 2009. MICRO-42. 42nd Annual
IEEE/ACM International Symposium on, pages 469–480, Dec. 2009.
[40] Xin Li, Yuchun Ma, and Xianlong Hong. A novel thermal optimization flow using
incremental floorplanning for 3D ICs. In Design Automation Conference, 2009.
ASP-DAC 2009. Asia and South Pacific, pages 347–352, Jan. 2009.
[41] Z. Li, X. Hong, Q. Zhou, Y. Cai, J. Bian, H. H. Yang, V. Pitchumani, and C.-K. Cheng.
Hierarchical 3-D floorplanning algorithm for wirelength optimization. Circuits and
Systems I: Regular Papers, IEEE Transactions on, 53(12):2637–2646, Dec. 2006.
[42] Zhuoyuan Li, X. Hong, Qiang Zhou, Shan Zeng, J. Bian, Wenjian Yu, H.H. Yang,
V. Pitchumani, and Chung-Kuan Cheng. Efficient thermal via planning approach and
its application in 3-D floorplanning. Computer-Aided Design of Integrated Circuits
and Systems, IEEE Transactions on, 26(4):645–658, April 2007.
[43] Zhuoyuan Li, Xianlong Hong, Qiang Zhou, Shan Zeng, Jinian Bian, Hannah Yang,
Vijay Pitchumani, and Chung-Kuan Cheng. Integrating dynamic thermal via planning
with 3D floorplanning algorithm. In Proceedings of the 2006 International Symposium
on Physical Design, ISPD ’06, pages 178–185, New York, NY, USA, 2006. ACM.
[44] Zuoyuan Li, Xianlong Hong, Qiang Zhou, Jinian Bian, Hannah H. Yang, and Vijay
Pitchumani. Efficient thermal-oriented 3D floorplanning and thermal via planning for
two-stacked-die integration. ACM Trans. Des. Autom. Electron. Syst., 11(2):325–345,
April 2006.
[45] C.J. Lin, S.H. Kang, Y.J. Wang, K. Lee, X. Zhu, W.C. Chen, X. Li, W.N. Hsu, Y.C.
Kao, M.T. Liu, Yiching Lin, M. Nowak, N. Yu, and Luan Tran. 45nm low power CMOS
logic compatible embedded STT MRAM utilizing a reverse-connection 1T/1MTJ cell. In
Electron Devices Meeting (IEDM), 2009 IEEE International, pages 1–4, Dec. 2009.
[46] Xiaohua Lou, Zheng Gao, Dimitar V. Dimitrov, and Michael X. Tang. Demonstration
of multilevel cell spin transfer switching in MgO magnetic tunnel junctions. Applied
Physics Letters, 93(24):242502, 2008.
[47] C.E. Merkel and D. Kudithipudi. Towards thermal profiling in CMOS/memristor hybrid
RRAM architectures. In VLSI Design (VLSID), 2012 25th International Conference on,
pages 167–172, Jan. 2012.
[48] T. Nirschl, J.B. Philipp, T.D. Happ, G.W. Burr, B. Rajendran, M.-H. Lee, A. Schrott,
M. Yang, M. Breitwisch, C.-F. Chen, E. Joseph, M. Lamorey, R. Cheek, S.-H. Chen,
S. Zaidi, S. Raoux, Y.C. Chen, Y. Zhu, R. Bergmann, H.-L. Lung, and C. Lam. Write
strategies for 2 and 4-bit multi-level phase-change memory. In Electron Devices
Meeting, 2007. IEDM 2007. IEEE International, pages 461–464, Dec. 2007.
[49] J.H. Oh, J.H. Park, Y.S. Lim, H.S. Lim, Y.T. Oh, J.S. Kim, J.M. Shin, Y.J. Song,
K.C. Ryoo, D.W. Lim, S.S. Park, J.I. Kim, J.H. Kim, J. Yu, F. Yeung, C.W. Jeong,
J.H. Kong, D.H. Kang, G.H. Koh, G.T. Jeong, H.S. Jeong, and Kinam Kim. Full
integration of highly manufacturable 512Mb PRAM based on 90nm technology. In
Electron Devices Meeting, 2006. IEDM ’06. International, pages 1–4, Dec. 2006.
[50] K. Ono, T. Kawahara, R. Takemura, K. Miura, M. Yamanouchi, J. Hayakawa, K. Ito,
H. Takahashi, H. Matsuoka, S. Ikeda, and H. Ohno. SPRAM with large thermal
stability for high immunity to read disturbance and long retention for high-temperature
operation. In VLSI Technology, 2009 Symposium on, pages 228–229, June 2009.
[51] I.H. Osman and J.P. Kelly. Meta-heuristics: theory and applications. Springer, 1996.
[52] F. Pellizzer, A. Pirovano, F. Ottogalli, M. Magistretti, M. Scaravaggi, P. Zuliani,
M. Tosi, A. Benvenuti, P. Besana, S. Cadeo, T. Marangon, R. Morandi, R. Piva,
A. Spandre, R. Zonca, A. Modelli, E. Varesi, T. Lowrey, A. Lacaita, G. Casagrande,
P. Cappelletti, and R. Bez. Novel µtrench phase-change memory cell for embedded
and stand-alone non-volatile memory applications. In VLSI Technology, 2004. Digest
of Technical Papers. 2004 Symposium on, pages 18–19, June 2004.
[53] M.M. Sabry, D. Atienza, and A.K. Coskun. Thermal analysis and active cooling
management for 3D MPSoCs. In Circuits and Systems (ISCAS), 2011 IEEE International
Symposium on, pages 2237–2240, May 2011.
[54] M.M. Sabry, A. Sridhar, D. Atienza, Y. Temiz, Y. Leblebici, S. Szczukiewicz,
N. Borhani, J.R. Thome, T. Brunschwiler, and B. Michel. Towards thermally-aware
design of 3D MPSoCs with inter-tier cooling. In Design, Automation & Test in Europe
Conference & Exhibition (DATE), 2011, pages 1–6, March 2011.
[55] Gerald Skidmore Scriptor. Metaheuristics and combinatorial optimization problems.
Master's thesis, University of Buffalo, 2000.
[56] Jae seok Yang, K. Athikulwongse, Young-Joon Lee, Sung Kyu Lim, and D.Z. Pan.
TSV stress aware timing analysis with applications to 3D-IC layout optimization. In
Design Automation Conference (DAC), 2010 47th ACM/IEEE, pages 803–806, June
2010.
[57] Li Shang and R.P. Dick. Thermal crisis: challenges and potential solutions. Potentials,
IEEE, 25(5):31–35, Sept.-Oct. 2006.
[58] Bing Shi and A. Srivastava. Liquid cooling for 3D-ICs. In Green Computing
Conference and Workshops (IGCC), 2011 International, pages 1–6, July 2011.
[59] C.W. Smullen, V. Mohan, A. Nigam, S. Gurumurthi, and M.R. Stan. Relaxing
non-volatility for fast and energy-efficient STT-RAM caches. In High Performance Computer
Architecture (HPCA), 2011 IEEE 17th International Symposium on, pages 50–61,
Feb. 2011.
[60] A. Sridhar, A. Vincenzi, M. Ruggiero, T. Brunschwiler, and D. Atienza. 3D-ICE:
Fast compact transient thermal modeling for 3D ICs with inter-tier liquid cooling.
In Computer-Aided Design (ICCAD), 2010 IEEE/ACM International Conference on,
pages 463–470, Nov. 2010.
[61] A. Sridhar, A. Vincenzi, M. Ruggiero, T. Brunschwiler, and D. Atienza. Compact
transient thermal model for 3D ICs with liquid cooling via enhanced heat transfer cavity
geometries. In Thermal Investigations of ICs and Systems (THERMINIC), 2010 16th
International Workshop on, pages 1–6, Oct. 2010.
[62] Haihua Su, F. Liu, A. Devgan, E. Acar, and S. Nassif. Full chip leakage-estimation
considering power supply and temperature variations. In Low Power Electronics and
Design, 2003. ISLPED ’03. Proceedings of the 2003 International Symposium on,
pages 78–83, Aug. 2003.
[63] Texas Instruments. Understanding Integrated Circuit Package Power Capabilities,
2009. http://www.ti.com/lit/an/snva509a/snva509a.pdf.
[64] A. W. Topol, D. C. La Tulipe, L. Shi, D. J. Frank, K. Bernstein, S. E. Steen, A. Kumar,
G. U. Singco, A. M. Young, K. W. Guarini, and M. Ieong. Three-dimensional
integrated circuits. IBM Journal of Research and Development, 50(4.5):491–506, July
2006.
[65] Ming-Chao Tsai and TingTing Hwang. A study on the trade-off among wirelength,
number of TSV and placement with different size of TSV. In VLSI Design, Automation
and Test (VLSI-DAT), 2011 International Symposium on, pages 1–4, April 2011.
[66] Jue Wang, Xiangyu Dong, and Yuan Xie. Point and discard: a hard-error-tolerant
architecture for non-volatile last level caches. In Proceedings of the 49th Annual
Design Automation Conference, DAC ’12, pages 253–258, New York, NY, USA, 2012.
ACM.
[67] M.J. Wolf, P. Ramm, A. Klumpp, and H. Reichl. Technologies for 3D wafer
level heterogeneous integration. In Design, Test, Integration and Packaging of
MEMS/MOEMS, 2008. MEMS/MOEMS 2008. Symposium on, pages 123–126, April
2008.
[68] H.-S.P. Wong, Heng-Yuan Lee, Shimeng Yu, Yu-Sheng Chen, Yi Wu, Pang-Shiu
Chen, Byoungil Lee, F.T. Chen, and Ming-Jinn Tsai. Metal-oxide RRAM.
Proceedings of the IEEE, 100(6):1951–1970, June 2012.
[69] Cong Xu, Xiangyu Dong, N.P. Jouppi, and Yuan Xie. Design implications of
memristor-based RRAM cross-point structures. In Design, Automation & Test in Europe
Conference & Exhibition (DATE), 2011, pages 1–6, March 2011.
[70] Doe Hyun Yoon, Tobin Gonzalez, Parthasarathy Ranganathan, and Robert S.
Schreiber. Exploring latency-power tradeoffs in deep nonvolatile memory hierarchies.
In Proceedings of the 9th conference on Computing Frontiers, CF ’12, pages 95–102,
New York, NY, USA, 2012. ACM.
[71] Seung Wook Yoon, Dae Wook Yang, Jae Hoon Koo, M. Padmanathan, and F. Carson.
3D TSV processes and its assembly/packaging technology. In 3D System Integration,
2009. 3DIC 2009. IEEE International Conference on, pages 1–5, Sept. 2009.
[72] Ping Zhou, Bo Zhao, Jun Yang, and Youtao Zhang. Energy reduction for STT-RAM using
early write termination. In Computer-Aided Design - Digest of Technical Papers,
2009. ICCAD 2009. IEEE/ACM International Conference on, pages 264–268, Nov.
2009.
[73] Carl Zweben. Advances in composite materials for thermal management in electronic
packaging. JOM, 50:47–51, 1998.
