Emulation-based transient thermal modeling of 2D/3D systems-on-chip with active cooling by Garcia del Valle, Pablo & Atienza Alonso, David
This article appeared in a journal published by Elsevier. The attached
copy is furnished to the author for internal non-commercial research
and education use, including for instruction at the authors institution
and sharing with colleagues.
Other uses, including reproduction and distribution, or selling or
licensing copies, or posting to personal, institutional or third party
websites are prohibited.
In most cases authors are permitted to post their version of the
article (e.g. in Word or Tex form) to their personal website or
institutional repository. Authors requiring further information
regarding Elsevier’s archiving and manuscript policies are
encouraged to visit:
http://www.elsevier.com/copyright
Author's personal copy
Emulation-based transient thermal modeling of 2D/3D systems-on-chip
with active cooling
Pablo G. Del Valle a,b,*, David Atienza b
a Departamento de Arquitectura de Computadores y Automática (DACYA), UCM, Avda. Complutense s/n, 28040 Madrid, Spain
b Embedded Systems Laboratory (ESL), EPFL, Station 11, EPFL-STI-IEL-ESL, CH-1015 Lausanne, Switzerland
Article info
Article history:
Received 3 December 2009
Received in revised form
28 July 2010
Accepted 6 August 2010
Available online 21 September 2010
Keywords:
Thermal modeling
Transient temperature analysis
FPGA emulation
2D/3D MPSoC
Active cooling
Closed-loop systems
Abstract
State-of-the-art devices in the consumer electronics market rely more and more on
Multi-Processor Systems-on-Chip (MPSoCs) as an efficient solution to meet their multiple design constraints,
such as low cost, low power consumption, high performance and short time-to-market. In fact, as
technology scales down, logic density and power density increase, generating hot spots that seriously
affect MPSoC performance and can physically damage the final system. Moreover,
forthcoming three-dimensional (3D) MPSoCs can achieve higher system integration density, but the
aforementioned thermal problems are seriously aggravated. Thus, new thermal exploration tools are
needed to study the temperature variation effects inside 3D MPSoCs. In this paper, we present a novel
approach for fast transient thermal modeling and analysis of 3D MPSoCs with active (liquid) cooling
solutions, while capturing the hardware–software interaction. In order to preserve both accuracy and
speed, we propose a closed-loop framework that combines the use of Field Programmable Gate Arrays
(FPGAs) to emulate the hardware components of 2D/3D MPSoC platforms with a highly optimized
thermal simulator, which uses an RC-based linear thermal model to analyze the liquid flow. The
proposed framework offers speed-ups of more than three orders of magnitude when compared to
cycle-accurate 3D MPSoC thermal simulators. Thus, this approach enables MPSoC designers to validate
different hardware- and software-based 3D thermal management policies in real-time while
running real-life applications, including liquid cooling injection control.
© 2010 Elsevier Ltd. All rights reserved.
1. Introduction
The power density of high performance systems continues to
increase with every process technology generation. Nowadays,
several commercial multi-processor system-on-chip (MPSoC)
architectures are available with several tens of cores, such as
IBM’s Cell [1], Sun’s Niagara T1 [2] and Tilera’s 64-core
architecture [3]. However, in these new MPSoC architectures,
power density increases the operating temperature and creates
signiﬁcant hot-spots on the die that need to be managed.
Furthermore, 3D stacking is an emerging solution to increase
the integration capabilities and frequency of forthcoming MPSoCs
[4,5], but it substantially increases further power density due to
the placement of computational units on top of each other.
Therefore, temperature-induced problems are exacerbated in 3D
systems and are a major concern to be explored as early as
possible in 3D MPSoC design and integration.
To explore the hardware/software (HW/SW) thermal interac-
tion, cycle-accurate MPSoC simulators including SW thermal
models exist, based on post-processing of run-time power
consumption and ﬂoorplanning information [6–8]. However,
these complex SW environments are very limited in performance
(i.e., up to 100 kHz) due to signal management overhead and are
not interactive with thermal control systems in real-time. Thus,
they are not suitable for thermal control exploration in 2D/3D
MPSoCs running complex real-life applications. Moreover, simulators at
higher abstraction levels attain faster simulation speeds, but
lose much of the accuracy needed for fine-grained thermal-aware
architectural tuning or thermal modeling.
One alternative to cycle-accurate simulators is HW emulation.
Various MPSoC emulation frameworks have been proposed [9–11].
Nevertheless, they are not designed for thermal exploration, are
usually very expensive (between $100 K and $1 M), and are not flexible
enough for MPSoC architecture exploration, since their baseline
architectures (e.g., processing cores or interconnections) are
proprietary and do not permit internal changes. Furthermore, neither flexible
interconnection interfaces for HW emulation nor fast
thermal libraries that model active cooling behavior (e.g., liquid
cooling [12,13]) exist nowadays. Thus, thermal effects can only be
Contents lists available at ScienceDirect
journal homepage: www.elsevier.com/locate/mejo
Microelectronics Journal
0026-2692/$ - see front matter © 2010 Elsevier Ltd. All rights reserved.
doi:10.1016/j.mejo.2010.08.003
* Corresponding author at: Departamento de Arquitectura de Computadores
y Automática (DACYA), UCM, Avda. Complutense s/n, 28040 Madrid, Spain.
Tel.: +34 913947541.
E-mail address: pablo.garciadelvalle@epfl.ch (P.G. Del Valle).
Microelectronics Journal 42 (2011) 564–571
verified in the last phases of the design process, typically when the
final architecture and cooling components are available and can be
tested in the final system integration process, which can
result in very expensive redesigns.
As a result, one major design challenge is the deployment of
fast exploration methods of multiple HW and SW implementation
alternatives for 2D and 3D MPSoCs with accurate estimations
(e.g., performance, energy) that address the modeling of transient
thermal behavior to tune the ﬁnal architectures.
In this paper, we present a new HW/SW FPGA-based emula-
tion framework of the 2D/3D MPSoC architectures, which enables
realistic thermal studies at an early stage of system integration,
including active (single-phase liquid) cooling modeling, as well as
power, energy and performance constraints validation in real-
time. First, the HW components of the system are mapped on an
FPGA and statistics are extracted from three key MPSoC
architectural levels (processors, memory subsystem and inter-
connections), while real-life applications are executed. Second,
this run-time information is sent using a standard Ethernet
connection to a dynamically adaptable SW thermal modeling tool
running on a host PC. Third, this tool evaluates in real-time the
thermal behavior of the ﬁnal MPSoC design, selecting different
ordinary differential equation solvers according to the desired
accuracy in thermal exploration and simulation time of 2D/3D
chip stacks, and returns this information to the FPGA emulating
the design. This ﬁnal step creates a closed-loop thermal simula-
tion environment for 2D and 3D chips that enables testing
temperature management strategies in real-time.
The experimental results with 2D/3D MPSoCs, using real-life
case-study models of the UltraSPARC T1 [2] and other industrial
platforms from Freescale [14], Philips [15], etc., show that this
HW/SW emulation framework for transient thermal analysis can
achieve speed-ups of more than three orders of magnitude
compared to state-of-the-art cycle-accurate thermal simulators,
while keeping the simulated temperatures obtained with the
proposed method within 3% of those of finite-element simulations.
The remainder of this paper is structured as follows: It starts in
Section 2 with a detailed overview of prior art in thermal
modeling and architectural simulation for 2D and 3D MPSoCs.
Then, in Section 3 the proposed HW/SW thermal emulation ﬂow
is presented. Next, in Section 4 the 3D liquid cooling model is
described. After that, in Section 5 we present the experimental
setup and results, performed with different 2D and 3D MPSoCs to
validate the simulation model and to demonstrate the ease of use
of the platform. Finally, in Section 6 we summarize the main
conclusions of this work.
2. Related work
It is widely accepted that 2D/3D MPSoCs represent a promising
solution for forthcoming complex processing systems [18]. This
has spurred research on modeling and prototyping MPSoC
designs, using both HW and SW. From the SW viewpoint,
solutions have been suggested at different abstraction levels,
enabling tradeoffs between simulation speed and accuracy. First,
fast analytical models have been proposed to prune very distinct
design options using high-level languages (e.g., C or C++) [19]. Also,
full system simulators, like Simics [20] and others [7,8], have
been developed for embedded SW debugging and can reach
megahertz speeds, but are not able to accurately capture
performance and power effects (e.g., at the interconnection level)
that depend on the cycle-accurate behavior of the HW. Second,
transaction-level modeling in SystemC, in both academic [21]
and industrial contexts [22,23], has enabled more accuracy in
system-level simulation at the cost of sacrificing simulation speed
(about 100–200 kHz). Such speeds render the transient
testing of large systems unfeasible due to overly long simulation times,
in contrast to the proposed 2D/3D thermal emulation framework.
Moreover, in most cases SW simulations are limited to a
number of proprietary interfaces.
Finally, important research has been done to obtain cycle-
accurate frameworks in low-level SystemC or hardware descrip-
tion languages (HDL). Companies and universities have developed
cycle-accurate simulators using post-synthesis libraries from HW
vendors [27,28]. However, their simulation speeds (10–120 kHz)
are unsuitable for long MPSoC thermal exploration.
The most important alternative nowadays to simulation is HW
emulation. In industry, one of the most complete sets of statistics
is provided by Palladium II [9], which can accommodate very
complex systems (i.e., up to 256 Mgate). However, its main
disadvantages are its operation frequency (approximately
1.6 MHz) and cost (around $1 million). ASIC Integrator
[10] is much faster for architectural exploration. Nevertheless, its
major drawback is that it is limited to a few ARM-based
cores and AMBA interconnects. The same exploration
limitation of proprietary cores occurs with Heron SoC emulation
[24] and Zebu-XL [11], both based on multi-FPGA emulation in
the order of MHz. They can be used to validate intellectual
property blocks, but are not ﬂexible enough for fast MPSoC design
exploration or detailed statistics extraction. In the academic
world, a recent emulation platform for exploring performance of
MPSoC alternatives is TC4SOC [25]. It uses a proprietary 32-bit
VLIW core and enables exploration of interconnects by using an
FPGA, but it does not enable detailed extraction of statistics or
thermal modeling at the three architectural
levels considered in this work, i.e., memory hierarchy,
interconnects and processing cores. Finally, an interesting approach that
uses FPGA prototyping to speed up co-verification of pure SW
simulators is described in [26]. It synchronizes cycle by cycle
with the C/C++ SW part through an array of
shared registers in the FPGA that can be accessed by both sides at
a speed of 1 MHz. This outlines the potential benefits of combined
HW–SW frameworks, which the proposed approach exploits
to reach an emulation speed of hundreds of MHz.
Turning our attention to thermal modeling, [6] presented a
thermal/power model for 2D super-scalar architectures. It can
predict the temperature variations between the different compo-
nents of a processor and show the expected inﬂuence in
performance. Additionally, [14,17] have investigated the impact
of temperature and voltage variation across the die of 2D and 3D
MPSoCs. Their results show that the temperature can vary by
more than 25° across the die and tiers. In all these works, a "1D"
approximation is often assumed to evaluate the thermal behavior
[29,30]. This means that the power is uniformly produced on
active levels (or on parts of them), one per stratum. This
assumption may lead to a strongly underestimated maximum
temperature. Thus, several authors [31] use this simplification
but perform detailed simulation of 3D thermal effects due to the
presence and localization of supervias, and analyze the local (3D)
and global (1D) modeling contributions to the maximum
temperature. They show that the thermal resistance can be higher than the 1D
thermal resistance due to local 3D effects, and that even more
fine-grained transient analysis needs to be performed to avoid thermal
overestimations.
Finally, numerical thermal simulations have been carried out
to convert power dissipation distribution into a temperature
distribution in a three-dimensional integrated circuit (3D-IC) [32].
Based on the past work, the development of a fundamental
analytical model for heat transport in 3D integrated circuits is
highly desirable. Such an analytical model provides a framework
for the analysis of the general problem of heat dissipation in 3D
ICs, and will enable simple thermal design guidelines.
A key component of the 3D technology is the through-silicon
via (TSV) that enables communication between the two dies as
well as with the package. Several works have analyzed the
optimization of placement of TSVs for heat dissipation in 3D ICs
[31,32]. Other works [33] propose analytical and ﬁnite-element
models of heat transfer in 3D electronic circuits and use this
model to analyze the impact of various geometric parameters and
thermo-physical properties (through silicon vias, inter-die bond-
ing layers, etc.) on thermal performance, but at the cost of very
time-consuming simulations.
Overall, all of these works prove the importance of hot spots in
2D high-performance multi-core systems (and even more in 3D
structures), as well as the need of accurate and fast transient
temperature analysis tools for the different architectural compo-
nents of MPSoCs (cores, TSVs, etc.). Thus, the proposed emulation
method aims at estimating accurately the transient temperature
of integrated circuits implementing 2D/3D MPSoCs, including
active cooling mechanisms (e.g., liquid cooling).
3. HW/SW closed-loop thermal emulation ﬂow for 2D/3D
MPSoCs
3.1. Closed-loop emulation ﬂow
Fig. 1 depicts an overview of the instantiation of the proposed
HW/SW thermal emulation environment for a Freescale-based
3-core MPSoC [14], implemented onto a Xilinx Virtex-V FPGA,
modeling the transient thermal behavior of the system while
executing multiple multimedia applications (e.g., SW-deﬁned
radio, video streaming, etc.) and a multi-processor operating
system (OS). The system can be scaled to any number of core
subsystems by using appropriate FPGAs.
The instantiation ﬂow of the proposed HW/SW transient
thermal emulation environment is created in four steps:
1. The HW part of the emulator is defined. This implies synthesizing
the MPSoC architecture onto a certain FPGA target technology
(a multi-FPGA environment can be used if necessary).
2. Speciﬁc hardware sniffers (transparent to the normal MPSoC
operation) are included in the FPGA that monitor particular
signals of each component of the target 2D/3D architecture [15].
3. The run-time power information (frequency/voltage and leakage
power) of each system component is sent to a SW thermal
modeling tool running on a host PC that evaluates in real-time
the thermal behavior of the ﬁnal 2D/3D MPSoC design using a
thermal model developed for bulk silicon chip systems [15], and
calculates the temperature of each cell in the ﬂoorplan of the
emulated system. The model includes different types of ordinary
differential equation solvers [16] (1st-order Forward Euler,
2nd-order Crank–Nicolson, 4th-order Runge–Kutta,
etc.), which enable multiple trade-offs between
accuracy and thermal modeling time in 2D/3D chip stacks.
4. The temperatures calculated by the SW thermal library are
sent back to the FPGA emulating the MPSoC system (see link
between the host PC and the FPGA on the right side of Fig. 1)
and are stored in registers of the FPGA that emulate the
presence of thermal sensors in the target MPSoC in certain
positions of the ﬂoorplan. The registers containing the
predicted temperature are mapped in the memory hierarchy,
so that they can be accessed from the running multi-processor
OS, providing real-time temperature information, and making
up a closed-loop thermal monitoring system.
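The accuracy versus time trade-off among the solvers named in step 3 can be illustrated on a single thermal cell. The following is only a minimal sketch, not the thermal library itself: it assumes one RC node governed by C·dT/dt = P − (T − T_amb)/R, and the parameter values and function names are hypothetical.

```python
import math

def rhs(T, P, R, C, T_amb):
    """dT/dt of one thermal cell: C * dT/dt = P - (T - T_amb) / R."""
    return (P - (T - T_amb) / R) / C

def step_forward_euler(T, dt, P, R, C, T_amb):
    """1st-order explicit step."""
    return T + dt * rhs(T, P, R, C, T_amb)

def step_crank_nicolson(T, dt, P, R, C, T_amb):
    """2nd-order trapezoidal step; the ODE is linear, so the implicit
    update can be solved in closed form."""
    a = dt / (2.0 * R * C)
    T_steady = T_amb + P * R
    return ((1.0 - a) * T + 2.0 * a * T_steady) / (1.0 + a)

def step_rk4(T, dt, P, R, C, T_amb):
    """Classical 4th-order Runge-Kutta step."""
    k1 = rhs(T, P, R, C, T_amb)
    k2 = rhs(T + 0.5 * dt * k1, P, R, C, T_amb)
    k3 = rhs(T + 0.5 * dt * k2, P, R, C, T_amb)
    k4 = rhs(T + dt * k3, P, R, C, T_amb)
    return T + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def simulate(step, t_end, dt, P, R, C, T_amb):
    """Integrate from ambient temperature up to t_end with a fixed time step."""
    T = T_amb
    for _ in range(round(t_end / dt)):
        T = step(T, dt, P, R, C, T_amb)
    return T

def exact(t, P, R, C, T_amb):
    """Analytical step response of the single RC cell."""
    return T_amb + P * R * (1.0 - math.exp(-t / (R * C)))

# Hypothetical cell: 5 W, 2 K/W, 0.5 J/K, 300 K ambient (time constant 1 s).
PARAMS = dict(P=5.0, R=2.0, C=0.5, T_amb=300.0)

if __name__ == "__main__":
    ref = exact(1.0, **PARAMS)
    for step in (step_forward_euler, step_crank_nicolson, step_rk4):
        err = abs(simulate(step, 1.0, 0.1, **PARAMS) - ref)
        print(f"{step.__name__}: error = {err:.2e} K")
```

For the same time step, the higher-order methods cost more arithmetic per step but give a smaller error, which is the kind of trade-off the solver selection exposes.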
The closed-loop scheme mentioned in step four is essential for two
reasons. The first, obvious advantage is that the system becomes
aware of its own temperature (by reading the
integrated thermal sensors); thus, the on-board OS can make
decisions at runtime based on this information, e.g., reducing the
frequency/voltage of some of the cores, migrating tasks, shutting
down processors, etc. The second, less straightforward
implication of the closed loop concerns the effect that temperature has
on the thermal behavior of MPSoCs. The thermal model needs to be
constantly adjusted as the temperature varies, since temperature
affects the leakage, and leakage directly affects the power consumption
of the chip. Without this feedback, it would be impossible to
estimate the temperature accurately.
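Both roles of the closed loop can be sketched in a few lines. This is an illustrative toy, not the framework's implementation: the exponential leakage model, the single-cell thermal update, the threshold policy and every numeric value are assumptions chosen only to show the feedback.

```python
import math

def leakage_power(T, p_leak_ref=0.5, T_ref=318.0, beta=0.02):
    """Hypothetical leakage model (W): grows exponentially with temperature."""
    return p_leak_ref * math.exp(beta * (T - T_ref))

def dynamic_power(freq_mhz, p_dyn_per_mhz=0.01):
    """Hypothetical dynamic power (W), proportional to clock frequency."""
    return p_dyn_per_mhz * freq_mhz

def run_closed_loop(steps, dt=0.01, R=8.0, C=0.125, T_amb=300.0,
                    T_limit=340.0, f_high=500.0, f_low=100.0, policy=True):
    """Return a list of (temperature, frequency) samples, one per step."""
    T = T_amb
    freq = f_high
    trace = []
    for _ in range(steps):
        # Emulator side: power depends on activity AND on the last temperature.
        P = dynamic_power(freq) + leakage_power(T)
        # Host side: one forward Euler step of a single RC thermal cell.
        T += dt * (P - (T - T_amb) / R) / C
        # Emulated sensor register, read by the OS thermal policy.
        if policy:
            freq = f_low if T > T_limit else f_high
        trace.append((T, freq))
    return trace

if __name__ == "__main__":
    managed = run_closed_loop(1000, policy=True)
    unmanaged = run_closed_loop(1000, policy=False)
    print("peak with policy   :", max(T for T, _ in managed))
    print("final without policy:", unmanaged[-1][0])
```

With the policy enabled, the temperature hovers around the threshold; with it disabled, the cell settles at a higher temperature, part of which is due to the temperature-dependent leakage term.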
3.2. Modeling a 3D chip with an FPGA
To understand how we model a 3D architecture using a 2D
FPGA, consider the 3D system depicted in Fig. 2a. It consists
of two layers: in the upper layer there is a core that can access
two local memories: "A" (in the same layer) and "B" (located in
the lower layer). An access to memory A takes less time and
consumes less power than an access to B.
When emulating this system in an FPGA, we have to map
everything in a 2D layout. If we abstract the ﬂoorplanning
information, what is different in the behavior of systems ‘‘a’’
and ‘‘b’’ is the latency. Assume, for example, that accessing
memory A takes 1 cycle, whereas accessing memory B takes 6
cycles. We instantiate in the FPGA a processor connected to two
memories symmetrically and, then, we simply add a new element
that will simulate this extra latency (red oval in the ﬁgure). The
behavior of the system will be the same as the one in the 3D case.
From the point of view of the thermal model, the data interface
remains the same: we only receive ‘‘activity numbers’’ associated
to the different elements of the ﬂoorplan (number of accesses to
memory X, number of transactions in bus Y, etc.), so there is no
difference.
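The idea of this subsection can be mimicked in software as follows. This is a hedged sketch, not the FPGA design: the class names, the one-cycle base latency and the counter layout are hypothetical, and only the 1-cycle versus 6-cycle example comes from the text.

```python
class EmulatedMemory:
    """A memory with a configurable extra latency and an access counter."""

    def __init__(self, name, extra_latency=0):
        self.name = name
        self.latency = 1 + extra_latency   # cycles per access
        self.accesses = 0                  # activity counter (the "sniffer")
        self.cells = {}

    def read(self, addr):
        self.accesses += 1
        return self.cells.get(addr, 0), self.latency

    def write(self, addr, value):
        self.accesses += 1
        self.cells[addr] = value
        return self.latency


class EmulatedCore:
    """A core connected symmetrically to several memories."""

    def __init__(self, memories):
        self.memories = {m.name: m for m in memories}
        self.cycles = 0

    def load(self, mem_name, addr):
        value, latency = self.memories[mem_name].read(addr)
        self.cycles += latency
        return value

    def store(self, mem_name, addr, value):
        self.cycles += self.memories[mem_name].write(addr, value)

    def activity(self):
        """Activity numbers sent to the thermal model; no floorplan data."""
        return {name: m.accesses for name, m in self.memories.items()}


if __name__ == "__main__":
    # Memory A: same layer (1 cycle). Memory B: lower layer (1 + 5 = 6 cycles).
    core = EmulatedCore([EmulatedMemory("mem_A"),
                         EmulatedMemory("mem_B", extra_latency=5)])
    core.store("mem_A", 0, 42)
    core.store("mem_B", 0, 7)
    print(core.load("mem_A", 0), core.load("mem_B", 0))
    print(core.cycles, core.activity())
```

The activity report contains only counts per element, so where memory B sits in the 2D emulation has no effect on what the thermal model receives.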
Nevertheless, when calculating temperatures, the thermal model
knows that the bus of memory A is different from the bus of memory
B: different materials, capacitances, etc. It should be noted that
inside the FPGA it is completely irrelevant where we place memory
B, as long as the behavior is the same (number of cycles to access, type
of the bus, etc.); the actual floorplan of the final chip is in the thermal
model. Any of the positions suggested in Fig. 2b as "Mem i" would
be valid: no matter on which side of the processor we place the
memory in the FPGA, it will be modeled as being underneath
it, in a different layer, as it appears in Fig. 2b.
Fig. 1. Overview of the 2D/3D MPSoC thermal emulation framework.
4. Active cooling model for 2D/3D MPSoCs
Modeling of the 3D stacked architecture with liquid cooling
can be accomplished in three steps: (i) deﬁning a grid-level
thermal resistor–capacitor (RC) network of 2D/3D chip stacks,
(ii) adding models for the interlayer material and TSVs distribu-
tion and (iii) modeling water ﬂowing in independent thermal cell
layers, which represent microchannels in the stacks. These three
steps are detailed in this section.
4.1. RC network for 2D/3D stacks
2D/3D thermal modeling can be accomplished using an
automated model that forms the RC circuit for certain grid
dimensions. In this work, the model proposed in [15] is used,
which has been extended to include 3D modeling capabilities as
discussed in [17]. The extension for the existing multi-layered
thermal modeling provides a new interlayer material model to
include the TSVs (cf. Section 4.2) and the microchannels
(cf. Section 4.3). To model the heterogeneous characteristics of
the interlayer material including the TSVs and microchannels, we
introduce two major differences to other works: (1) as opposed to
having a uniform thermal resistivity value for the layer, our
infrastructure enables a different resistivity value for each
grid cell; (2) the resistivity value of a cell can vary at runtime.
To have a better view of what the system looks like, take a look
at Fig. 3. In this ﬁgure, we can see that different tiers contain the
processing cores and memories, interleaved with interlayer
material, where the microchannels and TSVs are.
The interlayer material is divided into a grid, where each grid
cell except for the cells of the microchannels has a ﬁxed thermal
resistance value depending on the characteristics of the interface
material and TSVs. As presented in [34,35] for our considered
TSV density (less than 1% of total chip area), we assume a
homogeneous via distribution on the die, and calculate the
combined resistivity of the interface material based on the TSV
density. Then, the thermal resistivity of the microchannel cells is
computed based on the liquid ﬂow rate through the cell, and the
characteristics of the liquid at runtime.
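One simple way to picture a density-based combined resistivity is an area-weighted parallel combination of the two vertical heat paths. This is an assumption made for illustration only; the formula actually used in [34,35] may differ, and the interlayer resistivity and per-unit densities below are hypothetical.

```python
def combined_resistivity(tsv_density, rho_material, rho_tsv):
    """Effective vertical resistivity (m*K/W) of an interlayer grid cell.

    Assumes the TSV fill and the interface material conduct heat in
    parallel, weighted by the area fraction each one occupies.
    """
    if not 0.0 <= tsv_density <= 1.0:
        raise ValueError("tsv_density must be an area fraction in [0, 1]")
    conductivity = tsv_density / rho_tsv + (1.0 - tsv_density) / rho_material
    return 1.0 / conductivity

RHO_COPPER = 1.0 / 400.0   # copper conducts roughly 400 W/(m*K)
RHO_INTERLAYER = 0.25      # hypothetical interface material, m*K/W

# Hypothetical per-unit TSV densities (area fractions), assigned by function.
UNIT_TSV_DENSITY = {"core": 0.0, "crossbar": 0.01}

grid_resistivity = {
    unit: combined_resistivity(density, RHO_INTERLAYER, RHO_COPPER)
    for unit, density in UNIT_TSV_DENSITY.items()
}

if __name__ == "__main__":
    for unit, rho in grid_resistivity.items():
        print(f"{unit}: {rho:.4f} m*K/W")
```

Because the value is stored per grid cell, a microchannel cell could overwrite its entry at runtime when the flow rate changes, in line with the two differences listed above.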
4.2. Through-silicon-vias modeling
In order to model the effect of TSVs on the thermal behavior of
3D MPSoCs, we performed a complete study to determine which
modeling granularity is required.
As explained in [36,37], the thermal conductivity of a thermal
via region is determined by its density of thermal vias. In Fig. 2 of
[37], we can see the resistivity as a function of vias density.
As shown in [34], for small values of the TSV density (up to 1–2%),
the effect on the temperature proﬁle is limited to only a few
degrees. Therefore, through the rest of this paper, we can safely
assume that the effect of the TSV insertion to the heat capacity of
the interface material is negligible, since we keep the area
overhead of TSVs below 1%, a very small percentage of the
interface material area. This TSVs thermal interference, however,
can also be used as an advantage to control on-chip temperatures,
through thermal via planning [35]. We assign a TSV density to
each unit based on its functionality and system design choices
(a crossbar structure requires a high TSV density, while a
processing core does not require any modeling of TSV inter-
ference).
The TSV dimensions are set to 10 µm × 10 µm, and a minimum
spacing of 10 µm from each side of the TSV is employed. In fact,
the experiments performed with the calibrated 3D thermal
simulation model of the 5-tier stack (see Section 5.2,
Linear RC model vs. finite-element simulations) indicate that a
block-level granularity provides results very similar to those obtained
with the exact locations of TSVs, while greatly reducing the complexity
of transient thermal analysis.
4.3. Active (liquid) cooling modeling
Active cooling properties (i.e., liquid cooling) have been
modeled using additional layers of thermal cells with different
cooling thermal conductance and resistance properties than
silicon and metal layers, using IBM’s technology [12,13]. In such
a 3D system, the local junction temperature can be accurately
computed with conjugate heat and mass transfer modeling. The
complexity of the resulting model (for the fluid only) is in the
range of billions of nodes to be simulated. Instead of using this
computationally expensive method, we propose (and
validate) an alternative model based on resistive networks. It runs
at a fraction of the computational cost while keeping the
loss in accuracy negligible, thus making it suitable for our
real-time emulation platform.
We characterize the chip stack using a porosity model, i.e., the
cavities are seen as 2D-porous media, and we study the ﬂuid–
solid thermal ﬁeld-coupling. The parameters we have to calculate,
then, are the equivalent thermal resistance and the permeability
of the cells that form the net.
In Fig. 4a, we observe a single heat transfer unit cell of the
resistor network representing the thermal ﬁeld-coupling of the
2D-porous media (Tﬂuid) with the adjacent 3D-solid walls (Twall).
Fig. 2. (a) Modeled ﬂoorplan and (b) FPGA components mapping.
Fig. 3. Detail of the microchannels and TSVs in the 3D stack chip.
Here, we represent field-coupling to transfer heat from solid to
fluid (as green resistors) and solid to solid (as brown resistors),
the convective thermal resistance ($R_{conv}$) and the cavity
permeability ($\kappa$).
The permeability is defined through Darcy's law as
$$\nabla p = -\frac{\mu}{\kappa}\,\vec{v}_{Darcy} \qquad (1)$$
with the linear dependence of the pressure gradient ($\nabla p$) on the
superficial velocity, also called the Darcy velocity ($\vec{v}_{Darcy}$), and the
dynamic viscosity ($\mu$) as material coefficient. The Darcy velocity is
the average fluid velocity ($\vec{v}_{bulk}$) in the cavity multiplied by the
cavity porosity ($\varepsilon$):
$$\vec{v}_{Darcy} = \varepsilon\,\vec{v}_{bulk} \qquad (2)$$
The cavity porosity is the ratio of the cavity fluid volume ($V_{fluid}$)
to the total cavity volume including the fluid and solid part ($V_{tot}$):
$$\varepsilon = \frac{V_{fluid}}{V_{tot}} \qquad (3)$$
The projected convective thermal resistance ($R_{conv}$) mapping
the heat transfer on a single cavity side is computed by
$$R_{conv} = \frac{T_{wall} - T_{fluid}}{\dot{q}_1} \qquad (4)$$
with average wall ($T_{wall}$) temperature and fluid ($T_{fluid}$)
temperature, and the heat flux ($\dot{q}_1$) dissipated on one cavity wall in the case of
symmetric heat flux boundary conditions.
The solid–solid (tier to tier) conductive thermal resistance is
defined as
$$R_{cond} = \frac{t_{cavity}}{k_{solid}\,(1 - \varepsilon)} \qquad (5)$$
with cavity thickness ($t_{cavity}$), pin or channel wall thermal
conductivity ($k_{solid}$) and porosity ($\varepsilon$). The permeability and convective
thermal resistance of a microchannel at fully developed boundary
layers are independent of the Reynolds number and fluid velocity,
respectively.
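Eqs. (1)–(5) translate directly into small helper functions. The sketch below is illustrative only: the function names are hypothetical, Darcy's law is written with the conventional minus sign, the porosity and permeability are the Table 1 values, and the water viscosity (about 1.0E-3 Pa s near room temperature) and the bulk velocity are assumptions.

```python
def porosity(v_fluid, v_total):
    """Eq. (3): ratio of fluid volume to total cavity volume."""
    return v_fluid / v_total

def darcy_velocity(eps, v_bulk):
    """Eq. (2): superficial velocity from the average fluid velocity."""
    return eps * v_bulk

def pressure_gradient(mu, kappa, v_darcy):
    """Eq. (1): Darcy's law, pressure gradient in Pa/m along the flow."""
    return -(mu / kappa) * v_darcy

def r_conv(t_wall, t_fluid, q1):
    """Eq. (4): convective resistance from wall/fluid temperatures and flux."""
    return (t_wall - t_fluid) / q1

def r_cond(t_cavity, k_solid, eps):
    """Eq. (5): solid-solid conductive resistance through the cavity walls."""
    return t_cavity / (k_solid * (1.0 - eps))

EPS = 0.540          # Table 1
KAPPA = 8.76e-11     # m^2, Table 1
MU_WATER = 1.0e-3    # Pa*s, approximate value for water (assumption)
V_BULK = 1.0         # m/s, hypothetical average fluid velocity

if __name__ == "__main__":
    v_d = darcy_velocity(EPS, V_BULK)
    print("Darcy velocity   :", v_d, "m/s")
    print("Pressure gradient:", pressure_gradient(MU_WATER, KAPPA, v_d), "Pa/m")
```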
The values used to model the microchannel cells are depicted
in Table 1. As shown in Fig. 4b, the whole channel is modeled by
replicating this discrete element. As ﬂuid advances through the
channel, it removes the excess of heat from the adjacent walls.
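Replicating the unit cell along the channel can be sketched as a simple march from inlet to outlet. This is a steady-state illustration under stated assumptions, not the transient model of the paper: each cell hands its wall heat to the fluid through an energy balance, the wall temperature follows Eq. (4) with the Table 1 value of Rconv, and the heat fluxes, cell area, mass flow rate and inlet temperature are hypothetical.

```python
R_CONV = 18.76      # K*mm^2/W, Table 1
C_P_WATER = 4182.0  # J/(kg*K), approximate specific heat of water

def channel_profile(q_flux, cell_area_mm2, m_dot, t_in,
                    c_p=C_P_WATER, r_conv=R_CONV):
    """March along the channel, one unit cell at a time.

    q_flux        -- heat flux absorbed in each cell (W/mm^2), inlet to outlet
    cell_area_mm2 -- wall area of one cell (mm^2)
    m_dot         -- coolant mass flow rate (kg/s)
    Returns ([(t_fluid, t_wall), ...], outlet fluid temperature).
    """
    t_fluid = t_in
    profile = []
    for q in q_flux:
        t_wall = t_fluid + q * r_conv                   # Eq. (4) rearranged
        profile.append((t_fluid, t_wall))
        t_fluid += q * cell_area_mm2 / (m_dot * c_p)    # fluid heats up
    return profile, t_fluid

if __name__ == "__main__":
    cells, t_out = channel_profile([0.1] * 4, cell_area_mm2=1.0,
                                   m_dot=1.0e-4, t_in=300.0)
    for i, (t_f, t_w) in enumerate(cells):
        print(f"cell {i}: fluid {t_f:.3f} K, wall {t_w:.3f} K")
    print(f"outlet: {t_out:.3f} K")
```

Cells further downstream see warmer fluid, so for the same heat flux their walls end up slightly hotter than those near the inlet.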
5. Experimental setup and results
In this section, we ﬁrst explain through different experiments
how the proposed 3D thermal model was validated, including
the liquid ﬂuid modeling aspect. Then, to conclude, we present a
complete use case that illustrates how designers can use the
proposed emulation framework to speed-up their design phase of
3D MPSoCs.
5.1. 3D thermal model validation
The initial 2D RC thermal model was validated using experi-
mental data from industrial partners ([14,15,37]) for the manu-
facturing technologies of 2D MPSoCs. In order to calibrate the 3D
version of the thermal library, we manufactured a 5-tier 3D chip
stack (see Fig. 5), and carried out exhaustive experiments on it to
characterize the possible inaccuracy with respect to our model,
and tune it accordingly.
The structure of each of the layers of our stacked chip can be
observed in Fig. 6. The "Di" elements are heat sources, modeling
cores in a typical 3D MPSoC (10 microheaters of 1 mm² per layer).
Surrounding each core, there is a thermal sensor to monitor the
temperature inside the stack and check the heat dissipated and
the heat interactions between neighbouring cores. These sensors
(i.e., thin filaments around the heaters) are resistance
temperature detectors: the temperature of the heater creates a variation in
the resistance of the sensor. The temperature can then be
obtained by injecting a fixed test current, measuring the voltage
drop across the sensor and applying the resistivity–temperature
dependence of the metal (platinum).
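The sensor read-out described above can be sketched with a linear resistance model, R(T) = R_ref·(1 + α·(T − T_ref)). The coefficient below is the value commonly quoted for standard platinum RTDs; the sensors of the test chip would need their own calibration, and the reference resistance, test current and function name are hypothetical.

```python
ALPHA_PT = 3.85e-3  # 1/K, typical for standard platinum RTDs (assumption here)

def rtd_temperature(v_drop, i_test, r_ref, t_ref=293.0, alpha=ALPHA_PT):
    """Temperature (K) from the voltage drop across a platinum sensor.

    v_drop -- measured voltage drop across the sensor (V)
    i_test -- injected fixed test current (A)
    r_ref  -- sensor resistance at the reference temperature t_ref (ohm)
    """
    resistance = v_drop / i_test                  # Ohm's law
    return t_ref + (resistance / r_ref - 1.0) / alpha

if __name__ == "__main__":
    # Hypothetical sensor: 100 ohm at 293 K, read with a 1 mA test current.
    print(rtd_temperature(v_drop=0.10385, i_test=1.0e-3, r_ref=100.0))
```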
In the process to validate the 3D thermal model, we ﬁrst
veriﬁed the correct behavior of the heaters and sensors. Next, we
studied the lateral (intra-layer) and vertical (inter-layer) heat
propagation. In all cases, the ﬁgures show comparative results
between the measurements in the different sensors and the
simulation. In the following ﬁgures, D stands for device (see Fig. 6),
Fig. 4. (a) Heat transfer unit cell and (b) modeling the whole channel.
Table 1
Effective model parameters for interconnect pitch of 100 µm.
Microchannel (CH), test vehicle dimensions: 101 µm height, 46 µm wall width
ε = 0.540
κ = 8.76E–11 m²
Rcond = 7.3 K mm²/W
Rconv = 18.76 K mm²/W
Fig. 5. Manufactured 5-tier chip stack for 3D thermal library validation (figure labels: micro-heaters and temperature sensors, bonding pads, epoxy with alumina particles).
H stands for heater (Hi is the heater around Di), and L stands for
layer.
5.1.1. Heaters–sensors validation
In order to verify the correct thermal behavior of the heaters
and sensors, we inject different currents and read back the
temperature measured by the corresponding sensors. In the
following figures, we indicate the power consumed (in watts),
which is derived from the measured currents and the resistances of
the heaters of the 3D stack case study. First, in Fig. 7 we show the
thermal validation results of device D1 (S+H: heater and sensor
are at the same location) for the different layers. The Y axis shows
the change in temperature with respect to ambient temperature
(set to 293 K).
We repeated the same experiments for different devices in the
3D-IC stack and Table 2 outlines the error of the model. As this
table indicates for D1 and D2, the modeling error is very small
(2.4%, as worst case, at 11.9 W power value), and lower layers
show slightly better accuracy. Indeed, with higher layers, it is
more difﬁcult to estimate the values, since the convection effect
makes the temperature depend on the underlying layers.
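The per-sensor errors listed in Table 2 can be summarized with a few lines of code. The sketch below only averages the published values per column; the dictionary layout and helper name are illustrative choices.

```python
# Percentage errors from Table 2, one list per heater/layer column,
# ordered by the power levels in POWER_W.
POWER_W = [3.0, 4.7, 6.7, 9.1, 11.9]
TABLE_2 = {
    "D1L2": [0.2, 0.4, 0.8, 0.9, 1.5],
    "D1L4": [1.3, 1.9, 2.2, 3.1, 3.7],
    "D2L2": [0.5, 0.5, 0.5, 0.8, 0.6],
    "D2L4": [0.7, 1.2, 1.6, 2.0, 2.4],
}

def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

mean_error = {column: mean(errors) for column, errors in TABLE_2.items()}

if __name__ == "__main__":
    for column, err in mean_error.items():
        print(f"{column}: mean error {err:.2f}% over {len(POWER_W)} power levels")
```

For both devices, the column for layer 2 averages lower than the column for layer 4, which matches the observation that lower layers are modeled slightly more accurately.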
5.1.2. Intra-layer validation
In order to characterize the lateral heat diffusion in a given
layer of the 3D stack, one heater (in the example, see Fig. 6, D02 in
Layer 2: D2L2) is excited with different current levels and, for
each level, measurements are made at the different sensors
within the same die. Thus, results in Fig. 8 provide information on
the behavior of the lateral temperature distribution as function of
the distance of the sensor from the heat source within a layer,
using both measured values and simulated ones for different
power figures. Thus, the first two values in the legend show the
case when 1.251 W is dissipated in the heater; the first one (solid
line) is the value obtained with the thermal model simulator (S),
while the second one (dashed line) is the measurement from the
chip (M).
5.1.3. Inter-layer validation
The heater D2L2 is again excited with different current levels.
In this case, for each current level, temperature measurements are
made at the sensors in devices D02 of Die 2, Die 3, Die 4 and Die 5.
Fig. 9 thus shows the temperature distribution as a function of
the vertical distance of the sensor from the heat source across
the layers of the 3D stack. The graph again compares simulation
results (solid line) with measurements (dashed line) for different
power figures.
Fig. 6. One layer of the 3D stack chip: microheaters and sensors.
Fig. 7. Validation of thermal model in component D1, indicating measured
temperature change with respect to the ambient (both in simulation and in 3D
chip measurements).
Table 2
Summary of thermal model validation for different heaters and sensors
(simulation results vs. measurements).
Power (W)   D1L2 (% error)   D1L4 (% error)   D2L2 (% error)   D2L4 (% error)
3.0         0.2              1.3              0.5              0.7
4.7         0.4              1.9              0.5              1.2
6.7         0.8              2.2              0.5              1.6
9.1         0.9              3.1              0.8              2.0
11.9        1.5              3.7              0.6              2.4
Fig. 8. Lateral heat transfer measurement indicating measured temperature
change with respect to the ambient.
Overall, the average percentage error between the simulated and
measured thermal resistance using the proposed 3D RC-based
thermal model stays under 1% for all the sensor–heater
configurations in the intra-layer case, and under 3% in the
inter-layer case.
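The comparison between simulated and measured thermal resistance can be sketched as follows. We assume here that the thermal resistance is the temperature rise above ambient divided by the dissipated power, and that the error is taken relative to the measurement; the text does not state either definition explicitly, and the numeric values are hypothetical.

```python
def thermal_resistance_k_per_w(delta_t_k: float, power_w: float) -> float:
    """Thermal resistance as temperature rise above ambient per watt (assumed definition)."""
    if power_w <= 0:
        raise ValueError("power must be positive")
    return delta_t_k / power_w


def percent_error(simulated: float, measured: float) -> float:
    """Absolute error of the simulated value relative to the measured one, in percent."""
    if measured == 0:
        raise ValueError("measured value must be non-zero")
    return abs(simulated - measured) / abs(measured) * 100.0


# Hypothetical values: 30.3 K simulated rise vs. 30.0 K measured rise at 3.0 W.
r_sim = thermal_resistance_k_per_w(30.3, 3.0)
r_meas = thermal_resistance_k_per_w(30.0, 3.0)
error_pct = percent_error(r_sim, r_meas)  # about 1%
```

Averaging this quantity over all sensor and heater pairs would give a figure comparable to the average errors quoted above.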
In the next section, we focus on the modeling of the liquid
cooling channels, and compare our efﬁcient ﬂuid modeling
method against the classical ﬁnite-element simulation.
5.2. Linear RC model vs. finite-element simulations
In this experiment, we evaluated the accuracy of the proposed
RC-based thermal model with respect to finite-element-based
transient thermal analysis of 3D MPSoCs with liquid cooling.
To this end, we compared the temperature evolution at the
junction (cf. Eq. (2)) obtained with finite-element
simulations [13] (red solid line) with the temperature estimated
by the linear model we developed (yellow dashed line).
The results, shown in Fig. 10, indicate that the two types of
simulation differ by less than 1.5% on average (encircled area).
Furthermore, while the proposed emulation framework using the
simple liquid cooling model for straight channels can calculate
the junction temperature evolution on the order of a few
milliseconds, the detailed finite-element simulation can take a
few hours. This illustrates the potential of linear thermal
estimation methods for simple geometries of liquid
microchannels in a laminar flow regime.
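The low cost of a linear RC estimate can be seen in a minimal sketch of a single lumped thermal node, integrated with the backward Euler method. This is not the paper's full 3D RC network or its liquid cooling model: the single-node topology and the values of R and C are assumptions for illustration, while the 293 K ambient matches the validation setup.

```python
def simulate_rc_node(power_w, r_k_per_w, c_j_per_k,
                     t_amb_k=293.0, dt_s=1e-3, steps=1000):
    """Transient temperature (K) of one lumped RC thermal node.

    Model: C * dT/dt = P - (T - T_amb) / R, discretized with backward Euler,
    which is unconditionally stable for this linear equation.
    """
    temps = [t_amb_k]
    a = dt_s / (r_k_per_w * c_j_per_k)  # time step relative to the RC constant
    for _ in range(steps):
        t_prev = temps[-1]
        # Solve C*(T_next - T_prev)/dt = P - (T_next - T_amb)/R for T_next.
        t_next = (t_prev + (dt_s / c_j_per_k) * power_w + a * t_amb_k) / (1.0 + a)
        temps.append(t_next)
    return temps


# Hypothetical parameters: 3 W, R = 10 K/W, C = 0.01 J/K (time constant 0.1 s).
trace = simulate_rc_node(3.0, 10.0, 0.01, steps=5000)
```

Each time step costs a handful of arithmetic operations per node, which is why a linear model of this kind is far cheaper to evaluate than a finite-element solve. The trace settles at the ambient temperature plus P times R.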
5.3. Real-life framework test case
The proposed HW/SW thermal emulation framework for 2D/3D
MPSoCs has been compared with different SW thermal libraries for
2D/3D MPSoCs [6,7,17], while running intensive SW processing
kernels.
Detailed temperature maps of the modeled system can be
extracted at any transient moment of the emulation of a
given 3D MPSoC (see Fig. 11). The results obtained are depicted
in Fig. 12 and show significant speed-ups with respect to
state-of-the-art temperature estimation frameworks [14–17]. In
particular, these results indicate that the proposed modeling
approach scales significantly better than state-of-the-art SW
simulators for transient thermal analysis. In fact, the
exploration of 2D thermal behavior on a commercial 8-core
MPSoC [2] has shown that the proposed thermal emulation can
achieve speed-ups of more than 800× with respect to thermal
simulators. Moreover, the thermal exploration of 3D MPSoCs with
active (liquid) cooling modeling shows even larger speed-ups
(more than 1000×) due to the power extraction and thermal
synchronization overhead in thermal simulators [6,7,17].
Fig. 9. Vertical heat transfer measurement indicating measured temperature
change with respect to the ambient.
Fig. 10. Temperature evolution at the junction in liquid-cooling based systems
using the finite-element simulation and the proposed model for straight liquid
cooling channels.
Fig. 11. Transient thermal map of the 3D MPSoC emulated system.
Fig. 12. Simulation speed-ups of the proposed HW–SW thermal emulation
framework for transient thermal analysis with respect to state-of-the-art 2D/3D
thermal simulators. (Bars: 3-core 2D MPSoC [14], 4-core 2D MPSoC [15],
8-core 2D MPSoC [17], 8-core 3D MPSoC [17], 8-core 3D MPSoC [17];
vertical axis from 0 to 1400.)
6. Conclusions
2D and emerging 3D MPSoC architectures have been proposed
as a promising solution to exploit the available area in
forthcoming computing systems. In this paper, we have presented a
new HW/SW FPGA-based emulation framework that enables the
rapid analysis of run-time thermal behavior in 2D/3D MPSoCs
with active liquid cooling. The experimental results have shown
that the proposed framework performs detailed transient thermal
exploration with a speed-up of more than 1000× with respect to
cycle-accurate MPSoC simulators, and even more when active
(liquid) cooling effects are considered in the overall thermal
system analysis. Furthermore, almost no loss in thermal estimation
accuracy (less than 3%) is experienced with respect to classical
(and very time-consuming) finite-element simulations. Overall,
this HW/SW thermal emulation approach is a promising mechanism
to perform long-time transient behavior characterization in
2D and 3D MPSoC stacks. We have presented a real-life case study
where the proposed framework can be efficiently used to evaluate
the performance of liquid cooling in a 5-tier 3D chip.
Acknowledgments
This research has been partially funded by the Nano-Tera.ch
RTD Project CMOSAIC (Ref. 123618), which is ﬁnanced by the
Swiss Confederation and scientiﬁcally evaluated by SNSF. This
work has also been supported by the Spanish Government Grant
TIN2008-00508 and the MEC Consolider Ingenio CSD00C-07-
20811 Grant of the Spanish Council of Science and Innovation.
References
[1] D. Pham, et al., Design and implementation of a ﬁrst-generation cell
processor, Proc. ISSCC (2005).
[2] P. Kongetira, et al., Niagara: a 32-way multithreaded SPARC processor, IEEE
Micro (2005).
[3] Tilera Corporation, Tilera’s 64-core architecture, 2008,
www.tilera.com/products/processors.php.
[4] W. Davis, et al., Demystifying 3D ICs: the pros and cons of going vertical, IEEE
Des. Test Comput. (2005).
[5] M. Healy, et al., Multiobjective microarchitectural ﬂoorplanning for 2D and
3D ICs, IEEE Trans. CAD (2007).
[6] K. Skadron, et al., Temperature-aware microarchitecture: modeling and
implementation (hot-spot simulator), IEEE TACO (2004).
[7] G. Paci, et al., Exploring temperature-aware design in low-power MPSoCs,
Proc. Des. Autom. Test Eur. (2006).
[8] M.-N. Sabry, High-precision thermal models, IEEE TCPT (2005).
[9] Cadence Palladium II, 2005, http://www.cadence.com.
[10] ARM integrator AP, 2004, http://www.arm.com.
[11] Emulation Engineering, Zebu models, http://www.eve-team.com.
[12] T. Brunschwiler, et al., Direct liquid-jet impingement cooling with micron-
sized nozzle array and distributed return architecture, Proc. ITHERM (2006).
[13] T. Brunschwiler, et al., Interlayer cooling potential in vertically integrated
packages, Microsyst. Technol. (2008).
[14] F. Mulas, et al., Thermal balancing policy for streaming computing on
multiprocessor architectures, Proc. Des. Autom. Test Eur. (2008).
[15] D. Atienza, et al., A fast HW/SW FPGA-based thermal emulation framework
for multi-processor system-on-chip, Proc. DAC (2006).
[16] B. Richard, et al., Numerical Analysis, Brooks Cole, 2000.
[17] A.K. Coskun, et al., Dynamic thermal management in 3D multicore
architectures, Proc. Des. Autom. Test Eur. (2009).
[18] A. Jerraya, et al., in: Multiprocessor SoCs, Morgan Kaufmann, 2005.
[19] G. Braun, et al., Processor/memory co-exploration on multiple abstraction
levels, Proc. Des. Autom. Test Eur. (2003).
[20] P.S. Magnusson, et al., Simics: a full system simulation platform, IEEE
Comput. (2002).
[21] P. Paulin, et al., StepNP: a system-level exploration platform for network
processors, IEEE Des. Test Comput. (2002).
[22] Coware, Convergence and Lisatek product lines, 2006,
http://www.coware.com.
[23] ARM, PrimeXSys Platform Architecture and Methodologies, White Paper,
2005, http://www.arm.com/.
[24] Heron Engineering, SoCemulation, 2004, http://www.hunteng.co.uk.
[25] M.D. Nava, et al., An open platform for developing MPSoC, IEEE Comput.
(2005).
[26] Y. Nakamura, A fast HW/SW co-veriﬁcation method for SoC by using a C/C++
simulator and FPGA emulator with shared register communication, Proc. DAC
(2004).
[27] Mentor Graphics, Platform express and primecell, 2005,
http://www.mentor.com/.
[28] Synopsys, Realview Max-sim ESL Exploration Framework, 2004,
http://www.synopsys.com/.
[29] T.-Y. Chiang, et al., Thermal analysis of heterogeneous 3-D ICs with various
integration scenarios, Proc. IEDM (2001).
[30] A. Rahman, et al., Thermal analysis of three-dimensional integrated circuits,
Proc. IITC (2001).
[31] P. Leduca, et al., Challenges for 3d IC integration: bonding quality and thermal
management, Proc. IITC (2007).
[32] K. Puttaswamy, et al., Thermal analysis of a 3d die-stacked high-performance
microprocessor, Proc. GLSVLSI (2006).
[33] A. Jain, et al., Thermal modeling and design of 3D integrated circuits, Proc.
ICTTPES (2008).
[34] C. Zhu, Z. Gu, L. Shang, R.P. Dick, R. Joseph, Three-dimensional chip-
multiprocessor run-time thermal management, IEEE Trans. CAD (2008).
[35] J. Cong, et al., Thermal via planning for 3-D ICs, Proc. ICCAD (2005) 745–752.
[36] B. Goplen, et al., Placement of thermal vias in 3-d ICs using various thermal
objectives, IEEE Trans. CAD (2006).
[37] A.K. Coskun, et al., Dynamic thermal management in 3d multicore
architectures, Proc. Des. Autom. Test Eur. (2009).
