Exploring Temperature-Aware Design of Memory Architectures in VLIW Systems by Ayala, Jose L. et al.
Exploring Temperature-Aware Design of Memory Architectures in VLIW Systems
José L. Ayala¹, Anya Apavatjrut², David Atienza³,⁴, Marisa López-Vallejo¹
¹Departamento de Ingeniería Electrónica
Universidad Politécnica de Madrid (Spain)
Email: {jayala, marisa}@die.upm.es
²Department of Telecommunication Services and Usage
INSA (Lyon, France)
Email: anya.apavatjrut@insa-lyon.fr
³Departamento de Arquitectura de Computadores y Automática
Universidad Complutense de Madrid (Spain)
⁴LSI
Ecole Polytechnique Fédérale de Lausanne (Switzerland)
Email: david.atienza@epfl.ch
Abstract
This paper presents a thermal model to analyze the temperature evolution in the shared register files found in VLIW systems. The model allows the analysis of several factors that have a strong impact on heat transfer: layout topology, placement and memory accesses. Finally, some relevant conclusions are drawn from an analysis of the thermal behavior of several multimedia applications.
1 Introduction
As technology scales, higher power consumption coupled with smaller chip area will result in higher power density, which in turn will lead to higher temperatures on the chip [1, 2]. In fact, extrapolating the changes in microprocessor organization and device miniaturization, future power density can be projected to reach 200 W/cm² [3]. This requires extensive efforts on cooling techniques, which have proven to be complex and expensive.
While hardware solutions to temperature management problems are very important, software can also play an important role because it determines which circuit components are exercised during execution and for how long they are used. In particular, compilers and source code transformations determine the data and instruction access patterns of applications, which shape the power density profile. In addition, the topology of the hardware modules and the placement of these components determine the temperature behavior.
In this paper we present a complete parameterized thermal characterization of one of the hardware modules found in a VLIW architecture: the shared register file. The experimental approach analyzes how the topology of this device, as well as its placement in the chip layout, affects the temperature behavior of the chip when different applications are run.
The contributions of this paper are:
1. Definition of a mathematical model to analyze the temperature behavior of the registers inside the register file of a VLIW architecture. This model is integrated into a complete simulator of VLIW architectures so that bus activities and register file accesses can be used as input parameters for the model.
2. Analysis of the effect of different register file topologies, module placements and register access patterns on the temperature behavior.
3. Characterization of several multimedia applications to evaluate their common characteristics in terms of temperature behavior.
This paper is organized as follows: Section 2 presents the relevant previous work on this topic, while our proposed methodology and thermal model are briefly explained in Section 3. Some optimization policies are presented in Section 4. Section 5 covers some preliminary results, and Section 6 draws the conclusions.
International Workshop on Innovative Architecture for Future generation Processors and Systems 2007
1527-1366/08 $25.00 © 2008 IEEE
DOI 10.1109/IWIA.2007.7
Authorized licensed use limited to: IEEE Xplore. Downloaded on November 20, 2008 at 19:04 from IEEE Xplore.  Restrictions apply.
2 Related Work
In recent years there has been increasing interest in providing a detailed die temperature distribution [4, 5]. In these works, the authors present different detailed full-chip thermal models. All these models provide a detailed temperature distribution across the silicon die and can be solved efficiently. However, it has recently been reported that the results achieved by these models suffer from accuracy problems [6, 7]. All these works provide an analytical method for studying the thermal distribution in the die of high-performance processors, but they require complete knowledge of the layout details. Moreover, these models are tied to the processor layout and cannot be easily extended to different target architectures.
Other approaches rely on dynamic measurements to characterize the thermal behavior of the chip [8]. Unlike ours, these techniques require not only complete knowledge of the target architecture but also the capability to modify it. Other approaches, such as [9] and [10], propose techniques based on electrical measurements to develop a power consumption model. However, these works do not deal with the thermal behavior of the chip. Our previous research [11, 12] has also targeted the thermal modeling of systems. This paper presents a different, simulation-based approach that increases the granularity within the register file of the memory unit and explores several factors that affect the temperature behavior, such as the layout topology and placement.
Our work is based on the thermal model presented in [13, 14] and extends its capabilities by modeling the thermal behavior inside the register file for different placements, topologies and benchmarks.
3 Thermal Model
The thermal model exploits a well-known analogy between electrical circuits and heat conduction. The silicon die and the heat spreader are divided into cubic elementary cells, and the temperature of every cell is computed using an RC model. The cell size trades off simulation speed against thermal accuracy.
Each cell is associated with a thermal capacitance and five thermal resistances. Four resistances model horizontal heat spreading, whereas the fifth models vertical heat flow. The horizontal and vertical thermal conductances and the capacitance of each elementary cell are computed as follows:
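[Figure: elementary cell of length l, width w and height h, with lateral connections to its N, S, E and W neighbors and vertical connections to the Top and Bottom.]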
$$G_{hor} = K_{G(Si/Cu)} \times \frac{h \times w}{l}$$
$$G_{ver} = K_{G(Si/Cu)} \times \frac{l \times w}{h}$$
$$C = K_{C(Si/Cu)} \times l \times h \times w$$
where $K_{G(Si/Cu)}$ is the thermal conductivity of silicon or copper, $K_{C(Si/Cu)}$ is the thermal capacitance per unit volume of silicon or copper, and $l$, $w$ and $h$ are the cell length, width and height, respectively.
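As a minimal sketch of the three per-cell expressions above, the Python function below evaluates $G_{hor}$, $G_{ver}$ and $C$ for one cell in SI units. The function and field names are hypothetical, and the silicon constants are approximate room-temperature values assumed for illustration, not values taken from this paper.

```python
from dataclasses import dataclass


@dataclass
class CellRC:
    g_hor: float  # lateral conductance to one N/S/E/W neighbour, W/K
    g_ver: float  # vertical conductance (top/bottom), W/K
    c: float      # thermal capacitance, J/K


def cell_rc(k_g: float, k_c: float, l: float, w: float, h: float) -> CellRC:
    """Conductances and capacitance of one elementary cell (SI units)."""
    g_hor = k_g * (h * w) / l   # G_hor = K_G * (h*w / l)
    g_ver = k_g * (l * w) / h   # G_ver = K_G * (l*w / h)
    c = k_c * l * h * w         # C = K_C * l*h*w
    return CellRC(g_hor, g_ver, c)


# Approximate silicon constants (assumed for illustration, not from the paper).
K_G_SI = 150.0   # thermal conductivity, W/(m*K)
K_C_SI = 1.63e6  # volumetric heat capacity, J/(m^3*K)

if __name__ == "__main__":
    print(cell_rc(K_G_SI, K_C_SI, l=100e-6, w=100e-6, h=50e-6))
```

For a cube (l = w = h) both conductances reduce to $K_G \times l$. The expression uses the same $G_{hor}$ for all four lateral resistances, which is exact only when l = w.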
With this RC characterization, every cell is connected to its neighboring cells. The heat dissipated in each block is modeled as a current source connected to the corresponding node. The result is a thermal circuit, analogous to an electrical circuit, that can be solved by node voltage analysis, with node voltages playing the role of temperatures. As a result, the temperature of each block is obtained.
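The node voltage analysis described above can be sketched for the steady state (capacitances ignored) as follows. This is an illustrative Gauss-Seidel solver, not the simulator used in the paper: it assumes a single layer of N × M cells, one lateral conductance shared by all 4-connected neighbours, and a vertical conductance from every cell to a fixed ambient temperature standing in for the heat sink.

```python
def steady_state(power, g_lat, g_ver, t_amb, tol=1e-9, max_iter=100_000):
    """Steady-state temperatures of an N x M grid of thermal cells.

    power[i][j] -- heat dissipated in cell (i, j) in W (a current source)
    g_lat       -- conductance between 4-connected neighbours, W/K
    g_ver       -- conductance from every cell to the ambient node, W/K
    t_amb       -- ambient (heat-sink) temperature
    """
    n, m = len(power), len(power[0])
    t = [[float(t_amb)] * m for _ in range(n)]
    for _ in range(max_iter):
        largest_change = 0.0
        for i in range(n):
            for j in range(m):
                # Kirchhoff's current law at node (i, j):
                # P + g_ver*(T_amb - T) + sum g_lat*(T_nb - T) = 0
                num = power[i][j] + g_ver * t_amb
                den = g_ver
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    a, b = i + di, j + dj
                    if 0 <= a < n and 0 <= b < m:
                        num += g_lat * t[a][b]
                        den += g_lat
                new_t = num / den
                largest_change = max(largest_change, abs(new_t - t[i][j]))
                t[i][j] = new_t
        if largest_change < tol:
            return t
    raise RuntimeError("Gauss-Seidel iteration did not converge")


if __name__ == "__main__":
    # 3 x 3 cells, 1 W dissipated in the centre cell only (illustrative values).
    p = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
    temps = steady_state(p, g_lat=0.2, g_ver=0.1, t_amb=300.0)
    for row in temps:
        print(" ".join(f"{x:7.2f}" for x in row))
```

Because the lateral currents cancel when all node equations are summed, the heat leaving through the vertical conductances equals the total injected power, which is a convenient sanity check on any such solver.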
3.1 Register File Modeling
As mentioned before, one of the goals of this work is to increase the model granularity by focusing the analysis on the temperature behavior of the individual registers inside the register file. To accomplish this goal, the register file is represented as an N × M matrix in which every register occupies one elementary cell. Therefore, the thermal resistance and capacitance (R and C) of every elementary cell have to be calculated for every specific floorplan.
3.1.1 Elementary resistances calculation
Since the total resistance and total capacitance of the device are known in advance, the register file can be decomposed into smaller units, each associated with its own resistance and capacitance, as shown in Figure 1.
From Figure 2, the total resistance of a cell is
$$R_{cell} = R + \frac{R}{3} = \frac{4R}{3}$$
Now, from Figure 3, the circuit can be decomposed into a matrix of N rows by M columns of registers, and the resistance of every row can be calculated as follows:
$$R_{cell,M} = \left[(R_{cell,M-1} + R) \,\|\, \frac{R}{2}\right] + R = \frac{(R_{cell,M-1} + R) \times \frac{R}{2}}{R_{cell,M-1} + \frac{3R}{2}} + R$$
Supposing that $R_{cell,M-1} = S_{M-1}R$ and $R_{cell,M} = S_M R$, then
$$S_M = \frac{S_{M-1} + 1}{S_{M-1} + 1.5} \times 0.5 + 1$$
Since the rows are in parallel with each other, the total resistance of the device is obtained by dividing by the number of rows:
$$R_{tot} = \frac{R_{cell,M}}{N}$$
The resistance of each register can then be computed as
$$R = \frac{N R_{tot}}{S_M}$$
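The recurrence for $S_M$ and the two resulting expressions can be evaluated with a few lines of Python. The starting value $S_0 = 0$ is an assumption made here: it is the value that makes the recurrence reproduce $R_{cell} = 4R/3$ (that is, $S_1 = 4/3$) for a single cell. The function names are hypothetical.

```python
import math


def s_factor(m_cols: int) -> float:
    """S_M such that R_cell,M = S_M * R for a row of M registers.

    Iterates S_M = 0.5 * (S_{M-1} + 1) / (S_{M-1} + 1.5) + 1 starting from
    S_0 = 0 (assumed), which yields S_1 = 4/3, i.e. R_cell = 4R/3.
    """
    if m_cols < 1:
        raise ValueError("a row needs at least one register")
    s = 0.0
    for _ in range(m_cols):
        s = 0.5 * (s + 1.0) / (s + 1.5) + 1.0
    return s


def total_resistance(r: float, n_rows: int, m_cols: int) -> float:
    """R_tot = R_cell,M / N, with R_cell,M = S_M * R (rows in parallel)."""
    return s_factor(m_cols) * r / n_rows


def register_resistance(r_tot: float, n_rows: int, m_cols: int) -> float:
    """Per-register resistance R = N * R_tot / S_M."""
    return n_rows * r_tot / s_factor(m_cols)


if __name__ == "__main__":
    for m in (1, 2, 4, 8):
        print(m, s_factor(m))
    print("limit sqrt(2) =", math.sqrt(2))
```

Setting $S_M = S_{M-1} = S$ in the recurrence gives $(S-1)(S+1.5) = 0.5(S+1)$, i.e. $S^2 = 2$, so $S_M$ converges quickly to $\sqrt{2} \approx 1.414$ and, for wide rows, $R_{cell,M} \approx \sqrt{2}\,R$.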
3.1.2 Elementary capacitances calculation
The total capacitance of the circuit can be calculated by considering each elementary capacitance to be in parallel with the others (see Figure 4).
The total capacitance for N rows and M columns in parallel is
$$C_{tot} = C \times N \times M$$
The capacitance of every register can then be computed as
$$C = \frac{C_{tot}}{N \times M}$$
Once the resistance and capacitance of each elementary cell are known, the size of the elementary cell can be calculated, assuming that the cell is a cube, by the expression
$$size = \frac{R}{K_{G(Si/Cu)}}$$
These expressions are integrated into the VLIW simulator in order to obtain the thermal behavior of every register in the register file for different placements and topologies.
Following [13], we use the same expression to compute the temperature evolution once the technology factors have been calculated and the activities have been obtained from the simulator:
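$$T_c(n+1) \times 2^{36} = T_c(n) + \left(cap \times EC \times 2^{62}\right) \times act + \frac{\left(A \times 2^{26} - B \times 2^{26} \times T_c(n)\right) \times \left(T_n(n) \times 2^{36} - T_c(n) \times 2^{36}\right)}{2^{26}}$$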
where $T_c(n)$ is the temperature of the cell at step n, $T_c(n+1)$ is its temperature at step n + 1, $T_n(n)$ is the temperature of the neighboring cell, $cap \times EC$ is the temperature increase due to the activity, $act$ is the activity factor, $A$ is the linear coefficient and $B$ is the quadratic one.
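Reading the powers of two as fixed-point scale factors (an interpretation, consistent with the conduction term scaling to $2^{36}$ like the left-hand side, since $(A \cdot 2^{26})(\Delta T \cdot 2^{36})/2^{26} = A\,\Delta T \cdot 2^{36}$), the update can be sketched in plain floating point. The function name is hypothetical, only the single neighbour term written in the equation is modelled, and the values in the usage example are arbitrary.

```python
def step_temperature(tc, tn, cap_ec, act, a, b):
    """One explicit update of a cell temperature (floating-point sketch).

    The 2^k factors of the original fixed-point equation are removed
    (assumption). tc/tn: cell and neighbour temperatures at step n,
    cap_ec: temperature rise per unit of activity (cap x EC),
    act: activity factor, a/b: linear and quadratic coefficients.
    """
    heating = cap_ec * act                 # self-heating from register activity
    conduction = (a - b * tc) * (tn - tc)  # exchange with the neighbour cell
    return tc + heating + conduction


if __name__ == "__main__":
    # Arbitrary illustrative values: an idle cell next to a hotter neighbour.
    t = 300.0
    for step in range(5):
        t = step_temperature(t, 310.0, cap_ec=0.0, act=0.0, a=0.1, b=0.0)
        print(step, round(t, 3))
```

With no activity, `step_temperature(300.0, 310.0, 0.0, 0.0, 0.1, 0.0)` returns 301.0: the cell moves a tenth of the way toward its neighbour per step. A non-zero `b` makes the effective conductance depend on the cell temperature.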
4 Optimization Policies
One of the analyses performed to optimize the thermal behavior of the register file is the selection of different placement locations for the device. The temperature reached by every register in the register file depends on the number of accesses to that register, the temperature reached by the neighboring registers, and the heat transferred from hot functional units close to the register file. All these parameters are considered by the thermal model to characterize the temperature profile.
[Figure 5: floorplan of the baseline system. Recoverable labels: processor 1 to processor 4, blocks p1 to p4, 8 KB data caches (dCache), instruction caches (iCache), 32 kB private memories, NoC 6x6 switches and a NoC interface.]
The baseline architecture devised for the set of experiments resembles a common VLIW system with four processing cores, a shared memory subsystem, a shared register file and a communication network (see Figure 5). The layout of the system is described in text configuration files that code the placement and size of these modules. One configuration file specifies the placement of the layout modules with a letter for every cell ('m' for a memory cell, 'r' for a register file cell, etc.), while the other provides the size of these cells.
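The placement file is only described informally above, so the parser below assumes a hypothetical format: one text row per row of cells and one module letter per cell ('m' and 'r' as in the text; any other letter, such as 'p' for a processing unit, is invented for the example). The helper counts which modules border the register file cells, with '#' standing for the chip edge, which is the kind of information that distinguishes one register file placement from another.

```python
from __future__ import annotations

from collections import Counter


def parse_placement(text: str) -> list[str]:
    """Parse a placement grid: one row per line, one module letter per cell."""
    rows = [line.strip() for line in text.splitlines() if line.strip()]
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all rows must have the same number of cells")
    return rows


def module_neighbours(rows: list[str], letter: str) -> Counter:
    """Count the module letters touching `letter` cells (4-connectivity).

    Neighbours outside the chip are reported as the pseudo-module '#'.
    """
    n, m = len(rows), len(rows[0])
    seen = Counter()
    for i in range(n):
        for j in range(m):
            if rows[i][j] != letter:
                continue
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                a, b = i + di, j + dj
                if not (0 <= a < n and 0 <= b < m):
                    seen["#"] += 1
                elif rows[a][b] != letter:
                    seen[rows[a][b]] += 1
    return seen


if __name__ == "__main__":
    layout = """
    mmpp
    mrrp
    mrrp
    mmpp
    """
    print(module_neighbours(parse_placement(layout), "r"))
```

In this toy layout the 2 × 2 register file touches memory cells on four sides and processing cells on four sides, and never the chip edge.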
This set of experiments studies the effect of placement by selecting four different positions for the register file in the system layout: close to the processing units (position 1), close to the memory devices (position 2), near the border of the chip (position 3) and surrounded by cooler devices (position 4).
For each of these placements, the thermal map of the register file is obtained assuming a homogeneous access pattern. Figure 6 shows these thermal maps.
As can be seen, when the register file is surrounded by hot devices such as memories (position 2) or is close to the processing units (position 1), the temperature of the nearby registers increases because of thermal diffusion, and temperature gradients can appear between the hotter and cooler devices, which may damage the silicon.
On the other hand, when the register file is placed near the border of the chip or is surrounded by cooler devices (positions 3 and 4), the heat can be transferred out of the system and the temperature does not increase. Moreover, the temperature map of the register file is homogeneous and the thermal gradients are reduced.
Finally, the last set of experiments analyzes the effect of the access pattern on the temperature of the register file. This analysis makes it possible to define temperature-aware access policies that reduce the temperature of the device as well as its power consumption [15]. These experiments have been performed for every placement in the layout (positions 1, 2, 3 and 4) and three different access patterns: registers accessed from a bank placed on the right-hand side of the register file, accessed registers randomly placed in several spots of the device, and registers accessed in a homogeneous manner like a chess board.
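To make the three access patterns concrete, the sketch below builds the set of accessed cells for each of them on an N × M register file. Interpreting the chess board pattern as the cells with (i + j) even is an assumption, and the count of adjacent accessed pairs is only a crude, hypothetical proxy for hot-spot formation, not the thermal model of Section 3.

```python
import random


def bank_pattern(n, m, bank_cols):
    """Registers accessed from a bank on the right-hand side of the file."""
    return {(i, j) for i in range(n) for j in range(m - bank_cols, m)}


def random_spots_pattern(n, m, spots, radius, seed=0):
    """Registers accessed around a few randomly placed spots."""
    rng = random.Random(seed)
    cells = set()
    for _ in range(spots):
        ci, cj = rng.randrange(n), rng.randrange(m)
        for i in range(max(0, ci - radius), min(n, ci + radius + 1)):
            for j in range(max(0, cj - radius), min(m, cj + radius + 1)):
                cells.add((i, j))
    return cells


def chessboard_pattern(n, m):
    """Registers accessed on the 'black squares' of a chess board."""
    return {(i, j) for i in range(n) for j in range(m) if (i + j) % 2 == 0}


def adjacent_pairs(cells):
    """Number of horizontally or vertically adjacent pairs of accessed cells."""
    vertical = sum((i + 1, j) in cells for i, j in cells)
    horizontal = sum((i, j + 1) in cells for i, j in cells)
    return vertical + horizontal


if __name__ == "__main__":
    for name, cells in (("bank", bank_pattern(8, 8, 4)),
                        ("random", random_spots_pattern(8, 8, 3, 1)),
                        ("chess", chessboard_pattern(8, 8))):
        print(name, len(cells), adjacent_pairs(cells))
```

On an 8 × 8 file, a four-column bank and the chess board pattern both touch 32 registers, but the bank has 52 adjacent accessed pairs while the chess board has none, so its accesses are spread over a larger surface.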
Figures 7, 8 and 9 show the results for the three access patterns when the register file is placed in position 4.
Figure 7 shows the evolution in time of the thermal map of the register file when the registers are accessed from a bank located on the right-hand side of the device. As can be seen, the bank from which the registers are accessed becomes increasingly hot as time advances. Eventually, a large hot spot appears in the register file, which can severely damage the device.
Figure 8 shows the evolution in time of the thermal map of the register file when the registers are randomly accessed from several spots in the device. As can be seen, the spots from which the registers are accessed become increasingly hot as time advances. Eventually, several hot spots appear on the register file surface, increasing the probability of chip damage. Therefore, an access pattern that homogenizes the thermal map on the silicon surface must be found.
Figure 9 shows the evolution in time of the thermal map of the register file when the registers are accessed in a “chess board” manner. As can be seen, this access pattern homogenizes the temperature on the silicon because the accesses are distributed across a larger surface. Moreover, the probability of hot spots is minimized, so the reliability of the system is not compromised. Therefore, this access policy can be considered an effective way to optimize the thermal behavior of the shared register file at the compilation stage.
5 Experimental Section
Figure 10 shows the evolution in time of the temperature on the surface of the register file when running the application ADPCM DECODE. For this setup, the register file has been implemented as a square array of 64 registers.
The plots in the figure were captured every 5000 simulation cycles, and the benchmark was run until completion. As can be seen, the register access profile makes some registers more heavily used than others, creating a temperature difference between registers (a thermal gradient) that can damage the device. In this example, repeated accesses to a few registers raise their temperature above that of the rest of the register file.
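Identifying the overused registers from an access profile is straightforward. The sketch below assumes a hypothetical trace given as a list of register indices, one entry per access, and flags the registers whose access count exceeds a chosen multiple of the mean; the default threshold of 2 is arbitrary.

```python
from collections import Counter


def overused_registers(trace, num_regs, factor=2.0):
    """Return the registers accessed more than `factor` times the mean.

    trace    -- sequence of register indices, one entry per access (hypothetical format)
    num_regs -- number of registers in the file, e.g. 64 for an 8 x 8 array
    """
    if num_regs <= 0:
        raise ValueError("num_regs must be positive")
    counts = Counter(trace)
    mean = len(trace) / num_regs
    return sorted(r for r, c in counts.items() if c > factor * mean)


if __name__ == "__main__":
    # Illustrative trace: every register touched once, r0 and r1 hammered.
    trace = list(range(64)) + [0] * 10 + [1] * 10
    print(overused_registers(trace, 64))
```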
In order to evaluate the proposed mechanisms to reduce thermal breakdown, the register assignment performed by the compiler is modified so that the registers of the register file are assigned following a “chess manner” policy.
Figure 11 shows the new evolution in time of the temperature in the register file.
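The compiler side of the chess policy can be sketched as an ordering of physical registers. This is an assumed realization, not the authors' implementation: black squares ((i + j) even) are handed out first in row-major order, then white squares, so as long as at most half of the registers are assigned, no two of them are horizontal or vertical neighbours. A real allocator would also account for live ranges and spilling, which are not modelled here.

```python
def chess_assignment_order(n_rows, m_cols):
    """Physical register indices (row-major numbering) in chess-board order."""
    black = [i * m_cols + j for i in range(n_rows) for j in range(m_cols) if (i + j) % 2 == 0]
    white = [i * m_cols + j for i in range(n_rows) for j in range(m_cols) if (i + j) % 2 == 1]
    return black + white


def assign_registers(virtual_regs, n_rows, m_cols):
    """Map virtual registers, in allocation order, to physical registers."""
    order = chess_assignment_order(n_rows, m_cols)
    if len(virtual_regs) > len(order):
        raise ValueError("more virtual registers than physical ones (spilling not modelled)")
    return dict(zip(virtual_regs, order))


if __name__ == "__main__":
    print(assign_registers(["v0", "v1", "v2", "v3", "v4"], 8, 8))
```

For an 8 × 8 file the first row contributes physical registers 0, 2, 4 and 6, and the fifth allocation moves to register 9 on the next row.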
As can be seen, once the register assignment policy has been modified in this way, the thermal gradients disappear as the temperature is homogeneously distributed across the registers. The graph shows how the temperature of the register file slowly increases with the number of accesses, but the compiler modification avoids overused registers and, therefore, overheated active areas.
The same approach has been followed to mitigate the effect of thermal gradients when running an image processing algorithm (JPEG2000). The data in the plots were also captured every 5000 simulation cycles. As can be seen in Figure 12, thermal gradients appear as a consequence of the uneven register usage. Also, placing the register file close to the hottest units in the system (the cache memories) increases the temperature of the device through thermal diffusion.
This example has been optimized by modifying the register assignment phase as described previously to reduce the temperature differences among the registers. Additionally, the register file has been placed close to the border of the chip in order to avoid heat transfer from the hot cache memories and to reduce the average temperature.
Figure 13 shows the evolution in time of the register file temperature when the register assignment policy has been modified to perform a “chess based” assignment and the register file has been moved to the border area. These graphs show how the temperature is homogenized across every register in the register file, and the final temperature of the device is also reduced.
[Figure panels: (a) first 5000 simulation cycles, (b) second 5000 simulation cycles, (c) third 5000 simulation cycles, (d) fourth 5000 simulation cycles, (e) last simulation cycles.]
6 Conclusions
The thermal behavior of the hardware modules that make up the processor architecture must be controlled at different abstraction levels. For that purpose, accurate and flexible estimation mechanisms are required.
This paper has presented an analytical model to evaluate the thermal map of the register file, one of the hottest modules in VLIW processor architectures. This model has been used to evaluate the effect of several high-level transformations on the register file placement, topology and register access pattern.
[Figure panels: (a) first 5000 simulation cycles, (b) second 5000 simulation cycles, (c) third 5000 simulation cycles, (d) fourth 5000 simulation cycles, (e) last simulation cycles.]
Finally, the presented model has proven to be an effective mechanism for obtaining the thermal map of the register file and has made it possible to propose a register access policy that improves the thermal behavior.
Acknowledgements
This work is partially supported by the Swiss FNS Research Grant 20021-109450/1, and the Spanish Government Research Grants TIN2005-05619 and TIC2003-07036.
[Figure panels: (a) first 5000 simulation cycles, (b) second 5000 simulation cycles, (c) third 5000 simulation cycles, (d) last simulation cycles.]
References
[1] D. Brooks and M. Martonosi, “Dynamic thermal management for high-performance microprocessors,” in HPCA, 2001.
[2] J. Donald and M. Martonosi, “Temperature-Aware Design Issues for SMT and CMP Architectures,” in Workshop on Complexity-Effective Design, 2004.
[3] http://www.hpl.hp.com/research/dca/smart cooling/.
[4] H. Su, F. Liu, A. Devgan, E. Acar, and S. Nassif, “Full leakage estimation considering power supply and temperature variations,” in ISLPED, 2003.
[5] P. Li, L. Pileggi, M. Ashegi, and R. Chandra, “Efficient full-chip thermal modeling and analysis,” in ICCAD, 2004.
[6] W. Huang, E. Humenay, K. Skadron, and M. Stan, “The need for a full-chip and package thermal model for thermally optimized IC designs,” in ISLPED, 2005.
[7] W. Huang, M. Stan, and K. Skadron, “Parameterized physical compact thermal modeling,” IEEE Trans. on Component Packaging and Manufacturing Technology, vol. 28, no. 4, pp. 615–622, December 2005.
[Figure panels: (a) first 5000 simulation cycles, (b) second 5000 simulation cycles, (c) third 5000 simulation cycles, (d) last simulation cycles.]
[8] S. Lopez-Buedo, J. Garrido, and E. I. Boemo, “Dynamically inserting, operating, and eliminating thermal sensors of FPGA-based systems,” IEEE Trans. on Components and Packaging Technologies, vol. 25, no. 4, pp. 561–566, December 2002.
[9] N. Julien, J. Laurent, E. Senn, and E. Martin, “Power consumption modeling and characterization of the TI C6201,” IEEE Micro, vol. 23, no. 5, pp. 40–49, September 2003.
[10] E. Senn, J. Laurent, N. Julien, and E. Martin, “Softexplorer: Estimation, characterization, and optimization of the power and energy consumption at the algorithmic level,” in International Workshop on Power and Timing Modeling, Optimization and Simulation, 2004.
[11] J. L. Ayala, C. Méndez, and M. López-Vallejo, “Analysis of the Thermal Impact of Source-Code Transformations in Embedded Processors,” in ICECS, 2006.
[12] C. Méndez, J. L. Ayala, and M. López-Vallejo, “Target Independent Thermal Modeling for Embedded Processors,” in IES, 2006.
[13] G. Paci, P. Marchal, F. Poletti, and L. Benini, “Exploring Temperature Aware Design in Low-Power MPSoCs,” in DATE, 2006.
[14] D. Atienza, P. G. D. Valle, G. Paci, and F. Poletti, “A Fast HWSW FPGA-Based Thermal Emulation Framework for Multi-Processor System-on-Chip,” in DAC, 2006.
[15] D. Atienza, P. Raghavan, J. L. Ayala, G. de Micheli, F. Catthoor, D. Verkest, and M. López-Vallejo, “Compiler-driven leakage energy reduction in banked register files,” in International Workshop on Power and Timing Modeling, Optimization and Simulation, 2006.
