Hotspot Prevention Through Runtime Reconfiguration in Network-On-Chip ∗
G. M. Link, N. Vijaykrishnan
The Pennsylvania State University, University Park, PA, 16802
{link,vijay}@cse.psu.edu
Abstract
Many existing thermal management techniques focus on reducing the overall power consumption of the chip and do not address location-specific temperature problems, referred to as hotspots. We propose the use of dynamic runtime reconfiguration to periodically shift the hotspot-inducing computation and make the thermal profile more uniform. Our analysis shows that dynamic reconfiguration is an effective technique for reducing hotspots in NoCs.
1. Introduction
Technology scaling and the quest for higher performance have caused the power density of chips to reach alarming levels. Temperature profiles are spatially non-uniform across the chip and vary with the power dissipation characteristics of the individual chip components, creating localized hotspots [1] that can lead to failures. These hotspots are not effectively addressed by techniques that minimize the overall power consumption of the chip. Thermal solutions employed in current commercial processors, such as dynamic clock disabling and dynamic frequency scaling, stop or slow down the entire chip for brief periods of time. Instead of shutting down or slowing down the entire chip, recent proposals have focused on migrating the workload from a hot component to a cooler spare until the temperature drops [2]. However, spare units add hardware redundancy and consume additional area. In contrast to using spares, our work explores dynamic reconfiguration as a mechanism for addressing localized hotspots. We periodically modify the stored configuration data to migrate functionality across the chip and thereby balance the temperature profile. In essence, we perform a spatial remapping of the different functionalities at runtime. Our solution can be employed in programmable embedded architectures, such as Network-on-Chip (NoC) designs and field-programmable devices, which are increasingly used in a wide range of applications.
We evaluate our technique on a Low Density Parity Check (LDPC) decoder implemented on a NoC with a thermally-aware static mapping. We explore issues such as the frequency of functional migration, the choice of remap function, techniques for modifying the configuration stream, the performance impact of the reconfigured mappings, and heating patterns that particular migration schemes can and cannot handle effectively. Our experimental investigation reveals that dynamic reconfiguration successfully balances the thermal profile and reduces the peak temperature by up to 8 °C (from 85 °C), even when starting from a thermally optimized static mapping.
∗ This work was supported by a MARCO/DARPA PAS grant, as well as
NSF 0093086.
2. Experimental Platform
Our experimental platform is based on the HotSpot thermal library [1]. The HotSpot tool was used with all settings at their default values and with an ambient temperature of 40 °C.
Our floorplans were taken directly from the layouts of our sample chips. Two test chips implementing LDPC decoding [3] were synthesized, placed, and routed using a commercial 160 nm standard cell library. Each functional unit has an area of 4.36 mm². The power consumption of each unit was determined using Synopsys Power Compiler. A modified cycle-accurate NoC simulator was then run with an encoded message to obtain switching rates for the components in the chip during operation. Note that our simulations also include the energy consumed during the migration operation, so that the utility of the proposed method is evaluated more accurately.
To more fully explore the design space for reconfiguration, we evaluate multiple configurations of our test chips. In particular, the 4x4 chip is evaluated with two different configurations (referred to as A and B), while the 5x5 chip is evaluated with three different configurations (C, D, E).
Differences in thermal profiles and power consumption between the configurations are due to the irregularity of the communication patterns and the amount of computation mapped to a single processing element (PE). In every configuration, the workload was mapped onto PEs using a thermally-aware placement algorithm that minimizes the peak temperature. Using such a thermally-aware mapping evaluates our method under a worst-case scenario, one in which power and heat distribution have already been equalized at design time.
2.1. Techniques for Implementing Remapping
Because maintaining additional configuration registers on chip for remapping purposes would significantly increase chip area, we propose transforming the existing configuration information at runtime to generate new placements. In this scheme, the operation of the PEs is halted, the configuration and state information of each PE is passed through a conversion unit, and the result is sent across the network to the destination PE.
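The halt-convert-send sequence above can be sketched in Python as follows. This is a minimal software model, not the authors' hardware: the `PEContents` record, the `remap` helper, and the dictionary-based chip representation are hypothetical, and X mirroring is used only as an example conversion function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Coord = Tuple[int, int]  # (X, Y) position of a PE in the array


@dataclass(frozen=True)
class PEContents:
    """Hypothetical record of what one PE holds: its configuration and state."""
    config: str  # identifies the workload the PE is configured to run
    state: int   # stand-in for in-flight state that must move with the workload


def remap(chip: Dict[Coord, PEContents],
          conversion_unit: Callable[[Coord], Coord]) -> Dict[Coord, PEContents]:
    """Model one remapping step on a halted array.

    Every PE's contents pass through the conversion unit, which computes the
    destination PE from the current one, and are delivered to that PE.
    Raises ValueError if a workload lands off-chip or two workloads collide.
    """
    new_chip: Dict[Coord, PEContents] = {}
    for src, contents in chip.items():
        dst = conversion_unit(src)
        if dst not in chip:
            raise ValueError(f"workload at {src} mapped off-chip to {dst}")
        if dst in new_chip:
            raise ValueError(f"two workloads mapped to {dst}")
        new_chip[dst] = contents
    return new_chip


# Usage: a 2x2 array remapped with X mirroring, (X, Y) -> (N-1-X, Y).
N = 2
chip = {(x, y): PEContents(config=f"w{x}{y}", state=0)
        for x in range(N) for y in range(N)}
after = remap(chip, lambda c: (N - 1 - c[0], c[1]))
print(after[(1, 0)].config)  # w00
```

Checking for off-chip destinations and collisions mirrors the requirement that a valid migration function be a one-to-one remapping of the array.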
2.2. Migration Schemes
Our approach is based on a logical model that ensures two properties: the new position of each workload can be determined algebraically from its current position, and the workloads retain the same positions relative to one another, which makes the impact on network traffic patterns much more predictable. Intuitively, there is only a limited number of ways to place the logical functionality onto the chip under these constraints. If we abstract the relative-positioning requirement into a plane on which all workloads are statically placed, every possible migration must operate on the plane as a whole rather than on individual workloads. In practice, there are three ways of adjusting such a plane that, when combined, define all possible operations: rotation, mirroring, and translational shifting. We consider each of these three operations separately as a migration function. During the migration operation, congestion-free packet movement can be ensured by transforming groups of PEs in phases. This congestion-free operation yields deterministic migration times, making our technique applicable to real-time systems.

Proceedings of the Design, Automation and Test in Europe Conference and Exhibition (DATE’05)
1530-1591/05 $ 20.00 IEEE

Function        New X Coordinate   New Y Coordinate
Rotation        N-1-Y              X
X Mirroring     N-1-X              Y
X Translation   X + Offset         Y
Table 1. Transformation Functions (N is the number of PEs along each side of the N x N array)
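The three transformation functions in Table 1 are simple enough to check exhaustively in software. The Python sketch below implements them for an N x N array and tests two properties the migration model relies on: that each function is a one-to-one remapping of the array, and whether it preserves the hop (Manhattan) distance between every pair of workloads. Wrapping the translation modulo N is our assumption; Table 1 writes only X + Offset, and without a wrap the workloads at the edge would be shifted off the chip.

```python
from itertools import product


def rotate(x, y, n):
    """Table 1 rotation: (X, Y) -> (N-1-Y, X), a 90-degree turn of the array."""
    return n - 1 - y, x


def mirror_x(x, y, n):
    """Table 1 X mirroring: (X, Y) -> (N-1-X, Y)."""
    return n - 1 - x, y


def translate_x(x, y, n, offset=1):
    """Table 1 X translation: (X, Y) -> (X + Offset, Y).

    Assumption: the new X wraps modulo N so every workload stays on the chip.
    """
    return (x + offset) % n, y


def cells(n):
    """All (X, Y) positions of an n x n PE array."""
    return list(product(range(n), repeat=2))


def is_permutation(f, n):
    """True if f maps the array onto itself one-to-one (no collisions)."""
    return {f(x, y, n) for x, y in cells(n)} == set(cells(n))


def preserves_hop_distance(f, n):
    """True if every pair of PEs keeps the same Manhattan distance after f."""
    for (a, b), (c, d) in product(cells(n), repeat=2):
        (p, q), (r, s) = f(a, b, n), f(c, d, n)
        if abs(a - c) + abs(b - d) != abs(p - r) + abs(q - s):
            return False
    return True


for name, f in [("rotation", rotate), ("x-mirror", mirror_x),
                ("x-translation", translate_x)]:
    print(name, is_permutation(f, 4), preserves_hop_distance(f, 4))
# rotation True True
# x-mirror True True
# x-translation True False
```

Under these assumptions, rotation and mirroring preserve every pairwise distance on the mesh, while the wrapped translation preserves distances only on a torus: the workload that wraps from the last column to the first is no longer adjacent to its former neighbor.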
2.3. Implementation of the Migration Functions
All of the proposed migration functions are mathematically simple and require little hardware to implement. Each migration function takes as input the current X, Y location of a workload and produces as output its new X, Y destination. As such, only 3-bit operands (one per coordinate) are required to address up to 64 PEs (an 8x8 array), resulting in fast operation. The transformation functions are extremely simple when expressed in {X, Y} form and are shown in Table 1. Choosing such simple functions allows the migration unit to remain small, fast, and low power. More importantly, the simplicity and predictability of the migration functions allow a simplified I/O interface to the outside of the chip: the destination address of every incoming packet and the source address of every outgoing packet are transformed. By including a migration unit at the I/O interface, the migration operation is made completely transparent to the outside world. Finally, the same migration unit can perform all of the migration functions presented with only minor changes to its arithmetic, allowing the migration function to be altered dynamically at runtime.
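The I/O translation described above can be sketched as follows, assuming (hypothetically) that a PE address packs the two 3-bit coordinates into one 6-bit field and that only the Table 1 rotation is in use. The unit counts the rotations applied since the original placement. Incoming destinations are mapped from logical to physical positions by applying that many rotations, and outgoing sources are mapped back by applying the remaining rotations up to four, since four rotations return every PE to its starting point. Supporting mirroring or translation would mean swapping the arithmetic in `_rotate`; the class and method names are illustrative only.

```python
BITS = 3                  # 3 bits per coordinate: up to 8 x 8 = 64 PEs
MASK = (1 << BITS) - 1


def pack(x, y):
    """Hypothetical 6-bit PE address: X in the upper 3 bits, Y in the lower 3."""
    return (x << BITS) | y


def unpack(addr):
    """Split a 6-bit PE address back into its (X, Y) coordinates."""
    return addr >> BITS, addr & MASK


class MigrationUnit:
    """Sketch of an I/O-side migration unit for an n x n array (rotation only)."""

    def __init__(self, n):
        if not 1 <= n <= 1 << BITS:
            raise ValueError("array side must fit in 3-bit coordinates")
        self.n = n
        self.rotations = 0  # rotations applied since the original placement, mod 4

    def migrate(self):
        """Record that the array has just been rotated once more."""
        self.rotations = (self.rotations + 1) % 4

    def _rotate(self, x, y, times):
        """Apply the Table 1 rotation (X, Y) -> (N-1-Y, X) `times` times."""
        for _ in range(times):
            x, y = self.n - 1 - y, x
        return x, y

    def inbound(self, dest_addr):
        """Incoming packet: logical destination -> PE currently running it."""
        return pack(*self._rotate(*unpack(dest_addr), self.rotations))

    def outbound(self, src_addr):
        """Outgoing packet: physical source -> logical address seen off-chip."""
        return pack(*self._rotate(*unpack(src_addr), (4 - self.rotations) % 4))


unit = MigrationUnit(4)
unit.migrate()                            # one rotation has been performed
print(unpack(unit.inbound(pack(0, 0))))   # (3, 0)
print(unpack(unit.outbound(pack(3, 0))))  # (0, 0)
```

Because the outside world only ever sees logical addresses, off-chip senders and receivers need no knowledge of how many migrations have occurred.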
3. Results
We begin our analysis by comparing how effectively the different migration functions reduce the peak temperature of the test chips. Figure 1 shows the reduction in peak temperature for each circuit configuration under each migration technique. For circuit configurations A and B, the rotational and X-Y mirroring migrations reduce the peak temperature the most, while for the larger configurations, translation is more effective. This difference in efficacy is due, in large part, to the even dimensionality (4x4 array) of test cases A and B, as opposed to the odd dimensionality (5x5 array) of test cases C, D, and E. In the odd-dimensioned test cases, both the rotational and mirroring migration functions map the central PE onto itself, and as such, they are unable to move the heat generated at the center of the device.

Figure 1. Reduction in Peak Temps (degrees C) for the Rot, X Mirror, X-Y Mirror, Right Shift, and X-Y Shift migrations, by circuit configuration and base temperature: A (85.44), B (84.05), C (75.17), D (72.8), E (75.98).

The poor behavior of the right shift is due to the relative power output of the rows in the various test cases. In all test cases, one of the rows had a significantly higher power output than the remaining rows, generating a warm band. Because a right shift moves each workload only along the X axis, every workload stays in its original row, so right shifting alone cannot distribute this band. While such a warm band might seem to skew our results, note that a thermally-aware placement algorithm was used to generate our initial test cases; it is therefore reasonable to assume that such characteristics would be even more common in non-thermally-aware placements.
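Both effects discussed above follow directly from the Table 1 arithmetic, which this Python sketch makes concrete. It lists the PEs that each function maps onto themselves (and so never relieves) for 4x4 and 5x5 arrays, and it shows that a right shift leaves each row's total power unchanged while rotation spreads a hot row across all rows. The X-Y mirror form (N-1-X, N-1-Y), the wraparound on the shift, and the per-PE power values are assumptions chosen for illustration, not data from our test chips.

```python
from itertools import product


def rotate(x, y, n):      # Table 1 rotation
    return n - 1 - y, x


def mirror_x(x, y, n):    # Table 1 X mirroring
    return n - 1 - x, y


def mirror_xy(x, y, n):   # assumed X-Y mirroring: mirror both coordinates
    return n - 1 - x, n - 1 - y


def right_shift(x, y, n, offset=1):  # Table 1 X translation, assumed to wrap
    return (x + offset) % n, y


def fixed_points(f, n):
    """PEs that f maps onto themselves: their heat is never moved away."""
    return [(x, y) for x, y in product(range(n), repeat=2) if f(x, y, n) == (x, y)]


def row_power(power, f, n):
    """Total power per row (index Y) after every workload is moved by f."""
    rows = [0.0] * n
    for (x, y), p in power.items():
        rows[f(x, y, n)[1]] += p
    return rows


for n in (4, 5):
    for name, f in [("rotation", rotate), ("x-mirror", mirror_x),
                    ("x-y mirror", mirror_xy)]:
        print(n, name, fixed_points(f, n))

# Hypothetical 5x5 power map (arbitrary units): row Y = 2 is a warm band.
power = {(x, y): (2.0 if y == 2 else 1.0) for x, y in product(range(5), repeat=2)}
print(row_power(power, right_shift, 5))  # [5.0, 5.0, 10.0, 5.0, 5.0]
print(row_power(power, rotate, 5))       # [6.0, 6.0, 6.0, 6.0, 6.0]
```

For the 4x4 array no PE is fixed by any of the three functions. For the 5x5 array, rotation and X-Y mirroring fix only the center PE (2, 2), while X mirroring fixes the entire middle column.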
Among the configurations tested, X-Y shifting achieves the highest average temperature reduction, 4.62 °C. Rotational migration achieves the second highest average reduction, 4.15 °C, but actually results in a higher peak temperature for configuration E, for two reasons. The first reason, common to both the rotational and mirroring migrations, is that the hotspots in configuration E are near the center of the chip, where those functions are least effective at migrating workload away from the hotspot. Second, rotational migration has the largest energy penalty for performing reconfiguration, resulting in an increase in average chip temperature of 0.3 °C. All of the above simulations were performed with a migration period of 109 microseconds, resulting in an overall throughput reduction of 1.6%. More frequent migration results in a more even temperature distribution, but at the cost of performance. For a reconfiguration period of 437.2 microseconds, the overall performance penalty drops to less than 0.4%, and the peak temperatures rise by less than 0.1 °C during the additional time between migrations. Further, the period between reconfigurations can be increased to 874.4 microseconds, reducing the throughput penalty to less than 0.2% without significant impact on peak temperature. We note that the reconfiguration periods were chosen to coincide with the completion of decoding of LDPC message blocks, minimizing the amount of state information that must be transferred between PEs. The overhead for this state information was included in our simulations.
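The reported throughput penalties are consistent with a simple model in which every migration stalls the array for a fixed time, so the penalty scales as stall time divided by migration period. The Python sketch below infers that stall time from the 109-microsecond, 1.6% data point and predicts the other two periods; the fixed-stall model and the inferred stall time are our assumptions, not separately measured quantities.

```python
BASE_PERIOD_US = 109.0   # migration period used for Figure 1
BASE_PENALTY = 0.016     # reported throughput reduction at that period

# Assumption: each migration costs a fixed stall, independent of the period.
STALL_US = BASE_PENALTY * BASE_PERIOD_US   # about 1.74 microseconds


def throughput_penalty(period_us, stall_us=STALL_US):
    """Fraction of time spent migrating, approximated as stall / period."""
    return stall_us / period_us


for period in (109.0, 437.2, 874.4):
    print(f"{period:6.1f} us -> {100 * throughput_penalty(period):.3f}%")
```

The model gives about 0.399% at 437.2 microseconds and 0.199% at 874.4 microseconds, in line with the reported bounds of less than 0.4% and less than 0.2%.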
4. Conclusions and Future Work
In this paper, we propose the use of dynamic runtime reconfiguration to periodically shift hotspot-inducing computations and make the thermal profile more uniform. Different approaches to reconfiguration are proposed and evaluated using a target Network-on-Chip, designed in 160 nm technology, that implements an LDPC decoder. Our analysis shows that dynamic reconfiguration reduces the peak temperature by up to 8 °C relative to a thermally optimized static placement. Our work demonstrates that hotspots can be a problem even in homogeneous architectures, such as NoCs and FPGAs, and that workload migration is an effective technique for reducing hotspots, even when thermally-aware placement is used at design time.
References
[1] K. Skadron et al. HotSpot: Techniques for Modeling Thermal Effects at the Processor Architecture Level. International Workshop on Thermal Investigations of ICs and Systems (THERMINIC), pages 169–172, Oct. 2002.
[2] M. Kanellos. At Intel-The Chip with Two Brains. CNET, http://www.news.com/, August 2002.
[3] T. Theocharides, G. M. Link, N. Vijaykrishnan, and M. J. Irwin. Implementing LDPC Decoder on Network-on-Chip. ISLVSI, January 2005.
