Evaluation of temperature-performance trade-offs in wireless network-on-chip architectures by Nerurkar, Nishad
Rochester Institute of Technology 
RIT Scholar Works 
Theses 
5-1-2013 
Evaluation of temperature-performance trade-offs in wireless 
network-on-chip architectures 
Nishad Nerurkar 
Recommended Citation 
Nerurkar, Nishad, "Evaluation of temperature-performance trade-offs in wireless network-on-chip 
architectures" (2013). Thesis. Rochester Institute of Technology. Accessed from 
This Thesis is brought to you for free and open access by RIT Scholar Works. It has been accepted for inclusion in 





Evaluation of Temperature-Performance Trade-offs in Wireless Network-on-Chip Architectures





A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of 
Master of Science in Computer Engineering 
 
Supervised by 
Dr. Amlan Ganguly 
Department of Computer Engineering 
Kate Gleason College of Engineering 









Dr. Amlan Ganguly       Date: 
Primary Advisor – R.I.T. Dept. of Computer Engineering 
 
______________________________________________________________________________ 
Dr. Shanchieh Jay Yang      Date: 
Secondary Advisor – R.I.T. Dept. of Computer Engineering 
 
______________________________________________________________________________ 
Dr. Sonia Lopez Alarcon      Date: 











Dedicated to my parents, Dr. Shekhar Nerurkar  






Foremost, I would like to express my sincere gratitude to my primary thesis advisor Dr. 
Amlan Ganguly for his patience, motivation and constant support and guidance that he extended 
throughout the duration of this work. Without his valuable suggestions and useful comments, this 
work would not have been possible. My sincere thanks to Dr. Shanchieh Jay Yang and Dr. Sonia 
Lopez Alarcon, whose valuable comments and suggestions made a significant impact on this 
thesis. I would also like to thank Dr. Sonia Lopez Alarcon for the knowledge gained during her 
High Performance Architecture course that has been helpful for certain concepts related to this 
thesis work. Lastly, I would also like to thank my family and friends for their encouragement and 






Continued scaling of device geometries according to Moore’s Law is enabling complete 
end-user systems on a single chip. Massive multicore processors are enablers for many 
information and communication technology (ICT) innovations spanning various domains, 
including healthcare, defense, and entertainment. In the design of high-performance massive 
multicore chips, power and heat are dominant constraints. Temperature hotspots witnessed in 
multicore systems exacerbate the problem of reliability in deep submicron technologies. Hence, 
there is a great need to explore holistic power and thermal optimization and management 
strategies for massive multicore chips. High power consumption not only raises chip
temperature and cooling cost, but also decreases chip reliability and performance. Thus, 
addressing thermal concerns at different stages of the design and operation is critical to the 
success of future generation systems.  
The performance of a multicore chip is also influenced by its overall communication 
infrastructure, which is predominantly a Network-on-Chip (NoC). The existing method of 
implementing a NoC with planar metal interconnects is deficient due to high latency, significant 
power consumption, and temperature hotspots arising out of long, multi-hop wireline links used 
in data exchange. On-chip wireless networks are envisioned as an enabling technology for designing low-power, high-bandwidth massive multicore architectures. However, optimizing wireless NoCs for best performance does not necessarily guarantee a thermally optimal interconnection architecture. Because they are highly efficient, the wireless links attract very high traffic densities, which in turn result in temperature hotspots. Therefore, while the wireless links result in better performance and energy-efficiency, they can also cause temperature hotspots and undermine the reliability of the system. Consequently, the location and utilization of the wireless links are important factors in the thermal optimization of high-performance wireless Networks-on-Chip.
Architectural innovation in conjunction with suitable power and thermal management strategies is key to designing high-performance yet energy-efficient massive multicore chips. This work explores various design methodologies for establishing wireless NoC architectures that achieve the best trade-offs between temperature, performance, and energy-efficiency. It further demonstrates that incorporating Dynamic Thermal Management (DTM) on a multicore chip designed with such temperature- and performance-optimized Wireless Network-on-Chip architectures improves the thermal profile while simultaneously providing lower






Acknowledgements
Abstract
Chapter 1 Introduction
1.1 Multicore Systems-on-Chip
1.2 The Network-on-Chip paradigm
1.3 Emerging Interconnection Networks
1.4 On-chip Wireless Interconnects
1.5 Temperature-Aware Multicore System-on-Chip
1.6 Thesis Contributions
Chapter 2 Related Work
Chapter 3 Temperature-Aware NoC Architectures
3.1 Generic Mesh
3.2 Small-World based NoC
3.3 Wireless NoC Architectures
3.3.1 Location based topologies
3.3.2 Optimization based topologies
3.4 Physical Layer Implementation
3.5 Flow Control and Routing Policy
3.6.1 Thermal Characteristics
3.6.2 Performance Characteristics
3.6.3 Performance Evaluation with Application-specific workloads
Chapter 4 Temperature-Aware Task Reallocation on Wireless NoC Enabled Multicore chips
4.1 Temperature-Aware Thread Reallocation
4.2 Performance-Aware Reallocation
4.3 Experimental Results
4.3.1 Thermal-Energy-Performance Tradeoffs
4.3.2 Thermal and Performance Characteristics of Network Elements
Chapter 5 Conclusions and Future Work






List of Figures 
Figure 1.1: Plot of Transistor Count with Time (Adapted from [1])
Figure 1.2: Plot of Gate and Wire Delays with Technology Scaling (reproduced from [2])
Figure 3.1: 64 Core Mesh Network-on-Chip
Figure 3.2: A generic naturally occurring small-world network
Figure 3.3: Average Hop Count L(p)/L(0) and Clustering Coefficient C(p)/C(0)
Figure 3.4: Location based wireless placement on Mesh (a) E-Mesh (b) C-Mesh
Figure 3.5: Simulation Work Flow
Figure 3.6: Peak and Mean temperatures seen on Switches and Links of proposed topologies for (a) 0.1 flits/core/cycle (b) 0.3 flits/core/cycle
Figure 3.7: Temperature Distribution for Links and Switches for Uniform Injection rates of 0.1 flit/core/cycle
Figure 3.8: Thermal Maps of NoC for (a) Mesh (b) E-Mesh (c) OSWNoC (d) OT-OSWNoC at Uniform Injection rates of 0.1 flit/core/cycle
Figure 3.9: Peak Bandwidth and Packet Energy and Total NoC Power Dissipation at for proposed topologies
Figure 3.10: (a) Average Packet Energy (b) Total Power Dissipation
Figure 3.11: Temperature Distribution on Links and Switches for
Figure 4.1: The adopted task scheduling heuristic
Figure 4.2: Modified Simulation Flowchart
Figure 4.3: Thermal histograms of cores showing temperature shift to the left (a) FFT (b)
Figure 4.4: (a) Network Latency (b) Average Energy per Packet (c) Total Network Power Dissipation in presence of all benchmarks and all architectures






List of Tables 
Table 3-1: NoC Architecture Space Summary
Table 3-2: Percentage of busy and idle cycles in a 64-core system given default problem sizes
Table 3-3: Peak temperatures on Switches for Benchmarks
Table 3-4: Peak temperatures on Links for Benchmarks
Table 4-1: Reduction in Core temperatures from random task allocation to temperature-aware task allocation
Table 4-2: Reduction in NoC Temperatures
Table 4-3: Reductions in network temperatures from Mesh to OT-OSWNoC and OP-OSWNoC





Chapter 1  Introduction 
Technology scaling according to Moore’s law keeps raising the level of integration, allowing billions of transistors to be integrated on a chip. Chip designers need to build computationally powerful processors to satisfy the computational needs of scientific and research fields such as weather forecasting, astrophysics, and bioinformatics, as well as the growing demand in consumer electronics for faster computers.
1.1 Multicore Systems-on-Chip 
The trend in the past few years shows a shift toward integrating multiple cores on a chip, as opposed to increasing the operating frequency of a uniprocessor system. Figure 1.1 shows the transistor count over time [1].
 
 




Traditional methods of increasing system performance by raising clock frequencies have hit a bottleneck due to the high levels of integration and the consequent increase in the power dissipation of the chip. Figure 1.2, taken from the 1997 Semiconductor Industry Association (SIA) roadmap [2], shows that while gate delays decrease with technology scaling, wire delays increase significantly. This increase in wire delays can be attributed to the high levels of integration on constant chip sizes. Long-distance interconnection wires between multiple cores on a highly integrated system-on-chip show higher delays because their greater lengths result in higher resistances and parasitic capacitances. This makes traditional bus-based interconnection fabrics such as the ARM AMBA [3] and the IBM CoreConnect [4] non-scalable as system sizes grow to hundreds of cores. The high levels of integration seen on multicore architectures and the consequent increase in delays for global communication have led chip designers to explore new global interconnection architectures, giving rise to the Network-on-Chip (NoC) paradigm.
  
 
Figure 1.2: Plot of Gate and Wire Delays with Technology Scaling (reproduced from [2])




1.2 The Network-on-Chip paradigm 
Integrating multiple cores on a single chip to take advantage of core-level parallelism increases the delay of inter-core communication because of the long wires connecting the cores. Global wires carry signals across the chip and typically do not scale in length with technology [5]. Although gate delays decrease with technology scaling, global wire delays grow rapidly (quadratically with length for an unrepeated RC wire) or, at best, linearly when repeaters are inserted to mitigate them. Even with repeater insertion, however, the wire delay may exceed one clock cycle. In ultra-deep submicron processes, interconnects account for 80% of the delay on critical paths [6] [7]. These delays can be mitigated using the Network-on-Chip paradigm [8].
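To illustrate how repeater insertion changes the length dependence of global wire delay, the following sketch uses the commonly cited distributed-RC approximation (delay ≈ 0.38·r·c·L², quadratic in the wire length L) and a simplified repeater model in which each repeater stage adds a fixed delay. The per-millimetre resistance and capacitance and the repeater delay (R_PER_MM, C_PER_MM, T_REPEATER) are illustrative assumptions, not values taken from this thesis or from [5]-[7].

```python
import math

# Illustrative wire parameters (assumed values, not from this thesis).
R_PER_MM = 50.0        # wire resistance, ohm/mm
C_PER_MM = 0.2e-12     # wire capacitance, F/mm
T_REPEATER = 20e-12    # fixed delay added by each repeater stage, s (simplification)

def unrepeated_delay(length_mm, r=R_PER_MM, c=C_PER_MM):
    """Distributed-RC delay 0.38*r*c*L^2: grows quadratically with length."""
    return 0.38 * r * c * length_mm ** 2

def repeated_delay(length_mm, r=R_PER_MM, c=C_PER_MM, t_rep=T_REPEATER):
    """Split the wire into k equal repeater-driven segments.

    Total delay k*t_rep + 0.38*r*c*L^2/k is minimised near
    k = L*sqrt(0.38*r*c/t_rep), which makes the delay grow linearly with L.
    """
    k_opt = length_mm * math.sqrt(0.38 * r * c / t_rep)
    k = max(1, round(k_opt))  # a whole number of segments
    return k * t_rep + 0.38 * r * c * length_mm ** 2 / k

for length in (5, 10, 20):
    print(f"{length:>2} mm: unrepeated {unrepeated_delay(length) * 1e12:7.1f} ps, "
          f"repeated {repeated_delay(length) * 1e12:6.1f} ps")
```

Doubling the length quadruples the unrepeated delay but only roughly doubles the repeated one; the repeated delay still grows with length, which is why long global wires can exceed a clock cycle.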
The Network-on-Chip paradigm [8] aims to mitigate global wire delays by designing separate, scalable interconnection fabrics to support high-speed communication between cores. The NoC decouples the computational cores from their interconnection and communication needs, providing a scalable plug-and-play network that is separate from the functionality of the cores. The interconnection network routes data packets via a series of network switches and links. Standard





1.3 Emerging Interconnection Networks 
Following the NoC paradigm, many different interconnection architectures have been investigated [8]. The traditional method of implementing interconnection architectures using planar metallic links is deficient due to the high latency and significant power consumption of the multi-hop communication used for data exchange between cores. Increased power consumption raises chip temperatures, which compromises chip reliability and performance and increases cooling costs [9].
The performance of conventional NoC architectures can be enhanced using a few radically different interconnection technologies that are currently being explored, such as 3D integration, photonic interconnects, and multi-band RF or wireless interconnects [10] [11]. These technologies are predicted to enable multicore NoC designs that improve the speed and energy dissipation of multi-hop communication. Three-dimensional interconnects integrate multiple active layers onto a single chip and consequently reduce both hop counts and the average wire length of a single hop. The performance advantages of three-dimensional interconnects come at the cost of higher temperatures, because the smaller footprint leads to higher power densities [8]. Fabrication of 3D interconnects has also proven challenging due to issues with inter-layer alignment, bonding, and inter-layer contact patterning [12], increased risks of manufacturing defects, and the need for new CAD tools to support 3D integration.
Another emerging interconnection fabric is photonic interconnects, which use optical links instead of metallic ones. Photonic interconnects have been predicted to considerably enhance bandwidth and reduce latency [13], with data transmissions occurring at




without the need for regeneration or buffering [14]. These emerging interconnects, however, are limited by the difficulty of manufacturing photonic devices and integrating them with silicon-compatible circuits under the constraints of area, power, and delay.
Multiband RF interconnects use wires as transmission lines to transfer data in the form of electromagnetic (EM) waves. The data is modulated onto a carrier using amplitude or phase shift keying [15]. With this approach, the bandwidth of conventional wires can be increased using multiple-access techniques, and the data travels as EM waves at the speed of light, giving low-latency transfer. Multiband RF interconnects are limited by the difficulty of designing the high-frequency on-chip oscillators and filters needed for the transceivers.
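As a minimal illustration of the phase shift keying mentioned above, the sketch below maps bits onto a sampled carrier with binary PSK and recovers them by coherent correlation with a reference carrier. The carrier frequency, bit time and sampling density are assumed values chosen so that each bit spans a whole number of carrier cycles; the sketch models neither the transmission line nor multiple access. Amplitude shift keying would instead vary the carrier amplitude per bit.

```python
import math

def bpsk_modulate(bits, carrier_hz, bit_time_s, samples_per_bit):
    """Binary PSK: bit 1 -> carrier phase 0, bit 0 -> carrier phase pi."""
    dt = bit_time_s / samples_per_bit
    signal = []
    for n, bit in enumerate(bits):
        phase = 0.0 if bit else math.pi
        for s in range(samples_per_bit):
            t = (n * samples_per_bit + s) * dt
            signal.append(math.cos(2 * math.pi * carrier_hz * t + phase))
    return signal

def bpsk_demodulate(signal, carrier_hz, bit_time_s, samples_per_bit):
    """Coherent detection: correlate each bit period with the reference carrier."""
    dt = bit_time_s / samples_per_bit
    bits = []
    for start in range(0, len(signal), samples_per_bit):
        corr = sum(signal[i] * math.cos(2 * math.pi * carrier_hz * i * dt)
                   for i in range(start, start + samples_per_bit))
        bits.append(1 if corr > 0 else 0)  # positive correlation means phase 0
    return bits

# Assumed link parameters: 10 GHz carrier, 1 ns per bit, 200 samples per bit.
data = [1, 0, 1, 1, 0, 0, 1, 0]
wave = bpsk_modulate(data, 10e9, 1e-9, 200)
print(bpsk_demodulate(wave, 10e9, 1e-9, 200))
```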
On-chip wireless links are a promising alternative that overcomes the performance limitations of long-distance wired links. On-chip wireless transceivers enable high-bandwidth, low-latency communication over long distances. Because they need no physical interconnect layout, wireless interconnects stand out from the other emerging interconnects.
1.4 On-chip Wireless Interconnects 
The manufacturing and fabrication difficulties of the emerging interconnect technologies discussed in the previous section keep them from being realized as efficient on-chip communication backbones, even though the limitations of planar metal interconnect-based NoCs must be addressed. On-chip wireless links are seen as a step in this direction. Over the past few years, there have been considerable efforts in the design and fabrication of on-chip miniature antennas operating in the range of tens of gigahertz to hundreds of terahertz, opening




of CMOS Ultra-Wide Band (UWB) technology [19], and the feasibility of designing an on-chip wireless communication network with miniature antennas and simple transceivers operating in the sub-THz range of 100-500 GHz has been demonstrated [20]. It is possible to increase the transmission frequencies to the THz range with nano-scale antennas based on carbon nanotubes (CNTs) [17]. Recent research has uncovered excellent emission and absorption characteristics leading to dipole-like radiation behavior in CNTs, making them promising candidates for on-chip antennas for wireless communication. Consequently, building an on-chip wireless interconnection network using THz frequencies for inter-core communication becomes feasible.
On-chip wireless links act as traffic attractors since they provide single-hop, long-distance shortcuts over the NoC. This can lead to excessive traffic flowing over the wireless links, causing high power dissipation on the wireless transceivers and localized temperature hotspots. Therefore, while the wireless links result in better performance and energy-efficiency, they can also cause temperature hotspots and undermine the reliability of the system. Consequently, the location and utilization of the wireless links are important factors, and naïve performance optimizations will not result in thermally optimal solutions. Hence, it is necessary to take into





1.5 Temperature-Aware Multicore System-on-Chip 
The high power densities of highly integrated systems-on-chip make it infeasible to design packages that can effectively dissipate the maximum heat [21]. Moreover, technology scaling is pushing the limits of affordable cooling, thereby requiring suitable design techniques to reduce peak temperatures. Temperature hotspots witnessed in multicore systems exacerbate the problem of reliability in deep submicron technologies. Thus, addressing thermal concerns at different stages of the design is critical to the success of future generation systems. In this context, Dynamic Thermal Management (DTM) [22] emerges as a solution to avoid high spatial and temporal thermal variations among cores, and thereby avoid local temperature hotspots. Recent works on DTM for multicore architectures focus on optimizing performance and exploring the design space in the presence of thermal constraints.
The performance of a multicore chip is also influenced by its overall communication 
infrastructure, which is predominantly a Network-on-Chip (NoC). The existing method of 
implementing a NoC with planar metal interconnects is deficient due to high latency, significant 
power consumption, and temperature hotspots arising out of long, multi-hop wireline links used 
in data exchange. It is possible to design high-performance, robust, and energy-efficient multicore chips by adopting novel architectures inspired by complex network theory in conjunction with on-chip wireless links. Networks with the small-world property have a very short average path length, making them particularly interesting for efficient communication with minimal resources. Designing a temperature-aware NoC architecture is a primary concern for mitigating poor reliability and degraded NoC performance in the presence of high chip temperatures




Using the small-world approach, a temperature- and performance-efficient NoC with both wired and wireless links can be designed, in which neighboring cores are connected through normal metal wires while widely separated cores communicate through long-range, single-hop wireless links. This work aims at designing a temperature-aware small-world based wired NoC architecture that uses wireless links to exploit the performance gains shown in [11] while also eliminating temperature hotspots. The proposed hybrid wireless NoC architecture is integrated





1.6 Thesis Contributions 
This work proposes a holistic approach to thermal management in multicore chips interconnected using wireless NoCs. The approach integrates thermally optimal architecture design with DTM techniques based on dynamic task reallocation. This thesis demonstrates the design of temperature-aware small-world based Network-on-Chip architectures with on-chip wireless links that improve both the temperature characteristics and the performance of the NoC. It further evaluates the temperature-energy-performance trade-offs involved in multicore system-on-chip designs using the proposed architectures and a suitable dynamic thermal management heuristic. The contributions made during this work are summarized below.
• Temperature-Aware Architecture-Space Exploration
o Evaluation of temperature-performance trade-offs for small-world based network topologies.
o Thermal-aware placement of wireless interconnects for the best temperature-performance trade-offs.
• Dynamic Thermal Management Heuristic
o Dynamic Thermal Management heuristic for multicore Systems-on-Chip employing the Network-on-Chip paradigm.
o Evaluation of the Thermal Management heuristic in the presence of the proposed Network-on-Chip architectures.
• Experimental Results
o Set-up of a simulation workflow involving Gem5, a full system simulator, and a cycle-accurate Network-on-Chip simulator to implement the proposed CNT-based wireless NoC architectures in the presence of different synthetic and real-time traffic.
o Development of a Dynamic Thermal Management heuristic based on chip temperature predictions and NoC architecture characteristics.
o Experimental results, obtained with the simulation workflow, for the proposed Network-on-Chip architectures integrated with the thermal management heuristic, with respect to the following parameters:
- Peak achievable bandwidth
- Packet energy dissipation
- Peak and mean temperatures on NoC elements such as switches and links
- Peak and mean temperatures of the processor cores
- Temperature distributions for the proposed NoC architectures
- Area and performance overhead of the thermal management heuristic
• Publications
o Jacob Murray, Paul Wettin, Partha Pande, Behrooz Shirazi, Nishad Nerurkar, Amlan Ganguly, “Evaluating Effects of Thermal Management in Wireless NoC-Enabled Multicore Architectures”, accepted, International Green-Computing Conference, 2013.
o Nishad Nerurkar, Amlan Ganguly, Aniket Mhatre, “Evaluating Temperature, Performance and Energy Trade-offs in Wireless NoC Architectures,” under




Chapter 2  Related Work 
Conventional NoCs use multi-hop, packet-switched communication. NoCs have been shown to perform better when long-range wired links are inserted following the principles of small-world graphs [23]. A comprehensive survey of various wireless NoC (WiNoC) architectures and their design principles is presented in [24]. Notable examples include the design of a WiNoC based on CMOS ultra-wideband (UWB) technology [19], the 2D concentrated mesh-based WCube architecture using sub-THz wireless links [20], and the inter-router wireless scalable express channel for NoC (iWISE) architecture [25]. The possibilities of creating novel architectures aided by on-chip wireless communication have been explored in [11] and [8]. These two works proposed hierarchical and hybrid WiNoC architectures using long-range wireless shortcuts. The whole system is partitioned into multiple small clusters of neighboring cores called subnets, and in the upper level of the network, the subnets are connected via wireline and wireless links. In both designs, the subnets are connected in a basic regular structure such as a mesh or a ring in the second level of the hierarchy, and long-range wireless shortcuts are placed on top of that structure. It has also been shown that a WiNoC whose network architecture follows power-law based small-world connectivity [26] is more robust in the presence of wireless link failures than its hierarchical counterpart [23]. Although there have been several investigations of the performance and associated design trade-offs of various WiNoC architectures, analysis of their thermal profiles has not received much attention. The work in [27] shows that the thermal profile of a multicore chip can be improved by incorporating dynamic voltage and frequency scaling (DVFS) in a WiNoC.
DTM [22] is a widely adopted technique to avoid thermal emergencies. Most DTM 




measure only when the temperature reaches a thermal limit. Predictive DTM methods [28] rely on thermal estimation techniques to predict future temperatures, allowing different thermal solutions to be explored well before the thermal threshold is reached. A comprehensive survey of the architectural, software, and algorithmic issues in energy-aware scheduling of workflows on single-core, multicore, and parallel architectures is presented in [29].
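The difference between reactive and predictive triggering can be sketched as follows. Linear extrapolation of the latest temperature slope is only a stand-in for the thermal estimation techniques of [28]; the function names, the temperature trace and the limit are hypothetical.

```python
def reactive_trigger(trace, t_limit):
    """Reactive DTM: act at the first sample that reaches the thermal limit."""
    for i, temp in enumerate(trace):
        if temp >= t_limit:
            return i
    return None

def predictive_trigger(trace, t_limit, horizon):
    """Predictive DTM (simplified): extrapolate the latest slope `horizon`
    samples ahead and act as soon as the forecast reaches the limit."""
    for i, temp in enumerate(trace):
        slope = temp - trace[i - 1] if i > 0 else 0.0
        if temp + slope * horizon >= t_limit:
            return i
    return None

# Hypothetical core temperature trace in degrees C, rising 2 C per sample.
trace = [60 + 2 * i for i in range(14)]
print(reactive_trigger(trace, 80), predictive_trigger(trace, 80, horizon=3))
```

The predictive trigger fires several samples before the limit is actually reached, which is the window predictive DTM uses to choose a thermal response.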
In the multicore era, runtime task migration is a popular technique for reducing peak temperature. Task migration redistributes existing processes among available cores based on the current thermal condition of the chip. Numerous migration schemes have been proposed in [30]. Although most of these techniques use a centralized scheme for monitoring the temperature and workload distribution, [31] proposes a distributed approach that performs task migration among neighboring cores. The drawback of such methods is the long latency incurred in migrating tasks from one core to another, which makes frequent migrations infeasible.
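A single centralized migration step might look like the sketch below. The function name, the threshold test and the "move to the coolest idle core" policy are illustrative assumptions rather than any specific scheme from [30] or [31], and the migration latency discussed above is not modeled.

```python
def migrate_hot_tasks(temps, assignment, threshold):
    """Move tasks off cores hotter than `threshold` onto the coolest idle cores.

    temps: core id -> temperature; assignment: core id -> task name or None (idle).
    Updates `assignment` in place and returns a list of (task, from_core, to_core).
    """
    idle = sorted((c for c, task in assignment.items() if task is None),
                  key=lambda c: temps[c])
    hot = sorted((c for c, task in assignment.items()
                  if task is not None and temps[c] > threshold),
                 key=lambda c: temps[c], reverse=True)
    moves = []
    for src in hot:                      # hottest busy core first
        if not idle or temps[idle[0]] >= temps[src]:
            break                        # no cooler idle core is left
        dst = idle.pop(0)                # coolest remaining idle core
        assignment[dst], assignment[src] = assignment[src], None
        moves.append((assignment[dst], src, dst))
    return moves

temps = {0: 90.0, 1: 70.0, 2: 60.0, 3: 85.0}
assignment = {0: "A", 1: None, 2: None, 3: "B"}
print(migrate_hot_tasks(temps, assignment, threshold=80.0))
```

A practical scheme would also weigh the cost of each move against the expected temperature reduction, since frequent migrations are expensive.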
Multicore systems with workloads known a priori can overcome this limitation by using predictive task allocation. The goal is to schedule the tasks such that their deadlines and dependence constraints are met while achieving the best possible thermal profile. For dynamic workloads, a dynamic task allocation algorithm has been proposed in [31]. By observing the current temperature of the cores, the dynamic technique allows systems to respond to real-time changes and adapt dynamically to the current workload. The aim of this work is to explore the complex network architecture space for thermally optimal wireless Network-on-Chip architectures and evaluate the performance-energy-temperature trade-offs by incorporating




Chapter 3  Temperature-Aware NoC Architectures 
In a generic wired NoC, data communication between embedded cores is facilitated by multiple switches and wired links. This multi-hop communication results in high energy dissipation and high latency; consequently, generic wired NoC architectures are prone to high chip temperatures. The high energy and latency of wired NoCs can be alleviated using high-bandwidth, long-distance wireless links. However, wireless interfaces (WIs) provide single-hop, long-distance links that cause the majority of the traffic to be routed through them, leading to localized hotspots near the WIs. This chapter explores different wireless NoC architectures that mitigate NoC temperature hotspots while improving performance.
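Why a single long-range link attracts traffic can be seen with a small experiment: add one shortcut between opposite corners of an 8×8 mesh, route every ordered source-destination pair along one BFS shortest path, and count how many routes cross each link. Uniform all-to-all traffic, plain shortest-path routing and an ideal single-hop shortcut standing in for a wireless link are simplifying assumptions; this is not the routing policy described in Section 3.5.

```python
from collections import deque
from itertools import product

def mesh_with_shortcut(k, a, b):
    """k x k mesh of switches plus one long-range link between switches a and b."""
    adj = {node: [] for node in product(range(k), repeat=2)}
    for x, y in adj:
        for nx, ny in ((x + 1, y), (x, y + 1)):
            if nx < k and ny < k:
                adj[(x, y)].append((nx, ny))
                adj[(nx, ny)].append((x, y))
    adj[a].append(b)
    adj[b].append(a)
    return adj

def link_loads(adj):
    """Route each ordered pair on one BFS shortest path; count traversals per link."""
    load = {}
    for src in adj:
        parent = {src: None}
        queue = deque([src])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in parent:
                    parent[u] = v
                    queue.append(u)
        for dst in adj:
            v = dst
            while parent[v] is not None:   # walk back from dst to src
                link = frozenset((v, parent[v]))
                load[link] = load.get(link, 0) + 1
                v = parent[v]
    return load

k, a, b = 8, (0, 0), (7, 7)
loads = link_loads(mesh_with_shortcut(k, a, b))
shortcut_load = loads.pop(frozenset((a, b)))
mean_wired_load = sum(loads.values()) / len(loads)
print(f"shortcut: {shortcut_load} routes, average mesh link: {mean_wired_load:.1f} routes")
```

The shortcut carries well over twice the traffic of an average mesh link, which is the traffic concentration that turns a wireless interface into a thermal hotspot.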
3.1 Generic Mesh 
A regular mesh is a grid of switches, each connected to its own core. Every switch, except those on the edges of the grid, is connected to its four neighboring switches; edge switches have three neighbors and corner switches two. Each switch is connected to its neighbors through bidirectional links. Figure 3.1 shows a 64-core system with a regular Mesh NoC.
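Assuming minimal routing (for example, dimension-order XY routing), the number of switch-to-switch hops between two cores on a mesh equals the Manhattan distance between their switches. The sketch below computes the average and worst-case hop count for the 8×8 mesh of Figure 3.1 under that assumption, as a baseline for the small-world comparison that follows.

```python
from itertools import product

def mesh_hop_stats(k):
    """Average and maximum hop count over all ordered pairs of distinct switches
    in a k x k mesh, assuming minimal routing (hops = Manhattan distance)."""
    nodes = list(product(range(k), repeat=2))
    hops = [abs(x1 - x2) + abs(y1 - y2)
            for (x1, y1) in nodes for (x2, y2) in nodes
            if (x1, y1) != (x2, y2)]
    return sum(hops) / len(hops), max(hops)

avg_hops, max_hops = mesh_hop_stats(8)   # 64-core mesh
print(f"average {avg_hops:.2f} hops, worst case {max_hops} hops")
```

Over distinct pairs the average works out to exactly 2k/3 for a k×k mesh, so the mean hop count grows linearly with the mesh side as more cores are added.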
       





3.2 Small-World based NoC 
Regular mesh-based topologies have high energy costs for on-chip data transfer because they rely on multi-hop links built from planar metal interconnects. Data transfer between two blocks that are far apart incurs high latency and power consumption, and increasing the number of cores on the chip significantly aggravates this problem.
This problem of regular networks can be alleviated by adopting the structure of naturally occurring networks. Most naturally occurring networks, such as social networks, networks seen in microbial colonies, and many other technological and biological networks, are neither completely regular nor completely random. These networks combine clusters of nodes joined by many short-range links with a few long-range links that provide shortcuts to different regions of the network [32]. Figure 3.2 shows the topology of such a generic natural network. The existence of long-range paths between these clusters of interconnected nodes is what gives rise to the “Small-World” phenomenon.
 
Figure 3.2: A generic naturally occurring small-world network 
Small-world networks are typically characterized by a logarithmic relation between the mean internode distance and the network size. The best example of small-world




shows in Figure 3.3 that the average hop count of the network drops sharply due to the presence of shortcuts. On the other hand, randomly wired networks have a very low clustering coefficient, which is defined as the fraction of the possible links among a node's immediate neighbors that are actually present. This drop in clustering coefficient can reduce the fault tolerance of the network topology, making it vulnerable to segmentation in case of node failures. Networks with the small-world property lie between regular and random networks, with high clustering coefficients as well as low average hop counts.
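The trade-off in Figure 3.3 can be reproduced qualitatively with a small Watts-Strogatz experiment [33]: start from a ring lattice, rewire each edge with probability p, and compare the clustering coefficient and average path length with those of the unrewired lattice. The sketch uses a simplified single-attempt rewiring rule, and the network size n, degree k, probabilities p and the seed are arbitrary illustrative choices.

```python
import random
from collections import deque

def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbours (k/2 per side)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for j in range(1, k // 2 + 1):
            u = (v + j) % n
            adj[v].add(u)
            adj[u].add(v)
    return adj

def rewire(adj, p, rng):
    """Simplified rewiring: with probability p, move one end of each original edge
    to a random node (one attempt, skipped if it would create a self-loop or duplicate)."""
    n = len(adj)
    edges = [(v, u) for v in range(n) for u in adj[v] if v < u]
    for v, u in edges:
        if rng.random() < p:
            w = rng.randrange(n)
            if w != v and w not in adj[v]:
                adj[v].discard(u)
                adj[u].discard(v)
                adj[v].add(w)
                adj[w].add(v)
    return adj

def clustering(adj):
    """Mean over nodes of: links present among a node's neighbours / links possible."""
    total = 0.0
    for v, nbrs in adj.items():
        d = len(nbrs)
        if d >= 2:
            links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
            total += links / (d * (d - 1) / 2)
    return total / len(adj)

def avg_path_length(adj):
    """Mean BFS hop count over all reachable ordered pairs of distinct nodes."""
    total = pairs = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

def small_world_ratios(n=200, k=6, p=0.1, seed=1):
    """Return (C(p)/C(0), L(p)/L(0)), the quantities plotted in Figure 3.3."""
    lattice = ring_lattice(n, k)
    c0, l0 = clustering(lattice), avg_path_length(lattice)
    rewired = rewire(ring_lattice(n, k), p, random.Random(seed))
    return clustering(rewired) / c0, avg_path_length(rewired) / l0

for p in (0.0, 0.01, 0.1, 1.0):
    c_ratio, l_ratio = small_world_ratios(p=p)
    print(f"p = {p:<5} C(p)/C(0) = {c_ratio:.2f}  L(p)/L(0) = {l_ratio:.2f}")
```

At intermediate p the path-length ratio falls well before the clustering ratio does, which is the small-world regime the NoC designs in this chapter aim for.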
 
Figure 3.3: Average Hop Count L(p)/L(0) and Clustering Coefficient C(p)/C(0)  
(reproduced from [33]) 
Regular networks such as the Mesh can be improved by inserting such long-range wired shortcuts. Inserting long-range links to create a small-world network has been shown to improve the efficiency of NoCs [23]. This improvement can be attributed to the lower latency and power dissipation obtained when regular networks are rewired into small-world topologies with limited resources. Such networks have a very short average hop count between any two nodes due to the presence of long-range links, as shown in Figure 3.3.
In order to take advantage of the energy efficiency of small-world topologies and reduce chip temperatures, a small-world NoC is created with the same number of switches as a




connections follow a small-world topology where the wireline links between switches are 












$$P(i,j) = \frac{l_{ij}^{-\alpha}}{\sum_{\forall i}\sum_{\forall j} l_{ij}^{-\alpha}} \tag{3.1}$$
where $P(i,j)$ is the probability of establishing a link between two switches i and j separated by a Euclidean distance $l_{ij}$; it is proportional to $l_{ij}$ raised to the finite power $-\alpha$ [33]. The parameter α governs the nature of connectivity. In general, a higher α yields a very locally connected network with few or no long-range links, similar to a cellular-automaton-based topology. On the other hand, α = 0 would result in an ideal small-world network following the Watts-Strogatz model [33], with long-range shortcuts virtually independent of the distance between the cores. The value of α can be optimized to obtain topologies with higher performance gains and optimal wiring costs [26].
The small-world NoC (SWNoC) topology obtained using this method has been shown to outperform a regular Mesh [11]. However, this topology is not guaranteed to be thermally optimal, since the energy-efficient wireless links attract a large share of the traffic, dissipate considerable power and eventually create local hotspots. To obtain a better temperature profile, the power dissipation of the switches and links needs to be distributed uniformly so that traffic hotspots are avoided.
To achieve a uniform distribution of traffic, the initial small-world topology is further optimized using a simulated annealing (SA) based approach, since SA has been shown to converge to an optimal NoC configuration faster than exhaustive search [11]. The




distribution of its switches. While the performance of a network is reflected by the average hop-count between two switches, the occurrence of hotspots is captured by the variation in traffic densities among the NoC switches. A network with high performance requires a low average hop-count. However, optimizing for hop-count alone can result in a few very well-connected nodes that attract significant traffic. This phenomenon is called preferential attachment, where new links attach to already well-connected nodes. In the extreme case, the network may end up star-shaped, which is not scalable, cannot sustain high data rates, and causes thermal hotspots due to the unbalanced traffic through the central hubs. Hence, the traffic density through each switch needs to be balanced in addition to optimizing the average hop-count. The traffic density of a switch can be defined as the number of flits passing through it per clock cycle. It depends on the number of routes passing through the switch and on the number of connections the switch has to other switches. Therefore, an aggregate metric that captures both the average hop-count and the variance in traffic density among the NoC switches is optimized, as follows:
$$\rho = \Phi \cdot \hat{h} + (1 - \Phi) \cdot \hat{u} \tag{3.2}$$
where ρ is the metric to be optimized, $\hat{h}$ is the average hop-count of the generated small-world topology normalized to that of a Mesh with the same number of cores, $\hat{u}$ is the standard deviation of switch utilization normalized to that of the same Mesh, and Φ ∈ [0, 1] weights the two terms. The switch utilization depends on the number of flits that pass through a given switch, which in turn depends on the number of routes passing through that switch and on its degree, i.e., its number of immediate neighbors. The utilization of




$$u_i = n_i \cdot d_i \tag{3.3}$$
where $u_i$ is the utilization of switch i, $n_i$ is the number of routes passing through switch i and $d_i$ is the degree of switch i. The degree of each switch can be calculated from the adjacency matrix of the topology. To obtain the number of paths passing through a particular switch, a shortest-path-based routing scheme is adopted, as discussed in the following sections.
The product $u_i$ captures the traffic density information for a switch. Hence, the goal is to minimize the variation of $u_i$ given in (3.3), which is modeled by the standard deviation of $u_i$ over all the NoC switches in the given topology. The goal of the SA optimization is to minimize the aggregate metric ρ given in (3.2) to achieve a well-balanced trade-off between performance and thermal characteristics. To this end, Φ is set to 0.5 to assign equal importance to the average hop-count and the utilization. In each iteration of the SA process, a new network is created by randomly rewiring a link of the current network to a different source and destination pair that is not already directly connected. The optimization metric is then computed for this network. The new network is accepted if its metric is lower than that of the best network obtained up to that iteration. A network with a higher metric is accepted with a probability that decreases with every iteration, to avoid getting stuck in local optima. A Cauchy scheduling policy is adopted in the SA approach. The optimized SWNoC (OSWNoC) is expected to have better thermal characteristics than the SWNoC due to its balanced traffic density.
3.3 Wireless NoC Architectures 
Small-world topologies can improve the performance of NoC architectures; however, the




The energy dissipation can be considerably reduced by using high-bandwidth, low-power wireless links [23]. On-chip wireless links enable one-hop data transfers between distant nodes in the network, reducing the number of hops required in inter-core communication. In addition to reducing delay, inserting wireless links also significantly reduces the energy dissipation compared to NoCs with long-distance metallic links. However, inserting a wireless link raises the temperature of the switches it connects, since they attract high traffic densities. Consequently, the WIs need to be placed such that temperature hotspots are avoided.
Two different approaches have been followed to place the wireless links: location-based heuristic methods and direct optimization-based methods. In each case, 24 wireless links are deployed, as recent literature shows the possibility of creating 24 wireless links operating in distinct THz frequency ranges, forming an efficient Frequency Division Multiple Access mechanism [20]. Connecting 8 WIs in an all-to-all manner would require 8 × 7 / 2 = 28 links. Hence, the 8 WIs are connected all-to-all with the 4 shortest links removed, which constrains the total number of wireless links to 24. The shortest links are the ones removed so that the farthest nodes remain directly connected by wireless links.
3.3.1 Location based topologies 
The switches at the edges of the NoC are closer to the chip boundary, which makes heat sinking easier. The first location-based heuristic, E-Mesh, places the WIs on the edges of the NoC with two WIs on each edge of the chip, as shown in Figure 3.4(a), under the constraint that the WIs are at least three hops apart in a regular Mesh topology. The resulting topologies are called in




Placing WIs only on the edges does not mitigate the hotspots and delays experienced by the wired NoC towards the center of the chip. In the second location-based placement heuristic, WIs are placed at the four corners of the grid of switches, along with 4 WIs in the central region of the NoC, as shown in Figure 3.4(b). The resulting topologies are called C-Mesh and C-OSWNoC. The corners of the grid are closest to the edge of the chip and can dissipate heat efficiently, while the four WIs near the center act as shortcuts into the center of the chip and reduce the traffic flowing through the wired links towards the center.
In complex network topologies such as the small-world, two physically adjacent nodes may be more than one hop apart because of the inherently random construction of the topology. Hence, C-OSWNoC and E-OSWNoC, which place WIs only on the corners and edges respectively, do not guarantee performance gains. Connecting the nodes that see the highest hop-count in the network reduces the average hop-count and improves performance. To further improve performance through WI placement, another approach, D-OSWNoC, places wireless transceivers on the switches that are separated by the maximum number of hops.
3.3.2 Optimization based topologies 
The approaches described above rely on simple location-based heuristics to increase performance and reduce temperatures. Simulated annealing based optimization is used to place the WIs such that the average hop-count as well as the variation in traffic density and power dissipation of the switches and links is optimized. In OP-Mesh and OP-OSWNoC, the metric in (3.2) is optimized with Φ = 1, so that the placement minimizes only the average hop count of the NoC. In OT-Mesh and OT-OSWNoC, the NoC is optimized for both the average hop-count and the variation in traffic density by setting Φ = 0.5. The resulting topologies have optimal WI placements with respect to the metrics optimized in each case. Table 3-1 below summarizes the designed NoC architectures.
Architecture   Description
Mesh           Regular 64-core Mesh
C-Mesh         Mesh with WIs on corners and center
E-Mesh         Mesh with WIs on edges
OP-Mesh        Mesh with WIs placed for optimal performance
OT-Mesh        Mesh with WIs placed for optimal performance and temperatures
SWNoC          Performance-optimized small-world NoC
OSWNoC         Small-world NoC optimized for performance and temperatures
C-OSWNoC       OSWNoC with WIs on corners and center
E-OSWNoC       OSWNoC with WIs on edges
D-OSWNoC       OSWNoC with WIs on network diameters
OP-OSWNoC      OSWNoC with WIs placed for optimal performance
OT-OSWNoC      OSWNoC with WIs placed for optimal performance and temperatures




3.4 Physical Layer Implementation 
Suitable on-chip antennas are necessary to establish the wireless links. The antenna characteristics of carbon nanotubes (CNTs) in the THz frequency range have been investigated both theoretically and experimentally [17] [34]. It has already been shown that wireless NoCs designed with CNT antennas can significantly outperform their conventional wireline counterparts [11]. By using multiband laser sources to excite the CNT antennas, different frequency channels can be assigned to pairs of communicating source and destination nodes. This requires antenna elements tuned to a different frequency for each pair, creating a form of frequency division multiplexing (FDM) that gives rise to dedicated channels between source and destination pairs. In [20], 24 continuous-wave laser sources of different frequencies are used. These 24 frequencies can be assigned to the wireless links in the NoC such that each frequency channel is used only once, avoiding interference between signals on the same frequency. This enables concurrent use of the multi-band channels over the chip; hence, in this work 24 wireless links are assumed, each with a single channel, for the SWNoC architecture. The laser sources can be located off-chip or bonded to the silicon die; therefore, their power dissipation does not contribute to the chip power density [14]. As noted in [11], the electro-optic transceivers dissipate 0.33 pJ/bit for wireless data transmission. However, integrating CNT antennas with standard CMOS processes requires overcoming significant challenges. Mm-wave CMOS transceivers operating in sub-THz frequency ranges are a more near-term solution. WiNoCs that use sub-THz transceivers with a token-passing data-link layer, which lets multiple transceivers access the shared wireless channel, have been demonstrated in [25] and [10]. However, in this work, CNT-based transceivers have been considered for on-chip




3.5 Flow Control and Routing Policy 
In the proposed NoC architectures, data is transferred using flit-based wormhole routing over virtual channel (VC) based NoC switches [35]. To achieve an efficient, deadlock-free and distributed routing policy, layered shortest path routing (LASH) is adopted, as proposed in [36]. In LASH, the shortest paths between different source-destination pairs are separated into multiple virtual layers so that no cyclic dependencies arise between the paths within a layer. The shortest path between every source and destination is pre-computed offline to eliminate the overhead of path computation for every packet. Each switch has a routing table that contains only the identity of the next switch for every possible final destination. Hence, the memory requirement grows only linearly with the system size. When a header flit arrives at a switch, the next switch is looked up in the routing table based on the final destination of the packet. The header flit is then routed to the appropriate port along the virtual channel reserved for its source/destination pair. Because only the next switch is determined at each intermediate switch, the routing decision is fast and efficient. The use of shortest routing paths also enables highly efficient data transfer, resulting in high data






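The two ingredients of the routing policy in Section 3.5, next-hop routing tables and layer assignment, can be sketched together. This is a simplified illustration under stated assumptions, not the exact algorithm of [36]: routing tables are built from a BFS shortest-path tree rooted at each destination (which keeps next-hop entries consistent per destination), and routes are assigned greedily, in a fixed order, to the first layer whose channel dependency graph stays acyclic. The function names and the 6-switch ring used as a test topology are hypothetical.

```python
from collections import deque


def next_hop_tables(adj):
    """table[v][d] = next switch from v towards d, from a BFS tree rooted at each destination d."""
    table = {v: {} for v in adj}
    for d in adj:
        seen, queue = {d}, deque([d])
        while queue:
            v = queue.popleft()
            for u in sorted(adj[v]):
                if u not in seen:
                    seen.add(u)
                    table[u][d] = v          # u forwards to v, one hop closer to d
                    queue.append(u)
    return table


def route(table, s, d):
    """Follow next-hop entries from s to d."""
    path = [s]
    while path[-1] != d:
        path.append(table[path[-1]][d])
    return path


def has_cycle(graph):
    """DFS cycle check on a directed graph stored as {node: set(successors)}."""
    state = {}                               # 1 = on the DFS stack, 2 = finished

    def visit(v):
        state[v] = 1
        for w in graph.get(v, ()):
            if state.get(w) == 1 or (w not in state and visit(w)):
                return True
        state[v] = 2
        return False

    return any(v not in state and visit(v) for v in list(graph))


def assign_layers(adj, table, max_layers=4):
    """Put each (s, d) route in the first layer whose channel dependency graph stays acyclic."""
    layers = [{} for _ in range(max_layers)]
    layer_of = {}
    for s in adj:
        for d in adj:
            if s == d:
                continue
            path = route(table, s, d)
            channels = list(zip(path, path[1:]))          # directed links used by the route
            deps = list(zip(channels, channels[1:]))      # channel -> next channel
            for i, graph in enumerate(layers):
                new = [(a, b) for a, b in deps if b not in graph.get(a, set())]
                for a, b in new:
                    graph.setdefault(a, set()).add(b)
                if not has_cycle(graph):
                    layer_of[(s, d)] = i
                    break
                for a, b in new:                          # undo and try the next layer
                    graph[a].discard(b)
            else:
                raise RuntimeError(f"route {s}->{d} needs more than {max_layers} layers")
    return layer_of


ring = {v: {(v - 1) % 6, (v + 1) % 6} for v in range(6)}
tables = next_hop_tables(ring)
layers = assign_layers(ring, tables)
print(route(tables, 0, 3), "layers used:", max(layers.values()) + 1)
```

Each switch stores one entry per destination, so table size grows linearly with the number of switches, and each layer would map to one reserved VC in the switches.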
3.6 Experimental Results 
In this section, the performance and temperature profiles of the proposed topologies are evaluated and compared with those of a conventional Mesh-based NoC, for synthetic as well as real application-based traffic distributions. Real application traffic is obtained from the SPLASH-2 and PARSEC benchmarks. GEM5 [37], a full-system simulator, is used to obtain detailed processor and network-level information for a system with 64 Alpha cores running Linux. The memory system is MOESI_CMP_directory, set up with private 64KB L1 instruction and data caches and a shared 64MB L2 cache (1MB distributed per core). Three SPLASH-2 benchmarks, FFT, RADIX and LU [38], and the PARSEC benchmark CANNEAL [39] are considered.
 
These benchmarks range from computation-intensive to communication-intensive in nature and are therefore of particular interest in this work. The behavior and problem size of the benchmarks are described in Table 3-2. The processor-level statistics generated by the GEM5 simulations are fed into McPAT (Multicore Power, Area, and Timing) [40] to determine the processor-level power statistics.

Figure 3.5: Simulation Work Flow

Table 3-2: Percentage of busy and idle cycles in a 64-core system given default problem sizes
Benchmark   Busy (%)   Idle (%)   Default Problem Size
FFT         81.99      18.01      65,536 data points
RADIX       84.98      15.02      262,144 integers, radix 1024
LU          87.62      12.38      512x512 matrix, 16x16 blocks
CANNEAL     56.74      43.26      200,000 elements

The original frequency of traffic interaction between the cores, fij, is obtained from GEM5 and used to generate the traffic patterns for each benchmark in a cycle-accurate NoC simulator, which reports the NoC performance in terms of network bandwidth, average packet energy and the temperature of the NoC links and switches. The width of all wired links is taken to be the same as the flit width, which is 32 bits in this work. Each packet consists of 64 flits. As with the wired links, wormhole routing is adopted on the wireless links. The NoC simulator uses switches synthesized from an RTL design in the CMP 65-nm CMOS process using Synopsys™ Design Vision. This NoC switch architecture has three functional stages, namely input arbitration, routing/switch traversal, and output arbitration [8]. The number of VCs in the WiNoC switches depends on the system size and the number of interconnects. As shown in [36], 4 layers are enough for deadlock-free LASH routing in a 64-core system, and a single VC is reserved for each layer. Each VC has a buffer depth of 2 flits. The wireless switches route a higher traffic density, as the wireless links provide shortcuts between distant parts of the NoC; hence, the ports of the wireless switches have a larger buffer depth of 16 flits. The energy dissipation of the network switches was obtained from the synthesized netlist by running Synopsys™ PrimePower, while the energy dissipated by the wireline links was obtained through HSPICE simulations taking into




can sustain a data rate of 10 Gbps and has an energy dissipation of 0.33 pJ/bit [11]. After the processor and network power values are obtained, the processors, network switches and links are arranged on a 20 mm × 20 mm die. These floor plans, along with the power values, are used in HotSpot [41] to obtain steady-state thermal profiles. The core and network powers in the presence of each benchmark are fed to the HotSpot simulator to obtain the temperature profile of each scenario. The flow of the overall simulation process is shown in Figure 3.5.
3.6.1 Thermal Characteristics 
Temperatures are evaluated by simulating a uniform random traffic pattern with injection 





Figure 3.6: Peak and Mean temperatures seen on Switches and Links of proposed topologies for
[Figure 3.6 plots: Peak Switch Temperature, Mean Switch Temperature]




realistic power of 0.2 W, as seen from the characteristics of the various benchmarks considered here. Figure 3.6 shows the maximum and mean temperatures seen on the links and switches of the proposed topologies for both traffic loads mentioned above. From Figure 3.6(a), it can be seen that the Mesh has a maximum temperature of 92 °C on the NoC elements at low injection loads. The SWNoC and OSWNoC have lower temperatures than the Mesh; the OSWNoC shows a 35% reduction in temperature compared to the conventional Mesh at a low load of 0.1 flits/core/cycle. For uniform random traffic with an injection rate of 0.3 flits/core/cycle, the Mesh temperatures rise to 188 °C, as seen in Figure 3.6(b). Because of its inherent multi-hop communication, high latency and high power dissipation, the regular Mesh saturates at increased loads and hence shows very high temperatures. For the same configuration, the SWNoC has maximum temperatures below 100 °C and the OSWNoC has maximum temperatures of 80 °C on the switches and links. This significant temperature reduction in the OSWNoC can be attributed to its uniformly distributed traffic densities.
Figure 3.7 shows the temperature distribution of the links and switches for the Mesh, E-Mesh, SWNoC, OSWNoC and OT-OSWNoC evaluated at 0.1 flits/core/cycle. The small-world topologies, SWNoC and OSWNoC, not only reduce the peak and mean temperatures compared to the Mesh but also reduce the overall temperatures of the NoC switches and links, as signified by the leftward shift of the histograms. This is because the small-world topologies have a lower hop-count than the Mesh and hence can route data more efficiently, as
will be seen in the next subsection. However, the peak temperatures for traffic distribution 
making it inherently resilient to hotspots. Figure 3.8 shows the temperature maps for the Mesh, 
E-Mesh, OSWNoC and OT-OSWNoC. It can be seen that the Mesh has very high temperatures 




Mesh. The high temperatures seen on the Mesh are significantly reduced with the OSWNoC, which shows a temperature map with relatively lower temperatures at the hotspot location.
It can be observed that placing WIs on wired NoC architectures significantly improves the thermal characteristics of the NoC, and the location of the WIs causes significant differences in the peak temperatures of the Mesh-based architectures. E-Mesh shows the largest reduction in peak temperatures, with 68.91 °C on switches and 72.95 °C on links. Placing WIs on the edges of the Mesh diverts the higher traffic densities towards the edges, which are close to the heat sinks. Consequently, E-Mesh has the lowest temperatures among the Mesh-based WiNoCs, as can be seen from Figure 3.8.
The C-OSWNoC and OT-OSWNoC show the largest reductions in peak temperature compared to the Mesh. The traffic densities in C-OSWNoC are distributed towards the corners and the center. These distributed densities are supported by the optimized wired OSWNoC, resulting in a 39.2% reduction in switch temperatures compared to the Mesh and a 5% reduction compared to the OSWNoC. OT-OSWNoC is optimized for both reduction in average hop count and variation in traffic densities. It achieves 37.5% and 4.5% reductions in temperature compared to the Mesh and the wireline OSWNoC, respectively. This result can be seen in Figure 3.8, where OT-OSWNoC shows a balanced thermal map as compared to







































































Figure 3.8: Thermal Maps of NoC for (a) Mesh (b) E-Mesh (c) OSWNoC (d) OT-OSWNoC at























The D-OSWNoC, OP-Mesh and OP-OSWNoC are not fully optimized for thermal characteristics. The difference in temperatures is evident in Figure 3.6(b), where the architectures are subjected to uniform random traffic at 0.3 flits/core/cycle. Although these architectures still have lower temperatures than the wired NoC architectures, their maximum temperatures are 9 °C higher than those of the best cases, C-OSWNoC and OT-OSWNoC.
3.6.2 Performance Characteristics 
This section presents the performance evaluation of the proposed topologies in terms of peak bandwidth, packet energy dissipation and total NoC power dissipation for uniform random traffic at injection loads of 0.1 and 0.3 flits/core/cycle, as shown in Figure 3.9. Peak bandwidth is measured as the number of bits successfully arriving at their destinations per second at network saturation (full load of 1 flit/core/cycle). The packet energy is evaluated as the energy dissipated in transferring a packet from source to destination at the corresponding injection loads, to demonstrate the energy efficiency of the proposed architectures in low-load scenarios. The total NoC power is evaluated as the power dissipated in all the switches and links of the NoC. The total NoC power correlates with the mean NoC temperature and follows a similar trend; thus, an NoC with lower total power can reasonably be assumed to have lower mean temperatures. The E-Mesh, with its WIs along the edges, has the best bandwidth and packet energy characteristics among the Mesh-based architectures, as in this case all the WIs are close to the corners of the NoC. The power dissipation of E-Mesh is also the lowest among the Mesh-based architectures, consistent with its temperatures. In comparison, in the C-Mesh only 4 of the 8 WIs are at the corners.




OP-Mesh has a lower average hop-count than the E-Mesh, but this results in traffic congestion and increased power and, consequently, hotspots. The OT-Mesh optimizes the hop-count as well as the variation in traffic density and hence achieves a better performance-temperature trade-off than the OP-Mesh, with lower power and packet energies.
 
The SWNoC shows better performance than the Mesh due to its long-range shortcuts. The OSWNoC is optimized for better temperature characteristics and distributed traffic densities, which is reflected in its low NoC power dissipation in Figure 3.9. The decrease in performance of the OSWNoC relative to the SWNoC, in terms of packet energy and peak bandwidth, can be attributed to the increase in average hop count incurred to achieve more distributed traffic densities. These temperature-performance trade-offs can be mitigated by using wireless links.
The D-OSWNoC and OP-OSWNoC are optimized only for performance and hence show the highest throughput and the lowest energy dissipation. However, this high performance comes with degraded thermal behavior and increased power dissipation, leading to higher peak temperatures on the NoC. The low
 


































































bandwidth of the C-OSWNoC illustrates the temperature-performance trade-offs involved in these topologies. C-OSWNoC has a lower peak bandwidth since it is optimized only for well-distributed traffic densities, without considering performance. OT-OSWNoC, on the other hand, is optimized for average hop count as well as variation in traffic densities. With such an overall optimization, including the location of the WIs, it is possible to reduce the peak temperature compared to the wireline OSWNoC while also sustaining a 12.5% higher bandwidth due to the wireless links. Among the Mesh-based topologies, the best temperature-performance trade-off is achieved by the E-Mesh, whereas among the small-world architectures both E-OSWNoC and OT-OSWNoC offer comparable trade-offs. This is because in the OT-OSWNoC the underlying wireline NoC is also optimized for traffic density variation, whereas the underlying Mesh of the OT-Mesh could not be fully optimized.
3.6.3 Performance Evaluation with Application-specific workloads 
 
Table 3-3: Peak temperatures on switches for benchmarks
Topology    FFT        CANNEAL    RADIX       LU
Mesh        85.07 °C   60.37 °C   103.11 °C   85.86 °C
E-Mesh      64.93 °C   52.53 °C   77.3 °C     63.8 °C
OSWNoC      57.31 °C   49.96 °C   60.11 °C    56.4 °C







The architectures with the best trade-offs in the previous subsection, namely E-Mesh, OSWNoC and OT-OSWNoC, are evaluated in the presence of real application-based traffic from the SPLASH-2 and PARSEC benchmarks. Each of the chosen benchmarks has different communication and computation characteristics. The thermal characteristics of the entire system are evaluated while considering the actual power dissipation in the cores. Thermal management techniques are not incorporated, since only the impact of the NoC architecture on temperature is under consideration. Table 3-3 and Table 3-4 show the peak temperatures on switches and links, respectively, in the presence of each benchmark. Figure 3.10 shows the packet energy dissipation and the NoC power dissipation for the benchmarks. The traffic densities of the chosen benchmarks vary in nature. While CANNEAL is the most communication-intensive benchmark, as can be seen from its NoC power dissipation, most of its communication occurs between a few selected switches, which results in temperature hotspots on the NoC. However, the average core temperatures are comparatively lower for CANNEAL, which leads to better heat sinking for the hotspot areas and a subsequent reduction in temperatures. FFT and LU have lower rates of communication, resulting in lower temperatures on the NoC. RADIX, on the other hand, has highly distributed traffic, resulting in higher power and consequently higher temperatures on the NoC elements of the Mesh towards the internal regions of the chip, augmented by higher temperatures on the surrounding cores compared to the other benchmarks. This results in higher temperatures on the chip and NoC even though the NoC power dissipation is about 0.4 W lower than for CANNEAL. Figure 3.11 shows the temperature distributions on the switches and links for each of the benchmarks. The leftward shift of the histograms shows that the OSWNoC has lower temperatures than the Mesh and E-Mesh. The OT-OSWNoC shows better temperature distributions, with 60% of links and 62% of switches falling in the 50-60 °C range. The OSWNoC is optimized to distribute traffic densities so as to obtain uniform temperatures. Each of the benchmarks shows a reduction in peak switch temperatures to below 60 °C, with RADIX showing the largest relative reduction. The OT-OSWNoC shows the lowest power dissipation for all benchmarks along with the lowest temperatures. CANNEAL has an overall higher energy dissipation on all topologies due to the presence of a few switches with high communication activity. The OT-OSWNoC architecture achieves 44%

Table 3-4: Peak temperatures on links for benchmarks
Topology    FFT        CANNEAL    RADIX       LU
Mesh        89.84 °C   62.59 °C   113.44 °C   87.58 °C
E-Mesh      67.75 °C   55.43 °C   83.96 °C    69.19 °C
OSWNoC      56.47 °C   49.5 °C    58.94 °C    55.3 °C

Figure 3.10: (a) Average Packet Energy (b) Total Power Dissipation








Figure 3.11: Temperature Distribution on Links and Switches for  




Chapter 4  Temperature-Aware Task Reallocation on Wireless 
NoC Enabled Multicore chips 
The current trend in multi-core processors aims to dramatically improve performance by exploiting the parallelism in the applications running on them. This performance improvement comes at the cost of thermal effects induced by high power densities, such as temperature imbalances and hotspots, which affect the stability and performance of the processor. Thermal-aware task scheduling aims at optimizing the performance of multi-core processors while managing these thermal effects. This chapter deals with temperature-aware task reallocation to optimize the temperatures of multi-core processors using the network-on-chip architectures.
4.1 Temperature-Aware Thread Reallocation 
The temperature-aware heuristics are based on a thermal model similar to the one described in [42]. A multi-core chip can be abstracted as a set of rectangular regions, each representing a single core, a single switch or a single link. At this level of abstraction, each region has a single power input and a single temperature. This set of regions representing the cores, switches and links of a many-core chip can then be represented as a thermal RC network described by a set of linear ordinary differential equations. This thermal RC model is a good choice for high-level estimation due to its accuracy and efficiency [41]. For a many-core system consisting of cores on a silicon layer with a heat spreader and heat sink, the relationship between power (current),





$$C_i \frac{dT_i(t)}{dt} = P_i(t) - \sum_{j \in Path} \frac{T_i(t) - T_j(t)}{R_j} \tag{4.1}$$
where Ci is the thermal capacitance of node i, Ti is the node temperature as a function of the time 
t, Pi denotes the instantaneous power input at node i, Path is the set of all thermal conduction 
paths that connect with node i, Rj denotes the resistance of each thermal conduction path in Path, 
and Tj is the temperature of the adjacent node in each thermal conduction path.  
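Equation (4.1) can be integrated numerically to see how a node heats up after a power step. The sketch below uses forward-Euler time stepping on a hypothetical three-node chain (core, heat spreader, and an ambient node held at 45 °C); the capacitances, resistances, time step and ambient temperature are illustrative values, not HotSpot parameters. At steady state the core should sit at $T_{amb} + P(R_{01} + R_{12})$ = 45 + 1 × (2 + 1) = 48 °C.

```python
def simulate(C, P, paths, T0, dt, steps, fixed=()):
    """Forward-Euler integration of (4.1): C_i dT_i/dt = P_i - sum_j (T_i - T_j) / R_j.

    paths[i] lists (j, R_j) for every conduction path from node i; nodes in `fixed`
    are held at a constant temperature (e.g. ambient)."""
    T = list(T0)
    for _ in range(steps):
        rate = [0.0] * len(T)
        for i in range(len(T)):
            if i not in fixed:
                outflow = sum((T[i] - T[j]) / R for j, R in paths[i])
                rate[i] = (P[i] - outflow) / C[i]
        T = [t + dt * r for t, r in zip(T, rate)]
    return T


# Hypothetical 3-node chain: core (0) -> spreader (1) -> ambient (2, fixed at 45 °C).
C = [0.01, 0.1, 1.0]                                       # thermal capacitances, J/K
paths = {0: [(1, 2.0)], 1: [(0, 2.0), (2, 1.0)], 2: []}    # thermal resistances, K/W
P = [1.0, 0.0, 0.0]                                        # 1 W dissipated in the core
T = simulate(C, P, paths, T0=[45.0] * 3, dt=1e-3, steps=4000, fixed={2})
print([round(t, 2) for t in T])
```

Forward Euler is used only for brevity; it is stable here because the time step is much smaller than the fastest thermal time constant of the chain.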
The event-driven thermal estimator proposed in [42] is used for temperature-aware task scheduling. The estimator is based on power events, where a power event is an increase or decrease in the power dissipated by a core. Allocating a task to a core or removing it from the core generates a power event, with an instantaneous power change at the beginning and at the end of the task.
A look-up table (LUT) approach is used, as described in [42], since it has been shown to be
computationally less intensive than other existing models while predicting the thermal profile
of the chip very accurately. The LUT models the thermal characteristics of the cores as a
three-dimensional matrix consisting of multiple tables. Each table holds the temperature trace of
the chip when one watt of power is injected into one particular core, leaving the other cores
untouched. Therefore, the number of tables is equal to the number of cores in the NoC. Each row in
a particular table holds the temperatures of all the cores at a particular instant of time; hence,
each table has as many columns as there are cores. The number of samples required to reach the
steady-state temperature across the chip determines the number of rows in each table.




The future thermal map, $T_{t_f}$, at a future time, $t_f$, can be calculated from the current
thermal map, $T_{t_c}$, at the current time, $t_c$, by adding the temperature increment of each
core over the interval $\Delta t$, based on the power events, as follows:

$$T_{t_f} = T_{t_c} + \Delta T_{\Delta t}, \qquad \Delta t = t_f - t_c \qquad (4.2)$$

where $T_{t_f}$, $T_{t_c}$ and $\Delta T_{\Delta t}$ are N-element vectors denoting the future
thermal map, the current thermal map, and the temperature increment map of each of the N cores,
respectively.









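$$\Delta T_{\Delta t} = \sum_{E_a \in E} P_{E_a} \left( LUT^{Core_{E_a}}_{row_{t_f}} - LUT^{Core_{E_a}}_{row_{t_c}} \right) \qquad (4.3)$$

where $E_a$ is an atomic power event in the list of events $E$, $P_{E_a}$ is the power associated
with that event, and $LUT^{Core_{E_a}}_{row_{t_f}}$ and $LUT^{Core_{E_a}}_{row_{t_c}}$ denote the
rows for times $t_f$ and $t_c$, respectively, of the LUT of core $Core_{E_a}$, the core where the
power event occurs.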
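A small Python sketch of the LUT-based update in (4.2) and (4.3) follows. It assumes, as a simplification, that row r of each core's table holds the temperature rise r sampling periods after a 1 W step was applied to that core (so row 0 is the instant of the step), that rows are looked up by the time elapsed since each event, that lookups beyond the last row return the steady state, and that an event contributes nothing before it occurs. The names `PowerEvent`, `lut_row` and `future_thermal_map` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PowerEvent:
    core: int     # Core_Ea: core on which the power change occurs
    power: float  # P_Ea in watts; negative when a task is removed
    time: float   # time at which the event occurs


def lut_row(core_table: Sequence[Sequence[float]], elapsed: float,
            period: float) -> List[float]:
    """Temperature rise of every core `elapsed` time units after a 1 W step."""
    if elapsed < 0:
        return [0.0] * len(core_table[0])  # the event has not happened yet
    idx = min(int(elapsed // period), len(core_table) - 1)  # clamp to steady state
    return list(core_table[idx])


def future_thermal_map(current: Sequence[float], events: Sequence[PowerEvent],
                       lut: Sequence[Sequence[Sequence[float]]],
                       t_c: float, t_f: float, period: float) -> List[float]:
    """T_tf = T_tc + sum over events of P_Ea * (LUT row at t_f - LUT row at t_c)."""
    delta = [0.0] * len(current)
    for ev in events:
        row_f = lut_row(lut[ev.core], t_f - ev.time, period)
        row_c = lut_row(lut[ev.core], t_c - ev.time, period)
        for m in range(len(delta)):
            delta[m] += ev.power * (row_f[m] - row_c[m])
    return [t + d for t, d in zip(current, delta)]


# Toy two-core LUT with three samples per table (illustrative numbers only).
lut = [
    [[0.0, 0.0], [5.0, 1.0], [8.0, 2.0]],  # 1 W injected into core 0
    [[0.0, 0.0], [1.0, 5.0], [2.0, 8.0]],  # 1 W injected into core 1
]
events = [PowerEvent(core=0, power=2.0, time=0.0)]
print(future_thermal_map([50.0, 46.0], events, lut, t_c=1.0, t_f=2.0, period=1.0))
# [56.0, 48.0]
```

In this sketch one prediction costs time proportional to the number of events times the number of cores, and no differential equations are solved.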
Reallocation of threads is done using the Future Temperature Trends (FTT) heuristic [42] 
to obtain a thermally-optimal task allocation. The FTT heuristic algorithm classifies all the idle 
cores into two sets namely, Core+ for cores with increasing temperature and Core- for cores with 
decreasing temperature, based on the difference in current and predicted temperature for that 













where a+ is the temperature increment and a- is the temperature decrement. Allocating a queued
task to the core with the minimum weight obtained through this process has been shown to give
thermally optimal results, with the lowest core temperatures [42].
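The first step of the FTT heuristic, splitting the idle cores into Core+ and Core- from the difference between predicted and current temperatures, can be sketched as below. The weight formula (4.4) applied afterwards is not reproduced; the function name and the choice to place cores with no predicted change in Core- are assumptions.

```python
from typing import Dict, Iterable, Sequence, Tuple


def classify_idle_cores(idle: Iterable[int], current: Sequence[float],
                        predicted: Sequence[float]
                        ) -> Tuple[Dict[int, float], Dict[int, float]]:
    """Split idle cores into Core+ (heating up) and Core- (cooling down or unchanged).

    Returns two dicts: core -> temperature increment a+ for Core+, and
    core -> temperature decrement a- for Core-.
    """
    core_plus: Dict[int, float] = {}
    core_minus: Dict[int, float] = {}
    for c in idle:
        change = predicted[c] - current[c]
        if change > 0:
            core_plus[c] = change    # a+: predicted rise
        else:
            core_minus[c] = -change  # a-: predicted fall (0 if unchanged)
    return core_plus, core_minus


plus, minus = classify_idle_cores([0, 1, 2], current=[60, 70, 65], predicted=[62, 68, 65])
print(plus, minus)  # {0: 2} {1: 2, 2: 0}
```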
4.2 Performance-Aware Reallocation 
 Temperature-aware reallocation of threads does not guarantee optimal performance. To obtain the
best performance, the task allocation heuristic must take the topology of the NoC into
consideration. For any particular core, the NoC performance is captured by a weight that reflects
the hop distance between that core and the tasks already allocated, weighted by how frequently
the queued task communicates with each of them, assuming the next queued task is mapped to that
core. For each core i, this weight is calculated as follows:
$$W_i = \sum_{j \in Alloc\_Set} h_{ij} \, f_{ij}, \qquad i = \text{current core} \qquad (4.5)$$

where $Alloc\_Set$ is the set of all tasks that are already allocated on the NoC, $h_{ij}$ is the
distance in number of hops between core i and the core where task j is allocated, and $f_{ij}$ is
the frequency of communication between the queued task mapped on core i and task j.
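A sketch of the performance weight in (4.5) follows. Because the small-world topologies are irregular, hop counts are computed here by breadth-first search over an adjacency list, which assumes minimal-path routing; the routing used in this work may yield different hop counts. The function names and the 2×2 example are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, List, Mapping


def hop_counts(adj: Mapping[int, List[int]], src: int) -> Dict[int, int]:
    """Minimum hop count from `src` to every reachable switch (breadth-first search)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def perf_weight(core: int, adj: Mapping[int, List[int]],
                placement: Mapping[Hashable, int],
                freq: Mapping[Hashable, float]) -> float:
    """W_i = sum over allocated tasks j of h_ij * f_ij, with the queued task on `core`.

    placement: task j -> core it runs on (the Alloc_Set)
    freq:      task j -> communication frequency with the queued task
    """
    dist = hop_counts(adj, core)
    return sum(dist[c] * freq.get(task, 0) for task, c in placement.items())


# 2x2 mesh with links 0-1, 0-2, 1-3, 2-3; task "A" on core 0, task "B" on core 3.
mesh = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
placement = {"A": 0, "B": 3}
freq = {"A": 10, "B": 1}
print({c: perf_weight(c, mesh, placement, freq) for c in (1, 2)})  # {1: 11, 2: 11}
```

Lower values indicate that the queued task would sit closer to the tasks it communicates with most.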
 The weights obtained in (4.4) and (4.5) are normalized and combined to obtain a final weight,
which decides the destination core for a particular thread, as follows:

$$W_{core} = \phi \, \overline{W}_{temp} + (1 - \phi) \, \overline{W}_{perf} \qquad (4.6)$$

where $\overline{W}_{temp}$ and $\overline{W}_{perf}$ are the sets of weights obtained by
normalizing the weights from (4.4) and (4.5), respectively. ϕ is the modulation parameter that
controls the relative importance of core temperature and performance in the task allocation
heuristic. The heuristic adopted for the




optimization, resulting in the FTT heuristic [42]. When ϕ is set to 0, $W_{core}$ characterizes
only network performance and not thermal characteristics.
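The combination in (4.6) can be sketched as follows. The normalization is not specified above, so dividing each weight set by its largest magnitude is an assumption here, as are the function names. With ϕ = 1 the ranking depends only on the temperature weights of (4.4), and with ϕ = 0 only on the performance weights of (4.5).

```python
from typing import Dict


def normalize(weights: Dict[int, float]) -> Dict[int, float]:
    """Scale a set of weights by its largest magnitude (assumed normalization)."""
    peak = max(abs(w) for w in weights.values())
    return {c: (w / peak if peak else 0.0) for c, w in weights.items()}


def combined_weights(w_temp: Dict[int, float], w_perf: Dict[int, float],
                     phi: float) -> Dict[int, float]:
    """W_core = phi * normalized temperature weight + (1 - phi) * normalized performance weight."""
    if not 0.0 <= phi <= 1.0:
        raise ValueError("phi must lie in [0, 1]")
    t, p = normalize(w_temp), normalize(w_perf)
    return {c: phi * t[c] + (1.0 - phi) * p[c] for c in w_temp}


def best_core(w_core: Dict[int, float]) -> int:
    """Destination core: the one with the lowest combined weight."""
    return min(w_core, key=w_core.get)


w_temp = {0: 2.0, 1: 4.0}   # illustrative temperature weights (4.4)
w_perf = {0: 10.0, 1: 5.0}  # illustrative performance weights (4.5)
print(best_core(combined_weights(w_temp, w_perf, phi=1.0)))  # 0
print(best_core(combined_weights(w_temp, w_perf, phi=0.0)))  # 1
```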
 
Figure 4.1: The adopted task scheduling heuristic 
 Similar to the weight assigned to each core, the threads need to be prioritized based on the
effect each thread will have on the temperature and performance of the system. The execution time
of each thread (its performance) depends on the amount of computation and communication involved.
Computation results in power dissipation on the core, and communication results in power
dissipation on the NoC. The power dissipation of the thread is therefore used as a measure of its
computation, and the number of messages sent and received by the thread is taken as a measure of
its communication. Each thread is prioritized according to the following weight:




where, Pthread is the power dissipation of the corresponding thread, FInOut is the total number of 
messages sent and received by the thread to and from all other threads, and    is a modulation 
parameter to give importance to either communication or computation for a benchmark. These 
weights are calculated for each thread and then normalized with respect to the maximum weight 
obtained. The thread reallocation is done by assigning the thread with highest priority obtained in 
(4.7) to the core with the lowest weight obtained in (4.6). 
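The reallocation step can be sketched as a greedy loop: threads are visited in decreasing priority, and each takes the free core with the lowest combined weight. The priority values are taken as given because the form of (4.7) is not reproduced here, and `core_weight` is a hypothetical callback standing in for (4.6); it receives the partial placement so that performance weights can be recomputed as the set of allocated tasks grows.

```python
from typing import Callable, Dict, Hashable, Iterable


def reallocate(priorities: Dict[Hashable, float], free_cores: Iterable[int],
               core_weight: Callable[[int, Dict[Hashable, int], Hashable], float]
               ) -> Dict[Hashable, int]:
    """Greedy mapping: highest-priority thread -> lowest-weight free core."""
    placement: Dict[Hashable, int] = {}
    free = set(free_cores)
    for thread in sorted(priorities, key=priorities.get, reverse=True):
        if not free:
            break  # more threads than free cores
        # sorted() makes tie-breaking deterministic (lowest core index wins).
        best = min(sorted(free), key=lambda c: core_weight(c, placement, thread))
        placement[thread] = best
        free.remove(best)
    return placement


# Illustrative static weights; a real callback would evaluate (4.6) for each candidate core.
static = [0.3, 0.1, 0.5]
print(reallocate({"t0": 0.2, "t1": 1.0}, [0, 1, 2], lambda c, pl, th: static[c]))
# {'t1': 1, 't0': 0}
```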
4.3 Experimental Results 
The simulation workflow discussed in Chapter 3 is modified to evaluate the effects of thread
reallocation on the small-world-based topologies obtained there. The core powers associated with
the different allocation strategies, together with the corresponding network powers for the
specific benchmarks, are fed to the HotSpot [41] simulator to obtain the temperature profile of
each scenario. The flow of the overall simulation process is shown in Figure 4.2.
 





4.3.1 Thermal-Energy-Performance Tradeoffs 
The thread reallocations are done by considering the two extreme values of the reallocation
weight parameter ϕ in (4.6): ϕ = 1 for temperature-aware and ϕ = 0 for performance-aware
reallocation. The initial thread placement is random. The temperature-aware reallocations take
only the core temperatures into consideration. The thread allocations on the cores do not vary
across architectures for either random or temperature-aware allocation, as both are done
independently of the NoC characteristics. Hence, the core thermal characteristics are identical
across the proposed architectures in these two cases. In the case of performance-aware
allocation, the network architecture plays a role in minimizing the overall hop count, and thus
the resulting thread allocations differ.
4.3.1.1 Thermal Characteristics of the Cores 
The thermal characteristics of the cores are presented as the percentage of cores within
particular temperature ranges for different benchmarks, and as the reduction in core temperatures
from random task allocation to temperature-aware task allocation for each of the considered
topologies. The task allocation heuristics are evaluated with application-specific benchmarks
from the SPLASH-2 and PARSEC suites. Temperature-aware task placement lowers the core
temperatures, as shown by the leftward shift of the core-temperature histograms for each
benchmark in Figure 4.3.
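The histograms summarize each scenario as the percentage of the 64 cores whose temperature falls in each range. A minimal sketch of that binning is shown below; the 10°C bin edges and half-open intervals are assumptions for illustration, and temperatures outside the given edges count toward the total but fall in no bin.

```python
from typing import List, Sequence


def temperature_histogram(temps: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Percentage of cores whose temperature lies in [edges[k], edges[k + 1])."""
    counts = [0] * (len(edges) - 1)
    for t in temps:
        for k in range(len(edges) - 1):
            if edges[k] <= t < edges[k + 1]:
                counts[k] += 1
                break
    return [100.0 * c / len(temps) for c in counts]


edges = [60, 70, 80, 90, 100]  # hypothetical 10 °C ranges
print(temperature_histogram([65, 72, 78, 85, 95], edges))  # [20.0, 40.0, 20.0, 20.0]
```

A leftward shift then appears as percentage mass moving from higher bins into lower ones.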
In the presence of CANNEAL traffic, cores with temperatures above 80°C under random task
allocation were shifted to lower temperature ranges under temperature-aware allocation. The
average reduction in core temperature between temperature-aware task allocation




SPLASH-2 benchmarks, FFT, RADIX, and LU, the higher-temperature cores under random task placement
were shifted to lower temperature ranges under temperature-aware allocation. The average
reductions in core temperature between temperature-aware and random task allocation for all the
benchmarks are shown in Table 4-1. The temperature-aware task allocation is the same for all the
NoC architectures, since it is independent of the NoC architecture. As Figure 4.3 shows, nearly
15% of the 64 cores that were above 90°C drop to between 70°C and 90°C when LU task placement is
optimized for temperature. A similar decrease is seen for RADIX, with 10% of the cores above 90°C
being cooled to 70-90°C. FFT shows a balanced computation environment, with the hottest core
temperature under random task allocation being
 
Figure 4.3: Thermal histograms of cores showing temperature shift to the left (a) FFT (b) 




81°C. Temperature-aware task reallocation reduces the temperatures for FFT by up to 2.3°C. To
further quantify the improvement in the thermal profile obtained by temperature-aware task
allocation, the difference in maximum core temperature between the random and temperature-aware
task allocations, denoted ΔTmax, is shown in Table 4-1, together with the corresponding
difference in average core temperature, ΔTavg.
Table 4-1: Reduction in core temperatures from random task allocation to temperature-aware task allocation
         FFT      RADIX    LU       CANNEAL
ΔTmax    2.31°C   26.37°C  22.45°C  37.91°C
ΔTavg    0.64°C   2.32°C   3.42°C   3.17°C
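The two metrics in Table 4-1 are simple differences between the per-core temperatures of the two allocations. As a consistency check, the CANNEAL hottest-core temperatures quoted in the text (117.1°C under random placement, 79.19°C under temperature-aware placement) give ΔTmax = 37.91°C, matching the table. A sketch of the computation, with a hypothetical function name and toy inputs:

```python
from typing import Sequence, Tuple


def thermal_reduction(baseline: Sequence[float],
                      optimized: Sequence[float]) -> Tuple[float, float]:
    """Return (dT_max, dT_avg): drop in hottest and in mean core temperature."""
    d_max = max(baseline) - max(optimized)
    d_avg = sum(baseline) / len(baseline) - sum(optimized) / len(optimized)
    return d_max, d_avg


# Toy per-core temperatures (°C), not data from this work.
d_max, d_avg = thermal_reduction([80.0, 70.0, 90.0], [78.0, 72.0, 80.0])
print(round(d_max, 2), round(d_avg, 2))  # 10.0 3.33
```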
 
In the FFT benchmark, the change in maximum temperature is small because the workload is evenly
distributed among the tasks. RADIX, LU, and CANNEAL offer much greater potential for reducing the
maximum temperature, as their workloads are less evenly distributed: they are dominated by a few
tasks, so the power and temperature of the cores running those tasks are high. With
temperature-aware task allocation, the highly active tasks are placed on cores towards the chip
edge, closest to the heat sink, and a large reduction in temperature is achieved. For example, in
the CANNEAL benchmark, random task placement produces core temperatures as high as 117.1°C. Such
temperatures can compromise chip reliability, as they give rise to a thermal emergency [29].
Temperature-aware allocation produces a hottest core temperature of 79.19°C, avoiding thermal
emergencies.
Performance-aware task placement shows a slight rightward shift relative to random allocation.
This is understandable because core temperatures are not




OT-OSWNoC and OP-OSWNoC. However, the tasks dissipate similar amounts of power irrespective of
their locations. Hence, as Figure 4.3 shows, the core-temperature histograms for the considered
NoC architectures under performance-aware task allocation are also nearly identical. The main
differences appear in the temperature profiles of the NoC switches and links, as explained later.
With FFT traffic, as seen in Figure 4.3(a), cores in the 70-80°C range under random task
placement were shifted to the higher 80-90°C range under performance-aware task allocation. For
FFT on OP-OSWNoC, the maximum temperature rises by 1.83°C compared to random allocation, while the
average chip temperature decreases by 0.14°C. This increase in maximum temperature can be
attributed to high-power tasks being moved to areas of the chip with poorer heat sinking.
Performance-aware task allocation is NoC dependent. Both chosen NoC architectures use the
thermally optimal small-world-based wired NoC, enhanced with wireless links placed either for
temperature benefits (OT-OSWNoC) or for performance benefits (OP-OSWNoC). While ΔTmax is the same
for FFT, CANNEAL, and RADIX across all the NoC architectures, LU shows a ΔTmax of 1.37°C with the
Mesh and 10.04°C with OT-OSWNoC. Similar trends are seen in the average chip temperatures: for
FFT, RADIX and CANNEAL, ΔTavg is 0.51°C, 0.29°C and 0.41°C, respectively, for the Mesh, and
0.22°C, 0.84°C and 0.36°C, respectively, for OT-OSWNoC. OP-OSWNoC shows results similar to
OT-OSWNoC; however, its network performance differs from that of OT-OSWNoC, as explained in the
next section. Given these reductions in ΔTmax and ΔTavg, there is a significant benefit in
choosing performance-aware task allocation over random placement. This is due to the decrease in
network latency and network energy,












Figure 4.4: (a) Network Latency (b) Average Energy per Packet (c) Total Network























































































































































































































































































4.3.2 Thermal and Performance Characteristics of Network Elements 
The performance characteristics of the NoC architectures for the four benchmarks are shown in
Figure 4.4 in terms of latency, packet energy dissipation and NoC power dissipation. Figure 4.4
shows that for each benchmark, when task placement is optimized for performance, the latency and
packet energy decrease with respect to random task placement. This is because performance-aware
placement minimizes the hop count, as explained previously. In contrast, when thermal-aware task
placement is used, the hop count is not minimized, leading to higher average latency. In every
case, the latency of the small-world-based architectures is lower than that of the Mesh, and
OP-OSWNoC, being optimized for performance, has lower latency than OT-OSWNoC. NoC power
dissipation follows a different trend. The increased power dissipation of OP-OSWNoC can be
attributed to the wired OSWNoC, which is optimized for distributed traffic densities and uses
longer wires, increasing the power dissipation of the links.
Packet energy dissipation is used to compare the NoC architectures under consideration. Figure 4.4
shows that in each benchmark the packet energy is lower when the task placement is optimized for
performance. Again, this is because tasks are placed to minimize hop count: when the hop count is
minimized, packets traverse fewer stages, which directly lowers packet energy. The benefits of
hop-count minimization are greater in OP-OSWNoC, as it takes the most advantage of
performance-aware task placement. However, this leads to increased activity on the switches and
hence to an increase in NoC power dissipation, as seen in Figure 4.4. Again, the OT-OSWNoC and
OP-OSWNoC have lower packet energies




decreases significantly due to the thermally optimal small-world NoC, which has uniformly
distributed traffic density characteristics. Compared to the Mesh, the average hop count of both
small-world-based architectures is significantly reduced, and hence, on average, packets traverse
fewer switches and links. In addition, a significant amount of traffic traverses the
energy-efficient wireless channels, allowing the power dissipation of the wired interconnect to
decrease.
The thermal characteristics of the network are shown in Figure 4.5. The thermal characteristics
of the NoC depend strongly on the type of traffic the NoC is subjected to and the resulting power
dissipation on the NoC. RADIX and LU have irregular traffic densities, causing higher
temperatures. FFT, on the other hand, is computation-intensive and has balanced traffic
densities. CANNEAL is optimized for a balanced computational load, with a few threads
communicating heavily compared to the others. For all the benchmarks, the cores govern the
overall system temperature. Thus, the high power dissipation on the NoC for CANNEAL due to network
hotspots does not result in very high temperatures, owing to heat sinking through the surrounding
cores, which are at lower temperatures.
After core temperatures are significantly reduced by temperature-aware task placement, the
network becomes the thermal bottleneck. Here, the differences between architectures become
noticeable. When the network plays an important role in the overall thermal characteristics of
the chip, the benefits of OT-OSWNoC can be clearly seen, as shown in Figure 4.5. RADIX shows
higher switch temperatures due to irregular traffic densities on the switches. The OSWNoC is
optimized for temperature and inherently has lower temperatures than the Mesh. The OT-OSWNoC
shows a shift towards the left in the histograms as compared




switches with temperatures greater than 80°C, with RADIX showing a few links above 100°C.
OT-OSWNoC and OP-OSWNoC have temperatures below 70°C in each case, with all temperatures below
50°C for CANNEAL. Table 4-2 shows ΔTavg and ΔTmax relative to the Mesh for the links and switches
of each NoC architecture under random placement. In the highest-temperature scenario (RADIX with
random task allocation), the average temperatures of the switches and wireline links are reduced
by 17.67°C and 19.34°C, respectively, in OT-OSWNoC compared to the Mesh. OP-OSWNoC shows similar
reductions of 16.86°C and 18.61°C for the switches and wireline links, respectively. For the
other benchmarks, the average switch and link temperature reductions range from about 5°C
(CANNEAL) to about 16°C (LU). The reductions of OT-OSWNoC are due to the overall












Figure 4.5: Thermal histograms of Switches and Wireline Links showing 




The OP-OSWNoC uses the same thermally optimized topology for the wired NoC, with wireless links
placed to optimize performance. The trade-off between the two small-world topologies is thus
between temperature and performance: OT-OSWNoC shows better temperatures for all benchmarks at
the cost of higher packet energy, whereas OP-OSWNoC gains performance at the cost of higher
temperatures.
Table 4-2: Reduction in NoC Temperatures
               OT-OSWNoC                             OP-OSWNoC
               FFT      RADIX    LU       CANNEAL    FFT      RADIX    LU       CANNEAL
Switch ΔTmax   29.11°C  46.31°C  30.99°C  11.41°C    29.59°C  44.75°C  30.61°C  12°C
Switch ΔTavg   13.9°C   17.67°C  14.43°C  4.87°C     13.56°C  16.86°C  14.63°C  5.35°C
Link ΔTmax     36.15°C  57.77°C  33.99°C  13.9°C     35.55°C  56.51°C  34.26°C  14.72°C
Link ΔTavg     15.4°C   19.34°C  15.98°C  5.45°C     15.07°C  18.61°C  16.17°C  5.91°C
 
Temperature-aware allocation places frequently communicating tasks without regard to network
performance. This is reflected in the higher NoC temperatures for temperature-aware allocation on
each NoC architecture in Figure 4.5. Performance-aware allocation results in lower temperatures on
the NoC architectures. The reduced temperatures can be attributed to placing highly communicating
tasks within the fewest hops of each other, so that the power dissipated by inter-task
communication is confined to fewer NoC elements, reducing their temperatures. As an example, the
RADIX benchmark is used to show the effects of task allocation on the network thermal
characteristics. Table 4-3 shows the two parameters, ΔTmax and ΔTavg, for OT-OSWNoC and OP-OSWNoC
compared to the Mesh under the different task allocation strategies. First, comparing the
architectures under random task placement shows a large leftward shift in switch and link
temperatures.




60°C, while all the switches of OT-OSWNoC and OP-OSWNoC lie below 60°C. Similarly, 65% of the
wireline link segments in the Mesh are above 60°C, while none are above 60°C in OT-OSWNoC and
OP-OSWNoC. These drastic shifts in the histograms correspond to the maximum and average
temperature reductions of these two NoC architectures compared to the Mesh, as shown in Table 4-3.
Table 4-3: Reductions in network temperatures from Mesh to OT-OSWNoC and OP-OSWNoC architectures with RADIX traffic under various task placements
               OT-OSWNoC                             OP-OSWNoC
               Random   Temperature  Performance    Random   Temperature  Performance
Switch ΔTmax   46.31°C  48.72°C      24.16°C        44.75°C  46.52°C      22.26°C
Switch ΔTavg   17.67°C  20.95°C      13.93°C        16.86°C  19.83°C      13.42°C
Link ΔTmax     57.77°C  57.09°C      30.17°C        56.51°C  54.92°C      29.11°C
Link ΔTavg     19.34°C  22.93°C      15.79°C        18.61°C  21.80°C      15.30°C
 
Incorporating performance-aware task placement further increases the difference between the
architectures. While both architectures are optimized to operate energy-efficiently within the
network, the inherent benefits of the OT-OSWNoC architecture become visible when tasks are placed
for performance. From random to performance-aware task placement, OP-OSWNoC saves 7.1% of the
total network energy, while OT-OSWNoC saves 17.5% over its random task placement. This is again
due to the thermally optimal small-world architecture and the placement of energy-efficient
wireless links to enhance performance. OP-OSWNoC, being already optimized for performance,
benefits less from performance-aware reallocation, whereas OT-OSWNoC, being optimized for
temperature, has more room for energy reduction.
Thermal-aware task placement introduces network-induced hotspots. In the mesh




network. The large spike in the number of switches and wireline links in the 90-100°C range for
the Mesh architecture shows the trade-off between performance and temperature optimization. This
problem is difficult to manage, since most of the network becomes a thermal hotspot. The OSWNoC
wireline architecture reduces the hotspot temperatures significantly: because it is optimized for
a uniform distribution of traffic densities, it reduces the temperatures in this hotspot region to
below 60°C on both switches and wireline links. The wireless nodes placed on this thermally
optimal wired NoC architecture further enhance its performance. OT-OSWNoC performs better
thermally at the cost of a small reduction in performance, while OP-OSWNoC has better performance
with





Chapter 5  Conclusions and Future Work 
In this work, various design methodologies were explored to obtain optimal temperature,
performance and energy trade-offs in Network-on-Chip architectures enhanced with on-chip wireless
links. Dynamic Thermal Management techniques were further incorporated into complete
multiprocessor systems using the resulting architectures, to evaluate the temperature-performance
trade-offs of the entire System-on-Chip with optimized communication backbone infrastructures.
This chapter summarizes the results obtained in this thesis.
Experimental comparisons of the architectures obtained by the proposed design methodologies
showed that the performance of a multicore chip is influenced by its overall communication
infrastructure. The existing method of implementing a NoC with planar metal interconnects is
deficient due to high latency, significant power dissipation, and temperature hotspots arising
from the long, multi-hop wireline links used for data exchange, as the performance and temperature
results for the Mesh and wired small-world networks show. Wireless NoCs optimized for the best
performance do not necessarily guarantee a thermally optimal interconnection architecture.
Consequently, the location and utilization of the wireless links play an important role in
optimizing performance-temperature trade-offs.
The proposed design method, which optimizes both the wired small-world topology and the placement
of wireless interconnects, results in a 55% improvement in peak bandwidth on a saturated network
compared to the regular Mesh, and a 12.5% improvement in peak bandwidth over the optimized
wireline small-world architecture. These performance benefits come with a 39% reduction in
temperature compared with the Mesh and a 5% reduction compared to the




to thermal optimizations can be enhanced with the use of wireless interconnects along with 
further reductions in temperature. The architecture is also resilient to localized traffic hotspots 
with up to 53% reduction of packet energies and 20% reduction in temperatures compared to the 
Mesh. 
The proposed OT-OSWNoC architecture was incorporated into a multicore system with Dynamic Thermal
Management heuristics based on task reallocation. Task reallocations were done either to enhance
performance or to reduce temperatures. The experimental results for the thermal management
heuristics show that the small-world-based architectures inherently perform better and show lower
temperatures. A thermally optimal NoC such as OT-OSWNoC shows lower link and switch temperatures
under temperature-aware allocation, thereby reducing the overall temperatures of the cores.
OP-OSWNoC, with wireless links placed for performance, achieves higher performance under
temperature-aware task reallocation. A temperature-performance trade-off is evident when
performance-aware reallocation is applied to each architecture, resulting in a minor increase in
temperature for OP-OSWNoC compared to OT-OSWNoC.
Future work includes further improving the trade-offs of temperature-resilient wireless NoCs with
adaptive routing policies, which could provide further performance gains while reducing NoC
temperatures by avoiding traffic congestion. Another challenge is improving the reliability of
wireless interconnects through fault-tolerant wireless NoC architectures. In addition, other
wireless transceiver implementations like mm-wave metal





[1] S. Borkar, "Obeying Moore's law beyond 0.18 micron [microprocessor design]," in 
Proceedings of 13th Annual IEEE International ASIC/SOC Conference, 2000, pp. 26-31. 
[2] National Technology Roadmap for Semiconductors. Semiconductor Industry Association, 1997.
[3] D. Flynn, "AMBA: enabling reusable on-chip designs," IEEE Micro, vol. 17, no. 4, pp. 20-
27, 1997. 
[4] R. Hofmann and B. Drerup, "Next generation CoreConnect processor local bus 
architecture," in Proceedings of 15th Annual IEEE International ASIC/SOC Conference, 
2002, pp. 221-225. 
[5] R. Ho, K.W. Mai, and M.A. Horowitz, "The Future of Wires," Proceedings of the IEEE, vol. 89, no. 4, pp. 490-504, 2001.
[6] P. Kapur, G. Chandra, J.P. McVittie, and K.C. Saraswat, "Technology and reliability constrained future copper interconnects - Part II: Performance Implications," IEEE Transactions on Electron Devices, vol. 49, no. 4, pp. 598-604, April 2002.
[7] D. Sylvester and K. Keutzer, "Impact of Small Process Geometries on Microarchitectures in 
Systems on a Chip," Proceedings of the IEEE, vol. 89, no. 4, pp. 467-489, April 2001. 
[8] P. Pande, C. Grecu, M. Jones, A. Ivanov, and R. Saleh, "Performance evaluation and design 




Computers, vol. 54, no. 8, pp. 1025-1040, 2005. 
[9] L. Shang, L. Peh, A. Kumar, and N.K. Jha, "Temperature-Aware On-Chip Networks," IEEE 
Micro, 2006. 
[10] S. Deb et al., "Enhancing Performance of Network-on-Chip Architectures with Millimeter-
Wave Wireless Interconnects," in Proceedings of ASAP, 2010, pp. 73-80. 
[11] A. Ganguly et al., "Scalable Hybrid Wireless Network-on-Chip Architectures for Multi-Core Systems," IEEE Transactions on Computers, 2010.
[12] V.F. Pavlidis and E.G. Friedman, "3-D Topologies for Networks-on-Chip," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 15, no. 10, pp. 1081-1090, 2007.
[13] M. Brière et al., "System Level Assessment of an Optical NoC in MPSoC Platform," in 
Proceedings of DATE, 2007. 
[14] A. Shacham, K. Bergman, and L.P. Carloni, "Photonic Networks-on-Chip for Future Generations of Chip Multiprocessors," IEEE Transactions on Computers, vol. 57, no. 9, pp. 1246-1260, 2008.
[15] M.F. Chang et al., "CMP Network-on-Chip Overlaid With Multi-Band RF-Interconnect," in 
Proceedings of IEEE International Symposium on High-Performance Computer 
Architecture (HPCA), 2008. 




IEEE Journal of Solid-State Circuits, vol. 42, no. 8, pp. 1678-1687, August 2007. 
[17] K. Kempa et al., "Carbon Nanotubes as Optical Antennae," Advanced Materials, vol. 19, pp. 
421-426, 2007. 
[18] B.A. Floyd, C. Hung, and K.K. O, "Intra-chip wireless interconnect for clock distribution implemented with integrated antennas, receivers, and transmitters," IEEE Journal of Solid-State Circuits, vol. 37, no. 5, pp. 543-552, May 2002.
[19] D. Zhao and Y. Wang, "SD-MAC: Design and Synthesis of a Hardware-Efficient Collision-Free QoS-Aware MAC Protocol for Wireless Network-on-Chip," IEEE Transactions on Computers, vol. 57, no. 9, pp. 1230-1245, 2008.
[20] S. Lee et al., "A scalable micro wireless interconnect structure for CMPs," in Proceedings of the 15th Annual International Conference on Mobile Computing and Networking, 2009, pp. 217-228.
[21] V. Hanumaiah, S. Vrudhula, and K. Chatha, "Maximizing performance of thermally constrained multi-core processors by dynamic voltage and frequency control," in Proceedings of the ICCAD, 2009, pp. 310-313.
[22] P. Chaparro et al., "Understanding the Thermal Implications of Multi-Core Architectures," IEEE Transactions on Parallel and Distributed Systems, vol. 18, no. 8, pp. 1055-1065, 2007.




NoC Architectures with Wireless Links," in Proceedings of NOCS, 2011. 
[24] S. Deb, A. Ganguly, P. Pande, B. Belzer, and D. Heo, "Wireless NoC as Interconnection Backbone for Multicore Chips: Promises and Challenges," IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 2, no. 2, pp. 228-239, 2012.
[25] D. DiTomaso, A. Kodi, S. Kaya, and D. Matolak, "iWise: Inter-router wireless scalable 
express channels for Networks-on-Chips (NoCs) architecture," in Proceedings of IEEE 
HOTI, 2011, pp. 11-18. 
[26] T. Petermann and P. Rios, "Spatial small-world networks: A wiring-cost perspective," arXiv preprint cond-mat/0501420, 2005.
[27] J. Murray, P. Pande, and B. Shirazi, "DVFS-enabled sustainable wireless NoC architecture," 
in Proceedings of IEEE SOCC, 2012. 
[28] I. Yeo, C.C. Liu, and E.J. Kim, "Predictive dynamic thermal management for multicore 
systems," in Proceedings of DAC, 2008, pp. 734-739. 
[29] H.F. Sheikh, H. Tam, I. Ahmad, S. Ranka, and B. Phanisekhar, "Energy- and performance-aware scheduling of tasks on parallel and distributed systems," ACM Journal on Emerging Technologies in Computing Systems (JETC), vol. 8, no. 4, 2012.
[30] D. Cuesta, J. Hidalgo, J. Ayala, D. Atienza, A. Acquaviva, and E. Macii, "Adaptive task 





[31] Y. Ge, P. Malani, and Q. Qiu, "Distributed Task Migration for thermal management in many-core systems," in Proceedings of DAC, 2010, pp. 579-584.
[32] U.Y. Ogras and R. Marculescu, "'It's a small world after all': NoC performance optimization via long-range link insertion," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 14, no. 7, pp. 693-706, July 2006.
[33] D.J. Watts and S.H. Strogatz, "Collective Dynamics of 'Small World' Networks," Nature, 
vol. 393, pp. 440-442, 1998. 
[34] P.J. Burke, S. Li, and Z. Yu, "Quantitative Theory of Nanowire and Nanotube Antenna Performance," IEEE Transactions on Nanotechnology, vol. 5, no. 4, pp. 314-334, July 2006.
[35] J. Duato, S. Yalamanchili, and L.M. Ni, Interconnection Networks: An Engineering Approach. Morgan Kaufmann, 2002.
[36] O. Lysne et al., "Layered routing in irregular networks," IEEE Transactions on Parallel and 
Distributed Systems, vol. 17, no. 1, pp. 51-65, January 2006. 
[37] N. Binkert et al., "The GEM5 Simulator," ACM SIGARCH Computer Architecture News, 
vol. 39, no. 2, pp. 1-7, August 2011. 
[38] S. Woo, M. Ohara, E. Torrie, J. Singh, and A. Gupta, "The SPLASH-2 programs: 





[39] C. Bienia, "Benchmarking modern multiprocessors," Ph.D. Dissertation, Princeton Univ., 
January 2011. 
[40] S. Li et al., "McPAT: An integrated power, area, and timing modeling framework for 
multicore and manycore architectures," in Proceedings of the International Symposium on 
Computer Architecture, 2009, pp. 469-480. 
[41] K. Skadron, M. R. Stan, W. Huang, S. Velusamy, and K. Sankaranarayanan, "Temperature-
aware microarchitecture," in Proceedings of ISCA, 2003, pp. 2-16. 
[42] J. Cui and D.L. Maskell, "A Fast High-Level Event-Driven Thermal Estimator for Dynamic 
Thermal Aware Scheduling," IEEE Transactions on Computer-Aided Design of Integrated 
Circuits and Systems, vol. 31, no. 6, pp. 904-917, June 2012. 
 
 
 
 
 
 
 
 
 
