Global Congestion and Fault Aware Wireless Interconnection Framework for Multicore Systems by Shahriat, Sajeed Mohammad
Rochester Institute of Technology 
RIT Scholar Works 
Theses 
5-2019 
Global Congestion and Fault Aware Wireless Interconnection 
Framework for Multicore Systems 
Sajeed Mohammad Shahriat 
sms9874@rit.edu 
Follow this and additional works at: https://scholarworks.rit.edu/theses 
Recommended Citation 
Shahriat, Sajeed Mohammad, "Global Congestion and Fault Aware Wireless Interconnection Framework 
for Multicore Systems" (2019). Thesis. Rochester Institute of Technology. Accessed from 
This Thesis is brought to you for free and open access by RIT Scholar Works. It has been accepted for inclusion in 
Theses by an authorized administrator of RIT Scholar Works. For more information, please contact 
ritscholarworks@rit.edu. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Global Congestion and Fault Aware Wireless 
Interconnection Framework for Multicore Systems  
 
 
Sajeed Mohammad Shahriat 
 
  
 
 
A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of   
Master of Science 
In 
Electrical Engineering 
 
  
Supervised by   
Dr. Amlan Ganguly   
Department of Computer Engineering   
Kate Gleason College of Engineering   
Rochester Institute of Technology   
Rochester, NY   
May 2019 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Department of Electrical and Microelectronic Engineering 
 
Global Congestion and Fault Aware Wireless 
Interconnection Framework for Multicore Systems  
 
Sajeed Mohammad Shahriat 
May 2019 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Committee Approval: 
 
 
 
--------------------------------------------------------------------------------------------------------------------- 
Dr. Amlan Ganguly, Advisor             Date 
Associate Professor – RIT Dept. of Computer Engineering 
 
 
 
--------------------------------------------------------------------------------------------------------------------- 
Dr. Andres Kwasinski                 Date 
Professor – RIT Dept. of Computer Engineering 
 
  
 
--------------------------------------------------------------------------------------------------------------------  
Dr. Panos P. Markopoulos            Date 
Assistant Professor – RIT Dept. of Electrical and Microelectronic Engineering 
 
 
 
--------------------------------------------------------------------------------------------------------------------- 
Dr. Sohail Dianat                  Date 
Department Head – RIT Dept. of Electrical and Microelectronic Engineering 
 
ACKNOWLEDGEMENTS 
 
This thesis would not have been possible without the motivational and intellectual support of many 
people. First and foremost, I would like to thank my advisor, Dr. Amlan Ganguly, who has mentored 
and guided me for almost my entire time at RIT. He has taught me invaluable research skills and 
has helped shape the work presented in this thesis. My sincere thanks also go to Dr. Andres 
Kwasinski and Dr. Panos P. Markopoulos for agreeing to serve on my thesis committee and for 
providing their invaluable ideas and suggestions whenever needed. I would also like to thank all 
the mentors and colleagues I met during my internship at AMD, especially Ray Talacka, Steve 
Anderson and my manager David Meyerhofer. Last but not least, I would like to thank my family 
and friends, who have been a constant source of emotional support during my entire time at RIT. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
ABSTRACT 
Multicore processors are becoming common across all types of computing, from personal computers 
to large server farms running computationally demanding applications. For such multicore systems, 
the network-on-chip (NoC) provides a better alternative to the traditional bus-based communication 
infrastructure. However, conventional wire-based NoC interconnects are constrained by long 
multi-hop latencies and high power consumption. Furthermore, applications that generate heavy 
traffic can create congestion in such systems, further degrading system performance. 
In this thesis, a novel two-state, congestion-aware wireless interconnection framework for 
networks-on-chip is presented. This WiNoC system is designed to dynamically redirect traffic 
around congestion, based on network condition information shared among all the core tiles in the 
system. To this end, a novel routing scheme and a two-state MAC protocol are proposed for a 
two-layer hybrid mesh-based NoC architecture. The underlying mesh network is connected by wired 
interconnects, and a shared wireless interconnect framework is added on top of it for single-hop 
communication. The routing scheme is non-deterministic and builds on principles from existing 
dynamic routing algorithms. The MAC protocol for the wireless interface operates in two modes. 
The first is the data mode, in which a token-based protocol is used to transfer core data. The 
second is the control mode, in which a broadcast-based communication protocol is used to share 
network congestion information. This work details the switching methodology between these two 
modes and explains how the routing scheme uses the congestion information gathered during the 
control mode to route data packets during normal operation. The proposed framework was modeled 
in a cycle-accurate network simulator, and its performance was evaluated against traditional NoC 
and WiNoC designs. 
 
Abbreviations 
1. NoC: Network-on-Chip  
2. IC: Integrated Circuit  
3. SoC: System-on-Chip  
4. MPSoC: Multi-processor System-on-Chip  
5. WDM: Wavelength Division Multiplexing  
6. EM: Electromagnetic  
7. TSV: Through Silicon Via  
8. CMOS: Complementary Metal Oxide Semiconductor  
9. MOSFET: Metal Oxide Semiconductor Field Effect Transistor  
10. UWB: Ultrawideband  
11. CNT: Carbon nanotube  
12. WI: Wireless Interface  
13. CDMA: Code Division Multiple Access  
14. TDMA: Time Division Multiple Access  
15. WiNoC: Wireless Network-on-Chip  
16. BFT: Butterfly Fat Tree  
17. MAC:  Media Access Control  
18. VC: Virtual Channel  
19. OOK: On-Off Keying  
20. PIR: Packet Injection Rate  
 
 
 
 
 
 
 
 
TABLE OF CONTENTS 
Signature Sheet 
Acknowledgements 
Abstract 
Abbreviations 
Table of Contents 
List of Tables 
List of Figures 
Chapter 1: INTRODUCTION 
1.1: Emerging Interconnect Technologies 
1.2: Designing wireless interconnect: Challenges and Benefits 
1.3: Significance of Routing schemes, communication protocols and selection strategy in NoCs 
1.4: Fault Tolerance in NoCs 
1.5: Contributions of this thesis work 
1.6: Thesis organization 
Chapter 2: RELATED WORKS 
Chapter 3: SYSTEM ARCHITECTURE 
3.1: Proposed WiNoC topology and design 
3.2: Wireless interface physical layer 
3.3: Operation modes 
3.4: Routing scheme and controller design 
3.5: Example operation 
3.6: Simulation setup and methodology 
3.7: Performance evaluation under Uniform Random Traffic 
3.8: Performance evaluation under Transpose Traffic 
3.9: Performance evaluation under Hotspot Traffic 
3.10: Energy consumption 
Chapter 4: FAULT TOLERANCE STUDY 
Chapter 5: CONCLUSION AND FUTURE WORK 
Bibliography 
  
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
LIST OF TABLES 
Table I: General and wireless configurations for simulation 
LIST OF FIGURES 
Figure 1: Proposed 8x8 WiNoC Framework 
Figure 2: Proposed subnet architecture 
Figure 3: Proposed zig-zag antenna placement on the die 
Figure 4: (a) Transmitter (b) receiver block diagram 
Figure 5: (a) Control packet (b) State diagram 
Figure 6: The routing scheme flowchart 
Figure 7: Block diagram for the router architecture 
Figure 8: Network condition for the case scenario 1 
Figure 9: Network condition for the case scenario 2 (a) adjacent (b) diagonal 
Figure 10: Network condition for the case scenario 3 
Figure 11: Network condition for the case scenario 4 
Figure 12: Global average delay vs. PIR for network sizes (a) 6x6 (b) 8x8 (c) 10x10 under Uniform Random Traffic 
Figure 13: Throughput vs. PIR for network sizes (a) 6x6 (b) 8x8 (c) 10x10 under Uniform Random Traffic 
Figure 14: Global average delay vs. PIR for network sizes (a) 6x6 (b) 8x8 (c) 10x10 under Transpose Traffic 
Figure 15: Throughput vs. PIR for network sizes (a) 6x6 (b) 8x8 (c) 10x10 under Transpose Traffic 
Figure 16: Global average delay vs. PIR for network sizes (a) 6x6 (b) 8x8 (c) 10x10 under Hotspot Traffic 
Figure 17: Throughput vs. PIR for network sizes (a) 6x6 (b) 8x8 (c) 10x10 under Hotspot Traffic 
Figure 18: Total energy consumption for three simulated systems 
Figure 19: Fault modeling and the Hotspot tiles 
Figure 20: 8x8 (a) Global Average Delay vs. PIR (b) Throughput vs. PIR 
Figure 21: 10x10 (a) Global Average Delay vs. PIR (b) Throughput vs. PIR 
 
Chapter 1: INTRODUCTION 
Transistor scaling has come a long way since Moore's law was first stated. Current industry trends 
suggest that transistors will no longer scale effectively after 2021 [1]. At the same time, single 
uniprocessor systems have become a non-viable option for the computational demands of modern 
workloads, since meeting those demands would require a single processor to run at a very high 
clock frequency, which in turn would make it very power hungry. Instead of increasing the 
frequency, both industry and academia have focused on multi-processor systems-on-chip (MPSoCs), 
in which identical processing cores execute tasks simultaneously at lower clock speeds, instead of 
one core operating at a higher frequency and power rating. Examples of such MPSoCs include 
Intel's 80-core Polaris [2] and 48-core Single-Chip Cloud Computer (SCC) [3], Tilera's 64-core 
TILE64 [4] and, most recently, Cavium's ThunderX2 with 32 to 64 cores [5], among other 
multicore systems. In addition to delivering higher throughput at the same clock frequency, 
multicore systems can execute complex tasks at a lower energy cost than a single-core processor. 
The bottleneck in such multicore systems is the underlying communication infrastructure that lets 
the cores communicate with each other and maintain coherency while executing tasks in parallel. 
Current general-purpose CPUs are multicore systems with 4 to 16 cores [6]. Most of these systems 
use some form of shared bus-based interconnect, which does not scale to the systems mentioned 
above with 48 to 80 cores. In addition to slowing down the network, a single failure in such a 
shared bus-based system brings down the entire communication backbone, rendering the whole 
system non-operational. 
 
To let the aforementioned large systems communicate efficiently, high-performance Network-on-Chip 
(NoC) architectures were developed to act as the communication fabric. As its name suggests, a 
NoC is a network-based communication system implemented within an integrated circuit. The major 
advantage of a NoC over a shared bus is that it is more scalable and reliable, owing to its modular 
design and multi-path architecture. Past research has examined various NoC architectures such as 
Mesh, Ring, Folded Torus, Butterfly Fat Tree and Small World [7, 8]. Each of these architectures 
has its own advantages and disadvantages, but this study focuses mostly on the mesh, since the 
multicore systems mentioned above use a mesh network: its symmetrical structure makes such large 
systems easier to implement physically. Because the mesh is symmetrical, every link in the system 
is identical, which keeps energy consumption uniform across links for the same workload. 
Traditional NoCs use planar metallic interconnects, which require data, in the form of packets, to 
travel through multiple hops along an underlying wired path. In a large system, more hops are 
needed between communicating cores, so the energy required to route each packet is higher, which 
limits any performance gain. Besides energy, another issue is network latency. Because data 
travels over multi-hop wired paths, buffering is required along the way so that no data is lost, and 
this buffering in turn increases system latency and degrades overall performance. To address these 
issues, researchers have proposed emerging interconnect technologies, some of which are discussed 
in detail in the next subsection. 
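To make the hop-count argument concrete, the Python sketch below computes the average number of hops a packet traverses in a k x k mesh. It assumes uniform random traffic (every tile sends to every other tile with equal probability) and minimal XY routing, so the hop count equals the Manhattan distance between tiles. The function name is illustrative and is not part of the simulator used in this work. For distinct source and destination tiles the average works out to 2k/3, so it grows linearly with the side of the mesh.

```python
from itertools import product


def average_hop_count(k: int) -> float:
    """Mean minimal (Manhattan) hop distance between distinct tiles of a k x k mesh,
    assuming uniform random traffic: every tile sends to every other tile equally often."""
    tiles = list(product(range(k), repeat=2))
    total = 0
    pairs = 0
    for (sx, sy), (dx, dy) in product(tiles, repeat=2):
        if (sx, sy) == (dx, dy):
            continue  # a tile does not send packets to itself
        total += abs(sx - dx) + abs(sy - dy)
        pairs += 1
    return total / pairs


if __name__ == "__main__":
    for k in (6, 8, 10):
        # brute-force average next to the closed form 2k/3 for distinct pairs
        print(f"{k}x{k} mesh: {average_hop_count(k):.2f} hops (2k/3 = {2 * k / 3:.2f})")
```

For the 6x6, 8x8 and 10x10 network sizes evaluated in this thesis, this gives about 4.0, 5.3 and 6.7 hops respectively, which is the multi-hop cost that a single-hop wireless shortcut aims to reduce.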
 
 
 
1.1: Emerging Interconnect Technologies 
State-of-the-art interconnect technologies can be broadly divided into four categories, namely 3D 
integration, photonic interconnects, RF interconnects and wireless interconnects. Each category is 
discussed in detail below: 
A. 3D Integration: Three-dimensional integration of wired interconnects exploits the ability to 
stack multiple IP blocks of an SoC on top of each other. The metallic interconnect passes 
through the silicon substrates using special vias such as TSVs. This shortens long-distance 
links, decreasing both latency and power consumption. 
However, routing in 3D systems is complex, since communicating cores must be aligned across 
layers to allow seamless inter-layer communication. The multi-layer technique also makes 
testing, and adding test structures to the system, more complex. This is acceptable if the 
communicating cores are simple IPs (such as stacked memory cells), but processing/computing 
units exchange large amounts of data with each other, which makes communication using 
TSVs cumbersome. Furthermore, 3D interconnect designs are more prone to heating due to the 
densely packed wires between silicon layers and the lack of proper cooling mechanisms. 
B. Photonic Interconnect: Instead of metallic wires, photonic interconnects use on-chip laser 
sources, optical waveguides and resonators. Since data is transmitted at the speed of light, 
latency is significantly reduced [9, 10], and since the data travels as light through an optical 
waveguide, losses are minimal and constant buffering is not required. Another advantage of 
photonics is that multiplexing multiple light wavelengths using WDM techniques allows 
multiple data streams to traverse the MPSoC over a single waveguide. 
The issue with this technology is that current design and fabrication tools do not support 
photonic interconnect structures, which makes research in this field difficult to put into 
practice. The lasers used in such systems are very power hungry and require large structures, 
which increases power consumption, and laying out the waveguides adds significant overhead 
to the SoC. Finally, the waveguides themselves suffer from bending losses, and electro-optical 
conversion requires considerable additional overhead. 
C. RF Interconnect: As its name suggests, the RF interconnect uses EM waves transmitted over 
a length of wire that acts as an EM waveguide. This allows single-hop communication between 
cores, thus decreasing latency. Performance can be improved further by applying multiplexing 
techniques similar to those used in photonic interconnect architectures; FDMA and CDMA 
techniques implemented in prior research [11, 12] demonstrated such improvements. 
The issue with RF interconnects is similar to that of photonic interconnects: the EM 
waveguides and high-frequency oscillators must be laid out throughout the MPSoC, which 
again is not supported by current design and fabrication tools. 
D. Wireless Interconnect: In principle, wireless interconnects communicate using EM waves 
like RF interconnects, but unlike RF interconnects they do not require any waveguides, thanks 
to specialized on-chip wireless interfaces. This means the advantages of the RF interconnect 
can be exploited in this architecture without its layout drawbacks, since no waveguides or 
high-frequency oscillators need to be laid out across the chip. This absence of a physical link 
layout is what sets the wireless interconnect apart from the other emerging technologies 
discussed above. In this work, wireless interconnects are used to improve the performance of 
the large MPSoC systems mentioned earlier. 
Since wireless interconnect is the architecture of choice for this work, the challenges in 
designing such an interconnect are discussed in detail in the next subsection. 
1.2: Designing wireless interconnect: Challenges and Benefits 
The previous subsection showed how emerging technologies can be used to design the 
communication fabric of MPSoC systems. It should be pointed out that most research deploys 
these emerging interconnects on top of a wired interconnect system, usually a mesh [13, 14, 15]. 
The resulting MPSoC, combining planar metallic wires with a "state-of-the-art" interconnect, 
forms a hybrid system that enhances the performance and capability of the traditional NoC. This 
work considers such a hybrid MPSoC system, consisting of a planar wired mesh with an integrated 
wireless interconnect framework. 
As mentioned earlier, the wireless interconnect relies on specialized on-chip structures, here 
called wireless hubs, which enable wireless communication between the IP cores. A hub can be 
placed adjacent to the IP cores, and depending on the design and research objectives these hubs 
can be implemented in multiple ways, but two components must be present for successful 
wireless transmission: (1) the antenna and (2) the transceiver. 
Recent research has shown that on-chip antennas and transceivers can be designed at miniature 
scales for use in NoCs [16-19]. These miniature antennas and transceivers can operate at 
frequencies ranging from megahertz to terahertz. Some of these antenna and transceiver 
technologies are detailed below: 
A. CMOS Ultra-Wideband (UWB) technology: This is a popular choice in RF interconnect 
architectures. Simple, small transceivers and antennas have been shown to operate over a 
100-500 GHz frequency range as wireless interconnects [20]. However, because the 
transceivers are impulse based, the effective range of such links is limited to only a few 
millimeters [17]. 
B. Graphene/CNT-based technology: Carbon-based structures such as graphene and carbon 
nanotubes have been explored for antenna design in recent research [21, 22]. Their advantage 
over UWB antennas is that graphene/CNT antennas can transmit data at terahertz frequencies, 
increasing the overall bandwidth of the system. However, integrating carbon-based structures 
into the CMOS process is a very complex fabrication challenge in itself. Furthermore, 
graphene/CNT-based structures are unreliable and prone to high failure rates. 
C. Millimeter-wave technology: mm-wave antennas have been shown to transmit data in the 10 
to 100 GHz range. Research [18] has also shown that CMOS-compatible wireless shortcuts 
operating at mm-wave frequencies can communicate between WIs deployed across multiple 
dies, demonstrating long-range capability. The limitation of mm-wave technology is that the 
bandwidth of the wireless channel is limited by the transceiver design. 
Another bottleneck for WIs is the size of the antennas and transceivers. The antennas 
implemented in the system need to provide the best power gain with the least physical overhead. 
The metal zigzag antenna has been shown to fulfil both of these requirements [23]. 
The next challenge in a wireless interconnect system is to develop an efficient wireless channel 
access mechanism among all the wireless routers in the system. It is possible to use multiple 
frequency bands for one-to-one communication between pairs of WIs, but this approach is not 
feasible, since a large system would require many frequency channels and multiple transceivers 
per router, making the design very inefficient. Instead, MAC mechanisms are used to allocate the 
wireless bandwidth efficiently among all the wireless routers in the system. As mentioned in the 
previous subsections, the use of EM waves allows multiple signals to be multiplexed onto a single 
wireless channel using techniques such as TDMA and CDMA. Recent research has successfully 
implemented simple, distributed MAC mechanisms such as ALOHA [24], carrier sense multiple 
access (CSMA) [25], token-based TDMA [26] and CDMA [13], to name just a few. It was also 
found that token-based MAC mechanisms incur smaller structural overhead while maintaining 
fairness in channel access [25]. In this thesis, both a token-passing MAC protocol and an 
orthogonal code-based MAC protocol (the principle behind CDMA) are used in the developed 
hybrid WiNoC system. The system therefore has a twofold communication protocol, each part 
with its own mode of operation selected according to the current state of the WiNoC system. 
The principle of the token-passing mechanism is to organize all the WIs in the system into a 
virtual ring. The token is passed wirelessly from one router to the next as a token packet. Each 
token packet contains the information a router needs to access the wireless channel and transmit 
a predetermined data packet to the destination router. Once the transmission is complete, the 
token packet is updated and sent to the next router in the ring, which then gains access to the 
wireless channel and transmits its data. It is important that the system can distinguish such 
control packets from data packets. In this work, the token-passing protocol is used during normal 
operation, when core data packets need to be transmitted from one tile to another. From this 
point onwards, this token-based data transmission mode is called the "data mode". 
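The virtual-ring behavior described above can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not the MAC implementation used in this work: the `WirelessInterface` class and `run_token_ring` function are hypothetical, each token hop is treated as one time slot, the token holder sends at most one queued packet per visit, and token-passing latency and packet length are ignored.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WirelessInterface:
    """Hypothetical WI model: an ID and a FIFO of packets waiting for the channel."""
    wi_id: int
    tx_queue: deque = field(default_factory=deque)


def run_token_ring(wis, rounds):
    """Circulate one token around a virtual ring of WIs for `rounds` full laps.

    Only the current token holder may use the shared wireless channel. It sends at
    most one queued packet, then forwards the token to its ring successor. Each
    token hop is modeled as one time slot. Returns (slot, sender, packet) tuples.
    """
    log = []
    holder = 0  # ring position of the WI currently holding the token
    for slot in range(rounds * len(wis)):
        wi = wis[holder]
        if wi.tx_queue:
            log.append((slot, wi.wi_id, wi.tx_queue.popleft()))
        holder = (holder + 1) % len(wis)  # pass the token to the next WI in the ring
    return log


if __name__ == "__main__":
    wis = [WirelessInterface(i) for i in range(4)]
    wis[0].tx_queue.extend(["p0a", "p0b"])
    wis[2].tx_queue.append("p2a")
    for slot, sender, packet in run_token_ring(wis, rounds=2):
        print(f"slot {slot}: WI {sender} sends {packet}")
```

Running it shows WI 0 transmitting in slots 0 and 4 and WI 2 in slot 2. Even though WI 0 has the longer queue, it must wait for the token to come back around the ring before sending again, which illustrates the channel-access fairness attributed to token-based MACs.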
In wireless communication systems, orthogonal code-based MAC protocols are used for multiple 
access, allowing several WIs to transmit over a single communication channel without any 
centralized control or arbitration. This encoding technique exploits the mathematical orthogonality 
of the vectors that represent the data. Each transmitting WI encodes its data bits using a unique 
codeword consisting of multiple code bits, called the chip code or chipping code. The codes are 
mutually orthogonal, so the cross-correlation between any two different codewords is zero. 
Because each wireless transceiver is assigned a different chip code, interference between 
transmissions from different transceivers is eliminated, provided the transmissions are 
chip-synchronized. In this work, the orthogonal code-based operation mode is used to broadcast 
network congestion information as control data to all the WIs in the system, so that the routing 
scheme can use this control data to choose more efficient routing paths for core data during the 
data mode. From this point onwards, this orthogonal code-based control data broadcast is called 
the "control mode". The separation of the control mode and the data mode is investigated in 
detail later in this thesis, since the novelty of this work depends heavily on it. 
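The orthogonal-code principle can be shown with a short Python sketch. It makes several assumptions that are not taken from this work: Walsh-Hadamard codes (one common family of orthogonal codes) are used as chip codes, the channel is ideal, noise-free and chip-synchronous, and each WI broadcasts a single hypothetical congestion flag bit. All function names are illustrative.

```python
def walsh_codes(n):
    """Return the n rows of a Sylvester-Hadamard matrix as +1/-1 chip codes.

    n must be a power of two; any two distinct rows are orthogonal.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    codes = [[1]]
    while len(codes) < n:
        codes = [row + row for row in codes] + [row + [-c for c in row] for row in codes]
    return codes


def spread(bit, code):
    """Encode one data bit as chips: 1 -> +code, 0 -> -code."""
    symbol = 1 if bit else -1
    return [symbol * c for c in code]


def channel(signals):
    """Ideal, noise-free, chip-synchronous channel: chips from all WIs add linearly."""
    return [sum(chips) for chips in zip(*signals)]


def despread(received, code):
    """Correlate with one WI's code; the other WIs' contributions cancel to zero."""
    corr = sum(r * c for r, c in zip(received, code))
    if corr == 0:
        return None  # this WI did not transmit
    return 1 if corr > 0 else 0


if __name__ == "__main__":
    codes = walsh_codes(4)  # one chip code per WI
    flags = [1, 0, 1, 1]  # hypothetical 1-bit congestion flag from each WI
    rx = channel([spread(f, codes[i]) for i, f in enumerate(flags)])
    print([despread(rx, codes[i]) for i in range(4)])  # [1, 0, 1, 1]
```

All four WIs transmit simultaneously on the same channel, yet every receiver that knows the code set recovers every flag. This is why a code-based broadcast suits sharing congestion information with all WIs at once, while the token-based mode serves point-to-point data transfer.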
Lastly, the designed WiNoC architecture needs to satisfy the traffic demands of the MPSoC system 
while reducing its overall energy consumption, which the wireless links make possible by enabling 
communication over fewer hops than a wired system. It should also be noted that the performance 
of these WiNoCs depends on the fault tolerance of the system in the case of a failing wireless or 
wired path. 
1.3: Significance of Routing schemes, communication protocols and selection strategy in NoCs 
The performance of a WiNoC also depends on the application running on the MPSoC. This is true 
for all NoCs, since the running application determines the traffic distribution within the system. 
Some applications generate heavy traffic, which creates congestion in the NoC and degrades system 
performance. The effects of congestion can be alleviated using the emerging interconnect 
technologies discussed in the previous subsections. However, to keep the system largely congestion 
free and, more importantly, deadlock and livelock free, the routing of traffic within the NoC has to 
be carefully designed. 
Packets can be routed through the system either deterministically or non-deterministically. 
Accordingly, researchers have studied various static (deterministic) and dynamic/adaptive 
(non-deterministic) routing algorithms. Static routing algorithms studied in the past include XY 
routing and table-based routing. This kind of routing allows simple router designs but is not 
efficient under heavy traffic loads or in large MPSoC systems. To overcome these issues, 
researchers have explored non-deterministic routing algorithms such as DYAD [27], DyXY [28], 
Odd-Even [29] and many more. The main advantage of non-deterministic routing over deterministic 
routing is that it can adapt to the network traffic load and dynamically adjust the route taken by 
each packet so that congestion is avoided, thus increasing system performance. The drawback is 
that non-deterministic algorithms are more intricate and therefore increase switch complexity in 
the NoC. This work focuses primarily on non-deterministic routing and proposes a novel routing 
scheme based on existing dynamic routing algorithms. 
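The contrast between deterministic and adaptive routing can be sketched in Python for a 2D mesh. The sketch assumes a coordinate convention (x grows EAST, y grows NORTH) and uses neighbor buffer occupancy as the congestion metric; both are assumptions for illustration. `xy_route` is plain XY routing, while `adaptive_route` is a simplified congestion-aware minimal selection loosely in the spirit of schemes such as DyXY [28]. It is not that published algorithm, nor the routing scheme proposed in this thesis, and all names are hypothetical.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]  # (x, y) tile position; x grows EAST, y grows NORTH (assumed)


def productive_ports(cur: Coord, dst: Coord) -> List[str]:
    """Output ports that move a packet one hop closer to dst, X dimension listed first."""
    (cx, cy), (dx, dy) = cur, dst
    ports = []
    if dx != cx:
        ports.append("EAST" if dx > cx else "WEST")
    if dy != cy:
        ports.append("NORTH" if dy > cy else "SOUTH")
    return ports


def xy_route(cur: Coord, dst: Coord) -> str:
    """Deterministic XY routing: resolve the X offset completely, then Y."""
    ports = productive_ports(cur, dst)
    return ports[0] if ports else "LOCAL"


def adaptive_route(cur: Coord, dst: Coord, occupancy: Dict[str, int]) -> str:
    """Minimal adaptive selection: among the productive ports, pick the one whose
    downstream buffer holds the fewest flits (ties fall back to XY order)."""
    ports = productive_ports(cur, dst)
    if not ports:
        return "LOCAL"
    return min(ports, key=lambda p: occupancy.get(p, 0))


if __name__ == "__main__":
    buffers = {"EAST": 3, "NORTH": 1}  # hypothetical neighbor buffer occupancies
    print(xy_route((1, 1), (3, 4)))  # EAST, regardless of congestion
    print(adaptive_route((1, 1), (3, 4), buffers))  # NORTH, the less loaded productive port
```

The XY router always picks the same port for a given source and destination, which keeps the switch simple. The adaptive version needs congestion state from its neighbors, which is the extra switch complexity mentioned above. Unrestricted adaptive selection like this does not by itself guarantee deadlock freedom; turn-model algorithms such as Odd-Even [29] forbid certain turns for exactly that reason.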
1.4:  Fault Tolerance in NoCs 
As the feature size of integrated circuits decreases, the reliability of such nanoscale devices is 
becoming a significant issue. Failing links in NoCs reduce the quality of service of the system, 
and hence research has focused on developing fault-tolerant NoCs to alleviate these issues. Faults 
in a NoC can be permanent, transient or intermittent. Permanent faults arise from links that fail 
due to electromigration and other physical damage or defects. Transient and intermittent faults 
occur due to crosstalk and noise picked up by the links in the interconnect network. If these faults 
are not recognized by the system, overall performance degrades significantly and may lead to 
larger failures. Thus, for a NoC to be resilient, the underlying architecture must account for these 
reliability issues. In this thesis, the proposed two-state hybrid WiNoC uses a routing algorithm 
that accounts for non-functioning WIs in the system and chooses the best available path, avoiding 
congestion caused by faulty WIs. The details of this work are presented in Chapter 4. 
1.5:  Contributions of this thesis work 
The motivation for this thesis work was to develop a fault-tolerant wireless NoC framework that
is also congestion aware. This work proposes a novel dynamic fault-tolerant wireless interconnect
framework and shows that its implementation improves system performance over a traditional
wired mesh NoC and also over some hybrid NoC-based MPSoCs.
The contributions of this thesis work are summarized below:
Firstly, a novel routing scheme is presented for mesh-based NoCs with a wireless interconnect
framework on top of the mesh, creating a WI-based hybrid NoC. The work first shows how the
wireless interconnect framework is implemented on top of a wired mesh architecture. The novelty
of the main routing scheme lies in how the wireless components of the network communicate with
each other in two different modes, which are explained briefly below:
a. The first operation mode is based on the token-based operation principle (as discussed in the
previous subsection), where core data packets are transferred between two communicating
routers when the transmitting router holds the channel access token. Throughout this thesis,
this operation mode is referred to as the “data mode”.
b. The second operation mode is based on the orthogonal code-based operation principle (as
discussed in the previous subsection). Unlike the data mode, where communication is one-to-one
between two routers, the orthogonal code-based operation mode is used to broadcast the status
of all the wireless routers in the system. This operation occurs at the end of each data mode
cycle so that the next data mode cycle can determine the best possible (least number of hops)
path for the data packets based on the traffic congestion in the system. Throughout this thesis,
this operation mode is referred to as the “control mode”.
Secondly, this work shows the switching technique between these two modes and shows how the
global traffic information gathered during the control mode is used to select between the wired
and wireless routers and create the most efficient path for a data packet to traverse.
Finally, the performance of the proposed system is evaluated in an academic simulator for
different system sizes, traffic levels and fault conditions. Its throughput, latency and energy
consumption are compared with those of a traditional wired mesh NoC and a default WiNoC
system.
1.6: Thesis organization 
The thesis is organized in five chapters. This chapter introduces the challenges of recent
multicore systems and discusses the emerging technologies that address them. Based on this
problem statement, we propose a novel two-state MAC based hybrid WiNoC design. Chapter 2 gives
background on the current state of knowledge and discusses research related to this thesis work.
Chapter 3 presents the proposed two-state MAC based hybrid WiNoC design and its operation, and
evaluates its performance under various traffic conditions. Chapter 4 presents the fault
tolerance study for the proposed design and discusses its fault tolerance capability based on
network-level simulation results. Finally, Chapter 5 summarizes the important conclusions and
points out directions for future research.
Chapter 2: RELATED WORKS 
The idea of using a NoC instead of design-specific global wires was first presented in research
such as that by Dally et al. [31]. This work showed that a general-purpose on-chip interconnection
network can successfully replace the traditional bus-based communication that is still used in
various MPSoCs today. The main advantages of NoC-based communication over bus-based systems
include: (1) Modular design: since the design of the interconnection network is not ad hoc in
nature (unlike bus-based systems), the network-on-chip layer of the system can be designed and
tested independently of the modules attached to it. This allows for design reusability and
decreases the overall design and testing times for the MPSoC. (2) Concurrent communication:
unlike many bus-based systems, where a single wired backbone carries data between modules, the
NoC architecture allows data to travel through alternative routes if a certain link in the network
is busy. (3) Reduced latency and higher bandwidth: due to its modularity and multiple
communication paths, a NoC architecture handles congestion better than a bus-based system,
which results in lower overall latency and higher bandwidth. Further research has shown that the
underlying interconnection network can be made more efficient by redesigning the traditional
mesh network into other network topologies.
Besides the mesh, various network topologies have been proposed since the beginning of the NoC
concept. Pande et al. evaluated the performance of various other wired NoC topologies such as
SPIN, CLICHÉ, Torus, Octagon and BFT [7]. Each of these topologies showed a different level of
performance improvement depending on the number of virtual channels in the network routers and
the load injected into the system. However, all of the topologies mentioned above consist of wired
links and therefore do not address the problems of using wires to transmit data. According to Ho
et al. [31], the “future of wires in integrated circuit technologies appears grim”, since the
majority of the delay in semiconductor devices arises in their metallic interconnects. Three major
electrical characteristics affect the delay seen in metallic wires: (1) resistance, (2) capacitance
and (3) inductance. The paper develops electrical models and shows how each of these parameters
exacerbates delay as wire dimensions change. As a result, wire delay becomes increasingly
noticeable as the network size scales. The paper also discusses how signal coupling and crosstalk
between wires cause data signals to become weaker or even be lost. One proposed solution is to
insert repeaters at intervals along the wire links, but this increases both the power consumption
and the overhead cost of the system.
Another issue with traditional wired NoCs is that communication between distant nodes requires
multiple hops over wired links in the interconnection network. Multi-hop communication increases
both system latency and power consumption. Ogras et al. proposed a “small-world” architecture
[8], which showed that the performance of a wired mesh-based system can be improved by inserting
long-range wires between distant nodes selected through pre-design calculations. Communication
between these two distant nodes can then take place over the long-range wire in one hop instead
of multiple hops. The issue that remains is that the long-range links are still physical metallic
wires and hence suffer from the same problems of long wires discussed above. Based on these
observations, research has looked into alternatives to wires for long links, and this is where
NoCs with emerging interconnect technologies came into the picture.
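Counting hops illustrates the benefit of a single long-range link. The Python sketch below uses hypothetical function names, and its shortcut endpoints are an arbitrary choice. It builds an n×n mesh graph, optionally adds long-range links that each count as one hop regardless of their physical length, and finds the minimum hop count with breadth-first search.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Node = Tuple[int, int]


def build_graph(n: int, shortcuts: List[Tuple[Node, Node]]) -> Dict[Node, Set[Node]]:
    """n x n mesh adjacency, plus optional long-range links (one hop each)."""
    graph: Dict[Node, Set[Node]] = {(x, y): set() for x in range(n) for y in range(n)}
    for (x, y) in graph:
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nb = (x + dx, y + dy)
            if nb in graph:
                graph[(x, y)].add(nb)
    for a, b in shortcuts:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def hop_count(graph: Dict[Node, Set[Node]], src: Node, dst: Node) -> int:
    """Minimum number of hops from src to dst (breadth-first search)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    raise ValueError("destination unreachable")


print(hop_count(build_graph(8, []), (0, 0), (7, 7)))                  # 14
print(hop_count(build_graph(8, [((1, 0), (7, 6))]), (0, 0), (7, 7)))  # 3
```

In an 8x8 mesh, corner-to-corner traffic needs 14 wired hops. A single long-range link near both corners cuts this to 3, which is the effect both the small-world wires and the wireless shortcuts exploit.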
Various emerging technologies for NoC architectures were discussed in Chapter 1. That chapter
also established, with reasons, why the wireless NoC was chosen for this work. This section
focuses on previous research on WiNoC architectures. One of the initial WiNoC architectures was
presented by Ganguly et al. [32]. That work shows that multi-hop communication in a traditional
wired NoC can be replaced by inserting WIs into the existing wired NoC, allowing long-distance
communication to take place via WIs in a single hop instead of multiple hops over wired links. The
paper discusses the optimal number of wireless links to insert based on the network size and
evaluates the performance of the hybrid network with different numbers of wireless links.
Similarly, Deb et al. presented various challenges and solutions in designing efficient and
reliable WiNoC architectures. Both works showed that communication between two distant nodes
can be reduced to a single hop using a WiNoC. It was also seen that one-hop communication in
WiNoC architectures is more efficient than the methodology proposed by Ogras et al. [8]. This is
because WiNoC architectures have no physical long-distance wired links, and hence the effects of
wire resistance, capacitance and inductance are taken out of the equation.
The next criterion for a successful WiNoC implementation is a communication protocol between the
WIs. As discussed before, researchers have implemented various efficient MAC schemes to make good
use of the limited channel bandwidth of the WIs. Token-based communication has been proposed in
[7, 19]. The issue with such a system is that if a WI fails, the token-passing mechanism can become
inefficient, because the token slot allocated to the faulty node is not used for any wireless
communication. Other research, such as [33], has used orthogonal CDMA as a multiple access
mechanism to enable simultaneous transmission of data. The issue with such a design is that
orthogonal-code based operation decreases the effective bandwidth of the wireless channel. Thus,
even though data can be sent concurrently to multiple WIs, the amount of data sent to each WI is
reduced.
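A back-of-the-envelope model puts both drawbacks in numbers. The Python sketch below makes three hypothetical assumptions: every healthy WI always has data to send, a faulty WI keeps its token slot without using it, and with orthogonal codes each transmitter's bits are spread over a code word of `code_length` chips, so its share of the aggregate rate G is G / code_length. The function names are illustrative.

```python
from typing import Set


def token_slot_utilization(num_wis: int, faulty: Set[int]) -> float:
    """Fraction of token slots that carry data in one token round.

    Assumes every healthy WI always has data queued and that a faulty WI
    is still allotted its slot, which then goes unused.
    """
    if num_wis <= 0:
        raise ValueError("need at least one WI")
    used = sum(1 for wi in range(num_wis) if wi not in faulty)
    return used / num_wis


def per_code_rate_gbps(aggregate_gbps: float, code_length: int) -> float:
    """Effective per-transmitter rate when each bit is spread over code_length chips."""
    return aggregate_gbps / code_length


# 16 WIs with one faulty WI: 15 of every 16 slots carry data
print(token_slot_utilization(16, {5}))   # 0.9375
# 16 Gbps channel shared through length-16 codes: 1 Gbps per transmitter
print(per_code_rate_gbps(16.0, 16))      # 1.0
```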
One challenge that must be addressed, whether the NoC is wired or wireless, is deadlock and
livelock avoidance. Routing schemes are adopted in NoC architectures to avoid congestion and
maximize network utilization. As mentioned before, however, these schemes must route data so
that the system remains free of deadlock and livelock. As discussed earlier, routing schemes in
NoC-based systems can be either deterministic or non-deterministic. Non-deterministic algorithms
are able to adapt and dynamically route packets at the cost of higher design complexity. Since this
thesis work focuses on non-deterministic routing schemes, previous research on the development of
such schemes is discussed below.
Different non-deterministic routing schemes employ different methodologies to make routing in the
system dynamic. Non-deterministic routing algorithms such as DyXY [28], Turn-Model Odd-Even
[29] and DyAD [27] use local traffic congestion information to select the best path for packet
routing. Deadlock is avoided in such systems by restricting certain turns in the available routing
paths. The issue with such systems is that local congestion awareness does not guarantee that the
path ahead will not be congested. The amount of traffic flowing to a certain portion of the network
depends on the application currently being executed, so a data packet may not take the most
efficient path. Other adaptive routing schemes, such as Hot Potato [34] and Deflection routing
[35], route a packet to an output channel regardless of whether routing in that direction reduces
the distance between the packet's current location and its destination. However, such schemes are
not always livelock-free and may therefore increase the system's latency. Hence, researchers have
looked into regional congestion-aware routing schemes, as proposed in [36 - 38]. Instead of relying
solely on the local traffic congestion level, these routing schemes also monitor global traffic
conditions and make routing decisions based on them. The global congestion information is
maintained by aggregating local traffic information with previous congestion-level information
that is carried with the packet at each hop as it traverses the network. The issue with such
schemes is that the congestion information has poor resolution: a previously visited router may no
longer be congested, which reduces performance. Work by Ramakrishna et al. [39] has shown that
timely and complete congestion information for the whole network can be obtained through per-hop
lookahead routing. This technique introduces a new field in the header flit of a data packet called
the “traffic vector”. This field consists of previous congestion information, which is stored in a
local congestion map and updated at each hop based on the information brought in by incoming data
packets. With the help of a pre-route table and the updated congestion map, the system computes the
most efficient path for the packet at every hop. The issue with such a system is that the “traffic
vector” field grows as the system size increases. Furthermore, the study targeted a wired mesh
system. In that system, the larger flit size would increase energy consumption. The system also
could not adapt if one of the links failed, since the final routing is based on the fixed pre-route
table. None of the routing algorithms mentioned above considers hybrid NoC architectures, where
single-hop communication can occur with the help of WIs. In such architectures, data packets may
always try to use an available single-hop wireless path. They then create artificial traffic
congestion at those WIs, which can further exacerbate congestion in the network, even at low load.
Thus, this work first presents a novel hybrid WiNoC design with dual operation modes. Based on that
design, it then gives a novel routing scheme that is inspired by existing congestion-aware routing
schemes but avoids the drawbacks seen in traditional wired mesh NoCs.
Since dynamic routing schemes can route packets based on network traffic conditions, researchers
have also proposed dynamic routing schemes that avoid faulty links in the network. Most research
models faults in the wired links [40 - 43]. Some of these fault-tolerant routing schemes have been
applied to NoCs with interconnects developed using emerging technologies, such as 3D-NoCs [44],
photonic [45] and even wireless interconnects [46, 47]. In this thesis, the proposed routing
scheme is shown to detect faulty WIs, and the system avoids them in order to maintain high
throughput and prevent packet loss.
Chapter 3: SYSTEM ARCHITECTURE 
In this chapter, we begin the discussion of the NoC design with the dynamic wireless
interconnection framework. Next, the proposed wireless antenna and transceiver circuit design are
discussed. Based on the proposed transceiver design, the proposed two-mode MAC protocol and its
switching are then described. After the two-mode MAC protocol is established, the routing scheme
and the router architecture are discussed. Finally, based on the proposed routing scheme for data
mode, various routing scenarios are explained with the help of example operations.
3.1: Proposed WiNoC topology and design 
 
Figure 1: Proposed 8x8 hybrid WiNoC Framework 
Figure 1 shows the proposed hybrid NoC topology, which serves as the basis for the architectural
discussions from this point forward. The proposed system consists of an underlying 8x8 wired mesh
with 64 core tiles in total. Each tile is labelled with X and Y coordinates starting from the
bottom-left corner. The entire system is divided into 16 sub-networks (subnets), each consisting of
4 core tiles as shown in the figure. Each subnet has a single wireless hub that is shared by the 4
core tiles in the subnet. The wireless hubs act as the wireless interfaces (WIs) for wireless
communication between the subnets.
As seen in Figure 1, the wireless interconnection framework is formed by connecting the core tiles
in each subnet to the central hub through traditional metal wires. Previous work on wireless link
insertion and optimization [32] has shown an efficient scaling technique for large system sizes.
For a given system size, increasing the number of subnets while keeping the inter-subnet distances
minimal gives the best performance for the wired path in the WiNoC system. Based on this finding,
this thesis proposes a subnet size of 4 core tiles per subnet. Thus, a 64-core system has 16
subnets. The maximum system size considered in this thesis is a 100-core system (10x10 mesh).
Similar to other globally congestion-aware systems [39, 48, 49], the proposed system has
scalability issues at larger system sizes, which are discussed in detail in a later section.
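The tile-to-subnet grouping can be written as a small index calculation. The Python sketch below is illustrative only. It assumes square 2x2 subnets and that tiles and subnets are both numbered row-major from the bottom-left corner; the actual numbering in Figure 1 may differ. Under this assumed numbering, tile 36 at (4, 4) falls in subnet 10, which is consistent with the Tile-36 and Hub-10 labels of Figure 2. The function names are hypothetical.

```python
def tile_id(x: int, y: int, mesh_width: int = 8) -> int:
    """Row-major tile index with (0, 0) at the bottom-left corner (assumed numbering)."""
    return y * mesh_width + x


def subnet_of(x: int, y: int, mesh_width: int = 8, subnet_side: int = 2) -> int:
    """Index of the subnet (and its wireless hub) that contains tile (x, y).

    Subnets are assumed to be subnet_side x subnet_side blocks, numbered
    row-major from the bottom-left, like the tiles.
    """
    if not (0 <= x < mesh_width and 0 <= y < mesh_width):
        raise ValueError("tile outside the mesh")
    if mesh_width % subnet_side:
        raise ValueError("mesh width must be a multiple of the subnet side")
    subnets_per_row = mesh_width // subnet_side
    return (y // subnet_side) * subnets_per_row + (x // subnet_side)


def num_subnets(mesh_width: int, subnet_side: int = 2) -> int:
    """Number of subnets, and therefore wireless hubs, in a square mesh."""
    return (mesh_width // subnet_side) ** 2


print(num_subnets(8), num_subnets(10))   # 16 25
print(tile_id(4, 4), subnet_of(4, 4))    # 36 10
```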
A: Input and output VCs to and from the east neighbor
B: Input and output VCs to and from the north neighbor
C: Input and output VCs to and from the west neighbor
D: Input and output VCs to and from the south neighbor
E: Input and output VCs to and from the local wireless Hub-10
F: Input and output VCs to and from Tile-36
G: Input and output buffers to and from other wireless hubs
Figure 2: Proposed subnet architecture
Figure 2 shows a zoomed-in, top-level view of subnet-10 in Figure 1. This figure shows the
arrangement of the core tiles and the wireless interface and how they are connected with each
other. The diagram also shows the various VC buffers of the core tiles and the hub. Each core tile
has multiple input and output VC buffers for each direction in which it is connected. The figure
does not show the local processing element or its VC buffers, since these are present by default
in every core tile. The VC buffers play an important role in the selection and routing of data
packets in the system. The number and size of these buffers are discussed in detail later in this
section. The processing elements in the NoC depend on the type of application for which the NoC is
implemented and can be CPU, GPU or DSP units or a combination of them. In this thesis, all
processing elements in the system are considered to be identical CPU cores. Together they form a
large multicore CPU environment like those seen in the research mentioned in Chapter 1 [3 - 6].
This assumption makes the system more symmetric. It also makes the number of clock cycles
required for data to be processed and transmitted more predictable.
3.2:  Wireless interface physical layer 
In this section, we look into the design of the proposed antenna and transceiver for the hybrid
WiNoC system presented in this thesis. As discussed in Chapter 1, on-chip antennas are required to
establish links between the wireless hubs in the proposed hybrid NoC system. Furthermore, the
on-chip antenna has to provide the maximum power gain with the least area overhead. As seen
earlier, various research groups have designed and effectively implemented different kinds of
antennas for WiNoCs. Out of the three antenna designs discussed earlier, the proposed hybrid NoC
system uses mm-wave antennas in all of the wireless hubs. Research in [17 - 19] has shown that
zig-zag mm-wave antennas with a non-coherent OOK modulation scheme in the transceiver show the
best performance for current CMOS technology in terms of reliability, throughput and energy
efficiency. Furthermore, mm-wave links have a transmission range of 20 mm, which means that
long-range communication within the chip is not an issue.
 
Figure 3: Proposed zig-zag antenna placement on the die [52]
A. On-chip antennas: The metal zig-zag antenna has been shown to provide the best power gain
with the smallest area overhead in previous research [18, 19, 50 - 53]. Based on those criteria,
this thesis uses the same antenna design as previous work [52]. That design is an on-chip
mm-wave zig-zag antenna tuned to a 60 GHz operating frequency with a bandwidth of 16 GHz. As
shown in Figure 3, the on-chip mm-wave zig-zag antenna uses a coplanar feed structure, as it has
lower losses than other feed structures such as microstrip. Furthermore, this type of antenna was
shown to be non-directional [53], which makes the wireless medium in the system a shared channel.
Figure 4: (a) Transmitter and (b) receiver block diagrams
B. Wireless Transceiver Circuit: For the system to have high throughput and low energy with the
lowest bit error rate, this thesis adopts the non-coherent on-off keying (OOK) modulated
transceiver design to go with the on-chip antenna proposed above. Figures 4(a) and 4(b) show the
block diagrams of the transmitter and receiver circuits, respectively, used for wireless
communication in this thesis.
As mentioned in Chapter 1, this thesis uses a two-state MAC operation that depends on whether
the WIs are communicating core data or control data. Considering the different operation modes,
this thesis adopts the transceiver design from [54, 55] for the data mode. In addition to the OOK
modulator in [54] and the demodulator in [55], an orthogonal-code encoder and decoder are added
to the design to support the transmission of congestion information (control data) during the
control mode.
In control mode, the congestion data bits are first encoded by XORing them with a
transmitter-specific code word. In this thesis, Walsh codes are adopted for the control messages
during the control mode. An OOK modulator then modulates the encoded control data (or the
unencoded core data during data mode) onto the 60 GHz carrier generated by the
Voltage-Controlled Oscillator (VCO). The power amplifier (PA) amplifies the resulting signal, which
is then coupled to the on-chip antenna to be transmitted to the destination WI(s). Furthermore,
the wireless channel is assumed to be an additive multipath channel. Individual transmissions
encoded with different codes are therefore added together over the channel.
On the receiver side, the received signal is first amplified by the Low Noise Amplifier (LNA). It is
then sent to the Envelope Detector (ED), which strips the baseband signal from the carrier. A
baseband amplifier (BA) then amplifies this signal. The next step for the incoming signal depends
on the operation mode. If the data was transmitted during the data mode, no further processing is
required and the received data can be forwarded to the destination core tile. In this case only the
first part of the receiver circuit, boxed in grey in Figure 4(b), is used. If the data was transmitted
during the control mode, the received signal contains data from all other WIs in the system. The
receiver therefore needs additional decoding for every transmitter-specific code channel, and the
output of the OOK demodulator is further sent to a code decoder. An Analog-to-Digital Converter
(ADC) converts the received envelope from the additive multipath channel into digital signals. The
signal is then correlated with each code word from the code book to create a separate receiving
channel for every code word. Digitizing the signal enables a digital correlator receiver that
accumulates and compares the positive and negative parts of the received symbols to compute the
received digit for each channel [56]. Since every transmitter in the system has its own
predetermined code word, a single receiver can receive data from multiple transmitters
simultaneously. A Power Gating (PG) cell separates the receiver circuitry from the orthogonal-code
decoder circuits. It is controlled by the “DONE” signal, which is generated based on the operation
mode (see the next subsection for details). The PG cell selectively turns that part of the receiver
on and off depending on the operation mode, which also improves the power efficiency of the
transceivers in the WiNoC system.
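The encode, combine and correlate chain described above can be illustrated with a small, noise-free baseband model. The Python sketch below is not the thesis's circuit, and it makes four assumptions. Walsh codes are built by the Sylvester construction and represented as 0/1 chips. An OOK chip is 1 when the carrier is on. The additive channel is modelled as the per-chip count of transmitters that are on. Each channel is decoded by comparing the accumulated received energy on the code's 0-chips (positive part) with that on its 1-chips (negative part). The all-zeros Walsh code has no negative part, so this sketch leaves it unassigned and needs a code length of at least N + 1 for N transmitters. All function names are hypothetical.

```python
from typing import List


def walsh_codes(order: int) -> List[List[int]]:
    """Rows of a Sylvester-Hadamard matrix as 0/1 chips (0 ~ +1, 1 ~ -1)."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    codes = [[0]]
    while len(codes) < order:
        # H_2n = [[H, H], [H, -H]]; negation becomes bit complement
        codes = [row + row for row in codes] + [row + [1 - c for c in row] for row in codes]
    return codes


def spread(bit: int, code: List[int]) -> List[int]:
    """Transmitter: XOR the data bit with every chip of the WI's code word."""
    return [bit ^ c for c in code]


def additive_channel(chip_streams: List[List[int]]) -> List[int]:
    """Idealized channel: the received envelope per chip is the number of 'on' transmitters."""
    return [sum(chips) for chips in zip(*chip_streams)]


def correlate(received: List[int], code: List[int]) -> int:
    """Receiver: compare energy on the code's 0-chips with energy on its 1-chips."""
    positive = sum(r for r, c in zip(received, code) if c == 0)
    negative = sum(r for r, c in zip(received, code) if c == 1)
    # A 0 bit sends the code as-is, so its energy falls on the 1-chips (negative part).
    return 1 if positive > negative else 0


def broadcast(messages: List[List[int]], codes: List[List[int]]) -> List[List[int]]:
    """All WIs send their control bits at the same time; every channel is decoded."""
    if len(codes) < len(messages):
        raise ValueError("not enough code words")
    decoded: List[List[int]] = [[] for _ in messages]
    for i in range(len(messages[0])):
        rx = additive_channel([spread(msg[i], code) for msg, code in zip(messages, codes)])
        for wi, code in enumerate(codes[: len(messages)]):
            decoded[wi].append(correlate(rx, code))
    return decoded


# 15 WIs, each broadcasting a hypothetical 3-bit control packet;
# length-16 codes with row 0 left unused.
codes = walsh_codes(16)[1:]
messages = [[i & 1, (i >> 1) & 1, (i >> 2) & 1] for i in range(15)]
assert broadcast(messages, codes) == messages
```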
Lastly, [54, 55] show that such an OOK modulator-demodulator design achieves very high spectral
efficiency over the 60 GHz carrier. It provides a physical data rate of 16 Gbps in a point-to-point
link at a total energy consumption of 2.075 pJ/bit. The signal-to-noise ratio (SNR) of these
wireless links is given by:
$$\mathrm{SNR} = P_T - P_L - N_f \tag{1}$$
where $P_T$ is the transmitted power, $P_L$ is the path loss and $N_f$ is the noise floor of the
receiver, all in decibel units (dBm for $P_T$ and $N_f$, dB for $P_L$). However, as noted in [57],
Inter-Symbol Interference (ISI) governs the Bit Error Rate (BER) of such chip-to-chip wireless
interconnects, because the high-speed transceivers and antennas are bandwidth-limited. The BER
is $10^{-15}$ for a $P_T$ of -0.5 dBm.
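Equation (1) and the quoted transceiver figures lend themselves to quick link-budget arithmetic. In the Python sketch below, only P_T = -0.5 dBm, 16 Gbps and 2.075 pJ/bit come from the text. The 40 dB path loss and -60 dBm noise floor are placeholder values chosen purely for illustration, not measurements. Multiplying energy per bit by data rate gives the average transceiver power, since 1 pJ/bit × 1 Gbps = 1 mW.

```python
def snr_db(pt_dbm: float, path_loss_db: float, noise_floor_dbm: float) -> float:
    """Equation (1): SNR = P_T - P_L - N_f, with powers in dBm and path loss in dB."""
    return pt_dbm - path_loss_db - noise_floor_dbm


def link_power_mw(energy_pj_per_bit: float, rate_gbps: float) -> float:
    """Average transceiver power: (pJ/bit) * (Gbit/s) = 1e-3 W, i.e. mW."""
    return energy_pj_per_bit * rate_gbps


# P_T from the text; path loss and noise floor are illustrative placeholders.
print(snr_db(pt_dbm=-0.5, path_loss_db=40.0, noise_floor_dbm=-60.0))  # 19.5
# 16 Gbps at 2.075 pJ/bit (figures from [54, 55])
print(round(link_power_mw(2.075, 16.0), 3))                            # 33.2
```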
3.3: Operation modes 
 
Figure 5: (a) Control packet (b) State diagram
As mentioned in the earlier sections, the proposed WiNoC system has two operation modes, each
with its own MAC protocol. In this subsection, the switching between these two protocols is
discussed in detail.
The “Data Mode”, as mentioned in the previous section, is the mode in which the WiNoC system
uses both its wired and wireless media to communicate core data with other core tiles. From
Figure 2, it can be seen that each subnet consists of a single WI and four core tiles. All of the
modules are connected using wireline links and bidirectional ports. For the wireline links, a
wormhole switching technique is adopted, in which data packets are broken down into flow control
units, or flits [58]. Wormhole switching was chosen because it has low buffering requirements and
provides high network utilization through the use of VCs. For the wireless links, the same
wormhole switching principle is used but with a modified flow control, which is discussed in the
next subsection. For wireless communication in data mode, the token-based MAC protocol is used to
establish one-to-one communication between the source WI (the token holder) and the destination
WI.
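The flit decomposition that wormhole switching relies on can be sketched as follows. In this Python illustration, the flit width, the `Flit` record and the head-carries-destination layout are assumptions rather than the thesis's flit format. A head flit carries the routing information and would reserve a VC at each hop. Body flits carry the payload. The tail flit marks the end of the packet so that the VC can be released.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Flit:
    kind: str        # "head", "body", or "tail"
    packet_id: int
    dest: int        # kept on every flit for simplicity; in hardware only the head needs it
    payload: bytes


def packetize(packet_id: int, dest: int, data: bytes, flit_bytes: int) -> List[Flit]:
    """Split a packet into a head flit followed by payload flits, the last marked as tail."""
    if flit_bytes <= 0:
        raise ValueError("flit_bytes must be positive")
    chunks = [data[i:i + flit_bytes] for i in range(0, len(data), flit_bytes)] or [b""]
    flits = [Flit("head", packet_id, dest, b"")]
    for i, chunk in enumerate(chunks):
        kind = "tail" if i == len(chunks) - 1 else "body"
        flits.append(Flit(kind, packet_id, dest, chunk))
    return flits


# A 10-byte packet for (hypothetical) tile 36 with 4-byte flits:
# head, body (4 B), body (4 B), tail (2 B)
print([f.kind for f in packetize(1, 36, bytes(10), 4)])  # ['head', 'body', 'body', 'tail']
```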
The “Control Mode”, as mentioned in the previous section, is the mode in which the WiNoC system
uses only the wireless medium to communicate network congestion information. Adopting a
token-based MAC protocol similar to the one used in the data mode would introduce overhead and
reduce the effective bandwidth for data transfer. One of the goals of this thesis is to create a
system with global traffic awareness, so each WI needs to share its local congestion information
with all the other WIs by broadcasting it at the same time. Therefore, this thesis uses the
orthogonal-code based MAC protocol for such control messages, because it supports multiple
simultaneous broadcast transmissions. The transmitter and receiver designs for such systems
were discussed in the previous subsection. During the control mode, the control data is XORed
with the code word unique to each WI and then transmitted simultaneously to all the other WIs in
the system. The issue with the orthogonal-code based encoding technique is that the code length
increases with the system size. This reduces the effective wireless bandwidth and increases the
average packet latency. The number of cycles for such a broadcast transmission is given by the
general equation:
$$T = \frac{N \cdot V \cdot F_{clk}}{G} \tag{2}$$
Here, $T$ is the number of cycles for the transmission, $N$ is the total number of WIs in the
system, $V$ is the control message length in bits, $F_{clk}$ is the system clock frequency in GHz,
and $G$ is the aggregate wireless bandwidth in Gbps. Furthermore, the control packet for such a
system has to be as small as possible to keep packet latency low and to use the limited bandwidth
of 16 Gbps effectively.
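Equation (2) can be evaluated directly. The Python sketch below assumes the 3-bit control message described later in this section (a 1-bit VC status field plus a 2-bit VC address). It also rounds T up to whole clock cycles, which is an assumption on top of the equation. The 25-WI case assumes a 10x10 mesh split into 2x2 subnets.

```python
import math


def control_mode_cycles(n_wis: int, msg_bits: int, f_clk_ghz: float, bw_gbps: float) -> int:
    """Equation (2), T = N * V * Fclk / G, rounded up to whole clock cycles."""
    if bw_gbps <= 0:
        raise ValueError("bandwidth must be positive")
    return math.ceil(n_wis * msg_bits * f_clk_ghz / bw_gbps)


# 64-core system of Figure 1: 16 WIs, 3-bit control packet, 1 GHz, 16 Gbps
print(control_mode_cycles(16, 3, 1.0, 16.0))   # 3
# 100-core system: 25 WIs (assumed 2x2 subnets), same packet and clock
print(control_mode_cycles(25, 3, 1.0, 16.0))   # 5 (exact value 4.6875)
```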
As established in the previous chapters and subsections, in “Control Mode” the network
congestion information is broadcast using only the wireless links with the help of the
orthogonal-code based MAC policy. Each control packet contains VC status information for the
transmitting WI. To establish effective global congestion awareness, the broadcast congestion
information needs to communicate: (a) the address of the WIs, (b) the VC buffer status of each WI,
and (c) the address of a free VC buffer in each WI. Since this information is broadcast using an
orthogonal-code based encoding technique, the control packet has to be as small as possible to
maintain low latency. For this reason, the addresses of the WIs are not broadcast, since the
orthogonal-code based MAC policy assigns a unique communication channel to each WI. Each WI can
therefore identify the other WIs in the system by associating each channel with a particular WI.
Figure 5(a) shows the control packet that each transmitter broadcasts during control mode. The first field of the control packet contains the VC status bit, which is a 1-bit field. When the VC status bit is high, the WI that sent the control packet has no empty VC buffer. In the next data mode that WI will not be used to receive core data transmitted wirelessly from other WIs in the system. If the VC status bit is low, one or more free VC buffers are available for wireless transmission in data mode. In that case the second field of the control packet, which contains the VC address, will be used as the destination buffer for the wireless transmission in the data mode. In this thesis work the designed hybrid WiNoC system has 4 VCs per input, so only 2 bits are needed to represent the VC address. If multiple VCs are empty, the VC with the lowest address value is advertised. Even though multiple VCs can be free for a given WI, the control packet carries only one VC address in order to keep the control packet short, since a larger control packet would degrade system performance.
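The 3-bit control packet described above can be sketched as a pair of encode/decode helpers. This is a minimal sketch. The bit ordering (status bit as the most significant bit) and the function names are assumptions for illustration, and the thesis does not specify them. The "lowest free VC" rule and the 4-VC, 2-bit address width follow the text.

```python
from typing import List, Optional

NUM_VCS = 4        # VCs per WI input in the designed hybrid WiNoC
VC_ADDR_BITS = 2   # log2(NUM_VCS)


def encode_control_packet(vc_free: List[bool]) -> int:
    """Pack one WI's VC state into a 3-bit control word.

    Bit 2: VC status bit (1 = no free VC buffer).
    Bits 1..0: address of the lowest-numbered free VC (0 when the WI is full).
    """
    if len(vc_free) != NUM_VCS:
        raise ValueError("expected one free/busy flag per VC")
    for addr, free in enumerate(vc_free):
        if free:
            return addr                    # status bit low, lowest free VC
    return 1 << VC_ADDR_BITS               # status bit high, address unused


def decode_control_packet(word: int) -> Optional[int]:
    """Return the advertised free VC address, or None if the WI has no free VC."""
    if (word >> VC_ADDR_BITS) & 1:
        return None
    return word & (NUM_VCS - 1)
```

For example, a WI whose VCs 1 and 2 are free advertises only VC 1, in the word `0b001`.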
Thus, based on the control packet structure and equation (2) described above, consider a system with 16 WIs such as the one shown in figure 1, running at 1 GHz. For this system the control mode requires 3 cycles to exchange the control information among all the WIs before the system returns to the data mode for normal one-to-one data transmission. Since the two modes use two different MAC protocols, the two-state MAC operation can be realized as a hybrid MAC protocol controlled by a “DONE” signal, as shown in figure 5(b).
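Equation (2) is defined earlier in the thesis and is not reproduced here. As a rough cross-check of the 3-cycle figure, the sketch below estimates the control-mode length under assumptions that are not taken from equation (2). First, each control bit is spread with a Walsh code whose length is the next power of two at or above the number of WIs. Second, all WIs send in parallel on their own codes. Third, the 16 Gbps link rate from Table I is the chip rate. The function name is hypothetical.

```python
import math


def control_mode_cycles(num_wis: int, packet_bits: int,
                        link_rate_bps: float, clock_hz: float) -> int:
    """Estimate the control-mode length T in clock cycles (assumed model).

    Assumes Walsh spreading codes of length next_pow2(num_wis), all WIs
    broadcasting simultaneously, and the link rate acting as the chip rate.
    """
    chips_per_bit = 1 << (num_wis - 1).bit_length()   # Walsh code length
    chips_per_cycle = link_rate_bps / clock_hz
    return math.ceil(packet_bits * chips_per_bit / chips_per_cycle)
```

Under these assumptions, a 3-bit control word with 16 WIs (16-chip codes) at 16 Gbps and 1 GHz gives 3 × 16 / 16 = 3 cycles. Where the two differ, equation (2) in the thesis remains the authoritative expression.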
In data mode the token-based MAC protocol is used to establish one-to-one communication between the source WI (the token holder) and the destination WI. The token period Tp in this data mode is defined as the round-trip time for a WI to get the token back once it finishes its transmission. Once a WI finishes its transmission after an epoch of Y cycles (where Y = Tp/N), it passes the token to the next WI. The DONE signal then goes high, which marks the transition of the system from the data mode to the control mode. In the control mode all the WIs update and share their VC buffer status with all other WIs in the system for T cycles (given by equation (2)) using the orthogonal-code based MAC protocol. Once the broadcast is completed the DONE signal goes low and the system transitions back to data mode. The time the system spends in data mode (with DONE low) depends on the length of the data packet the source WI is transmitting. Thus, in the data mode the “DONE” signal can be thought of as the logical OR of a cycle counter and a flit counter. In the control mode it resets purely based on the cycle count given by equation (2). To keep the simulation predictable and to ensure the system is robust, this thesis work considers a fixed packet size and sets the buffer depth of each VC equal to the packet size in flits. In doing so, the proposed hybrid MAC protocol also ensures that no flits are dropped due to partial packet transmission over the wireless medium.
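The two-state behavior of the DONE signal can be modeled as a small per-cycle state machine. This is a minimal sketch, not the hardware design. The class and field names are hypothetical, and token passing itself is not modeled. In data mode DONE rises when either the cycle counter reaches the epoch Y or the flit counter reaches the packet length. In control mode DONE falls after T cycles.

```python
from dataclasses import dataclass


@dataclass
class HybridMacController:
    """Per-cycle model of the DONE signal that switches the two MAC modes."""
    epoch_cycles: int        # Y = Tp / N, maximum token-holding time
    control_cycles: int      # T, from equation (2)
    packet_flits: int        # fixed packet length (8 flits in this work)
    done: bool = False       # False: data mode (token MAC); True: control mode
    cycle_count: int = 0
    flit_count: int = 0

    def step(self, flit_sent: bool) -> bool:
        """Advance one clock cycle and return the new DONE value."""
        self.cycle_count += 1
        if not self.done:
            if flit_sent:
                self.flit_count += 1
            # DONE = (cycle counter expired) OR (whole packet transmitted)
            if (self.cycle_count >= self.epoch_cycles
                    or self.flit_count >= self.packet_flits):
                self.done, self.cycle_count, self.flit_count = True, 0, 0
        elif self.cycle_count >= self.control_cycles:
            # Control mode ends purely on the cycle count.
            self.done, self.cycle_count = False, 0
        return self.done
```

With an 8-flit packet, DONE rises on the cycle the eighth flit is sent and falls again T cycles later.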
 
3.4: Routing scheme and controller design 
 
Figure 6: The routing scheme flowchart
In a hybrid WiNoC system such as the one presented in this work, a contention-free routing scheme must be developed in order to maintain high network utilization by using both the wired and the wireless paths. Without a proper contention-free routing algorithm, data packets will always try to use the wireless path, since it offers one-hop communication between the source and destination core tiles. This causes the wireless links to be heavily congested even in low-traffic applications, while the wired paths remain underutilized. For this reason, this thesis work proposes a dynamic routing scheme for the data mode. The scheme builds on an existing routing scheme and uses the network traffic information gathered during the control mode to balance load across both the wired and the wireless paths in the proposed hybrid WiNoC.
Figure 6 shows the flow of the proposed routing scheme used during the data mode of the proposed hybrid WiNoC. Once a message is generated by the processing element, the core data is packetized and a header flit is added to the data packet. The header flit holds the current and the destination address of the message. Each address is divided into two parts. The first part contains the subnet address, which is common to all the core tiles in a given subnet, and the second part contains the address of a specific core tile within that subnet. In the proposed routing scheme, if the current subnet address of the message is equal to its destination subnet address, the message is being transmitted between two core tiles in the same subnet and the wired path is selected for its transmission. In the proposed hybrid WiNoC all intra-subnet communication is done via the wired links, since the hybrid WiNoC allows one-hop communication via wired links between the core tiles and the shared central WI (as shown in figures 1 and 2).
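The two-part header address and the first routing stage can be sketched as follows. This is a minimal sketch under assumptions that the thesis does not state. Subnets are square blocks of `subnet_side` x `subnet_side` tiles numbered row-major, and mesh coordinates are 0-indexed. An 8x8 mesh with 2x2 subnets, for example, gives 16 subnets and 16 WIs. The names `TileAddress`, `tile_address` and `takes_wired_path` are hypothetical.

```python
from typing import NamedTuple


class TileAddress(NamedTuple):
    subnet: int  # common to every tile in the subnet
    local: int   # position of the tile inside its subnet


def tile_address(x: int, y: int, mesh_side: int, subnet_side: int) -> TileAddress:
    """Split 0-indexed mesh coordinates into a (subnet, local) header address.

    Assumes square subnets of subnet_side x subnet_side tiles, numbered row-major.
    """
    subnets_per_row = mesh_side // subnet_side
    subnet = (y // subnet_side) * subnets_per_row + x // subnet_side
    local = (y % subnet_side) * subnet_side + x % subnet_side
    return TileAddress(subnet, local)


def takes_wired_path(current: TileAddress, destination: TileAddress) -> bool:
    """First routing stage: equal subnet fields mean intra-subnet, so go wired."""
    return current.subnet == destination.subnet
```

Only the subnet field is compared in this stage. The local field matters only once the packet is inside the destination subnet.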
If the message's current subnet address is not equal to the destination subnet address, the router performs a second-stage calculation to determine the best path for the packet. This calculation is based on the global congestion map, which contains the VC status of all the WIs in the system. The VC status of all the WIs is updated during the control mode. Using this map together with its own local congestion information, the router determines which path reduces the Manhattan distance between the current and the destination core tile. If the router sees that the destination WI has a free VC buffer, the message is sent via the wireless path in one-hop communication. If the destination WI has no free VC buffer, the router first calculates the worst-case Manhattan distance, which is the maximum number of hops required if the message takes only the wired path from the current tile to the destination tile. In the next stage the router checks whether sending the message to any nearby WI reduces the effective number of hops from that intermediary WI to the destination. If there is such an intermediary WI, the message is sent to it, the current router address is updated, and the DONE signal goes high to start the control mode. If no such intermediary WI is found, the router transmits the packet via the wired link to the next core tile and updates its current address. It then tries again for a WI in the next data mode, after the control mode for the current cycle completes. All of these routing calculations are done in the individual router connected to each core tile of the proposed hybrid WiNoC system. A more detailed description of the routing scheme and the selection strategy for the data packets is given with the help of example scenarios in the coming subsection.
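The decision flow above can be sketched as a single function. This is a minimal sketch, not the hardware router controller.

Several elements are assumptions:
- The mesh layout: square subnets numbered row-major, built by the hypothetical helper `build_mesh`. Its subnet numbers do not match those in the figures.
- The hop model: tile to WI, WI to WI, and WI to tile each cost one hop, so a direct wireless path costs 3 hops as in the case scenarios.
- The intermediary search: the remaining distance is measured from the closest tile in the intermediary subnet.

Ties keep the wired path.

```python
from typing import Callable, Dict, List, Tuple

Coord = Tuple[int, int]
WIRELESS_HOPS = 3  # tile -> own WI -> remote WI -> tile (one hop each)


def manhattan(a: Coord, b: Coord) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def build_mesh(mesh_side: int, subnet_side: int):
    """Hypothetical layout: square subnets, subnet ids assigned row-major."""
    per_row = mesh_side // subnet_side

    def subnet_of(t: Coord) -> int:
        return (t[1] // subnet_side) * per_row + t[0] // subnet_side

    tiles: Dict[int, List[Coord]] = {}
    for y in range(mesh_side):
        for x in range(mesh_side):
            tiles.setdefault(subnet_of((x, y)), []).append((x, y))
    return subnet_of, tiles


def route_decision(cur: Coord, dst: Coord,
                   subnet_of: Callable[[Coord], int],
                   subnet_tiles: Dict[int, List[Coord]],
                   wi_free: Dict[int, bool]) -> Tuple[str, int]:
    """Return ("wired", -1) or ("wireless", target_subnet) for a packet at cur.

    wi_free is the global congestion map: True if that subnet's WI advertised
    a free VC in the last control mode.
    """
    src_subnet, dst_subnet = subnet_of(cur), subnet_of(dst)
    if src_subnet == dst_subnet:
        return "wired", -1                      # intra-subnet: always wired
    wired_hops = manhattan(cur, dst)            # worst-case wired-only distance
    if wi_free.get(dst_subnet, False):
        # One-hop wireless shortcut, used only if strictly shorter than wired.
        if WIRELESS_HOPS < wired_hops:
            return "wireless", dst_subnet
        return "wired", -1
    best_subnet, best_hops = -1, wired_hops
    for subnet, free in wi_free.items():
        if not free or subnet == src_subnet:
            continue
        # Reach the intermediary WI's subnet wirelessly, then continue wired.
        remaining = min(manhattan(t, dst) for t in subnet_tiles[subnet])
        hops = WIRELESS_HOPS + remaining
        if hops < best_hops:                    # ties keep the wired path
            best_subnet, best_hops = subnet, hops
    if best_subnet >= 0:
        return "wireless", best_subnet
    return "wired", -1
```

In an 8x8 mesh with a busy destination WI at the far corner, the sketch picks a neighboring subnet whose tiles are 2 hops from the destination. This matches the behavior described for scenario 4.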
Figure 7: Block diagram for the router architecture. 
The routing scheme described above relies on the router at each core tile, which carries out the non-trivial task of finding the best path for a data packet during the data mode. Figure 7 shows the block diagram of the router architecture to be implemented for the proposed hybrid WiNoC. As the figure shows, each tile has an input and an output buffer for each direction in which the router needs to transmit data. When a new header flit arrives, the address decoder decodes the destination address and sends it to the router controller, which controls access to all the input and output ports. The routing scheme is implemented in hardware in the router controller. Based on the congestion information from the neighboring routers' port controllers and the global congestion map, the router controller decides the direction in which the packet is routed. The port controller then sends a connection request to the crossbar arbiter in order to set up the path to the corresponding output port. The crossbar arbiter also ensures that the input buffers of all directions get equal access to the router controller on a first-come, first-served basis. During control mode the crossbar arbiter blocks all access to the output ports. A separate wireless channel is used to broadcast and receive the global traffic congestion information and to update the router's global WI congestion map. Once the global WI congestion map is updated, the system transitions to the data mode again and the crossbar arbiter and the router controller resume normal operation.
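The arbitration policy described above can be sketched as a first-come, first-served queue that is frozen during control mode. This is a minimal sketch. It assumes that each output port can be granted to at most one input per cycle, and that a blocked request keeps its place in line. The class, method and port names are hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple

Request = Tuple[str, str]  # (input port, requested output port)


class CrossbarArbiter:
    """First-come, first-served crossbar arbiter with a control-mode freeze."""

    def __init__(self) -> None:
        self.queue: Deque[Request] = deque()
        self.control_mode = False

    def request(self, in_port: str, out_port: str) -> None:
        self.queue.append((in_port, out_port))

    def grant_cycle(self) -> List[Request]:
        """Grant requests in arrival order, at most one per output port per cycle."""
        if self.control_mode:
            return []                 # output ports blocked while control mode runs
        granted: List[Request] = []
        busy_outputs = set()
        waiting: Deque[Request] = deque()
        while self.queue:
            req = self.queue.popleft()
            if req[1] in busy_outputs:
                waiting.append(req)   # keeps its place in line for the next cycle
            else:
                busy_outputs.add(req[1])
                granted.append(req)
        self.queue = waiting
        return granted
```

Requests queued during control mode are not lost. They are granted in arrival order once the system returns to data mode.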
3.5: Example operation 
To give a better understanding of the routing scheme and the selection strategy for communication in the data mode, this subsection presents several case scenarios. In each one, the router calculates which path is best for the data packet during the data mode. Since the proposed hybrid WiNoC has a mesh-based architecture, all the core tiles can be assumed to be equally spaced, and each tile's location can be given by coordinates as shown in figure 1. For a given system size the coordinates of the core tiles are fixed. The router at each tile can therefore hold the wired-only Manhattan distance from its own tile to every other core tile in the system. Using this information together with the network traffic information, the router decides on the best path based on the algorithm explained above.
The Manhattan distance between two points is defined as the sum of the horizontal and vertical distances between them. Mathematically it can be defined as:
d(S, D) = |Sx - Dx| + |Sy - Dy|     (3)
where (Sx, Sy) are the x and y coordinates of the source tile, (Dx, Dy) are the x and y coordinates of the destination tile, and d is the Manhattan distance.
In the WiNoC architecture the Manhattan distance is expressed as the number of hops from one tile to the next. In addition, one hop is counted for a data packet moving between a core tile and the WI at the center of its subnet.
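Equation (3) and the one-hop-per-WI-transfer rule combine into the wired-versus-wireless comparison used in the scenarios that follow. In this minimal sketch a wireless path always costs 3 hops: source tile to source WI, WI to WI, and destination WI to destination tile. A tie keeps the wired path. The function names are hypothetical.

```python
from typing import Tuple

Coord = Tuple[int, int]

WIRELESS_HOPS = 3  # source tile -> source WI -> destination WI -> destination tile


def manhattan(s: Coord, d: Coord) -> int:
    """Equation (3): d(S, D) = |Sx - Dx| + |Sy - Dy|, counted in mesh hops."""
    return abs(s[0] - d[0]) + abs(s[1] - d[1])


def prefer_wireless(s: Coord, d: Coord) -> bool:
    """True only when the WI path is strictly shorter than the wired path."""
    return WIRELESS_HOPS < manhattan(s, d)
```

Adjacent tiles (1 hop) and diagonal neighbors (2 hops) stay on the wired path. A 6-hop wired route, as in scenario 3, switches to the 3-hop wireless path.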
Case Scenario 1: Source and destination are in the same subnet 
 
Figure 8: network condition for the case scenario 1 
For scenario 1 (shown in figure 8), both the source and the destination are in the same subnet. Under the proposed routing scheme the data packet takes the wired path regardless of the source WI's VC buffer status. Since the WI is at the center of the subnet, a diagonal wired path between the source and the destination can be created via the WI. Therefore, under the proposed routing scheme any intra-subnet communication can be done in 1 hop via the wired path.
 
 
Case Scenario 2: Source and destination are in different subnets but are adjacent routers 
OR diagonal communication 
 
 
Figure 9: network condition for the case scenario 2 (a) adjacent (b) diagonal 
For scenario 2 (shown in figure 9(a)), the source and the destination tiles are in different subnets. However, when the router controller first calculates the Manhattan distance for the wired path, it finds that only one hop is required. Using the WIs would take 3 hops (source tile to source WI, WI to WI, and finally WI to the destination tile). The same holds for diagonal communication (shown in figure 9(b)), where the wired path takes 2 hops to complete the transaction instead of 3 hops via the WIs. Therefore, under the proposed routing scheme the router always takes the wired path for communication between adjacent routers in adjacent subnets and for diagonal communication as shown in figure 9(b).
 
 
 
 
Case Scenario 3: Source and destination are in a different subnet (free destination WI VC 
buffer) 
 
Figure 10: network condition for the case scenario 3 
For scenario 3 (shown in figure 10), the source and destination are in different subnets and both of their WIs are available for wireless communication. The wired-only path between the source and the destination requires 6 hops, whereas the path via the WIs takes only 3 hops. Thus, under the proposed routing scheme the wireless path is selected for this communication.
 
 
Case Scenario 4: Source and destination are in a different subnet (no free destination WI VC 
buffer) 
 
Figure 11: network condition for the case scenario 4 
For scenario 4 (shown in figure 11), the source WI is available for wireless transmission but the destination WI is not, because it has no free VC buffers. In this case the proposed routing scheme selects the available WI that most reduces the Manhattan distance to the destination tile.
Figure 11 shows that the WIs in subnets 7, 9, 13 and 14 are available for wireless transmission. Of those four, transferring the data packet to tile position (7,5) reduces the Manhattan distance to the destination tile (7,7) the most. Thus, the proposed routing scheme selects the WI in subnet 14 as the intermediary WI for its wireless communication.
Lastly, the proposed routing scheme always chooses the wired path in two cases. The first is when no wireless shortcut is available. The second is when the wired path and the wireless path reduce the Manhattan distance by the same number of hops. This reduces congestion on the wireless links and balances the traffic load between the wired and the wireless paths.
3.6: Simulation setup and methodology 
The performance of the proposed interconnect framework is evaluated based on the average packet latency for varying traffic load in the system. The average packet latency is the average number of cycles required for a data packet to travel from its source to its destination under saturated network conditions. All simulations were carried out in Noxim [59], a cycle-accurate network simulator developed in SystemC (a system-level modeling language based on C++). Noxim allows performance analysis of both conventional wired NoC and emerging WiNoC architectures over a range of network-on-chip parameters. Noxim was chosen because it allows the user to implement and analyze customized routing algorithms and selection strategies such as the one presented in this thesis work.
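The average packet latency defined above can be computed from per-packet injection and arrival times. This is a minimal sketch, not Noxim's statistics code. The input format and function name are hypothetical. The warm-up default of 1000 cycles mirrors the transient period discarded in the simulations described in this section.

```python
from typing import Iterable, Tuple


def average_packet_latency(packets: Iterable[Tuple[int, int]],
                           warmup_cycles: int = 1000) -> float:
    """Mean (arrival - injection) cycles over packets injected after warm-up.

    packets: (injection_cycle, arrival_cycle) pairs for delivered packets.
    Packets injected during the first warmup_cycles are ignored as transient.
    """
    latencies = [arr - inj for inj, arr in packets if inj >= warmup_cycles]
    if not latencies:
        raise ValueError("no packets delivered after the warm-up period")
    return sum(latencies) / len(latencies)
```

Filtering on injection time rather than arrival time keeps packets that entered an unloaded network during warm-up from lowering the average.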
To evaluate the performance of the proposed design, three different mesh network sizes were evaluated with three different traffic patterns and a varying packet injection rate (PIR). The PIR is measured as the number of packets injected per core tile per cycle (packet/tile/cycle). The network sizes considered in this thesis work were 6x6, 8x8 and 10x10 mesh networks, with 36, 64 and 100 core tiles respectively. The traffic patterns simulated were uniform random, transpose and hotspot traffic.
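Because PIR is expressed per tile and per cycle, it can be converted into the total load offered to the network. This puts it in the same units (flits/cycle) as the throughput results. This minimal sketch uses the 8-flit packet size from Table I. The function name is hypothetical.

```python
def offered_load_flits_per_cycle(pir: float, mesh_side: int,
                                 packet_flits: int = 8) -> float:
    """Flits offered to the whole mesh per cycle: PIR x tiles x flits/packet."""
    return pir * mesh_side * mesh_side * packet_flits
```

For example, at PIR = 0.01 an 8x8 mesh is offered 0.01 × 64 × 8 = 5.12 flits per cycle. In steady state the accepted throughput cannot exceed this value.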
The rest of the simulation parameters were kept constant throughout all the experiments, and Table I summarizes them. The energy values for the wireless transceiver and codecs are based on the designs in [13] in the 65nm technology node. These energy parameters are used to build the power model of the simulated WIs in the Noxim environment.
Table I: General and wireless configurations for simulation

Parameters (General)              | Value
System clock                      | 1 GHz
NoC router                        | 3-stage pipelined, 6 ports (including wireless)
VC number                         | 4
VC buffer size                    | 8 flits deep
Flit width                        | 64 bits
Packet size                       | 8 flits
Wired NoC links                   | 64 bits, single-cycle latency, 0.2 pJ/bit/mm

Parameters (Wireless)             | Value
OOK wireless transceiver          | 16 Gbps, 2.07 pJ/bit, OOK modulated at 60 GHz
Orthogonal codec and ADC          | 16 Gbps, 0.66 pJ/bit, OOK modulated with ADC and CDMA decoder [13] at 60 GHz
To and from tiles buffer number   | 4, each 8 flits deep
Tx and Rx buffer size             | 8 flits deep
The routing and path selection strategy was based on the routing scheme proposed in the previous section. The routing scheme and the MAC protocol were modeled in the Noxim environment. The simulations were executed for 10,000 cycles for each synthetic traffic pattern separately, with the first 1000 cycles discarded as the transient (warm-up) period. The performance of the proposed two-state hybrid WiNoC was compared with a traditional wired mesh NoC and the default mesh WiNoC in the Noxim environment. The default WiNoC system in Noxim uses a distance-based metric called the “delta” value to select wireless communication between two communicating tiles via the WIs. This delta value is fixed and does not allow the dynamic selection performed in this thesis work. Furthermore, the selection of the wireless path is deterministic. As long as the communicating WIs are separated by a distance greater than the delta value, the routers will always allocate the wireless path for such communication, without considering buffer availability in the WIs.
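The deterministic rule attributed to the default WiNoC can be written in a few lines, which makes the contrast with the proposed scheme explicit. This is a minimal sketch of the rule as described in the paragraph above, not Noxim's source code. The use of Manhattan distance between WI positions and the function name are assumptions.

```python
from typing import Tuple

Coord = Tuple[int, int]


def d_winoc_uses_wireless(src_wi: Coord, dst_wi: Coord, delta: int) -> bool:
    """Default-WiNoC rule as described: go wireless whenever the WIs are more
    than delta apart. Buffer occupancy at the destination WI is never checked."""
    distance = abs(src_wi[0] - dst_wi[0]) + abs(src_wi[1] - dst_wi[1])
    return distance > delta
```

The inputs contain no congestion information, so this rule keeps sending packets to a WI whose buffers are full. The proposed scheme consults the global congestion map before choosing the wireless path.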
According to the observations of Tang et al. [60], network congestion exhibits the following characteristics:
a. Observation 1: Congestion usually occurs at a subset of nodes in one or more local network regions.
b. Observation 2: Some particular communication pairs are affected by the congestion and hence have the longest delays.
c. Observation 3: The few nodes affected by the congestion contribute greatly to the overall global average delay of the system.
d. Observation 4: Under some routing and traffic conditions, the system throughput first increases. Once the saturation point is reached, further injection of packets causes the throughput to remain unchanged or decrease.
Based on these observations, uniform random, transpose and hotspot traffic were selected to observe the behavior of the developed hybrid WiNoC. The developed two-state hybrid WiNoC (Hy-WiNoC) was tested against a traditional wired mesh NoC (Wired) and the default WiNoC (D-WiNoC) scheme in Noxim (as explained above).
 
3.7: Performance evaluation under Uniform Random Traffic 
In uniform random traffic each node generates traffic for every other node in the system with equal probability. Figure 12 shows the improvement in the global average delay with varying packet injection rate for different network sizes when uniform random traffic was injected into the wired, D-WiNoC and Hy-WiNoC systems. Figure 13 shows the improvement in throughput for varying injection rate under uniform random traffic.
 
(Plots not recoverable from the extracted text; each panel shows Global Average Delay (Cycles) vs PIR (packet/tile/cycle) for Wired, D-WiNoC and Hy-WiNoC.)
Figure 12: Global average delay VS PIR for network sizes (a) 6x6 (b) 8x8 (c) 10x10 under Uniform Random Traffic
 
(Plots not recoverable from the extracted text; each panel shows Throughput (Flits/Cycle) vs PIR (packet/tile/cycle) for Wired, D-WiNoC and Hy-WiNoC.)
Figure 13: Throughput VS PIR for network sizes (a) 6x6 (b) 8x8 (c) 10x10 under Uniform Random Traffic
Based on the results presented in figures 12 and 13, the global average delay decreases from the wired mesh to the D-WiNoC, and a further decrease is seen from the D-WiNoC to the proposed Hy-WiNoC. Similarly, the throughput for all system sizes increases from the wired mesh to the D-WiNoC and from the D-WiNoC to the Hy-WiNoC. For the 6x6 network size, the global average delay decreases by 32% from the wired mesh to the D-WiNoC, and by almost 24% from the D-WiNoC to the proposed Hy-WiNoC. The throughput of the D-WiNoC is almost 1.4 times that of the wired mesh, and a further 25% improvement in throughput is seen from the D-WiNoC to the proposed Hy-WiNoC. For the 8x8 network size, the global average delay decreases by 10% from the wired mesh to the D-WiNoC, and by almost 11% from the D-WiNoC to the proposed Hy-WiNoC. The throughput of the D-WiNoC is almost 1.4 times that of the wired mesh, and a further 11% improvement in throughput is seen from the D-WiNoC to the proposed Hy-WiNoC. For the 10x10 network size, the global average delay decreases by 10% from the wired mesh to the D-WiNoC, and by almost 8% from the D-WiNoC to the proposed Hy-WiNoC. The throughput of the D-WiNoC is almost 1.6 times that of the wired mesh, and a further 26% improvement in throughput is seen from the D-WiNoC to the proposed Hy-WiNoC.
3.8: Performance evaluation under Transpose Traffic 
In transpose traffic the node (x,y) sends data packets only to the node (y,x). Figure 14 shows the improvement in the global average delay with varying packet injection rate for different network sizes. Figure 15 shows the improvement in throughput for varying injection rate under transpose traffic.
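The transpose pattern is a fixed source-to-destination mapping, sketched below. The thesis does not say how nodes on the diagonal (x = y) are handled, since they would send to themselves. Here they are treated as generating no network traffic, and that choice is an assumption. The function names are hypothetical.

```python
from typing import List, Tuple

Coord = Tuple[int, int]


def transpose_destination(src: Coord) -> Coord:
    """Transpose traffic: node (x, y) sends only to node (y, x)."""
    x, y = src
    return (y, x)


def transpose_pairs(mesh_side: int) -> List[Tuple[Coord, Coord]]:
    """All (source, destination) pairs; diagonal nodes (x == y) would send to
    themselves, so they are treated here as generating no network traffic."""
    return [((x, y), (y, x))
            for x in range(mesh_side) for y in range(mesh_side) if x != y]
```

On a wired mesh the path length for node (x, y) is 2|x - y| hops. Corner-to-corner pairs such as (0,7) and (7,0) in an 8x8 mesh need 14 hops, which makes long-range shortcuts attractive under this pattern.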
 
(Plots not recoverable from the extracted text; each panel shows Global Average Delay (Cycles) vs PIR (packet/tile/cycle) for Wired, D-WiNoC and Hy-WiNoC.)
Figure 14: Global average delay VS PIR for network sizes (a) 6x6 (b) 8x8 (c) 10x10 under Transpose Traffic
 
(Plots not recoverable from the extracted text; each panel shows Throughput (Flits/Cycle) vs PIR (packet/tile/cycle) for Wired, D-WiNoC and Hy-WiNoC.)
Figure 15: Throughput VS PIR for network sizes (a) 6x6 (b) 8x8 (c) 10x10 under Transpose Traffic
Based on the results presented in figures 14 and 15, the global average delay decreases from the wired mesh to the D-WiNoC, and a further decrease is seen from the D-WiNoC to the proposed Hy-WiNoC. Similarly, the throughput for all system sizes increases from the wired mesh to the D-WiNoC and from the D-WiNoC to the Hy-WiNoC. For the 6x6 network size, the global average delay decreases by 17% from the wired mesh to the D-WiNoC, and by almost 20% from the D-WiNoC to the proposed Hy-WiNoC. The throughput of the D-WiNoC is almost 1.5 times that of the wired mesh, and a further 20% improvement in throughput is seen from the D-WiNoC to the proposed Hy-WiNoC. For the 8x8 network size, the global average delay decreases by 26% from the wired mesh to the D-WiNoC, and by almost 18% from the D-WiNoC to the proposed Hy-WiNoC. The throughput of the D-WiNoC is almost 84% higher than that of the wired mesh, and a 17% improvement in throughput is seen from the D-WiNoC to the proposed Hy-WiNoC. For the 10x10 network size, the global average delay decreases by 16% from the wired mesh to the D-WiNoC, and by almost 10% from the D-WiNoC to the proposed Hy-WiNoC. The throughput of the D-WiNoC is almost 1.3 times that of the wired mesh, and a further 10% improvement in throughput is seen from the D-WiNoC to the proposed Hy-WiNoC.
3.9: Performance evaluation under Hotspot Traffic 
In hotspot traffic, four hotspot nodes were selected for each network size. 20% of all the traffic generated by the other core tiles is sent to these four cores, creating four separate, localized hotspots. The four hotspot nodes for the 6x6 network size were at mesh coordinate positions (1,1), (1,4), (4,1) and (4,4). The four hotspot nodes for the 8x8 network size were at mesh coordinate positions (1,1), (1,6), (6,1) and (6,6). The four hotspot nodes for the 10x10 network size were at mesh coordinate positions (1,1), (1,8), (8,1) and (8,8). Figure 16 shows the improvement in the global average delay with varying packet injection rate for different network sizes. Figure 17 shows the improvement in throughput for varying injection rate under hotspot traffic.
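The hotspot pattern can be sketched as a destination sampler. This is a minimal sketch, and Noxim's own hotspot generator may work differently. It makes two assumptions. First, the listed coordinates are read as 0-indexed (x, y), which makes the hotspots symmetric in each mesh. Second, the remaining 80% of packets go uniformly to the non-hotspot nodes, so that exactly 20% of the traffic reaches the hotspots. The function and constant names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Coord = Tuple[int, int]

# Hotspot nodes listed for the 6x6 mesh, read here as 0-indexed (x, y).
HOTSPOTS_6X6: List[Coord] = [(1, 1), (1, 4), (4, 1), (4, 4)]


def hotspot_destination(src: Coord, mesh_side: int, hotspots: Sequence[Coord],
                        hotspot_fraction: float, rng: random.Random) -> Coord:
    """Draw a destination: hotspot_fraction of packets go to a random hotspot,
    the rest go uniformly to the remaining non-hotspot nodes (never to src)."""
    if rng.random() < hotspot_fraction:
        choices = [h for h in hotspots if h != src]
    else:
        choices = [(x, y) for x in range(mesh_side) for y in range(mesh_side)
                   if (x, y) != src and (x, y) not in hotspots]
    return rng.choice(choices)
```

Passing an explicit `random.Random` instance keeps each simulation run reproducible for a given seed.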
 
(Plots not recoverable from the extracted text; each panel shows Global Average Delay (Cycles) vs PIR (packet/tile/cycle) for Wired, D-WiNoC and Hy-WiNoC.)
Figure 16: Global average delay VS PIR for network sizes (a) 6x6 (b) 8x8 (c) 10x10 under Hotspot Traffic
 
(Plots not recoverable from the extracted text; each panel shows Throughput (Flits/Cycle) vs PIR (packet/tile/cycle) for Wired, D-WiNoC and Hy-WiNoC.)
Figure 17: Throughput VS PIR for network sizes (a) 6x6 (b) 8x8 (c) 10x10 under Hotspot Traffic
Based on the results presented in figures 16 and 17, the global average delay decreases from the wired mesh to the D-WiNoC, and a further decrease is seen from the D-WiNoC to the proposed Hy-WiNoC. Similarly, the throughput for all system sizes increases from the wired mesh to the D-WiNoC and from the D-WiNoC to the Hy-WiNoC. For the 6x6 network size, the global average delay decreases by 30% from the wired mesh to the D-WiNoC, and by almost 29% from the D-WiNoC to the proposed Hy-WiNoC. The throughput of the D-WiNoC improves by 77% over the wired mesh, and a 42% improvement in throughput is seen from the D-WiNoC to the proposed Hy-WiNoC. For the 8x8 network size, the global average delay decreases by 23% from the wired mesh to the D-WiNoC, and by almost 25% from the D-WiNoC to the proposed Hy-WiNoC. The throughput of the D-WiNoC is almost 59% higher than that of the wired mesh, and a 34% improvement in throughput is seen from the D-WiNoC to the proposed Hy-WiNoC. For the 10x10 network size, the global average delay decreases by 24% from the wired mesh to the D-WiNoC, and by almost 18% from the D-WiNoC to the proposed Hy-WiNoC. The throughput of the D-WiNoC improves by 89% over the wired mesh, and a 50% improvement in throughput is seen from the D-WiNoC to the proposed Hy-WiNoC.
3.10: Energy consumption 
 
Figure 18: Total energy consumption for three simulated systems 
The energy consumption of all the simulated systems was evaluated in the Noxim environment using the simulator's power modeling tools. Using the energy parameters from Table I and previous research [13], the total energy consumed by each simulated system was evaluated. Figure 18 shows the total energy (in joules) consumed by the wired mesh, the default WiNoC (D-WiNoC) and the proposed hybrid WiNoC (Hy-WiNoC) for the 6x6, 8x8 and 10x10 network sizes during a simulation run of 10,000 iterations. Figure 18 shows that the traditional wired system has the highest energy consumption, followed by the proposed Hy-WiNoC, while the default WiNoC consumes the least energy. The increase in energy consumption from the D-WiNoC to the Hy-WiNoC arises because the Hy-WiNoC operates its WIs in two modes, whereas the D-WiNoC uses the WIs only under the token policy. The additional orthogonal-code based broadcasting in the control mode causes all the WIs in the system to be active simultaneously, and hence more energy is dissipated across the WIs. Based on the presented results, the proposed Hy-WiNoC consumes almost 60% less energy than its wired counterparts of the same size, and only 23% more energy than the D-WiNoC systems of the same size.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Chapter 4: FAULT TOLERANCE STUDY 
The previous chapter showed that the proposed routing scheme is able to detect congestion in the network and, based on the WIs' VC buffer space, choose the best path for each data packet. The same routing scheme can also be used to handle faulty WIs in the system. As described in the previous chapters, one bit of the control packet broadcast during control mode is allocated for the wireless hub (VC) status. This status bit can also be set to mark the WI as unavailable if the WI in that subnet becomes faulty. When the hub status bit marks a WI as unavailable, the routing scheme does not consider that faulty WI for wireless transmission of data packets. If the failure is transient or intermittent, the hub status bit can be toggled back once the WI recovers, so that wireless transmission to that subnet resumes. Thus, using the same routing scheme, the system can be made fault tolerant. The simulation results of the fault tolerance study are discussed below.
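The reuse of the status bit for faults can be sketched as follows. This is a minimal sketch. It assumes the same polarity as the VC status bit in the control packet (high means the WI must not be used). It does not model how the other WIs learn the status when the faulty WI's transceiver can no longer broadcast. The function names are hypothetical.

```python
from typing import Dict, List


def status_bit(has_free_vc: bool, faulty: bool) -> int:
    """Status bit a WI advertises; 1 marks it unusable for incoming wireless data."""
    return 1 if faulty or not has_free_vc else 0


def usable_wis(status_bits: Dict[int, int]) -> List[int]:
    """WIs the routing scheme may consider for wireless transmission."""
    return sorted(wi for wi, bit in status_bits.items() if bit == 0)
```

A transient fault on a WI removes it from the candidate set for as many control epochs as the fault lasts. It is considered again as soon as it advertises a low status bit.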
 
Figure 19: Fault modeling and the Hotspot tiles 
Figure 19 shows the faulty WIs that were modelled for the 8x8 system. The faulty WIs are in the four corner subnets of the proposed hybrid WiNoC network. The same four corner subnets were modelled as faulty for the fault tolerance study of the 10x10 system. A WI fault was modelled by setting the corresponding hub's reliability parameter in the Noxim simulator to zero. Furthermore, in order to model worst-case behavior, each tile marked in orange in figure 19 was modelled as a hotspot core tile. The 8x8 and 10x10 systems were each simulated for 10,000 iterations. The performance of the faulty D-WiNoC and the faulty Hy-WiNoC was then evaluated against their non-faulty counterparts.
 
(a) 
 
(b) 
Figure 20: 8x8 (a) Global Average Delay VS PIR (b) Throughput VS PIR 
(Plots not recoverable from the extracted text; (a) Global Average Delay (Cycles) and (b) Throughput (Flits/Cycle) vs PIR (packet/tile/cycle) for D-WiNoC, Hy-WiNoC, Faulty D-WiNoC and Faulty Hy-WiNoC.)
(a) 
 
(b) 
Figure 21: 10x10 (a) Global Average Delay VS PIR (b) Throughput VS PIR 
Figures 20 and 21 show the performance degradation of the faulty D-WiNoC and the proposed 
Hy-WiNoC for system sizes of 8x8 and 10x10, respectively. Both faulty networks show higher 
global average delay and lower throughput than their fault-free counterparts. 
For the 8x8 system, the global average delay of the faulty D-WiNoC increased by 15% and its 
throughput dropped by almost 30% compared to the fault-free D-WiNoC. The global average delay 
of the faulty Hy-WiNoC increased by 10% and its throughput dropped by 12% compared to the 
fault-free Hy-WiNoC. For the 10x10 system, the global average delay of the faulty 
D-WiNoC increased by 20% and its throughput dropped by almost 38% compared to the fault-free 
D-WiNoC. The global average delay of the faulty Hy-WiNoC increased by 11% and its throughput 
dropped by 23% compared to the fault-free Hy-WiNoC. For both network sizes, the proposed 
Hy-WiNoC was more resilient to faults than the default WiNoC under the same fault cases. This 
indicates that the routing scheme of the proposed Hy-WiNoC handles faults more effectively than 
the default WiNoC. 
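The degradation figures above are relative changes with respect to the fault-free baseline. The short Python sketch below states that percent-change definition and tabulates the percentages reported in this section to check the resilience comparison. The helper names `percent_change`, `REPORTED` and `more_resilient` are illustrative, and no raw simulation values are assumed beyond the reported percentages.

```python
def percent_change(baseline: float, faulty: float) -> float:
    """Signed percent change of a metric relative to its fault-free baseline."""
    return 100.0 * (faulty - baseline) / baseline


# Degradation reported in this section: (delay increase %, throughput drop %)
REPORTED = {
    ("8x8", "D-WiNoC"): (15, 30),
    ("8x8", "Hy-WiNoC"): (10, 12),
    ("10x10", "D-WiNoC"): (20, 38),
    ("10x10", "Hy-WiNoC"): (11, 23),
}


def more_resilient(size: str) -> bool:
    """True if Hy-WiNoC degrades less than D-WiNoC in both metrics for this system size."""
    d_delay, d_tput = REPORTED[(size, "D-WiNoC")]
    h_delay, h_tput = REPORTED[(size, "Hy-WiNoC")]
    return h_delay < d_delay and h_tput < d_tput


for size in ("8x8", "10x10"):
    print(size, more_resilient(size))  # prints True for both sizes
```

A positive `percent_change` on delay and a negative one on throughput both indicate degradation; the reported throughput figures are the magnitudes of the latter.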
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Chapter 5: CONCLUSION AND FUTURE WORK 
Wireless interconnection can be envisioned as an energy-efficient communication framework for 
current and future multicore systems. The key challenge for such a framework is to develop an 
underlying interconnection network that carries the maximum amount of information while creating 
the least congestion in the system. In this thesis, the proposed two-state hybrid WiNoC demonstrated 
this by outperforming both the traditional wired NoC and the non-deterministic WiNoC systems. 
The key contribution of this research is a new routing scheme that computes the best path for each 
packet based on global congestion information. This global congestion information is shared among 
all subnets by broadcasting it as control information over the wireless interconnect using an 
orthogonal code-based broadcast MAC protocol. Core data, on the other hand, is transmitted over 
both the wired and the wireless interconnects using the dynamic routing scheme and selection 
strategy, together with a token-based MAC protocol. Since the proposed system uses two different 
MAC protocols, this thesis also proposes a novel technique for switching between them in order to 
maintain coherency between the two operating modes of the system. Simulation results show that 
the proposed system achieves lower latency and higher throughput 
than a traditional wired NoC and non-deterministic WiNoC systems. One important observation is 
that, as the system size increased from 36 to 64 to 100 cores, the performance gain gradually 
decreased, which shows that the proposed system has scalability issues similar to those of other 
global congestion-aware routing schemes proposed in previous research [38, 39, 48]. 
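To make the idea of congestion-based path selection concrete, the sketch below runs Dijkstra's shortest-path algorithm over a small graph of subnets, where the cost of entering a subnet is a hop cost plus a penalty proportional to that subnet's entry in a globally shared congestion table. This is a generic illustration of global congestion awareness, not the routing and selection strategy proposed in this thesis; the graph, the congestion values, the `alpha` penalty factor and the function name `best_path` are all hypothetical. A wireless shortcut could be modelled as one more edge with its own hop cost.

```python
import heapq


def best_path(graph, congestion, src, dst, alpha=1.0):
    """Dijkstra over a subnet graph weighted by global congestion (illustrative only).

    graph: {node: [(neighbor, hop_cost), ...]}
    congestion: {node: congestion level}; missing nodes count as uncongested.
    Entering node v costs hop_cost + alpha * congestion[v].
    Returns (total_cost, path), or (inf, []) if dst is unreachable.
    """
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break  # the first time dst is popped, its cost is final
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, hop in graph.get(u, []):
            nd = d + hop + alpha * congestion.get(v, 0.0)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return float("inf"), []
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return dist[dst], path[::-1]


# Hypothetical example: two routes from subnet A to subnet D; subnet B is heavily congested.
graph = {"A": [("B", 1), ("C", 1)], "B": [("D", 1)], "C": [("D", 1)], "D": []}
congestion = {"B": 5.0, "C": 0.5}
print(best_path(graph, congestion, "A", "D"))  # (2.5, ['A', 'C', 'D'])
```

Because every node sees the same congestion table, a packet can be steered away from a congested region several hops before reaching it, which is the advantage global awareness has over purely local (neighbor-only) congestion information.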
The scalability issue arises from the sharing of global congestion information. As the system size 
increases, the global congestion information grows larger, so more clock cycles are spent 
broadcasting congestion information rather than transmitting actual data. Despite this drawback, 
the proposed system is well suited to systems with fewer than 100 cores because of its performance 
gain, and it is more resilient than traditional wired mesh NoCs and non-deterministic WiNoC 
systems. 
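The scalability argument can be illustrated with a back-of-the-envelope model. Assume, hypothetically, that every tile contributes a `bits_per_tile`-bit congestion value to each control-mode broadcast, that the control mode delivers `bits_per_cycle` congestion bits per clock cycle, and that one control broadcast precedes every `period` data-mode cycles. None of these parameters or function names is taken from this thesis; the sketch only shows why the share of cycles spent in control mode grows with core count.

```python
import math


def control_cycles(n_tiles: int, bits_per_tile: int, bits_per_cycle: int) -> int:
    """Cycles needed to broadcast one full snapshot of global congestion information."""
    return math.ceil(n_tiles * bits_per_tile / bits_per_cycle)


def control_overhead(n_tiles: int, bits_per_tile: int = 4,
                     bits_per_cycle: int = 16, period: int = 100) -> float:
    """Fraction of cycles spent in control mode when one snapshot precedes every
    `period` data cycles (all default values are hypothetical)."""
    c = control_cycles(n_tiles, bits_per_tile, bits_per_cycle)
    return c / (c + period)


for n in (36, 64, 100):
    print(n, control_cycles(n, 4, 16), round(control_overhead(n), 3))
# 36 9 0.083
# 64 16 0.138
# 100 25 0.2
```

Under these assumptions the snapshot size grows linearly with the number of tiles while the data window stays fixed, so the fraction of cycles left for data transmission shrinks as the system grows.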
Future work based on this thesis can address the scalability issues seen in global congestion-aware 
systems. One way to improve such systems is to find a more efficient encoding technique for the 
congestion information in the control mode. If a more compact encoding is used instead of the 
orthogonal code-based scheme, the number of cycles spent in the control mode can be reduced, 
leaving more cycles for the data mode. 
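For reference, the cost that this future work targets can be seen in a generic orthogonal-code (Walsh-Hadamard, CDMA-style) broadcast: with codes of length N, N transmitters can each send one bit simultaneously, but every bit occupies N chips on the channel, and N must grow with the number of transmitters. The sketch below builds Walsh codes with the Sylvester construction, superimposes the spread bits of several transmitters and recovers each bit by correlation. It assumes ideal, synchronized, noise-free chips, and the function names are hypothetical; it is not the specific encoding used in this thesis.

```python
def walsh_codes(n: int) -> list[list[int]]:
    """Sylvester construction of n mutually orthogonal +/-1 codes (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def broadcast(bits: list[int], codes: list[list[int]]) -> list[int]:
    """Transmitter i maps bit 0/1 to -1/+1 and spreads it with codes[i];
    the chips of all transmitters add on the shared channel (ideal channel assumed)."""
    length = len(codes[0])
    channel = [0] * length
    for i, bit in enumerate(bits):
        symbol = 1 if bit else -1
        for k in range(length):
            channel[k] += symbol * codes[i][k]
    return channel


def decode(channel: list[int], code: list[int]) -> int:
    """Correlate the received chips with one code; orthogonality cancels the others."""
    corr = sum(c * x for c, x in zip(channel, code))
    return 1 if corr > 0 else 0


codes = walsh_codes(8)          # 8 transmitters -> 8 chips per congestion bit
sent = [1, 0, 1, 1, 0, 0, 1, 0]
rx = broadcast(sent, codes)
print([decode(rx, codes[i]) for i in range(len(sent))])  # [1, 0, 1, 1, 0, 0, 1, 0]
```

Since the correlation with code i equals plus or minus N times that transmitter's symbol, decoding is exact here; the price is that the chip count per bit scales with the number of transmitters, which is the overhead a more compact encoding would aim to reduce.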
 
 
 
 
 
 
 
 
 
 
 
 
 
Bibliography: 
1 Moore's law - https://spectrum.ieee.org/semiconductors/devices/transistors-could-stop-
shrinking-in-2021 
2 S. Vangal, J. Howard, G. Ruhl, S. Dighe, H. Wilson, J. Tschanz, D. Finan, A. Singh, T. 
Jacob, S. Jain, V. Erraguntla, C. Roberts, Y. Hoskote, N. Borkar, and S. Borkar, "An 80-Tile 
Sub-100-W TeraFLOPS Processor in 65-nm CMOS," Solid-State Circuits, IEEE Journal of, 
vol. 43, no. 1, pp. 29-41, Jan. 2008. 
3 Van Tol, Michiel W., Roy Bakker, Merijn Verstraaten, Clemens Grelck, and Chris R. 
Jesshope. "Efficient memory copy operations on the 48-core intel scc processor." In 3rd 
Many-core Applications Research Community (MARC) Symposium, vol. 7598. KIT 
Scientific Publishing, 2011. 
4 A. Agarwal, "The Tile Processor: A 64-Core Multicore for Embedded Processing," in 
High-Performance Embedded Computing, 2007. 11th Annual Workshop on, Sep. 2007. 
5 Cavium Processors - https://www.marvell.com/server-processors/thunderx2-arm-
processors/ 
6 Ryzen threadripper - https://arstechnica.com/gadgets/2017/05/amd-ryzen-threadripper-
price-specs-release-date/ 
7 Pande, Partha Pratim, Cristian Grecu, Michael Jones, Andre Ivanov, and Resve Saleh. 
"Performance evaluation and design trade-offs for network-on-chip interconnect 
architectures." IEEE transactions on Computers 54, no. 8 (2005): 1025-1040. 
8 Ogras, Umit Y., and Radu Marculescu. "'It's a small world after all': NoC performance 
optimization via long-range link insertion." IEEE Transactions on very large scale 
integration (VLSI) systems 14, no. 7 (2006): 693-706. 
9 Kaliraj, Pradheep Khanna. "Reliability-performance trade-offs in photonic NOC 
architectures." (2013). 
10 Briere, Matthieu, Bruno Girodias, Youcef Bouchebaba, Gabriela Nicolescu, Fabien 
Mieyeville, Frédéric Gaffiot, and Ian O'Connor. "System level assessment of an optical 
NoC in an MPSoC platform." In 2007 Design, Automation & Test in Europe Conference & 
Exhibition, pp. 1-6. IEEE, 2007. 
11 Chang, M-CF, Ingrid Verbauwhede, Charles Chien, Zhiwei Xu, Jongsun Kim, Jenwei Ko, 
Qun Gu, and Bo-Cheng Lai. "Advanced RF/baseband interconnect schemes for inter-and 
intra-ULSI communications." IEEE Transactions on Electron devices 52, no. 7 (2005): 
1271-1285. 
12 Ko, Jenwei, Jongsun Kim, Zhiwei Xu, Qun Gu, Charles Chien, and M. Frank Chang. "An 
RF/baseband FDMA-interconnect transceiver for reconfigurable multiple access chip-to-
chip communication." In ISSCC. 2005 IEEE International Digest of Technical Papers. 
Solid-State Circuits Conference, 2005., pp. 338-602. IEEE, 2005. 
13 Vijayakumaran, Vineeth, Manoj Prashanth Yuvaraj, Naseef Mansoor, Nishad Nerurkar, 
Amlan Ganguly, and Andres Kwasinski. "CDMA enabled wireless network-on-chip." 
ACM Journal on Emerging Technologies in Computing Systems (JETC) 10, no. 4 (2014): 
28. 
14 Rahmani, Amir-Mohammad, Khalid Latif, Kameswar Rao Vaddina, Pasi Liljeberg, Juha 
Plosila, and Hannu Tenhunen. "Congestion aware, fault tolerant, and thermally efficient 
inter-layer communication scheme for hybrid NoC-bus 3D architectures." In Proceedings 
of the Fifth ACM/IEEE International Symposium on Networks-on-Chip, pp. 65-72. ACM, 
2011. 
15 Mo, Kwai Hung, Yaoyao Ye, Xiaowen Wu, Wei Zhang, Weichen Liu, and Jiang Xu. "A 
hierarchical hybrid optical-electronic network-on-chip." In 2010 IEEE Computer Society 
Annual Symposium on VLSI, pp. 327-332. IEEE, 2010. 
16 Razavi, Behzad. "Design of millimeter-wave CMOS radios: A tutorial." IEEE Transactions 
on Circuits and Systems I: Regular Papers 56, no. 1 (2009): 4-16. 
17 Deb, Sujay, Amlan Ganguly, Partha Pratim Pande, Benjamin Belzer, and Deukhyoun Heo. 
"Wireless NoC as interconnection backbone for multicore chips: Promises and challenges." 
IEEE Journal on Emerging and Selected Topics in Circuits and Systems 2, no. 2 (2012): 
228-239. 
18 Deb, Sujay, Kevin Chang, Xinmin Yu, Suman Prasad Sah, Miralem Cosic, Amlan 
Ganguly, Partha Pratim Pande, Benjamin Belzer, and Deukhyoun Heo. "Design of an 
energy-efficient CMOS-compatible NoC architecture with millimeter-wave wireless 
interconnects." IEEE Transactions on Computers 62, no. 12 (2013): 2382-2396. 
19 Deb, Sujay, Amlan Ganguly, Kevin Chang, Partha Pande, Benjamin Belzer, and Deuk Heo. 
"Enhancing performance of network-on-chip architectures with millimeter-wave wireless 
interconnects." In ASAP 2010-21st IEEE International Conference on Application-specific 
Systems, Architectures and Processors, pp. 73-80. IEEE, 2010. 
20 Lee, Suk-Bok, Sai-Wang Tam, Ioannis Pefkianakis, Songwu Lu, M. Frank Chang, 
Chuanxiong Guo, Glenn Reinman et al. "A scalable micro wireless interconnect structure 
for CMPs." In Proceedings of the 15th annual international conference on Mobile 
computing and networking, pp. 217-228. ACM, 2009. 
21 Hanson, George W. "Fundamental transmitting properties of carbon nanotube antennas." 
IEEE Transactions on antennas and propagation 53, no. 11 (2005): 3426-3435. 
22 Saxena, Sagar, Deekshith Shenoy Manur, Md Shahriar Shamim, and Amlan Ganguly. "A 
folded wireless network-on-chip using graphene based THz-band antennas." In Proceedings 
of the 4th ACM International Conference on Nanoscale Computing and Communication, p. 
29. ACM, 2017. 
23 Floyd, Brian A., and Chih-Ming Hung. "Intra-chip wireless interconnect for clock 
distribution implemented with integrated antennas, receivers, and transmitters." IEEE 
Journal of Solid-State Circuits 37, no. 5 (2002): 543-552. 
24 Zid, Mounir, Abdelkrim Zitouni, Adel Baganne, and Rached Tourki. "New generic GALS 
NoC architectures with multiple QoS." In International Conference on Design and Test of 
Integrated Systems in Nanoscale Technology, 2006. DTIS 2006., pp. 345-349. IEEE, 2006. 
25 Krishna, Tushar, Amit Kumar, Patrick Chiang, Mattan Erez, and Li-Shiuan Peh. "NoC with 
near-ideal express virtual channels using global-line communication." In 2008 16th IEEE 
Symposium on High Performance Interconnects, pp. 11-20. IEEE, 2008. 
26 Mansoor, Naseef, Abhishek Vashist, M. Meraj Ahmed, Md Shahriar Shamim, Syed Ashraf 
Mamun, and Amlan Ganguly. "A Traffic-Aware Medium Access Control Mechanism for 
Energy-Efficient Wireless Network-on-Chip Architectures." arXiv preprint 
arXiv:1809.07862 (2018). 
27 Hu, Jingcao, and Radu Marculescu. "DyAD: smart routing for networks-on-chip." In 
Proceedings of the 41st annual Design Automation Conference, pp. 260-263. ACM, 2004. 
28 Li, Ming, Qing-An Zeng, and Wen-Ben Jone. "DyXY: a proximity congestion-aware 
deadlock-free dynamic routing method for network on chip." In Proceedings of the 43rd 
annual Design Automation Conference, pp. 849-852. ACM, 2006. 
29 J. Wu, “A fault-tolerant and deadlock-free routing protocol in 2d meshes based on odd-
even turn model,” Computers, IEEE Transactions on, vol. 52, no. 9, pp. 1154–1169, Sept. 
2003. 
30 Dally, William J., and Brian Towles. "Route packets, not wires: on-chip interconnection 
networks." In Proceedings of the 38th annual Design Automation Conference, pp. 684-689. 
ACM, 2001. 
31 Ho, Ron, Kenneth W. Mai, and Mark A. Horowitz. "The future of wires." Proceedings of 
the IEEE 89, no. 4 (2001): 490-504. 
32 Ganguly, Amlan, Kevin Chang, Sujay Deb, Partha Pratim Pande, Benjamin Belzer, and 
Christof Teuscher. "Scalable hybrid wireless network-on-chip architectures for multicore 
systems." IEEE Transactions on Computers 60, no. 10 (2011): 1485-1502. 
33 Vidapalapati, Anuroop, Vineeth Vijayakumaran, Amlan Ganguly, and Andres Kwasinski. 
"NoC architectures with adaptive code division multiple access based wireless links." In 
2012 IEEE International Symposium on Circuits and Systems, pp. 636-639. IEEE, 2012. 
34 Feige, Uriel, and Prabhakar Raghavan. "Exact analysis of hot-potato routing." In 
Proceedings., 33rd Annual Symposium on Foundations of Computer Science, pp. 553-562. 
IEEE, 1992. 
35 Nilsson, Erland, Mikael Millberg, Johnny Oberg, and Axel Jantsch. "Load distribution with 
the proximity congestion awareness in a network on chip." In 2003 Design, Automation 
and Test in Europe Conference and Exhibition, pp. 1126-1127. IEEE, 2003. 
36 Gratz, Paul, Boris Grot, and Stephen W. Keckler. "Regional congestion awareness for load 
balance in networks-on-chip." In 2008 IEEE 14th International Symposium on High 
Performance Computer Architecture, pp. 203-214. IEEE, 2008. 
37 Ebrahimi, Masoumeh, Masoud Daneshtalab, Pasi Liljeberg, Juha Plosila, and Hannu 
Tenhunen. "CATRA-congestion aware trapezoid-based routing algorithm for on-chip 
networks." In 2012 Design, Automation & Test in Europe Conference & Exhibition 
(DATE), pp. 320-325. IEEE, 2012. 
38 Ma, Sheng, Natalie Enright Jerger, and Zhiying Wang. "DBAR: an efficient routing 
algorithm to support multiple concurrent applications in networks-on-chip." In ACM 
SIGARCH Computer Architecture News, vol. 39, no. 3, pp. 413-424. ACM, 2011. 
39 Ramakrishna, Mukund, Vamsi Krishna Kodati, Paul V. Gratz, and Alexander Sprintson. 
"GCA: Global congestion awareness for load balance in networks-on-chip." IEEE 
Transactions on Parallel and Distributed Systems 27, no. 7 (2016): 2022-2035. 
40 Valinataj, Mojtaba, Siamak Mohammadi, Juha Plosila, and Pasi Liljeberg. "A fault-tolerant 
and congestion-aware routing algorithm for networks-on-chip." In 13th IEEE Symposium 
on Design and Diagnostics of Electronic Circuits and Systems, pp. 139-144. IEEE, 2010. 
41 Hosseini, Amir, Tamer Ragheb, and Yehia Massoud. "A fault-aware dynamic routing 
algorithm for on-chip networks." In 2008 IEEE International Symposium on Circuits and 
Systems, pp. 2653-2656. IEEE, 2008. 
42 Zhu, Haibo, Partha Pratim Pande, and Cristian Grecu. "Performance evaluation of adaptive 
routing algorithms for achieving fault tolerance in NoC fabrics." In 2007 IEEE 
International Conf. on Application-specific Systems, Architectures and Processors (ASAP), 
pp. 42-47. IEEE, 2007. 
43 Charif, Amir, Nacer-Eddine Zergainoh, and Michael Nicolaidis. "Addressing transient 
routing errors in fault-tolerant Networks-on-Chips." In 2016 21th IEEE European Test 
Symposium (ETS), pp. 1-6. IEEE, 2016. 
44 Ahmed, Akram Ben, and Abderazek Ben Abdallah. "Adaptive fault-tolerant architecture 
and routing algorithm for reliable many-core 3D-NoC systems." Journal of Parallel and 
Distributed Computing 93 (2016): 30-43. 
45 Meyer, Michael Conrad, Akram Ben Ahmed, Yuki Tanaka, and Abderazek Ben Abdallah. 
"On the Design of a Fault-Tolerant Photonic Network-on-Chip." In 2015 IEEE 
International Conference on Systems, Man, and Cybernetics, pp. 821-826. IEEE, 2015. 
46 Ganguly, Amlan, Paul Wettin, Kevin Chang, and Partha Pande. "Complex network inspired 
fault-tolerant NoC architectures with wireless links." In Proceedings of the fifth 
ACM/IEEE International Symposium on Networks-on-Chip, pp. 169-176. ACM, 2011. 
47 Mortazavi, Seyed Hassan, Reza Akbar, Farshad Safaei, and Amin Rezaei. "A fault-tolerant 
and congestion-aware architecture for wireless networks-on-chip." Wireless Networks: 1-
13. 
48 R. Manevich, I. Cidon, A. Kolodny, I. Walter, and S. Wimer, "A cost-effective centralized 
adaptive routing for networks-on-chip," in DSD, 2011. 
49 R. S. Ramanujam and B. Lin, "Destination-based adaptive routing on 2D mesh networks," in 
ANCS, 2010. 
50 Branch, J., X. Guo, L. Gao, A. Sugavanam, and J-J. Lin. "Wireless communication in a 
flip-chip package using integrated antennas on silicon substrates." IEEE Electron Device 
Letters 26, no. 2 (2005): 115-117. 
51 Chang, Kevin, Sujay Deb, Amlan Ganguly, Xinmin Yu, Suman Prasad Sah, Partha Pratim 
Pande, Benjamin Belzer, and Deukhyoun Heo. "Performance evaluation and design trade-
offs for wireless network-on-chip architectures." ACM Journal on Emerging Technologies 
in Computing Systems (JETC) 8, no. 3 (2012): 23. 
52 Shamim, Md Shahriar, Naseef Mansoor, Rounak Singh Narde, Vignesh Kothandapani, 
Amlan Ganguly, and Jayanti Venkataraman. "A wireless interconnection framework for 
seamless inter and intra-chip communication in multichip systems." IEEE Transactions on 
Computers 66, no. 3 (2017): 389-402. 
53 Lin, Jau-Jr, Hsin-Ta Wu, Yu Su, Li Gao, Aravind Sugavanam, and Joe E. Brewer. 
"Communication using antennas fabricated in silicon integrated circuits." IEEE Journal of 
solid-state circuits 42, no. 8 (2007): 1678-1687. 
54 Shinde, Tanmay, Suryanarayanan Subramaniam, Padmanabh Deshmukh, M. Meraj Ahmed, 
Mark Indovina, and Amlan Ganguly. "A 0.24 pJ/bit, 16Gbps OOK Transmitter Circuit in 
45-nm CMOS for Inter and Intra-Chip Wireless Interconnects." In Proceedings of the 2018 
on Great Lakes Symposium on VLSI, pp. 69-74. ACM, 2018. 
55 Yu, Xinmin, Hooman Rashtian, Shahriar Mirabbasi, Partha Pratim Pande, and Deukhyoun 
Heo. "An 18.7-Gb/s 60-GHz OOK demodulator in 65-nm CMOS for wireless network-on-
chip." IEEE Transactions on Circuits and Systems I: Regular Papers 62, no. 3 (2015): 799-
806. 
56 Wang, Xin, Tapani Ahonen, and Jari Nurmi. "Applying CDMA technique to network-on-
chip." IEEE transactions on very large scale integration (VLSI) systems 15, no. 10 (2007): 
1091-1100. 
57 Ganguly, Amlan, M. Ahmed, Rounak Singh Narde, Abhishek Vashist, Md Shamim, Naseef 
Mansoor, Tanmay Shinde et al. "The Advances, Challenges and Future Possibilities of 
Millimeter-Wave Chip-to-Chip Interconnections for Multi-Chip Systems." Journal of Low 
Power Electronics and Applications 8, no. 1 (2018): 5. 
58 Duato, J. Interconnection Networks: An Engineering Approach. Morgan Kaufmann Publishers 
Inc., USA, 2002. 
59 Catania, Vincenzo, Andrea Mineo, Salvatore Monteleone, Maurizio Palesi, and Davide 
Patti. "Cycle-accurate network on chip simulation with noxim." ACM Transactions on 
Modeling and Computer Simulation (TOMACS) 27, no. 1 (2016): 4. 
60 Tang, Minghua, Xiaola Lin, and Maurizio Palesi. "Local congestion avoidance in Network-
on-Chip." IEEE Transactions on Parallel and Distributed Systems 27, no. 7 (2016): 2062-
2073. 
 
 
 
 
 
 
 
 
 
