Rochester Institute of Technology

RIT Scholar Works
Theses
1-2014

Design Trade-offs for reliable On-Chip Wireless Interconnects in
NoC Platforms
Manoj Prashanth Yuvaraj

Recommended Citation
Yuvaraj, Manoj Prashanth, "Design Trade-offs for reliable On-Chip Wireless Interconnects in NoC
Platforms" (2014). Thesis. Rochester Institute of Technology. Accessed from


Design Trade-offs for reliable On-Chip Wireless
Interconnects in NoC Platforms
by

Manoj Prashanth Yuvaraj
A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of
Master of Science in Computer Engineering
Supervised by
Dr. Amlan Ganguly
Department of Computer Engineering
Kate Gleason College of Engineering
Rochester Institute of Technology
Rochester, NY
January, 2014

Approved By:

Dr. Amlan Ganguly
Primary Advisor – R.I.T. Dept. of Computer Engineering

Dr. Andres Kwasinski
Secondary Advisor – R.I.T. Dept. of Computer Engineering

Dr. Muhammad Shaaban
Secondary Advisor – R.I.T. Dept. of Computer Engineering

Dedication

I would like to dedicate this thesis to my parents Mr. A.R. Yuvaraj and Mrs. Sathya
Praba Yuvaraj and my brother Madan Kumar Yuvaraj who have supported me from
the beginning of this journey. I would also like to dedicate this to my mentor and all
my friends who have been a great source of motivation and inspiration.


Acknowledgements

I take this opportunity to express my profound gratitude and deep
regards to my primary advisor Dr. Amlan Ganguly for his exemplary
guidance, monitoring and constant encouragement throughout this thesis.
Dr. Ganguly dedicated his valuable time to review my work constantly and
provide valuable suggestions which helped in overcoming many obstacles
and keeping the work on the right track. I would like to express my deepest
gratitude to Dr. Andres Kwasinski and Dr. Muhammad Shaaban for sharing
their thoughts and suggesting valuable ideas which have had significant
impact on this thesis. I am grateful for their valuable time and cooperation
during the course of this thesis. I also take this opportunity to thank my
research group members for all the constant support and help provided by
them.
Lastly, I would like to thank my family and friends for their constant
motivation, encouragement and heartfelt support during the course of this
work.


Abstract
The massive levels of integration following Moore’s Law have made modern
multi-core chips prevalent in domains ranging from scientific and bioinformatics
applications to consumer electronics. As more and more cores are placed on the
same die, traditional bus-based interconnections are no longer a scalable
communication infrastructure. On-chip networks were proposed to enable a
scalable plug-and-play mechanism for interconnecting hundreds of cores on the
same chip. The wired interconnects between the cores in a traditional
Network-on-Chip (NoC) system become a bottleneck as the number of cores
increases, raising the latency and energy needed to transmit signals over them.
Hence, many alternative interconnect technologies have been proposed, namely
3D, photonic and multi-band RF interconnects. Although they provide better
connectivity, higher speed and higher bandwidth than wired interconnects, they
also face challenges with heat dissipation and manufacturing difficulties.
On-chip wireless interconnects are another proposed alternative, which needs no
physical interconnection layout because data travels over the wireless medium.
They are integrated into a hybrid NoC architecture consisting of both wired and
wireless links, which provides higher bandwidth, lower latency, lower area
overhead and reduced energy dissipation in communication. An efficient media
access control (MAC) scheme is required to enhance the utilization of the
available bandwidth. A token-passing protocol has been proposed to grant access
to the wireless channel to competing transmitters. This limits the number of
simultaneous users of the communication channel to one, even though multiple
wireless hubs are deployed over the chip. In principle, a Frequency Division
Multiple Access (FDMA) based medium access scheme would improve the utilization
of the wireless resources. However, this requires the design of multiple very
precise, high-frequency transceivers in non-overlapping frequency channels.
Therefore, the scalability of this approach is limited by the state of the art
in transceiver design. Code Division Multiple Access (CDMA) enables multiple
transmitter–receiver pairs to send data over the wireless channel
simultaneously. The CDMA protocol can significantly increase the performance of
the system while lowering the energy dissipation in data transfer. The CDMA
based MAC protocol outperforms its wired counterparts and several other wireless
architectures proposed in the literature in terms of bandwidth and packet energy
dissipation.
However, the reliability of CDMA based wireless NoCs is limited, as the
probability of error is significant due to synchronization delays at the
receiver. This thesis proposes the use of an advanced filter which improves the
performance and also reduces the error due to synchronization delays. This
thesis also proposes an investigation of various channel modulation schemes on
token-passing wireless NoCs to examine the performance and reliability of the
system. The trade-offs between performance and energy are established for the
various conditions. The results are obtained using a modified cycle-accurate
simulator.


Contents
Dedication
Acknowledgements
Abstract
Chapter 1. Introduction
1.1. End of Uniprocessor systems
1.2. Early multi-core interconnections
1.3. Network-on-Chip Paradigm (NoC)
1.4. Switching techniques
1.5. Emerging interconnects
1.6. Thesis Contributions
Chapter 2. Related Work
Chapter 3. Network Architecture
3.1. Small World topology
3.2. On-chip Antennas
3.3. CDMA Based Wireless Interconnects
3.4. CDMA MAC protocol
3.5. Adaptive CDMA protocol
3.6. Data Routing
3.7. Performance Metrics
3.8. Achievable Bandwidth of the CDMA-WiNoC
3.9. Packet Energy Dissipation
3.10. Performance Evaluation with varying Packet Size
3.11. Performance Evaluation with Non-Uniform Traffic
3.12. Area Overheads
Chapter 4. Reliability analysis of the CDMA WiNoC
4.1. Interference of a Matched Filter
4.2. Interference Suppression using Advanced Decoder
4.3. Advance Modulation Schemes
Chapter 5. Conclusion and future work
5.1. Summary
5.2. Future Work
Bibliography


List of Figures
Figure 1-1: Transistor count & Moore’s law (reproduced from [1])
Figure 1-2: Network-on-Chip architecture
Figure 1-3: Network Switch with virtual channels
Figure 3-1: CDMA-WiNoC Architecture
Figure 3-2: (a) On-chip metal zig-zag antenna (reproduced from [3]) (b) On-chip antenna placement on the die (reproduced from [23])
Figure 3-3: The adopted CDMA (a) encoder and (b) decoder
Figure 3-4: Peak achievable bandwidth of mesh, SWNoC and CDMA-WiNoCs for 64, 128 and 256 core systems
Figure 3-5: Packet energy dissipation of mesh, SWNoC and CDMA-WiNoCs for 64, 128 and 256 core systems
Figure 3-6: Energy breakup comparison for (a) Mesh (b) SWNoC and (c) CDMA-WiNoC
Figure 3-7: Peak Bandwidth and Packet Energy dissipation comparison of various WiNoCs with 64 cores
Figure 3-8: Packet Energy dissipation and Throughput comparison of various packet sizes with 64 cores
Figure 3-9: Bandwidth comparison of various non-uniform traffic in CDMA-WiNoC with 64 cores
Figure 3-10: Area overheads of various WiNoCs with 64 cores
Figure 4-1: The CDMA channel model
Figure 4-2: The effects of delay between two CDMA signals in misaligning the spreading code sequences
Figure 4-3: SIR (dB) vs
Figure 4-4: Flow diagram of CDMA decoder with MVDR filter
Figure 4-5: (a) SIR (dB) vs Chip Shifts (b) SINR (dB) vs Chip Shifts
Figure 4-6: Energy Comparison between matched filter and MVDR filter
Figure 4-7: Bandwidth and throughput of BPSK, QPSK and 16-QAM modulation schemes


List of Tables
Table 1: Characteristics of the Transceiver Modules
Table 2: Percentage of busy and idle cycles in a 64-core system given default problem sizes
Table 3: Comparison of Area overhead of WiNoCs


Chapter 1. Introduction
The integration of transistors on a single chip is increasing at a massive scale
following Moore’s law. While moving into the billion-transistor era, chip designers
are encouraged to come up with computationally powerful processors pervasive in
several domains ranging from astrophysics, weather forecasting, and bioinformatics
to consumer electronics.

1.1. End of Uniprocessor systems
Figure 1-1 plots transistor count against year of introduction. It shows that
over the past few years there has been a shift towards incorporating multiple
cores on a chip instead of the traditional single-processor system [1].
The traditional method of increasing the operating frequency of a uniprocessor
system to achieve better performance has reached its limit due to soaring power
dissipation: raising the frequency increases both power density and switching
activity, and hence power dissipation. Designers have therefore shifted to
placing multiple cores on a single chip. This avoids the frequency limit and
increases the computational power of the system by introducing core-level
parallelism. However, it also poses the new challenge of interconnecting these
cores. As the number of cores increases, the interconnection topology becomes
critical in determining system performance.


Figure 1-1: Transistor count & Moore’s law (reproduced from [1])

1.2. Early multi-core interconnections
Initial dual-core systems such as the IBM Power4/5 and Intel Pentium Core Duo
had their cores communicate with each other through shared memory. However, as
the number of cores increases, more sophisticated interconnects are needed to
deliver better performance in terms of throughput and latency.
Current multi-core systems predominantly use shared-bus or peer-to-peer
architectures for interconnecting the cores on a chip. These architectures work
well for a few cores. However, as the number of cores increases in a shared-bus
architecture, the bandwidth available to each core shrinks, resulting in lower
throughput and increased contention delays. Hence the system does not scale
beyond a certain limit. A peer-to-peer network provides good connectivity among
cores, but it is impractical to scale beyond a certain limit because of the huge
amount of wiring needed to connect all the cores together. Hence there is a need
for a scalable interconnect system that addresses these issues. In addition,
wire delay grows with wire length and can exceed a single clock cycle, which
makes it hard to maintain a globally synchronized system.
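The scaling argument above can be made concrete with a back-of-the-envelope sketch. It assumes, purely for illustration, that a shared bus divides its total bandwidth equally among the cores and that a peer-to-peer network uses one dedicated link per pair of cores; the function names and the 64 Gbps figure are hypothetical and do not come from the thesis.

```python
def bus_bandwidth_per_core(total_bandwidth_gbps: float, num_cores: int) -> float:
    """Average bandwidth per core when one bus is shared equally (assumption)."""
    return total_bandwidth_gbps / num_cores


def fully_connected_links(num_cores: int) -> int:
    """Dedicated links needed to connect every pair of cores directly."""
    return num_cores * (num_cores - 1) // 2


# Illustrative sweep: per-core bus bandwidth falls as 1/N,
# while the peer-to-peer wiring count grows roughly as N^2 / 2.
for n in (2, 8, 64, 256):
    print(n, bus_bandwidth_per_core(64.0, n), fully_connected_links(n))
```

Under these assumptions a 64-core system is left with 1/64 of the bus bandwidth per core, or needs 2016 pairwise links, which is the kind of growth that motivates a packet-switched on-chip network.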

1.3. Network-on-Chip Paradigm (NoC)
To overcome the above-mentioned problems, research has been under way to
develop a communication-centric approach to integrating cores on a single chip.
This new approach of designing scalable communication fabrics between the cores
is called the Network-on-Chip (NoC) paradigm [2]. It separates the processing
elements (i.e. cores) from the communication network, as shown in Figure 1-2. It
is a scalable plug-and-play system in which the communication infrastructure is
isolated from the functionality of the cores. The data, usually in packetized
form, is routed between cores over a dedicated interconnection network
consisting of network switches and inter-switch links. Such an approach
facilitates reusability and interoperability of the modules by defining standard
interfaces.


1.4. Switching techniques
For a conventional NoC system, three types of switching can be considered for
data routing: circuit switching, packet switching and wormhole switching.
In circuit-switched networks, a dedicated path is reserved for the complete
duration of the transmission. Although the network bandwidth is guaranteed
during the transmission, the scheme is highly inefficient when many nodes are
waiting to transmit along the same path, which eventually degrades system
performance.
In packet switching, data is divided into packets and sent over the network to
the destination. Although no path is reserved for the transmission, the packets
need to be buffered in the switches along the path to the destination. In an
SoC, this means more area overhead for the switches, which is not acceptable as
on-chip silicon real estate is limited.
In this research work, wormhole switching is adopted, wherein packets are
divided into small units called flow control units or flits. The flit size is
chosen such that a single flit can traverse a single hop in a single clock
cycle. These flits are transmitted along the network across switches, so the
large buffer requirement of the switches is avoided. The first flit, or header
flit, of a packet contains the routing information. This information enables the
switches to set up the path, and the rest of the flits follow this path in a
pipelined fashion [2]. A problem with this switching technique is that distinct
messages cannot be sent over a switch at the same time, as the path is reserved
for a particular packet until it is completely transmitted. To solve this
problem, the concept of virtual channels was introduced.
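The division of a packet into flits can be sketched as follows. This is a minimal illustration, not the simulator used in this work: the `Flit` class, the 4-byte flit size, and the explicit "tail" marker on the last flit are assumptions introduced here for clarity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Flit:
    kind: str       # "header", "body", or "tail" (tail marker is an assumption)
    payload: bytes  # routing information for the header, data otherwise


def packetize(dest: int, data: bytes, flit_bytes: int = 4) -> List[Flit]:
    """Split a packet into a header flit followed by fixed-size data flits."""
    # The header flit carries the routing information (here just the destination).
    header = Flit("header", dest.to_bytes(flit_bytes, "big"))
    chunks = [data[i:i + flit_bytes] for i in range(0, len(data), flit_bytes)]
    body = [Flit("body", chunk) for chunk in chunks]
    if body:
        body[-1].kind = "tail"  # the last flit would release the reserved path
    return [header] + body


flits = packetize(dest=5, data=b"abcdefghij")
```

Only the header needs to be inspected by each switch; the remaining flits simply follow the path it set up, which is why per-switch buffering can stay small.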

Figure 1-2: Network-on-Chip architecture

A virtual path is reserved for each distinct message. This is accomplished by
reserving separate buffers for each message in all the switches along the path,
forming a distinct virtual path for each message. Figure 1-3 shows a block
diagram of how this is accomplished. Here node A and node B are allocated
separate buffers along the path, which enables the switch to receive and send
messages from both nodes simultaneously using a multiplexer.
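A toy model of such a switch port is sketched below. It assumes one buffer per source and a round-robin multiplexer that forwards one flit per cycle; the class name and the round-robin policy are illustrative assumptions, since the text does not specify the arbitration scheme.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class VirtualChannelSwitch:
    """One output link shared by several per-message buffers (virtual channels)."""

    def __init__(self, sources: List[str]) -> None:
        self.buffers: Dict[str, Deque[str]] = {s: deque() for s in sources}
        self._order = list(sources)
        self._next = 0  # index of the buffer to try first on the next cycle

    def receive(self, source: str, flit: str) -> None:
        """Store an incoming flit in the buffer reserved for its source."""
        self.buffers[source].append(flit)

    def send_one(self) -> Optional[Tuple[str, str]]:
        """Multiplexer: forward one flit, round-robin over non-empty buffers."""
        for offset in range(len(self._order)):
            idx = (self._next + offset) % len(self._order)
            src = self._order[idx]
            if self.buffers[src]:
                self._next = (idx + 1) % len(self._order)
                return src, self.buffers[src].popleft()
        return None  # nothing buffered this cycle
```

With buffers for nodes A and B, flits from the two messages interleave on the output link instead of B waiting until A's packet has completely passed.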

Figure 1-3: Network Switch with virtual channels

1.5. Emerging interconnects
In the future, as technology shrinks, longer wired interconnects will result in
higher power dissipation and delay. According to the International Technology
Roadmap for Semiconductors (ITRS), interconnects are the major bottleneck to
overcoming the power-performance barrier in future generations. This clearly
indicates the challenges facing future chip designers with the traditional
scaling of conventional on-chip metal/dielectric based interconnects. To enhance
the performance of such conventional interconnect-based multi-core chips, a few
radically different interconnect technologies are currently being explored, such
as 3D integration, photonic interconnects and multi-band RF or wireless
interconnects [3, 4]. All these new technologies, along with appropriate
signaling techniques, have been predicted to be capable of enabling multi-core
NoC designs that significantly improve the speed and energy dissipation of data
transfer. However, these alternative interconnect paradigms are in their
formative stage and need to overcome significant challenges pertaining to
integration and reliability. The following paragraphs briefly explain each of
these emerging interconnects.
Three-dimensional integration consists of integrating multiple active layers
onto a single chip. One advantage is a lower hop count, due to the reduction in
the number of interconnections and their average lengths. But as claimed in [2],
it has its disadvantages as well. Due to the smaller footprint, the power
density of a 3D structure is high, which causes high heat dissipation. There are
also technological challenges in fabricating such structures, such as thinning
of wafers, inter-device layer alignment, bonding, and interlayer contact
patterning [5]. Moreover, 3D integration increases the risk of manufacturing
defects and demands new CAD tools that support it.
In photonic interconnects, wired interconnects are replaced with optical
interconnects. It is stated in [6] that such interconnects would considerably
enhance the bandwidth and decrease latency, as the data would be transmitted at
the speed of light. In [7] it is mentioned that, due to the low loss in optical
waveguides, data could be transmitted from one end to the other without the need
for regeneration and buffering. However, the technology needed to manufacture
these photonic devices is still in its preliminary stages. Integrating these
devices with silicon-compatible circuits under the constraints of area, delay
and other performance metrics also remains a challenge.
Normally, in a wired interconnect, data is transmitted by charging and
discharging wires to a certain voltage signifying a ‘0’ or a ‘1’. However, in
multi-band RF interconnects, data is transmitted by sending electromagnetic (EM)
waves along the wires, which act as transmission lines. The data is modulated
onto a carrier using amplitude or phase shift keying [8]. Data bandwidth over
the wire can be increased by combining multiple non-overlapping carriers onto a
single transmission line. Also, EM waves travel at the speed of light. Hence,
low-latency and high-bandwidth communication can be achieved. However, designing
high-frequency oscillators and filters on the chip for the transceivers is a
non-trivial challenge.
On-chip wireless interconnects are an alternative to wired links in which long
wired paths are replaced with wireless interconnects. In addition to offering
better bandwidth utilization and lower delays, and avoiding the cross-talk
interference of wired interconnects, they stand out from the rest of the
emerging interconnects in that they do not need physical interconnection
layouts, as data travels in free space.
Long-range wireless shortcuts between distant cores on the chip were introduced
in [3]. However, the limited bandwidth of the wireless channels at such high
frequencies limits the achievable performance benefits. In this work, Code
Division Multiple Access (CDMA) based long-range wireless links are used to
enable multiple transmitters to share the wireless channel simultaneously. It is
already shown in [4] that increasing the number of parallel communications over
the wireless channel can improve performance, as the network is better connected
even though the total wireless bandwidth is shared between the links. However,
the reliability of CDMA based wireless NoCs is limited, as the probability of
error is significant due to synchronization delays at the receiver. Hence there
is a need for different modulation schemes and advanced transceiver designs,
which are more efficient.
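How several transmitters can share one channel at the same time may be easier to see in a small sketch. It assumes length-4 Walsh codes, bipolar (±1) chips, and an ideal, perfectly synchronized, noise-free channel that simply adds the transmitted chips; these choices are illustrative assumptions and are not taken from the thesis.

```python
from typing import List

# Length-4 Walsh codes (rows of a Hadamard matrix); any two rows are orthogonal.
WALSH4 = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]


def spread(bit: int, code: List[int]) -> List[int]:
    """Map bit 0/1 to -1/+1 and multiply it by every chip of the code."""
    symbol = 1 if bit else -1
    return [symbol * c for c in code]


def channel_sum(signals: List[List[int]]) -> List[int]:
    """Idealized shared medium: chips from all active transmitters add up."""
    return [sum(chips) for chips in zip(*signals)]


def despread(received: List[int], code: List[int]) -> int:
    """Correlate with the receiver's own code; the sign gives the bit."""
    corr = sum(r * c for r, c in zip(received, code))
    return 1 if corr > 0 else 0


# Three transmitters send at once, each with its own code.
sent = {1: 1, 2: 0, 3: 1}  # code index -> transmitted bit
received = channel_sum([spread(bit, WALSH4[k]) for k, bit in sent.items()])
decoded = {k: despread(received, WALSH4[k]) for k in sent}
```

Because the codes are orthogonal when aligned, each receiver recovers its own bit from the summed signal. The sketch deliberately ignores the synchronization delays that the text identifies as the reliability problem.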

1.6. Thesis Contributions
In this thesis it will be demonstrated that, by using various modulation
schemes, wireless NoCs can be designed to achieve higher throughput and
dissipate less energy than their conventional wired/wireless counterparts
without significant area overheads. The MVDR filter will be implemented and
shown to be more effective than the previous CDMA decoder. Furthermore, the
number of wireless users in the CDMA network will be varied to obtain
performance and packet energy dissipation. The trade-off between performance and
energy will be established for the various conditions. The following points
summarize the contributions made during this work.


• Proposed Development of Network
  ◦ Development and design of an advanced transceiver for improved efficiency.
  ◦ Design of an efficient wireless NoC architecture with CDMA based
    interconnects using advanced modulation schemes.
• Reliability analysis
  ◦ Reliability analysis of the CDMA based MAC protocol over the wireless
    medium.
  ◦ Evaluating the effect of loss of synchronization on the reliability of the
    wireless interconnects.
  ◦ Development of an advanced decoder to improve the reliability of the
    wireless interconnects.
• RTL Design
  ◦ Develop the RTL level design of the CDMA encoder and decoder and synthesize
    it using 65nm standard cell libraries.
  ◦ Develop the RTL level design for the model of the advanced decoder used to
    improve the reliability of the wireless interconnects.
• Development of simulations framework
  ◦ Develop a cycle-accurate simulator to implement the wireless NoC
    architectures with 3-stage switches, namely input arbitration, output
    arbitration and routing.
  ◦ Obtain experimental results comparing the wireless NoC architecture under
    the various MAC protocols with other wired and wireless architectures, with
    respect to the following parameters, using the cycle-accurate simulator:
    ▪ Peak achievable bandwidth
    ▪ Packet energy dissipation
    ▪ Non-uniform traffic patterns
    ▪ Scalability - Increasing packet sizes
    ▪ Area overheads

Publications
a. Vineeth Vijayakumaran, Manoj Prashanth Yuvaraj, Naseef Mansoor, Nishad
   Nerurkar, Amlan Ganguly, Andres Kwasinski “CDMA Enabled Wireless
   Network-on-Chip” Proceedings of the ACM Journal on Emerging Technologies in
   Computing Systems (JETC), October 9 2013.
b. Naseef Mansoor, Manoj Prashanth Yuvaraj, Amlan Ganguly “A Robust Medium
   Access Mechanism for Millimeter-Wave Wireless Network-on-Chip Architectures”
   Proceedings of the IEEE International System-on-Chip Conference (SOCC),
   Erlangen, Germany, September 04-06, 2013.
c. Naseef Mansoor, Manoj Prashanth Yuvaraj, Amlan Ganguly “An Energy-Efficient
   and Robust Millimeter-Wave Wireless Network-on-Chip Architecture”
   Proceedings of the IEEE International Symposium on Defect and Fault
   Tolerance in VLSI and Nanotechnology Systems (DFT), New York, United States
   of America, October 02-04 2013.
The remainder of this thesis is organized as follows:
• Chapter 2: Related Work – This chapter discusses previous technologies and
  developments related to this field of work.
• Chapter 3: Network Architecture – This chapter explains in detail the elements
  required for the network-on-chip used in this work.
• Chapter 4: Reliability – This chapter analyzes the reliability of the previous
  work and introduces an advanced decoder design.
• Chapter 5: Conclusion and Future Work – This chapter summarizes the results of
  this thesis and explores a key enhancement for future development.


Chapter 2. Related Work
Many NoC architectures have been proposed. In [2], the authors list the most
prominent interconnect architectures suggested so far, which include SPIN
(Scalable, Programmable and Integrated Network), CLICHE (Chip-Level Integration
of Communicating Heterogeneous Elements), torus, folded torus, octagon and
Butterfly Fat-Tree (BFT). However, if these topologies are implemented with
completely wired interconnects, none of them scales beyond a certain point. This
is because, as technology shrinks, delay and power dissipation on traditional
metal wires, rather than gate delays, become the limiting factor in performance.
Also, as the wires become thinner, they become more susceptible to noise and
thus less reliable.
Recently, the design of a wireless NoC based on CMOS Ultra Wideband
(UWB) technology was proposed [9]. In [10] a wireless Media Access Control (MAC)
protocol based on time-multiplexing of ultra-short pulses from the UWB
transceivers was proposed to enable concurrent use of the wireless channels. A
wireless NoC with unequal RF transceivers is proposed in [11] to improve the
performance in a conventional mesh topology overlaid with wireless interconnects.
In [12] the design of on-chip wireless communication network with miniature
antennas and simple transceivers that operate at the sub-THz range of 100-500 GHz
has been proposed.
The design of a wireless NoC with a small-world topology, using carbon nanotube
(CNT) antennas operating in the THz frequency range, is elaborated in [3].
Because CNT antennas can be tuned to various frequencies, it was possible to
communicate using Frequency Division Multiplexing (FDM) on non-overlapping
channels. However, the challenges of fabricating CNT antennas and integrating
them with CMOS processes may hinder their adoption in the near future. In [3]
the design of a wireless NoC with CMOS-compatible mm-wave transceivers was
proposed. Access to the wireless channel was shared between multiple
transmitters using a token-passing mechanism, which granted access to the
wireless medium to only one transmitter at a time. In [13] a combination of Time
and Frequency Division Multiplexing is used to transfer data over inter-router
wireless express channels. However, the issue of inter-channel interference due
to multiple adjacent frequency channels remains unresolved in that work.
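The one-transmitter-at-a-time property of token passing can be illustrated with a small sketch. It assumes, for simplicity, that the token moves to the next hub every cycle and that the holder sends at most one flit per possession; the hub names and this timing model are hypothetical, and the protocol in the cited work may hold the token for a whole packet.

```python
from typing import Dict, List, Tuple


def token_passing_schedule(
    pending: Dict[str, int], hubs: List[str], cycles: int
) -> List[Tuple[int, str]]:
    """Return (cycle, hub) pairs; only the token holder may transmit in a cycle."""
    remaining = dict(pending)  # flits still waiting at each hub
    token = 0                  # index of the hub currently holding the token
    log: List[Tuple[int, str]] = []
    for cycle in range(cycles):
        hub = hubs[token]
        if remaining.get(hub, 0) > 0:
            remaining[hub] -= 1
            log.append((cycle, hub))
        token = (token + 1) % len(hubs)  # pass the token to the next hub
    return log


log = token_passing_schedule({"H0": 2, "H2": 1}, ["H0", "H1", "H2"], cycles=6)
```

In this run the channel is idle whenever the token sits at a hub with nothing to send, which is the utilization loss that motivates letting several transmitter–receiver pairs share the channel simultaneously.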
In [14] and [15] digital implementations of a CDMA-based wireline NoC were
proposed. However, both these CDMA based NoCs have centralized controllers that
allocate codes to the transceivers and add the encoded CDMA bits (chips) prior
to sending them over the NoC fabric. Such centralized control schemes are not
suitable for the distributed MAC protocol desired in a wireless NoC. The
reliability of CDMA based wireless NoCs is limited, as the probability of error
is significant due to synchronization delays at the receiver. In this work an
advanced decoder is proposed which reduces the interference caused by
synchronization delays.
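Why a synchronization delay causes interference can be seen from the correlation of two spreading codes under a chip shift. The sketch below assumes two length-4 Walsh codes and models the delay as a cyclic shift by a whole number of chips; both the codes and the cyclic-shift model are simplifying assumptions made here, not the channel model analyzed later in the thesis.

```python
from typing import List

CODE_A = [1, 1, -1, -1]  # hypothetical spreading code of the desired user
CODE_B = [1, -1, -1, 1]  # hypothetical spreading code of an interfering user


def cyclic_shift(code: List[int], chips: int) -> List[int]:
    """Delay a code by a whole number of chips, wrapping around (assumption)."""
    chips %= len(code)
    return code[-chips:] + code[:-chips] if chips else list(code)


def correlation(x: List[int], y: List[int]) -> int:
    """Correlator (matched filter) output for one symbol period."""
    return sum(a * b for a, b in zip(x, y))


def interference_by_shift(desired: List[int], interferer: List[int]) -> List[int]:
    """Correlator output caused by the interferer alone, for each chip shift."""
    return [
        correlation(desired, cyclic_shift(interferer, s))
        for s in range(len(desired))
    ]


leak = interference_by_shift(CODE_A, CODE_B)
```

With zero shift the two codes are orthogonal and the interferer contributes nothing, but a one-chip delay turns CODE_B into CODE_A, so the interferer leaks into the desired user's correlator at full strength. This is the kind of loss of orthogonality that a more advanced decoder has to suppress.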


Chapter 3. Network Architecture
The earlier interconnect technologies have been used in existing NoC
platforms without significant architectural innovations, which undermines the
performance gains. However, the emerging technologies make direct connections
between physically distant cores on the chip viable due to their high communication
bandwidth and low power dissipation characteristics. This allows innovation in the
design of the NoC architecture to maximize the utilization of the performance
benefits of these emerging interconnects, specifically the wireless communication
channels.
Many naturally occurring networks are known to have the so-called small-world
property. Networks with the small-world property have a very short average path
length, which is commonly measured as the number of hops between any pair of
nodes. The average shortest path length of small-world graphs is bounded by a
polynomial in log(N), where N is the number of nodes, which makes them
particularly interesting for efficient communication with minimal resources [16,
17]. This feature makes small-world graphs particularly attractive for
constructing scalable WiNoCs. Most complex networks, such as social networks,
the Internet, and certain parts of the brain, exhibit the small-world property,
which lets them scale as system size increases. Such connection topologies are
therefore suitable for modern multi-core systems, which have hundreds of cores
on a single die.
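The effect of a few long-range links on average path length can be checked with a short sketch. It assumes a 16-node ring as the regular baseline and two hand-picked shortcuts; the graph, the shortcut positions, and the function names are illustrative assumptions, not the topology used in this work.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[int, Set[int]]


def ring(n: int) -> Graph:
    """Ring lattice: each node is linked to its two neighbours."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def add_links(graph: Graph, links: Iterable[Tuple[int, int]]) -> Graph:
    """Insert bidirectional long-range links (shortcuts) into the graph."""
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def average_hops(graph: Graph) -> float:
    """Mean shortest-path length over all ordered node pairs (BFS per node)."""
    total = pairs = 0
    for src in graph:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


plain = average_hops(ring(16))
with_shortcuts = average_hops(add_links(ring(16), [(0, 8), (4, 12)]))
```

Comparing `plain` with `with_shortcuts` shows the average hop count dropping after only two extra links, which is the behaviour the small-world topology exploits with its long-range links.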
The adopted small-world topology essentially inserts long-range links in the
NoC. However, long wireline interconnects incur high energy dissipation and
latency in data transfer. So as many long-range links as possible are replaced with
wireless interconnects based on the number of CDMA channels as discussed later in
this section. First the adopted scalable small-world based wireless NoC architecture
is discussed and then the CDMA based wireless interconnects are described which
make the NoC more energy efficient.

3.1. Small World topology
In this type of topology, each core is connected to a NoC switch and the
switches are interconnected using wireline and wireless links. The topology is a
small-world network where the links between switches are established following a
power law distribution as shown below.
$$P(i,j) = \frac{l_{ij}^{-\alpha} f_{ij}}{\sum_{\forall i} \sum_{\forall j} l_{ij}^{-\alpha} f_{ij}} \tag{1}$$


where the probability P(i,j) of establishing a link between two switches i and j,
separated by a Euclidean distance lij, is proportional to the distance raised to a
finite power [17]. The distance is obtained by considering a tile-based floorplan of
the cores on the die. The frequency of traffic interaction between the cores, fij, is
also factored into (1) so that more frequently communicating cores have a higher
probability of having a direct link. This frequency is expressed as the percentage of
traffic generated from i that is addressed to j. This frequency distribution is based
on the particular application mapped to the overall NoC and is hence known prior to
wireless link insertion. Therefore, the a priori knowledge of the traffic pattern is
used to establish the topology with a correlation between traffic distribution across
the NoC and network configuration as in [18]. This optimizes the network
architecture for non-uniform traffic scenarios. The parameter α governs the nature of
connectivity. The higher the value of α, the fewer the long links, which brings down
the total wiring cost for the system. Also, it is established in [17] that small-world
connectivity can be established by choosing a value of α < D+1, where D is the
dimension of the network.
In our case the NoC is arranged in a 2D tile and consequently D=2. The value
of α was chosen to be 1.8 to establish small-world connectivity [17]; it was also
observed that the system has maximum throughput with minimum wiring cost for this
value. As the links are established probabilistically following (1), the number of
ports of each switch may not be the same. The average number of ports per switch is
however constrained to be 5 so that the total number of connections is the same as
that of a mesh. According to [17] an upper bound of 9 ports was imposed on each
switch such that no switch becomes unrealistically large.
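The link-insertion rule in (1) can be illustrated with a short sketch. This is not the tool used in this work: the function names, the 4x4 grid, the uniform traffic matrix, the use of tile pitches as the distance unit and the way the port cap is counted (inter-switch links only) are all assumptions made for illustration, and the sampler does not check that the resulting network is connected.

```python
import math
import random


def link_probabilities(side, freq, alpha=1.8):
    """Evaluate (1) for every ordered switch pair on a side x side tile grid.

    freq[i][j] is the fraction of traffic generated at switch i that is
    addressed to switch j. Distances are in tile pitches (an assumption).
    """
    n = side * side

    def dist(i, j):
        return math.hypot(i % side - j % side, i // side - j // side)

    weight = {
        (i, j): dist(i, j) ** (-alpha) * freq[i][j]
        for i in range(n)
        for j in range(n)
        if i != j
    }
    total = sum(weight.values())
    return {pair: w / total for pair, w in weight.items()}


def sample_links(prob, n_links, max_ports, rng):
    """Draw distinct undirected links following prob, capping links per switch."""
    pairs = list(prob)
    weights = [prob[p] for p in pairs]
    links, ports = set(), {}
    while len(links) < n_links:
        i, j = rng.choices(pairs, weights=weights)[0]
        link = (min(i, j), max(i, j))
        if link in links or max(ports.get(i, 0), ports.get(j, 0)) >= max_ports:
            continue  # duplicate link, or one endpoint already at the cap
        links.add(link)
        ports[i] = ports.get(i, 0) + 1
        ports[j] = ports.get(j, 0) + 1
    return links


# Hypothetical example: 16 switches, uniform traffic, as many links as a 4x4 mesh.
side = 4
n = side * side
uniform = [[0.0 if i == j else 1.0 / (n - 1) for j in range(n)] for i in range(n)]
prob = link_probabilities(side, uniform)
links = sample_links(prob, n_links=24, max_ports=9, rng=random.Random(0))
```

With uniform traffic the probabilities depend only on distance, so neighbouring tiles are the most likely to be linked, while the long-range links that give the network its small-world character are rarer but still possible.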
As long wired interconnects are extremely costly in terms of both power and
latency, wireless links are used to realize as many long-range links as possible. The
number of wireless transceivers depends on the number of CDMA channels created.
As the particular antenna chosen is not directional in its radiation pattern, as
discussed in Section 3.2, any transceiver can communicate with any other
transceiver on the chip and form a fully connected wireless network overlaid on the
wireline small-world topology. Starting with the longest, the long-range links are
realized with the wireless interconnects until all the channels are used up to form
the overlaid fully connected wireless network. Figure 3-1 represents such a
CDMA-WiNoC with 25 cores where each core is associated with a NoC switch (not
shown for clarity).

[Figure 3-1 legend: Wireless Switch; NoC Switch; Wireline Interconnect; Wireless Interconnect]

Figure 3-1: CDMA-WiNoC Architecture

3.2. On-chip Antennas
Suitable on-chip antennas are necessary to establish wireless links for
WiNoCs. In [13] the authors demonstrated the performance of silicon integrated
on-chip antennas for intra- and inter-chip communication. They have primarily used
metal zig-zag antennas operating in the range of tens of GHz. Design of an
ultra-wideband (UWB) antenna for inter- and intra-chip communication is elaborated in
[19]. This particular antenna was used in the design of a wireless NoC [9] mentioned
earlier in chapter 1. The above mentioned antennas principally operate in the
millimeter wave (tens of GHz) range and consequently their sizes are on the order of
a few millimeters. If the transmission frequencies can be increased to THz/optical
range then the corresponding antenna sizes decrease, occupying much less chip real
estate. Characteristics of metal antennas operating in the optical and near-infrared
region of the spectrum of up to 750 THz have been studied [20].
Antenna characteristics of carbon nanotubes (CNTs) in the THz/optical
frequency range have also been investigated both theoretically and experimentally
[21-22]. Although CNT antennas will support higher data bandwidth, significant
manufacturing challenges need to be overcome to make them feasible for adoption
in mainstream chip fabrication processes. That is why a metal-based antenna
structure compatible with CMOS processes, which can be adopted in the near
future, is used in this work.
The on-chip antenna for the wireless NoC has to provide the best power gain
for the smallest area overhead. A metal zig-zag antenna [23] has been demonstrated
to possess these characteristics. This antenna also has negligible effect of rotation
(relative angle between transmitting and receiving antennas) on received signal
strength, making it most suitable for on-chip wireless interconnects. This thesis
work uses the zig-zag antenna of [3], designed with a 10 μm trace width, 60 μm
arm length and 30° bend angle. The axial length depends on the operating frequency
of the antenna. The characteristics of the antennas are simulated using the ADS
Momentum tool. A high-resistivity silicon substrate (ρ = 5 kΩ-cm) is used for the
simulation. The details of the antenna simulation setup and antenna structure are
shown in Figure 3-2(a) [24]. To represent a typical inter-subnet communication
range the transmitter and receiver were separated by 20 mm. The forward
transmission gain (S21) of the antenna obtained from the simulation is shown in
Figure 3-2(b). As shown in Figure 3-2(b), we are able to obtain a 3 dB bandwidth
of 16 GHz with a center frequency of 57.5 GHz. For optimum power efficiency, the
quarter-wave antenna needs an axial length of 0.38 mm in the silicon substrate.
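As a rough consistency check on the 0.38 mm figure, a quarter of the wavelength at the 57.5 GHz center frequency can be estimated under the simplifying assumption that the wave propagates entirely in bulk silicon with relative permittivity 11.7 (the value labelled in Figure 3-2). The function name is hypothetical, and this first-order estimate ignores the SiO2 layer and free space above the die; it is not necessarily how the value was derived in this work.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def quarter_wave_length_mm(freq_hz, eps_r):
    """Quarter of the wavelength in a medium of relative permittivity eps_r, in mm."""
    wavelength_m = C / (freq_hz * math.sqrt(eps_r))
    return wavelength_m / 4 * 1e3


# Assumed inputs: 57.5 GHz center frequency, silicon eps_r = 11.7
axial_mm = quarter_wave_length_mm(57.5e9, 11.7)
print(round(axial_mm, 2))  # 0.38
```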

[Figure 3-2 labels: free space (εr = 1); SiO2 layer (εr = 3.9), 2 µm; silicon substrate (εr = 11.7), 633 µm; zig-zag antenna with 10 µm trace width, 60 µm arm length, 30° bend angle and the indicated axial length; distance between antennas (d)]

Figure 3-2: (a) On-chip metal zig-zag antenna (reproduced from [3]) (b) On-chip antenna
placement on the die (reproduced from [23])

3.3. CDMA Based Wireless Interconnects
All the wireless transceivers operate in the same frequency channel and
hence an appropriate medium access mechanism is required to grant access of the
shared wireless medium to a particular transmitter. The adopted small-world
topology inserts long-range links in the NoC. However, long wireline interconnects
incur high energy dissipation and latency in data transfer. So as many long-range
links as possible are replaced with wireless interconnects. As the wireless links
connect distant cores on the chip, the wireless nodes have to be distributed over
long distances. Hence, it is difficult to have a centralized arbitration mechanism
which will grant access of the wireless medium to the transmitters, as this would
require laying out and transmitting signals over long wires connecting the arbiter to
the wireless transceivers.
In [3] a token passing protocol was developed to grant access of the shared
wireless medium to a single transmitter at any instant of time. This restricted the
communication over the wireless medium to only a particular pair. The token
passing scheme was enhanced by introducing FDMA with three non-overlapping
channels each with its own token in [25]. This effectively increased the number of
simultaneous accesses of the wireless medium to three. While the performance can
be improved with this methodology, increasing the number of channels to more
than three is a non-trivial challenge from the perspective of designing the
transceivers.
In this work, by contrast, a distributed multiple access mechanism is used for the
wireless medium such that there can be simultaneous communication between
multiple pairs of sources and destinations. In order to enable multiple simultaneous
accesses to the wireless medium a CDMA-based MAC is used in the WiNoC. In this
case multiple source and destination pairs can access the wireless medium
simultaneously without any centralized control or arbitration. Using CDMA, each
transmitter encodes its bits using a unique codeword consisting of multiple code
bits, called chips, before transmission. Each code is orthogonal to the other codes
such that the cross-correlation between different codewords is zero. This eliminates
the interference between transmissions from different wireless transceivers using
different codewords.

3.4. CDMA MAC protocol
As the longest links will be wireless, the NoC switches equipped with the
wireless transceivers will be spread over the chip and would require a distributed
and scalable mechanism to access the medium without collision and interference.
Thus a Direct Sequence Spread Spectrum (DSSS) CDMA based scheme is used to
establish multiple simultaneous code-channels between multiple wireless switches.
The Walsh codes are used to create orthogonal code-channels for multiple
access of the wireless medium. Walsh codes are commonly used in many CDMA
applications as they have a low spreading factor. The spreading factor is defined as
the number of chips in a single codeword. As each bit is encoded into one of these
codewords, the effective data transfer rate decreases by the spreading factor.
Consequently, the latency in data transfer over the wireless link increases by the
same factor. Hence, Walsh codes with a low spreading factor have a lower
impact on the bandwidth of the individual code-channels. The encoding can be
performed digitally by simply XORing the bit and the codeword. The result is then
modulated and mixed with the carrier using a Binary Phase Shift Keying (BPSK)
modulator [23]. Figure 3-3(a) shows the CDMA-based transmitter.
At the receiver, a demodulator comprising a Low Noise Amplifier (LNA)
and a mixer [26] is combined with a low-power, high-speed Analog to Digital
Converter (ADC) [27] and a CDMA decoder. To digitally decode the received CDMA
signal, orthogonal as well as balanced Walsh codes are required [28]. Orthogonal
codes ensure that in the ideal case, when all the transmitters are synchronized such
that they send bits at exactly the same time, the correlation between different
code-channels is zero and bits transmitted in other channels do not affect the
received bit.

Figure 3-3: The adopted CDMA (a) encoder and (b) decoder

The correlator is digitally implemented by an accumulator [28], which either
adds or subtracts the received signal depending on whether that particular code
chip of the Walsh code is high or low respectively. Balanced codes have an equal
number of high and low valued chips. Consequently, the sign of the result can be
used to determine the transmitted bit. Figure 3-3(b) shows the CDMA-based
receiver.
This particular form of CDMA used in the WiNoC decreases the effective data
transmission bandwidth per channel, as each bit is encoded into a codeword
consisting of several chips before transmission. However, it is shown in [4] that
the same aggregate wireless bandwidth, when distributed over multiple links,
improves the performance of the wireless NoC compared to a single link with high
bandwidth, due to better connectivity of the network. The adopted Walsh codes have
as many orthogonal codewords as the number of chips. For instance, in a set of
Walsh codes with eight chips there are eight orthogonal codewords. However, only
seven of them are balanced, with an equal number of high and low chips, which is
required for the simple digital correlator in the CDMA receiver. This implies that
seven wireless channels can operate simultaneously.
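The encoding and decoding steps described in this section can be sketched in a few lines. This is a behavioural illustration only, not the RTL used in this work. It assumes perfectly synchronized transmitters, a noiseless channel, Walsh codes built with the Sylvester construction, and a received level modelled as the unsigned count of transmitters sending a high chip; all function names are hypothetical.

```python
def walsh_codes(n):
    """n x n Walsh code set as rows of 0/1 chips (n must be a power of two)."""
    codes = [[0]]
    while len(codes) < n:
        # Sylvester step: [H H; H ~H] written with 0/1 chips
        codes = [row + row for row in codes] + \
                [row + [1 - c for c in row] for row in codes]
    return codes


def is_balanced(code):
    """True if the codeword has as many high chips as low chips."""
    return 2 * sum(code) == len(code)


def encode(bit, code):
    """Spread one data bit by XORing it with every chip of the codeword."""
    return [bit ^ chip for chip in code]


def channel(chip_streams):
    """Superpose synchronized transmissions: level = number of high chips."""
    return [sum(chips) for chips in zip(*chip_streams)]


def decode(received, code):
    """Accumulator correlator: add the level on high chips, subtract on low chips."""
    acc = sum(level if chip else -level for level, chip in zip(received, code))
    # A data bit of 1 inverts the codeword under XOR, so a negative sum means 1.
    return 1 if acc < 0 else 0


codes = [c for c in walsh_codes(8) if is_balanced(c)]  # 7 usable codewords
bits = [1, 0, 1, 1, 0, 0, 1]                           # one bit per transmitter
received = channel([encode(b, c) for b, c in zip(bits, codes)])
decoded = [decode(received, c) for c in codes]
```

In this model the contribution of every other balanced codeword to the accumulator cancels exactly, so each of the seven receivers recovers its bit from the sign alone. The all-low codeword is excluded because its accumulator output would carry an offset that depends on how many other transmitters are active.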

3.5. Adaptive CDMA protocol
Several applications require multicast data transfer such as passing global
states, managing the network and implementing cache coherency. Therefore the
adopted CDMA protocol must be adaptive to various types of traffic namely unicast,
multicast or broadcast in the NoC. A collision-free T (transmitter)-protocol [14] is
adopted in which each transmitter encodes the data according to a unique code. At
the receiver the received signal is correlated with all the code words to decode the
transmissions from each transmitter. So the T-protocol can operate normally under
one-to-one (uniform random) or many-to-one (hotspot) type of traffic scenarios
where each receiver can receive data from multiple transmitters at the same time on
different code-channels specific to each transmitter. Also, the T-protocol naturally
supports one-to-many (multicast or broadcast) traffic conditions as each receiver
receives data from all the transmitters at the same time. The receivers check the
address information in the data to accept and route it further if it is intended for the
particular wireless switch. The adopted scheme enables concurrent unicast and
multicast from different sources as well. Thus the CDMA MAC supports both unicast
and multicast traffic patterns. It is completely distributed and does not require
centralized control or arbitration circuitry, as any transmitter can encode and
transmit data independently of other transmitters.

3.6. Data Routing
The Wormhole routing policy is used in the NoC where data packets are
broken down into smaller flow control units (flits) such that a whole flit can be
transmitted over a NoC link together [29]. The small-world topology is essentially a
random network. The adopted routing policy should not introduce substantial
computational overheads and should hence be distributed in nature. In addition, it
should be free of deadlock and livelock. To achieve this, layered shortest path
routing (LASH) as proposed in [30] is used.
In LASH, shortest paths between source/destination pairs are separated into
multiple virtual layers if cyclic dependencies exist between them. Packets in a
particular virtual layer are routed along specific virtual channels reserved for that
layer. The shortest path between any source and destination is pre-computed offline
to eliminate the overheads of path computation for every packet. Each switch has a
routing table, which needs to contain only the identity of the next switch
corresponding to all possible destinations. When a header flit arrives at a particular
switch the next switch is determined based on this table and the final destination of
the packet. The header flit is then routed to the appropriate port along the particular
virtual channel reserved for its source/destination pair. Consequently, deadlock is
avoided in this routing scheme. The routing scheme is distributed, as only the next
switch is determined at each intermediate switch, making the routing decision very
fast. Each routing table only contains the next switch in the path towards all possible
destinations. Hence, the memory requirement grows only linearly with the
system size. The routing paths being shortest paths also enable highly efficient data
transfer resulting in high data rates as shown in [30].
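The table-driven forwarding described above can be sketched as follows. The sketch covers only the offline shortest-path computation and the per-switch next-hop lookup; it deliberately omits the assignment of paths to virtual layers that LASH uses to break cyclic dependencies. The function names and the small five-switch topology are hypothetical, and links are assumed to be bidirectional.

```python
from collections import deque


def build_routing_tables(adj):
    """Return tables[s][d] = next switch on a shortest path from s to d.

    adj maps each switch to the list of its neighbours.
    """
    tables = {s: {} for s in adj}
    for dest in adj:
        # Breadth-first search outward from the destination: the BFS parent of
        # a switch is its next hop towards dest.
        parent = {dest: dest}
        queue = deque([dest])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent:
                    parent[v] = u
                    queue.append(v)
        for s, nxt in parent.items():
            if s != dest:
                tables[s][dest] = nxt
    return tables


def route(tables, src, dst):
    """Follow the next-hop entries switch by switch, as a header flit would."""
    path = [src]
    while path[-1] != dst:
        path.append(tables[path[-1]][dst])
    return path


# Hypothetical topology: a chain 0-1-2-3-4 plus one long-range link 0-4.
adj = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
tables = build_routing_tables(adj)
print(route(tables, 0, 3))  # [0, 4, 3]
```

Each table holds one entry per destination, which is the linear memory growth noted above, and the packet from switch 0 to switch 3 takes the long-range link instead of the three-hop chain.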

3.7. Performance Metrics
The experiments are carried out using a cycle-accurate simulator
implementing the NoC architectures with 3-stage switches, namely input
arbitration, output arbitration and routing [2]. The number of VCs in the
CDMA-WiNoC switches depends on the system size and the number of interconnects.
As shown in [30], irregular networks of size 64, 128 and 256 cores require 4, 6 and
9 layers for deadlock-free routing. Each layer is considered to have a single VC
reserved. The mesh architecture is considered to have 4 VCs in each input and
output port. Each VC has a buffer depth of 2 flits. The CDMA receivers have an
increased buffer depth of 32 flits to accommodate simultaneous reception from
multiple sources. A uniform random spatial distribution of traffic is used for all the
experiments. All the NoC components are driven with a 2.5 GHz clock. All
simulations are performed for ten thousand cycles allowing for transients to settle
in the first few thousand cycles. If the wireline links are long enough to take more
than 1 clock cycle for transmission of a flit they are pipelined by insertion of FIFO
buffers such that between any two stages it is possible to transfer an entire flit in
1 clock cycle. The on-chip zig-zag antennas are able to provide a bandwidth of
16 GHz around a center frequency of 60 GHz [3] while the transceivers [23] are
able to sustain a maximum data rate of 6 Gbps. All the wireless switches are
equipped with the same transceivers.
The Walsh codes result in spreading or widening the spectrum of the
transmitted bits by a spreading factor depending on the number of code chips. We
chose a Walsh code with 8 chips per code resulting in a spreading factor of 8. The
digital decoding technique adopted requires balanced code words where the
number of high and low chips is the same. The number of balanced orthogonal
Walsh codes with 8 chips is 7 [28]. Hence, we can have 7 wireless switches, each
with its unique code for transmission. We have considered a flit size of 32 bits and a
packet size of 64 flits.
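The effect of the spreading factor on the wireless links can be worked through with the figures quoted above. The following back-of-the-envelope sketch assumes that the 6 Gbps transceiver rate is the chip rate, so that each code-channel carries one data bit per 8 chips, and it ignores any protocol or synchronization overhead; the variable names are illustrative.

```python
CHIP_RATE_HZ = 6e9        # assumption: transceiver data rate taken as the chip rate
SPREADING_FACTOR = 8      # chips per data bit for the 8-chip Walsh code
CODE_CHANNELS = 7         # balanced Walsh codewords available
FLIT_BITS = 32
CLOCK_HZ = 2.5e9

per_channel_bps = CHIP_RATE_HZ / SPREADING_FACTOR    # data rate of one code-channel
aggregate_bps = per_channel_bps * CODE_CHANNELS      # all channels transmitting
flit_time_s = FLIT_BITS / per_channel_bps            # time to send one flit
flit_cycles = flit_time_s * CLOCK_HZ                 # same, in NoC clock cycles

print(per_channel_bps / 1e9, aggregate_bps / 1e9)    # 0.75 5.25
```

Under these assumptions a single code-channel sustains 0.75 Gbps, and a 32-bit flit occupies the channel for roughly 107 clock cycles at 2.5 GHz, which is one way to see why the deeper receiver buffers mentioned above are useful.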
The metrics for performance evaluation are maximum achievable bandwidth
and packet energy dissipation. Maximum achievable bandwidth is the peak
sustainable data rate in number of bits successfully routed per second. Bandwidth, B,
can be determined as

$$B = t \beta N f \tag{2}$$

where t is the maximum throughput in number of flits received per core per clock
cycle at network saturation, β is the number of bits in a flit, N is the number of cores
in the NoC and f is the clock frequency. The throughput is directly obtained from
system level simulations performed by the NoC simulator. The packet energy
dissipation, Epkt, is the average energy dissipated in transmission of a packet from
source to destination over the NoC. It can be measured as
[Equation (3): expression for Epkt in terms of the quantities defined below; not legible in the source]

where Npkt is the number of packets routed in the NoC, Li is the latency of the
ith packet, hi is the number of hops in the path of the packet and Ebuf is the energy
dissipation of a flit in the NoC switch buffers. The energy dissipation of a wireline
hop is Ewire and  is the packet length in number of flits. Nsim is the duration of the
simulation and Ewireless is the energy dissipated by all the CDMA transceivers in the
CDMA-WiNoC in one cycle.
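Returning to the bandwidth metric in (2), the conversion from simulated throughput to peak bandwidth is a product of four quantities: throughput in flits per core per cycle, bits per flit, number of cores and clock frequency. The sketch below evaluates it for the 64-core, 32-bit-flit, 2.5 GHz configuration used here; the throughput value of 0.4 is a made-up placeholder, not a result from this work, and the function name is hypothetical.

```python
def peak_bandwidth_bps(throughput, bits_per_flit, n_cores, clock_hz):
    """Peak bandwidth in bits per second.

    throughput is in flits received per core per clock cycle at saturation.
    """
    return throughput * bits_per_flit * n_cores * clock_hz


# Hypothetical throughput of 0.4 flits/core/cycle on a 64-core system
bandwidth = peak_bandwidth_bps(0.4, 32, 64, 2.5e9)
print(bandwidth / 1e12)  # about 2.05 Tbps
```

For this configuration every 0.1 flits/core/cycle of saturation throughput corresponds to 0.512 Tbps of peak bandwidth.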
As can be seen in (3) the energy dissipation of all the wireless transceivers
for the entire duration of the simulation is considered as an overhead. In addition to
the output power of the transmitter obtained later, the power dissipation of the
CDMA codec, modulator, LNA, mixer and the ADC are also considered while
evaluating Ewireless. The design of a low-power, high-speed ADC with a 5-bit
resolution is proposed in [27]. As noted above, the CDMA-WiNoC has 7 wireless
switches and hence each switch can receive data concurrently from 6 transmitters.
Consequently, the maximum number of levels of the received signal at any wireless
switch can be 12 for NRZ data. Therefore, the ADC with 5-bit resolution, which
resolves 32 levels, is enough for this scenario.
The power dissipation, speed and area overheads of all the components of
the CDMA transceivers are listed in Table 1. The designs of the
modulator/demodulator and the ADC in 65 nm CMOS technology are adopted from
[26] and [27] respectively. The energy dissipation, area overheads and timing
requirements of the NoC switches and the CDMA codecs are obtained from
post-synthesis RTL design using 65 nm standard cell libraries (http://cmp.imag.fr)
using Synopsys™ tool suites. The energy dissipation of the wireline links is obtained
from Cadence layout tools considering their actual dimensions, obtained by
assuming a tile-based floorplan of the NoC on a 20 mm × 20 mm die area.
Table 1: Characteristics of the Transceiver Modules

Module                   Power Dissipation   Speed     Area
ADC                      10.125 mW           12 GS/s   0.055 mm²
Modulator/Demodulator    9.332 mW            6 Gbps    0.34 mm²
CDMA Codec               0.3762 mW           6 GHz     0.0002599 mm²
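To give a feel for the magnitudes in Table 1, the module figures can be summed per transceiver. The sketch below assumes exactly one ADC, one modulator/demodulator and one CDMA codec per wireless switch, and it leaves out the transmitter output power and the antenna, which are accounted for separately in the text. The totals are therefore indicative only.

```python
# (power in mW, area in mm^2) per module, copied from Table 1
modules = {
    "ADC": (10.125, 0.055),
    "Modulator/Demodulator": (9.332, 0.34),
    "CDMA Codec": (0.3762, 0.0002599),
}
N_WIRELESS_SWITCHES = 7  # one transceiver per balanced Walsh code

# Assumption: one instance of each module per transceiver
power_mw = sum(p for p, _ in modules.values())
area_mm2 = sum(a for _, a in modules.values())
total_area_mm2 = N_WIRELESS_SWITCHES * area_mm2

print(round(power_mw, 4), round(area_mm2, 4), round(total_area_mm2, 3))
```

Under these assumptions the modulator/demodulator dominates the area while the ADC and the modulator/demodulator share most of the power, and the CDMA codec is negligible on both counts.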

3.8. Achievable Bandwidth of the CDMA-WiNoC
The peak achievable bandwidth of the CDMA-WiNoC at network saturation
using uniform random traffic is evaluated for three different system sizes of 64,
128 and 256 cores. Figure 3-4 shows the peak achievable bandwidth at network
saturation for the conventional mesh, small-world NoC (SWNoC) and
CDMA-Wireless NoC (WiNoC). The bandwidths are determined according to (2). It
can be seen that the small-world based topologies outperform the conventional
multi-hop wireline mesh significantly. This is because the small-world topology
scales well with increase in size, as the average distance between cores is
significantly less in comparison to regular multi-hop topologies like mesh. The
CDMA-WiNoC performs better than the wireline small-world NoC (SWNoC) due to
the high bandwidth long range CDMA wireless links. The SWNoC architecture is
formed by link insertion following (1) without replacing any long-range wireline
link with the wireless links.
[Bar chart: bandwidth (Tbps) versus system size (number of cores: 64, 128, 256); series: mesh, SWNoC, CDMA-WiNoC]

Figure 3-4: Peak achievable bandwidth of mesh, SWNoC and CDMA-WiNoCs for 64, 128
and 256 core systems

3.9. Packet Energy Dissipation
In this section the packet energy dissipation of the CDMA-WiNoC is
compared with that of the conventional mesh and wireline small-world NoC for
different system sizes. Figure 3-5 shows the packet energy dissipation of the
CDMA-WiNoC and the other wireline architectures. The gains in packet energy
dissipation
are significant and grow with increase in system size. In a regular multi-hop NoC
like the mesh the packet energy dissipation increases significantly with increase in
system size as packets have to travel over longer distances due to an increase in the
average distance between the cores.
The small-world architectures of the SWNoC and the CDMA-WiNoC are more
scalable as their average distances do not increase significantly. However, the long
wireline links in the SWNoC are very power hungry and result in considerable
energy dissipation. In the CDMA-WiNoC, due to their strategic placement, the
long-range low-energy wireless links are used by packets whenever they travel
between distant cores. Bypassing the multi-hop long-distance wireline paths using
the low-energy wireless paths reduces the packet energy dissipation significantly.
Although the small-world connectivity of the SWNoC improves its bandwidth, the
data has to travel over long-range wireline links, which consume a significant
amount of energy, limiting the gains in packet energy dissipation of the SWNoC
architecture compared to the CDMA-WiNoC.
[Bar chart: packet energy (nJ) versus system size (cores: 64, 128, 256); series: mesh, SWNoC, CDMA-WiNoC]

Figure 3-5: Packet energy dissipation of mesh, SWNoC and CDMA-WiNoCs for
64, 128 and 256 core systems

Figure 3-6 shows a detailed breakup of the various components of packet energy
dissipation for the architectures considered, for a 64-core system.
In the SWNoC direct long-range wireline links reduce the energy dissipation in the
switches. However, there is significant energy dissipation in the long wires,
increasing the proportion of the energy dissipation in the interconnects. In the
CDMA-WiNoC with uniform random traffic pattern 16.54% of the packets were routed
over the wireless links. However, these messages are not routed entirely over the
wireless links but also consume energy over wireline links and switches.
Additionally, wireless links dissipate significantly less energy compared to long
wireline links. Consequently, the contribution of the long-range wireless links to
packet energy is much less compared to the wireline counterparts. Higher
bandwidth of the CDMA wireless links compared to wireline links of the same
length channelizes data through the low-energy wireless links and hence the energy
dissipation per packet is significantly less compared to the wired counterparts.

[Pie charts: (a) Mesh: Switch 39.00%, Wire 61.00%; (b) SWNoC: Switch 4.25%, Wire 95.75%; (c) CDMA-WiNoC: Switch 3.07%, Wire 96.14%, Wireless 0.79%]

Figure 3-6: Energy breakup comparison for (a) Mesh (b) SWNoC and (c) CDMA-WiNoC

[Chart: packet energy (nJ) and bandwidth (Tbps) for SD-MAC, T-WiNoC, CDMA-WiNoC and CNT-WiNoC]

Figure 3-7: Peak Bandwidth and Packet Energy dissipation comparison of various
WiNoCs with 64 cores.

Figure 3-7 shows a comparative performance evaluation of the CDMA-WiNoC
along with other wireless NoCs like the CNT-WiNoC and the token passing WiNoC
(T-WiNoC) for a system size of 64 cores with a uniform random traffic pattern. The
peak achievable bandwidth is the maximum for the CNT based WiNoC as the
wireless channels operate at a much higher frequency providing significantly higher
wireless bandwidth compared to the metal antenna based WiNoCs. However,
manufacturing CNT antennas for large-scale production is defect prone and may
result in high rates of failures. The metal antenna based architectures, in contrast,
are readily compatible with CMOS manufacturing processes and are a more
near-term solution to the problem of soaring energy dissipation in data transfer over
a NoC.
Among the metal antenna based architectures the CDMA-WiNoC performs
the best with the highest bandwidth. This is because the SD-MAC architecture relies
on multi-hop wireless paths between cores and in the T-WiNoC only a single
wireless link whose transmitter possesses the token is active at any instant of time.
The long-range concurrent CDMA based wireless links enhance the performance of
the NoC significantly. Due to higher bandwidth, the packet energy of the
CDMA-WiNoC is also the lowest.

3.10. Performance Evaluation with varying Packet Size
In this section we evaluate the performance of the CDMA-WiNoC by varying
the packet sizes. The performance of the CDMA-WiNoC will differ according to the
packet size. Figure 3-8 shows the performance of a 64-core CDMA-WiNoC for varying
packet size with uniform traffic. The bandwidth decreases as longer packets result
in the network links and buffers being occupied for longer by a particular packet.
The energy per packet increases with increase in packet size as more flits are being
routed over the NoC.

[Chart: packet energy (nJ) and bandwidth (Tbps) versus packet size (32, 64, 128 flits)]

Figure 3-8: Packet Energy dissipation and Throughput comparison of
various packet sizes with 64 cores

3.11. Performance Evaluation with Non-Uniform Traffic
In this section we evaluate the performance of the CDMA-WiNoC in the presence
of non-uniform traffic patterns. Figure 3-9 presents the normalized peak
achievable bandwidth and packet energy dissipation of the CDMA-WiNoC and the
wireline SWNoC with 64 cores for non-uniform traffic patterns. The peak bandwidth
is normalized with respect to the offered data rate as it varies across the various
traffic patterns. To simulate synthetic traffic we chose hotspot and transpose
distribution patterns. For hotspot traffic all the cores send 20% of all packets they
generate to an arbitrarily chosen core. The remaining 80% of the traffic is uniformly
distributed among the remaining cores. In transpose traffic each core only sends
packets to the core that is diametrically opposite to itself on the die. For all these
traffic patterns the bandwidth and packet energy are evaluated at network
saturation. To model application-based traffic, several real benchmarks are
considered. We use GEM5 [31], a full system simulator, to obtain detailed processor
and network-level information. We consider a system of 64 Alpha cores running
Linux within the GEM5 platform for all experiments. The memory system is
MOESI_CMP_directory, set up with private 64KB L1 instruction and data caches and
a shared 64MB (1MB distributed per core) L2 cache. Three SPLASH-2 benchmarks,
FFT, RADIX and LU [32], and the PARSEC benchmark CANNEAL [33] are considered.
These benchmarks vary in characteristics from computation intensive to
communication intensive in nature and thus are of particular interest in this work.
The behavior and problem size of the benchmarks are shown in Table 2. One
advantage of the CDMA based WiNoC is its inherent suitability for multicast traffic.
Since the antennas are not directional they transmit power to all the other
transceivers. Hence we also present results for multicast traffic patterns. For
simulating multicast traffic we consider one core injecting multicast traffic for 3
other cores. We have considered 50% of the traffic injected from that source to be
multicast as a case study while the other 50% is considered to be uniformly
distributed unicast traffic.
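The two synthetic patterns described above can be expressed as destination-selection functions. The sketch below is an illustration rather than the simulator's traffic generator: the function names are hypothetical, "diametrically opposite" is interpreted as a point reflection through the center of an 8x8 tile grid, and the hotspot core itself is assumed to send only uniformly distributed traffic.

```python
import random


def hotspot_dest(src, n_cores, hotspot, rng, p_hot=0.20):
    """Hotspot traffic: p_hot of the packets go to the hotspot core,
    the rest are spread uniformly over the remaining cores."""
    if src != hotspot and rng.random() < p_hot:
        return hotspot
    others = [c for c in range(n_cores) if c not in (src, hotspot)]
    return rng.choice(others)


def transpose_dest(src, side):
    """Transpose traffic: the core diametrically opposite src on the grid."""
    x, y = src % side, src // side
    return (side - 1 - y) * side + (side - 1 - x)


# Hypothetical example: core 5 in a 64-core system, hotspot at core 27
rng = random.Random(1)
dests = [hotspot_dest(5, 64, 27, rng) for _ in range(20000)]
hot_fraction = dests.count(27) / len(dests)   # close to 0.20
opposite = transpose_dest(0, 8)               # corner core 0 maps to core 63
```

Because transpose traffic pairs cores across the die, almost every packet has to cover a long distance, which is the situation in which the long-range wireless links are expected to help most.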

[Chart: packet energy (nJ) and normalized bandwidth versus traffic pattern; series: Packet Energy SWNoC, Packet Energy CDMA-WiNoC, Normalized Bandwidth SWNoC, Normalized Bandwidth CDMA-WiNoC]

Figure 3-9: Bandwidth comparison of various non-uniform traffic in CDMA-WiNoC with 64
cores

The advantage of the CDMA-WiNoC over the wireline SWNoC in terms of
packet energy increases with skewed traffic patterns like the hotspot and transpose.
This is because the wireline network is not as efficient as the CDMA-WiNoC in
handling such highly skewed data at high injection loads greater than that causing
network saturation. Also, it is likely that the hotspot is close to a WI enabling
efficient data transfer to the hotspot. The absence of the wireless nodes in the
SWNoC degrades the packet energy significantly more than in the CDMA-WiNoC. In
case of the application specific benchmarks the offered data injection rate is well
below that of the network saturation. Hence, both the SWNoC and CDMA-WiNoC
perform well under these scenarios. However, the packet energy is still between
13.5% and 43.3% lower in the CDMA-WiNoC compared to the SWNoC. The maximum
gain is observed for CANNEAL as it is the most communication intensive benchmark,
capturing the benefit of the energy-efficient wireless interconnects in the
CDMA-WiNoC the most.
Table 2: Percentage of busy and idle cycles in a 64-core system given
default problem sizes.

Benchmark   Busy %   Idle %   Default Problem Size
FFT         81.99    18.01    65,536 Data Points
RADIX       84.98    15.02    262,144 Integers, 1024 RADIX
LU          87.62    12.38    512x512 Matrix, 16x16 Blocks
CANNEAL     56.74    43.26    200,000 Elements

For multicast traffic, in the worst case the wireline SWNoC will perform just
as in the unicast case, which is shown together with the uniform traffic case in
Figure 3-9. This is because the multicast destinations may not have any common
portion of their paths from the source. This will cause the multicast traffic to
resemble the unicast traffic. However, in the CDMA-WiNoC, when the wireless nodes
are the multicast source and destinations, the data bandwidth can increase because
the single packet transmitted from the source will be received at multiple
destinations. The energy expenditure remains the same while multiple packets are
being delivered over the wireless channels, hence reducing the energy dissipation
per packet. However, as we have considered a single multicast source sending
multicast traffic only 50% of the time, the overall reduction in packet energy
compared to the unicast scenario is only 5.5%. This will increase with increase in
the proportion of multicast traffic. It can be observed from Figure 3-9 that the
CDMA-WiNoC outperforms the wireline SWNoC for all the traffic patterns
considered here.

3.12. Area Overheads
In this section we present the area overheads of the transceivers used in all
the wireless NoCs considered in this work, which are listed in table 3. The
transceivers, consisting of the CDMA codec, the modulator/demodulator, the ADC and
the wireless ports, add area overhead to the NoC switches with wireless capability. As
mentioned earlier, the areas of the NoC switches and the CDMA codecs are obtained
from post-synthesis RTL designs using 65nm standard cell libraries (http://cmp.imag.fr)
and the Synopsys tool suite. The areas of the modulator/demodulator and the ADC
in the CDMA-WiNoC are obtained from [26] and [27] respectively. The area
requirements of these individual modules are listed in table 1 along with their
characteristics. The transceiver areas for the SD-MAC, T-WiNoC and CNT-WiNoC are
obtained from [9], [25] and [4] respectively. As can be seen, the CDMA-WiNoC has the
least area overhead among all the wireless NoCs except the CNT-WiNoC. This is
because the CNT-based transceivers require only a few small electro-optic modulators
and demodulators. The CNT-WiNoC, T-WiNoC and CDMA-WiNoC have a fixed number
of WIs, and hence their overhead remains constant with system size. In the SD-MAC
architecture, however, each switch is equipped with a wireless transceiver, resulting in
an overhead that increases with system size. The CDMA-WiNoC uses the fewest WIs
among the WiNoCs using mm-wave technology and hence has the least overhead
among them.

[Figure 3-10 plot: area (sq. mm, axis 0 to 60) of the NoC switches and of the wireless
transceivers for each WiNoC architecture: CDMA WiNoC, SD-MAC, T-WiNoC, CNT-WiNoC]

Figure 3-10: Area overheads of various WiNoCs with 64 cores.

Fig. 3-10 shows the area overhead of the various WiNoC architectures for a
system size of 64 cores. The total area overhead of the WiNoCs is due to the area of
the NoC switches and the wireless transceivers. The area of the NoC switches is
nearly the same for all the architectures, as it depends on the number of ports, which
is the same in all of them. The overheads of the wireless transceivers vary as noted in
table 3. As can be seen from table 3 and fig. 3-10, the wireless transceivers in the
CDMA-WiNoC occupy only about 5.52% of the entire die area of 400 mm².

Table 3: Comparison of area overhead of WiNoCs

System Size   CDMA-WiNoC (mm²)   SD-MAC (mm²)   T-WiNoC (mm²)   CNT-WiNoC (mm²)
64            22.08              40.96          36.84           0.085
128           22.08              81.92          36.84           0.085
256           22.08              163.84         36.84           0.085
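
The scaling behaviour in table 3 can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function and variable names are ours, and the per-transceiver SD-MAC area is an assumption obtained by dividing the 64-core entry by 64, based on the statement that every SD-MAC switch carries a transceiver. The 400 mm² die area is the figure quoted in the text for the 64-core system.

```python
# Transceiver area overheads (mm^2) copied from table 3; all names are illustrative.
DIE_AREA_MM2 = 400.0  # die area quoted in the text for the 64-core system

# Architectures with a fixed number of wireless interfaces (WIs).
FIXED_OVERHEAD_MM2 = {
    "CDMA-WiNoC": 22.08,
    "T-WiNoC": 36.84,
    "CNT-WiNoC": 0.085,
}

# Assumption: SD-MAC has one transceiver per switch, so the 64-core entry
# divided by 64 gives a per-transceiver area.
SD_MAC_PER_SWITCH_MM2 = 40.96 / 64


def transceiver_area_mm2(architecture: str, cores: int) -> float:
    """Total transceiver area of an architecture at a given system size."""
    if architecture == "SD-MAC":
        return SD_MAC_PER_SWITCH_MM2 * cores
    return FIXED_OVERHEAD_MM2[architecture]


def die_fraction_percent(area_mm2: float, die_mm2: float = DIE_AREA_MM2) -> float:
    """Area expressed as a percentage of the die area."""
    return 100.0 * area_mm2 / die_mm2


if __name__ == "__main__":
    names = ("CDMA-WiNoC", "SD-MAC", "T-WiNoC", "CNT-WiNoC")
    for cores in (64, 128, 256):
        print(cores, {n: round(transceiver_area_mm2(n, cores), 3) for n in names})
    print(die_fraction_percent(transceiver_area_mm2("CDMA-WiNoC", 64)))
```

Under this assumption the SD-MAC column of table 3 is reproduced by linear scaling, while the other three columns stay constant.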

The multi-hop data transfer in traditional wireline NoCs results in high
latency and low performance. As the system size scales with technology, the
performance of both wireline and wireless NoCs decreases. The CDMA-enabled
wireless communication channels can improve performance and dissipate less
energy without compromising the efficiency of data transfer. The CDMA wireless
Network-on-Chip architecture outperforms the other wireless NoCs with similar
transceivers, as well as its wired counterparts, in packet energy and bandwidth.
However, all the transceivers communicate simultaneously through the wireless
medium. Because they are not synchronized, there is interference between the
transceivers, which in turn calls into question the reliability of the NoC architecture.
In the next chapter the reliability issues of this CDMA-based wireless NoC are
discussed in detail and an analytical model is presented to evaluate the bit error rate
(BER) of the wireless channels.

Chapter 4. Reliability analysis of the CDMA WiNoC
Since the transceivers send data simultaneously, at each receiver the
wirelessly transmitted CDMA bits from different transmitters may arrive in or out of
synchronization. This is because data transmission happens in a distributed
manner, making it impossible to have all transmitters synchronized with all
receivers. The ensuing difference in clock phases between the received CDMA
streams results in loss of orthogonality between the different Walsh codes. This loss
of orthogonality increases the interference at the receivers and hence the bit error
rate (BER). This chapter studies this effect, leading to the calculation of the worst-case
BER in the proposed system. The analysis follows the work in [34], which studied
CDMA for standard radio communication devices. Here, we summarize that study for
completeness of presentation and adapt it where necessary to the particularities of
CDMA applied to an on-chip scenario.

Figure 4-1: The CDMA channel model

4.1. Interference of a Matched Filter
In this section the interference caused by the misalignment of the
incoming data at a simple matched filter is explained. Figure 4-1 shows the channel
model for the CDMA system under consideration. Let us assume the $k$th transmitter
is assigned a CDMA spreading code sequence $a^{(k)} = \{a_j^{(k)}\}$ of infinite duration but
periodic with period $N = T/T_c$, where $T_c$ is the duration of each code chip and $T$ is the
duration of an information bit. For wireless transmission, each chip multiplies a
pulse shaping signal $p_{T_c}(t)$, resulting in the CDMA code waveform
$a_k(t) = \sum_j a_j^{(k)} p_{T_c}(t - jT_c)$. Consequently, the $k$th transmitter will emit a signal of the
form $\sqrt{2P}\, a_k(t)\, b_k(t) \cos(\omega_c t + \theta_k)$, where $P$ is the signal power (no power control
is implemented), $\omega_c$ is the carrier angular frequency, $\theta_k$ is the $k$th carrier phase and
$b_k(t) = \sum_l b_{k,l}\, p_T(t - lT)$ is the sequence (assumed of infinite duration) of
information bits $b_{k,l}$. Over the wireless channel, all the transmitted waveforms
overlap in time and frequency. Consequently, the received signal is

$$ r(t) = n(t) + \sum_{k=1}^{K} \sqrt{2P}\, a_k(t - \tau_k)\, b_k(t - \tau_k) \cos\left(\omega_c (t - \tau_k) + \theta_k\right) \tag{4} $$

where $n(t)$ is the additive white Gaussian noise (AWGN) background noise, $K$ is the
number of transmitters and $\tau_k$ is a delay associated with the $k$th transmission,
which accounts for all possible timing mismatches between the transmitted and the
interfering signals due to variations in propagation delay. Figure 4-2 shows two
transmitted bits that are misaligned due to the timing mismatch. Without loss of
generality, the symbol with $k = i$ in (4) can be considered the transmission of
interest and the other $K - 1$ signals are the interference. At the receiver, after
correlating $r(t)$ with the $i$th CDMA code, the received signal has three components:
the transmitted power, the interference power and the noise power. The
interference power, $I^2$, can be computed as [34]:

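$$ I^2 = \frac{P}{4} \sum_{k=1,\,k \neq i}^{K} \left\{ \left[ C_{k,i}(l_k - N)\,T_c + \left( C_{k,i}(l_k + 1 - N) - C_{k,i}(l_k - N) \right)(\tau_k - l_k T_c) \right]^2 + \left[ C_{k,i}(l_k)\,T_c + \left( C_{k,i}(l_k + 1) - C_{k,i}(l_k) \right)(\tau_k - l_k T_c) \right]^2 \right\} \tag{5} $$

[Figure 4-2 diagram: timeline of the CDMA code chips $a_0^{(i)} \dots a_7^{(i)}$ for bit $b_{i,0}$, overlapped
by the code chips of bits $b_{k,-1}$ and $b_{k,0}$ of the $k$th transmitter delayed by $\tau_k$, drawn for
$l_k = 3$ with $l_k T_c \le \tau_k < (l_k + 1) T_c$]

Figure 4-2: The effects of delay between two CDMA signals in misaligning the
spreading code sequences

where $l_k$ indicates how many integer chip durations the delay $\tau_k$ corresponds to,
such that $l_k T_c \le \tau_k < (l_k + 1) T_c$, and

$$ C_{k,i}(l) = \begin{cases} \sum_{j=0}^{N-1-l} a_j^{(k)} a_{j+l}^{(i)}, & 0 \le l \le N-1 \\ \sum_{j=0}^{N-1+l} a_{j-l}^{(k)} a_j^{(i)}, & 1-N \le l < 0 \\ 0, & |l| \ge N \end{cases} \tag{6} $$
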
The Signal-to-Interference Ratio (SIR) can then be computed using (5). The
SIR is independent of the particular type of wireless transceiver used and depends
solely on the nature of the orthogonal codes used and the extent of their timing
misalignment. This interference power, along with the thermal noise characteristics
of the receiver, can be used to compute the received SNR.
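
As an illustration of how code misalignment translates into an SIR, the sketch below evaluates the partial cross-correlations of [34] for Walsh codes. It is a minimal sketch under stated assumptions, not the evaluation used in this work: the Walsh codes are generated with the Sylvester construction, the chip duration is normalised to 1, all interferers are received with the same power as the desired signal (path-loss differences are ignored), the desired signal power is taken as $(P/2)T^2$ and each interferer contributes $(P/4)(R^2 + \hat{R}^2)$. All function names are ours.

```python
import math


def walsh_codes(n):
    """Rows of the Sylvester-Hadamard matrix of order n (n a power of two)."""
    codes = [[1]]
    while len(codes) < n:
        codes = [row + row for row in codes] + \
                [row + [-x for x in row] for row in codes]
    return codes


def aperiodic_xcorr(ak, ai, l):
    """Aperiodic cross-correlation C_{k,i}(l) of two length-N chip sequences."""
    n = len(ak)
    if 0 <= l <= n - 1:
        return sum(ak[j] * ai[j + l] for j in range(n - l))
    if 1 - n <= l < 0:
        return sum(ak[j - l] * ai[j] for j in range(n + l))
    return 0


def partial_xcorr(ak, ai, tau, tc=1.0):
    """Return (R, R_hat) for a delay 0 <= tau < N*tc of code ak relative to ai."""
    n = len(ak)
    l = int(tau // tc)       # whole chips contained in the delay
    frac = tau - l * tc      # remaining fraction of a chip
    c = lambda shift: aperiodic_xcorr(ak, ai, shift)
    r = c(l - n) * tc + (c(l + 1 - n) - c(l - n)) * frac
    r_hat = c(l) * tc + (c(l + 1) - c(l)) * frac
    return r, r_hat


def sir_db(codes, i, delays, tc=1.0):
    """SIR at receiver i; `delays` maps interferer index -> delay tau_k.

    Assumed normalisation: signal (P/2) T^2, interference (P/4)(R^2 + R_hat^2)
    per interferer, equal received power P for every transmitter.
    """
    t_bit = len(codes[i]) * tc
    interference = 0.0
    for k, tau in delays.items():
        if k == i:
            continue
        r, r_hat = partial_xcorr(codes[k], codes[i], tau, tc)
        interference += 0.25 * (r * r + r_hat * r_hat)
    if interference == 0:
        return math.inf      # perfectly aligned orthogonal codes
    return 10 * math.log10(0.5 * t_bit ** 2 / interference)


if __name__ == "__main__":
    codes = walsh_codes(8)
    print(sir_db(codes, 1, {2: 0.0, 3: 0.0}))   # aligned: no interference
    print(sir_db(codes, 1, {2: 0.5}))           # half-chip misalignment
```

With zero delay the codes remain orthogonal and the interference vanishes; any non-zero delay generally produces a finite SIR, which is the effect analysed in this section.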
The BER of the CDMA links is calculated using the transmitted power. The
transmitted power, $P_t$, in dBm on the wireless channels is given by the following
equation:

$$ P_t = SNR + PL + N_f \tag{7} $$

where SNR is the signal-to-noise ratio at the receiver in dB, PL is the path loss in dB
and $N_f$ is the receiver noise floor in dBm. We first estimate the SNR required to
achieve a reasonable BER. However, the BER on the CDMA links depends not only
on the SNR but also on the interference due to the misaligned, interfering
CDMA channels. The effective signal to interference and noise ratio, SINR, can be
computed in dB as

$$ SINR = -10 \log_{10}\left( 10^{-SNR/10} + 10^{-SIR/10} \right) \tag{8} $$

The SIR is computed based on the model developed above. Figure 4-3 shows
the plot of the SIR in dB due to all interfering transmitters as a function of the
difference in the distance of the intended and interfering transmitters from
the receiver. The longest possible wireless link on the proposed 64-core CDMA-WiNoC
architecture was chosen as the intended transmitter-receiver pair. All the
interferers were considered to be at the same distance from the receiver.
Interference power from these interfering transmitters was calculated for the entire
range of misalignment, $\tau_k$, of the spreading codes using (5). It can be observed from
Figure 4-3 that for realistic on-chip dimensions of less than 20mm, the SIR is always
more than 15dB. The actual worst-case SIR for the 64-core CDMA-WiNoC
architecture was calculated by considering the actual positions of the wireless
transceivers on the die. The worst-case SIR considering all the possible
interfering channels is 20dB.

Figure 4-3: SIR (dB) vs

A commensurate SNR budget of 20dB makes the SINR 17dB. An SINR of 17dB
results in a BER of less than $10^{-15}$ for the BPSK modulation scheme adopted here. A
BER of $10^{-15}$ is comparable to wireline data transfer in current technologies. Hence,
we consider an SNR of 20dB in our link-budget analysis. The path loss, PL, for the
longest link can be obtained from fig. 3-2(b) as 26.5dB. The noise floor of the
receiver is given by

$$ N_f = 10 \log_{10}\left( \frac{kTB}{1\,\mathrm{mW}} \right) + NF \tag{9} $$

where k is the Boltzmann constant, T is the temperature, B is the bandwidth of the
receiver and NF is the noise figure of the receiver in dB. The noise figure of the
receiver depends on the LNA and is given by

$$ NF = 10 \log_{10}\left( F_{LNA} + \frac{F_{mixer} - 1}{G_{LNA}} \right) \tag{10} $$

where $F_{LNA}$, $F_{mixer}$ and $G_{LNA}$ are the noise figure of the LNA, the noise figure of the
mixer and the gain of the LNA respectively, expressed as linear ratios. According to
the design in [26] the value of NF is 6.3dB. This
makes the receiver noise floor

dBm at 50 degrees C. Consequently, the

output power of the transmitter is as low as

dBm.
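
The link-budget steps above can be sketched in a few helper functions. This is a minimal illustration rather than the calculation used in this work: the function names are ours, the BPSK error rate treats the combined interference and noise as Gaussian and uses the SINR in place of $E_b/N_0$ (both assumptions), and the receiver bandwidth must be supplied by the caller because its value is not stated in this section.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23


def combine_sinr_db(snr_db, sir_db):
    """Effective SINR in dB from an SNR and an SIR, both in dB."""
    return -10.0 * math.log10(10 ** (-snr_db / 10) + 10 ** (-sir_db / 10))


def bpsk_ber(sinr_db):
    """BER of coherent BPSK, Q(sqrt(2*SINR)), under a Gaussian approximation."""
    sinr = 10 ** (sinr_db / 10)
    return 0.5 * math.erfc(math.sqrt(sinr))


def noise_floor_dbm(temp_kelvin, bandwidth_hz, noise_figure_db):
    """Thermal noise power kTB in dBm plus the receiver noise figure."""
    ktb_watts = BOLTZMANN_J_PER_K * temp_kelvin * bandwidth_hz
    return 10 * math.log10(ktb_watts) + 30 + noise_figure_db


def tx_power_dbm(snr_db, path_loss_db, noise_floor):
    """Required transmit power in dBm for a target SNR at the receiver."""
    return snr_db + path_loss_db + noise_floor


if __name__ == "__main__":
    sinr = combine_sinr_db(20.0, 20.0)   # SNR budget and worst-case SIR
    print(round(sinr, 2), bpsk_ber(sinr))
```

With an SNR budget of 20dB and a worst-case SIR of 20dB, `combine_sinr_db` gives approximately 17dB, for which the Gaussian approximation yields a BER well below $10^{-15}$.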

The receiver of the CDMA WiNoC receives from multiple transmitters at a
time. Due to the lack of synchronization there is a small time offset between the data
arriving from the different transmitters. Therefore, the orthogonality of the Walsh codes
is lost, which in turn leads to interference at the receiver. The interference between
the signals from multiple transmitters increases the BER. The BER also depends on the
link distance: shorter links have lower BER and longer links have higher BER. The
maximum link distance in the proposed network is around 20 millimeters, for which the
values of the BER and the respective SIR are acceptable. As the link distances
increase, the error due to misalignment will become a major hindrance to the
network reliability. In the next section, an advanced transceiver
design is proposed which suppresses the interference caused by the
synchronization issues.

4.2. Interference Suppression using Advanced Decoder
Due to the difference in distances between the various transmitters and
receivers, and the lack of synchronization between multiple transmitters, there will be a
misalignment of the received data bits from different transceivers. Globally
synchronizing all wireless transmitters can be difficult and impractical for large
NoCs, as they can be distributed over the entire die and separated by long
distances. Routing a single, precise clock to all the wireless transceivers is
impractical in such complex chips. The Walsh-Hadamard codes are orthogonal only
when all the incoming bits are synchronized. A small time delay can cause decoding
errors due to loss of orthogonality. For this reason the reliability of the
CDMA receiver is affected when the transmitters send data bits without precise
synchronization. A solution to this issue is to use an advanced transceiver which
suppresses the interference caused by misalignment of the information.
In this work we propose the use of an advanced decoder, which performs
better in spite of the synchronization delays. The minimum variance distortion-less
response (MVDR) filter is a linear filter that minimizes the variance at
its output while maintaining a distortion-less response towards a specific input [35].
Mathematically, if r is a random, zero-mean input vector of dimension L, $r \in \mathbb{R}^L$,
processed by an L-tap filter $w \in \mathbb{R}^L$, then the filter output variance is $w^H R w$, where
$R = E\{r r^H\}$ is the input autocorrelation matrix and $w^H$ is the Hermitian of w. The
Hermitian of a vector or matrix is its conjugate transpose. Since we use BPSK
modulation, all the channels are real, and therefore the elements of the vectors and
matrices are real numbers, so the Hermitian is just the transpose. The MVDR filter
minimizes $w^T R w$ and simultaneously satisfies $w^T v = 1$, where v is the input signal to
be protected. MVDR filtering is a standard linearly constrained optimization problem
which, when solved, yields [35]

$$ w_{MVDR} = \rho^{*}\, \frac{R^{-1} v}{v^{T} R^{-1} v} \tag{11} $$

where ρ* denotes the power received at the receiver, considered to be 1 dB. The
denominator is a positive scalar because R is a positive definite matrix of real
numbers and v is always non-zero. The matrix R can be calculated from the finite set
of Walsh codes used in this WiNoC, and hence $R^{-1}$ can be computed. The equation
resembles that of a simple matched filter, except that the filter is now the signature v
weighted by $R^{-1}$ and scaled by the positive scalar terms. This filter reduces the
problem caused by synchronization delays.
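
For completeness, the closed form in (11) follows from a standard Lagrange-multiplier argument. The short derivation below is our own sketch for the real-valued case with the constraint $w^T v = 1$ (that is, with the scale factor in (11) set to unity) and a symmetric, invertible R; the multiplier $\lambda$ is introduced only for this derivation.

```latex
\begin{aligned}
\mathcal{L}(w,\lambda) &= w^{T} R\, w - 2\lambda\,(w^{T} v - 1) \\
\nabla_w \mathcal{L} = 0 \;&\Rightarrow\; 2Rw - 2\lambda v = 0
  \;\Rightarrow\; w = \lambda R^{-1} v \\
w^{T} v = 1 \;&\Rightarrow\; \lambda\, v^{T} R^{-1} v = 1
  \;\Rightarrow\; \lambda = \frac{1}{v^{T} R^{-1} v} \\
w_{\mathrm{MVDR}} &= \frac{R^{-1} v}{v^{T} R^{-1} v}, \qquad
w_{\mathrm{MVDR}}^{T} R\, w_{\mathrm{MVDR}} = \frac{1}{v^{T} R^{-1} v}
\end{aligned}
```

The last line also gives the minimum output variance attained by the filter, which is why a larger $v^T R^{-1} v$ corresponds to better interference suppression.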
Let us assume K transceivers, with the power of the kth transceiver
being $A_k$ and the ith bit of the kth transceiver denoted by $b_k[i]$. The signal after being
passed through a matched filter is given by

$$ y_k[i] = A_k b_k[i] + \sum_{j<k} A_j b_j[i+1]\, \rho_{kj} + \sum_{j<k} A_j b_j[i]\, \rho_{jk} + \sum_{j>k} A_j b_j[i]\, \rho_{kj} + \sum_{j>k} A_j b_j[i-1]\, \rho_{jk} + n_k[i] \tag{12} $$

where j ranges over the transceivers, $\rho_{kj}$ and $\rho_{jk}$ are the interference terms
caused by the misaligned signals and $n_k$ is the noise of the channel. The noise in the
channel is considered to be additive white Gaussian noise.

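The interference between the misaligned signals is given by

$$ \rho_{kl}(\tau) = \int_{\tau}^{T} s_k(t)\, s_l(t - \tau)\, dt \tag{13} $$

$$ \rho_{lk}(\tau) = \int_{0}^{\tau} s_k(t)\, s_l(t + T - \tau)\, dt, \qquad \tau \in [0, T] \tag{14} $$

where T is the duration of one Walsh code word (one information bit), τ is the overlap
duration between any two Walsh codes, and $s_k$ and $s_l$ are the Walsh codes for any two
transceivers. Considering all the possible conditions for eight transceivers and
rewriting equation (12) we get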
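$$ y[i] = R^{T}[1]\, A\, b[i+1] + R[0]\, A\, b[i] + R[1]\, A\, b[i-1] + n[i] \tag{15} $$

where A is a K×K diagonal matrix of the powers of all the transceivers, b[i] is the
vector of the desired bits of the K transceivers at the ith bit interval and n is the noise
vector. The matrices R[0] and R[1] are the cross-correlation matrices of the Walsh
codes and are defined as

$$ R_{jk}[0] = \begin{cases} 1, & j = k \\ \rho_{jk}, & j < k \\ \rho_{kj}, & j > k \end{cases} \tag{16} $$

$$ R_{jk}[1] = \begin{cases} 0, & j \ge k \\ \rho_{kj}, & j < k \end{cases} \tag{17} $$
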
The above equation can be generalized as in (18) and (19), where H = R[0]A. After
obtaining the desired signal, the MVDR filter is applied to this equation. The bit index
of equation (18) is dropped and the filter is applied as given by (20) to (23), which
separate the filter output into the desired signal and the interference due to the other
transceivers together with the additive white Gaussian noise of the channel. The SINR
is the ratio of the expected power of the desired bit after the MVDR filter to the
expected power of the interfering bits and noise after the MVDR filter. The SINR for
this system is calculated following these equations. The values of the SINR are
discussed later in this chapter.
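
The SINR definition above can be illustrated numerically. The sketch below is not the model of this work but a simplified one-shot stand-in under stated assumptions: interferers are chip-aligned but offset by a whole number of chips, each interferer contributes two independent partial signatures inside the desired bit window (the tail of its previous bit and the head of its current bit), all transceivers have unit amplitude, and the noise is white with a hypothetical per-chip variance. The Walsh codes come from the Sylvester construction and all function names are ours.

```python
import math


def walsh_codes(n):
    """Rows of the Sylvester-Hadamard matrix of order n (n a power of two)."""
    codes = [[1]]
    while len(codes) < n:
        codes = [row + row for row in codes] + \
                [row + [-x for x in row] for row in codes]
    return codes


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def split_signature(code, shift):
    """Partial signatures of an interferer delayed by `shift` whole chips."""
    n = len(code)
    left = code[n - shift:] + [0] * (n - shift)   # tail of its previous bit
    right = [0] * shift + code[:n - shift]        # head of its current bit
    return left, right


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def covariance(desired, interferers, noise_var):
    """R = sum of s s^T over all signatures plus noise_var * I (unit amplitudes)."""
    n = len(desired)
    vectors = [desired] + interferers
    return [[sum(s[r] * s[c] for s in vectors) + (noise_var if r == c else 0.0)
             for c in range(n)] for r in range(n)]


def matched_filter(v):
    return [x / dot(v, v) for x in v]


def mvdr_filter(r, v):
    """w = R^{-1} v / (v^T R^{-1} v), so that w^T v = 1."""
    r_inv_v = solve(r, v)
    scale = dot(v, r_inv_v)
    return [x / scale for x in r_inv_v]


def sinr_db(w, desired, interferers, noise_var):
    signal = dot(w, desired) ** 2
    interference = sum(dot(w, s) ** 2 for s in interferers)
    return 10 * math.log10(signal / (interference + noise_var * dot(w, w)))


def compare(desired_idx=2, shifts=None, n=8, noise_var=0.08):
    """Return (matched-filter SINR, MVDR SINR) in dB for the given chip shifts."""
    codes = walsh_codes(n)
    shifts = shifts or {3: 1, 5: 3}   # hypothetical interferer -> chip shift
    v = codes[desired_idx]
    interferers = []
    for k, shift in shifts.items():
        interferers.extend(split_signature(codes[k], shift))
    r = covariance(v, interferers, noise_var)
    return (sinr_db(matched_filter(v), v, interferers, noise_var),
            sinr_db(mvdr_filter(r, v), v, interferers, noise_var))


if __name__ == "__main__":
    print(compare())
```

Because the MVDR solution coincides with the maximum-SINR linear filter when R is the true covariance, the second value returned by `compare` is never lower than the first. The numerical gap depends on the assumed shifts and noise level and should not be read as the 14dB reported in this chapter.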
Figure 4-4 shows the flow chart of the CDMA receiver with the MVDR
filter. The receiver has a demodulator comprising an LNA and a mixer, combined with
a low-power, high-speed ADC. The signal is demodulated and
converted to digital form. The MVDR filter then iterates multiple times to obtain a
distortion-less response with minimum variance, and the desired signal is obtained
after these iterations. In the case of CDMA Walsh codes, a single iteration has proven
to be sufficient to suppress the interference. The SINR is calculated using a single
iteration of the MVDR filter.

Figure 4-4: Flow diagram of CDMA decoder with MVDR filter.

Assuming an SNR budget of 20dB for our calculation, the SINR obtained after
using this filter is greater than that of the simple matched filter. Figure 4-5 shows the
SIR vs chip shifts and the SINR vs chip shifts for the matched filter and the MVDR filter.
According to figures 4-5(a) and 4-5(b), the SIR and the SINR with the MVDR
filter are clearly higher than with the matched filter. Since the SINR is higher, the
transmitter power can be reduced by the difference between the two SINR values
without affecting the performance and the reliability of the network.

[Figure 4-5 plots: (a) SIR in dB and (b) SINR in dB versus chip shifts (1 to 7), each
comparing the matched filter and the MVDR filter]

Figure 4-5: (a) SIR(dB) vs Chip Shifts (b) SINR(dB) vs Chip Shifts

The difference in SINR between the simple matched filter and the MVDR
filter is found to be around 14dB from the above figure. The transmitted power can
therefore be scaled down by 14dB, which reduces the power consumed at the
transceivers. With lower power consumption the wireless packet energy reduces,
which in turn reduces the overall packet energy without affecting the reliability of the
network. The total packet energy and transceiver energy with the MVDR filter are
compared with those of the matched filter in fig. 4-6. The overall packet energy
reduces by 7.47% relative to the baseline matched filter. Furthermore, the transceiver
energy reduces by 34% relative to the matched filter.
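
As a quick check on what a 14dB reduction means in linear terms, the arithmetic below uses the usual decibel definition. It is our own worked example; $P_{\mathrm{MF}}$ and $P_{\mathrm{MVDR}}$ are illustrative symbols for the transmit power needed with the matched filter and with the MVDR filter respectively.

```latex
10 \log_{10} \frac{P_{\mathrm{MF}}}{P_{\mathrm{MVDR}}} = 14\ \mathrm{dB}
\;\Rightarrow\;
\frac{P_{\mathrm{MF}}}{P_{\mathrm{MVDR}}} = 10^{1.4} \approx 25.1
```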

[Figure 4-6 plot: transceiver energy and packet energy for each filter type (matched
filter, MVDR filter). Axis and legend labels: Transceiver Energy (pJ), Transceiver Power
(pW), Packet Energy (nJ)]

Figure 4-6: Energy Comparison between matched filter and MVDR filter.

According to [35], MVDR filtering is an iterative algorithm that starts with a
simple matched filter and generates a sequence of filters that converges to the
MVDR filter. Repeated iterations of the MVDR filter can result in energy
and timing overheads. However, in the case of Walsh code based encoding, a single
iteration is sufficient to improve the SINR by about 14dB compared to a matched filter
receiver, as seen in Fig. 4-5(b). The energy of the CDMA codec, as shown in table 1, is
about 1.89% of the total transceiver energy. Therefore, for a single iteration the
energy consumption of the MVDR filter is not higher than that of a matched filter
receiver. Even if multiple iterations are performed to improve the reliability further,
the energy overhead will not be significantly higher. The energy overhead of the
MVDR filter is included in the above energy estimates. On the other hand, the
higher SINR enables us to reduce the transmitted power and hence save energy in the
transceiver. This reduction in total transceiver energy consumption is shown in fig.
4-6. The single iteration of the MVDR receiver does not require additional timing
overheads compared to the matched filter receiver. Consequently, the performance
of the CDMA WiNoC with the MVDR filter remains unaltered.

4.3. Advanced Modulation Schemes
In this section the channel modulation schemes are discussed and the
importance of an efficient modulation scheme is highlighted. Channel modulation is
used to modulate the information before it is sent to the receiver over the wireless
medium. The data rate of the network depends on the modulation scheme: a more
efficient modulation scheme gives a higher data rate.
There are various modulation schemes, among which the most robust is
binary phase shift keying (BPSK). The bit rate in this scheme is one bit per cycle.
Hence this modulation scheme is limited, as the transmission of large amounts of data
takes a longer duration. In other words, the latency of the entire system increases. In
this work we propose the use of various advanced modulation schemes to improve the
performance of this WiNoC.
There are many advanced modulation schemes which enhance the efficiency
of the channel. Quadrature Phase Shift Keying (QPSK) is an advanced channel
modulation scheme in which two bits can be transmitted simultaneously in one cycle.
In QPSK the data is separated into two channels called in-phase (I) and
quadrature-phase (Q). These channels are modulated onto two carriers of the same
frequency with a 90 degree phase shift between them, and hence are
orthogonal to each other. They are summed together and transmitted. At the
receiver, the channels are separated using the same two carriers. The
efficiency of this channel modulation scheme is twice that of BPSK.
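
The I/Q separation described above can be illustrated with a small baseband sketch. It is a toy example under our own assumptions, not the modulator of this work: one symbol is sampled over a whole number of carrier cycles, the channel is ideal and noise-free, and the function names, the number of cycles and the number of samples are hypothetical choices.

```python
import math

# Bits carried per symbol for a constellation of M points: log2(M).
BITS_PER_SYMBOL = {name: int(math.log2(m))
                   for name, m in (("BPSK", 2), ("QPSK", 4), ("16-QAM", 16))}


def qpsk_modulate(bit_i, bit_q, cycles=4, samples=64):
    """Map two bits onto cosine (I) and sine (Q) carriers and sum them."""
    amp_i = 1.0 if bit_i else -1.0
    amp_q = 1.0 if bit_q else -1.0
    signal = []
    for n in range(samples):
        phase = 2 * math.pi * cycles * n / samples
        signal.append(amp_i * math.cos(phase) + amp_q * math.sin(phase))
    return signal


def qpsk_demodulate(signal, cycles=4):
    """Recover both bits by correlating with the same two carriers."""
    samples = len(signal)
    corr_i = sum(s * math.cos(2 * math.pi * cycles * n / samples)
                 for n, s in enumerate(signal))
    corr_q = sum(s * math.sin(2 * math.pi * cycles * n / samples)
                 for n, s in enumerate(signal))
    return int(corr_i > 0), int(corr_q > 0)


if __name__ == "__main__":
    for bits in ((0, 0), (0, 1), (1, 0), (1, 1)):
        print(bits, qpsk_demodulate(qpsk_modulate(*bits)))
    print(BITS_PER_SYMBOL)
```

Because the cosine and sine carriers are orthogonal over a whole number of cycles, each correlation sees only its own channel, which is what allows two bits to share one symbol period.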
Similarly, 16-Quadrature Amplitude Modulation (16-QAM) is a
higher order modulation scheme in which four bits can be transmitted
simultaneously in one cycle. 16-QAM is similar to QPSK in that the data is again split
into I and Q channels; however, whereas in QPSK each channel can take only two
phases, in 16-QAM each channel also has two intermediate amplitude values along
with the phases. Each channel therefore carries two bits, and the two channels are
summed together and transmitted. The efficiency of this
channel modulation is twice that of QPSK and four times that of BPSK.
The achievable bandwidth of the CDMA based WiNoC with these advanced
modulation schemes is shown in fig. 4-7. As explained earlier, the bandwidth does not
increase in proportion to the efficiency of the modulation scheme. The use of these
advanced modulation schemes improves only the efficiency of the wireless links in
the network. The efficiency of the wired links remains the same regardless of the
channel modulation scheme. Therefore, the increase in bandwidth with the use of
advanced modulation techniques comes only from the increased efficiency of the
wireless links. 16-QAM has a bandwidth which is 8% greater than that of the baseline
BPSK modulation scheme, and QPSK has a 5% higher bandwidth than the baseline.

[Figure 4-7 plot: bandwidth in Tbps (axis 3.4 to 4.1) for the BPSK, QPSK and 16-QAM
modulation schemes]

Figure 4-7: Bandwidth and throughput of BPSK, QPSK and 16-QAM modulation
schemes.

Many challenges remain in the design of these advanced
channel modulators. Even though the bandwidth increases with higher order
modulation schemes, the complexity of the modulators also increases.
Furthermore, these advanced channel modulators will require higher
power and will be a primary cause of increased energy consumption. From figure
4-7, even though 16-QAM has the higher bandwidth, the better channel modulation
scheme would be QPSK, because the complexity and power requirement of a 16-QAM
channel modulator are higher than those of a QPSK modulator. Furthermore, the
difference in bandwidth between QPSK and 16-QAM is only 2%. Therefore, the QPSK
modulation scheme will prove to be more efficient.

Chapter 5. Conclusion and future work
The multi-hop data transfer in conventional wireline NoCs results in very high
latency and low performance. The CDMA based WiNoC proved to be a desirable
solution to the poor performance of the wireline NoCs. The CDMA based WiNoC
showed better bandwidth, packet energy and performance when compared with
its wireless counterparts.

5.1. Summary
The CDMA based WiNoC has proven to be efficient and resilient
compared to its wireline and wireless counterparts. The CDMA based WiNoC’s
bandwidth of 3.65 Tbps is double that of a traditional mesh and 13% more
than that of a T-WiNoC. The packet energy of the CDMA-WiNoC is almost three times
lower than that of the mesh and 22% lower than that of the T-WiNoC. The experimental
results showed that the CDMA based WiNoC scales well in terms of bandwidth
and packet energy dissipation compared to other state-of-the-art WiNoCs. The
worst-case SIR for the CDMA based WiNoC was found to be about 20dB.
The receiver of the CDMA WiNoC receives from multiple transmitters at a
time. Due to the lack of synchronization there is a small time offset between the data
arriving from the different transmitters. Therefore, the orthogonality of the Walsh codes
is lost, which in turn leads to interference at the receiver. In this work an advanced
filter is used to suppress the interference on the CDMA links. The MVDR filter
minimizes the variance at the output and maintains a distortion-less response
towards a specific input vector. This has been shown to increase the SINR by almost
14dB and hence to reduce the transmitted power. The transceiver energy is reduced
by 34% and the overall packet energy by about 7% compared to the
matched filter. The use of this advanced filter has improved the CDMA WiNoC’s
reliability and performance.

5.2. Future Work
Future developments can improve the performance and
efficiency of the current CDMA based WiNoC. Higher order filters, which can improve
the resiliency of the current WiNoC, should be investigated, although the design
and implementation of these filters can prove to be challenging. Furthermore,
developing energy-efficient circuit implementations of the higher order
modulation schemes would lead to promising results, improving the efficiency of the
network. Lastly, the investigation of high-frequency oscillators and filters could help
in improving the performance of the network.

Bibliography
1. Moore’s law http://www.intel.com/content/www/us/en/silicon-innovations/mooreslaw-technology.html
2. P. Pande, C. Grecu, M. Jones, A. Ivanov, R. Saleh, “Performance Evaluation and
Design Trade-Offs for Network-on-Chip Interconnect Architectures”, August
2005.
3. S. Deb, A. Ganguly, K. Chang, P. Pande, B. Belzer, D. Heo, “Enhancing
Performance of Network-on-Chip Architectures with Millimeter-Wave
Wireless Interconnects”, Proc. of IEEE International Conference on
Application-specific Systems, Architectures and Processors (ASAP), 7–9 July,
2010.
4. A. Ganguly, K. Chang, S. Deb, P. Pande, B. Belzer, C. Teuscher, “Scalable Hybrid
Wireless Network-on-Chip Architectures for Multi-Core Systems”, IEEE
Transactions on Computers (TC), vol. 60, issue 10, pp. 1485-1502.
5. V. F. Pavlidis and E. G. Friedman, “3-D Topologies for Networks-on-Chip,”
IEEE Transactions on Very Large Scale Integration (VLSI), Vol. 15, Issue 10,
October 2007, pp. 1081-1090.
6. M. Brière, B. Girodias, Y. Bouchebaba, G. Nicolescu, F. Mieyeville, F. Gaffiot,
I. O'Connor, “System Level Assessment of an Optical NoC in an MPSoC
Platform”, 2007.
7. A. Shacham, K. Bergman and L. Carloni, “Photonic Networks-on-Chip for Future
Generations of Chip Multiprocessors”, September 2008.
8. M.F Chang, J. Cong, A. Kaplan, M. Naik, G. Reinman, E. Socher, S-W Tam, “CMP
Network-on-Chip Overlaid With Multi-Band RF-Interconnect,” Proc. of IEEE
International Symposium on High-Performance Computer Architecture
(HPCA), 16-20 February, 2008, pp. 191-202.
9. D. Zhao and Y. Wang, “SD-MAC: Design and Synthesis of A Hardware-Efficient
Collision-Free QoS-Aware MAC Protocol for Wireless Network-on-Chip,” IEEE
Transactions on Computers, vol. 57, no. 9, September 2008, pp. 1230-1245.
10. D. Zhao, Y. Wang, J. Li, T. Kikkawa, “Design of multi-channel wireless NoC to
improve on-chip communication capacity”, Proc. of the fifth ACM/IEEE
International Symposium on Networks-on-Chip, 1-4 May, 2011.
11. Zhao D and Wu R, “Overlaid Mesh Topology Design and Deadlock Free
Routing in Wireless Network-on-Chip”, Proceedings of IEEE/ACM
International Symposium on Networks-on-Chips. 27-34. 2012.
12. Lee S B, Tam S W, Pefkianakis I, Lu S, Chang M F, Guo C, Reinman G, Peng C,
Naik M, Zhang L, and Cong J, “A Scalable Micro Wireless Interconnect
Structure for CMPs”, Proceedings of ACM Annual International Conference
on Mobile Computing and Networking (MobiCom ’09). 20-25. 2009.
13. Ditomaso D, Kodi A, Kaya S, Matolak D, “Iwise: Inter-router Wireless Scalable
Express Channels for Network-on-Chips (NoCs) Architecture”, Proceedings of
IEEE Annual Symposium on High Performance Interconnects (HOTI). 11-18.
2011.
14. W. Lee and G. Sobelman, “Mesh-star Hybrid NoC architecture with CDMA
switch”, Proc. of IEEE International Symposium on Circuits and Systems
ISCAS 2009.
15. M. Buchanan. “Nexus: Small Worlds and the Groundbreaking Theory of
Networks.” Norton, W. W. & Company, Inc, 2003.
16. C. Teuscher, “Nature-Inspired Interconnects for Self-Assembled Large-Scale
Network-on-Chip Designs,” Chaos, 17(2):026106, 2007.
17. A. Ganguly, P. Wettin, K. Chang, P. Pande, “Complex Network Inspired
Fault-tolerant NoC Architectures with Wireless Links”, Proc. of IEEE/ACM
International Symposium on Networks-on-Chip (NOCS), Pittsburgh, May,
2011.
18. M. Fukuda, P.K Saha, N. Sasaki, T. Kikkawa, “A 0.18 μm CMOS Impulse Radio
Based UWB Transmitter for Global Wireless Interconnections of 3D Stacked-Chip
System,” Proc. of International Conference Solid State Devices and
Materials, Sept. 2006, pp. 72-73.
19. G. W. Hanson, “On the Applicability of the Surface Impedance Integral
Equation for Optical and Near Infrared Copper Dipole Antennas,” IEEE
Transactions on Antennas and Propagation, vol. 54, no. 12, December 2006,
pp. 3677-3685.
20. P.J Burke, S. Li, Z. Yu, “Quantitative Theory of Nanowire and Nanotube
Antenna Performance,” IEEE Transactions on Nanotechnology, Vol. 5, No. 4,
July 2006, pp. 314-334.
21. K. Kempa, J. Rybczynski, Z. Huang, K. Gregorczyk, A. Vidan, B. Kimball, J.
Carlson, G. Benham, Y. Wang, A. Herczynski, Z.F. Ren,"Carbon Nanotubes as
Optical Antennae," Advanced Materials, vol. 19, 2007, pp. 421-426.
22. D. Park, S. Eachempati, R. Das, A.K Mishra, X. Yuan, N. Vijaykrishnan, C.R
Das,“MIRA: A Multi-layered On-Chip Interconnect Router Architecture,” IEEE
International Symposium on Computer Architecture, ISCA, 21-25 June 2008,
pp. 251-261.
23. A. Tomkins, R.A Aroca, T. Yamamoto, S.T Nicolson, Y. Doi, S.P Voinigescu,
“A Zero-IF 60 GHz 65 nm CMOS Transceiver With Direct BPSK Modulation
Demonstrating up to 6 Gb/s Data Rates Over a 2 m Wireless Link”, IEEE
Journal of Solid-State Circuits, Vol. 44, No. 8, August 2009.
24. Zhang Y P, Chen Z M, and Sun M, “Propagation Mechanisms of Radio Waves
Over Intra-Chip Channels with Integrated Antennas: Frequency-Domain
Measurements and Time-Domain Analysis.” IEEE Transactions on Antennas
and Propagation. 55, 10, 2900-2906. 2007.
25. Chang K, Deb S, Ganguly A, Yu X, Sah S P, Pande P, Belzer B, Heo D,
“Performance Evaluation and Design Trade-offs for Wireless Network-on-Chip
Architectures.” ACM Journal on Emerging Technologies in Computing
Systems (JETC). 8, 3, 23:1-23:25. 2012.
26. Shang Y, Cai D, Fei W, Yu H, and Ren J. “An 8mW Ultra Low Power 60GHz
Direct-conversion Receiver with 55dB Gain and 4.9dB Noise Figure in 65nm
CMOS.” IEEE International Symposium on Radio-Frequency Integration
Technology(RFIT). 47-49. 2012.
27. El Chammas M and Murmann B. “A 12-GS/s 81-mW 5-bit Time-Interleaved
Flash ADC With Background Timing Skew Calibration.” IEEE Journal of Solid
State Circuits. 46, 4, 838-847. 2011.
28. Wang X, Tapani A, and Nurmi J. “Applying CDMA Technique to Network-on-Chip.”
IEEE Transactions on VLSI. 15, 10, 1091-1100. 2007.
29. Duato J, Yalamanchili S, and Ni L. “Interconnection Networks: An Engineering
Approach.” Morgan Kaufmann. 2002.
30. Lysne O, Skeie T, Reinemo S, and Theiss I, “Layered routing in irregular
networks.” IEEE Trans. on Parallel and Distributed Systems, vol. 17, no. 1,
51-65. 2006.
31. Binkert N, Beckman B, Black G, Reinhardt S.K, Saidi A, Basu A, Hestness J,
Hower D.R, Krishna T, Sardashti S, Sen R, Sewell K, Shoaib M, Vaish N, Hill
M.D, and Wood D.A. “The GEM5 Simulator”, ACM SIGARCH Computer
Architecture News, 39(2), 2011, 1-7
32. Woo S.C, Ohara M, Torrie E, Singh J.P, and Gupta A. “The splash-2 programs:
characterization and methodological considerations.” Proceedings of ISCA,
24-36 1995.
33. Bienia C. “Benchmarking modern multiprocessors.” Ph.D. Dissertation,
Princeton Univ, Princeton NJ.
34. M. Pursley, "Performance Evaluation for Phase-Coded Spread-Spectrum
Multiple-Access Communication--Part I: System Analysis," Communications,
IEEE Transactions on , vol.25, no.8, pp. 795- 799, August, 1977.
35. D. A Pados, G. N Karystinos, “An Iterative Algorithm for the Computation of the
MVDR Filter”, IEEE Transactions on Signal Processing, Vol. 49, No. 2, 2001.
