Engineer the Channel and Adapt to it: Enabling Wireless Intra-Chip
  Communication by Timoneda, Xavier et al.
Xavier Timoneda§, Sergi Abadal§, Antonio Franques¶, Dionysios Manessis∗,
Jin Zhou‡, Josep Torrellas¶, Eduard Alarcón§, Albert Cabellos-Aparicio§
§NaNoNetworking Center in Catalunya (N3Cat), Universitat Politècnica de Catalunya (UPC), Barcelona, Spain
‡Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign (UIUC), Illinois, USA
¶Department of Computer Science, University of Illinois at Urbana-Champaign (UIUC), Illinois, USA
∗System Integration & Interconnection Technologies, Fraunhofer Institute for Reliability and Microintegration (IZM), Berlin, Germany
Email: xavier.timoneda@upc.edu
Abstract—Ubiquitous multicore processors nowadays rely on
an integrated packet-switched network for cores to exchange and
share data. The performance of these intra-chip networks is a
key determinant of the processor speed and, at high core counts,
becomes an important bottleneck due to scalability issues. To
address this, several works propose the use of mm-wave wireless
interconnects for intra-chip communication and demonstrate
that, thanks to their low-latency broadcast and system-level
flexibility, this new paradigm could break the scalability barriers
of current multicore architectures. However, these same works
assume 10+ Gb/s speeds and efficiencies close to 1 pJ/bit without
a proper understanding of the wireless intra-chip channel. This
paper first demonstrates that such assumptions are far from
realistic by evaluating losses and dispersion in commercial chips.
Then, we leverage the system’s monolithic nature to engineer the
channel, that is, to optimize its frequency response by carefully
choosing the chip package dimensions. Finally, we exploit the
static nature of the channel to adapt to it, pushing efficiency-speed
limits with simple tweaks at the physical layer. Our methods
reduce losses by 47 dB and dispersion by 7.3×, enabling intra-
chip wireless communications over 10 Gb/s and only 1.9 dB away
from the dispersion-free case.
I. INTRODUCTION
Multicore processors are present in virtually every comput-
ing domain nowadays. They integrate a number of processor
cores within the same chip and, in the past few years, manufac-
turers have been consistently increasing the core count seeking
higher execution speeds. However, in order to translate this po-
tential into effective performance, the on-chip communication
problem must be solved: cores need an integrated interconnect
to exchange or share data and, for densely populated chips,
traditional interconnects are burdensome and slow down the
processor. Communication, not computation, thus becomes the
main performance bottleneck in multicore systems [1].
In the past, most chips did not contain more than a handful
of cores and on-chip communication was easily performed
through a bus. Since buses do not scale well with the number
of stations, a completely different approach was soon required.
The adopted solution, called Network-on-Chip (NoC), consists
of a packet-switched network of routers that are co-integrated
with the cores as represented in Figure 1. Since then, NoCs
have been widely applied not only in research works [2]–
[5], but also in commercial chips such as Tilera’s TILE-GX
[6] or Intel’s Xeon Phi [7]. Nevertheless, with the arrival of
extreme scaling and massive multicore architectures, standard
NoCs start to show performance and efficiency issues [8]. New
paradigms are thus required in the manycore era.
The scalability problems of NoCs are mainly the network
diameter and overprovisioning. As further elaborated in Sec.
II, these cause the communication latency and power to
increase, especially for chip-wide transactions. Therefore, any
new candidate to improve existing NoCs should address them
and, among a few alternatives [9], Wireless Network-on-Chip
(WNoC) shows great promise in this regard. In short, WNoC
consists of overlaying a set of wireless intra-chip
links over a backbone wired NoC. This reduces the latency
of chip-wide transfers, including broadcasts, by virtue of the
omnidirectional speed-of-light propagation of radio waves, and
also combats overprovisioning thanks to its global reconfigura-
bility [10]. As shown in the literature, these unique features
become key enablers of new multicore architectures capable
of pushing current scalability limits [11]–[13].
The WNoC paradigm builds on the foundations of
widespread millimeter-wave (mm-wave) technology. A wide
variety of on-chip antennas is already available [14]–[16] and
wireless intra-chip communication with such antennas has
been experimentally confirmed in multiple works [17]–[19].
Additionally, 60/90 GHz integrated transceivers specifically
designed for WNoC have been tested [20]–[23]. On top of
this, a great variety of works have evaluated new topologies
and routing protocols [24]–[29] in an attempt to exploit the
potential of WNoC at the network level.
The main caveat of the majority of WNoC research is
that it relies on incorrect channel models. Many works [18],
[30]–[35] either neglect the influence of the chip package,
which introduces losses and dispersion, or neglect
dispersion altogether. This does not invalidate the potential
of the WNoC paradigm, but leads to erroneous assumptions
on the achievable speed and power. For instance, many WNoC
architectures assume rates over 10 Gb/s [12], [27], [28], which
may not be achievable due to multipath effects. Other works
obtain power consumption estimates by assuming path losses
between 25 and 30 dB [36]–[39], values that are far from the
true ones in standard chip packages.
arXiv:1901.04291v1 [cs.NI] 23 Dec 2018
[Figure 1: diagram. Labels: Core, Router, Wireless Interface, Antenna, Memory Hierarchy Controller, Next nodes.]
Fig. 1. Sketch of a Wireless Network-on-Chip architecture.
This paper aims to fill this gap and restate the potential
of WNoC by proposing, as the main contribution, a novel
co-design methodology that (i) properly characterizes the
wireless intra-chip channel, and (ii) identifies and exploits its
unique traits. It can be summarized in three pillars:
• Channel characterization: we study propagation
within a realistic computing package, which has often
been overlooked. Frequency and time domain analyses are
performed to extract attenuation and dispersion scaling
trends. With this, we prove that the assumptions made in
most WNoC works are overly optimistic, and that path
loss and delay spread often follow opposing trends.
• Channel engineering: the intra-chip channel is unique
in that it can be engineered. Therefore, we propose an
optimization scheme that explores the package design
space to jointly minimize attenuation and dispersion. We
reduce them by 30 dB and 3.52× together, or by 47 dB
and 7.32× in separated extreme cases.
• Static transceiver optimization: the intra-chip channel
is also unique in that it is quasi-deterministic. Based
on this, we propose to combat dispersion by predicting
the multipath effects and adapting the transceiver back-
end to them. We easily accommodate 10 Gb/s and reach
beyond the coherence bandwidth limit, figures that would
be unattainable with conventional coding.
Although the static and monolithic nature of the WNoC
scenario was already discussed in [24], [40], this is the first
work that, to the best of the authors’ knowledge, systematically
exploits the unique traits of the wireless intra-chip channel.
The proposed methodology enables operation at 10–20 Gb/s
with 1–2 pJ/bit, figures that are widely assumed in the literature
but that would be otherwise unattainable. It is worth
noting that very few other scenarios, if any, allow the channel
to be engineered to enhance propagation.
The remainder of this paper is organized as follows. Sec.
II provides some background. Sec. III details the proposed
methodology, which is then evaluated in Sec. IV. Finally, Sec.
V discusses the results and Sec. VI concludes the paper.
II. BACKGROUND
Fig. 2. Wireless propagation within a flip-chip package (top) and typical cross-section (bottom).
Network-on-Chip: NoCs generally implement a 2-D mesh
topology wherein every router is connected to a core and to its
four neighbors (Fig. 2). The choice is driven by the regularity
of the topology and the short path lengths, which simplify
the routers and the links. Topologies requiring long links are in
fact discouraged as their energy and delay scale exponentially
with length and technology [41]. Short links, however, come
at the cost of a network diameter that scales as 2(k − 1) in
a k × k mesh. Thus, 64-core chips, which are commercially
available [6], [7], have a network diameter of 14 hops with
a chip-wide latency of several tens of nanoseconds without
contention. This delay would be incurred by global packets
or, even worse, broadcasts that also increase contention as
they flood the mesh. Alternatively, carefully designed WNoCs
can reduce this delay to a few nanoseconds regardless of the
location of data and number of destinations. This difference
in performance is crucial because communications are often
on the critical path of the program and any added delay can
slow down execution [11].
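The hop-count argument above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `mesh_diameter` is hypothetical, and it assumes a square k × k mesh, as in the text.

```python
import math


def mesh_diameter(num_cores: int) -> int:
    """Network diameter, in hops, of a k x k 2-D mesh: 2 * (k - 1)."""
    k = math.isqrt(num_cores)
    if k * k != num_cores:
        raise ValueError("num_cores must be a perfect square for a k x k mesh")
    return 2 * (k - 1)


# Diameter growth with core count (64 cores gives the 14 hops quoted in the text).
for cores in (16, 64, 256, 1024):
    print(f"{cores:5d} cores -> diameter {mesh_diameter(cores)} hops")
```

The diameter grows with the square root of the core count, which is why chip-wide and broadcast traffic becomes increasingly expensive on a wired mesh.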
Wireless Network-on-Chip: WNoC broadly refers to the
implementation of wireless intra-chip links on top of a wired
NoC. A packet arriving at a wireless interface is serialized,
modulated and radiated by the antenna with a given pattern
as we show in Figure 2. Radio waves propagate through the
package at nearly the speed of light until reaching the intended
destinations, where they are demodulated and deserialized.
Since intermediate router hops are avoided, WNoC reduces the
latency of global and broadcast communications by an order
of magnitude as outlined above. On the downside, wireless
bandwidth is limited and needs to be shared among the cores.
The physical layer of WNoC adapts to chip resource
constraints. The use of mm-wave bands allows antennas to be
commensurate with cores, whereas simple modulations such
as On-Off Keying (OOK) are adopted to avoid bulky or
power-hungry components at the transceiver. With such low
order modulations, high symbol rates are needed to reach
the 10+ Gb/s speeds expected for WNoC. This, together
with the stringent Bit Error Rate (BER) requirements of the
scenario (10^-15, to be comparable to that of a wire), makes
signals particularly vulnerable to Inter-Symbol Interference
(ISI). Fortunately, multipath effects can be mitigated through
package–transceiver co-design, as we propose in this work.
In upper layers, research revolves around developing
Medium Access Control (MAC) protocols, routing algorithms,
and new topologies that make the most of the WNoC potential.
Since this paper focuses on the channel and the physical layer,
we refer the reader to the vast literature for more details [10],
[12], [26].
Chip Structure and Antenna Placement: The typical
cross-section of a standard chip consists of a metal stack with
5–10 layers, separated by an insulator and placed over a lossy
silicon substrate [14]. Chips are then generally enclosed in a
package that provides mechanical support and facilitates
interfacing with the remaining components. Flip-chip packages,
wherein the chip is flipped over and connected to the PCB
through solder bumps, are currently widespread and
preferred over wire bonding. As shown in Figure 2, the chip
ends up surrounded by (i) a metallic heat sink in contact with a
heat spreader and (ii) the package carrier, with several metal
layers on top of the PCB.
The flip-chip package does not leave much space for the
antennas. Due to the presence of solder bumps, antennas
cannot be implemented in the first metal layer anymore [42].
Instead, designers have to use the metal layers closer
to the silicon or, as proposed recently, drill Through-Silicon
Vias (TSVs) to implement vertical monopoles [43]. Note that,
since most WNoC research does not consider a chip package,
antenna placement is either incorrect or simply not discussed.
Chip-scale Channel Characterization: At the chip scale,
there are two propagation aspects worth considering. First,
the low resistivity silicon used to facilitate transistor operation
introduces significant losses and, therefore, should be avoided
[14]. Second, materials used as heat spreader like Aluminum
Nitride (AlN) introduce low electrical losses and, thus, would
enhance propagation [30]. This opens interesting perspectives
for manufacturers, who can now take chip design decisions
based on the potential for wireless intra-chip communication.
Being enclosed in a metallic package, electromagnetic prop-
agation is confined within the limits of the package. Such
field confinement has positive implications on security as
eavesdropping or jamming are physically avoided, but also
leads to strong multipath effects. This has been formulated
by Matolak et al. through micro-reverberation theory [40],
yet without detailing the package structure. In fact, very
few studies include the chip package in their simulations
or measurements, and those that do are limited to low
frequencies or lack proper justification of the antenna type
and placement [42], [44], [45]. Others simply assume free
space over the insulator layer [18], [31]–[34]. To find similar
results, we need to refer to works at the data center cabinet
scale [46], or at the motherboard scale in desktops or laptops
[19], [47], [48], which have structural resemblances. However,
their results are not directly applicable to the chip scale due to
substantial differences in dimensions, materials, and antenna
placement restrictions.
Recall that, without proper understanding of the wireless
channel within the package, the impact of the wireless chip-scale
paradigm cannot be properly assessed. In the next section, we
propose a methodology to bridge this gap.
[Figure 3: block diagram. Blocks: Full-wave Solver, Optimizer, Transceiver Design. Labels: Materials, Floorplan, Thicknesses Ts (silicon) and Th (heat spreader), Frequency fc, Weight w, h, H, Path loss PL, Delay spread DS, Data rate rb, Error rate BER.]
Fig. 3. Proposed methodology: characterize the package, engineer the
channel, and adapt to it.
III. SYSTEM DESIGN
Our methodology provides a way to systematically co-
design the chip package and the transceiver exploiting the
static and monolithic nature of the system. This way, the
methodology (i) validates the WNoC concept, (ii) increases the
achievable data rate, and (iii) reduces the power consumed by
the transceiver circuitry. Here, we first overview our proposal
and then detail its design.
A. System Overview
The wireless intra-chip channel is largely unknown and
prevents architects from assessing the true potential of WNoC.
The proposed methodology, summarized in Figure 3, solves
the problem in three steps.
First, a comprehensive characterization of the wireless chan-
nel within a chip package is performed. Through accurate
modeling and full-wave solving, we obtain the response of
the channel as a function of the frequency band and the
dimensions of the package, which are parameters that chip
makers can modify at design time. As further elaborated in
Section III-B, the results are then processed to evaluate path
loss and dispersion.
The next step in the methodology is referred to as channel
engineering and is uniquely suited to this monolithic sys-
tem. Its main goal is to find the combination of package
dimensions and frequency band that jointly minimizes path
loss and dispersion. To this end, we define a figure of merit
that takes both aspects into account with adjustable weights,
allowing manufacturers to model the importance of power and
performance in the system. This figure of merit drives an
optimizer that, thanks to heuristics derived from the previous
characterization process, navigates through the package design
tradeoffs efficiently. More details are given in Section III-C.
Once we have found the best package and frequency band
for our purposes, we optimize the transceiver by leveraging the
static nature of the channel. As shown in Section III-D, simple
but effective modifications are carried out at both sides of the
communication: the transmitter uses Return-to-Zero (RZ) to
mitigate the ISI level, whereas the receiver uses a small and
fixed set of decision thresholds to decode the current symbol
based on previous bits. Both modifications are static and allow
pushing the data rate beyond the theoretical ISI limits.
B. Channel Characterization
Simulation Setup. The structure shown in Figure 2 is
introduced in a full-wave solver. To reduce the computational
burden, the bump array is approximated as a solid metallic
element. This assumption is driven by the small pitch of the
array compared to the excitation wavelength (< 0.1 mm and
∼1 mm, respectively) and validated through simulation. The
antenna used for the simulations is a broadband omnidirectional
aperture, which allows us to focus on the channel effects.
Unless noted, we consider a homogeneous distribution of 4×4
antennas within a 20×20 mm2 chip and a central frequency
of 60 GHz.
Frequency Domain Analysis. The full-wave solver allows us
to obtain the field distribution, the antenna gain, and the
coupling between antennas in the frequency domain. Then,
the channel frequency response Hij(f) is evaluated for each
antenna pair as
\[ G_i G_j |H_{ij}(f)|^2 = \frac{|S_{ji}(f)|^2}{(1 - |S_{ii}(f)|^2) \cdot (1 - |S_{jj}(f)|^2)}, \tag{1} \]
where Gi and Gj are the transmitter and receiver antenna
gains, Sji is the coupling between transmitter i and receiver
j, whereas Sii and Sjj are the reflection coefficients at both
ends [49]. Once the whole matrix of frequency responses H
is obtained, a path loss analysis can be performed by fitting
the attenuation L over distance d to
L = 10n · log10(d/d0) + L0, (2)
where L0 is the path loss at the reference distance d0 and n is
the path loss exponent [18]. The path loss exponent is around
2 in free space, below 2 in guided or enclosed structures, and
above 2 in lossy environments. Since losses at the channel are
crucial to determine the power consumption at the transceiver
(see Section V) we will report improvements in terms of worst-
case Lmax, average Lavg , and path loss exponent n.
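As a rough illustration of the two steps above, the following Python sketch converts S-parameters into a path loss value via Eq. (1) and fits the log-distance model of Eq. (2) by least squares. The function names, the reference distance of 1 mm, and the synthetic data are assumptions made for this example; they are not taken from the paper's simulations.

```python
import numpy as np


def path_loss_db(s_ji, s_ii, s_jj):
    """Path loss in dB from Eq. (1): -10 log10(G_i G_j |H_ij|^2)."""
    gain = np.abs(s_ji) ** 2 / ((1 - np.abs(s_ii) ** 2) * (1 - np.abs(s_jj) ** 2))
    return -10 * np.log10(gain)


def fit_path_loss(d_mm, loss_db, d0_mm=1.0):
    """Least-squares fit of Eq. (2): L = 10 n log10(d/d0) + L0. Returns (n, L0)."""
    x = 10 * np.log10(np.asarray(d_mm, dtype=float) / d0_mm)
    n, l0 = np.polyfit(x, np.asarray(loss_db, dtype=float), 1)
    return n, l0


# Synthetic, noise-free data generated from Eq. (2) (illustrative values only).
distances = np.array([5.0, 10.0, 15.0, 20.0, 25.0])  # mm
losses = 10 * 1.32 * np.log10(distances / 1.0) + 20.0  # dB
n_fit, l0_fit = fit_path_loss(distances, losses)
print(f"n = {n_fit:.2f}, L0 = {l0_fit:.2f} dB")
```

With real solver output, `losses` would hold one value per antenna pair, and the worst-case and average losses would follow from `losses.max()` and `losses.mean()`.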
Time Domain Analysis. In the time domain, the EM solver
allows us to define an input excitation x(t) at the input of the
transmitting antenna. We obtain the output signal y(t) at the
antennas, including the transmitting one, so that the impulse
response hij(t) between transmitter i and receiver j can be
derived with the classical formulation
\[ y_j(t) = x_i(t) \ast h_{ij}(t), \tag{3} \]
where ∗ denotes the convolution operator. Once calculated, it
is straightforward to evaluate the Power Delay Profile (PDP)
as
\[ P_{ij}(\tau) = |h_{ij}(t, \tau)|^2, \tag{4} \]
therefore obtaining a matrix of PDP functions P. To character-
ize the multipath richness of the channel, we obtain the delay
spread τrms of each PDP as
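\[ \tau^{(i,j)}_{rms} = \sqrt{\frac{\int (\tau - \tau_{ij})^2 P_{ij}(\tau)\, d\tau}{\int P_{ij}(\tau)\, d\tau}}, \tag{5} \]
where \( \tau_{ij} = \frac{\int \tau P_{ij}(\tau)\, d\tau}{\int P_{ij}(\tau)\, d\tau} \) is the mean delay of the channel.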
In this work we will assume that all wireless channels are
broadcast and, therefore, they should be operated at the lowest
speed ensuring correct decoding at all nodes. As a result, we
will take the worst delay spread as limiting case and use it to
evaluate the coherence bandwidth Bc, as follows
\[ \tau_{rms} = \max_{i,\, j \neq i} \tau^{(i,j)}_{rms} \;\Rightarrow\; B_c = \frac{1}{\tau_{rms}}. \tag{6} \]
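A minimal numerical sketch of Eqs. (5) and (6) is given below. It assumes a power delay profile sampled on a uniform delay grid (so the integrals reduce to sums and the grid step cancels); the function names and the two-tap example are hypothetical.

```python
import numpy as np


def rms_delay_spread(tau, pdp):
    """Eq. (5) for a PDP sampled on a uniform delay grid."""
    tau = np.asarray(tau, dtype=float)
    pdp = np.asarray(pdp, dtype=float)
    total = pdp.sum()
    mean_delay = (tau * pdp).sum() / total  # mean delay of the channel
    return np.sqrt((((tau - mean_delay) ** 2) * pdp).sum() / total)


def coherence_bandwidth(worst_tau_rms):
    """Eq. (6): B_c = 1 / tau_rms, with tau_rms the worst case over all pairs."""
    return 1.0 / worst_tau_rms


# Toy PDP: two equal-power taps 2 ns apart (mean 1 ns, rms spread 1 ns).
spread = rms_delay_spread([0.0, 2e-9], [1.0, 1.0])
print(f"tau_rms = {spread * 1e9:.2f} ns, Bc = {coherence_bandwidth(spread) / 1e9:.2f} GHz")

# Worst case over several links, as required for broadcast channels.
per_link = [0.19e-9, 0.35e-9, 0.52e-9]  # illustrative values in seconds
print(f"Bc (worst case) = {coherence_bandwidth(max(per_link)) / 1e9:.2f} GHz")
```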
C. Channel Engineering
Our methodology defines a figure of merit or fitness function
φw that we will attempt to maximize. Since the aim is to
mitigate the path loss and the delay spread, the fitness function
takes the form
\[ \phi_w = \frac{1}{PL^{w}\, DS^{(1-w)}} \tag{7} \]
where PL is the loss metric, DS is the delay spread metric,
and w ∈ [0, 1] models the importance of power or speed in
different designs. Small values will be used if the architect
wants to optimize speed over power for high performance
devices, whereas large values imply minimization of the path
loss for low-power computers. In this paper, our metrics are
PL = n and DS = τrms as defined in Section III-B.
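The fitness function of Eq. (7) is simple enough to express directly. The sketch below is illustrative: the function name is hypothetical, and the inputs are the path loss exponent and the delay spread (here in ns) of the standard chip as quoted later in the evaluation.

```python
def figure_of_merit(pl: float, ds: float, w: float) -> float:
    """Eq. (7): phi_w = 1 / (PL^w * DS^(1 - w)), with w in [0, 1]."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return 1.0 / (pl ** w * ds ** (1.0 - w))


# Standard chip from the evaluation: n = 4.32, tau_rms = 0.52 ns.
for w in (0.0, 0.5, 1.0):
    print(f"w = {w}: phi = {figure_of_merit(4.32, 0.52, w):.3f}")
```

With w = 0 the metric reduces to 1/DS (pure speed), and with w = 1 it reduces to 1/PL (pure power), which matches the two extremes described in the text.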
The package engineering process considers three variables
that can be modified at design time: the silicon thickness Ts,
the heat spreader thickness Th, and the central frequency fc.
Then, this can be treated as an optimization problem
\[ \max_{T_s, T_h, f_c} \phi_w \tag{8} \]
where, for a given w, we find the Ts, Th, and fc values
that maximize the fitness function within the bounds given
by the manufacturer or the architect. Note that, although
we consider three key design parameters in this work, the
optimization can be extended to any other design decision such
as antenna placement, lateral chip dimensions, or additional
material choices.
To solve the optimization problem, it is first worth noting
that the full-wave simulations required to obtain φw for each
{Ts, Th, fc} combination are very computationally intensive,
especially as fc increases, which renders exhaustive searching
impractical. Also, path loss and dispersion are related to
{Ts, Th, fc} in non-monotonic ways, often showing opposing
trends. This creates local peaks in the φw function, thus
ruling out methods such as gradient-based hill climbing,
which tends to get stuck in local maxima. An alternative
would be Simulated Annealing (SA), which uses a probabilistic
method to escape local peaks and progressively approach a
global optimum. SA has been used in other electromagnetic
problems [50], [51] and is widely known so, for the sake of
brevity, we will not detail its implementation. We just note that
the results of the channel characterization can help derive
the appropriate heuristics (e.g., candidate generation, cooling
schedule) for SA to converge quickly to the global optimum.
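For readers unfamiliar with SA, a generic sketch of the search loop over {Ts, Th, fc} follows. It is not the optimizer used in this work: the objective `toy_phi` is a made-up multi-peak surrogate standing in for the full-wave evaluation of φw, and the bounds, step size, and geometric cooling schedule are arbitrary assumptions.

```python
import math
import random

# Hypothetical search bounds: Ts [mm], Th [mm], fc [GHz].
BOUNDS = [(0.1, 0.7), (0.0, 0.9), (60.0, 120.0)]


def toy_phi(x):
    """Made-up surrogate with several local peaks (NOT a channel model)."""
    ts, th, fc = x
    return (math.exp(-((ts - 0.3) / 0.1) ** 2)
            + 0.5 * math.exp(-((th - 0.8) / 0.2) ** 2)
            + 0.3 * math.cos((fc - 110.0) / 5.0))


def simulated_annealing(objective, bounds, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Maximize `objective` inside box `bounds` with a basic SA loop."""
    rng = random.Random(seed)
    x = [rng.uniform(lo, hi) for lo, hi in bounds]
    fx = objective(x)
    best, f_best = list(x), fx
    temp = t0
    for _ in range(steps):
        # Candidate generation: perturb each variable by up to 10% of its range.
        cand = [min(hi, max(lo, xi + rng.uniform(-0.1, 0.1) * (hi - lo)))
                for xi, (lo, hi) in zip(x, bounds)]
        f_cand = objective(cand)
        # Always accept improvements; accept worse points with Boltzmann probability.
        if f_cand >= fx or rng.random() < math.exp((f_cand - fx) / temp):
            x, fx = cand, f_cand
            if fx > f_best:
                best, f_best = list(x), fx
        temp *= cooling  # geometric cooling schedule
    return best, f_best


best, f_best = simulated_annealing(toy_phi, BOUNDS)
print("best point:", [round(v, 3) for v in best], "phi:", round(f_best, 3))
```

In practice each call to the objective would trigger a costly full-wave simulation, so the number of steps and the candidate generation rule would need to be tuned using the trends found during characterization.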
D. Static Transceiver Optimization
Once the channel is engineered to minimize path loss and
delay spread, we leverage the static nature of the channel to
perform simple yet effective optimizations in the RF back-end.
The idea is to apply faster-than-Nyquist signaling [52] to push
the symbol rates while resorting to the known, deterministic
channel response to keep complexity at a minimum.
[Figure 4: block diagram. Labels: From TX Core (W bits), Serializer, RZ Coder (TRZ < Tb), VCO, PA, Channel (engineered, time-invariant) |h(t)|, LNA, (·)^2, (1/T)∫ over [0, T], K deciders, Selector, Deserializer, To RX Core (W bits).]
Fig. 4. Physical layer in a wireless intra-chip link with OOK and non-coherent detection. Shaded blocks identify the improvements proposed in this work.
Figure 4 shows the block diagram of a typical wireless intra-
chip link. As pointed out in Section II, OOK modulation is
generally considered. Assuming a bit-energy of Eb = Prx/rb,
where Prx is the received power and rb is the symbol rate,
the BER of OOK is bounded by
\[ \mathrm{BER}_{OOK} = \frac{1}{2}\,\mathrm{erfc}\left(\sqrt{\frac{E_b}{4N_0}}\right) \tag{9} \]
where erfc is the complementary error function and Eb/N0
is the signal-to-noise ratio. This bound assumes coherent
detection with optimal threshold calculation and no ISI. In
our case, however, ISI manifests when pushing the data rate
beyond the Nyquist rate. To mitigate its effects, we propose
two techniques: threshold adaptation and RZ modulation.
Threshold adaptation: The main issue in conventional
wireless environments is that multipath is space- and time-
dependent. Therefore, its impact on the Euclidean distance
between the OOK symbols and on the optimal decision
threshold cannot be predicted. In the worst case, ISI is modeled
as added noise, reducing the noise margin and leading to an
approximate BER of
\[ \mathrm{BER}^{isi}_{OOK} \approx \frac{1}{2}\,\mathrm{erfc}\left(\sqrt{\frac{E_b}{4(N_0 + I)}}\right) > \mathrm{BER}_{OOK} \tag{10} \]
where I is the interference energy.
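The two expressions can be evaluated with the standard library alone. The sketch below is illustrative, with hypothetical function names and arbitrary example values; it shows that treating interference as extra noise lowers the argument of erfc and can therefore only raise the error rate.

```python
import math


def ber_ook(eb: float, n0: float) -> float:
    """Eq. (9): ISI-free OOK error rate, 0.5 * erfc(sqrt(Eb / (4 N0)))."""
    return 0.5 * math.erfc(math.sqrt(eb / (4.0 * n0)))


def ber_ook_isi(eb: float, n0: float, interference: float) -> float:
    """Eq. (10): worst case, with ISI energy I modeled as added noise."""
    return 0.5 * math.erfc(math.sqrt(eb / (4.0 * (n0 + interference))))


# Arbitrary example: Eb/N0 = 20 dB, interference equal to the noise level.
eb, n0 = 100.0, 1.0
print(f"no ISI  : {ber_ook(eb, n0):.3e}")
print(f"with ISI: {ber_ook_isi(eb, n0, interference=1.0):.3e}")
```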
In WNoC, the channel is time-invariant and we can calculate
the exact position of each symbol at all times. This means that
we can find the Euclidean distance between symbols and the
optimal decision threshold for any combination of previous
symbols even in the presence of ISI. This information can be
used to design a receiver composed of K parallel deciders,
each with its own threshold, and a register that selects the
appropriate leg. Assuming that with K deciders we address
all ISI effects, we can approximate the BER as
\[ \mathrm{BER}^{adap}_{OOK} \approx \frac{1}{K} \sum_{k=1}^{K} \frac{1}{2}\,\mathrm{erfc}\left(\sqrt{\frac{\alpha_k E_b}{4N_0}}\right) \tag{11} \]
where αk models the effect of a given past symbol combination
on the Euclidean distance between current symbols. The
number of required deciders scales as K ∼ τrms/Tb where
Tb = 1/rb is the symbol period assuming a binary modulation.
In any case, the associated overheads are small compared to
the cost of the RF front-end.
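A small sketch of Eq. (11) and of the K ∼ τrms/Tb scaling is given below. The function names and the αk values are hypothetical, and rounding the ratio up to at least one decider is an assumption of this example, since the text only gives an order-of-magnitude relation.

```python
import math


def ber_ook_adaptive(eb: float, n0: float, alphas) -> float:
    """Eq. (11): average over K deciders, each with its own distance factor alpha_k."""
    terms = [0.5 * math.erfc(math.sqrt(a * eb / (4.0 * n0))) for a in alphas]
    return sum(terms) / len(terms)


def deciders_needed(tau_rms: float, bit_rate: float) -> int:
    """Rough K ~ tau_rms / Tb with Tb = 1 / rb (rounded up; rounding is an assumption)."""
    return max(1, math.ceil(tau_rms * bit_rate))


# Hypothetical alpha_k values for four past-symbol combinations.
print(f"BER = {ber_ook_adaptive(100.0, 1.0, [1.0, 0.9, 0.8, 0.7]):.3e}")

# Engineered channel (71.32 ps) at 20 Gb/s versus standard chip (0.52 ns) at 10 Gb/s.
print(deciders_needed(71.32e-12, 20e9), deciders_needed(0.52e-9, 10e9))
```

With all αk equal to 1 the expression collapses to the ISI-free bound of Eq. (9), which is the sense in which RZ shaping (larger αk) and threshold adaptation work together.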
Return to zero: a classical way to mitigate ISI effects
is by using RZ techniques, which reduce the length of the
symbol through duty cycling. On the one hand, this shortens
the length of the current symbol as seen by the receiver,
which implies lower spillage into the next symbols. On the
other hand, the lower ISI comes at the cost of a drop in the
received energy, which may offset the gains of reduced ISI
if RZ is not designed properly. However, since the channel
is time-invariant, we can infer the duty cycle that maximizes
the signal-to-interference ratio and, thus, minimizes the BER
for any symbol combination. In Equation (11), this would be
equivalent to increasing αk for all k.
IV. EVALUATION
The three pillars of the proposed methodology are evaluated
separately. Section IV-A discusses channel scaling trends, Sec-
tion IV-B shows the gains of the channel engineering process,
and Section IV-C illustrates the transceiver improvements.
A. Channel Characterization
Here, we quantify the impact of the silicon thickness Ts,
the heat spreader thickness Th and the central frequency fc on
the path loss and delay spread. Unless noted, we take fc = 60
GHz and the dimensions of a standard chip (Ts = 0.7 mm
and Th = 0.2 mm) as default values.
Figure 5(a) shows the scaling trends with respect to silicon.
This layer is highly lossy, as mentioned in Sec. II, and we
observe that the benefits of thinning it down are significant. A
100-µm chip has a maximum path loss of Lmax = −36.29 dB
and a maximum delay spread of τrms = 0.19 ns. Compared to
a standard chip, the thinned alternative is 2.1× better in terms
of path loss (39 dB difference) and 2.73× better in terms of
delay spread (0.33 ns difference). Additionally, the path loss
exponent is reduced from n = 4.32 to n = 1.32, confirming
the transition from a lossy environment (n > 2) to a guided
medium (n < 2). The performance also scales better in terms
of delay spread, reducing the slope from 25.05 to 5.83 ps/mm.
Figure 5(b) repeats the analysis by varying the heat spreader
thickness Th. Given its low electrical losses, this layer can aid
propagation and its inclusion is thereby highly recommended.
The delay spread improves up to 3× (from 0.6 to 0.2 ns)
due to the presence of a stronger reflection cluster coming
from the heat spreader. As for the path loss, the case here
presented shows a limited impact in terms of path loss (∼10
dB improvement on average) because most of the energy is
dissipated in the 0.7-mm silicon layer before reaching the heat
spreader. Although not shown due to space constraints, the
effect of AlN on path loss is much more evident for thinned-down
silicon, as the exponent drops from n = 4.01 (no AlN)
to close to 1.1 (0.8 mm). In that case, the delay spread also
oscillates between 0.2 and 0.6 ns, sometimes contradicting the
path loss tendency.
[Figure 5: plots. (a) Scaling with silicon thickness Ts (Th = 0.2 mm), curves for 0.1, 0.3, 0.5, and 0.7 mm. (b) Scaling with heat spreader thickness Th (Ts = 0.7 mm), curves for no AlN, 0.4 mm, and 0.8 mm. Axes: path loss [dB] and delay spread [ns] versus distance [mm]; maximum delay spread [ns] and path loss exponent versus layer thickness [mm].]
Fig. 5. Left: path loss and delay spread over distance. Right: resulting scaling trends of the maximum delay spread τrms and the path loss exponent n.
[Figure 6: plot. Maximum delay spread [ns] and path loss exponent versus frequency [GHz].]
Fig. 6. Scaling of the maximum delay spread τrms and the path loss exponent
n with respect to frequency (Ts = 0.3 mm and Th = 0.8 mm).
Finally, Figure 6 presents the results of the frequency
scaling analysis, which we limit to the 60–120 GHz span
due to computational constraints. Additionally, we fix the
silicon and heat spreader thicknesses to small and large values,
respectively, following the design recommendations justified
above. We chose this particular point (Ts = 0.3 mm and Th = 0.8
mm) because it is close to an optimal point with respect to
dispersion. We find that fc = 110 GHz leads to a minimum
in terms of delay spread, although the improvement is limited
with respect to the other frequencies. The impact on path loss,
on the other hand, is substantial yet counter-intuitive as the
path loss drops substantially both in exponent and in average
value (10–20 dB; not shown in the interest of space) from 60
GHz to 100 GHz.
B. Engineering the Channel
Here, we show the potential of channel engineering through
a partial exploration of the {Th, Ts, fc} design space. Our aim
is not to fully implement the optimizer, but rather to validate
the potential of the approach by confirming both the complex
interactions between inputs and the presence of local optima,
as well as by giving good approximations of the path loss and
delay spread improvements that we can expect.
[Figure 7: plots. Figure of merit [ns^-1] versus silicon thickness [mm], AlN thickness [mm], and frequency [GHz], with curves for w = 0, w = 0.5, and w = 1.]
Fig. 7. Figure of merit φw as function of {Ts, Th, fc} for different priority
weights. Unless noted, Ts = 0.1 mm, Th = 0.8 mm, and fc = 60 GHz.
We first plot the fitness function φw as a function of each
exploration parameter while leaving the others fixed. The
results, summarized in Figure 7, confirm the main lessons
learned in Section IV-A: thin silicon is preferable (left plot),
it is hard to obtain clear tendencies with respect to the heat
spreader (middle plot), and performance may plateau close
to local optima (right plot). The choice of w also plays an
important role in the optimization and Figure 7 confirms it.
Since path loss and delay spread often show opposed trends,
the shape of φ changes in unexpected ways and causes wild
variations in the optimal design points. Take, for instance, the
silicon thickness scaling trend. The optimal thickness is clearly
around 0.3 mm for w = 0, but that peak dilutes progressively
and disappears around w = 0.6. At that point, the optimal
silicon thickness becomes 0.1 mm. A similar behavior arises
in the heat spreader scaling figure.
In order to estimate the maximum gains that we can
achieve through channel engineering, we further explored the
design space in the quest for points close to a hypothetical
global optimum. We chose three representative values of w and
compared the results with those of a standard chip (Ts = 0.7
mm, Th = 0.2 mm, fc = 60 GHz). Figure 8 and Table I
illustrate the outcome of this process.
[Figure 8. Left panel: path loss [dB] versus distance [mm]. Right panel:
CDF of h(t) versus time [ns]. Curves: Standard, Opt (w=0), Opt (w=0.5),
Opt (w=1).]
Fig. 8. Comparison between standard package (Ts = 0.7 mm, Th = 0.2 mm,
fc = 60 GHz) and optimal points for three different power–speed weights
from the path loss (left) and delay spread perspectives (right).
We first set w = 0 to simulate the extreme of high performance,
thereby pushing the limits on the delay spread. The
peak has been found around {Ts = 0.3 mm, Th = 0.8 mm, fc = 110 GHz}
and yields a worst-case delay spread of τrms = 71.32 ps for
a coherence bandwidth of Bc = 14.02 GHz. This is roughly
one order of magnitude better than the standard chip case (0.52
ns for 1.92 GHz) and confirms that the speeds assumed in the
WNoC literature are feasible. In terms of path loss, this design
point is also 10–15 dB better than the standard.
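The delay spread and coherence bandwidth figures quoted above are consistent with the inverse relation Bc ≈ 1/τrms. The short Python sketch below checks this for the two cases mentioned in the text. The relation is inferred here from the reported numbers and the helper name is illustrative, so the sketch should be read as a sanity check rather than as the exact definition used in the analysis.

```python
def coherence_bandwidth_ghz(tau_rms_ns: float) -> float:
    """Assumed relation Bc = 1 / tau_rms (nanoseconds in, GHz out)."""
    return 1.0 / tau_rms_ns

# Delay spreads quoted in the text: 71.32 ps (optimized) and 0.52 ns (standard).
bc_optimized = coherence_bandwidth_ghz(0.07132)
bc_standard = coherence_bandwidth_ghz(0.52)

print(f"optimized package: Bc = {bc_optimized:.2f} GHz")
print(f"standard package:  Bc = {bc_standard:.2f} GHz")
print(f"improvement:       {bc_optimized / bc_standard:.1f}x")
```

Under this assumption the two delay spreads map to 14.02 GHz and 1.92 GHz, which matches the values reported for both packages.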
A second representative case would be w = 1, which pushes
the limits on the path loss. The peak has been found by
thinning the silicon down to our lower limit and using a thick
spreader: {Ts = 0.1 mm, Th = 0.8 mm, fc = 60 GHz}. This case achieves
an outstanding path loss reduction of 47.07 dB for Lmax and
32.69 dB for Lavg (n = 1.32). Further, this confirms that the
path loss figures assumed in the literature, around 25–35 dB,
are indeed achievable even in the presence of a chip package.
However, the delay spread is maintained at the levels of the
standard chip in this case.
Finally, we set w = 0.5 to model a channel engineering process
seeking a balance between power and performance. In this
case, a local peak has been found around the point {Ts =
0.1 mm, Th = 0.38 mm, fc = 70 GHz}. With respect to the standard
chip, this design improves the coherence bandwidth by 3.52×
and the path loss by over 1.5×. Although this may not be a
global optimum, it illustrates the potential of the methodology.
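To illustrate how the weight w moves the optimum between the three design points, the following Python sketch ranks the designs of Table I with a simple weighted cost. The cost used here, a convex combination of worst-case path loss and delay spread normalized to the standard package, is an assumption made for illustration and is not necessarily the fitness function φw defined earlier. Only the numerical values are taken from Table I.

```python
# (tau_rms [ns], Lmax [dB]) for each package, taken from Table I.
DESIGNS = {
    "Std.":    (0.52, 75.62),
    "w = 0":   (0.07, 58.62),
    "w = 0.5": (0.15, 45.49),
    "w = 1":   (0.59, 28.55),
}

def cost(design: str, w: float) -> float:
    """Hypothetical cost (lower is better), normalized to the standard chip.

    w = 1 weights only the path loss, w = 0 only the delay spread.
    """
    tau_ref, loss_ref = DESIGNS["Std."]
    tau, loss = DESIGNS[design]
    return w * loss / loss_ref + (1.0 - w) * tau / tau_ref

def best_design(w: float) -> str:
    """Return the label of the design with the lowest cost for weight w."""
    return min(DESIGNS, key=lambda name: cost(name, w))

for w in (0.0, 0.5, 1.0):
    print(f"w = {w}: lowest-cost design is '{best_design(w)}'")
```

Even with this crude normalization, each of the three optimized packages comes out best for the weight it was designed for, which mirrors the opposed trends of path loss and delay spread discussed above.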
C. Static Transceiver Optimization
Since we are interested in pushing the limits of performance,
this section evaluates the transceiver improvements in the
package engineered for high performance. Thus, we take the
worst-case transient response of the {Ts = 0.3 mm, Th = 0.8 mm,
fc = 110 GHz} design point with a delay spread of τrms = 71.32
ps. In all the studied cases, OOK-modulated waveforms are
convolved with the transient response of the channel and fed
to the receiver, which determines the hypothetical position of
the next ’0’ or ’1’ symbol. The BER is calculated assuming
independent and equiprobable symbols.
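The evaluation procedure described above can be sketched at symbol rate in a few lines of Python. The channel taps below are hypothetical values chosen only to show the effect and are not taken from the simulated transient response. The sketch convolves an OOK symbol stream with the taps and measures how far the multipath tail closes the gap between the '0' and '1' levels.

```python
from itertools import product

# Hypothetical symbol-spaced channel taps: main tap followed by an ISI tail.
TAPS = [1.0, 0.4, 0.2, 0.1]

def received_levels(bits, taps):
    """Discrete convolution of the OOK symbol stream with the channel taps."""
    levels = []
    for n in range(len(bits)):
        level = sum(taps[k] * bits[n - k] for k in range(len(taps)) if n - k >= 0)
        levels.append(level)
    return levels

def eye_opening(bits, taps):
    """Gap between the lowest '1' level and the highest '0' level."""
    levels = received_levels(bits, taps)
    ones = [lv for b, lv in zip(bits, levels) if b == 1]
    zeros = [lv for b, lv in zip(bits, levels) if b == 0]
    return min(ones) - max(zeros)

# Test stream: all 4-bit patterns back to back, so every ISI combination occurs.
bits = [b for pattern in product((0, 1), repeat=4) for b in pattern]

print("ISI-free eye opening:", eye_opening(bits, [1.0]))
print("eye opening with ISI:", eye_opening(bits, TAPS))
```

With these illustrative taps the opening shrinks from the full symbol amplitude to roughly 30% of it, which is the situation the receiver optimizations below are meant to address.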
TABLE I
SUMMARY OF THE OPTIMIZED PACKAGE DESIGNS
          τrms (ns)   Bc (GHz)   Lmax (dB)   Lavg (dB)   n
w = 0     0.07        14.02      58.62       42.76       3.28
w = 0.5   0.15        6.76       45.49       36.48       1.74
w = 1     0.59        1.69       28.55       21.88       1.32
Std.      0.52        1.92       75.62       54.57       4.61
Threshold adaptation: We simulate our proposed receiver
with different numbers of decision thresholds K. We first
obtain the threshold values by looking at the previous log2(K)
symbols and then use the conventional erfc formulation to derive
the error probability. Figure 9(a) plots the resulting BER as a
function of Eb/N0 for a fixed rb of 10 Gb/s, as assumed in
numerous WNoC works. Although we are below the coherence
bandwidth, ISI effects preclude the use of a priori thresholds
based on steady-state measurements alone. The performance
for K = 4 is still far from ideal, but starts to improve significantly.
At K = 8, the receiver performs close to a coherent receiver
in an ISI-free environment. In fact, it only needs to be 24.1
dB above the noise floor to achieve the stringent BER required
for WNoC (10^-15). This is only 3.1 dB over the ideal case.
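A minimal Python sketch of the threshold adaptation idea follows. It assumes a symbol-spaced channel with hypothetical taps and Gaussian noise of hypothetical standard deviation, and it places each threshold midway between the expected '0' and '1' levels given the last log2(K) symbols. The taps, the noise level, and the function names are illustrative and do not reproduce the simulated channel or the exact receiver model.

```python
import math
from itertools import product

def q_function(x: float) -> float:
    """Gaussian tail probability expressed through erfc."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def ook_ber(taps, sigma, lookback):
    """Average BER of a threshold detector that knows the last `lookback`
    symbols (K = 2**lookback thresholds), for equiprobable OOK symbols."""
    main, tail = taps[0], taps[1:]
    unknown = tail[lookback:]            # ISI taps the receiver cannot track
    mean_residual = 0.5 * sum(unknown)   # average ISI left after adaptation
    patterns = list(product((0, 1), repeat=len(unknown)))
    total = 0.0
    for pattern in patterns:
        residual = sum(b * t for b, t in zip(pattern, unknown))
        # The threshold sits at known ISI + mean residual ISI + main / 2.
        # The known ISI cancels, so only the residual shifts the margins.
        margin_zero = main / 2 + mean_residual - residual
        margin_one = main / 2 - mean_residual + residual
        total += 0.5 * (q_function(margin_zero / sigma)
                        + q_function(margin_one / sigma))
    return total / len(patterns)

TAPS = [1.0, 0.4, 0.2, 0.1]   # hypothetical symbol-spaced channel taps
SIGMA = 0.05                  # hypothetical noise standard deviation

for lookback in range(4):
    ber = ook_ber(TAPS, SIGMA, lookback)
    print(f"K = {2 ** lookback}: BER = {ber:.2e}")
```

In this toy setting, each additional tracked symbol removes one ISI tap from the residual term, so the BER drops as K grows and reaches the ISI-free value once the whole tail is covered.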
To further evaluate the faster-than-Nyquist potential of the
proposed scheme, we fix the received power and push the
data rate well beyond the coherence bandwidth. The results,
shown in Figure 9(b), reveal that the default receiver stops
working upon reaching the ISI wall at around 5 Gb/s. With as
few as K = 2 thresholds, our proposed scheme improves the
achievable data rate by between 20% and 40%. Again, increasing
the number of decision thresholds further mitigates
ISI (the bitrate increases from 7.32 up to 10.56 Gb/s at
BER = 10^-9), to the point of becoming indispensable as we
keep pushing the data rate. These results illustrate the tradeoff
between performance and receiver complexity, although the
overhead of our proposed scheme is arguably small.
Return-to-zero: One of the conclusions that can be drawn
from Figures 9(a) and 9(b) is that we can minimize
ISI, but we cannot get rid of it completely. The adaptive
threshold moves along with the average received energy, but
cannot eliminate the case where the ’0’ and ’1’ symbols
move closer. This is precisely the case targeted by RZ. To
evaluate it, we assume a receiver with K = 8 and fix
Eb/N0 at 21 dB for all transmission speeds. The results, plotted in
Figure 9(c), demonstrate that there is indeed a duty cycle value
that minimizes the error rate. The optimal point depends on
the transmission speed and yields an improvement of up to
two orders of magnitude with respect to non-RZ. The Eb/N0
scaling analysis, not shown here in the interest of space, also
revealed that RZ brings our scheme 1.2 dB closer to the ideal
receiver for BER = 10^-15.
V. DISCUSSION
Impact on transmission speed. The channel engineering
process, by means of substantial delay spread cuts, increases
the ISI-free speed by an order of magnitude with respect to
a standard chip. Further, the transceiver optimizations have
demonstrated that (i) achieving a BER of 10^-15 at 10 Gb/s
is affordable, and that (ii) it would otherwise be impossible.
This thereby proves that our methodology enables the speeds
generally assumed in the WNoC literature.
[Figure 9. (a) Scaling with SNR (rb = 10 Gb/s, NRZ): BER versus signal-to-noise
ratio [dB] for the default receiver, the optimized receiver (K = 2, 4, 8), and
the ISI-free case; an annotation marks a 3.1 dB gap. (b) Scaling with bitrate
(Prx = −52 dBm, NRZ): BER versus bitrate [Gb/s] for the same receivers.
(c) Scaling with RZ (K = 8, Eb/N0 = 21 dB): BER versus duty cycle for 2, 4, 6,
8, and 10 Gb/s.]
Fig. 9. Impact of transceiver optimizations on the Bit Error Rate (BER) assuming OOK modulation.
Impact on power consumption. By reducing the path loss
by up to 47 dB, we achieve attenuation levels close to those
assumed in recent transceiver proposals (26.5 dB in [36], [37]
and 26 dB calculated with data in [38], [39]). Meeting such
assumptions would lead to bit energies of 1.95 pJ/bit for [36],
[37] or 0.54 pJ/bit for [38], [39], along the lines of what is
assumed in the WNoC literature. On top of that, our transceiver
only needs an extra 1.9 dB of SNR to compensate for the ISI
effects at 10 Gb/s and BER = 10^-15.
To make an explicit connection between channel losses
and efficiency, we note that power amplifiers are the most
power-hungry components of current transceivers, e.g., 70.8%
in [38], [39]. Compensating for extra losses, noise figures,
or circuit limitations would increase these figures
even further. In fact, each amplifier has a limit Psat on the
output power it can provide. Going beyond that limit would
require a redesign of the amplifier and, according to long-standing,
experimentally validated scaling trends, the extra effort is
generally paid for with a reduction of the amplifier efficiency of
2.5% per extra dBm of Psat [53].
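The connection between path loss and amplifier effort can be made concrete with a simple link budget. The Python sketch below assumes that the required transmit power equals the received power plus the path loss, ignoring antenna gains and implementation losses, and that the 2.5% efficiency penalty applies linearly as percentage points per extra dBm. The baseline efficiency and the 4 dBm excursion are hypothetical values. The received power of −52 dBm is the one used in Figure 9(b), and the path loss values are the Lmax entries of Table I.

```python
PRX_DBM = -52.0                              # received power used in Fig. 9(b)
LMAX_DB = {"Std.": 75.62, "w = 1": 28.55}    # worst-case path loss, Table I

def required_tx_power_dbm(prx_dbm: float, path_loss_db: float) -> float:
    """Simplified link budget: Ptx = Prx + L (no antenna gains or margins)."""
    return prx_dbm + path_loss_db

def derated_efficiency(base_pct: float, extra_psat_dbm: float,
                       slope_pct_per_dbm: float = 2.5) -> float:
    """Assumed linear rule: lose `slope` percentage points per extra dBm of Psat."""
    return max(0.0, base_pct - slope_pct_per_dbm * extra_psat_dbm)

ptx_std = required_tx_power_dbm(PRX_DBM, LMAX_DB["Std."])
ptx_opt = required_tx_power_dbm(PRX_DBM, LMAX_DB["w = 1"])
print(f"Ptx, standard package: {ptx_std:.2f} dBm")
print(f"Ptx, optimized (w=1):  {ptx_opt:.2f} dBm")
print(f"Saving:                {ptx_std - ptx_opt:.2f} dB")

# Hypothetical amplifier with 20% efficiency pushed 4 dBm beyond its Psat.
print(f"Derated efficiency:    {derated_efficiency(20.0, 4.0):.1f} %")
```

The linear rule is only meaningful for small excursions in Psat; the sketch clamps the result at zero to make that limitation explicit.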
Research directions. Although this work has mitigated the
intra-chip channel impairments significantly, we do not
claim to have reached a lower bound. Besides the application of
simulated annealing techniques to find global optima, we could
improve propagation further by (i) directing certain rays via
reflectors or leveraging the multiple antennas already in place
to perform beamforming, (ii) thinning silicon down to the
manufacturing limits [54], or (iii) exploring frequencies up to
the terahertz band [55]. Additionally, factors such as the chip’s
lateral dimensions or the antenna placement could be brought
into the optimization process as long as the computational cost
is affordable. At the transceiver side, low-weight coding would
help minimize the impact of ISI at very high speeds [56].
VI. CONCLUSION
Wireless intra-chip communication has been proposed as
a potential solution to the scalability problems of current
multicore processors. However, we have demonstrated that
most works in this field are overly optimistic with regard to
the channel, assuming figures one or two orders of magnitude
better than what we found for a standard chip package. To
further address this fundamental issue and restate the potential
of WNoC, we proposed a methodology that exploits two
unique traits of this new wireless scenario: its monolithic
and static nature. The first allows us to engineer the channel,
that is, to modify the chip package to enhance propagation
in manufacturer-friendly ways. This process has yielded im-
provements of 47 dB of path loss or more than 10 GHz in
coherence bandwidth. The second allows us to optimize the
transceiver to mitigate multipath effects beyond the Nyquist
limit. We demonstrated that we can decode OOK signals at
10 Gb/s with a BER of 10^-15 with a signal-to-noise ratio only
1.9 dB greater than in a dispersion-free environment.
REFERENCES
[1] R. Marculescu, U. Ogras, L.-S. Peh, N. Enright Jerger, and Y. Hoskote,
“Outstanding research problems in NoC design: system, microarchitec-
ture, and circuit perspectives,” IEEE Transactions on Computer-Aided
Design of Integrated Circuits and Systems, vol. 28, no. 1, pp. 3–21,
2009.
[2] S. Vangal, J. Howard, G. Ruhl, S. Dighe, H. Wilson, J. Tschanz, D. Fi-
nan, A. Singh, T. Jacob, S. Jain, V. Erraguntla, C. Roberts, Y. Hoskote,
N. Borkar, and S. Borkar, “An 80-Tile Sub-100-W TeraFLOPS Processor
in 65-nm CMOS,” IEEE Journal of Solid-State Circuits, vol. 43, no. 1,
pp. 29–41, 2008.
[3] G. Nychis, C. Fallin, and T. Moscibroda, “On-chip networks from a
networking perspective: congestion and scalability in many-core inter-
connects,” in Proceedings of the SIGCOMM, 2012, pp. 407–18.
[4] S. Park, T. Krishna, C.-H. Chen, B. Daya, A. Chandrakasan, and L.-S.
Peh, “Approaching the theoretical limits of a mesh NoC with a 16-node
chip prototype in 45nm SOI,” in Proceedings of the DAC-49, 2012, pp.
398–405.
[5] G. Chen, M. A. Anders, H. Kaul, S. K. Satpathy, S. K. Mathew, S. K.
Hsu, A. Agarwal, R. K. Krishnamurthy, V. De, and S. Borkar, “A
340 mV-to-0.9 V 20.2 Tb/s source-synchronous hybrid packet/circuit-
switched 16×16 network-on-chip in 22 nm tri-gate CMOS,” IEEE
Journal of Solid-State Circuits, vol. 50, no. 1, pp. 59–67, 2015.
[6] D. Wentzlaff, P. Griffin, H. Hoffmann, L. Bao, B. Edwards, C. Ramey,
M. Mattina, C.-C. Miao, J. F. Brown III, and A. Agarwal, “On-chip
interconnection architecture of the tile processor,” IEEE Micro, vol. 27,
no. 5, pp. 15–31, 2007.
[7] G. Chrysos, “Intel® Xeon Phi coprocessor - the architecture,” Intel
Whitepaper, vol. 176, 2014.
[8] D. Bertozzi, G. Dimitrakopoulos, J. Flich, and S. Sonntag, “The fast
evolving landscape of on-chip communication,” Design Automation for
Embedded Systems, vol. 19, no. 1, pp. 59–76, 2015.
[9] J. Kim, K. Choi, and G. Loh, “Exploiting new interconnect technologies
in on-chip communication,” IEEE Journal on Emerging and Selected
Topics in Circuits and Systems, vol. 2, no. 2, pp. 124–136, 2012.
[10] D. Matolak, A. Kodi, S. Kaya, D. DiTomaso, S. Laha, and W. Rayess,
“Wireless networks-on-chips: architecture, wireless channel, and de-
vices,” IEEE Wireless Communications, vol. 19, no. 5, 2012.
[11] S. Abadal, B. Sheinman, O. Katz, O. Markish, D. Elad, Y. Fournier,
D. Roca, M. Hanzich, G. Houzeaux, M. Nemirovsky, E. Alarcón, and
A. Cabellos-Aparicio, “Broadcast-Enabled Massive Multicore Architec-
tures: A Wireless RF Approach,” IEEE MICRO, vol. 35, no. 5, pp.
52–61, 2015.
[12] R. G. Kim, W. Choi, Z. Chen, P. P. Pande, D. Marculescu, and
R. Marculescu, “Wireless NoC and Dynamic VFI Codesign: Energy
Efficiency Without Performance Penalty,” IEEE Transactions on Very
Large Scale Integration (VLSI) Systems, vol. 24, no. 7, pp. 2488–2501,
2016.
[13] M. A. I. Sikder, A. Kodi, W. Rayess, D. Ditomaso, D. Matolak, and
S. Kaya, “Exploring wireless technology for off-chip memory access,”
in Proceedings of the HOTI ’16, 2016, pp. 92–99.
[14] O. Markish, B. Sheinman, O. Katz, D. Corcos, and D. Elad, “On-chip
mmWave Antennas and Transceivers,” in Proceedings of the NoCS ’15,
2015, p. Art. 11.
[15] H. M. Cheema and A. Shamim, “The last barrier: On-chip antennas,”
IEEE Microwave Magazine, vol. 14, no. 1, pp. 79–91, 2013.
[16] J. Wu, A. Kodi, S. Kaya, A. Louri, and H. Xin, “Monopoles Loaded
with 3-D-Printed Dielectrics for Future Wireless Intra-Chip Commu-
nications,” IEEE Transactions on Antennas and Propagation, vol. 65,
no. 12, pp. 6838–6846, 2017.
[17] B. A. Floyd, C.-M. Hung, and K. K. O, “Intra-chip wireless interconnect
for clock distribution implemented with integrated antennas, receivers,
and transmitters,” IEEE Journal of Solid-State Circuits, vol. 37, no. 5,
pp. 543–552, 2002.
[18] Y. P. Zhang, Z. M. Chen, and M. Sun, “Propagation Mechanisms
of Radio Waves Over Intra-Chip Channels With Integrated Antennas:
Frequency-Domain Measurements and Time-Domain Analysis,” IEEE
Transactions on Antennas and Propagation, vol. 55, no. 10, pp. 2900–
2906, 2007.
[19] H.-t. Wu, J.-j. Lin, and K. K. O, “Inter-Chip Wireless Communication,”
in Proceedings of the EuCAP ’13, 2013, pp. 3647–3649.
[20] X. Yu, J. Baylon, P. Wettin, D. Heo, P. Pratim Pande, and S. Mirabbasi,
“Architecture and Design of Multi-Channel Millimeter-Wave Wireless
Network-on-Chip,” IEEE Design & Test, vol. 31, no. 6, pp. 19–28, 2014.
[21] S. Laha, S. Kaya, D. W. Matolak, W. Rayess, D. DiTomaso, and A. Kodi,
“A New Frontier in Ultralow Power Wireless Links: Network-on-Chip
and Chip-to-Chip Interconnects,” IEEE Transactions on Computer-Aided
Design of Integrated Circuits and Systems, vol. 34, no. 2, pp. 186–198,
2015.
[22] S. Abadal, M. Iannazzo, M. Nemirovsky, A. Cabellos-Aparicio, H. Lee,
and E. Alarcón, “On the Area and Energy Scalability of Wireless
Network-on-Chip: A Model-based Benchmarked Design Space Explo-
ration,” IEEE/ACM Transactions on Networking, vol. 23, no. 5, pp.
1501–13, 2015.
[23] A. Mineo, M. Palesi, G. Ascia, and V. Catania, “Runtime Tunable
Transmitting Power Technique in mm-Wave WiNoC Architectures,”
IEEE Transactions on VLSI Systems, vol. 24, no. 4, pp. 1535–1545,
2016.
[24] S. Abadal, J. Torrellas, E. Alarcón, and A. Cabellos-Aparicio,
“OrthoNoC: A Broadcast-Oriented Dual-Plane Wireless Network-on-Chip
Architecture,” IEEE Transactions on Parallel and Distributed Systems,
vol. 29, no. 3, pp. 628–641, 2018.
[25] S. Deb, A. Ganguly, P. P. Pande, B. Belzer, and D. Heo, “Wireless
NoC as Interconnection Backbone for Multicore Chips: Promises and
Challenges,” IEEE Journal on Emerging and Selected Topics in Circuits
and Systems, vol. 2, no. 2, pp. 228–239, 2012.
[26] S. H. Gade and S. Deb, “HyWin: Hybrid wireless NoC with sandboxed
sub-networks for CPU/GPU architectures,” IEEE Transactions on Com-
puters, vol. 66, no. 7, pp. 1145–1158, 2017.
[27] D. DiTomaso, A. Kodi, D. Matolak, S. Kaya, S. Laha, and W. Rayess,
“A-WiNoC: Adaptive Wireless Network-on-Chip Architecture for Chip
Multiprocessors,” IEEE Transactions on Parallel and Distributed Sys-
tems, vol. 26, no. 12, pp. 3289–3302, 2015.
[28] W. Choi, K. Duraisamy, R. G. Kim, J. R. Doppa, P. P. Pande, D. Mar-
culescu, and R. Marculescu, “On-Chip Communication Network for
Efficient Training of Deep Convolutional Networks on Heterogeneous
Manycore Systems,” IEEE Transactions on Computers, vol. 67, no. 5,
pp. 672–686, 2018.
[29] S. Abadal, A. Mestres, J. Torrellas, E. Alarcón, and A. Cabellos-
Aparicio, “Medium Access Control in Wireless Network-on-Chip: A
Context Analysis,” IEEE Communications Magazine, vol. 56, no. 6, pp.
172–178, 2018.
[30] L. Yan and G. W. Hanson, “Wave propagation mechanisms for intra-chip
communications,” IEEE Transactions on Antennas and Propagation,
vol. 57, no. 9, pp. 2715–2724, 2009.
[31] W.-H. Chen, S. Joo, S. Sayilir, R. Willmot, T.-Y. Choi, D. Kim, J. Lu,
D. Peroulis, and B. Jung, “A 6-Gb/s Wireless Inter-Chip Data Link
Using 43-GHz Transceivers and Bond-Wire Antennas,” IEEE Journal
of Solid-State Circuits, vol. 44, no. 10, pp. 2711–2721, 2009.
[32] H. H. Yeh, N. Hiramatsu, and K. L. Melde, “The design of broadband
60 GHz AMC antenna in multi-chip RF data transmission,” IEEE
Transactions on Antennas and Propagation, vol. 61, no. 4, pp. 1623–
1630, 2013.
[33] R. S. Narde and J. Venkataraman, “Feasibility study of Transmission
between Wireless Interconnects in Multichip Multicore systems,” in
Proceedings of the APS/URSI ’17, 2017, pp. 1821–1822.
[34] S. H. Gade, S. Garg, and S. Deb, “OFDM Based High Data Rate, Fading
Resilient Transceiver for Wireless Networks-on-Chip,” in Proceedings
of the ISVLSI ’17, 2017, pp. 483–488.
[35] W. Rayess, D. W. Matolak, S. Kaya, and A. K. Kodi, “Antennas
and Channel Characteristics for Wireless Networks on Chips,” Wireless
Personal Communications, vol. 95, no. 4, pp. 5039–5056, 2017.
[36] X. Yu, S. P. Sah, H. Rashtian, S. Mirabbasi, P. P. Pande, and D. Heo,
“A 1.2-pJ/bit 16-Gb/s 60-GHz OOK Transmitter in 65-nm CMOS for
Wireless Network-On-Chip,” IEEE Transactions on Microwave Theory
and Techniques, vol. 62, no. 10, pp. 2357–2369, 2014.
[37] X. Yu, H. Rashtian, and S. Mirabbasi, “An 18.7-Gb/s 60-GHz OOK
Demodulator in 65-nm CMOS for Wireless Network-on-Chip,” IEEE
Transactions on Circuits And Systems -I: Regular Papers, vol. 62, no. 3,
pp. 799–806, 2015.
[38] S. Subramaniam, T. Shinde, P. Deshmukh, S. Shamim, M. Indovina,
and A. Ganguly, “A 0.36pJ/bit, 17Gbps OOK Receiver in 45-nm CMOS
for Inter and Intra-Chip Wireless Interconnects,” in Proceedings of the
SOCC ’17, 2017.
[39] T. Shinde, S. Subramaniam, P. Deshmukh, M. M. Ahmed, M. Indovina,
and A. Ganguly, “A 0.24 pJ/bit, 16 Gbps OOK Transmitter Circuit
in 45-nm CMOS for Inter and Intra-Chip Wireless Interconnects,” in
Proceedings of the GLSVLSI ’18, 2018, pp. 69–74.
[40] D. Matolak, S. Kaya, and A. Kodi, “Channel modeling for wireless
networks-on-chips,” IEEE Communications Magazine, vol. 51, no. 6,
pp. 180–186, 2013.
[41] “ITRS: International Technology Roadmap for Semiconductors.”
[Online]. Available: http://www.itrs2.net
[42] J. Branch, X. Guo, L. Gao, A. Sugavanam, J. J. Lin, and K. K.
O, “Wireless communication in a flip-chip package using integrated
antennas on silicon substrates,” IEEE Electron Device Letters, vol. 26,
no. 2, pp. 115–117, 2005.
[43] X. Timoneda, S. Abadal, A. Cabellos-Aparicio, D. Manessis, J. Zhou,
A. Franques, J. Torrellas, and E. Alarcón, “Millimeter-Wave Propagation
within a Computer Chip Package,” in Proceedings of the ISCAS ’18,
2018.
[44] K. Kim, W. Bornstad, and K. K. O, “A Plane Wave Model Approach
to Understanding Propagation in an Intra-chip Communication System,”
in Proceedings of the APS ’01, 2001, pp. 166–169.
[45] R. S. Narde, N. Mansoor, A. Ganguly, and J. Venkataraman, “On-
Chip Antennas for Inter-Chip Wireless Interconnections: Challenges and
Opportunities,” in Proceedings of the EuCAP ’18, 2018.
[46] S. Khademi, S. Prabhakar Chepuri, Z. Irahhauten, G. Janssen, and A.-
J. van der Veen, “Channel Measurements and Modeling for a 60 GHz
Wireless Link Within a Metal Cabinet,” IEEE Transactions on Wireless
Communications, vol. 14, no. 9, pp. 5098–5110, 2015.
[47] P. Y. Chiang, S. Woracheewan, C. Hu, L. Guo, H. Liu, R. Khanna, and
J. Nejedlo, “Short-Range, Wireless Interconnect within a Computing
Chassis: Design Challenges,” IEEE Design & Test of Computers, vol. 27,
no. 4, pp. 32–43, 2010.
[48] A. Zajic and P. Juyal, “Modeling of THz Chip-to-Chip Wireless Chan-
nels in Metal Enclosures,” in Proceedings of the EuCAP ’18, 2018, pp.
1–5.
[49] J. Lin, H. Wu, Y. Su, L. Gao, A. Sugavanam, and J. Brewer, “Commu-
nication using antennas fabricated in silicon integrated circuits,” IEEE
Journal of Solid-State Circuits, vol. 42, no. 8, pp. 1678–1687, 2007.
[50] J. Simkin and C. W. Trowbridge, “Optimizing electromagnetic devices
combining direct search methods with simulated annealing,” IEEE
Transactions on Magnetics, vol. 28, no. 2, pp. 1545–1548, 1992.
[51] L.-S. Shu, S.-Y. Ho, and S. J. Ho, “A novel orthogonal
simulated annealing algorithm for optimization of electromagnetic
problems,” pp. 1791–1795, 2004.
[52] J. B. Anderson, F. Rusek, and V. Owall, “Faster-Than-Nyquist Signal-
ing,” Proceedings of the IEEE, vol. 101, no. 8, pp. 1817–1830, 2013.
[53] H. Wang, F. Wang, H. T. Nguyen, and S. Li, “Power Amplifiers
Performance Survey 2000-present.” [Online]. Available:
https://gems.ece.gatech.edu/PA_survey.html
[54] F. Bieck, S. Spiller, F. Molina, M. Töpper, C. Lopper, I. Kuna, T. C.
Seng, and T. Tabuchi, “Carrierless design for handling and processing
of ultrathin wafers,” Proceedings of the ECTC ’10, pp. 316–322, 2010.
[55] Y. Chen and C. Han, “Channel Modeling and Analysis for Wireless
Networks-on-Chip Communications in the Millimeter Wave and Tera-
hertz Bands,” in Proceedings of the INFOCOM WKSHPS ’18, 2018.
[56] J. M. Jornet and I. F. Akyildiz, “Low-Weight Channel Coding for In-
terference Mitigation in Electromagnetic Nanonetworks in the Terahertz
Band,” in Proceedings of the ICC ’11, 2011, pp. 1–6.
