Reducing Interconnect Cost in NoC through Serialized Asynchronous Links∗

Simon Ogg¹, Enrico Valli², Crescenzo D'Alessandro³, Alex Yakovlev³, Bashir Al-Hashimi¹, Luca Benini²
¹University of Southampton, ²University of Bologna, ³Newcastle University
{so04r, bmah}@ecs.soton.ac.uk

∗ This work is funded by EPSRC (UK) grant EP/C512804/1, which is gratefully acknowledged.
Abstract 
This work investigates serialization as a means of
reducing the number of wires in NoC, combined with
asynchronous links to simplify the clocking of the
link. Throughput is reduced, but the savings in routing
area and the reduction in power could make this
approach attractive.
1. Introduction and Motivation 
As multiprocessor SoC solutions proliferate, there are
benefits in providing a scalable on-chip communication
structure such as NoC. Interconnect cost, in terms of
the number of wires required between switches, can
also be considerable in NoC structures, since each
switch is connected by a point-to-point link to a
neighboring switch. The high cost of parallel links has
been shown in [1] when inter-wire spacing, shielding
and repeaters are considered. The number of links in
a NoC will grow as more cores are integrated into a
system. This work proposes the application of
serialization as a means of reducing the interconnect
cost in NoC.
2. Proposed Link & Results 
The link, shown in Figure 1, has several blocks:
synchronous/asynchronous converter interfaces (1 & 5),
an asynchronous serializer (2) and de-serializer (4), and
wire buffers (3). The implementation bundles a request
signal with the data. The synchronous/asynchronous
interfaces use a FIFO mechanism to pass data
between the synchronous and asynchronous domains.
The serializer splits each 32-bit flit into smaller bit
slices, which are sent serially along the buffered wires
and reassembled by the de-serializer. The buffered
wires allow pipelining of the bit slices to improve
throughput.
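
The slicing step described above can be illustrated with a minimal behavioural sketch. It models only the data path, not the handshaking or the wire buffers. The slice ordering (least-significant slice first) and the function names `serialize` and `deserialize` are assumptions made for illustration; the text does not specify them.

```python
from typing import List

FLIT_BITS = 32  # flit width stated in the text


def serialize(flit: int, slice_bits: int) -> List[int]:
    """Split a flit into slices of slice_bits each.

    Assumption: the least-significant slice is sent first.
    """
    if FLIT_BITS % slice_bits != 0:
        raise ValueError("slice width must divide the flit width")
    if not 0 <= flit < (1 << FLIT_BITS):
        raise ValueError("flit must fit in 32 bits")
    mask = (1 << slice_bits) - 1
    return [(flit >> shift) & mask for shift in range(0, FLIT_BITS, slice_bits)]


def deserialize(slices: List[int], slice_bits: int) -> int:
    """Reassemble a flit from slices received least-significant first."""
    flit = 0
    for index, value in enumerate(slices):
        flit |= value << (index * slice_bits)
    return flit


flit = 0xDEADBEEF
slices8 = serialize(flit, 8)  # 4 slices for the 8-bit wide link
slices4 = serialize(flit, 4)  # 8 slices for the 4-bit wide link
assert deserialize(slices8, 8) == flit
assert deserialize(slices4, 4) == flit
```

An 8-bit wide link therefore needs four slice transfers per flit and a 4-bit wide link needs eight, which is the source of the throughput penalty.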
 
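Figure 1 Proposed Link. Block diagram (not reproduced): Switch, S/A interface (1), serializer (2), wire buffers (3), de-serializer (4), A/S interface (5), Switch. The switches are synchronous and the section between the two interfaces is asynchronous; bus widths are labeled m and n.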
The synchronous 32-bit wide link was compared with
our proposed asynchronous 4-bit and 8-bit wide links. All
circuits were simulated with Cadence Spectre using ST
0.12 µm technology. The logic area cost for the
synchronous implementation is 15850 µm². For the
asynchronous 8-bit and 4-bit wide links the logic area
cost is 18900 µm² and 18950 µm² respectively. Figure 2
shows that, for a wire length of 1000 µm, the wiring area
is reduced from 30000 µm² for the synchronous link to
7500 µm² and 3750 µm² for the asynchronous 8-bit and
4-bit wide links respectively. Throughput in the
asynchronous links is limited by the need to
acknowledge each transfer; the 8-bit wide link saturates
at 207 MFlits/s. Figure 3 shows that power in the 8-bit
wide asynchronous link (I3) is up to 25% lower than in
the synchronous link (I1), depending on link usage.
Power in the link is dominated by the
synchronous/asynchronous converters.
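
The area figures above can be combined into a rough total-area comparison. The sketch below uses only the logic and wiring areas quoted in the text. It assumes that wiring area scales linearly with wire length (extrapolated from the 1000 µm data point), and the names `total_area` and `break_even_length` are hypothetical helpers, not part of the original work.

```python
# Logic area (µm²) and wiring area at 1000 µm (µm²), as quoted in the text.
LOGIC_AREA_UM2 = {"sync32": 15850.0, "async8": 18900.0, "async4": 18950.0}
WIRE_AREA_AT_1000UM = {"sync32": 30000.0, "async8": 7500.0, "async4": 3750.0}


def total_area(link: str, length_um: float) -> float:
    """Logic plus wiring area, assuming wiring area is linear in length."""
    wire = WIRE_AREA_AT_1000UM[link] * length_um / 1000.0
    return LOGIC_AREA_UM2[link] + wire


def break_even_length(link: str, baseline: str = "sync32") -> float:
    """Wire length (µm) above which `link` has a smaller total area than `baseline`."""
    extra_logic = LOGIC_AREA_UM2[link] - LOGIC_AREA_UM2[baseline]
    saved_per_um = (WIRE_AREA_AT_1000UM[baseline] - WIRE_AREA_AT_1000UM[link]) / 1000.0
    return extra_logic / saved_per_um


for name in ("sync32", "async8", "async4"):
    print(name, total_area(name, 1000.0))
for name in ("async8", "async4"):
    print(name, "break-even length (µm):", round(break_even_length(name), 1))
```

Under the linearity assumption, the extra logic area of the serialized links is recovered once the wire is longer than roughly 120 to 140 µm, so at 1000 µm the wiring saving outweighs the logic overhead.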
Figure 2 Throughput and Wire Area. Panels (plots not reproduced): "Switch Speed v Throughput", throughput (MFlits/sec, 0 to 350) against switch clock (50 to 300 MHz) for I1-Synch, I2-Asynch(4) and I3-Asynch(8); "Wire Length v Wire Area", wiring area (µm², 0 to 100000) against wire length (0 to 3000 µm) for I1-Synch(32), I2-Asynch(4) and I3-Asynch(8).
A feasible serialized asynchronous link has been
demonstrated which lowers the point-to-point wiring
interconnect cost. Throughput is limited by the
per-transfer acknowledgement scheme, and the authors
are currently investigating ways to improve this.
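
To give a feel for what the acknowledgement limit means, the reported saturation point of 207 MFlits/s for the 8-bit wide link can be turned into an implied per-slice cycle time. This is a back-of-the-envelope sketch. It assumes that every bit slice is acknowledged individually and that the saturation point is set entirely by these slice handshakes; neither detail is stated explicitly in the text, and the function names are hypothetical.

```python
FLIT_BITS = 32  # flit width stated in the text


def implied_slice_cycle_ns(mflits_per_s: float, slice_bits: int) -> float:
    """Average time per slice transfer (ns) implied by a flit throughput.

    Assumption: each slice is a separately acknowledged transfer.
    """
    slices_per_flit = FLIT_BITS // slice_bits
    slice_rate_hz = mflits_per_s * 1e6 * slices_per_flit
    return 1e9 / slice_rate_hz


def payload_gbit_per_s(mflits_per_s: float) -> float:
    """Payload bandwidth in Gbit/s for 32-bit flits."""
    return mflits_per_s * 1e6 * FLIT_BITS / 1e9


print(round(implied_slice_cycle_ns(207.0, 8), 3), "ns per 8-bit slice")
print(round(payload_gbit_per_s(207.0), 3), "Gbit/s payload")
```

Under these assumptions, the 8-bit wide link completes one slice transfer roughly every 1.2 ns at saturation, so any reduction in the handshake cycle translates directly into flit throughput.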
Figure 3 Average Power. Panels (plots not reproduced): "Average Power for 50% Usage" and "Average Power for 25% Usage", average power (µW, 0 to 800) per implementation (I1-Synch, I2-Asynch4, I3-Asynch8) at the stated link usage; legend entries: Ser/Des, Buffers, Asynch Synch Conv.
3. References 
[1] A. Morgenshtein, I. Cidon, “Comparative analysis of
serial vs parallel link in NoC”, in International Symposium
on SoC, Finland, 2004, pp. 185-186.