An Energy-Efficient Reconfigurable Circuit Switched Network-on-Chip by Wolkotte, Pascal T. et al.
An Energy-Efficient Reconfigurable Circuit-Switched Network-on-Chip
Pascal T. Wolkotte, Gerard J.M. Smit, Gerard K. Rauwerda, Lodewijk T. Smit
University of Twente, Department of EEMCS
P.O. Box 217, 7500 AE Enschede, The Netherlands
P.T.Wolkotte@utwente.nl
Abstract
Network-on-Chip (NoC) is an energy-efficient on-chip
communication architecture for multi-tile System-on-
Chip (SoC) architectures. The SoC architecture, in-
cluding its run-time software, can replace inflexible
ASICs for future ambient systems. These ambient sys-
tems have to be flexible as well as energy-efficient. To find
an energy-efficient solution for the communication net-
work we analyze three wireless applications. Based on
their communication requirements we observe that revis-
iting of the circuit switching techniques is beneficial. In
this paper we propose a new energy-efficient reconfig-
urable circuit-switched Network-on-Chip. By physically
separating the concurrent data streams we reduce the over-
all energy consumption. The circuit-switched router has
been synthesized and analyzed for its power consump-
tion in 0.13 µm technology. A 5-port circuit-switched router
has an area of 0.05 mm2 and runs at 1075 MHz. The pro-
posed architecture consumes 3.5 times less energy com-
pared to its packet-switched equivalent.
1. Introduction
In the Smart chipS for Smart Surroundings (4S) project
[1] we propose a heterogeneous multi-tile System-on-Chip
(SoC) architecture with run-time software and tools. The
SoC architecture contains a heterogeneous set of process-
ing tiles interconnected by a Network-on-Chip (NoC) as de-
picted in Fig. 1. The run-time software determines a near
optimal mapping of applications to the heterogeneous ar-
chitecture at run-time. The architecture including the run-
time software can replace inflexible ASICs for future ambi-
ent systems.
These ambient systems have to support a wide range of applications,
so they have to be flexible as well as energy-efficient. The designer
has to partition the application into a Kahn-like process graph model.
In this model the application is represented as a graph of
communicating functional processes (see for example Fig. 2). At run
time, the individual processes of the application are mapped onto the
tiles that can execute them most efficiently. The communication
channels between processes are mapped onto the NoC architecture.

Figure 1. An example of a heterogeneous System-on-Chip (SoC) with a
Network-on-Chip (NoC). DSRH = Domain Specific Reconfigurable Hardware
[Diagram: routers (R) with attached tiles of type DSRH, DSP, FPGA,
ASIC and GPP.]
The multi-tile SoC architecture has many advantages: a) tiles of the
same type can be duplicated when the number of transistors grows in
the next technology step, b) replication of tiles eases the
verification process, c) tiles do not grow in complexity with a new
technology, d) relatively small tiles allow extensive optimization,
e) computational performance scales about linearly with the number of
tiles, f) unused tiles can be switched off to reduce the energy
consumption of the chip, g) locality of reference is exploited, h) it
is possible to have individual clock domains per tile, and i) for
reconfigurable tiles it is possible to do partial dynamic
reconfiguration on a per-tile basis.
Current wireless applications are based on a large set of quickly
evolving 3G/4G wireless standards. At design time, the
reconfigurability of the chip enables adaptation of the application
in case of changes in the standards. At run time, the SoC can be
reconfigured to adapt the algorithms/parameters to changes in the
reception quality [2]. Furthermore, the reconfigurable SoC can share
its resources among several standards (e.g. WLAN in combination with
UMTS). The reconfigurability and parallelism of the SoC provide the
necessary options for a so-called multi-mode transceiver system.

0-7695-2312-9/05/$20.00 (c) 2005 IEEE
1.1. SoC Architecture
Our System-on-Chip consists of a heterogeneous set of
processors connected via a Network-on-Chip as depicted in
Fig. 1. The network consists of a set of routers intercon-
nected by links. In this paper we assume a regular two di-
mensional mesh topology of the routers. Every router is
connected with its four neighboring routers via bidirectional
point-to-point links and with a single processor tile via the
tile interface. The SoC system is organized as a central-
ized system: one node, called Central Coordination Node
(CCN), performs system coordination functions.
The main task of the CCN is to manage the system resources. It
performs run-time mapping of newly arrived applications to suitable
processing tiles and of inter-process communications to a
concatenation of network links [3]. It also tries to satisfy Quality
of Service (QoS) requirements, to optimize the resource usage and to
minimize the energy consumption. The CCN does not perform run-time
scheduling of individual processes and communications during
execution. That is performed by the individual tiles and network
routers. The CCN performs the feasibility analysis, spatial mapping,
process allocation and configuration of the tiles and the NoC before
the start of an application.
It is expected that the on-chip communication networks of these
future SoCs will be one of the limiting factors for performance and
possibly energy consumption [4]. In this paper we describe a new
architectural concept for on-chip communication. Using the
communication characteristics of three wireless communication
standards we propose a new reconfigurable circuit-switched on-chip
network. This network benefits from the common characteristics of
these wireless standards.
1.2. Organization of the Paper
The paper is organized as follows. We start with related work on
on-chip communication architectures. Section 3 describes three
wireless standards (HiperLAN/2, UMTS and Digital Radio Mondiale) and
determines their common communication characteristics. Section 4
gives the reasons for reconsidering a circuit-switched network. Using
the characteristics of other NoC solutions and the requirements of
the wireless applications, a new NoC architecture is developed.
Section 5 describes this architecture. Section 6 describes several
scenarios for benchmarking the power consumption of our architecture.
The paper concludes with a comparison of our router with an
equivalent packet-switched router.
2. Related work
Network-on-chip (NoC) architectures [5–10] have been
proposed as a solution for the problem of on-chip communi-
cation in multi-tile SoC architectures. The architectures are
presented as replacements of the on-chip time-division mul-
tiplex buses (e.g. the AMBA bus from ARM Inc. [11]).
All the proposed solutions are based on routers interconnected
through network links. The solutions differ in the topology of the
network and the implementation of the individual routers. The
two-dimensional mesh is the most common topology compared with other
topologies such as hexagons, butterflies, trees or hypercube
structures. The implementations of the routers vary widely, using
techniques of packet or circuit switching, dynamic or static
scheduling, and wormhole or virtual cut-through routing.
The majority of the current router implementations for
network-on-chip are based on packet-switched, synchronous networks
[5–9]. Using known routing protocols the number of buffers is
minimized and best-effort traffic can be served. In the
circuit-switched solution [10] buffering is not necessary. To handle
guaranteed throughput traffic several techniques are used, such as
contention-free routing [5], static scheduling [10], virtual channels
[6], virtual circuits [8] and priorities [9].
The routers are benchmarked using a local area network approach in
which the benchmarks use random traffic patterns. New (more specific)
NoC benchmarks may be necessary, because on-chip traffic patterns
have other characteristics [12] and demands [13].
3. Application domain
To determine the requirements of the on-chip network we have
investigated the common characteristics of three wireless
applications: the baseband processing of HiperLAN/2, UMTS and Digital
Radio Mondiale (DRM). The block diagram of DRM is similar to that of
HiperLAN/2, but its communication requirements are a factor of 1000
lower than those of HiperLAN/2. The exact figures are presented in
[14].
3.1. HiperLAN/2
WLAN networks use radio technologies such as IEEE
802.11a or HiperLAN/2 to provide secure, reliable, fast
wireless connectivity. They operate in the unlicensed 2.4
and 5 GHz radio bands, with data rates up to 54 Mbps. The
physical layer of HiperLAN/2 is described in [15]. The task
of the physical layer in HiperLAN/2 is to modulate bits that
originate from the data link control layer on the transmitter
side and to demodulate them on the receiver side. For modulation it
uses Orthogonal Frequency Division Multiplexing (OFDM). The physical
layer of the HiperLAN/2 standard has been mapped on a multi-tile
architecture [16]. Fig. 2 depicts the block diagram of this
implementation.

Figure 2. HiperLAN/2 Baseband Processing
[Block diagram with the blocks Serial to parallel, Freq. offset
correction, Prefix removal, FFT, Phase offset correction, Channel
equalization, Demapping and Synchronization & Control; edges are
numbered 1 to 8, from the input samples to the output hard bits.]

Edge(s)                   Stream   Bandwidth [Mbit/s]
S/P → Pre-fix removal     1-2      640
Pre-fix removal → FFT     3-4      512
FFT → Channel eq.         5-6      416
Channel eq. → De-map      7        384
Hard bits                 8        12 (BPSK) up to 72 (QAM-64)
Table 1. Communication in HiperLAN/2
One major property of the signaling in HiperLAN/2 is the grouping of
samples in OFDM symbols. An OFDM symbol has a fixed length of 80
samples. All operations in the physical layer are performed on these
OFDM symbols. The processing in the OFDM receiver is performed in
block mode on OFDM symbols. This results in a block-based
communication stream between the successive processing tiles. It must
be guaranteed that a new OFDM symbol can be processed every 4 µs.
This requires a guaranteed throughput service from our NoC. Table 1
gives the required communication bandwidth, based on 16-bit
quantization for the baseband processing.
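
The figures of Table 1 can be reproduced with a short calculation.
The sketch below is illustrative only. It assumes, beyond what the
text states, that every sample is complex (16 bits for the real and
16 bits for the imaginary part), that prefix removal leaves 64 of the
80 samples, and that 52 subcarriers are used of which 48 carry data.

```python
# Sketch: reproduce the Table 1 bandwidths from the 4 us symbol period.
# Assumptions (not stated in the text): complex samples of 2 x 16 bits,
# 64 samples after prefix removal, 52 used and 48 data subcarriers.
SYMBOL_PERIOD_US = 4.0
BITS_PER_COMPLEX_SAMPLE = 2 * 16


def stream_bandwidth_mbit_s(values_per_symbol, bits_per_value):
    """Bandwidth in Mbit/s when one block is sent per OFDM symbol.

    Bits per microsecond is numerically equal to Mbit/s.
    """
    return values_per_symbol * bits_per_value / SYMBOL_PERIOD_US


bandwidths = {
    "S/P -> prefix removal": stream_bandwidth_mbit_s(80, BITS_PER_COMPLEX_SAMPLE),
    "prefix removal -> FFT": stream_bandwidth_mbit_s(64, BITS_PER_COMPLEX_SAMPLE),
    "FFT -> channel eq.": stream_bandwidth_mbit_s(52, BITS_PER_COMPLEX_SAMPLE),
    "channel eq. -> de-map": stream_bandwidth_mbit_s(48, BITS_PER_COMPLEX_SAMPLE),
    "hard bits (BPSK)": stream_bandwidth_mbit_s(48, 1),    # 1 bit per subcarrier
    "hard bits (QAM-64)": stream_bandwidth_mbit_s(48, 6),  # 6 bits per subcarrier
}

for name, mbit_s in bandwidths.items():
    print(f"{name:24s} {mbit_s:6.1f} Mbit/s")
```

Under these assumptions the computed values coincide with the table
entries (640, 512, 416, 384, 12 and 72 Mbit/s).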
3.2. UMTS
The Universal Mobile Telecommunications System
(UMTS) standard [17] is an example of a Third Genera-
tion (3G) mobile communication system. UMTS is based
on Wideband Code Division Multiple Access (W-CDMA).
In CDMA every transmitted bit is coded with a spread-
ing code of a higher rate, which means that every trans-
mitted bit is multiplied with a spreading code. The spread-
ing code consists of a sequence of so-called chips, whose
rate is referred to as the chip rate. In this way the in-
formation is transmitted at the chip rate, so the spec-
trum is spread. As a consequence, many correlations with
the spreading code have to be performed in the UMTS re-
ceiver.
Rauwerda [18] mapped the downlink of a UMTS W-CDMA receiver on a set
of (reconfigurable) processors. In contrast to the DRM and HiperLAN/2
implementations, the data processing and communication between the
processors is streaming oriented. This streaming communication
results in a new requirement for our NoC: at a regular short interval
a very small packet, containing 1 sample, has to be transported to
the successive processor.
Figure 3. A UMTS Receiver W-CDMA with N RAKE Fingers
[Block diagram: oversampled input samples pass through Pulse shaping
into a flexible rake receiver with N fingers (Delay, De-scrambling
and De-spreading per finger), followed by Maximal Ratio Combining and
De-mapping to hard bits. A Control block contains the Cell-searcher,
Path-searcher and Channel-estimation and provides the scrambling code
and the MRC coefficients per finger. Edges are numbered 1 to 5.]

Edge                           #   Bandwidth [Mbit/s]
Chips (per finger)             2   61.44
Scrambling code                3   7.68
MRC coefficient (per finger)   4   61.44/SF
Received bits                  5   7.68/SF (QPSK)
                                   15.36/SF (QAM-16)
Table 2. Communication in UMTS
Fig. 3 depicts the basic block diagram of the imple-
mented W-CDMA receiver. The properties of the commu-
nication streams between the processes are listed in Table 2,
where every chip or coefficient is represented by 8 bits. For
example, the total communication bandwidth for process-
ing 4 RAKE fingers with a spreading factor (SF) of 4 is
∼ 320 Mbit/s.
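
As an illustration, the total of roughly 320 Mbit/s can be obtained
by summing the entries of Table 2. The sketch below assumes that the
chip and MRC coefficient streams are counted once per finger, while
the scrambling code and the received bits are counted once per
receiver; this split is our reading of the table, not a statement
from the text.

```python
# Sketch: total UMTS communication bandwidth from the Table 2 entries.
# Assumption: "per finger" streams scale with the number of fingers,
# the scrambling code and received-bit streams are counted once.
CHIPS_PER_FINGER = 61.44   # Mbit/s, edge 2
SCRAMBLING_CODE = 7.68     # Mbit/s, edge 3
RECEIVED_BITS = {"QPSK": 7.68, "QAM-16": 15.36}  # Mbit/s before division by SF


def umts_total_bandwidth(fingers, sf, modulation="QPSK"):
    """Sum of all edge bandwidths in Mbit/s for a RAKE receiver."""
    chips = CHIPS_PER_FINGER * fingers
    mrc = CHIPS_PER_FINGER / sf * fingers      # edge 4: 61.44/SF per finger
    received = RECEIVED_BITS[modulation] / sf  # edge 5
    return chips + SCRAMBLING_CODE + mrc + received


total_qpsk = umts_total_bandwidth(fingers=4, sf=4, modulation="QPSK")
total_qam16 = umts_total_bandwidth(fingers=4, sf=4, modulation="QAM-16")
print(f"QPSK:   {total_qpsk:.2f} Mbit/s")
print(f"QAM-16: {total_qam16:.2f} Mbit/s")
```

For 4 fingers and SF = 4 both modulation schemes give a total between
316 and 319 Mbit/s, in line with the rounded figure quoted above.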
3.3. Common Characteristics
Analyzing the common characteristics of the three wire-
less applications we made the following observations:
• The applications have a fixed amount of processing per data sample
(manifest loops). This is typically found in wireless baseband
processing and audio and video filtering. It does not apply to audio
and video compression, where the amount of processing is data
dependent (non-manifest).
• Input data arrives at a fixed rate, which causes periodic data
transfers between the successive processing blocks.
• Communication streams have a semi-static lifetime, which means a
stream is fixed for a relatively long time.
To describe the network traffic in a system, we adopt the notation
used in [19]. According to the type of services required, the
following types of traffic can be distinguished in the network:
• GT (guaranteed throughput): the part of the traffic for which the
network has to give real-time guarantees (i.e. guaranteed bandwidth,
bounded latency).
• BE (best effort): the part of the traffic for which the network
guarantees only fairness but does not give any bandwidth or timing
guarantees.
We observed that in the described applications the ma-
jority of the data streams through the successive processes.
This continuous flow of data needs guaranteed throughput,
because the front-end is not allowed to drop data. Depend-
ing on the standard we can use block-based communica-
tion (OFDM) or have to use streaming (UMTS) because
the blocks get too large. If we compare the required band-
width between the processes of the different applications
this varies widely from several kbit/s (DRM) up to more
than 0.5 Gbit/s (HiperLAN/2).
Besides the main stream of the communication we foresee a minor part
(assumed to be less than 5%) of best effort communication, e.g.
control, interrupts and configuration data. This communication can
have more relaxed requirements for the network and hence can use the
best effort services.
The most important overall characteristic is the lifetime of a
communication stream. We aim to develop a SoC for a multimedia
terminal where we can assume that the data streams are semi-static
and have periodic behavior. This means that for a long period of time
subsequent data items of a stream follow the same route. This will
last for seconds or more, because a user will listen to the radio or
have a phone conversation for a considerable time. However, the
control system might change some settings of processes due to
changing environmental conditions.
4. Circuit-Switched Revisited
The application analysis of section 3 shows that the
amount of expected guaranteed throughput traffic is much
larger compared to the amount of best effort traffic. Funda-
mentally, sharing resources and giving guarantees are con-
flicting, and efficiently combining guaranteed traffic with
best-effort traffic is hard [20]. By using dedicated tech-
niques for both types of traffic we aim for reducing the to-
tal area and power consumption. In this paper we concen-
trate on the architecture for communication of guaranteed
throughput traffic.
For guaranteed throughput we use reconfigurable circuit switching
that creates dedicated connections between two processing tiles. The
reasons for reconsidering circuit switching instead of using packet
switching are:
• The flexibility of packet switching is not needed, be-
cause data streams are fixed for a relatively long time.
Therefore, a connection between two tiles is required
for a long period (e.g. seconds or longer). This con-
nection can be configured by the CCN.
• A large amount of the traffic between tiles will need a guaranteed
throughput, which is easier to guarantee in a circuit-switched
connection.
• Current SoCs have a large amount of wiring resources that give
enough flexibility for streams with different bandwidth demands.
• Circuit switching eases the implementation of asyn-
chronous communication techniques, because data
and control can be separated. A control free pipelined
asynchronous data stream does not require much de-
sign effort.
• Circuit switching has a minimal amount of control in the data path
(e.g. no arbitration). This increases the energy efficiency per
transported bit and the maximum throughput.
Further, we see some benefits when guaranteed through-
put traffic has to be scheduled:
• Scheduling communication streams over non-time-multiplexed channels
is easier, because by definition a stream will not have collisions
with other communication streams. The Æthereal [5] and SoCBUS [10]
routers have large interaction between data streams (both have to
guarantee contention-free paths). Determining the static time-slot
table requires considerable effort.
• Because data-streams are physically separated, colli-
sions in the crossbar do not occur. Therefore, we do
not need buffering and arbitration in the individual
router. An established physical channel can always be
used.
5. Architecture
As described in the introduction, we have a hardware ar-
chitecture containing a heterogeneous set of processing tiles
interconnected by a Network-on-Chip (NoC). For the mo-
ment the tiles and NoC are synchronized by the same clock.
The NoC will use dedicated techniques for BE and GT traf-
fic. For the GT traffic we developed a reconfigurable circuit-
switched network, that is configured by the CCN. For BE
traffic we aim for a packet-switched solution.
5.1. Proposed Circuit-Switched Network
The task of a NoC is to transport data from one tile to
the other. The NoC consists of a set of routers intercon-
nected by links. Using the benefits of the circuit switching
and required application demands we defined a reconfig-
urable circuit-switched router (see Fig. 4).
The reconfigurable circuit-switched router consists of
three major parts: the data-converter, crossbar and the cross-
bar configuration. For the moment the router has five bidi-
rectional ports where one port is connected to a processing
tile and four ports via a bi-directional link (16 bit wide per
direction) to their neighboring circuit-switched routers.
Wiklund [10] concluded that their circuit-switched NoC
had a high latency for setting up a new circuit. This was
caused by the blocking of the routers, because a reserved
physical link could not be used for other connections. They solved
this problem by suggesting a static timing schedule for the different
occurring data-streams.

Figure 4. Block Diagram of the Router
This problem is solved in our proposal by creating small channels
(e.g. four bits wide) called lanes. The bi-directional link between
two routers consists of a concatenation of uni-directional lanes
(e.g. two times four lanes) as depicted on the right side of Fig. 5.
This provides flexibility similar to that available in time division
multiplexed systems. The small lanes are connected to a tile
interface via the data-converter. Fig. 5 depicts a data-converter
that converts the 16-bit data to the width of the lanes and vice
versa. The 16-bit tile interface is compatible with the
packet-switched alternative of Kavaldjiev [6]. We define the division
in lanes as lane division multiplexing. Each lane can be used for a
different physical connection between processors.
The width and number of lanes are adjustable parameters in the
design. They can be adjusted at design time of the SoC to meet the
flexibility and bandwidth requirements of the targeted applications.
For example, if more streams are needed for the north and south
ports, their number of lanes can be increased. The tables of section
3 can be used to determine the width and number of lanes. Four lanes
of four bits and a tile interface of 16 bits have been chosen to make
a fair comparison with the four virtual channel configuration of the
packet-switched alternative.
In the router the four lanes of one port have to be con-
nected with all the four lanes of all the other four ports. This
results in a router with 20 input and 20 output lanes. They
are connected via a 16x20 fully connected crossbar (20x20
is not necessary, because data does not have to flow back).
The 20 output lanes of the crossbar are registered. The speed
of the total network will therefore only depend on the max-
imum delay in a single router plus the maximum wire delay
of the link between two routers.

Figure 5. Data Converter Between Tile Interface and Crossbar

Figure 6. Organization of the Header and Data
The configuration of the crossbar (which input lane is
connected with which output lane) is stored in a local con-
figuration memory. Per output lane the input lane is stored
plus an activation bit. The configuration memory size is
5x20 = 100 bits.
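
The size of the configuration memory follows from the crossbar
dimensions. The sketch below is a behavioural illustration, not the
synthesized design. It assumes that each output lane stores a 4-bit
index that selects one of the 16 input lanes of the other four ports,
plus the activation bit. The lane numbering and the class and method
names are hypothetical.

```python
import math

# Sketch of the crossbar configuration memory. Assumed encoding: per
# output lane a select index into the 16 lanes of the other ports plus
# one activation bit. Lane i belongs to port i // LANES_PER_PORT.
PORTS = 5
LANES_PER_PORT = 4
TOTAL_LANES = PORTS * LANES_PER_PORT                    # 20 inputs, 20 outputs
SELECTABLE_INPUTS = (PORTS - 1) * LANES_PER_PORT        # 16: no flow back
SELECT_BITS = math.ceil(math.log2(SELECTABLE_INPUTS))   # 4
BITS_PER_OUTPUT_LANE = SELECT_BITS + 1                  # plus activation bit
CONFIG_MEMORY_BITS = TOTAL_LANES * BITS_PER_OUTPUT_LANE  # 5 x 20 = 100


class Crossbar:
    """Behavioural model of a 16x20 crossbar (hypothetical helper)."""

    def __init__(self):
        # One (active, select) entry per output lane.
        self.config = [(False, 0)] * TOTAL_LANES

    def candidates(self, out_lane):
        """Input lanes that may drive out_lane (all lanes of other ports)."""
        out_port = out_lane // LANES_PER_PORT
        return [i for i in range(TOTAL_LANES) if i // LANES_PER_PORT != out_port]

    def connect(self, in_lane, out_lane):
        """Raises ValueError if in_lane is on the same port as out_lane."""
        select = self.candidates(out_lane).index(in_lane)
        self.config[out_lane] = (True, select)

    def route(self, inputs):
        """Map 20 input nibbles to 20 output nibbles (0 on inactive lanes)."""
        return [
            inputs[self.candidates(out)[select]] if active else 0
            for out, (active, select) in enumerate(self.config)
        ]


xbar = Crossbar()
xbar.connect(in_lane=0, out_lane=8)  # e.g. a tile lane to a lane of another port
outputs = xbar.route([(i + 1) % 16 for i in range(TOTAL_LANES)])
print(CONFIG_MEMORY_BITS, outputs[8])
```

Because an output lane never selects a lane of its own port, 16
candidates suffice per output, which is why a 16x20 crossbar is used
instead of a 20x20 one.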
To minimize energy consumption, the circuit-switched router has fully
separated data and control paths. Because a data packet cannot
include routing information, the circuit-switched network cannot
serve best effort traffic. We configure the configuration memory via
a small additional interface. Configuring one lane requires 10 bits,
which are generated by the CCN based on the optimal application
mapping [3]. The configuration interface is connected to the separate
BE network. We aim to transport the reconfiguration data of a lane in
less than 1 ms over the BE network, which is sufficient because the
configuration of the crossbar will not change frequently due to the
long-lived data streams between tiles. A single router with 20 lanes
can then be fully reconfigured within 20 ms.
5.2. Data and Flow Control
The circuit-switched network can handle synchronization of
information in the data packets. To enable this we included a small
four-bit header with every data word. The header is combined with a
16-bit data word of the tile. The result is a packet of 5x4 bits,
which can be transported over a lane. The organization of this 20-bit
packet is given in Fig. 6.
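
The conversion between a 16-bit data word and five 4-bit lane
transfers can be sketched as follows. The ordering is an assumption
made for illustration: the header nibble is sent first and the data
word follows most significant nibble first. The function names are
hypothetical.

```python
# Sketch of the data-converter packing. Assumed ordering (not taken from
# the paper): header nibble first, then the data word MSB nibble first.
LANE_WIDTH = 4
NIBBLE_MASK = (1 << LANE_WIDTH) - 1


def pack(header, data):
    """Split a 4-bit header and a 16-bit word into five 4-bit lane values."""
    if not 0 <= header <= NIBBLE_MASK:
        raise ValueError("header must fit in 4 bits")
    if not 0 <= data < (1 << 16):
        raise ValueError("data must fit in 16 bits")
    nibbles = [header]
    for shift in (12, 8, 4, 0):
        nibbles.append((data >> shift) & NIBBLE_MASK)
    return nibbles


def unpack(nibbles):
    """Inverse of pack: rebuild (header, data) from five lane values."""
    if len(nibbles) != 5:
        raise ValueError("a packet consists of five 4-bit values")
    data = 0
    for nibble in nibbles[1:]:
        data = (data << LANE_WIDTH) | nibble
    return nibbles[0], data


packet = pack(header=0x1, data=0xBEEF)
print(packet)          # five values, one per clock cycle on a lane
print(unpack(packet))
```

With this packing, one data word occupies a lane for five clock
cycles, so 16 of every 20 transported bits are payload.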
With only a four-bit forward lane from source to destination and no
feedback, we have to assume the destination can consume the data. In
this case we do not support end-to-end flow control. To overcome this
problem an acknowledgement signal is added in the reverse direction.
The new lane consists of 4 data signals and 1 acknowledge signal in
the reverse direction (see Fig. 7). Depending on the application, one
or more lanes and zero, one or more acknowledgement signals can be
used.

Figure 7. Details of a Single Lane
The acknowledgement signal is used in combination with a window
counter mechanism. This mechanism prevents a buffer overflow at the
destination of a connection. Every source has a local window counter
of size WC. This local window counter indicates how many data packets
the source is allowed to send to the destination. The destination
sends an acknowledgement signal when it has read X data packets,
where X ≤ WC. When the source receives an acknowledgement signal it
increases its local window counter by X. By configuring the use of
the acknowledgement signal and the sizes of X and WC we can support
both blocking and non-blocking communication.
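
The window counter mechanism can be illustrated with a small
simulation. This is a sketch under stated assumptions: the counter
starts at WC, it is decremented for every packet sent, and an
acknowledgement reaches the source without delay. The class and
function names are hypothetical.

```python
# Sketch of window-counter flow control. Assumptions: the counter starts
# at WC, decrements on every send, and acknowledgements arrive instantly.
class Source:
    def __init__(self, wc):
        self.window = wc  # number of packets the source may still send

    def can_send(self):
        return self.window > 0

    def send(self):
        if not self.can_send():
            raise RuntimeError("window exhausted, source must wait")
        self.window -= 1

    def on_ack(self, x):
        self.window += x  # destination has read x more packets


class Destination:
    def __init__(self, x):
        self.x = x
        self.read_since_ack = 0

    def read(self):
        """Consume one packet; return True when an ack must be sent."""
        self.read_since_ack += 1
        if self.read_since_ack == self.x:
            self.read_since_ack = 0
            return True
        return False


def simulate(wc, x, n_packets, read_period):
    """Return the maximum buffer occupancy seen at the destination.

    The source tries to send every cycle; the destination reads one
    packet every read_period cycles.
    """
    src, dst = Source(wc), Destination(x)
    buffered = sent = max_occupancy = cycle = 0
    while sent < n_packets or buffered > 0:
        cycle += 1
        if sent < n_packets and src.can_send():
            src.send()
            buffered += 1
            sent += 1
        max_occupancy = max(max_occupancy, buffered)
        if buffered > 0 and cycle % read_period == 0:
            buffered -= 1
            if dst.read():
                src.on_ack(x)
    return max_occupancy


print(simulate(wc=4, x=2, n_packets=20, read_period=3))
```

In this model the sum of the window counter, the buffered packets and
the packets read since the last acknowledgement always equals WC, so
a destination buffer of WC packets cannot overflow even when the
destination reads more slowly than the source sends.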
6. Traffic Patterns
Benchmarking a NoC router is not a trivial task because, as far as we
know, no general method has been defined for on-chip networks. We
expect that the power consumption of a single router depends on at
least three parameters:
1. The average load of every data stream. This varies between 0% and
100% of the available bandwidth of a single lane.
2. The amount of bit-flips in the data stream. This varies from no
bit-flips (i.e. transmitting constant values) to continuous
bit-flips.
3. The number of concurrent data streams through the router, which in
our case has a maximum equal to the number of lanes (20).
6.1. Used Traffic Patterns
To test the parameter sensitivity of our router we defined
a test set for traffic patterns. This set has three levels for the
number of bit-flips:
• Best case (no bit-flips, transmitting only zeros)
• Worst case (continuous bit-flips)
• Typical case (random data with 50% bit-flips).
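
For illustration, the three bit-flip levels can be generated and
checked as sketched below. The concrete patterns are assumptions: the
worst case alternates between all-zero and all-one lane values, and
the typical case uses a seeded pseudo-random generator.

```python
import random

# Sketch: generate 4-bit lane values for the three bit-flip levels.
# Assumptions: worst case alternates 0x0 and 0xF, typical case is
# uniformly random data from a seeded generator.
LANE_WIDTH = 4
ALL_ONES = (1 << LANE_WIDTH) - 1


def best_case(n):
    """No bit-flips: only zeros are transmitted."""
    return [0] * n


def worst_case(n):
    """Continuous bit-flips: every bit toggles on every transfer."""
    return [0 if i % 2 == 0 else ALL_ONES for i in range(n)]


def typical_case(n, seed=0):
    """Random data, which toggles about half of the bits per transfer."""
    rng = random.Random(seed)
    return [rng.getrandbits(LANE_WIDTH) for _ in range(n)]


def flip_rate(values):
    """Fraction of lane bits that toggle between consecutive transfers."""
    flips = sum(bin(a ^ b).count("1") for a, b in zip(values, values[1:]))
    return flips / (LANE_WIDTH * (len(values) - 1))


for name, pattern in [("best", best_case(10000)),
                      ("worst", worst_case(10000)),
                      ("typical", typical_case(10000))]:
    print(f"{name:8s} {flip_rate(pattern):.3f}")
```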
Furthermore, to vary the amount of traffic that concurrently
traverses the router we defined four scenarios. The scenarios are
combinations of the concurrent data streams listed in Table 3. All
three data streams have a load of 100%.
The first scenario is a situation where no data traverses the router
during the simulation. This gives the static offset in the dynamic
power consumption. Scenarios II, III and IV are depicted in Fig. 8.
Scenario II simulates the communication between the tile interface
and a link. Scenario III extends Scenario II with communication
between a link and the tile interface. Scenario IV also simulates a
data stream that passes through the router. This scenario also gives
an indication of the difference between time and lane multiplexing,
because two streams will be routed to the output port East.

Stream   Input port       Output port
1        Tile             Router (East)
2        Router (North)   Tile
3        Router (West)    Router (East)
Table 3. Stream Definitions

Figure 8. Examples of Three Test Scenarios: (a) Scenario II,
(b) Scenario III, (c) Scenario IV
7. Results
The proposed circuit-switched router of this pa-
per has been compared with a packet-switched equiva-
lent of Kavaldjiev [6]. Both routers have bi-directional links
of 16 bits. At the same frequency they have the same max-
imum bandwidth and bounded latency for guaranteed
throughput traffic.
7.1. Synthesis Results
Both routers are synthesized in the same 0.13 µm technology. We used
a TSMC low voltage, nominal VT (TCB013LVHP) standard cell library
with low dielectric constant (low-k) insulators. Table 4 gives the
synthesis results. Furthermore, the last column of Table 4 includes
the synthesis and layout results of the Æthereal router [5].
7.2. Power Estimation
We estimated the power consumption of our circuit-
switched router and the packet-switched router of Kavald-
jiev using the Synopsys Power Compiler [21]. Power Com-
piler distinguishes between two types of power consump-
tion: static and dynamic. The static power consumption is
the power dissipated by a gate when it is not switching.
The dynamic power is the power dissipated when the circuit is active.
The dynamic power is composed of two kinds of contributions:
switching and internal cell. The switching power of a driving cell is
the power dissipated by the charging and discharging of the load
capacitance at the output of the cell. The internal cell power is any
power dissipated within the boundary of a cell.

Router           Circuit switched   Packet switched   Æthereal [5]
Ports            5                  5                 6
Width of data    16 bit             16 bit            32 bit
Crossbar         0.0258 mm2         0.0706 mm2        n.a.
Buffering        -                  0.1034 mm2        n.a.
Arbitration      -                  0.0022 mm2        n.a.
Configuration    0.0090 mm2         -                 -
Data converter   0.0158 mm2         -                 -
Misc             -                  0.0038 mm2        n.a.
Total            0.0506 mm2         0.1800 mm2        0.1750 mm2
Max freq.        1075 MHz           507 MHz           500 MHz
Bandwidth/link   17.2 Gb/s          8.1 Gb/s          16 Gb/s
Table 4. Synthesis Results of Three Routers

Figure 9. Dynamic and Static Power Bars for Different Scenarios
(random data, 100% load)
[Bar chart: static power, dynamic power (internal cell) and dynamic
power (switching) for Scenarios I to IV of the circuit-switched and
the packet-switched router; vertical axis: power consumption in µW,
0 to 1400.]
Fig. 9 depicts the contribution of the three power types for the
different traffic scenarios described in Section 6. The clock
frequency of both routers is fixed at 25 MHz. This gives a data
bandwidth of 80 Mbit/s per stream. These streams meet the required
bandwidth of edge 2 of Fig. 3, which is 61.44 Mbit/s. Random data
(50% bit-flips) is used as input for every data stream. Such a 50%
bit-flip rate is also observed, for example, on edge 2 of the UMTS
receiver. The simulation time is 200 µs, in which 2 kB of data is
transported per stream.
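
The quoted stream bandwidth and data volume can be checked with a
short calculation. The sketch assumes that a stream uses a single
4-bit lane, that every 16-bit word carries the 4-bit header of
Section 5.2, and that 2 kB means 2000 bytes.

```python
# Sketch: payload bandwidth of one lane and data volume per simulation.
# Assumptions: one 4-bit lane per stream, 4-bit header per 16-bit word,
# and 1 kB = 1000 bytes.
def payload_bandwidth_mbit_s(clock_mhz, lane_bits=4, data_bits=16, header_bits=4):
    """Payload bandwidth of a single lane in Mbit/s."""
    raw = clock_mhz * lane_bits  # bits per microsecond
    return raw * data_bits / (data_bits + header_bits)


stream_mbit_s = payload_bandwidth_mbit_s(25)   # 25 MHz clock
simulation_time_us = 200
transported_bits = stream_mbit_s * simulation_time_us  # Mbit/s x us = bits
transported_kbyte = transported_bits / 8 / 1000

print(stream_mbit_s, transported_kbyte)
```

Under these assumptions a lane carries 80 Mbit/s of payload at
25 MHz, which exceeds the 61.44 Mbit/s of edge 2 and corresponds to
2 kB in 200 µs.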
Fig. 10 depicts the data dependency of the dynamic
power consumption. We vary the amount of bit-flips in the
offered data stream. We use three cases: best-case (0%),
worst-case (100%) and typical case (50%).
7.3. Discussion
Analyzing the results of the synthesis and power estima-
tion of the circuit and packet-switched routers we made sev-
eral observations:
Figure 10. Data Dependency of the Dynamic Power Consumption (100%
load)
[Plot: dynamic power consumption in µW/MHz (0 to 80) against the
percentage of data-bit flips (0%, 50%, 100%) for Scenarios I to IV of
the circuit-switched and the packet-switched router.]
• The area and power consumption of the circuit-
switched router is 3.5 times less compared to the
packet-switched router. The main reasons for this dif-
ference are the necessary buffers and extra control in
the crossbar of the packet-switched router.
• Table 4 shows that the maximum bandwidth of both
routers can meet the required bandwidth of the wire-
less applications as has been described in section 3.
• In contradiction to our expectations the number of bit-
flips has only a minor influence on the dynamic power
consumption. We can also expect that a large set of
data streams will have similar toggle behavior as the
used random data set. A more relevant parameter is
the number of data streams that have to be routed con-
currently from the input lanes to the output lanes.
• The dynamic power consumption of scenario II up to
IV does not increase considerably compared with Sce-
nario I (no transport of data). This is caused by the
relative high offset in the dynamic power consump-
tion. For the circuit-switched router we expect to re-
duce this power consumption with clock gating. For
clock gating we can use the configuration informa-
tion of the router and switch off the unused lanes. If
clock gating is used, we expect that this offset will de-
crease. The lower offset will cause more variations in
the power consumption due to variations in the traf-
fic patterns.
• The last observation concerns the non-straight line
for Scenario III of the packet-switched router in
Fig. 10. It is caused by the collision of streams 1
and 3 at output port East. Apparently, time-multiplexing
the link causes extra switching activity in the control
of the crossbar and therefore increases the power
consumption.
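The clock-gating idea in the observations above can be illustrated with a minimal sketch: the router configuration already states which output lanes carry a stream, so a per-lane clock enable can be derived from it directly. This is not the authors' implementation. The port names, the number of lanes per port, the `config` layout and the function names are all assumptions made for this example, as is the premise that each output lane has its own register that can be gated.

```python
from typing import Dict, Optional, Tuple

Lane = Tuple[str, int]  # (port name, lane index); hypothetical naming

# Assumed for illustration only: five ports with four lanes each.
PORTS = ("North", "East", "South", "West", "Tile")
LANES_PER_PORT = 4


def clock_enables(config: Dict[Lane, Optional[Lane]]) -> Dict[Lane, bool]:
    """Derive one clock-enable bit per output lane.

    `config` maps an output lane to the input lane that drives it.
    Lanes that are missing or mapped to None are unused and get gated.
    """
    enables = {}
    for port in PORTS:
        for index in range(LANES_PER_PORT):
            lane = (port, index)
            enables[lane] = config.get(lane) is not None
    return enables


def gated_fraction(enables: Dict[Lane, bool]) -> float:
    """Fraction of output lanes whose clock is switched off."""
    return sum(not on for on in enables.values()) / len(enables)


# Example: two concurrent streams through the router.
config = {
    ("East", 0): ("West", 0),   # stream passing from West to East
    ("Tile", 1): ("North", 2),  # stream delivered to the local tile
}
enables = clock_enables(config)
print(sum(enables.values()), "lanes clocked,",
      f"{gated_fraction(enables):.0%} gated")
```

Because the enable bits depend only on the configuration, they change when a circuit is set up or torn down and not on every data word.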
0-7695-2312-9/05/$20.00 (c) 2005 IEEE
8. Conclusion and Future Work
In this paper we proposed a new reconfigurable
circuit-switched router for a Network-on-Chip. The
circuit-switched Network-on-Chip was expected to bene-
fit from the common characteristics of three wireless appli-
cations. We showed that the circuit-switched NoC satisfies
the requirements for the three different wireless applica-
tions.
Compared with a packet-switched equivalent, the
circuit-switched router indeed has, for guaranteed-throughput
traffic, a lower power consumption, a smaller chip area,
and a higher maximum throughput per direction. However, it
is less flexible and does not support best-effort traffic.
Further analysis of the results showed that the number of
concurrent data streams matters more for the dynamic power
consumption than the number of bit-flips in the data stream.
Clock gating is a useful way to reduce the large offset in
the dynamic power consumption.
For future implementations of the circuit-switched router
we plan to include clock gating schemes for reducing the
high offset in the dynamic power consumption. Further-
more, we want to define a generic tile interface so the router
can be embedded in a multi-tile SoC. This interface will
support several types of communication that can be used by
the application designers.
Acknowledgement
This research is conducted within the Smart Chips for
Smart Surroundings project (IST-001908) supported by the
Sixth Framework Programme of the European Community.
The Software described in this document is furnished un-
der a license from Synopsys (Northern Europe) Limited.
Synopsys and Synopsys product names described herein are
trademarks of Synopsys, Inc.
References
[1] “http://www.smart-chips.com.”
[2] L. T. Smit, “Energy-efficient wireless communication,” PhD
thesis, University of Twente, December 2003.
[3] L. T. Smit, et al., “Run-time mapping of applications to a het-
erogeneous reconfigurable tiled system on chip architecture,”
in Proceedings of the International Conference on Field-
Programmable Technology, December 2004.
[4] L. Benini and G. de Micheli, “Networks on chips: A new SoC
paradigm,” IEEE Computer, vol. 35, no. 1, pp. 70–78,
January 2002.
[5] J. Dielissen, et al., “Concepts and implementation of the
Philips network-on-chip,” in IP-Based SOC Design, Nov.
2003.
[6] N. Kavaldjiev, G. J. M. Smit, and P. G. Jansen, “A vir-
tual channel router for on-chip networks,” in Proceedings of
IEEE International SOC Conference. IEEE Computer So-
ciety Press, September 2004, pp. 289–293.
[7] M. Taylor, et al., “The Raw microprocessor: A computational
fabric for software circuits and general purpose programs,”
IEEE Micro, vol. 22, no. 2, pp. 25–35, 2002.
[8] M. Millberg, et al., “Guaranteed bandwidth using looped
containers in temporally disjoint networks within the
Nostrum network on chip,” in Proceedings of the Design,
Automation and Test in Europe Conference (DATE), February
2004.
[9] E. Bolotin, et al., “QNoC: QoS architecture and design
process for network on chip,” The Journal of Systems
Architecture, December 2003, special issue on Networks on Chip.
[10] D. Wiklund and D. Liu, “SoCBUS: Switched network on
chip for hard real time systems,” in Proceedings of the
International Parallel and Distributed Processing Symposium
(IPDPS), Nice, France, April 2003.
[11] AMBA bus description, ARM, Inc., http://www.arm.com.
[12] J. Henkel, W. Wolf, and S. Chakradhar, “On-chip networks:
A scalable communication-centric embedded system design
paradigm,” in Proceedings of the 17th International Confer-
ence on VLSI Design. IEEE, 2004.
[13] K. Goossens, et al., “Interconnect and memory organization
in SOCs for advanced set-top boxes and TV — evolution,
analysis, and trends,” in Interconnect-Centric Design for Ad-
vanced SoC and NoC, J. Nurmi, et al., Eds. Kluwer, Apr.
2004, ch. 15, pp. 399–423.
[14] P. T. Wolkotte, G. J. M. Smit, and L. T. Smit, “Partitioning
of a DRM receiver,” in Proceedings of the 9th International
OFDM-Workshop, Dresden, Germany, September 2004, pp.
299–304.
[15] Broadband Radio Access Networks (BRAN); Hiperlan type
2, ETSI TS 101 475 v1.2.2 (2001-02) ed., European
Telecommunication Standard Institute (ETSI), February
2001.
[16] P. M. Heysters, G. K. Rauwerda, and G. J. M. Smit,
“Implementation of a HiperLAN/2 receiver on the reconfigurable
Montium architecture,” in Proceedings of the 11th Reconfigurable
Architectures Workshop (RAW 2004), Santa Fe, New
Mexico, USA, April 26-27 2004, ISBN 0-7695-2132-0.
[17] H. Holma and A. Toskala, WCDMA for UMTS: Radio Ac-
cess for Third Generation Mobile Communications. John
Wiley & Sons, 2001.
[18] G. K. Rauwerda and G. J. M. Smit, “Implementation of a
flexible rake receiver in heterogeneous reconfigurable hard-
ware,” in Proceedings of the 2004 International Conference
on Field Programmable Technology (FPT), Brisbane, Aus-
tralia, December 6-8 2004.
[19] E. Rijpkema, et al., “Trade offs in the design of a router
with both guaranteed and best-effort services for networks
on chip,” in Proceedings of Design, Automation and Test in
Europe Conference, March 2003, pp. 350–355.
[20] J. Rexford and K. G. Shin, “Support for multiple classes of
traffic in multicomputer routers,” in Proceedings of the First
International Workshop on Parallel Computer Routing and
Communication. Springer-Verlag, 1994, pp. 116–130.
[21] “http://www.synopsys.com.”
