Energy-efficient NoC for best-effort communication by Pascal T. Wolkotte et al.
ENERGY-EFFICIENT NOC FOR BEST-EFFORT COMMUNICATION
Pascal T. Wolkotte, Gerard J.M. Smit
University of Twente, Department of EEMCS




University of Karlsruhe, Institute for




A Network-on-Chip (NoC) is an energy-efficient on-chip
communication architecture for Multi-Processor System-on-Chip
(MPSoC) architectures. In an earlier paper we proposed an
energy-efficient reconfigurable circuit-switched NoC to reduce
the energy consumption compared to a packet-switched NoC. In
this paper we investigate a chordal slotted ring and a bus
architecture that can be used to handle the best-effort traffic
in the system and configure the circuit-switched network. Both
architectures are compared on their latency behavior and power
consumption. At the same clock frequency, the chordal ring has
the major benefit of a lower latency and higher throughput,
whereas the bus has a lower overall power consumption. However,
if we tune the frequency of the network to meet the throughput
requirements of the control network, we see that the ring
consumes less energy per transported bit.
1. INTRODUCTION
In the Smart chipS for Smart Surroundings (4S) project [1]
we propose a heterogeneous Multi-Processor System-on-Chip
(MPSoC) architecture with run-time software and tools. The
MPSoC architecture contains a heterogeneous set of processing
tiles interconnected by a Network-on-Chip (NoC). The run-time
software determines a near optimal mapping of applications to
the heterogeneous architecture. The architecture including the
run-time software can replace inflexible ASICs for future
ambient systems.
These ambient systems are typically battery powered and
have to support a wide range of applications, so they have
to be flexible as well as energy-efficient. To map ambient
applications on a parallel architecture like an MPSoC we
assume the application is represented as communicating
parallel processes. One possible representation is a Kahn based
process graph model [2], which is a directed graph with
nodes representing sequential processes and edges representing
FIFO communication between processes.

This research is conducted within the Smart Chips for Smart
Surroundings project (IST-001908) supported by the Sixth
Framework Programme of the European Community. The Software
described in this document is furnished under a license from
Synopsys (Northern Europe) Limited. Synopsys and Synopsys
product names described herein are trademarks of Synopsys, Inc.
To reduce the energy consumption of the overall application
we aim to map each process on the processing tile that can
execute it most efficiently. Mapping the processes to different
processing tiles of the MPSoC introduces communication between
the tiles.
Traditionally, communication between processing tiles is
based on a shared bus. But for larger MPSoCs with many
processing tiles it is expected that the bus will become a
bottleneck from a performance, scalability and energy
point of view [3]. Therefore, we propose a multi-hop NoC,
where the network consists of a set of routers interconnected
by links. In this paper we assume a regular two-dimensional
mesh topology of the routers. Every router is connected with
its four neighboring routers via bidirectional point-to-point
links and with a single processing tile via the tile interface.
1.1. NoC Architecture
The MPSoC architecture is organized as a centralized system:
one processing node, called Central Coordination Node
(CCN), performs system coordination functions.
The main task of the CCN is to manage the system resources.
It performs run-time mapping of the newly arrived applications
to suitable processing tiles and of the inter-process
communication to network links [4]. The CCN also tries to
satisfy Quality of Service (QoS) requirements, to optimize the
resource usage and to minimize the energy consumption.
Using the centralized approach we are able to define a NoC
with guarantees for the QoS constraints for the communication
network.
For the NoC we defined a network that can handle both
guaranteed throughput (GT) traffic and best-effort (BE)
traffic. The guaranteed throughput traffic is defined as data
streams that have a guaranteed throughput and a bounded
latency. The best-effort traffic is defined as traffic where
neither throughput nor latency is guaranteed. BE traffic
covers traffic like configuration data, interrupts, status
messages, etc.

0-7803-9362-7/05/$20.00 ©2005 IEEE
The defined network is a reconfigurable circuit switched
network [5]. The CCN configures this network and connects
processing tiles via a part of the physical link, where
the bandwidth and latency are guaranteed and independent of
other traffic in the network. The best-effort traffic is not
directly supported by the circuit switched network, but is han-




LAN/2, UMTS, Bluetooth and digital radio [6], [7], [8] we
observe for the NoC requirements that:
• These applications can be represented as a Kahn based
  process graph.
• The majority of data traffic between the processes has
  periodic behavior and requires real-time guarantees.
• The application and the resulting communication will
  remain active for seconds, as an end-user will use an
  application for seconds or minutes.
In [5] these application characteristics are used to compare
a packet-switched wormhole router [9] with the reconfigurable
circuit-switched router on their energy consumption and time
performance. It turns out that the circuit switched router
matches the streaming behavior of the applications very well.
The major advantage of the circuit switched router is a 3.5
times lower power consumption compared to the packet-switched
solution. The disadvantage of the circuit switched router is
that it has to be configured by an external control network.
Solutions for this control network are analyzed in this paper.
The rest of this paper is organized as follows. In sec-
tion 2 the two architecture options of the best-effort control
network are presented. To analyze the performance of the
networks we describe a power model for wires in section 3.
This model is used in section 4 for the performance analysis
of the control network. Section 5 concludes the paper.
2. NETWORK-ON-CHIP ARCHITECTURE
As described in the introduction, we have a hardware ar-
chitecture containing a set of heterogeneous processing tiles
interconnected by a Network-on-Chip. For the moment the
tiles and NoC are synchronized by the same clock.
Fundamentally, sharing resources and giving guarantees
are conflicting, and efficiently combining guaranteed
throughput (GT) traffic with best-effort (BE) traffic is hard
[10]. By using dedicated techniques for both types of traffic
we try to reduce the total area and power consumption. In this
paper we concentrate on the architecture for communicating the
configuration packets that are required for (re)configuring
the circuit switched network.

Fig. 1. Configuration network layout (panel (b): chordal ring layout)
2.1. Control Network
For the control network we evaluate two different communication
architectures (see Figure 1) that connect all the processing
tiles, where $l_{wire}$ is the distance between two neighboring
processing tiles¹ and $N$ is the number of processing tiles
in the system. The first architecture is a bus architecture
where every tile has both a master and a slave port. The second
architecture is a chordal slotted ring architecture. Both
architectures support only single packet transfers and no
burst transfers. Burst transfers are handled by the guaranteed
throughput network.
We assume that the number of wires that can be used for
the total communication network (GT and BE) is limited.
Therefore, we serialize the data for the BE network in the
same manner as the GT traffic. A second reason for serializing
the data is that the required performance for the BE part of
the traffic is low (it does not require a very high bandwidth)
[5]. For both control networks we serialize a 24-bit data-item
(8 bits control/address, 16 bits data) into a packet of
$l_{packet}$ flits. In this paper we transport 4 bits in
parallel, which makes $l_{packet}$ equal to six flits.
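As an illustration of this serialization, the following minimal Python sketch splits a 24-bit data-item into six 4-bit flits and reassembles it. The function names and the most-significant-flit-first ordering are assumptions made for the example; the paper does not specify the flit order on the wires.

```python
def serialize(control, data, flit_bits=4):
    """Split an 8-bit control/address field and a 16-bit data field into flits."""
    if not 0 <= control < (1 << 8):
        raise ValueError("control/address must fit in 8 bits")
    if not 0 <= data < (1 << 16):
        raise ValueError("data must fit in 16 bits")
    word = (control << 16) | data       # the 24-bit data-item
    n_flits = 24 // flit_bits           # l_packet = 6 for 4-bit flits
    mask = (1 << flit_bits) - 1
    # Most significant flit first (ordering is an assumption of this sketch).
    return [(word >> (flit_bits * i)) & mask for i in reversed(range(n_flits))]


def deserialize(flits, flit_bits=4):
    """Reassemble the flits into (control, data)."""
    word = 0
    for flit in flits:
        word = (word << flit_bits) | flit
    return word >> 16, word & 0xFFFF


flits = serialize(0xA5, 0x1234)
print(len(flits), [hex(f) for f in flits])
print(deserialize(flits))
```

With 4 wires in parallel, one packet therefore occupies the link for six clock cycles.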
2.1.1. Bus
The first architecture is a bus architecture. An example of
a bus architecture is given in Figure 1(a). The bus has a
centralized arbiter (A) that controls the access to the bus.
Every tile on the bus can act as a master and as a slave. If a
tile wants to access the bus it sends a request to the central
arbiter. The arbiter decides with a round-robin arbitration
schedule which requesting tile may use the bus for a single
bus-transfer. The tile whose request is granted can use the
bus for transferring one single packet of 24 bits.
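The round-robin arbitration described above can be sketched in a few lines of Python. This is a behavioral illustration only, not the VHDL arbiter used in the paper; the class name and the choice to start with tile 0 at the highest priority are assumptions.

```python
class RoundRobinArbiter:
    """Behavioral sketch of a round-robin bus arbiter for n_tiles masters."""

    def __init__(self, n_tiles):
        self.n_tiles = n_tiles
        # Start so that tile 0 is checked first (an arbitrary assumption).
        self.last_granted = n_tiles - 1

    def grant(self, requests):
        """Return the tile that gets the next bus transfer, or None.

        requests is the set of tile indices with an active request.
        The search starts just after the most recently granted tile, so
        every requester waits for at most n_tiles - 1 other transfers.
        """
        for offset in range(1, self.n_tiles + 1):
            tile = (self.last_granted + offset) % self.n_tiles
            if tile in requests:
                self.last_granted = tile
                return tile
        return None


arbiter = RoundRobinArbiter(4)
print([arbiter.grant({1, 3}) for _ in range(3)])
```

Because the search always resumes after the last granted tile, two tiles that request continuously are served alternately.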
For the physical bus architecture we evaluated two types
of bus layouts that are depicted in Figure 2.
¹ In this paper we assume that all tiles have an equal size. Similar
results can be obtained with tiles that do not have an equal size.

Fig. 2. Bus options. (a) Tristate bus: a long wire (4 bits) that
connects all the N tiles to one large bus, with tri-state drivers
that can drive the complete bus. (b) Multiplexer bus: multiplexer
and de-multiplexer with long links (4 bits) with length l(n).
The tristate bus connects all the tiles via bidirectional
wires that directly connect all the processing tiles. The tiles
connect their outputs via a tristate driver to the bus. The
advantage of this bus layout is that the number of wires is
minimal. But the load seen by the drivers of the bus is
high, due to the many receivers and the large active wire
length (which spans all the tiles). The number of active
receivers on the bus is equal to the number of tiles ($N$). The
total length of the wires required to reach all the receivers is
equal to the total length of wires in the bus. This is equal to
$(N - 1) \cdot l_{wire}$ in the regular mesh layout of Figure 1(a).
The multiplexer bus connects the outputs of all the tiles with
point-to-point wires to a central multiplexer near the arbiter
(see Figure 2(b)). The arbiter selects the granted tile and
analyzes the destination address. Using this address, the
arbiter forwards the data to the destination tile by selecting
the correct output of the central de-multiplexer. Compared to
the tristate bus, the data only drives the wires that are
needed between the source and destination (i.e. between source
and multiplexer, source and arbiter, de-multiplexer and
destination). This decreases the number of wire segments that
are active, but it increases the total amount of wires on the
MPSoC.
2.1.2. Chordal slotted ring
The second architecture is a chordal slotted ring architecture
that has a perimeter equal to $N$ hops. An example of a
chordal ring is given in Figure 1(b). The ring is organized as
two uni-directional rings (ring0 and ring1) that connect all
the tiles in a large ring. On the ring there are one or more
dynamically created slots that each can transport a data packet
of 24 bits. At every tile the slot is stored, the destination
address is analyzed and the slot is forwarded to the next tile
if the destination is not reached. Due to the two
uni-directional rings the maximum number of hops between two
tiles is $N/2$. Because of the store-and-forward mechanism the
ring can handle $2 \cdot N$ slots concurrently.
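The bound of N/2 hops follows from always sending in the shorter of the two directions. A minimal Python sketch of this choice is given below; it assumes tiles are numbered 0 to N-1 along the ring, that ring 0 runs in the direction of increasing index and ring 1 in the opposite direction, and it ignores the optional short-cuts. These conventions are illustrative and not taken from the paper.

```python
def ring_hops(src, dst, n_tiles):
    """Return (hops, ring) for the shorter direction between two tiles.

    ring 0 is assumed to run towards increasing tile index,
    ring 1 towards decreasing tile index.
    """
    forward = (dst - src) % n_tiles
    backward = (src - dst) % n_tiles
    if forward <= backward:
        return forward, 0
    return backward, 1


n = 16
print(ring_hops(0, 3, n), ring_hops(0, 13, n))
print(max(ring_hops(0, dst, n)[0] for dst in range(n)))
```

For N = 16 the largest distance found this way is 8 hops, in line with the N/2 bound.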
During operation of the total SoC not all tiles may be
required. To reduce the number of hops, extra short-cuts
(indicated by dashed lines in Figure 1(b)) between tiles in the
ring are possible. At run-time we can reconfigure the ring
and use the short-cuts instead of the standard ring route. The
reconfiguration of the ring is handled by the CCN. This
reduces the ring perimeter and therefore the maximum number
of hops. Furthermore, inactive parts of the ring can be
switched off to save energy².
3. PERFORMANCE ANALYSIS WIRE
In section 4 we analyze the performance of the control network.
The performance of the logic can be determined by modelling
the design in VHDL and using Synopsys Power Compiler [11] for
power estimation in 0.13 µm technology. This tool does not
include the long wires between the logic blocks. For these
long wires we use an analytical model of a wire.
For the power figures of a wire we include the drivers
and repeaters that are required in a link between two routing
structures. In [12] the power of a link between two routers
is given by:

$$P_{link} = (P_{drivers} + P_{repeaters} + P_{wire}) \cdot N_{wires} \tag{1}$$

where $N_{wires}$ is equal to the number of parallel wires of the
link, which is equal to 4 in this paper. Each power factor can
be defined as the sum of dynamic and leakage power. In this
paper we only focus on the dynamic energy consumption of
the links, as leakage power in 0.13 µm technology is minimal [13].
In this paper we do not include repeaters for performance
increase of the wires, thus $P_{repeaters}$ is set to zero. Via
simulation we found that for frequencies below 100 MHz the
repeaters can be safely ignored.
In [13] it is shown that the dynamic power consumption
of a link (wire including the driver) is equal to:

$$P_{link,dyn} = \alpha \cdot (s(c_p + c_0) + c \cdot l_{wire}) \cdot V_{DD}^2 \cdot f_{clk} \cdot N_{wires} \tag{2}$$

where $\alpha$ is equal to the switching factor (or activity factor),
$l_{wire}$ is the length of the wire in mm, and $c_0$, $c_p$, $c$ and $s$
are determined by the process, wire pitch and wire dimensions.
We use $c_0 = 1.7$ fF, $c_p = 3.5$ fF, $c = 240$ fF/mm and
$s = 151$, which are the values given for 0.13 µm technology
by [13]. For the voltage we use $V_{DD} = 1$ V, which is also
used for the power estimation of the logic blocks.
The activity factor is data (the amount of bit-toggles) and
load dependent. In a typical data-stream a bit has a
probability of 50% to change from 0 to 1 or vice versa.
Therefore, for typical data-streams the activity factor is
only related to the load on the link: $\alpha = 0.5 \cdot L_{link}$,
where $L_{link}$ is the average load of the link, with
$0 \le L_{link} \le 1$. With $\alpha = 0.5 \cdot L_{link}$,
$f_{clk} = 1$ MHz and $V_{DD} = 1$ V we get the power consumption
in µW/MHz:

$$P_{link,dyn} = 0.5 \cdot (s(c_0 + c_p) + c \cdot l_{wire}) \cdot N_{wires} \cdot L_{link} = (0.39 + 0.12 \cdot l_{wire}) \cdot N_{wires} \cdot L_{link} \tag{3}$$
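The coefficients 0.39 and 0.12 in Equation 3 can be checked numerically from Equation 2 and the process constants quoted from [13]. The Python sketch below does this; the function and constant names are chosen for the example only.

```python
# Process constants for 0.13 um technology as quoted from [13].
S = 151             # driver size factor s
C0 = 1.7e-15        # c_0 in farad
CP = 3.5e-15        # c_p in farad
C_WIRE = 240e-15    # c in farad per mm


def link_dynamic_power(l_wire_mm, load, f_clk_hz=1e6, v_dd=1.0, n_wires=4):
    """Dynamic link power in watt following Equation 2, with alpha = 0.5 * load."""
    alpha = 0.5 * load
    capacitance = S * (CP + C0) + C_WIRE * l_wire_mm
    return alpha * capacitance * v_dd ** 2 * f_clk_hz * n_wires


# Coefficients of Equation 3 in uW/MHz for a single, fully loaded wire.
offset = link_dynamic_power(0.0, 1.0, n_wires=1) * 1e6
slope = (link_dynamic_power(1.0, 1.0, n_wires=1) * 1e6) - offset
print(round(offset, 2), round(slope, 2))

# A 4-bit link of 2 mm at full load and 1 MHz, in uW.
print(round(link_dynamic_power(2.0, 1.0) * 1e6, 2))
```

The first line of output reproduces the rounded coefficients of Equation 3; the second gives about 2.53 µW/MHz for a fully loaded 4-bit link of 2 mm.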
² Incorrectly behaving tiles can also be switched off, so the ring can
avoid one or more defective tiles.
4. PERFORMANCE ANALYSIS CONTROL NETWORK
The performance of the control network is analyzed on two
points. 1) Latency and 2) Power.
4.1. Latency
4.1.1. Bus
The latency of the bus is determined by two components: 1)
the time for a bus transfer and 2) the time the arbiter needs
to grant a request of the tile. The time of a bus transfer
($T_{bustransfer}$) after a request is granted is equal to
$l_{packet}$ cycles, due to the serialization of the data.
The time to grant a request is at least 1 cycle after the
request has been sent to the arbiter. But this may be increased
due to: 1) the number of other tiles that have an active
request, 2) the priorities of these requests, 3) the average
load ($L_{bus}$) of the requests and 4) the time to handle a
request, which is equal to the time required for a bus transfer.
In our bus configuration we have built a round-robin
arbitration with $N$ requesting tiles. As a result, all
requests have the same priority level and at most $(N - 1)$
requests have to be served before the request of a specific
tile is granted. The average waiting time is therefore equal
to:

$$T_{busrequest} = 0.5 \cdot (N - 1) \cdot L_{bus} \cdot T_{bustransfer} \tag{4}$$

The total latency $T_{bus}$ of one single bus transfer is then
equal to:

$$T_{bus} = 1 + T_{busrequest} + T_{bustransfer} \tag{5}$$
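Equations 4 and 5 are easy to evaluate for a given number of tiles and bus load. The short Python sketch below does so, in clock cycles; the function name is an assumption of the example.

```python
def bus_latency(n_tiles, load, l_packet=6):
    """Average bus latency in cycles following Equations 4 and 5."""
    t_transfer = l_packet                                  # serialized transfer
    t_request = 0.5 * (n_tiles - 1) * load * t_transfer    # Equation 4
    return 1 + t_request + t_transfer                      # Equation 5


for n in (4, 16, 64):
    print(n, bus_latency(n, load=0.5))
```

For example, with N = 16 and a bus load of 50% the average latency is 1 + 22.5 + 6 = 29.5 cycles, and the waiting term grows linearly with the number of tiles.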
4.1.2. Ring
The latency of the ring is determined by three components:
1) the time to access the ring and put the complete packet
on the ring, $T_{ringaccess}$, 2) the time to pass through
intermediate hops of the ring and 3) the time required to get
off the ring, $T_{ringleave}$. The total latency $T_{ring}$ of one
single packet transferred between two tiles is then equal to:

$$T_{ring} = T_{ringaccess} + T_{hop} \cdot (N_{hops} - 1) + T_{ringleave} \tag{6}$$

where $N_{hops}$ is the distance between two tiles on the ring
(i.e. two neighboring tiles have $N_{hops}$ equal to 1) and
$T_{hop}$ is the number of cycles to pass one ring block. Because
the priority in the ring-arbiter is always in favor of the ring,
$T_{hop} = 5$ cycles and $T_{ringleave} = 1$ cycle. The time
to put data on the ring is at least $l_{packet} + 2$ cycles. But
this can increase if the sending tile is blocked, due to the
locally observed load of the ring ($L_{ring}$). For
$l_{packet} = 6$ the average access latency of the ring is
measured with a VHDL-model of the ring. The relation between the
average access latency $T_{ringaccess}$ and the observed load of
the ring is presented in Figure 3.
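Equation 6 can be evaluated directly once the access latency is known. The Python sketch below uses the minimum access time of l_packet + 2 cycles as a default, which corresponds to an unblocked sender; under load the measured access latency of Figure 3 would have to be passed in instead. The function name and this default are assumptions of the example.

```python
def ring_latency(n_hops, t_access=None, l_packet=6, t_hop=5, t_leave=1):
    """Tile-to-tile ring latency in cycles following Equation 6.

    If t_access is not given, the unblocked minimum l_packet + 2 is used,
    so the result is a lower bound on the latency.
    """
    if n_hops < 1:
        raise ValueError("n_hops must be at least 1")
    if t_access is None:
        t_access = l_packet + 2
    return t_access + t_hop * (n_hops - 1) + t_leave


for hops in (1, 4, 8):
    print(hops, ring_latency(hops))
```

Two neighboring tiles thus see at least 9 cycles, and every additional hop adds 5 cycles.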













































Fig. 3. Access latency noticed by the sending tile
4.1.3. Comparison
Figure 4 depicts the average latency that one single packet
experiences to access the communication network and reach its
destination. For the bus this is equal to $T_{bus}$ and is
independent of the distance between the sending and receiving
tile. For the ring this is equal to $T_{ring}$ and depends on
the distance between the sending and receiving tile. In
Figure 4 we assumed that the traffic is uniformly distributed.
This results in an average number of required hops equal to
$N/4$. This figure shows that the latency of a packet on the
ring increases with the number of tiles.
4.2. Power
The power consumption of the proposals for the control network
depends on the number of tiles ($N$), the length of the
actively used wires between the tiles and the traffic load
generated by the tiles $L_{tiles}$, where $0 \le L_{tiles} \le 1$.
We varied the amount of traffic to derive the load dependency.
For all used data-sets, we observed an average switching
activity of 50%.
4.2.1. Bus
For the power consumption of the total bus we have three
components: 1) The interface block at every tile interface.
This block is used to serialize/de-serialize the data and
control the request/acknowledge protocol with the arbiter,
2) Wires for data, select and request/acknowledgement lines and
3) Centralized arbiter and its multiplexer.

Fig. 4. Tile-to-Tile latency noticed by the packet
The power consumption of an interface-block can be estimated
with a linear equation consisting of a constant offset
and a variable part depending on the input and output load:

$$P_{interface,dyn} = ib_s + ib_{din} \cdot L_{bus2tile} + ib_{dout} \cdot L_{tile2bus} \tag{7}$$

where $L_{bus2tile} = L_{tile2bus}$ is the load of the bus
interface in each direction. We varied both input load
($L_{bus2tile}$) and output load ($L_{tile2bus}$) of the interface
to derive the constants $ib_s$, $ib_{din}$ and $ib_{dout}$. Using
Synopsys Power Compiler we estimated the constants, which are
equal to $ib_s = 4.30$, $ib_{din} = 2.01$ and $ib_{dout} = 1.30$
for 0.13 µm technology and $V_{DD} = 1$ V.
The power consumption in the wires of the bus depends
on the layout of the bus. If we use the central multiplexer
option we have point-to-point wires to the central
multiplexer/de-multiplexer. The maximum length of the
point-to-point wires determines the maximum frequency of the
bus. This maximum length is equal to
$l_{wire} \cdot (\sqrt{N} - 1)$, which is the distance from a
corner of the chip to the centrally placed multiplexer
and de-multiplexer. The average length of the point-to-point
wires determines the power consumption for uniformly
distributed traffic. The average length of the active wires
is equal to $l_{mux} \approx 0.5 \cdot l_{wire} \cdot \sqrt{N}$
when the layout is a square of $N$ processing tiles like
Figure 1(a) and the arbiter is centralized. For non-square
and/or non-centralized layouts the average length of the wires
increases.
Combining the average wire length with Equations 3 and
7, the power dissipation for a 4-bit wide bus can be
estimated. It equals the total offset in dynamic power
consumption of all the interfaces, twice the power consumption
of a point-to-point link (one from source to the multiplexer
and one from de-multiplexer to the destination) with the
minimum average length $l_{mux}$, and the load dependent power
consumption of both receiving and sending interfaces. The
total power consumption of the bus is then equal to:

$$\begin{aligned}
P_{bus} &= P_{interface} + P_{wires} + P_{arbiter} \\
&\approx ib_s \cdot N + (ib_{din} + ib_{dout}) \cdot L_{bus} + 2 \cdot 0.5 \cdot L_{bus} \cdot (s(c_0 + c_p) + c \cdot l_{mux}) \cdot N_{wires} \\
&= 4.30 \cdot N + (6.45 + 0.96 \cdot l_{mux}) \cdot L_{bus}
\end{aligned} \tag{8}$$

where the energy consumption of the arbiter, multiplexers
and request/acknowledgement wires is neglected.
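As a numerical check of Equation 8, the Python sketch below evaluates the multiplexer bus power in µW/MHz from the interface constants and the wire constants of [13]. It assumes a square layout with a centralized arbiter, so that the average wire length is approximated by 0.5 · l_wire · √N; the function and constant names are chosen for the example.

```python
import math

# Constants from the text; capacitances are in fF, so that
# fF * (1 V)^2 * 1 MHz = 1e-3 uW.
S, C0, CP, C_WIRE = 151, 1.7, 3.5, 240.0      # c in fF per mm
IB_S, IB_DIN, IB_DOUT = 4.30, 2.01, 1.30
FF_TO_UW_PER_MHZ = 1e-3


def mux_bus_power(n_tiles, load, l_wire_mm=2.0, n_wires=4):
    """Multiplexer bus power in uW/MHz following Equation 8."""
    # Average active wire length for a square layout (assumption).
    l_mux = 0.5 * l_wire_mm * math.sqrt(n_tiles)
    # One fully loaded point-to-point link, as in Equation 3.
    link = 0.5 * (S * (C0 + CP) + C_WIRE * l_mux) * n_wires * FF_TO_UW_PER_MHZ
    # Offset of all interfaces plus the load dependent part
    # (sending and receiving interface, two links).
    return IB_S * n_tiles + (IB_DIN + IB_DOUT + 2 * link) * load


print(round(mux_bus_power(16, 1.0), 2))
```

For N = 16, l_wire = 2 mm and 100% load this gives about 79.09 µW/MHz, matching 4.30 · 16 + (6.45 + 0.96 · 4) from the closed form.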
If the bus has the tristate structure we have
point-to-multi-point wires. The total length of the active
wires is then always equal to $l_{tristate} = (N - 1) \cdot l_{wire}$.
These wires connect one active driver with $N$ receivers.
Because of the point-to-multi-point organization of the wire
we constructed a lumped wiring model of the tristate bus. From
this model we derived the power consumption in the tristate
organized bus:

$$\begin{aligned}
P_{bus} &= P_{interface} + P_{wires} + P_{arbiter} \\
&= ib_s \cdot N + (ib_{din} + ib_{dout}) \cdot L_{bus} + 0.5 \cdot L_{bus} \cdot (s(c_0 \cdot N + c_p) + c \cdot l_{tristate}) \cdot N_{wires} \\
&= 4.30 \cdot N + (4.37 + 0.51 \cdot N + 0.48 \cdot l_{tristate}) \cdot L_{bus}
\end{aligned} \tag{9}$$
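Equation 9 can be checked in the same way. The Python sketch below evaluates the tristate bus power in µW/MHz from the lumped model, with the N receiver loads and the full wire length of (N - 1) · l_wire; the function and constant names are assumptions of the example.

```python
# Constants from the text; capacitances in fF (1 fF at 1 V and 1 MHz = 1e-3 uW).
S, C0, CP, C_WIRE = 151, 1.7, 3.5, 240.0      # c in fF per mm
IB_S, IB_DIN, IB_DOUT = 4.30, 2.01, 1.30
FF_TO_UW_PER_MHZ = 1e-3


def tristate_bus_power(n_tiles, load, l_wire_mm=2.0, n_wires=4):
    """Tristate bus power in uW/MHz following Equation 9."""
    l_tristate = (n_tiles - 1) * l_wire_mm    # the complete bus is always driven
    wires = (0.5 * (S * (C0 * n_tiles + CP) + C_WIRE * l_tristate)
             * n_wires * FF_TO_UW_PER_MHZ)
    return IB_S * n_tiles + (IB_DIN + IB_DOUT + wires) * load


print(round(tristate_bus_power(16, 1.0), 2))
```

For N = 16 and l_wire = 2 mm at full load the result is about 95.78 µW/MHz; the rounded coefficients of Equation 9 give 95.73, the small difference being rounding of the constants.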
4.2.2. Ring
For the power consumption of the total ring we have two
components: 1) the ring block at every tile interface and
2) the wires between the ring blocks. For the ring block the
power consumption can be estimated by a linear equation
depending on the load of the ring:

$$P_{ringblock,dyn} = ir_s + ir_d \cdot (L_{ring0} + L_{ring1}) \tag{10}$$

where $L_{ringx}$ is the locally observed load of ring $x$ and
$0 \le L_{ringx} \le 1$. With the VHDL-model of the ring block a
power estimation with Synopsys Power Compiler has been
performed to find values for the constants $ir_s$ and $ir_d$. We
found the offset constant $ir_s = 8.48$ and the load dependent
constant $ir_d = 1.91$ for 0.13 µm technology and $V_{DD} = 1$ V.
Combining Equations 3 and 10 results in the total power
consumption of the ring:

$$P_{ring} = \left( ir_s + \left( ir_d + 0.5 \cdot (s(c_0 + c_p) + c \cdot l_{wire}) \cdot N_{wires} \right) \cdot (L_{ring0} + L_{ring1}) \right) \cdot N \tag{11}$$
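The ring power can be sketched numerically by combining Equation 10 for the ring block with Equation 3 for the link to the next tile on each of the two rings. The Python sketch below assumes, as in Equation 10, that the offset ir_s is independent of the load and that only the ir_d term and the wire term scale with the ring loads; the function and constant names are chosen for the example.

```python
# Constants from the text; capacitances in fF (1 fF at 1 V and 1 MHz = 1e-3 uW).
S, C0, CP, C_WIRE = 151, 1.7, 3.5, 240.0      # c in fF per mm
IR_S, IR_D = 8.48, 1.91
FF_TO_UW_PER_MHZ = 1e-3


def ring_power(n_tiles, load_ring0, load_ring1, l_wire_mm=2.0, n_wires=4):
    """Total ring power in uW/MHz from Equations 3 and 10."""
    # One fully loaded link between neighboring ring blocks (Equation 3).
    link = (0.5 * (S * (C0 + CP) + C_WIRE * l_wire_mm)
            * n_wires * FF_TO_UW_PER_MHZ)
    # Load independent offset plus load dependent block and wire power.
    per_tile = IR_S + (IR_D + link) * (load_ring0 + load_ring1)
    return per_tile * n_tiles


print(round(ring_power(16, 1.0, 1.0), 2))
print(round(ring_power(16, 0.0, 0.0), 2))
```

Under these assumptions a fully loaded ring of 16 tiles with 2 mm links consumes about 277.77 µW/MHz, of which 135.68 µW/MHz is the load independent offset of the ring blocks.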





Figure 5 depicts the total dynamic power consumption of
the control network depending on the number of tiles. The
load of the control network is set to 100% to get the
worst-case overall power consumption of the control network. The
length between the tiles ($l_{wire}$) is set to 2 mm. From
Figure 5 it can be concluded that the most power-efficient
control network is the bus with the central multiplexer. But in
this figure we do not take the total throughput of the control
network into account.













































































Fig. 6. Dynamic power consumption at a specific average
throughput with a network load of 50% (line with diamonds
is the ring)
Therefore, we compared the multiplexer bus with the chordal
slotted ring and normalized the figures to a given throughput.
From a latency point of view (see Figure 3) we want
to have at most 50% average load on the control network.
With a load above 50% the access latency of the ring gets
too high. Using this constraint on the load we determine
the frequency of the network that is required to offer
the required throughput of the control network. With this
frequency we determine the dynamic power consumption
of the network, which is depicted in Figure 6. Figure 6
shows that: 1) the chordal ring can offer approximately 3.4
times more throughput at the same power consumption and
2) the chordal ring consumes 3.2 times less power at the
same throughput requirements.
5. CONCLUSION
In this paper two communication architectures are analyzed
that can be used for best-effort communication in a
Network-on-Chip. This best-effort communication is required for
control of the tiles and for configuration of the
circuit-switched network that handles the guaranteed throughput
traffic. We have two implementations: one based on a bus and one
based on a chordal slotted ring. Both architectures have been
compared on their latency behavior and power consumption.
Based on the results with the latency and throughput
analysis we can conclude that the ring is the best suited net-
work if a low clock frequency, high throughput and/or low
access latency is required for the control network. Espe-
cially, the latency of the bus increases rapidly with the num-
ber of tiles.
Based on the results of the power analysis we can con-
clude that the overall power consumption of the bus is rela-
tively low compared to the ring architecture. This is mainly
caused by the smaller interface blocks between the network
and the processing tile. However, if we tune the frequency
of the network to meet the throughput requirements of the
control network, we see that the ring consumes less energy
per transported bit. This is due to the concurrent commu-




[2] G. Kahn, “The semantics of a simple language for parallel
programming,” in Information processing, J. L. Rosenfeld,
Ed. Stockholm, Sweden: North Holland, Amsterdam, Aug
1974, pp. 471–475.
[3] L. Benini and G. de Micheli, “Networks on chips: A new
soc paradigm,” IEEE Computer, vol. 35, no. 1, pp. 70–78,
January 2002.
[4] L. T. Smit, et al., “Run-time mapping of applications to
a heterogeneous reconfigurable tiled system on chip
architecture,” in Proceedings of the International Conference on
Field-Programmable Technology, December 2004.
[5] P. T. Wolkotte, et al., “An energy-efficient reconfigurable
circuit-switched network-on-chip,” in Proceedings of the
12th Reconfigurable Architectures Workshop (RAW 2005),
Denver, Colorado, USA, April 4-5 2005.
[6] P. M. Heysters, G. K. Rauwerda, and G. J. M. Smit,
“Implementation of a HiperLAN/2 receiver on the reconfigurable
montium architecture,” in Proceedings of the 11th
Reconfigurable Architectures Workshop (RAW 2004), Santa Fé, New
Mexico, USA, April 26-27 2004, ISBN 0-7695-2132-0.
[7] G. K. Rauwerda and G. J. M. Smit, “Implementation of a
flexible rake receiver in heterogeneous reconfigurable
hardware,” in Proceedings of the 2004 International Conference
on Field Programmable Technology (FPT), Brisbane,
Australia, December 6-8 2004.
[8] P. T. Wolkotte, G. J. M. Smit, and L. T. Smit, “Partitioning
of a drm receiver,” in Proceedings of the 9th International
OFDM-Workshop, Dresden, Germany, September 2004, pp.
299–304.
[9] N. Kavaldjiev, G. J. M. Smit, and P. G. Jansen, “A virtual
channel router for on-chip networks,” in Proceedings of IEEE
International SOC Conference. IEEE Computer Society
Press, September 2004, pp. 289–293.
[10] J. Rexford and K. G. Shin, “Support for multiple classes of
traffic in multicomputer routers,” in Proceedings of the First
International Workshop on Parallel Computer Routing and
Communication. Springer-Verlag, 1994, pp. 116–130.
[11] http://www.synopsys.com.
[12] A. Morgenshtein, et al., “Comparative analysis of serial vs.
parallel links in networks on chip,” in Proceedings SOC 2004
International Symposium on System-on-Chip. Tampere,
Finland: IEEE Computer Society Press, Los Alamitos,
California, November 2004, ISBN 0-7803-8558-6.
[13] K. Banerjee and A. Mehrotra, “A power-optimal repeater in-
sertion methodology for global interconnects in nanometer
designs,” IEEE Transactions on Electron Devices, vol. 49,
no. 11, pp. 2001–2007, November 2002.