Modelling and Tools for Power Supply Variations Analysis in Networks-on-Chip by Dahir NS et al.
IEEE TRANSACTIONS ON COMPUTERS, VOL. XX, NO. X, NOVEMBER 201X 1
Modelling and Tools for Power Supply Variations Analysis in Networks-on-Chip
Nizar Dahir, Student Member, IEEE, Terrence Mak, Member, IEEE, Fei Xia, and Alex Yakovlev, Senior Member, IEEE
Abstract—Power supply integrity has become a critical concern with rapidly shrinking feature sizes and the ever-increasing power consumption in nanometre-scale integration. In particular, on-chip communication, in platforms such as networks-on-chip (NoCs), dictates the power dissipation and overall system performance in multi-core systems and embedded computing architectures. These architectures require a dedicated tool for analyzing power supply noise that embeds their distinctive communication characteristics and spatial parameters. In this paper, we present a tool dedicated to determining the on-chip VDD drops caused by communication workload in NoCs. The tool integrates a fast power grid model, a NoC simulator, an on-chip link model and a microarchitectural power model for routers. The model has been rigorously verified using SPICE simulations. The proposed model and tools are further exemplified by analyzing the impact of power supply noise on NoC links. Statistical timing analysis of NoC links in the presence of power supply noise was performed to evaluate bit error rates. This work enables a better understanding of the tradeoffs in NoC design and of the power supply noise induced by on-chip communication. This understanding is crucial for analyzing the quality of service (QoS) of NoC communication fabrics at the early design stages.
Index Terms—Networks-on-chip, power supply noise, power grid simulation, on-chip routing, timing analysis, power grid granularity,
probability of error, bit error rate.
1 INTRODUCTION
POWER supply noise has adverse effects on digital circuit performance and reliability. It can cause signal deterioration and create soft errors. It has recently been reported that supply voltage variations have a significant impact on operating frequency and system power dissipation [1], [2]. Both resistive (IR) and inductive (∆I) voltage drops are sources of power supply noise. The resistive voltage drop occurs mainly because of the resistance of the power delivery wires in the power grid network, and it increases with the current delivered through these wires. The inductive drop, on the other hand, is mainly due to wire inductance in the package as well as in the grid wires, and it is proportional to the rate of change of current.
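As a first-order illustration of the two mechanisms just described, the sketch below evaluates the total drop as the sum of a resistive term I·R and an inductive term L·∆I/∆t. All current, resistance, inductance and timing values are hypothetical and chosen only to show the arithmetic; they are not taken from this paper.

```python
def supply_drop(i_avg, r_grid, l_eff, delta_i, delta_t):
    """Return (IR drop, inductive drop, total drop) in volts.

    i_avg   : average current through the delivery path (A)
    r_grid  : effective resistance of the delivery path (ohm)
    l_eff   : effective package + grid inductance (H)
    delta_i : current change during a switching event (A)
    delta_t : duration of that change (s)
    """
    ir_drop = i_avg * r_grid                  # resistive (IR) component
    l_didt_drop = l_eff * delta_i / delta_t   # inductive (L dI/dt) component
    return ir_drop, l_didt_drop, ir_drop + l_didt_drop


# Hypothetical example: 2 A through 20 mOhm, plus a 0.5 A step in 100 ps through 10 pH.
ir, ldi, total = supply_drop(2.0, 0.02, 10e-12, 0.5, 100e-12)
print(f"IR = {ir * 1e3:.1f} mV, L dI/dt = {ldi * 1e3:.1f} mV, total = {total * 1e3:.1f} mV")
```

The inductive term grows with faster current transitions, which is why higher switching frequencies aggravate the ∆I component.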
Technology scaling exacerbates the problem of power supply noise for several reasons. Firstly, the wire thickness in the power network is rapidly shrinking, which substantially increases the resistance of the power delivery wires, while the demand for power delivery is rapidly increasing. Together, these trends lead to a higher IR drop. Secondly, higher switching frequencies increase the ∆I drop. Thirdly, lower operating voltages reduce the noise margin. Consequently, the voltage drop as a percentage of the supply voltage is rapidly increasing. For example, the voltage drop can reach 30% of the nominal supply voltage in 65 nm technology if the necessary precautions are not taken [3]. Mitigating power supply noise has become a grand challenge for the sustainability of future large-scale integration.

Manuscript received xx, 201x; revised xx, 201x. Nizar Dahir, Fei Xia and Alex Yakovlev are with the School of Electrical, Electronic & Computer Engineering, Newcastle University, UK. E-mail: {nizar.dahir, fei.xia, alex.yakovlev}@newcastle.ac.uk. Terrence Mak is with the Department of Computer Science and Engineering, Chinese University of Hong Kong. E-mail: stmak@cse.cuhk.edu.hk.
Some research efforts have focused on studying the impact of power supply noise on the performance of VLSI circuits, while others aim to mitigate this noise. The traditional technique for power supply noise mitigation is the use of on-chip decoupling capacitors. In [4], [5], techniques are proposed to determine the optimal values and positions of decoupling capacitors that minimize power supply noise during floorplanning. Optimal power-gating scheduling has also been proposed to minimize the voltage drop caused by switching activity in the gated blocks [6].
Power delivery network design optimized for Dynamic Voltage Scaling (DVS) systems is considered in [7], where a Markov decision process (MDP) model is used to minimize the total energy demand when the system operates in this power management mode.
The above techniques are examples of power supply noise mitigation at the circuit level. At higher levels, and for multi-core systems, workload assignment can have a significant impact on the induced power supply noise. In [8], a simulated-annealing approach is employed to optimize the assignment of workloads to cores such that the resulting power supply noise is minimized. However, that work considered independent tasks running on the cores and thus ignored intra-chip communication.
Emerging multi-core systems require dedicated, high-performance on-chip communication systems. The network-on-chip (NoC) has been proposed as a promising infrastructure for scalable, high-performance on-chip communication [9].

0000–0000/00/$00.00 © 201x IEEE Published by the IEEE Computer Society
The NoC takes up a significant portion of the overall power budget in NoC-based systems. For instance, the routers in the MIT-RAW CMP network consume about 40% of the tile power, and the communication network takes up to 35% of the overall system power [10]. A measurement of Intel's 80-tile TeraFLOPS CMP reported that the communication power budget is about 28% [11]. This implies that the on-chip communication workload is responsible for a considerable portion of the overall power supply noise. In contrast to conventional models for logic or microprocessors, this portion of power supply noise is correlated with the temporal and spatial distributions of the traffic load, which can be determined in the early design stages once the application characteristics are known.
Due to aggressive technology scaling, multi-core systems and, particularly, on-chip interconnection networks are increasingly prone to various sources of noise. Apart from power supply noise, which is a major source of errors, process variation, crosstalk, thermal effects and leakage are other examples of error sources. All of these can cause errors and degrade performance. Thus, error control techniques are needed for fault tolerance and to provide the quality of service (QoS) required by the target application [12], [13]. More importantly, designing such techniques requires accurate estimates of the error and fault rates from the various sources at an early design stage [14]. For independent noise sources, fault rates can be modelled separately and their effects added to obtain an overall estimate of fault tolerance metrics.
In this work, a tool for analyzing power supply noise dedicated to networks-on-chip is presented. It captures the supply voltage variations caused by communication loads across the chip. The model allows us to better understand the tradeoffs in the design of communication links and, in particular, to evaluate the relationship between voltage or frequency and fault rate or bit error rate (BER). This relationship is crucial for the early-stage analysis of the quality of service (QoS) of communication fabrics in NoCs. The major contributions of this paper are summarized as follows:
1) We develop a tool that employs an integrated model of power supply noise in NoCs. Detailed circuit-level design parameters and application-specific on-chip communication dynamics, including traffic pattern and link bandwidth, are considered. The tool provides a compact integration of an NoC power and area model, an NoC simulator, an on-chip link model and a power grid model.
2) The model accuracy, and the impact of power grid granularity on this accuracy, have been rigorously evaluated against SPICE simulations. This also gives insight into the scalability of the power grid model and the trade-off between its simulation time and accuracy.
3) The model has been employed to analyze the power supply noise in networks-on-chip. Novel observations about the power supply noise distribution and its variation under different routing algorithms and traffic patterns are reported.
4) The impact of the resulting power supply noise on performance has been studied. Statistical timing analysis of the link delay caused by the power supply noise is performed. Moreover, high-level fault metrics, such as the probability of timing errors and bit error rates, are evaluated based on real-world and synthetic communication scenarios.
2 BACKGROUND AND RELATED WORK
NoCs are used to connect components on the same chip. Data transfer is achieved in a way similar to conventional computer networks: packet switching is used, and packets are routed from the source to the destination. A packet is split into smaller data units called flits. The interconnected components can be general-purpose microprocessors, memory blocks or control circuitry. Each component (IP) is attached to a router, which acts as a gateway connecting the IP to other IPs and routes information for the overall system. The term tile is often used to refer to an IP core together with its router.
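To make the packet/flit terminology concrete, the sketch below splits a packet payload into a head flit, body flits and a tail flit, as is common in wormhole-switched NoCs (the only flit type that needs route computation is the head flit). The `Flit` structure and its fields are hypothetical and do not reproduce the flit format of any particular router or of the simulator used later.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Flit:
    kind: str      # "HEAD", "BODY" or "TAIL" (hypothetical encoding)
    src: int       # source tile id
    dst: int       # destination tile id
    payload: int   # data word carried by this flit


def packetize(src: int, dst: int, words: List[int]) -> List[Flit]:
    """Split a packet's payload words into flits.

    The first flit is tagged HEAD (it carries the routing information),
    the last one TAIL, and everything in between BODY.
    """
    if len(words) < 2:
        raise ValueError("this sketch assumes at least a head and a tail flit")
    flits = []
    for idx, word in enumerate(words):
        if idx == 0:
            kind = "HEAD"
        elif idx == len(words) - 1:
            kind = "TAIL"
        else:
            kind = "BODY"
        flits.append(Flit(kind, src, dst, word))
    return flits


# A 3-flit packet, the packet length used in the experiments of Section 4.
for f in packetize(src=0, dst=4, words=[0xA, 0xB, 0xC]):
    print(f.kind, f.payload)
```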
Many tools have been developed to model NoC power and area for early-stage design space exploration [15], [16], [17]. These tools help designers evaluate their designs in the early stages and explore different design strategies, giving an initial estimate of the significance of a specific design technique. Other researchers have focused on optimizing floorplanning and topology [16], [18], [19], and application mapping [20], [21]. The majority of these efforts aim to minimize area and power and do not consider the ever-increasing problem of power supply noise, which is directly affected by the outcome of these design strategies. Optimizing NoC design for power supply noise requires a tool that models this noise to guide and evaluate the optimization process. This paper aims to provide such a tool.
Power noise modelling requires models for both the workload and the power delivery grid. Workload models determine the values and locations of the power-consuming modules on the chip, while the power grid model determines the supply voltage profile across the power delivery grid in the presence of these workloads. This section surveys the state of the art in these two areas.
2.1 Power Grid Model
To analyze the power delivery grid in VLSI circuits, the grid is modelled as an RLC network, while loads are often modelled as independent current sources [22] or equivalent passive elements [23]. Determining the node voltages of an m-node power grid model requires solving the following system of ordinary differential equations (ODEs):

$$G\,v(t) + C\,v'(t) = i(t) \tag{1}$$

where $G, C \in \mathbb{R}^{m \times m}$ are matrices representing the memoryless elements (resistors) and the memory elements (inductors and capacitors), respectively, while $v(t), i(t) \in \mathbb{R}^{m}$ are the vectors of node voltages and of imposed independent current sources at the grid nodes, respectively. In this model, the independent current sources represent the circuit activity across the chip.
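Eq. (1) is usually integrated numerically in time. As a minimal sketch, assuming a hypothetical two-node RC grid (inductors omitted) in which the VDD pad is represented by its Norton equivalent, backward Euler turns each time step into one linear solve, $(G + C/h)\,v_{n+1} = i(t_{n+1}) + (C/h)\,v_n$. A production solver would factor $G + C/h$ once and reuse it; `numpy.linalg.solve` is called at every step here only for brevity.

```python
import numpy as np


def simulate_grid(G, C, i_of_t, v0, h, steps):
    """Backward-Euler integration of G v(t) + C v'(t) = i(t).

    Each step solves (G + C/h) v_{n+1} = i(t_{n+1}) + (C/h) v_n.
    """
    A = G + C / h                 # constant system matrix for a fixed step h
    v = np.asarray(v0, dtype=float)
    trace = [v.copy()]
    for n in range(1, steps + 1):
        rhs = i_of_t(n * h) + (C / h) @ v
        v = np.linalg.solve(A, rhs)
        trace.append(v.copy())
    return np.array(trace)


VDD = 1.0
g_pad, g_seg = 1 / 0.01, 1 / 0.05    # hypothetical pad and segment conductances (S)
G = np.array([[g_pad + g_seg, -g_seg],
              [-g_seg,          g_seg]])
C = np.diag([1e-9, 1e-9])            # hypothetical node-to-ground capacitances (F)
I_LOAD = 0.5                         # hypothetical load current drawn at node 1 (A)


def currents(t):
    # Norton equivalent of the VDD pad at node 0; the load draws current from node 1.
    return np.array([g_pad * VDD, -I_LOAD])


trace = simulate_grid(G, C, currents, v0=[VDD, VDD], h=10e-12, steps=200)
print(trace[-1])  # settles near the DC solution: 0.995 V and 0.970 V
```

After about 2 ns the node voltages settle at the DC operating point, where the 0.5 A load produces a 5 mV drop across the pad and a further 25 mV across the grid segment.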
Due to the enormous number of elements and nodes in the grid, solving for the node voltages v(t) using traditional circuit simulators, such as SPICE, is impractical in terms of both memory and simulation time. Many researchers have considered this problem, and several solutions have been proposed to address the power grid size problem during both simulation and modelling.
For modelling, model order reduction (MOR) approaches are used to reduce the model order before simulation. Multigrid-like [24], [25], hierarchical [26], [27], partition-based [28] and Krylov subspace-based [29] methods are examples of MOR. Other works focused on reducing simulation time, for instance random walk-based simulation [30]. Most of these methods are based on iterative computation, and their high computational demand makes them difficult to use for analyses that involve solving the model many times. Direct methods are preferable in this case, particularly when the interest is in the peak noise rather than the time profile of the noise.
In [31], a fast and direct model to determine the peak power supply noise is proposed. The power grid is modelled as a distributed RLC network excited by constant voltage sources, and switching capacitors ($C^{Load}$) are used to model the on-chip circuit activity. The value of these capacitors for a particular circuit is determined by the amount of charge (or energy) delivered to the circuit during the switching time period $t_s$. Approximating the noise impulse with a linear ramp that reaches its maximum at $t = t_s$, the minimum voltage at node $j$ of the grid, $V^{min}_j$, is given by

$$V^{min}_j = \frac{1}{\lambda_j}\left(\sum_{i=1,\, i\neq j}^{k} x_{i,j}\, V^{min}_i + \frac{1}{2}\sum_{i=1,\, i\neq j}^{k} C_{i,j}\, V_{DD}\right) \tag{2}$$

where $\lambda_j = \sum_{i=1,\, i\neq j}^{k} x_{i,j} + \frac{1}{2}\sum_{i=1,\, i\neq j}^{k} C_{i,j} + C^{Load}_j$ and $x_{i,j} = t_s^2/(6L_{i,j} + 3R_{i,j}t_s)$. $R_{i,j}$, $L_{i,j}$ and $C_{i,j}$ are the resistance, inductance and capacitance between nodes $i$ and $j$ of the power grid, respectively, $t_s$ is the switching time, and $C^{Load}_j$ is the equivalent capacitance of the load at node $j$.
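Because Eq. (2) couples each node's minimum voltage to those of its neighbours, multiplying it by $\lambda_j$ gives one linear equation per node, and the whole grid can be solved directly as $A V^{min} = b$. The sketch below assumes a hypothetical segment list in which each entry holds the R, L and C between two nodes, with index -1 marking the ideal VDD supply behind the package RL segment; all numerical values are illustrative only.

```python
import numpy as np


def peak_noise(n_nodes, segments, c_load, vdd, ts):
    """Solve Eq. (2) for V_min at every grid node as one linear system.

    segments: (i, j, R, L, C) tuples for grid and package segments;
              node index -1 stands for the ideal VDD source (V_min = vdd).
    c_load:   equivalent load capacitance C_Load_j (F) of each node.
    """
    A = np.zeros((n_nodes, n_nodes))
    b = np.zeros(n_nodes)
    for i, j, R, L, C in segments:
        x = ts**2 / (6 * L + 3 * R * ts)        # x_{i,j} as defined under Eq. (2)
        for a, o in ((i, j), (j, i)):            # segment enters both end nodes' sums
            if a < 0:
                continue                         # the VDD source has no unknown
            A[a, a] += x + 0.5 * C               # contributes to lambda_a
            b[a] += 0.5 * C * vdd                # (1/2) C_{i,j} V_DD term
            if o < 0:
                b[a] += x * vdd                  # neighbour is the fixed VDD source
            else:
                A[a, o] -= x                     # neighbour's V_min is unknown
    A += np.diag(np.asarray(c_load, dtype=float))  # C_Load_j completes lambda_j
    return np.linalg.solve(A, b)


VDD, TS = 1.0, 100e-12
segments = [
    (-1, 0, 0.01, 10e-12, 0.0),     # hypothetical package pad (RL only)
    (0, 1, 0.05, 1e-12, 10e-15),    # hypothetical on-chip grid segments
    (1, 2, 0.05, 1e-12, 10e-15),
]
v_min = peak_noise(3, segments, c_load=[0.0, 0.0, 5e-12], vdd=VDD, ts=TS)
print(v_min)  # approximately [0.970, 0.960, 0.950] V
```

A useful sanity check is that with all $C^{Load}_j = 0$ every node returns exactly $V_{DD}$, since each row of Eq. (2) then reduces to a weighted average of supply-level voltages.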
This model provides an accurate estimate of the peak power supply noise, with a reported maximum error of 5% [31]. It can significantly reduce the simulation time, which makes it an ideal candidate for a tool that evaluates supply variations in real time. However, the model assumes known load equivalent capacitances. In this work, these capacitances change dynamically over time; thus, the power grid model can be adopted only after introducing a technique to determine these switching load capacitances for the communication fabric of the NoC. This load includes both link and router switching, as detailed in Section 3.2.
2.2 Workload Model
A considerable amount of literature has been published on workload modelling for power grid analysis. Techniques that represent the workload with equivalent passive elements have been reported; for example, in [23], a macro-model based on the effective impedance of the current consumer is proposed. Techniques based on independent current source models are also used; for example, macro-models can be used to determine the waveforms of these current sources. In [22], a frequency-domain current macro-model is proposed in which the input vector pairs of the circuit are partitioned according to their Hamming distance, and a current macro-model is built for each distance using regression. However, these workload models assume that computation workloads are independent current sources or passive elements. Task dependencies and correlation between the computational cores are ignored, and hence the workload due to communication cannot be captured by these models.
Independent current sources or passive elements can be a reasonable representation of workloads for simple logic or, to an extent, for microprocessors. For complex on-chip communication systems, such as networks-on-chip, dynamic power consumption in the on-chip communication infrastructure imposes significant variations in power and load. Therefore, an effective model that captures the communication dynamics is required. This model must integrate both the router circuit and the link workloads.
For the router circuit workload, a router microarchitec-
tural power model [17] can be integrated with a circuit
activity simulator to characterize this workload.
A model for on-chip links is essential for determining on-chip communication loads. In the literature, a number of current-based [32], energy-based [33] and power-based [34], [35] models for on-chip interconnects have been proposed. On-chip interconnects are modelled as capacitively and inductively coupled distributed RLC lines. In [32], an analytical model for on-chip link current based on decoupling techniques is presented. On-chip link wires are driven by an exponential voltage source ($V_S$) and loaded by a capacitor ($C_L$). A two-port network with source and load impedances is employed to derive a closed-form expression for the wire current. The link current can then be obtained using a decoupling transformation [34].
3 METHODOLOGY
On-chip communication traffic produces a considerable portion of the overall power supply noise in NoCs. In contrast to conventional models for logic or microprocessors, this portion of the induced noise, or VDD variation, is correlated with the spatial distribution of the traffic load, which is a direct result of network-level design decisions such as application mapping and routing path allocation (or routing algorithm).
A power supply noise model requires detailed consideration of both the workload and the power grid models. Fig. 1 illustrates the inputs, outputs and models used for power supply noise computation in NoCs in this work. The technology and architectural parameter files, in addition to the floorplan information, are taken as input to the model. These files are given to the NoC power model to compute the power traces of the NoC components. For the router, we use the well-known NoC router power and area model Orion [17]. This model is fast, accurate and easy to integrate with other models. More importantly, it is an architecture-level model, which enables macro-modelling of the power grid workload for NoCs. Application characteristics files are fed to an NoC simulator to generate traffic information. The open-source SystemC-based NoC simulator Noxim [36] is modified and employed here because it supports a wide range of traffic distributions and routing algorithms and because of its efficiency and ease of configuration. For the power grid model solution, the fast peak-noise power grid model proposed in [31] is adopted.
Integrating all these models in an automated flow enables the computation of dynamic voltage variations in NoCs. The tools and benchmarks used in this work are available online [37]. The methods and tools presented in this paper are applicable to a wide range of topologies, including tree, mesh and torus. However, for convenience, we describe them in the context of a regular mesh topology.
3.1 Power Noise in Networks-on-Chip
Many design parameters can affect the spatial and temporal distribution of communication loads in NoCs; routing algorithms, traffic patterns and packet injection rates are examples. Consequently, NoC communication workloads have spatial and temporal distributions that are both determined by the design entities of the system [38], [39], [40]. This distribution of communication loads in time and space is reflected as a spatial and temporal distribution of power supply noise in the power delivery grid.
Fig. 2 shows a general overview of a network-on-chip and its power grid. The power grid is a grid of metal wires, which can be modelled as an RLC network. The power grid can have different topologies, e.g. mesh, tree and irregular. These grids usually span several metal layers and are hierarchical in nature. This implies that segment length and width decrease (grid granularity increases) as we go from more global to more local power grid nodes. Some nodes in the upper layers are connected to the package VDD (and ground) pads, which are modelled here as RL segments.

[Figure: the inputs (NoC architecture and implementation parameter files: NoC size and topology, number of ports and buffers, link size and activity, technology parameters such as VDD and frequency; chip and package parameter files: chip floorplan, package parameters, power grid topology, per-unit-length R, L and C from PTM; application characteristics file: number of tasks, bandwidth and data volume requirements, mapping, traffic distribution) feed the NoC router and link power models (ORION), which give the dynamic and static energy of buffers, crossbar, switch allocator, clock and links, and the cycle-accurate NoC simulator (NOXIM). The resulting dynamic and static power traces drive the power delivery network model, which outputs the temporal and spatial supply voltage variations.]
Fig. 1. Computational flow for NoC power supply noise modelling.
As mentioned earlier, power grid analysis for the whole grid is not practical due to the huge size of the resulting model. Thus, exploiting the hierarchical nature of power grids, power grid analysis can be performed using a macro-modelling approach: a lumped model is used to characterize individual blocks in the chip, and a power grid of appropriate granularity is considered [26]. In this work, a macro-model is considered for routers. Each router is matched with a region of the power grid, which is determined by the floorplan information.

Fig. 2. NoC power delivery network.

TABLE 1
Definitions of symbols.

| Symbol | Definition |
|---|---|
| $C^{Load}_r(k)$ | Total load equivalent capacitance of router r at cycle k. |
| $C^{links}_r(k)$ | Load equivalent capacitance of router r at cycle k due to link traversals on all the links of the router. |
| $C_{ch}(k)$ | Load equivalent capacitance at cycle k due to link traversal of router channel ch. |
| $C^{circuit}_r(k)$ | Load equivalent capacitance of router r at cycle k due to circuit activity. |
| $n$ | Number of wires in the NoC data link. |
| $R$ | The set of all routers in the NoC. |
| $CH_r$ | The set of all channels of router r. |
| $G_r$ | The set of power grid nodes responsible for delivering power to router r. |
| $SW$ | Vector of size n whose elements represent the wire switching direction: 0 for quiet, 1 for switching up and -1 for switching down. |
| $\Psi$ | The set of microarchitectural-level processes executed by an NoC router. |
| $E_r(k)$ | Total energy delivered to router r at cycle k. |
| $E_\psi$ | Energy consumed when executing micro-process ψ. |
| $\alpha_\psi(r,k)$ | Number of occurrences of process ψ in router r at cycle k. |
| $I_{ch}$ | Link current profile of channel ch. |
| $g$ | Power grid granularity multiple used to map the fine grid to a coarse grid. |
| $l_f$ | Fine power grid segment length. |
| $l_X$ | Coarse power grid segment length. |
| $Error_g$ | Power supply noise error due to grid granularity reduction. |
| $V_X$ | Voltage of the coarse-grained grid model. |
| $V_f$ | Voltage of the fine-grained grid model. |
| $G_f$ | Fine-grained (original) power grid. |
| $G_X$ | Coarse-grained (reduced) power grid. |
| $t^i_{clk\text{-}Q}$ | Clock-to-Q delay of latch i. |
| $t^i_{setup}$ | Critical setup time of latch i. |
| $t^{i,j}_{wire}$ | Delay of wire i, j. |
| $Pr(Err_i)$ | Probability of error for link i. |
| $\gamma_i$ | Utilization of link i. |
Fig. 3 illustrates the links and routers of a network-on-chip. The router functional units and the link's equivalent circuit, including the drivers and loads, are also shown. In this work, the link wires are modelled as RLC interconnects driven by exponential voltage sources and loaded by capacitances ($C_L$). The NoC workload consists of the switching workloads of both router circuits and links. The router workload is due to internal router processes: receiving a flit, route computation, switch allocation and switch traversal. The link workload is due to the switching of the link drivers and repeaters.
3.2 Compartmental Modelling for Communication Fabrics
Data in NoCs are routed using routers. Each router includes a crossbar switch (see Fig. 3) which, for a mesh topology, comprises four input/output channels for global communication (north, east, south and west) and one input/output channel for local communication. The link is driven by drivers that are part of the sending router circuitry; this implies that link traversal power is supplied by the flit-forwarding router and by the repeaters along the link path (if any). In our model, the workloads of both routers and links are characterized by a capacitance, which is determined by the charge delivered to the circuit during the switching time period.
 
[Figure: routers i and j, each with input buffers, routing and arbitration logic and a crossbar switch, connected by on-chip links 1 to CH; each link has n wires, each modelled as distributed R_dh, L_dh, C_dh sections between the wire drivers and the wire loads C_L.]
Fig. 3. Illustration of the NoC routers connecting tiles i and j, and the equivalent circuit of the on-chip link from router i (sender) to router j (receiver). n: link size, h: link length, CH: router channel number.
Based on the above, we compute the capacitive load of router $r \in R$ (see Table 1) in the power grid at the kth switching cycle, $C^{Load}_r(k)$, as

$$C^{Load}_r(k) = C^{links}_r(k) + C^{circuit}_r(k) \tag{3}$$

where $C^{links}_r(k)$ is the load equivalent capacitance due to link traversal of flits at cycle k, and $C^{circuit}_r(k)$ is the load equivalent capacitance due to router circuit activity at cycle k. These capacitances vary over time to reflect dynamic communication load changes in the network.
3.2.1 Link Workload
The link traversal load is the sum of the loads of all channel links of the router, i.e.

$$C^{links}_r(k) = \sum_{\forall ch \in CH_r} C_{ch}(k) \tag{4}$$

where, for a mesh topology, the set of router channels is $CH_r = \{North, East, South, West, Local\}$.
The channel's link load capacitance at cycle k, $C_{ch}(k)$, can be computed from the link's current profile at that cycle as follows:

$$C_{ch}(k) = \frac{Q_{ch}(k)}{V_{DD}} = \frac{\int_{0}^{T_{clk}} I_{ch}(k,t)\,dt}{V_{DD}} \tag{5}$$

where $Q_{ch}(k)$ is the total charge delivered to the channel link at cycle k, $I_{ch}(k,t)$ is the current profile of the channel link at cycle k, and $T_{clk}$ is the clock period. In this work, the link model proposed in [32] is employed to compute this current profile, after modifying the formula for the total channel link current $I_{ch}$ to include wire switching.
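A minimal sketch of Eqs. (4) and (5), assuming the per-channel current profiles $I_{ch}(k,t)$ are already available as sampled waveforms. Here they are hypothetical triangular pulses rather than the output of the link model of [32], and the integral in Eq. (5) is approximated with the trapezoidal rule.

```python
def trapezoid(y, x):
    """Trapezoidal rule for samples y(x)."""
    return sum((x[n + 1] - x[n]) * (y[n + 1] + y[n]) / 2 for n in range(len(x) - 1))


def channel_capacitance(t, i_ch, vdd):
    """Eq. (5): C_ch(k) = Q_ch(k) / VDD, with Q_ch(k) the integral of I_ch(k, t)."""
    return trapezoid(i_ch, t) / vdd


def links_capacitance(profiles, t, vdd):
    """Eq. (4): sum of C_ch(k) over all channels of one router."""
    return sum(channel_capacitance(t, i, vdd) for i in profiles.values())


def triangle(t, peak, width):
    """Hypothetical current pulse: 0 -> peak at width/2 -> 0 at width, then 0."""
    half = width / 2
    return [peak * tt / half if tt <= half else peak * max(0.0, (width - tt) / half)
            for tt in t]


T_CLK = 333e-12                                          # ~3 GHz clock period (s)
t = [n * 1e-12 for n in range(round(T_CLK / 1e-12) + 1)]  # 1 ps sampling of one cycle
zero = [0.0] * len(t)
profiles = {                                             # hypothetical I_ch(k, t)
    "North": triangle(t, 2e-3, 100e-12),
    "East": triangle(t, 1e-3, 100e-12),
    "South": zero, "West": zero, "Local": zero,
}
c_links = links_capacitance(profiles, t, vdd=1.0)
print(f"C_links = {c_links * 1e15:.1f} fF")
```

The two pulses deliver 100 fC and 50 fC, so at VDD = 1 V the router's link term is about 150 fF for this cycle.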
Let SW be a vector of size n (the number of link wires) whose elements $sw_i$ take the values 0, 1 or -1 when wire i is quiet, switching up (0 → 1) or switching down (1 → 0), respectively. The time profile of the current drawn by the channel link at cycle k, $I_{ch}(k,t)$, can be expressed as

$$I_{ch}(k,t) = \sum_{h=1}^{n}\sum_{i=1}^{n} M^{T}_{h,i}\, sw_{i,ch,k} \sum_{j=1}^{n} M^{T}_{i,j}\, I_{i,ch,k}(t) \tag{6}$$

where $I_{i,ch,k}(t)$ is the current of wire $i \in \{1 \ldots n\}$ of the channel link at cycle k, M is the decoupling transformation matrix [34], and $sw_{i,ch,k}$ is the switching direction of wire i at cycle k [32].
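For illustration, the switching vector SW of a channel can be derived from two consecutive words driven onto the link: each element is simply the new bit value minus the old one. The sketch assumes flits are given as integers with bit i carried on wire i (a hypothetical encoding); in the experiments of Section 4 the switching activity is instead drawn from a random distribution.

```python
from typing import List


def switching_vector(prev_flit: int, next_flit: int, n: int) -> List[int]:
    """Return SW for an n-wire link: 1 = switching up (0->1), -1 = down (1->0), 0 = quiet.

    Element i corresponds to bit i (LSB first) of the flit words.
    """
    sw = []
    for i in range(n):
        before = (prev_flit >> i) & 1
        after = (next_flit >> i) & 1
        sw.append(after - before)  # +1 rising, -1 falling, 0 unchanged
    return sw


sw = switching_vector(0b0101, 0b0110, n=4)
print(sw)                                        # [-1, 1, 0, 0]
print("rising wires:", sw.count(1), "falling wires:", sw.count(-1))
```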
3.2.2 Router Workload
The capacitive load of the router circuit for a switching time period can be computed from the energy consumed by the router circuit in this period. This energy is determined by combining an NoC simulator, which determines the processes taking place in the router at each cycle, with a router power model, which determines the energy consumed by each of these processes.
Let Ψ = {RECEIVE, ROUTE, FORWARD, STANDBY} be the set of microarchitectural-level processes that can be executed by the router. These processes are receiving a flit, route computation, forwarding a flit, and no activity (static energy), respectively. The energy of the RECEIVE process is the energy required for writing to the input buffer. The ROUTE process energy is required for route computation, which is only performed for header flits (for wormhole routing). The FORWARD process energy is the sum of the energies required for reading from the input buffer, switch allocation, switch traversal and link traversal. Also, let $\alpha_\psi(r,k)$ be the number of occurrences of process $\psi \in \Psi$ in router r at cycle k. The total energy delivered to router r at cycle k, $E_r(k)$, can then be expressed as

$$E_r(k) = \sum_{\forall \psi \in \Psi} E_\psi \cdot \alpha_\psi(r,k) \tag{7}$$
where $E_\psi$ is the energy required by the router circuit to execute process ψ. For a particular router design, this energy can be computed using a router microarchitectural power model, while $\alpha_\psi$ can be determined using a cycle-accurate NoC simulator. The router load equivalent capacitance at cycle k, $C^{circuit}_r(k)$, can now be computed as

$$C^{circuit}_r(k) = \frac{E_r(k)}{V_{DD}^{2}} \tag{8}$$

The load capacitance that results from Eq. (3) is used to characterize the load in Eq. (2).
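Eqs. (7), (8) and (3) combine into a few lines of arithmetic. In this sketch the per-process energies are made-up placeholders (in the tool they would come from the router power model), the process counts stand in for the simulator output for one router and one cycle, and the link term is taken as a given number.

```python
# Hypothetical per-process energies E_psi (J); placeholders, not Orion output.
E_PSI = {"RECEIVE": 1.2e-12, "ROUTE": 0.3e-12, "FORWARD": 2.0e-12, "STANDBY": 0.1e-12}


def router_energy(counts):
    """Eq. (7): E_r(k) = sum over psi of E_psi * alpha_psi(r, k)."""
    return sum(E_PSI[psi] * alpha for psi, alpha in counts.items())


def circuit_capacitance(counts, vdd):
    """Eq. (8): C_circuit_r(k) = E_r(k) / VDD^2."""
    return router_energy(counts) / vdd**2


def load_capacitance(c_links, counts, vdd):
    """Eq. (3): C_Load_r(k) = C_links_r(k) + C_circuit_r(k)."""
    return c_links + circuit_capacitance(counts, vdd)


# Hypothetical alpha_psi(r, k): two flits received, one routed, one forwarded.
alpha = {"RECEIVE": 2, "ROUTE": 1, "FORWARD": 1, "STANDBY": 1}
c_load = load_capacitance(c_links=150e-15, counts=alpha, vdd=1.0)
print(f"E_r = {router_energy(alpha) * 1e12:.2f} pJ, C_Load = {c_load * 1e12:.2f} pF")
```

With these placeholder numbers, 4.8 pJ at 1 V corresponds to 4.8 pF of circuit load, to which the 0.15 pF link term is added.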
The following sections assume that router r is supplied with power through a set of nodes ($G_r$) in the power grid and, in line with [41], [42], that the resulting router load is divided equally over the set $G_r$. The set $G_r$ is determined by the floorplan information, i.e. based on the geometrical structure of the grid and the areas and positions of the routers.
3.3 Power Grid Granularity
In power grid simulation, there are two techniques for model solution: iterative and direct [27]. Due to the very large grid size, the iterative technique is more suitable for analyses that involve obtaining a single system solution, for instance the DC analysis of power grids. On the other hand, the direct technique is more convenient when multiple model solutions are necessary. This is the case in this work; thus, we use a coarse-grained lumped model for the power grid in order to obtain meaningful results in a practical simulation time. Power grids designed for real VLSI circuits may contain tens of thousands or even millions of nodes [43], which results in impractical simulation time and memory requirements. Thus, a coarse grid approach has been used in many previous works [24], [25], [44]. In these works, multigrid-based model order reduction is used, and the number of nodes in the power grid is reduced by node elimination. The analysis is performed on the reduced coarse grid, and the solution is then mapped back to the original (fine-grained) grid using linear interpolation, taking into account the values of the conductances between the nodes.
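The back-mapping step can be sketched along a single grid line, with a hypothetical coarsening factor g (one coarse node every g fine nodes). This is plain linear interpolation between neighbouring coarse nodes, a simplification that ignores the conductance weighting used in the cited multigrid methods; a 2-D mesh would apply it along both axes.

```python
def prolongate_1d(v_coarse, g):
    """Linearly interpolate coarse-grid node voltages onto the fine grid.

    Coarse node a sits at fine index g * a, so the fine line has
    (len(v_coarse) - 1) * g + 1 nodes.
    """
    v_fine = []
    for a in range(len(v_coarse) - 1):
        for s in range(g):
            w = s / g                                   # position between coarse nodes
            v_fine.append((1 - w) * v_coarse[a] + w * v_coarse[a + 1])
    v_fine.append(v_coarse[-1])                         # last coarse node maps directly
    return v_fine


v_fine = prolongate_1d([1.0, 0.98, 0.97], g=2)
print(v_fine)
```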
During the mapping from the fine grid ($G_f$) to the coarse grid ($G_X$), the geometrical coordinates of the nodes in $G_X$ must be equal to those of their counterparts in $G_f$ to preserve the structure of the grid. For a regular mesh topology, this mapping takes as input the ratio $g = l_X/l_f$ of the coarse grid segment length, $l_X$, to the fine grid segment length, $l_f$ (see Fig. 4). To keep the total resistance equal, a segment's width must be increased in proportion to the increase in its length.
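Under the assumption of a uniform regular mesh with a hypothetical sheet resistance `r_sheet` (so that a segment's resistance is r_sheet × length / width), the bookkeeping of this mapping can be sketched as follows: scaling both length and width by g leaves each segment's resistance unchanged, and a coarse node (a, b) keeps the coordinates of fine node (g·a, g·b).

```python
def coarse_segment(r_sheet, l_f, w_f, g):
    """Map a fine segment (length l_f, width w_f) to a coarse one for g = l_X / l_f.

    Widening the segment by the same factor g keeps R = r_sheet * length / width.
    """
    l_x, w_x = g * l_f, g * w_f
    r_f = r_sheet * l_f / w_f
    r_x = r_sheet * l_x / w_x
    return l_x, w_x, r_f, r_x


def coarse_to_fine_index(a, b, g):
    """Coarse node (a, b) keeps the coordinates of fine node (g*a, g*b)."""
    return g * a, g * b


def coarse_nodes_per_side(n_fine, g):
    """Nodes per side after coarsening; assumes (n_fine - 1) is divisible by g."""
    return (n_fine - 1) // g + 1


l_x, w_x, r_f, r_x = coarse_segment(r_sheet=0.02, l_f=100e-6, w_f=10e-6, g=4)
print(l_x, w_x, r_f, r_x)               # segment resistance stays at about 0.2 ohm
print(coarse_nodes_per_side(21, 4))      # a 21-node line becomes a 6-node line
```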
The relative error of the voltage drop at node $j' \in G_X$ due to grid granularity reduction can be expressed as

$$Error_g(j') = \left|\frac{\Delta V_f(j) - \Delta V_X(j')}{\Delta V_f(j)}\right| \times 100\% \tag{9}$$

where $\Delta V_f$ and $\Delta V_X$ are the voltage drops in the fine and coarse grids, respectively, and $j$ and $j'$ are a node in the fine grid and its counterpart in the coarse grid, respectively.
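Eq. (9) and its average over a set of matched node pairs can be computed as below. The voltage drops are hypothetical numbers chosen only to exercise the formula.

```python
def granularity_error(dv_fine, dv_coarse):
    """Eq. (9) for one node pair: |(dV_f(j) - dV_X(j')) / dV_f(j)| * 100."""
    return abs((dv_fine - dv_coarse) / dv_fine) * 100.0


def mean_error(pairs):
    """Average of Eq. (9) over matched (fine, coarse) voltage-drop pairs."""
    errors = [granularity_error(f, x) for f, x in pairs]
    return sum(errors) / len(errors)


# Hypothetical voltage drops (V) at three matched node pairs j <-> j'.
pairs = [(0.050, 0.048), (0.040, 0.041), (0.030, 0.030)]
print(f"mean Error_g = {mean_error(pairs):.2f} %")
```

The three pairs give errors of 4%, 2.5% and 0%, so the mean is about 2.17%.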
Fig. 4. Illustration of the mapping from the fine-grained to the coarse-grained model of the power delivery grid.
4 EXPERIMENTAL RESULTS AND DISCUSSION
4.1 Experimental Setup
To evaluate our model, we adopt the floorplan and architecture of Intel's TeraFlop tile [11]. 38-bit communication links operating at 3 GHz are assumed. The technology used is 65 nm with a nominal VDD = 1 V. The tile dimensions are 2 mm (height) × 1.5 mm (width). The power traces of the computational units are estimated using the results presented in [11].
For the power delivery network (PDN), we use a lumped model that includes both the on-chip and the off-chip power delivery network models. The on-chip PDN consists of a global-level mesh structure routed in the top metal layers. Unless otherwise mentioned, the on-chip power network is modelled as an RLC mesh with a grid segment length that yields a 5 × 5 granularity per NoC tile. Based on our analysis (see Section 4.3), and as suggested by [41], [45], this is sufficient for capturing the power supply voltage variations across the chip with reasonable accuracy and simulation time. The RLC values of the grid segments and link wires were determined using PTM [46].
Orion [17] is used to compute router power. Communication traffic is simulated using Noxim [36], a SystemC-based NoC simulator. A Gaussian distribution is assumed for the link switching activities and a Poisson distribution for packet injection. A uniform buffer size of 16 flits and a packet length of 3 flits are assumed. These values are in line with Intel's TeraFlop NoC configuration [11]. These models are integrated in an automated flow that computes the power supply voltage variations of on-chip networks as a function of activity (Fig. 1).
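The two stochastic assumptions above (Poisson packet injection, Gaussian switching activity) can be illustrated with a standalone NumPy sketch. This is not Noxim's code; the activity mean and standard deviation are hypothetical, and clipping the Gaussian to [0, 1] is our assumption so that the values remain valid activities.

```python
import numpy as np


def synthetic_workload(n_cycles, pir, act_mean=0.5, act_std=0.15, seed=0):
    """Per-cycle packet injections and link switching activities for one tile.

    Packet injections follow a Poisson distribution with mean `pir`
    (packets/cycle/node). Switching activities are Gaussian and clipped to
    [0, 1]; act_mean and act_std are hypothetical, not taken from the paper."""
    rng = np.random.default_rng(seed)
    packets = rng.poisson(lam=pir, size=n_cycles)
    activity = np.clip(rng.normal(act_mean, act_std, size=n_cycles), 0.0, 1.0)
    return packets, activity


packets, activity = synthetic_workload(100_000, 0.015)
print(packets.mean(), activity.mean())
```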
4.2 Model Verification
First, we performed an experiment to evaluate the accuracy of our model. The power trace of a 3 × 3 NoC under Transpose traffic (in which tile(i, j) sends packets to tile(j, i)) is obtained from the NoC simulator for a packet injection rate of 0.015 packets/cycle/node. A SPICE netlist of the circuit is then generated, in which the workload is modelled as triangular current sources. The peaks of these current sources are computed from the power traces given by the router and link power models and from the activity of the NoC. This enables evaluation of the voltage variations that result from integrating all the components of the model (activity, power, and grid models). The netlist is simulated in SPICE to obtain the resulting grid node voltages, and the same scenario is also simulated using our model following the methodology described in Section 3. The power grid node voltages obtained from the model and from SPICE are presented in Fig. 5 and match well, with a mean relative error of only 4.7%.
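The text does not state how each triangular source is dimensioned, so the following is a hypothetical sketch of one plausible choice. It assumes each pulse spans one clock period and carries the charge $Q = E / V_{DD}$ implied by the per-cycle energy $E$ from the power models; since a triangle of base $T$ carries $I_{peak} T / 2$, this gives $I_{peak} = 2E / (V_{DD} T)$. The helper emits a standard SPICE `PWL` current source that draws the pulse from a grid node to ground; the node and source names are illustrative.

```python
def triangular_peak_current(energy_j, vdd, width_s):
    """Peak of a triangular current pulse delivering energy_j at vdd.

    Charge Q = E / VDD, and a triangle of base width_s carries
    Q = I_peak * width_s / 2, hence I_peak = 2 E / (VDD * width_s)."""
    return 2.0 * energy_j / (vdd * width_s)


def pwl_current_source(name, node, start_s, width_s, peak_a):
    """One SPICE current-source line drawing a triangular pulse from `node`
    to ground (node 0): 0 A at start, peak at mid-width, 0 A at the end."""
    t0, t1, t2 = start_s, start_s + width_s / 2, start_s + width_s
    return (f"I{name} {node} 0 PWL({t0:.4e} 0 {t1:.4e} {peak_a:.4e} "
            f"{t2:.4e} 0)")


t_clk = 1 / 3e9                                      # 3 GHz link clock
i_pk = triangular_peak_current(10e-12, 1.0, t_clk)   # hypothetical 10 pJ/cycle
print(pwl_current_source("r11", "n_11", 0.0, t_clk, i_pk))
```

A real netlist would repeat such pulses cycle by cycle, with one peak per cycle taken from the power trace.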
4.3 Granularity Analysis
The impact of power grid granularity on both model accuracy and simulation time is also evaluated. A power grid covering an area of 1 mm² is simulated, assuming the load of a 38-wire link computed using Eq. 6. The VDD drop across the grid is determined for different power grid granularities. Coarser grids are generated by doubling the grid segment lengths and widths, which preserves the resultant resistance.
Table 2 shows the results of this analysis for grid granularities from 40 × 40 (1600 nodes) down to 5 × 5 (25 nodes). Taking the SPICE simulation of the 40 × 40 grid as the baseline, both the relative error (Eq. 9) and the simulation time are shown.
The results show that model accuracy decreases each time the number of grid nodes is quartered. On the other hand, simulation speed increases rapidly as the grid becomes coarser, owing to the higher degree of model order reduction. For a fixed grid granularity per tile, the simulation time increases linearly with the NoC size (number of tiles), because the number of power grid nodes increases linearly with the number of tiles.
[Figure: node voltage (V, 0.90 to 1.0) versus node number (0 to 200); curves: model, SPICE.]
Fig. 5. Comparison of node voltages between the proposed computational model and SPICE simulation for a 3 × 3 NoC configuration.
TABLE 2
Comparison of the proposed model at different power grid granularities with SPICE simulation.

lX (µm)   lf (µm)   # of nodes       g = lX/lf   Error_g (%)   time (s)   method
25        25        40 × 40 (1600)   1           -             8.9        SPICE
25        25        40 × 40 (1600)   1           1.98          1.152      model
50        25        20 × 20 (400)    2           6.07          0.155      model
100       25        10 × 10 (100)    4           8.6           0.022      model
200       25        5 × 5 (25)       8           11.85         0.006      model
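The speed-up behind the trend described above can be read directly from Table 2. The short sketch below only performs that arithmetic on the table's values; no new measurements are involved. Each quartering of the node count cuts the model's run time by a factor of roughly 3.7 to 7.4, while the error rises from 1.98% to 11.85%.

```python
# Values copied from Table 2: (nodes per side, Error_g in %, run time in s).
# The SPICE baseline row has no error entry, so only its time is kept.
SPICE_TIME_S = 8.9
MODEL_ROWS = [
    (40, 1.98, 1.152),
    (20, 6.07, 0.155),
    (10, 8.6, 0.022),
    (5, 11.85, 0.006),
]


def speedups(rows, baseline_s):
    """Return (total nodes, error %, speed-up over the SPICE run) per granularity."""
    return [(n * n, err, baseline_s / t) for n, err, t in rows]


for nodes, err, s in speedups(MODEL_ROWS, SPICE_TIME_S):
    print(f"{nodes:5d} nodes: error {err:5.2f}%  speed-up {s:7.1f}x")
```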
4.4 Synthetic Traffic and Routing Algorithms
In this section we present and discuss the power supply noise caused by different synthetic traffic patterns and routing algorithms. A 6 × 6 NoC is considered. The traffic patterns used are Random, Transpose, and Hotspot. For Random traffic, each tile sends data to all other tiles with equal probability. For Transpose traffic, tile(i, j) sends packets to tile(j, i). For the Hotspot pattern, the four central tiles receive an extra 5% of traffic in addition to the uniform (Random) traffic. We also considered four routing algorithms: XY, Odd-Even (OE), Fully-Adaptive, and Negative-First (NF).
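The three traffic patterns, and the simplest of the routing algorithms (XY), can be sketched as destination and path generators. This is an illustration, not Noxim's implementation. Tiles are (x, y) pairs, and the Hotspot rule (with probability 0.05 a packet is redirected to one of the four central tiles of an even-sized mesh) is our reading of the "extra 5%" description.

```python
import random


def random_dest(src, n, rng):
    """Uniform (Random) traffic: any tile other than the source, equally likely."""
    while True:
        dst = (rng.randrange(n), rng.randrange(n))
        if dst != src:
            return dst


def transpose_dest(src, n, rng):
    """Transpose traffic: tile(i, j) sends to tile(j, i).
    Diagonal tiles map onto themselves and would normally not inject."""
    i, j = src
    return (j, i)


def hotspot_dest(src, n, rng, extra=0.05):
    """Hotspot traffic (assumed interpretation, even n): with probability
    `extra` the packet goes to one of the four central tiles, else uniform."""
    lo, hi = n // 2 - 1, n // 2
    centre = [(x, y) for x in (lo, hi) for y in (lo, hi) if (x, y) != src]
    if centre and rng.random() < extra:
        return rng.choice(centre)
    return random_dest(src, n, rng)


def xy_route(src, dst):
    """Deterministic XY routing: move along X first, then along Y.
    Returns every tile visited, including source and destination."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path


rng = random.Random(1)
print(hotspot_dest((0, 0), 6, rng), xy_route((0, 0), (2, 1)))
```

Adaptive schemes such as Odd-Even, Negative-First, and Fully-Adaptive choose among several admissible output ports at each hop and are not reproduced here.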
The number of clock cycles needed to capture the characteristics of the workload differs from one application to another. For synthetic traffic, the power traces were observed to repeat after 10,000 clock cycles. Nevertheless, the model is run for 100,000 clock cycles to ensure coverage of all workload characteristics. The resulting VDD drop is plotted for different routing algorithms (Fig. 6) and traffic distributions (Fig. 7) over a range of packet injection rates (PIRs). Both the peak and mean drops are shown in each figure. The throughputs achieved for these routing algorithms and traffic patterns are shown in Fig. 8. The spatial VDD drops under these routing algorithms and traffic patterns are given in Figs. 9 and 10, respectively, for a PIR of 0.015 packets/cycle/node.
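One way to check that a simulation window covers at least one full period of the workload is to estimate the period of the power trace from its autocorrelation. The paper does not say how its periods were found, so the sketch below is only an illustration. It skips the central lobe around zero lag (up to the first negative autocorrelation value) and returns the strongest lag up to half the trace length.

```python
import numpy as np


def estimate_period(trace):
    """Estimate the dominant period (in samples) of a power trace.

    Skips the zero-lag lobe (up to the first negative autocorrelation value),
    then returns the lag with the largest autocorrelation up to len/2.
    Returns None if no oscillation is found."""
    x = np.asarray(trace, dtype=float)
    x = x - x.mean()
    n = len(x)
    ac = np.correlate(x, x, mode="full")[n - 1:]  # lags 0 .. n-1
    negative = np.nonzero(ac < 0)[0]
    if negative.size == 0:
        return None
    start, stop = int(negative[0]), n // 2
    if start >= stop:
        return None
    return int(start + np.argmax(ac[start:stop + 1]))


t = np.arange(1000)
print(estimate_period(np.sin(2 * np.pi * t / 50)))  # sinusoid with period 50
```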
Note that, in general, the VDD drop increases considerably with PIR. This is expected, since a higher packet injection rate leads to higher throughput, which increases the switching activity of the routers and data links and, in turn, raises the current draw, causing a higher VDD drop. However, different routing algorithms and traffic patterns behave differently in terms of VDD drop as PIR increases.
For instance, consider the NF routing algorithm. For low PIR (< 0.015), it causes higher peak (Fig. 6(a)) and mean (Fig. 6(b)) drops than XY and OE, although it achieves the same throughput (and thus consumes the same power) within this PIR range (see Fig. 8(a)). This is because the NF algorithm tends to migrate traffic to the negative quadrant of the NoC mesh, as can be seen in Fig. 9(d). This can create hotspots that suffer a higher supply drop due to the unbalanced power density. On the other hand, both XY and OE produce more balanced spatial workloads than NF, which leads to a lower supply drop.
At high PIR (> 0.015), the VDD drop for both NF and XY is lower than that of OE, because OE achieves higher throughput than XY and NF in this PIR range: the NoC starts to saturate and throughput decreases for the latter two, which is not the case for OE (see Fig. 8(a)).
Looking at the traffic patterns, the Hotspot traffic causes a higher peak VDD drop because of the centralized nature of this traffic distribution, for the same reasons discussed above. This drop decreases at higher PIR (> 0.015) because of the reduced throughput caused by saturation. In this PIR range, Random traffic achieves higher throughput (Fig. 8(b)), which causes higher peak and mean drops.
Figs. 9 and 10 show the spatial distributions for selected routing algorithms and traffic patterns, respectively. In general, the power supply drop resulting from a traffic pattern or routing algorithm is determined by, and increases with, the amount of workload it generates. The spatial distribution of that workload also plays an important role: highly unbalanced traffic or routing can lead to significantly higher power supply noise than balanced traffic or routing, even for the same amount of workload.
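Why balance matters even at equal total workload can be seen on a deliberately simplified DC model. The sketch below solves nodal equations for a one-dimensional resistive rail fed by a pad at each end; it is not the paper's RLC mesh, and all resistances are hypothetical. Drawing the same total current from a single central node produces a larger peak IR drop than spreading that current evenly along the rail.

```python
import numpy as np


def ladder_drops(loads, r_seg=0.1, r_pad=0.05, vdd=1.0):
    """DC IR drop on a 1-D resistive power rail fed by a pad at each end.

    loads[k] is the current (A) drawn at node k, r_seg the resistance between
    neighbouring nodes and r_pad that of each end pad (hypothetical values).
    Solves the nodal equations G v = b and returns VDD minus node voltages."""
    loads = np.asarray(loads, dtype=float)
    n = len(loads)
    g_seg, g_pad = 1.0 / r_seg, 1.0 / r_pad
    G = np.zeros((n, n))
    b = -loads.copy()  # load currents leave each node
    for k in range(n - 1):  # stamp each segment conductance
        G[k, k] += g_seg
        G[k + 1, k + 1] += g_seg
        G[k, k + 1] -= g_seg
        G[k + 1, k] -= g_seg
    for k in (0, n - 1):  # pads tie both ends of the rail to VDD
        G[k, k] += g_pad
        b[k] += g_pad * vdd
    v = np.linalg.solve(G, b)
    return vdd - v


n = 11
uniform = np.full(n, 1.0 / n)            # 1 A spread evenly
hotspot = np.zeros(n)
hotspot[n // 2] = 1.0                    # the same 1 A at the centre node
print(ladder_drops(uniform).max(), ladder_drops(hotspot).max())
```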
Table 3 summarizes the peak and mean VDD drops for the considered set of traffic patterns and routing algorithms.
[Figure: (a) peak VDD drop (%) and (b) mean VDD drop (%) versus PIR (packet/cycle/node, 0.005 to 0.025) for the XY, Odd-Even, and Negative-First routing algorithms.]
Fig. 6. VDD drop for different routing algorithms versus PIR.
[Figure: (a) peak VDD drop (%) and (b) mean VDD drop (%) versus PIR (packet/cycle/node, 0.005 to 0.025) for Random, Transpose, and Hotspot traffic.]
Fig. 7. VDD drop for traffic patterns versus PIR.
[Figure: throughput (flit/cycle/IP, 0.02 to 0.16) versus PIR (packet/cycle/node, 0.005 to 0.025) for (a) the XY, Odd-Even, and Negative-First routing algorithms and (b) Random, Transpose, and Hotspot traffic.]
Fig. 8. Throughput for different (a) routing algorithms and (b) traffic patterns.
[Figure: maps of mean VDD drop (%) over the 6 × 6 mesh (tile X and tile Y from 0 to 5, colour scale 0 to 10%) for (a) XY, (b) Odd-Even, (c) Fully-Adaptive, and (d) Negative-First routing.]
Fig. 9. Spatial distribution of mean VDD drop (%) for different routing algorithms and Transpose traffic.

[Figure: maps of mean VDD drop (%) over the 6 × 6 mesh (tile X and tile Y from 0 to 5, colour scale 0 to 10%) for (a) Random, (b) Transpose, (c) Butterfly, and (d) Hotspot (5%) traffic.]
Fig. 10. Spatial distribution of mean VDD drop (%) for different synthetic traffic patterns with XY routing.

TABLE 3
Summary of VDD drop (%). Results of four different routing algorithms and three traffic patterns.

                  Random          Transpose       Hotspot
Routing           peak    mean    peak    mean    peak    mean
XY                13.63   5.6     12.95   5.6     13.18   5.6
Odd-Even          11.51   5.48    13.82   5.5     12.53   5.45
Negative-First    14.15   5.31    13.81   5.31    14.21   5.3
Fully-Adaptive    12.96   2.92    13.56   5.48    12.79   5.32

4.5 Real Traffic
To generate a realistic communication scenario, a generic complex MultiMedia System (MMS) comprising an H263 video encoder, an H263 video decoder, an MP3 audio encoder, and an MP3 audio decoder is used [20].
We considered three strategies for mapping this benchmark onto a 5 × 5 NoC: maximizing performance (minimizing packet latency) [21], minimizing energy [47], and random mapping. The VDD drops produced by the three resulting traffic patterns are computed using our tool. The power trace of this benchmark is found to be periodic, with a period of approximately 70,000 clock cycles, so the simulations are run for 100,000 cycles to ensure coverage of the workload characteristics.
Fig. 11 shows the spatial distribution of the power supply drop. The performance and energy mappings are relatively close to each other in terms of VDD drop, although the energy-aware mapping has a slightly higher peak drop than the performance-aware mapping. In this instance, the random mapping causes a considerably higher drop than the performance and energy mappings. This is caused not only by the higher power of this mapping (14 W, compared with 10.4 W for the performance mapping and 9.8 W for the energy mapping), but also by the spatial distribution of its power profile, which results in a higher power (and thus current) density in the central tiles and leads to a higher voltage drop.
Fig. 12 plots the traffic (in terms of routed flits) and the corresponding VDD drop over time for two tiles of the MMS benchmark with performance mapping. The figure shows a very high correlation between traffic and supply drop. This is because power dissipation in NoCs is strongly correlated with the network traffic load: higher traffic is reflected as a higher current draw, which increases the supply drop on the power grid. It also implies that higher (lower) temporal variation of traffic (Fig. 12(a)) results in higher (lower) variation of VDD, i.e., more (less) power supply noise (Fig. 12(b)). The DC component of the VDD drop is highly correlated with the average traffic, while the AC component is correlated with the temporal variation of traffic.
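The correlation and the DC/AC split described above can be computed directly from two time series. In the sketch below, the DC component is taken as the mean and the AC component as the standard deviation, which matches the mean and STD values quoted in Fig. 12. The traces themselves are synthetic: the linear link between traffic and drop is a hypothetical stand-in used only to exercise the functions.

```python
import numpy as np


def dc_ac(trace):
    """Split a trace into its DC part (mean) and AC part (standard deviation)."""
    x = np.asarray(trace, dtype=float)
    return x.mean(), x.std()


def traffic_drop_correlation(traffic, drop):
    """Pearson correlation coefficient between routed-flit counts and VDD drop."""
    return float(np.corrcoef(traffic, drop)[0, 1])


# Synthetic traces standing in for one busy tile (hypothetical numbers only).
rng = np.random.default_rng(0)
traffic = 50 + 15 * rng.standard_normal(5000)            # routed flits per window
drop = 0.1 * traffic + 0.2 * rng.standard_normal(5000)  # VDD drop (%)

print(dc_ac(traffic), dc_ac(drop))
print(traffic_drop_correlation(traffic, drop))
```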
[Figure: (a) number of routed flits and (b) VDD drop (%) versus simulation time (0 to 50 µs) for tile 11 (traffic mean = 50.2, STD = 15.31; VDD drop mean = 5.05%, STD = 1.21%) and tile 22 (traffic mean = 3.29, STD = 4.34; VDD drop mean = 1.36%, STD = 0.37%).]
Fig. 12. Temporal variations of routed traffic and the corresponding VDD drop for busy (tile 11) and quiet (tile 22) tiles under the MMS traffic: (a) traffic; (b) VDD drop.
[Figure: maps of mean VDD drop (%) over the 5 × 5 mesh (tile X and tile Y from 0 to 4, colour scale 0 to 10%) for (a) maximum performance mapping (peak 8.0%, mean 2.76%), (b) minimum energy mapping (peak 8.44%, mean 2.55%), and (c) random mapping (peak 10.73%, mean 4.01%).]
Fig. 11. Spatial distribution of mean VDD drop (%) for the MMS application traffic with three mapping strategies.
5 CASE STUDY: TIMING AND ERROR ANALYSIS OF LINKS IN THE PRESENCE OF POWER SUPPLY VARIATIONS
Power supply noise modelling for NoCs can find many applications in evaluation and design space exploration at various levels. Power grid integrity analysis, power supply noise-aware application mapping, and floorplanning are examples of such applications. Our tool can also be used to analyze the impact of the resulting VDD variations on timing accuracy for circuit-dominated paths [1] or link-dominated paths, such as communication links and clock distribution networks [48], [49]. Here we perform a power supply variation-aware statistical timing analysis of NoC links, which enables computation of the probability of switching errors for the data links that make up the NoC communication fabric.
5.1 Power Supply Variations Impact on Link Delay
Power supply noise affects performance mainly through its impact on delay. Power supply voltage drops can cause significant delay increases in circuit-dominated as well as interconnect-dominated paths [48], [50]. This extra delay can violate the timing constraints of these paths and thus generate soft errors. It has been reported that, for 65 nm technology, meeting these timing constraints under a 20% supply variation requires a 42% decrease in frequency [48]. Computing the probability of a switching error due to timing delays under power supply variations requires full knowledge of the VDD variation distribution, as well as the delay-versus-VDD relationship of the various link components. The delay components of on-chip global interconnects are illustrated in Fig. 13. Consider a synchronous data path between two tiles, i and j, of a NoC data link, and assume zero clock skew between them. The sum of the clock-to-Q delay of the sending flip-flop ($t^{i}_{clk\_Q}$), the wire delay between the two flip-flops ($t^{i,j}_{wire}$), and the setup time of the receiving flip-flop ($t^{j}_{setup}$) must not exceed the link clock period $T_{clk}$ [51], i.e.

$$t^{i}_{clk\_Q} + t^{i,j}_{wire} + t^{j}_{setup} < T_{clk}. \qquad (10)$$
[Figure: flip-flop FF i (D, Q) drives a wire of n RLC segments (R1 L1 C1 to Rn Ln Cn) separated by buffers 1 to n into flip-flop FF j (D, Q); the clock-to-Q delay of FF i, the wire delay, and the setup time of FF j together form the one-cycle link delay.]
Fig. 13. A model of an on-chip link illustrating the delay components and timing constraints [51].
Violation of this timing constraint leads to switching errors in the link.
To perform timing analysis, and in line with [1], [50], we adopt a quadratic approximation of the impact of VDD drop on these delay components:

$$t_d(\Delta V_{DD}) = k_1 + k_2\,\Delta V_{DD} + k_3\,(\Delta V_{DD})^2, \qquad (11)$$

where $t_d(\Delta V_{DD})$ is any of the link timing components on the left-hand side of Eq. 10 and $k_i$ ($i = 1, 2, 3$) are technology-dependent constants. Assuming 65 nm technology, we simulated in SPICE an edge-triggered D flip-flop, built from two D latches in a master-slave configuration, and obtained its clock-to-Q and setup times under VDD variation. We also obtained the wire delay of a 2 mm buffered wire. Fig. 14 plots the delays $t_{clk\_Q}$, $t_{wire}$, and $t_{setup}$ as the VDD drop is varied from 0% (VDD = 1.0 V) to 25% (VDD = 0.75 V) of the nominal supply voltage in steps of 5% (50 mV). From these results, analytical formulas relating the clock-to-Q delay, setup time, and wire delay to the VDD drop are obtained by regression.
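The regression step can be sketched as an ordinary least-squares fit of Eq. (11) with `numpy.polyfit`, which returns the coefficients highest power first. The sweep points follow the Fig. 14 description (0% to 25% drop in 5% steps). The delay values are hypothetical placeholders, chosen to lie exactly on $200 + 3\,\Delta V + 0.04\,\Delta V^2$ so that the fit can be checked; they are not the SPICE results plotted in Fig. 14. With the drop expressed in %, $k_2$ and $k_3$ come out in ps/% and ps/%².

```python
import numpy as np


def fit_quadratic_delay(drop_pct, delay_ps):
    """Least-squares fit of Eq. (11): t_d = k1 + k2*dV + k3*dV**2.
    np.polyfit returns the highest power first, so the order is reversed."""
    k3, k2, k1 = np.polyfit(drop_pct, delay_ps, deg=2)
    return k1, k2, k3


def delay(dv, k):
    """Evaluate Eq. (11) for a VDD drop dv (in %) and coefficients k."""
    k1, k2, k3 = k
    return k1 + k2 * dv + k3 * dv ** 2


drops = np.arange(0, 26, 5)                                    # VDD drop in %
t_wire = np.array([200.0, 216.0, 234.0, 254.0, 276.0, 300.0])  # ps, hypothetical
k_wire = fit_quadratic_delay(drops, t_wire)
print(k_wire, delay(12.5, k_wire))
```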
5.2 Probability of Timing Violation Errors and Bit Error Rates
Using our model, we obtained the distribution of VDD for all chip components. In general, the obtained VDD distribution consists of a DC component (IR drop), which shifts the mean of the VDD distribution, and an AC component (∆I droop), which causes VDD to vary around that mean. Using the link delay formulas (Eq. 11), a statistical timing analysis can be performed to obtain the distribution of the link delay variations caused by power supply variations for all NoC links. Thus, IR drop translates into delay skew and ∆I droop translates into delay jitter on these links. For the flip-flops, $t_{clk\_Q}$ and $t_{setup}$ are computed from the VDD of the sending and receiving tiles, respectively. The wire delay ($t_{wire}$) is computed using the VDD between the sending and receiving tiles.

[Figure: delay (ps, 0 to 300) versus VDD drop (%, 0 to 25) for t_wire, t_clk_Q, and t_setup.]
Fig. 14. The $t_{wire}$, $t_{clk\_Q}$, and $t_{setup}$ link delays versus VDD drop.
The resulting delay distribution of a link can be used to estimate that link's probability of timing error due to power supply variations. We estimate this probability as the portion of the delay distribution that does not satisfy the constraint in Eq. 10. In other words, the probability of timing error on link $l$, $\Pr(Err_l)$, can be expressed as

$$\Pr(Err_l) = \Pr(t_l > T_{clk}), \qquad (12)$$

where $t_l$ is the total link delay (the left-hand side of Eq. 10), computed in the presence of the VDD variations of that link using Eq. 11.
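Eq. (12) can be estimated by Monte Carlo sampling, sketched below under several stated assumptions. The Eq. (11) coefficients are hypothetical, not the fitted 65 nm values. The sending-tile and receiving-tile drops are independent Gaussians, with the mean modelling the DC (IR) drop and the standard deviation the AC (∆I) droop. The wire is assumed to see the average of the two tile drops. $T_{clk}$ defaults to the 3 GHz link clock of Section 4.1. The function also returns the mean and standard deviation of $t_l$, i.e., the skew and jitter summarized in Fig. 15.

```python
import numpy as np

# Hypothetical Eq. (11) coefficients (k1 in ps, k2 in ps/%, k3 in ps/%^2).
K_CLK_Q = (60.0, 1.0, 0.05)
K_SETUP = (30.0, 0.5, 0.03)
K_WIRE = (200.0, 3.0, 0.10)


def quad(dv, k):
    """Eq. (11) evaluated for VDD drop(s) dv in %."""
    return k[0] + k[1] * dv + k[2] * dv ** 2


def prob_timing_error(mean_src, std_src, mean_dst, std_dst,
                      t_clk_ps=1e12 / 3e9, samples=200_000, seed=0):
    """Monte Carlo estimate of Eq. (12), Pr(t_l > T_clk), for one link.

    Drops (in % of nominal VDD) at the sending and receiving tiles are
    independent Gaussians; the wire is assumed to see their average.
    Returns (probability, mean link delay in ps, std of link delay in ps)."""
    rng = np.random.default_rng(seed)
    dv_src = rng.normal(mean_src, std_src, samples)
    dv_dst = rng.normal(mean_dst, std_dst, samples)
    dv_wire = 0.5 * (dv_src + dv_dst)
    t_link = (quad(dv_src, K_CLK_Q) + quad(dv_wire, K_WIRE)
              + quad(dv_dst, K_SETUP))
    return (float(np.mean(t_link > t_clk_ps)),
            float(t_link.mean()), float(t_link.std()))


p, skew, jitter = prob_timing_error(5.0, 2.0, 5.0, 2.0)
print(p, skew, jitter)
```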
Fig. 15 shows the results of this analysis for the MMS benchmark with maximum performance mapping: Fig. 15(a) shows the distribution of the delay means (skews) over all links, and Fig. 15(b) shows the distribution of the delay STDs (jitters) of these links. The links with the highest error probability were found to belong to tile 2, the tile that suffers the highest VDD drop, as can be seen in Fig. 11(a).
Using the probability of error of each link, the average bit error rate (BER) of the NoC can be computed. Given the application and mapping characteristics, let the relative utilization of a NoC link $l$ be $\gamma_l$, the ratio of the data volume communicated through this link ($\nu_l$) to the total data volume communicated in the NoC:

$$\gamma_l = \frac{\nu_l}{\sum_{\forall i,j \in \{N\}} \nu_{i,j}}, \qquad (13)$$

where $\{N\}$ is the set of all nodes (tiles) in the NoC and $\nu_{i,j}$ is the data volume that needs to be communicated between tile $i$ (as a source) and tile $j$ (as a consumer).
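To apply Eq. (13), each flow's volume $\nu_{i,j}$ must be attributed to the links it traverses. The sketch below assumes deterministic XY routing for that attribution (the paper does not specify the routing used for the MMS flows) and treats each link as a directed pair of neighbouring tiles. Because a flow crosses several links, the $\gamma_l$ values need not sum to 1.

```python
from collections import defaultdict


def xy_path(src, dst):
    """Tiles visited by deterministic XY routing (X first, then Y)."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path


def link_utilization(volumes):
    """Eq. (13): gamma_l = nu_l / sum of all nu_ij.

    `volumes` maps (src_tile, dst_tile) -> nu_ij. nu_l is the total volume of
    every flow whose (assumed) XY route crosses the directed link l."""
    total = sum(volumes.values())
    nu = defaultdict(float)
    for (src, dst), vol in volumes.items():
        path = xy_path(src, dst)
        for a, b in zip(path, path[1:]):
            nu[(a, b)] += vol
    return {link: v / total for link, v in nu.items()}


flows = {((0, 0), (1, 1)): 30.0, ((1, 0), (1, 1)): 10.0}
print(link_utilization(flows))
```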
The bit error rate (BER) over all NoC links (assuming the links to be independent) can then be computed as

$$\mathrm{BER} = \sum_{\forall l \in \{L\}} \gamma_l \cdot \alpha_l \cdot \Pr(Err_l), \qquad (14)$$

where $\{L\}$ is the set of all links in the NoC and $\alpha_l$ is the average switching activity of link $l$. The switching activity $\alpha_l$ is the average spatial activity of the link, determined by the average Hamming distance between consecutive flits (normalized to the link width), while $\gamma_l$ can be seen as the average switching activity in time. Both lie in the range 0 to 1.

[Figure: histograms of link occurrences (%) over (a) the mean link delay (ps, 290 to 320) and (b) the STD of link delay (ps, 0.5 to 6).]
Fig. 15. Link delay statistics for the MMS benchmark using maximum performance mapping: (a) distribution of link delay means; (b) distribution of link delay STDs.
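Eq. (14) and the definition of $\alpha_l$ translate into a few lines of Python. This is a sketch under two assumptions: flits are given as integers, and the Hamming distance is normalized by the link width so that $\alpha_l$ lies in [0, 1]. The link tuples in the usage example are hypothetical, apart from $\alpha_l = 0.25$, which matches the value assumed below.

```python
def switching_activity(flits, width):
    """alpha_l: mean Hamming distance between consecutive flits, divided by
    the link width so that the result lies in [0, 1]. Flits are ints."""
    if len(flits) < 2:
        return 0.0
    toggles = [bin(a ^ b).count("1") for a, b in zip(flits, flits[1:])]
    return sum(toggles) / (len(toggles) * width)


def bit_error_rate(links):
    """Eq. (14): BER = sum over links of gamma_l * alpha_l * Pr(Err_l),
    assuming independent links. `links` yields (gamma, alpha, p_err)."""
    return sum(g * a * p for g, a, p in links)


alpha = switching_activity([0b0000, 0b1111, 0b1100], width=4)
ber = bit_error_rate([(0.75, 0.25, 1e-5), (1.0, 0.25, 2e-6)])
print(alpha, ber)
```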
For the MMS benchmark with performance mapping, and assuming $\alpha_l$ is 25%, the average BER is found to be $5.9476 \times 10^{-6}$. When energy mapping is considered, the NoC experiences a slightly higher drop (see Fig. 11) and the BER increases to $2.9 \times 10^{-5}$.
To illustrate how increased communication workload, in terms of network throughput, affects BER under various traffic scenarios, Fig. 16 plots BER against throughput for different synthetic traffic patterns. BER increases exponentially with throughput for all traffic patterns. However, because of their different hotspots and distributions, different traffic patterns experience different BERs: heavier and more concentrated traffic leads to higher BER, as can be seen for the Hotspot traffic in Fig. 16.
[Figure: bit error rate (log scale, 10^-7 to 10^-3) versus throughput (flits/cycle/node, 0.07 to 0.11) for Transpose, Random, and Hotspot (5%) traffic.]
Fig. 16. Bit error rate versus throughput for various synthetic traffic patterns.

6 CONCLUSION
In this work, an integrated tool that captures the impact of on-chip communication workloads on the power delivery grid is presented. The tool is dedicated to NoCs. It integrates a NoC simulator, an on-chip link model, NoC power and area models, and a fast power grid model to provide comprehensive simulation and system analysis. The granularity of the power grid model determines the accuracy of the analysis: compared with SPICE simulation, the error of the power grid model is less than 2% at the finest granularity and increases steadily as the grid is coarsened. The developed tool also provides a detailed analysis of power supply variations based on traffic distributions and routing algorithms. The practicality of the proposed model is further exemplified through a case study presenting a detailed statistical timing analysis of communication link delay. This enables the study of the impact of power supply noise on communication delay for different communication workloads and traffic patterns. The results of such analyses can be used to determine high-level performance metrics such as the probability of error and the bit error rate. Comprehensive analyses of the power grid and accurate communication link models are crucial to power supply integrity evaluation. The proposed method enables a robust evaluation of NoC-based multi-core systems in the early design stages.
REFERENCES
[1] M. Saint-Laurent and M. Swaminathan, “Impact of power-supply
noise on timing in high-frequency microprocessors,” IEEE Trans-
actions on Advanced Packaging, vol. 27, no. 1, pp. 135–144, 2004.
[2] S. R. Nassif, “Power grid analysis benchmarks,” in Design Au-
tomation Conference. ASPDAC’08. Asia and South Pacific, 2008, pp.
376–381.
[3] A. H. Ajami, K. Banerjee, and M. Pedram, “Scaling analysis of
on-chip power grid voltage variations in nanometer scale ulsi,”
Analog Integrated Circuits and Signal Processing, vol. 42, no. 3, pp.
277–290, 2005.
[4] H. H. Chen and D. D. Ling, “Power supply noise analysis
methodology for deep-submicron vlsi chip design,” in Design
Automation Conference. Proceedings of the 34th, 1997, pp. 638–643.
[5] H. H. Chen and J. S. Neely, “Interconnect and circuit modeling
techniques for full-chip power supply noise analysis,” Compo-
nents, Packaging, and Manufacturing Technology, Part B: Advanced
Packaging, IEEE Transactions on, vol. 21, no. 3, pp. 209–215, 1998.
[6] Y. Wang, J. Xu, Y. Xu, W. Liu, and H. Yang, “Power gating aware
task scheduling in mpsoc,” Very Large Scale Integration (VLSI)
Systems, IEEE Transactions on, no. 99, pp. 1–12, 2010.
[7] H. Jung and M. Pedram, “Optimizing the power delivery
network in dynamically voltage scaled systems with uncertain
power mode transition times,” in Design, Automation and Test in
Europe Conference and Exhibition (DATE), 2010, pp. 351–356.
[8] A. Todri, M. Marek-Sadowska, and J. Kozhaya, “Power supply
noise aware workload assignment for multi-core systems,” in
Computer-Aided Design ICCAD ’08. IEEE/ACM International Con-
ference on, 2008, pp. 330–337.
[9] W. J. Dally and B. Towles, “Route packets, not wires: On-chip interconnection networks,” in Design Automation Conference, 2001. Proceedings. IEEE, 2001, pp. 684–689.
[10] H. Wang, L.-S. Peh, and S. Malik, “Power-driven design of
router microarchitectures in on-chip networks,” in Microarchitec-
ture, 2003. MICRO-36. Proceedings. 36th Annual IEEE/ACM Inter-
national Symposium on, dec. 2003, pp. 105 – 116.
[11] S. Vangal, J. Howard, G. Ruhl, S. Dighe, H. Wilson, J. Tschanz,
D. Finan, P. Iyer, A. Singh, and T. Jacob, “An 80-tile 1.28 tflops
network-on-chip in 65nm cmos,” in Solid-State Circuits Conference,
2007. ISSCC 2007. Digest of Technical Papers. IEEE International.
IEEE, 2007, pp. 98–589.
[12] P. Vellanki, N. Banerjee, and K. S. Chatha, “Quality-of-service
and error control techniques for mesh-based network-on-chip
architectures,” Integr. VLSI J., vol. 38, no. 3, pp. 353–382, Jan. 2005.
[Online]. Available: http://dx.doi.org/10.1016/j.vlsi.2004.07.009
[13] H. Zimmer and A. Jantsch, “A fault model notation and error-
control scheme for switch-to-switch buses in a network-on-chip,”
in Hardware/Software Codesign and System Synthesis, 2003. First
IEEE/ACM/IFIP International Conference on, oct. 2003, pp. 188 –193.
[14] C. Grecu, L. Anghel, P. Pande, A. Ivanov, and R. Saleh, “Essential
fault-tolerance metrics for NoC infrastructures,” in On-Line Testing
Symposium IOLTS 07. 13th IEEE International, july 2007, pp. 37 –42.
[15] H.-S. Wang, X. Zhu, L.-S. Peh, and S. Malik, “Orion:
a power-performance simulator for interconnection networks,” in
Microarchitecture. (MICRO-35). Proceedings. 35th Annual IEEE/ACM
International Symposium on, 2002, pp. 294–305.
[16] C. Seiculescu, S. Murali, L. Benini, and G. De Micheli, “Sunfloor
3d: A tool for networks on chip topology synthesis for 3-d systems
on chips,” Computer-Aided Design of Integrated Circuits and Systems,
IEEE Transactions on, vol. 29, no. 12, pp. 1987–2000, 2010.
[17] A. B. Kahng, B. Li, L. S. Peh, and K. Samadi, “Orion 2.0: A power-
area simulator for interconnection networks,” Very Large Scale
Integration (VLSI) Systems, IEEE Transactions on, vol. PP, no. 99,
pp. 1–5, 2011.
[18] T. Ahonen, D. A. Sigüenza-Tortosa, H. Bin, and J. Nurmi, “Topology optimization for application-specific networks-on-chip,” 2004, pp. 53–60.
[19] J. Xu, W. Wolf, J. Henkel, and S. Chakradhar, “A design methodology for application-specific networks-on-chip,” ACM Trans. Embed. Comput. Syst., vol. 5, no. 2, pp. 263–280, 2006.
[20] J. Hu and R. Marculescu, “Energy- and performance-aware map-
ping for regular NoC architectures,” Computer-Aided Design of
Integrated Circuits and Systems, IEEE Transactions on, vol. 24, no. 4,
pp. 551 – 562, 2005.
[21] S. Murali and G. De Micheli, “Bandwidth-constrained mapping
of cores onto NoC architectures,” in Proceedings of the conference
on Design, automation and test in Europe - Volume 2, ser. DATE ’04.
Washington, DC, USA: IEEE Computer Society, 2004, pp. 896 –
901 Vol.2.
[22] S. Bodapati and F. N. Najm, “High-level current macro-model
for power-grid analysis,” in 39th Design Automation Conference
Proceedings., 2002, pp. 385–390.
[23] S. Kvatinsky, E. G. Friedman, A. Kolodny, and L. Schächter,
“Power grid analysis based on a macro circuit model,” in 26th
Convention of IEEE in Israel, 2010, pp. 708–712.
[24] J. N. Kozhaya, S. R. Nassif, and F. N. Najm, “A multigrid-like
technique for power grid analysis,” Computer-Aided Design of
Integrated Circuits and Systems, IEEE Transactions on, vol. 21, no. 10,
pp. 1148–1160, 2002.
[25] A. Goyal and F. N. Najm, “Efficient rc power grid verification
using node elimination,” in Design, Automation and Test in Europe
Conference and Exhibition (DATE), 2011, pp. 1–4.
[26] M. Zhao, R. V. Panda, S. S. Sapatnekar, and D. Blaauw, “Hi-
erarchical analysis of power distribution networks,” Computer-
Aided Design of Integrated Circuits and Systems, IEEE Transactions
on, vol. 21, no. 2, pp. 159–168, 2002.
[27] J. M. S. Silva, J. R. Phillips, and L. M. Silveira, “Efficient repre-
sentation and analysis of power grids,” in Design, Automation and
Test in Europe, 2008, pp. 420–425.
[28] H. Li, J. Fan, Z. Qi, S. X.-D. Tan, L. Wu, Y. Cai, and
X. Hong, “Partitioning-based approach to fast on-chip decoupling
capacitor budgeting and minimization,” Computer-Aided Design of
Integrated Circuits and Systems, IEEE Transactions on, vol. 25, no. 11,
pp. 2402–2412, 2006.
[29] B. Yan, S. X.-D. Tan, G. Chen, and L. Wu, “Modeling
and simulation for on-chip power grid networks by locally domi-
nant krylov subspace method,” in Computer-Aided Design ICCAD.
IEEE/ACM International Conference on, 2008, pp. 744–749.
[30] B. Boghrati and S. Sapatnekar, “A scaled random walk solver for
fast power grid analysis,” in Design, Automation and Test in Europe Conference and Exhibition (DATE), 2011, pp. 1–6.
[31] L.-R. Zheng and H. Tenhunen, “Fast modeling of core switching
noise on distributed lrc power grid in ulsi circuits,” Advanced
Packaging, IEEE Transactions on, vol. 24, no. 3, pp. 245 –254, aug
2001.
[32] S. Tuuna, L. R. Zheng, J. Isoaho, and H. Tenhunen, “Modeling of
on-chip bus switching current and its impact on noise in power
supply grid,” Very Large Scale Integration (VLSI) Systems, IEEE
Transactions on, vol. 16, no. 6, pp. 766–770, 2008.
[33] P. P. Sotiriadis and A. P. Chandrakasan, “A bus energy model for
deep submicron technology,” Very Large Scale Integration (VLSI)
Systems, IEEE Transactions on, vol. 10, no. 3, pp. 341–350, 2002.
[34] J. Chen and L. He, “A decoupling method for analysis of coupled
rlc interconnects,” in Proc. 12th ACM Great Lakes Symposium on VLSI. IEEE, 2002, pp. 41–46.
[35] C. Po-Hao and C. Jia-Ming, “A decoupling technique on switch
factor based analysis of rlc interconnects,” in Electro/Information
Technology IEEE International Conference on, 2007, pp. 73–78.
[36] F. Fazzino, M. Palesi, and D. Patti, “Noxim: Network-on-chip
simulator,” URL: http://sourceforge.net/projects/noxim, 2008.
[37] N. Dahir, T. Mak, F. Xia, and A. Yakovlev, “NoC power supply analysis tool,” 2012. [Online]. Available:
http://www.staff.ncl.ac.uk/terrence.mak/
[38] A. Agarwal, K. Chopra, D. Blaauw, and V. Zolotov, “Circuit
optimization using statistical static timing analysis,” in Design
Automation Conference ’05. Proceedings. 42nd, June 2005, pp. 321–324.
[39] P. Bogdan and R. Marculescu, “Non-stationary traffic analysis
and its implications on multicore platform design,” Computer-
Aided Design of Integrated Circuits and Systems, IEEE Transactions
on, vol. 30, no. 4, pp. 508 –519, april 2011.
[40] ——, “Statistical physics approaches for network-on-chip traffic
characterization,” in Proceedings of the 7th IEEE/ACM international
conference on Hardware/software codesign and system synthesis, ser.
CODES+ISSS ’09. New York, NY, USA: ACM, 2009, pp. 461–470.
[Online]. Available: http://doi.acm.org/10.1145/1629435.1629498
[41] M. S. Gupta, J. L. Oatley, R. Joseph, G.-Y. Wei, and D. M. Brooks,
“Understanding voltage variations in chip multiprocessors using
a distributed power-delivery network,” in Design, Automation and
Test in Europe Conference and Exhibition. DATE ’07, 2007, pp. 1–6.
[42] F. Mohamood, M. B. Healy, S. K. Lim, and H. H. S. Lee, “Noise-
direct: A technique for power supply noise aware floorplanning
using microarchitecture profiling,” in Design Automation Confer-
ence. ASP-DAC ’07. Asia and South Pacific, 2007, pp. 786–791.
[43] S. R. Nassif, “IBM power grid benchmarks,” 2008.
[44] S. R. Nassif and J. N. Kozhaya, “Fast power grid simulation,” in 37th Conference on Design Automation (DAC’00), 2000, pp. 156–161.
[45] N. Khan, S. Alam, and S. Hassoun, “Power delivery design for 3-D ICs using different through-silicon via (TSV) technologies,” Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 19, no. 4, pp. 647–658, Apr. 2011.
[46] Y. Cao, T. Sato, D. Sylvester, M. Orshansky, and C. Hu, “Predictive technology model,” Nanoscale Integration and Modeling Group, Arizona State University, http://ptm.asu.edu/, 2006.
[47] J. Hu and R. Marculescu, “Energy-aware mapping for tile-based NoC architectures under performance constraints,” in Design Automation Conference, 2003. Proceedings of the ASP-DAC 2003. Asia and South Pacific, 2003, pp. 233–239.
[48] S. Kirolos, Y. Massoud, and Y. Ismail, “Accurate analytical delay modeling of CMOS clock buffers considering power supply variations,” in Circuits and Systems (ISCAS), IEEE International Symposium on, 2008, pp. 3394–3397.
[49] J. Choi, M. Swaminathan, N. Do, and R. Master, “Modeling of power supply noise in large chips using the circuit-based finite-difference time-domain method,” Electromagnetic Compatibility, IEEE Transactions on, vol. 47, no. 3, pp. 424–439, Aug. 2005.
[50] S. Kirolos, Y. Massoud, and Y. Ismail, “Power-supply-variation-aware timing analysis of synchronous systems,” in Circuits and Systems (ISCAS), IEEE International Symposium on, 2008, pp. 2418–2421.
[51] J. M. Rabaey, A. P. Chandrakasan, and B. Nikolic, Digital Integrated Circuits: A Design Perspective, ser. Prentice Hall Electronics and VLSI Series. Pearson Education, Jan. 2003.
Nizar S. Dahir (S’12) received the B.Sc. and M.Sc. degrees in Computer Engineering from Al-Nahrain University, Baghdad, Iraq, in 1997 and 2000, respectively. He is currently working toward the Ph.D. degree with the School of Electrical, Electronic and Computer Engineering, Newcastle University, UK. His research interests include power and thermal integrity of Multiprocessor Systems-on-Chip and Networks-on-Chip.
Terrence Mak (S’05-M’09) received the B.Eng. and M.Phil. degrees in Systems Engineering from the Chinese University of Hong Kong in 2003 and 2005, respectively, and the Ph.D. degree from Imperial College London in 2009. From 2010 until 2012, he was a lecturer at the School of Electrical, Electronic and Computer Engineering, Newcastle University. He is currently with the Department of Computer Science and Engineering, Chinese University of Hong Kong. During his Ph.D., he worked as a Research Engineer Intern in the VLSI group at Sun Microsystems Laboratories in Menlo Park, California. He also worked as a Visiting Research Scientist in Poon’s Neuroengineering Laboratory at MIT. In 2005, he received both the Croucher Foundation Scholarship and the US Naval Research Excellence in Neuroengineering award. In 2008, he served as the Co-Chair of the UK Asynchronous Forum, and in March 2008 he was the Local Arrangements Chair of the Fourth International Workshop on Applied Reconfigurable Computing. His research interests include FPGA architecture design, Network-on-Chip, reconfigurable computing, and VLSI design for biomedical applications.
Fei Xia is a Senior Research Associate with the School of Electrical, Electronic, and Computer Engineering, Newcastle University, UK. His research interests include the design, modelling, and analysis of microelectronic systems, focusing on asynchronous systems and data communication in such systems. Dr Xia holds a BEng in Automation from Tsinghua University, an MSc in Control Systems from the University of Alberta, and a PhD in Electronic Engineering from King’s College London.
Alexandre (Alex) Yakovlev received the MSc and PhD degrees from St. Petersburg Electrical Engineering Institute in 1979 and 1982, respectively, and the DSc degree from Newcastle University in 2006. He has worked in the area of asynchronous and concurrent systems since 1980, beginning at St. Petersburg Electrical Engineering Institute, where between 1982 and 1990 he held positions of assistant and associate professor at the Computing Science Department. Since 1991, he has been at Newcastle University, where he worked as a lecturer, reader, and professor at the Computing Science Department until 2002; he now heads the Microelectronic Systems Design Research Group (http://async.org.uk) at the School of Electrical, Electronic and Computer Engineering. His current interests and publications are in the field of modeling and design of asynchronous, concurrent, real-time, and dependable systems on a chip. He has published four monographs and more than 200 papers in academic journals and conferences, and has managed over 25 research contracts. He has chaired program committees of several international conferences, including the IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC), Petri Nets (ICATPN), and Applications of Concurrency to System Design (ACSD), and is currently chairman of the Steering Committee of the Conference on Application of Concurrency to System Design. He is a senior member of the IEEE and a member of the IET. In April 2008, he was general chair of the 14th ASYNC Symposium and the 2nd International Symposium on NoCs, and he was tutorial chair at Design Automation and Test in Europe (DATE) in 2009.
ACKNOWLEDGMENTS
The authors would like to thank Prof. Maurizio Palesi from the University of Catania, Italy, Sampo Tuuna from the University of Turku, Finland, and the anonymous reviewers for their valuable comments and help throughout this work. The first author would also like to thank the University of Kufa, Iraq, for financing his Ph.D. scholarship at Newcastle University.
