An Analytical Model for On-chip Interconnects in Multimedia Embedded Systems by Wu, Yulei et al.
 
Accepted by ACM Transactions on Embedded Computing Systems, Vol. xx, No. x, Article xx, Publication date: 2013 
An Analytical Model for On-Chip Interconnects in Multimedia 
Embedded Systems  
YULEI WU, Chinese Academy of Sciences 
 
GEYONG MIN, University of Bradford 
DAKAI ZHU, University of Texas at San Antonio 
LAURENCE T. YANG, St. Francis Xavier University 
 
The traffic pattern has a significant impact on the performance of networks-on-chip. Many recent studies 
have shown that multimedia applications can be supported in on-chip interconnects. Driven by the 
motivation of evaluating on-chip interconnects in multimedia embedded systems, a new analytical model 
is proposed to investigate the performance of the fat-tree based on-chip interconnection network under 
bursty multimedia traffic and non-uniform message destinations. Extensive simulation experiments are 
conducted to validate the accuracy of the model, which is then adopted as a cost-efficient tool to investigate 
the effects of bursty multimedia traffic with non-uniform destinations on the network performance.  
Categories and Subject Descriptors: C.2.1 [Computer-Communication Networks]: Network Architecture 
and Design, C.4 [Performance Of Systems] 
General Terms: Performance  
Additional Key Words and Phrases: Networks-on-Chip, bursty multimedia traffic, non-uniform destination 
distributions, analytical modelling 
ACM Reference Format: 
Yulei Wu, Geyong Min, Dakai Zhu, and Laurence T. Yang, 2013. An analytical model for on-chip 
interconnects in multimedia embedded systems. ACM Trans. Embedd. Comput. Syst. x, x, Article x (x 
2013),  x  pages. 
DOI:http://dx.doi.org/10.1145/0000000.0000000  
This work is supported by the National Program on Key Basic Research Project (973 Program) under 
grant 2012CB315803, the National Key Technology Research and Development Program of the Ministry of 
Science and Technology of China under grant 2012BAH01B03, the "Strategic Priority Research Program" 
of the Chinese Academy of Sciences under grant XDA01020304, the National High-tech R&D Program of China 
(863 Program) under grant 2011AA01A101, and the NSFC under grant 61173045. 
Authors' addresses: Y. Wu, Computer Network and Information Center, Chinese Academy of Sciences; G. 
Min (corresponding author), School of Computing, Informatics and Media, University of Bradford; D. Zhu, 
Department of Computer Science, University of Texas at San Antonio; L. T. Yang, Department of 
Computer Science, St. Francis Xavier University.  
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted 
without fee provided that copies are not made or distributed for profit or commercial advantage and that 
copies show this notice on the first page or initial screen of a display along with the full citation. 
Copyrights for components of this work owned by others than ACM must be honored. Abstracting with 
credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any 
component of this work in other works requires prior specific permission and/or a fee. Permissions may be 
requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, 
fax +1 (212) 869-0481, or permissions@acm.org. 
© 2013 ACM 1539-9087/2010/03-ART39 $15.00 
DOI:http://dx.doi.org/10.1145/0000000.0000000 
1. INTRODUCTION 
The latest development in multimedia embedded systems that are implemented with 
an on-chip architecture [Dally and Towles 2004; Majeti, Pasalapudi and 
Yalamanchili 2009; Varatkar and Marculescu 2002] not only requires processing of 
multichannel real-time audio or video signals, but also expects efficient 
interconnection networks for transport of multimedia content. The emerging 
chip-multiprocessor (CMP) architectures consist of many processing cores on a single chip 
owing to the advances of miniaturization in semiconductor technologies [Marculescu 
et al. 2009; Peng and Lin 2010; Sanchez, Michelogiannakis and Kozyrakis 2010]. To 
date, the network-on-chip (NoC), also known as on-chip interconnects, has emerged 
to play an important role in providing dominant solutions for the interconnection 
design of CMP architectures. The topology of an on-chip network specifies the 
structure in which the processing cores are connected. An NoC may adopt any 
topology proposed for interconnection networks, such as fat trees, mesh, torus, and 
folded torus [Dally and Towles 2004; Moadeli et al. 2010]. In this paper, we focus on 
the fat tree topology, which has been adopted by related studies [Grecu et al. 2004; 
Kapre et al. 2006; Taktak, Desbarbieux and Encrenaz 2008; Wang et al. 2012]. 
The traffic pattern has a significant impact on the performance of on-chip 
interconnects. To obtain a sound understanding of the network 
performance, it is necessary to adopt accurate models that 
capture realistic network traffic patterns. The message arrival process and 
destination distribution are two of the most important characteristics used to define 
the network traffic patterns [Duato, Yalamanchili and Ni 2003]. A number of recent 
studies have shown that multimedia applications can be supported in 
on-chip interconnects [Lee et al. 2006; Ogras and Marculescu 2008; Varatkar and 
Marculescu 2004]. Furthermore, the message destinations often exhibit 
non-uniform distributions over on-chip interconnects [Mirza-Aghatabar et al. 2007; Zhang 
and Jones 2009].  
Wormhole switching is an efficient switching scheme for on-chip networks 
[Bjerregaard and Mahadevan 2006; Marculescu et al. 2009], where a message is 
divided into a sequence of fixed-size units, called flits. The header flit governs the 
path through the network and the remaining data flits follow it in a pipelined fashion. 
Without the complexity caused by adaptive routing, a deterministic routing 
algorithm is suitable for NoC [Dally and Towles 2001]. In deterministic routing, a 
message traverses a fixed path between its source and destination, which simplifies 
the implementation, avoids message deadlock, and guarantees an in-order delivery. 
Therefore, in this study, a deterministic routing scheme based on the Up*/Down* algorithm 
[Schroeder et al. 1991] is adopted. 
The performance study of on-chip interconnects can be achieved by either 
simulation or analytical modelling [Duato, Yalamanchili and Ni 2003]. However, the 
simulation-based approach may be time-consuming and costly since the convergence 
of simulation towards a steady state in the presence of multimedia traffic with non-
uniform destinations is often very slow. In contrast, analytical modelling can capture 
the essential features of the network, gain significant insights, and offer a cost-
effective and versatile tool that can be used to investigate the network performance 
with different design alternatives and under various working conditions. In 
particular, the analytical model can provide quantitative relations between input 
parameters and performance metrics in order to have a thorough investigation of the 
network performance over a complete parametric range. 
Most existing studies of on-chip interconnects resort to simulation to 
evaluate the network performance. The lack of analytical performance models for 
such on-chip interconnects hinders efficient design for multimedia embedded systems. 
With the aim of capturing the characteristics of multimedia traffic patterns and 
obtaining a comprehensive understanding of network performance, this paper makes 
the following contributions:  
• A new analytical model is proposed to investigate the performance of on-chip 
interconnects in CMP in the presence of multimedia traffic with non-uniform 
destinations. The multimedia traffic is captured by the bursty and correlated 
Markov-modulated Poisson process (MMPP) and the non-uniform destination is 
modelled by a hot-spot in the network. A popular fat-tree topology is adopted 
as the underlying interconnection architecture in CMP. 
• Extensive simulation experiments are conducted to validate the accuracy of the 
model. The comparison between analytical and simulation results reveals that 
the model possesses a good degree of accuracy under different design 
alternatives and with various traffic conditions. 
• To illustrate its applications, the analytical model is then applied to investigate 
the impact of multimedia traffic with hot-spot destinations on the performance 
of the fat-tree based on-chip interconnects. The analytical results demonstrate 
that the network performance degrades considerably under such traffic patterns. 
The rest of the paper is organized as follows. Section 2 presents the related work. 
The network architecture is shown in Section 3. Section 4 derives the analytical 
model to investigate the performance of fat-tree based on-chip interconnects. 
Extensive simulation experiments are used to validate the accuracy of the analytical 
model in Section 5. Section 6 carries out performance analysis by virtue of the 
developed analytical model. Finally, Section 7 concludes this study. 
2. RELATED WORK 
The studies on performance evaluation of on-chip interconnects have been widely 
reported in the literature [Ascia et al. 2008; Kodi, Sarathy and Louri 2008; Matsutani 
et al. 2009; Pande et al. 2005; Sanchez, Michelogiannakis and Kozyrakis 2010]. 
However, most of these studies are based on the use of simulation experiments to 
evaluate the performance of interconnects in NoC architecture. For example, the 
authors in [Ascia et al. 2008] proposed a selection strategy coupled with a routing 
algorithm to improve the performance of on-chip interconnects by virtue of flit-level 
simulators. The objective of the proposed strategy is to choose the channel that 
allows the packet to travel to its destination along a path that has the fewest number 
of congested nodes. The study by [Kodi, Sarathy and Louri 2008] proposed a low-
power low-area on-chip interconnection network architecture by reducing the number 
of buffers within the router. To minimise the performance degradation caused by the 
reduced buffer size, the circuit level enhancements were deployed to the existing 
repeaters to double the buffers when required. Matsutani et al. [Matsutani et al. 
2009] proposed a tree-based interconnection network so as to efficiently use 
enormous wire resources for low-latency and high-throughput communication in NoC 
and employed the simulation study to evaluate the performance of interconnection 
networks. Pande et al. [Pande et al. 2005] developed a consistent and meaningful 
evaluation methodology to compare the performance and characteristics of a variety 
of on-chip interconnection architectures and explored the design trade-offs that 
characterise the NoC for the optimal development of integrated network-based 
design. Sanchez, Michelogiannakis, and Kozyrakis [Sanchez, Michelogiannakis and 
Kozyrakis 2010] explored the architectural-level implications of network design for 
NoC. They further evaluated and compared different network topologies using 
simulation of the full chip. 
The analytical modelling and evaluation of on-chip interconnects are rarely 
reported in the current literature. Very recently, Moadeli et al. [Moadeli et al. 2010] 
have proposed an analytical model to evaluate the performance of ring-based on-chip 
interconnects. However, this model was developed based on the assumptions that 
message arrivals follow a non-bursty Poisson process and message destinations are 
uniformly distributed. To the best of our knowledge, there has not been any 
analytical model reported for on-chip interconnects in the current literature to 
handle multimedia traffic with hot-spot destinations. 
3. THE NETWORK-ON-CHIP ARCHITECTURE 
The NoC closely resembles the architecture of interconnection networks in high-
performance computing (HPC) systems [Benini and Micheli 2002]. Thus, the 
interconnection topologies adopted in the early NoC architecture can be traced back 
to the field of HPC systems. As technology scaling into the nanoscale regime 
brings physical design issues to the forefront, 2D mesh and torus topologies, whose 
regular grid-based structure is intuitively matched to the 2D chip layout, have been 
adopted in on-chip networks. Meanwhile, the NoC 
architectures aiming at low latency communication, performance scalability and 
flexible routing still choose fat-trees as their reference topology. Moreover, numerous 
research efforts have been made on tree-based topologies in the NoC community, 
proving their superior performance over 2D meshes under different types of traffic 
patterns [Pande et al. 2005]. As a result, the fat-tree topology and its variants are 
widely adopted in practice. The m-port n-tree is a typical example of fat-tree 
topologies [Lin, Chung and Huang 2004]. 
 
 
Fig. 1. The m-port n-tree (4-port 2-tree) and its on-chip layout, shown in panels (a) and (b); the legend distinguishes switches from processing cores. 
 
Fig. 1 depicts the topology of the 4-port 2-tree and its on-chip layout. In this topology, 
each network switch has $m$ communication ports that are connected with other 
switches or processing cores. The height of the tree is $(n+1)$. The switches (except for 
root switches) use half of the ports to connect with their descendants or processing 
nodes, and the other half to connect with their ancestors. The root switches employ all 
communication ports for connection with their descendants or processing nodes. 
The m-port n-tree consists of $N_{node}$ processing nodes and $N_{switch}$ switches 
(including the root switches) [Lin, Chung and Huang 2004], where $N_{node}$ and $N_{switch}$ 
are given by 
 
$$N_{node} = 2\left(\frac{m}{2}\right)^{n} \tag{1}$$
$$N_{switch} = (2n-1)\left(\frac{m}{2}\right)^{n-1} \tag{2}$$
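As a quick sanity check of Eqs. (1) and (2), the following minimal sketch evaluates the node and switch counts for a given m-port n-tree. The function name `fat_tree_size` is hypothetical and introduced only for illustration; it is not part of the model.

```python
def fat_tree_size(m: int, n: int) -> tuple[int, int]:
    """Return (N_node, N_switch) for an m-port n-tree, following Eqs. (1) and (2)."""
    if m < 2 or m % 2 != 0 or n < 1:
        raise ValueError("m must be an even integer >= 2 and n must be >= 1")
    half = m // 2
    n_node = 2 * half ** n                     # Eq. (1)
    n_switch = (2 * n - 1) * half ** (n - 1)   # Eq. (2)
    return n_node, n_switch


if __name__ == "__main__":
    # The 4-port 2-tree used as the running example in Fig. 1.
    print(fat_tree_size(4, 2))
```

For the 4-port 2-tree this gives 8 processing nodes and 6 switches.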
Let $P_j$ denote the probability that a newly generated message needs to cross $2j$ 
channels ($j$ channels in the ascending phase and $j$ channels in the descending phase) to 
reach its destination in the m-port n-tree. As the number of nodes located at distance 
$2j$ ($1 \le j \le n-1$) in the m-port n-tree is $(m/2-1)(m/2)^{j-1}$, we have 
$$P_j = \frac{\left(\frac{m}{2}-1\right)\left(\frac{m}{2}\right)^{j-1}}{N_{node}-1}, \qquad 1 \le j \le n-1 \tag{3}$$
The number of nodes located at distance $2n$ in the m-port n-tree is $(m-1)(m/2)^{n-1}$. 
Thus, $P_n$ can be expressed as 
$$P_n = \frac{(m-1)\left(\frac{m}{2}\right)^{n-1}}{N_{node}-1} \tag{4}$$
Consequently, the average message distance, $d$, in the m-port n-tree can be 
expressed as  
$$d = \sum_{j=1}^{n} 2jP_j \tag{5}$$
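The distance distribution of Eqs. (3) and (4) and the average distance of Eq. (5) can be evaluated numerically as in the sketch below. The function names are hypothetical, and the code assumes uniformly distributed destinations, as in the derivation above.

```python
def distance_distribution(m: int, n: int) -> list[float]:
    """Return [P_1, ..., P_n], where P_j is the probability of a 2j-channel path."""
    half = m // 2
    n_node = 2 * half ** n                                   # Eq. (1)
    probs = [(half - 1) * half ** (j - 1) / (n_node - 1)     # Eq. (3), 1 <= j <= n-1
             for j in range(1, n)]
    probs.append((m - 1) * half ** (n - 1) / (n_node - 1))   # Eq. (4), j = n
    return probs


def mean_distance(m: int, n: int) -> float:
    """Average message distance d = sum over j of 2 * j * P_j, Eq. (5)."""
    return sum(2 * j * p for j, p in enumerate(distance_distribution(m, n), start=1))


if __name__ == "__main__":
    print(distance_distribution(4, 2), mean_distance(4, 2))
```

For the 4-port 2-tree, $P_1 = 1/7$ and $P_2 = 6/7$, so the probabilities sum to one and $d = 26/7$.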
The m-port n-tree contains two types of connections: node-to-switch (or switch-to-
node) connection and switch-to-switch connection. Given that the basic unit of 
transmission in on-chip communication is a flit, let $t_{cn}$ denote the time required to 
transmit a flit on a node-to-switch (or switch-to-node) connection and $t_{cs}$ represent 
the time to transmit a flit on a switch-to-switch connection in the m-port n-tree 
topology. $t_{cn}$ and $t_{cs}$ can be determined by [Javadi, Akbari and Abawajy 2006] 
$$t_{cn} = 0.5\,\theta_n + L_\varpi / B_n \tag{6}$$
$$t_{cs} = \theta_s + L_\varpi / B_n \tag{7}$$
where $\theta_n$ and $\theta_s$ are the network latency and switch latency in the m-port n-tree 
topology and $B_n$ denotes the bandwidth of connections in the m-port n-tree. $L_\varpi$ is the 
length of each flit. 
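A minimal sketch of Eqs. (6) and (7) follows. The function name and the numerical values in the usage line are assumptions chosen only to make the arithmetic easy to follow; they are not parameters taken from the paper.

```python
def flit_times(theta_n: float, theta_s: float, flit_len: float, bandwidth: float):
    """Return (t_cn, t_cs): per-flit times on node-switch and switch-switch links."""
    t_cn = 0.5 * theta_n + flit_len / bandwidth   # Eq. (6)
    t_cs = theta_s + flit_len / bandwidth         # Eq. (7)
    return t_cn, t_cs


if __name__ == "__main__":
    # Hypothetical values: latencies, flit length and bandwidth in consistent units.
    print(flit_times(theta_n=2.0, theta_s=1.0, flit_len=10.0, bandwidth=5.0))
```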
4. THE ANALYTICAL MODEL 
This section firstly presents the methods for modelling message arrival processes of 
multimedia applications and for modelling non-uniform destination distributions. 
The derivation of the analytical model is then reported. The major difference between 
the previous publications [Wu et al. 2008; Wu et al. 2011] and this paper is that the 
former considered the 2D torus interconnection networks but this study focuses on 
modelling and analysis of m-port n-tree interconnection topology. The key notations 
used in the derivation of the analytical model are listed in Table 1. 
4.1 Modelling the Message Arrival Process 
The message arrival process of multimedia applications significantly deviates from 
the traditional renewal process, e.g., the Poisson process. A model of multimedia 
traffic should capture its distinguishing characteristics, namely its 
bursty and correlated nature, which can significantly affect the network 
performance [Liu et al. 2008]. A highly bursty message arrival process tends to have 
a large variance-to-mean ratio of the inter-arrival time. Let $X$ denote the 
inter-arrival time; burstiness can be characterised by the squared coefficient of variation 
(SCV) of $X$. The other important feature of multimedia traffic is 
the high correlation between inter-arrival times. The degree of correlation between 
inter-arrival times is typically measured by the correlation coefficient of $X$. 
Table 1. Key Notations Used in the Derivation of the Analytical Model 
Notation                                              Description
$\mathbf{Q}_s$, $\boldsymbol{\Lambda}_s$              parameter matrices of $MMPP_s$ to model the traffic generated by the source node
$\mathbf{Q}_{cr}$, $\boldsymbol{\Lambda}_{cr}$        parameter matrices of $MMPP_{cr}$ to model the regular traffic arriving at a given network channel
$\mathbf{Q}_{ch_i}$, $\boldsymbol{\Lambda}_{ch_i}$    parameter matrices of $MMPP_{ch_i}$ to model the arrival process of hot-spot messages on network channels located $i$ hops away from the hot-spot node
$\mathbf{Q}_{c_i}$, $\boldsymbol{\Lambda}_{c_i}$      parameter matrices of $MMPP_{c_i}$ to model the loads at network channels that are $i$ hops away from the hot-spot node
$\mathbf{Q}_r$, $\boldsymbol{\Lambda}_r$              parameter matrices of $MMPP_r$ to model the regular traffic injected into the network
$h$                                                   hot-spot fraction denoting the probability that generated messages are destined to the hot-spot node
$B$                                                   buffer size at the source node
$\varpi$                                              message length in flits
$L_r$, $L_h$                                          communication latency experienced by regular messages and hot-spot messages
$T_{r,j}$                                             transmission delay of a $2j$-channel regular message on network channels
$N_i^{channel}$                                       number of channels located $i$ hops away from the hot-spot node traversed by the hot-spot messages
$Wb_{r,k,j}$                                          blocking time that $2j$-channel regular messages experience to acquire a channel at stage $k$
$N_i^{channel}$                                       number of channels that are $i$ $(1 \le i \le 2n)$ hops away from the hot-spot node traversed by the regular messages and hot-spot messages
$N_i^{t\_channel}$                                    total number of output channels that are $i$ $(1 \le i \le 2n)$ hops away from the hot-spot node
$Pb_{r,k,j}$                                          blocking probability of the regular messages at stage $k$
$Wc_{r,k,j}$                                          waiting time experienced by the regular messages to acquire a channel in the event of blocking
$F^*_{r,k,j}(s)$                                      Laplace-Stieltjes transform of the service time of regular messages on network channels at stage $k$
$\tau_r$                                              mean time for the tail flit of a regular message to reach the destination
 
In this paper, the arrival process of multimedia traffic is represented by an MMPP 
[Fischer and Meier-Hellstern 1993], which is a doubly stochastic process with the 
arrival rate varying according to a multi-state ergodic continuous-time Markov chain. 
The two-state MMPP has been widely used in numerous studies to model the 
message arrival behaviour of bursty traffic due to the following reasons: 1) many 
studies [Heffes 1980; Liu et al. 2008; Shah-Heydari and Le-Ngoc 2000] have revealed 
that MMPP has the ability of capturing the time-varying arrival rate and the 
important correlations among inter-arrival times of multimedia traffic; 2) MMPP is 
closed under the splitting and superposition operations and thus can be used to 
model the decomposition and superposition of network traffic in on-chip 
interconnection networks; and 3) the queueing-related results of MMPP have been 
widely studied [Fischer and Meier-Hellstern 1993; Heffes 1980], which makes the 
solutions of modelling networks with the MMPP arrival process analytically tractable. 
In this study, a two-state $MMPP_s$ is adopted to model the traffic burstiness of the 
message arrival process generated by the source node [Wu et al. 2011]. $MMPP_s$ can 
be characterized by the infinitesimal generator $\mathbf{Q}_s$ of the underlying Markov chain 
and the rate matrix $\boldsymbol{\Lambda}_s$ as 
$$\mathbf{Q}_s = \begin{bmatrix} -\varphi_{s1} & \varphi_{s1} \\ \varphi_{s2} & -\varphi_{s2} \end{bmatrix} \quad \text{and} \quad \boldsymbol{\Lambda}_s = \mathrm{diag}(\lambda_{s1}, \lambda_{s2}) \tag{8}$$
where the element $\varphi_{s1}$ is the transition rate from state 1 to 2 and $\varphi_{s2}$ is the rate out 
of state 2 to 1. $\lambda_{s1}$ and $\lambda_{s2}$ are the traffic rates when the Markov chain is in state 1 
and 2, respectively. The mean, $\lambda_s$, variance, $\lambda_s^{(2)}$, third central moment, $\lambda_s^{(3)}$, 
covariance function, $Cov(t)$, and integral of the covariance function, $\delta_s$, of the traffic 
rate are listed below and can also be found in [Heffes 1980; Min and Ould-Khaoua 
2004]. These quantities denote how dependent the rate at one instant of time is on 
that at another instant and play a major role in the method for the superposition 
operations of MMPP. 
 
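$$\lambda_s = \frac{\lambda_{s1}\varphi_{s2} + \lambda_{s2}\varphi_{s1}}{\varphi_{s1}+\varphi_{s2}} \tag{9}$$
$$\lambda_s^{(2)} = \frac{\varphi_{s1}\varphi_{s2}(\lambda_{s1}-\lambda_{s2})^2}{(\varphi_{s1}+\varphi_{s2})^2} \tag{10}$$
$$\lambda_s^{(3)} = \frac{\lambda_{s1}^3\varphi_{s2}+\lambda_{s2}^3\varphi_{s1}}{\varphi_{s1}+\varphi_{s2}} - \frac{(\lambda_{s1}\varphi_{s2}+\lambda_{s2}\varphi_{s1})\left(3\varphi_{s1}\varphi_{s2}(\lambda_{s1}-\lambda_{s2})^2 + (\lambda_{s1}\varphi_{s2}+\lambda_{s2}\varphi_{s1})^2\right)}{(\varphi_{s1}+\varphi_{s2})^3} \tag{11}$$
$$Cov(t) = \frac{\varphi_{s1}\varphi_{s2}(\lambda_{s1}-\lambda_{s2})^2}{(\varphi_{s1}+\varphi_{s2})^2}\, e^{-(\varphi_{s1}+\varphi_{s2})t} \tag{12}$$
$$\delta_s = \frac{1}{\lambda_s^{(2)}}\int_0^\infty Cov(t)\,dt = \frac{1}{\varphi_{s1}+\varphi_{s2}} \tag{13}$$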
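The SCV, $C_s^2$, of the inter-arrival time and the one-step correlation coefficient, $r_{s1}$, 
of $MMPP_s$ are often used to represent the burstiness of message arrivals and the 
correlations between inter-arrival times [Liu et al. 2008] 
$$C_s^2 = 1 + \frac{2\varphi_{s1}\varphi_{s2}(\lambda_{s1}-\lambda_{s2})^2}{(\varphi_{s1}+\varphi_{s2})^2(\lambda_{s1}\lambda_{s2}+\lambda_{s1}\varphi_{s2}+\lambda_{s2}\varphi_{s1})} \tag{14}$$
$$r_{s1} = \frac{\lambda_{s1}\lambda_{s2}(\lambda_{s1}-\lambda_{s2})^2\varphi_{s1}\varphi_{s2}}{C_s^2(\varphi_{s1}+\varphi_{s2})^2(\lambda_{s1}\lambda_{s2}+\lambda_{s1}\varphi_{s2}+\lambda_{s2}\varphi_{s1})^2} \tag{15}$$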
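The statistics in Eqs. (9)-(15) are straightforward to evaluate once the four parameters of a two-state MMPP are known. The sketch below is one possible arrangement; the class name `MMPP2` and its field names are hypothetical. The third central moment is computed as the third raw moment minus $3\lambda_s\lambda_s^{(2)}$ minus $\lambda_s^3$, which is algebraically the same expression as Eq. (11).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MMPP2:
    phi1: float  # transition rate from state 1 to state 2
    phi2: float  # transition rate from state 2 to state 1
    lam1: float  # arrival rate while in state 1
    lam2: float  # arrival rate while in state 2

    def mean(self) -> float:                       # Eq. (9)
        return (self.lam1 * self.phi2 + self.lam2 * self.phi1) / (self.phi1 + self.phi2)

    def variance(self) -> float:                   # Eq. (10)
        return (self.phi1 * self.phi2 * (self.lam1 - self.lam2) ** 2
                / (self.phi1 + self.phi2) ** 2)

    def third_central_moment(self) -> float:       # Eq. (11), rearranged
        raw3 = (self.lam1 ** 3 * self.phi2 + self.lam2 ** 3 * self.phi1) / (self.phi1 + self.phi2)
        return raw3 - 3.0 * self.mean() * self.variance() - self.mean() ** 3

    def cov(self, t: float) -> float:              # Eq. (12)
        return self.variance() * math.exp(-(self.phi1 + self.phi2) * t)

    def delta(self) -> float:                      # Eq. (13)
        return 1.0 / (self.phi1 + self.phi2)

    def _k(self) -> float:
        # Common factor lam1*lam2 + lam1*phi2 + lam2*phi1 in Eqs. (14) and (15).
        return self.lam1 * self.lam2 + self.lam1 * self.phi2 + self.lam2 * self.phi1

    def scv(self) -> float:                        # Eq. (14)
        return 1.0 + 2.0 * self.variance() / self._k()

    def lag1_correlation(self) -> float:           # Eq. (15)
        return (self.lam1 * self.lam2 * self.variance()
                / (self.scv() * self._k() ** 2))


if __name__ == "__main__":
    src = MMPP2(phi1=1.0, phi2=1.0, lam1=3.0, lam2=1.0)  # illustrative values only
    print(src.mean(), src.variance(), src.scv(), src.lag1_correlation())
```

When both states share the same rate, the process degenerates to a Poisson process, and the sketch returns an SCV of 1 and a one-step correlation of 0, as expected.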
4.2 Modelling the Message Destination Distribution 
The message destinations often exhibit non-uniform distributions over the 
networks in on-chip interconnects. Hot-spot traffic is able to capture the 
characteristics of the non-uniform distribution of message destinations where a 
number of nodes direct a fraction of their messages to the hot-spot node [Ascia et al. 
2008; Pfister and Norton 1985]. The hot-spot traffic may lead to paralysis of the 
hot-spot node and even the whole network [Xiong, Liu and Sun 2001]. Hot-spot traffic 
has attracted significant research efforts over the past few years due to the strong 
evidence of its existence and its great effects on network performance [Ould-Khaoua 
and Sarbazi-Azad 2001; Sarbazi-Azad, Ould-Khaoua and Mackenzie 2001; Wu et al. 
2008]. For example, messages routed through an on-chip interconnect with the same 
destination address (i.e., the network coordinator) may result in contention. The 
hot-spot traffic model proposed in [Pfister and Norton 1985] is employed to generate the 
non-uniform distribution of message destinations in this study. Specifically, each message 
has the probability, $h$, to be directed to the hot-spot node, and the probability, $(1-h)$, 
of being evenly directed to all other network nodes. 
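The hot-spot destination model can be illustrated with a short sampling routine, for instance as it might appear in a traffic generator. This is only a sketch under stated assumptions: the function name is hypothetical, regular messages are drawn uniformly from all nodes other than the source, and the hot-spot node itself is assumed to generate regular messages only.

```python
import random


def pick_destination(source: int, hot_spot: int, num_nodes: int, h: float,
                     rng: random.Random) -> int:
    """Draw a destination node index in [0, num_nodes) under the hot-spot model."""
    # With probability h the message is a hot-spot message.
    # Assumption: the hot-spot node never sends hot-spot messages to itself.
    if source != hot_spot and rng.random() < h:
        return hot_spot
    # Otherwise it is a regular message, spread evenly over all nodes except the source.
    dest = rng.randrange(num_nodes - 1)
    return dest if dest < source else dest + 1


if __name__ == "__main__":
    rng = random.Random(2013)
    sample = [pick_destination(source=3, hot_spot=0, num_nodes=8, h=0.2, rng=rng)
              for _ in range(10)]
    print(sample)
```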
4.3 Derivation of the Analytical Model 
A critical performance metric used to evaluate on-chip interconnects is the 
communication latency [Moadeli et al. 2010], which consists of three parts: a) waiting 
time at the source; b) transmission delay that is the time for a message to cross the 
network; and c) the time for the tail flit to reach the destination. The latency reflects 
dynamic behaviours of the network and may be high if the network traffic is non-
uniformly distributed, e.g., in the presence of hot-spot traffic. 
Let us refer to the traffic caused by the regular messages and hot-spot messages 
as regular traffic and hot-spot traffic, respectively. This section considers the effects 
of both regular messages and hot-spot messages on the performance of on-chip 
interconnection networks. Let $L_r$ and $L_h$ represent the communication latency 
experienced by regular messages and hot-spot messages in on-chip interconnects, 
respectively. Since each message has the probability, $h$, to be directed to the hot-spot 
node and the probability, $(1-h)$, of being evenly directed to all other network nodes, the 
communication latency, $L$, for a given message in the on-chip interconnects 
[Sarbazi-Azad, Ould-Khaoua and Mackenzie 2001] can be given as follows: 
$$L = (1-h)L_r + hL_h \tag{16}$$
The regular messages and hot-spot messages experience different latencies due to 
the non-uniform traffic loads and varying blocking time over different network 
channels, depending on their locations with respect to the hot-spot node. In what 
follows, we first determine the waiting time at the source node, and then calculate the 
transmission delay and blocking time experienced by both regular messages and 
hot-spot messages. Finally, we determine the time for the tail flit to reach the destination 
so as to calculate the message communication latency. 
4.3.1 Waiting Time at the Source for Regular and Hot-Spot Messages 
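Let $MMPP_r$ denote the regular traffic arriving at the queue in the network, which 
is a fraction, $(1-h)$, of the traffic generated by the source. Based on the principle of 
splitting an MMPP [Fischer and Meier-Hellstern 1993], the corresponding 
infinitesimal generator $\mathbf{Q}_r$ and rate matrix $\boldsymbol{\Lambda}_r$ of $MMPP_r$ can be given by 
$$\mathbf{Q}_r = \mathbf{Q}_s = \begin{bmatrix} -\varphi_{r1} & \varphi_{r1} \\ \varphi_{r2} & -\varphi_{r2} \end{bmatrix} \quad \text{and} \quad \boldsymbol{\Lambda}_r = (1-h)\boldsymbol{\Lambda}_s = \mathrm{diag}(\lambda_{r1}, \lambda_{r2}) \tag{17}$$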
In this paper, we consider a finite buffer queue of size $B$ at the source; thus, 
arriving messages are dropped when the buffer becomes full. Let $Pl_s$ denote the 
probability that an arriving packet finds the buffer full; the calculation of $Pl_s$ will be 
given later by Eq. (19). The effective regular traffic entering the queue, denoted by 
$MMPP_{er}$, at the source is a fraction, $(1-Pl_s)$, of $MMPP_r$. Based on Eq. (17), the 
infinitesimal generator $\mathbf{Q}_{er}$ and rate matrix $\boldsymbol{\Lambda}_{er}$ of $MMPP_{er}$ can be determined. 
To calculate the waiting time experienced by the regular message at the source, 
we adopt a bi-variate Markov chain, as shown in Fig. 2. Let $P_{a,b}$, $a = 1, 2$ and $0 \le b \le B$, 
represent the probability that there are $b$ flits in the queue and the underlying 
Markov chain of the $MMPP_r$ is at state $a$. State $(a, b)$ corresponds to the case that 
there are $b$ flits in the queue and the $MMPP_r$ is at state $a$. The transition rate out of 
state $(a, b)$ to $(a, b+1)$ is $\lambda_{ra}$ given by Eq. (17). The rate from state $(a, b+1)$ to $(a, b)$ is 
the service rate $1/T_r$, where $T_r$ is the transmission delay for a regular message given 
by Eq. (29). The transition rate out of state $(1, b)$ to $(2, b)$ is $\varphi_{r1}$, while the rate from 
state $(2, b)$ to $(1, b)$ is $\varphi_{r2}$, where $\varphi_{r1}$ and $\varphi_{r2}$ can be given by Eq. (17). 
 
Fig. 2. State transition rate diagram of the queue system 
 
The transition rate matrix, $\Re$, of the bi-variate Markov chain can be obtained 
from Fig. 2. The steady-state probability vector, $\mathbf{P} = (P_{a,b}) = (\mathbf{P}_0, \mathbf{P}_1, \ldots, \mathbf{P}_B)$, where 
$\mathbf{P}_b = (P_{1,b}, P_{2,b})$, satisfies the equations $\mathbf{P}\Re = 0$ and $\mathbf{P}\mathbf{e} = 1$, where $\mathbf{e}$ is a 
column vector of ones. Let $P_b$, $0 \le b \le B$, denote 
the probability that there are $b$ flits in the buffer. $P_b$ is given by $P_b = \sum_{a=1}^{2} P_{a,b}$. 
According to Little's law [Kleinrock 1975], the waiting time, $W_r$, experienced by 
regular messages at the source can be determined by 
$$W_r = \frac{\sum_{b=0}^{B} bP_b}{\lambda_{er}} \tag{18}$$
where $\lambda_{er}$ is the mean arrival rate of $MMPP_{er}$ and can be computed based on Eq. (9). 
To determine $Pl_s$, let us first calculate the probability, $P'_b$, that there 
are $b$ flits in the queue observed by an arriving packet. $P'_b$ can be given by [Meier-Hellstern 1989] 
$$P'_b = \left(\sum_{k=0}^{B} \mathbf{P}_k \boldsymbol{\Lambda}_r \mathbf{e}\right)^{-1} \mathbf{P}_b \boldsymbol{\Lambda}_r \mathbf{e}$$
Therefore, the probability, $Pl_s$, that an 
arriving message finds the finite buffer full can be written as  
$$Pl_s = P'_B \tag{19}$$
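The steady-state calculation of the source queue can be sketched numerically as follows. The code builds the transition rate matrix of the bi-variate chain from the rates described for Fig. 2, solves $\mathbf{P}\Re = 0$ with $\mathbf{P}\mathbf{e} = 1$, and then evaluates the loss probability of Eq. (19) and the waiting time of Eq. (18). The function names are hypothetical, the service time $T_r$ is treated as a given input, and the plain Gaussian elimination is just one convenient way of solving the linear system, not necessarily the method used by the authors.

```python
def gauss_solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def solve_source_queue(phi1, phi2, lam1, lam2, service_time, B):
    """Return (P, P_arrival, loss, wait) for the finite-buffer MMPP source queue."""
    phi, lam, mu = (phi1, phi2), (lam1, lam2), 1.0 / service_time
    size = 2 * (B + 1)

    def idx(a, b):
        # a in {0, 1} stands for MMPP state 1 or 2; b is the queue occupancy.
        return 2 * b + a

    # Transition rate matrix of the bi-variate chain (each row sums to zero).
    R = [[0.0] * size for _ in range(size)]
    for b in range(B + 1):
        for a in range(2):
            i = idx(a, b)
            R[i][idx(1 - a, b)] = phi[a]          # change of MMPP state
            if b < B:
                R[i][idx(a, b + 1)] = lam[a]      # arrival (dropped when b == B)
            if b > 0:
                R[i][idx(a, b - 1)] = mu          # service completion, rate 1/T_r
            R[i][i] = -sum(R[i])

    # Solve P R = 0 together with P e = 1 (one balance equation is replaced).
    A = [[R[k][i] for k in range(size)] for i in range(size)]
    A[-1] = [1.0] * size
    pi = gauss_solve(A, [0.0] * (size - 1) + [1.0])

    P = [pi[idx(0, b)] + pi[idx(1, b)] for b in range(B + 1)]
    seen = [pi[idx(0, b)] * lam1 + pi[idx(1, b)] * lam2 for b in range(B + 1)]
    total = sum(seen)
    P_arrival = [w / total for w in seen]          # occupancy seen by arrivals
    loss = P_arrival[B]                            # Eq. (19)
    lam_r = (lam1 * phi2 + lam2 * phi1) / (phi1 + phi2)   # Eq. (9)
    wait = sum(b * P[b] for b in range(B + 1)) / ((1.0 - loss) * lam_r)  # Eq. (18)
    return P, P_arrival, loss, wait


if __name__ == "__main__":
    # Illustrative parameters only.
    print(solve_source_queue(phi1=0.2, phi2=0.4, lam1=0.9, lam2=0.1,
                             service_time=1.0, B=5))
```

If both MMPP states share the same rate, the chain reduces to an M/M/1/B queue, which provides a convenient check of the solver against the closed-form blocking probability.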
The output process of the regular traffic from the queue, $MMPP_{or}$, in the source 
node can be modelled approximately by that of the queueing system subject to the 
infinite buffer and $MMPP_r$ input traffic. This approximation is validated by 
comparing the analytical performance results with those obtained through 
simulation; it is worth noting that this approximation of the output process is not 
used in the simulation experiments. $MMPP_{or}$ can be obtained by matching the 
moments of the inter-departure time of the packets. Following the method used in 
[Ferng and Chang 2001] to derive the output process from a queueing system with 
MMPP input, the infinitesimal generator $\mathbf{Q}_{or}$ and rate matrix $\boldsymbol{\Lambda}_{or}$ of $MMPP_{or}$ can be 
determined. 
Similarly, let $MMPP_h$ denote the hot-spot traffic arriving at the queue in the source 
node, which is a fraction, $h$, of its generated traffic. Based on the method for deriving 
the expression of $W_r$, we can readily obtain the waiting time, $W_{hj}$, experienced by the 
$2j$-channel hot-spot messages at the source node. The output of the hot-spot traffic 
from the queue, $MMPP_{oh}$, can be determined accordingly. 
4.3.2 Transmission Delay in m-Port n-Tree Based On-Chip Interconnects 
In this section, we first determine the traffic characteristics in m-port n-tree based 
on-chip interconnects under bursty multimedia traffic and hot-spot destinations, and 
then calculate the transmission delay and blocking time experienced by both regular 
messages and hot-spot messages in on-chip interconnects. 
A. Traffic Characteristics for Regular Messages and Hot-Spot Messages 
Due to the uniformity of regular messages on network channels, the arrivals of
regular traffic at network channels exhibit similar statistical behavior. Since the
network has $N_{node}$ source nodes and $4nN_{node}$ network channels [Javadi, Akbari and
Abawajy 2006], the regular traffic arriving at a given network channel is $f_r$
times the traffic that enters the network from the queue in a source node. $f_r$
can be given by
$$f_r = \frac{N_{node}\, d}{4 n N_{node}} = \frac{d}{4n} \qquad (20)$$
Generally, $f_r$ is not an integer value because it is determined by the network size,
the properties of traffic generated by the source, and the hot-spot fraction. Let $Z_r$
and $F_r$ denote the integral and fractional parts of $f_r$. Given that the splitting and
superposition of multiple MMPPs are again an MMPP [Fischer and Meier-Hellstern
1993; Heffes and Lucantoni 1986], let $MMPP_{cr}$ denote the regular traffic arriving at a
given network channel. $MMPP_{cr}$ can be determined by the superposition of $Z_r$ traffic
flows modelled by $MMPP_{or}$ and one $MMPP_F$, where $MMPP_F$ represents the resulting
traffic flow from the splitting of $MMPP_{or}$ with the splitting probability $F_r$. According
to Eq. (17), the infinitesimal generator $Q_F$ and rate matrix $\Lambda_F$ of $MMPP_F$ can be
obtained.
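The split of $f_r$ into its integral and fractional parts can be illustrated with a minimal Python sketch. The function name and the sample values of `d` and `n` are hypothetical and chosen only for illustration; `d` stands for the quantity $d$ that appears in Eq. (20).

```python
import math


def split_channel_factor(d, n):
    """Return (f_r, Z_r, F_r), with f_r = d / (4 n) as in Eq. (20)."""
    f_r = d / (4 * n)
    z_r = math.floor(f_r)   # number of complete MMPP_or flows superposed
    frac_r = f_r - z_r      # splitting probability that produces MMPP_F
    return f_r, z_r, frac_r


# Hypothetical inputs, not taken from the paper.
print(split_channel_factor(10.0, 2))  # (1.25, 1, 0.25)
```

With these illustrative inputs, a channel would carry one full $MMPP_{or}$ flow plus a flow obtained by splitting $MMPP_{or}$ with probability 0.25.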
The parameters of $MMPP_{cr}$ can be determined by matching the following four
statistical characteristics: mean, variance, third central moment, and integral of the
covariance function of the arrival rate. Based on the parameter matrices of $MMPP_{or}$
and $MMPP_F$, their statistical characteristics can be calculated using Eqs. (9)-(13).
Since $MMPP_{cr}$ is the superposition of $Z_r$ $MMPP_{or}$ flows and one $MMPP_F$, we can further
obtain the mean ($\lambda_{cr}$), variance ($\lambda_{cr}^{(2)}$), third central moment ($\lambda_{cr}^{(3)}$), and integral of
the covariance function ($\delta_{cr}$) of the traffic rate of $MMPP_{cr}$ and compute its
infinitesimal generator $Q_{cr}$ and rate matrix $\Lambda_{cr}$ as follows [Heffes 1980; Min and
Ould-Khaoua 2004]
$$Q_{cr} = \begin{bmatrix} -\varphi_{cr1} & \varphi_{cr1} \\ \varphi_{cr2} & -\varphi_{cr2} \end{bmatrix} \quad \text{and} \quad \Lambda_{cr} = \mathrm{diag}(\lambda_{cr1}, \lambda_{cr2}) \qquad (21)$$
where the parameters, $\varphi_{cr1}$, $\varphi_{cr2}$, $\lambda_{cr1}$ and $\lambda_{cr2}$ are given by
$$\varphi_{cr\ell} = \begin{cases} \dfrac{1}{\delta_{cr}(1+\eta_{cr})}, & \ell = 1 \\ \dfrac{\eta_{cr}}{\delta_{cr}(1+\eta_{cr})}, & \ell = 2 \end{cases} \qquad (22)$$
$$\lambda_{cr\ell} = \begin{cases} \lambda_{cr} + \sqrt{\lambda_{cr}^{(2)}/\eta_{cr}}, & \ell = 1 \\ \lambda_{cr} - \sqrt{\lambda_{cr}^{(2)}\,\eta_{cr}}, & \ell = 2 \end{cases} \qquad (23)$$
$$\eta_{cr} = 1 + \frac{\upsilon_{cr}}{2}\left(\upsilon_{cr} - \sqrt{4 + \upsilon_{cr}^{2}}\right) \qquad (24)$$
$$\upsilon_{cr} = \frac{\lambda_{cr}^{(3)}}{\sqrt{\left(\lambda_{cr}^{(2)}\right)^{3}}} \qquad (25)$$
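The matching in Eqs. (22)-(25) can be checked numerically with a short round trip: compute the four rate statistics of a known two-state MMPP, then recover its parameters. The sketch below is illustrative only. The helper `mmpp2_moments` uses the standard stationary identities of a two-state process and is merely assumed to play the role of Eqs. (9)-(13); $\delta$ is treated as $1/(\varphi_1 + \varphi_2)$, which is what Eq. (22) implies because its two rates sum to $1/\delta_{cr}$. Function names are hypothetical.

```python
import math


def mmpp2_moments(phi1, phi2, lam1, lam2):
    """Rate statistics of a two-state MMPP with generator [[-phi1, phi1], [phi2, -phi2]]."""
    p1 = phi2 / (phi1 + phi2)      # stationary probability of state 1
    p2 = phi1 / (phi1 + phi2)      # stationary probability of state 2
    mean = p1 * lam1 + p2 * lam2
    var = p1 * p2 * (lam1 - lam2) ** 2
    third = p1 * p2 * (p2 - p1) * (lam1 - lam2) ** 3
    delta = 1.0 / (phi1 + phi2)    # assumed meaning of delta in this sketch
    return mean, var, third, delta


def match_mmpp2(mean, var, third, delta):
    """Recover (phi1, phi2, lam1, lam2) following Eqs. (22)-(25); requires var > 0."""
    upsilon = third / math.sqrt(var ** 3)                                    # Eq. (25)
    eta = 1.0 + 0.5 * upsilon * (upsilon - math.sqrt(4.0 + upsilon ** 2))    # Eq. (24)
    phi1 = 1.0 / (delta * (1.0 + eta))                                       # Eq. (22)
    phi2 = eta / (delta * (1.0 + eta))
    lam1 = mean + math.sqrt(var / eta)                                       # Eq. (23)
    lam2 = mean - math.sqrt(var * eta)
    return phi1, phi2, lam1, lam2


# Round trip with the transition rates used for validation and a hypothetical arrival rate.
stats = mmpp2_moments(0.08, 0.04, 0.05, 0.0)
print(match_mmpp2(*stats))
```

The matching always returns the state with the higher rate as state 1, so the round trip reproduces the inputs only when `lam1 >= lam2`. A pure Poisson input (zero variance) is outside the scope of this sketch.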
Because of the non-uniformity of hot-spot messages on network channels, the hot-spot
traffic on different channels varies and can be determined according to their
locations with respect to the hot-spot node. With hot-spot traffic, the network
channels located at the same distance from the hot-spot node have identical traffic
characteristics. Therefore, we need to determine the number of channels, $N_i^{channel}$,
located $i$ hops away from the hot-spot node and traversed by the hot-spot messages.
$N_i^{channel}$ can be given by
 






≤≤+








−





≤≤
= −−
ninmm
ni
N nichanneli 21              1
2
2
1                                         2
1  (26) 
Let $MMPP_{ch_i}$ denote the arrival process of hot-spot messages on network channels
located $i$ hops away from the hot-spot node. $MMPP_{ch_i}$ can be obtained by
considering the following two cases:
a) With the use of deterministic routing, the channels located $i$ $(1 \le i \le n)$ hops
away from the hot-spot node can receive messages generated from the nodes
located at least $2i$ hops away from the hot-spot node. The number of nodes
located $2i$ $(1 \le i \le n)$ hops away from the hot-spot node, $N_i^{node}$, can be
expressed as
$$N_i^{node} = \begin{cases} \left(\dfrac{m}{2} - 1\right)\left(\dfrac{m}{2}\right)^{i-1}, & 1 \le i < n \\ (m-1)\left(\dfrac{m}{2}\right)^{i-1}, & i = n \end{cases} \qquad (27)$$
b) The channels located $i$ $(n+1 \le i \le 2n)$ hops away from the hot-spot node can
receive messages generated by $2^{2n-i} N_i^{channel}$ nodes in the m-port n-tree.
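The node counts in case a) can be tabulated with a minimal Python sketch. It assumes the piecewise form $(m/2 - 1)(m/2)^{i-1}$ for $1 \le i < n$ and $(m-1)(m/2)^{i-1}$ for $i = n$, and the usual m-port n-tree size of $2(m/2)^n$ processing nodes; the function name is hypothetical.

```python
def nodes_at_distance(m, n):
    """Number of nodes located 2i hops from a given node, for i = 1..n."""
    half = m // 2
    counts = {}
    for i in range(1, n + 1):
        if i < n:
            counts[i] = (half - 1) * half ** (i - 1)
        else:
            counts[i] = (m - 1) * half ** (i - 1)
    return counts


# 8-port 2-tree, the configuration used for validation in Section 5.
counts = nodes_at_distance(8, 2)
print(counts)                                    # {1: 3, 2: 28}
print(sum(counts.values()) == 2 * 4 ** 2 - 1)    # True
```

The counts sum to the network size minus one, i.e. every node other than the hot-spot node is accounted for exactly once.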
Since there are $N_i^{channel}$ channels located $i$ hops away from the hot-spot node to
be traversed by the hot-spot messages, the hot-spot traffic arriving at a given
network channel located $i$ hops away from the hot-spot node is $f_{ch_i}$ times the
traffic that enters the network from the queue in the source node. $f_{ch_i}$ can be
expressed as
$$f_{ch_i} = \begin{cases} \dfrac{\sum_{\ell=i}^{n} N_\ell^{node}}{N_i^{channel}}, & 1 \le i \le n \\ 2^{2n-i}, & n+1 \le i \le 2n \end{cases} \qquad (28)$$
Adopting a method similar to that used for the regular traffic on network
channels, we can readily obtain the infinitesimal generator, $Q_{ch_i}$, and rate matrix,
$\Lambda_{ch_i}$, of $MMPP_{ch_i}$. The superposition of these two types of traffic yields the loads at
network channels that are $i$ hops away from the hot-spot node, modelled by $MMPP_{c_i}$
with the infinitesimal generator, $Q_{c_i}$, and rate matrix, $\Lambda_{c_i}$.
B. Transmission Delay for Regular Messages and Hot-Spot Messages 
Since regular messages may cross different numbers of channels to reach their
destinations, we denote the transmission delay of a $2j$-channel regular
message (i.e., a message that needs to traverse $2j$ channels to reach its destination) as
$T_{r,j}$. Averaging over all the possible destinations of a given regular message yields
the transmission delay as
$$T_r = \sum_{j=1}^{n} P_j T_{r,j} \qquad (29)$$
For the sake of clarity, the numbering of network stages in the m-port n-tree topology
is based on the location of switches between the source and destination. The
numbering starts from the stage next to the source (stage 0) and increases toward
the destination. In the m-port n-tree, the number of stages to be crossed by a
$2j$-channel message is $K = 2j - 1$. Since messages are transferred to the local
processing core upon arriving at their destinations, the analysis starts from the last
stage and continues backward to the first stage. Therefore, the service time
experienced by regular messages on network channels at the last stage ($K-1$),
$T_{r,K-1,j}$, can be given by
$$T_{r,K-1,j} = \varpi t_{cn} \qquad (30)$$
where $\varpi$ is the message length in flits.
The service time experienced by messages on the network channels at the internal
stages $k$ $(0 \le k \le K-2)$ is composed of the actual message transmission time and the
delay due to blocking at subsequent stages. Thus, the service time, $T_{r,k,j}$, experienced
by messages on network channels at internal stages can be expressed as
$$T_{r,k,j} = \varpi t_{cs} + \sum_{\ell=k+1}^{K-1} Wb_{r,\ell,j}, \qquad 0 \le k \le K-2 \qquad (31)$$
where $Wb_{r,k,j}$ is the blocking time that messages experience to acquire a channel at
stage $k$. $T_{r,j}$ is the service time of a regular message at stage 0, i.e., $T_{r,j} = T_{r,0,j}$.
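The backward evaluation from the last stage to stage 0 can be sketched in Python as follows. In the model the blocking times themselves depend on the service times of later stages; here they are passed in as a hypothetical list purely to illustrate the recursion, and the function name and sample values are not taken from the paper.

```python
def service_times(num_stages, msg_len, t_cn, t_cs, blocking):
    """Per-stage service times for one message class.

    blocking[k] is the blocking time at stage k, for k = 0 .. num_stages - 1.
    The last stage has service time msg_len * t_cn; an earlier stage k has
    msg_len * t_cs plus the blocking times of all later stages.
    """
    times = [0.0] * num_stages
    times[num_stages - 1] = msg_len * t_cn
    for k in range(num_stages - 2, -1, -1):
        times[k] = msg_len * t_cs + sum(blocking[k + 1:num_stages])
    return times


# A 4-channel message (j = 2) crosses K = 2j - 1 = 3 stages; the values are hypothetical.
print(service_times(3, 16, 0.5, 0.25, [0.0, 1.5, 2.0]))  # [7.5, 6.0, 8.0]
```

The first entry of the returned list corresponds to the stage-0 service time, which the text identifies with the transmission delay of the message.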
Similarly, we denote the transmission delay of a $2j$-channel hot-spot message
as $T_{h,j}$. Following the derivation of $T_{r,k,j}$ given by Eqs. (30) and (31), the service time,
$T_{h,k,j}$, experienced by $2j$-channel hot-spot messages on network channels can be
readily determined.
C. Blocking Time for Regular Messages and Hot-Spot Messages on Network Channels 
The blocking time experienced by $2j$-channel regular messages on network
channels at stage $k$, $Wb_{r,k,j}$, can be determined by the blocking probability of
messages at this stage, $Pb_{k,j}$, and the waiting time, $Wc_{k,j}$, that the messages
experience to acquire a channel when blocking occurs. Since there are $N_i^{channel}$
channels that are $i$ $(1 \le i \le 2n)$ hops away from the hot-spot node traversed by the
regular messages and hot-spot messages, and the total number of output channels
that are $i$ $(1 \le i \le 2n)$ hops away from the hot-spot node is $N_i^{t\_channel}$, the probability
that a channel located $i$ hops away from the hot-spot node is traversed by
both regular messages and hot-spot messages (i.e., the superposed messages) is
$N_i^{channel} / N_i^{t\_channel}$. Therefore, $Wb_{r,k,j}$ can be expressed as
$$Wb_{r,k,j} = \frac{N_i^{channel}}{N_i^{t\_channel}}\, Pb_{k,j}\, Wc_{k,j} + \left(1 - \frac{N_i^{channel}}{N_i^{t\_channel}}\right) Pb_{r,k,j}\, Wc_{r,k,j} \qquad (32)$$
where $i = 2j - k - 1$. $Pb_{r,k,j}$ and $Pb_{k,j}$ are the blocking probabilities of the regular
messages and all messages (i.e., including both regular messages and hot-spot
messages) at stage $k$. $Wc_{r,k,j}$ and $Wc_{k,j}$ are the waiting times experienced by the
regular messages and all messages to acquire a channel in the event of blocking.
$N_i^{t\_channel}$ can be given by
 





≤≤+−
≤≤





=
−
ninN
nimN
node
i
channelt
i
21              )1(2
1                   
2
2
1
_  (33) 
Taking both the regular messages and hot-spot messages with their appropriate
weights into account yields the service time on network channels at stage $k$ as follows:
$$T_{k,j} = \frac{\lambda_{cr}}{\lambda_{c_i}}\, T_{r,k,j} + \frac{\lambda_{ch_i}}{\lambda_{c_i}}\, T_{h,k,j} \qquad (34)$$
where $\lambda_{cr}$, $\lambda_{ch_i}$ and $\lambda_{c_i}$ denote the mean arrival rates of regular traffic, hot-spot
traffic and the superposed traffic on network channels located $i$ hops away from
the hot-spot node. These quantities can be obtained by virtue of Eq. (9).
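The two weighted averages used in this part of the model, the blocking time of Eq. (32) and the service time of Eq. (34), can be written as a small Python sketch. The function and argument names are hypothetical, all numeric inputs are illustrative, and the superposed mean rate is taken as the sum of the regular and hot-spot mean rates.

```python
def blocking_time_regular(n_channel, n_total, pb_all, wc_all, pb_reg, wc_reg):
    """Eq. (32): mix of 'superposed traffic' and 'regular traffic only' channels."""
    share = n_channel / n_total   # fraction of channels that also carry hot-spot messages
    return share * pb_all * wc_all + (1.0 - share) * pb_reg * wc_reg


def mean_service_time(lam_regular, lam_hotspot, t_regular, t_hotspot):
    """Eq. (34): service time weighted by each class's share of the channel load."""
    lam_total = lam_regular + lam_hotspot
    return (lam_regular / lam_total) * t_regular + (lam_hotspot / lam_total) * t_hotspot


# Hypothetical values for illustration only.
print(blocking_time_regular(1, 4, 0.5, 8.0, 0.25, 4.0))  # 1.75
print(mean_service_time(0.03, 0.01, 10.0, 20.0))
```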
The blocking probability, $Pb_{k,j}$, can be determined using a Markov chain. The
state of the Markov chain is described by a pair of random variables, $(t, s)$, where $t$
denotes the status of the channel and $s$ is the state of $MMPP_{c_i}$. The transition rate
out of state $(t, s)$ to $(t+1, s)$ is $\lambda_{c_i s}$, where $\lambda_{c_i s}$ is the traffic rate on network channels
when $MMPP_{c_i}$ is at state $s$, while the rate from $(t+1, s)$ to $(t, s)$ is $1/T_{k,j} - \lambda_{c_i s}$. The
transition rates out of state $(t, s)$ to $(t, s+1)$ and out of $(t, s)$ to $(t, s-1)$ are $\varphi_{c_i 2}$ and
$\varphi_{c_i 1}$, respectively. Obtaining the steady-state vector of the Markov chain yields
the blocking probability $Pb_{k,j}$ [Min and Ould-Khaoua 2004].
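A minimal Python sketch of such a chain is given below, using only the standard library. It rests on several assumptions that are not stated in the text: the channel status takes two values (idle and busy), the blocking probability is read off as the stationary probability of the busy states, and `phi[s]` is the rate of leaving modulating state `s`. All names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def blocking_probability(service_time, lam, phi):
    """Stationary probability that the channel is busy.

    lam = (rate in MMPP state 1, rate in MMPP state 2)
    phi = (rate of leaving state 1, rate of leaving state 2)
    """
    if any(1.0 / service_time - rate < 0.0 for rate in lam):
        raise ValueError("1/service_time must not be smaller than any arrival rate")
    # State index 2*t + s, with t = 0 (idle) or 1 (busy) and s = 0 or 1.
    q = [[0.0] * 4 for _ in range(4)]
    for t in (0, 1):
        for s in (0, 1):
            here = 2 * t + s
            q[here][2 * t + (1 - s)] += phi[s]                    # modulating state changes
            if t == 0:
                q[here][2 + s] += lam[s]                          # idle -> busy
            else:
                q[here][s] += 1.0 / service_time - lam[s]         # busy -> idle
    for i in range(4):
        q[i][i] = -sum(q[i][j] for j in range(4) if j != i)
    # Solve pi * Q = 0 with sum(pi) = 1: transpose Q, replace one row by the normalisation.
    a = [[q[j][i] for j in range(4)] for i in range(4)]
    a[3] = [1.0] * 4
    pi = solve_linear(a, [0.0, 0.0, 0.0, 1.0])
    return pi[2] + pi[3]


# Equal rates in both states make the input Poisson, so the result is rate * service time.
print(blocking_probability(20.0, (0.02, 0.02), (0.08, 0.04)))
```

When both modulating states have the same arrival rate, the busy probability reduces to the channel utilisation, which offers one reading of why the return rate is written as $1/T_{k,j} - \lambda_{c_i s}$.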
Since the hot-spot traffic is non-uniformly distributed over the network channels,
the waiting time experienced by messages due to blocking on network channels
depends on the location of the current network channel with respect to the hot-spot
node, because the traffic rate varies from one channel to another. To determine the
waiting time, $Wc_{k,j}$, the network channels are modelled as MMPP/G/1 queueing
systems [Fischer and Meier-Hellstern 1993]. As the arrival process is modelled by
$MMPP_{c_i}$ and the service time is $T_{k,j}$, $Wc_{k,j}$ can be expressed as
$$Wc^{v}_{k,j} = \frac{2\rho_{c_i,k} + \lambda_{c_i} t^{(2)}_{c_i,k}}{2(1-\rho_{c_i,k})} - \frac{t_{c_i,k}\left((1-\rho_{c_i,k})\,\mathbf{g}_{c_i,k} + t_{c_i,k}\,\boldsymbol{\pi}_{c_i}\Lambda_{c_i}\right)\left(Q_{c_i} + \mathbf{e}_c\boldsymbol{\pi}_{c_i}\right)^{-1}\hat{\boldsymbol{\lambda}}_{c_i}}{1-\rho_{c_i,k}} \qquad (35)$$
$$Wc_{k,j} = \frac{2\,Wc^{v}_{k,j} - \lambda_{c_i} t^{(2)}_{c_i,k}}{2\rho_{c_i,k}} \qquad (36)$$
where $i = 2j - k - 1$. In the above two equations, $t_{c_i,k}$ and $t^{(2)}_{c_i,k}$ denote the first two
moments of the service time on network channels and can be determined from the
Laplace-Stieltjes transform of the service time on network channels at stage $k$
[Kleinrock 1975]. The traffic intensity is $\rho_{c_i,k} = \lambda_{c_i} t_{c_i,k}$, where $\lambda_{c_i}$ is the mean traffic
rate arriving at the network channels and is equal to $\boldsymbol{\pi}_{c_i}\hat{\boldsymbol{\lambda}}_{c_i}$. $\boldsymbol{\pi}_{c_i}$ is the steady-state
vector of $MMPP_{c_i}$ and $\hat{\boldsymbol{\lambda}}_{c_i} = \Lambda_{c_i}\mathbf{e}_c$. $\mathbf{e}_c$ is the column unit vector of length 2. The
algorithm for computing the matrix $\mathbf{g}_{c_i,k}$ can be found in [Fischer and Meier-Hellstern
1993]. $Wc_{r,k,j}$ can be determined according to Eqs. (35) and (36), by
modelling the network channel as an MMPP/G/1 queueing system where the arrival
process is modelled by $MMPP_{cr}$ and the service time is $T_{r,k,j}$.
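As a plausibility check on Eqs. (35) and (36), one can collapse the MMPP to a single state, which makes the arrival process Poisson. The reduction below assumes the scalar values $g = 1$, $\pi = 1$, $Q = 0$, $e = 1$ and $\Lambda = \hat{\lambda} = \lambda$, with subscripts dropped for readability and $\rho = \lambda t$. It is a sketch of a limiting case and not part of the original derivation.

```latex
\begin{aligned}
Wc^{v} &= \frac{2\rho + \lambda t^{(2)}}{2(1-\rho)}
          - \frac{t\,\bigl((1-\rho) + t\lambda\bigr)\,\lambda}{1-\rho}
        = \frac{2\rho + \lambda t^{(2)} - 2\rho}{2(1-\rho)}
        = \frac{\lambda t^{(2)}}{2(1-\rho)}, \\
Wc &= \frac{2\,Wc^{v} - \lambda t^{(2)}}{2\rho}
    = \frac{\lambda t^{(2)}}{2\rho}\left(\frac{1}{1-\rho} - 1\right)
    = \frac{\lambda t^{(2)}}{2(1-\rho)} .
\end{aligned}
```

Both expressions coincide with the Pollaczek-Khinchine mean waiting time of an M/G/1 queue, as would be expected for Poisson arrivals.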
4.3.3 Time for the Tail Flit of Regular and Hot-Spot Messages to Reach the Destination 
The mean time for the tail flit of a regular message to reach the destination, $\tau_r$, can
be given by
$$\tau_r = (d-2)\,t_{cs} + t_{cn} \qquad (37)$$
The time for the tail flit of a $2j$-channel hot-spot message to reach the destination,
$\tau_{h_j}$, can be determined by
$$\tau_{h_j} = \sum_{k=1}^{K-1} t_{cs} + t_{cn} \qquad (38)$$
The communication latency for the regular messages, $L_r$, can be written as
$L_r = W_r + T_r + \tau_r$, and that for a $2j$-channel hot-spot message can be given by
$L_{h_j} = W_{h_j} + T_{h,j} + \tau_{h_j}$. Averaging over all the possible values of $j$ gives the communication
latency for a hot-spot message as $L_h$.
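The way the three latency components combine can be sketched in Python. The function names and all numeric values are hypothetical; `t_cs` and `t_cn` stand for the per-channel times that appear in Eqs. (37) and (38), and `channels` is $d$ for a regular message or $2j$ for a hot-spot message (so that $K - 1 = 2j - 2$).

```python
def tail_flit_time(channels, t_cs, t_cn):
    """Time for the tail flit to arrive: (channels - 2) * t_cs + t_cn."""
    return (channels - 2) * t_cs + t_cn


def message_latency(source_wait, transmission_delay, tail_time):
    """Latency as the sum of source waiting time, transmission delay and tail-flit time."""
    return source_wait + transmission_delay + tail_time


# Hypothetical 4-channel message.
tau = tail_flit_time(4, 0.5, 0.25)
print(tau)                              # 1.25
print(message_latency(3.0, 7.5, tau))   # 11.75
```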
4.3.4 Implementation of the Analytical Model 
To facilitate the understanding of the derivation of the analytical model, we outline
below the key steps for implementing the model.
Step 1: Calculate the parameter matrices denoting the traffic patterns on network
channels.
Step 1.1: Calculate the parameter matrices of $MMPP_{cr}$ for modelling the regular
traffic arriving at a given network channel using Eqs. (20)-(25);
Step 1.2: Calculate the parameter matrices of $MMPP_{ch_i}$ for modelling the hot-spot
traffic arriving at network channels located $i$ hops away from the hot-spot
node based on Eqs. (21)-(25) and (28);
Step 1.3: Apply Eqs. (21)-(25) again to calculate the parameter matrices of
$MMPP_{c_i}$ for modelling the traffic at network channels that are $i$ hops away from
the hot-spot node.
Step 2: Based on the parameter matrices of the traffic patterns obtained from Step 1,
calculate the communication latency for regular messages and hot-spot messages in
on-chip interconnects.
Step 2.1: Calculate the waiting time at the source node for regular messages and
hot-spot messages using Eq. (18);
Step 2.2: Calculate the transmission delay for regular messages and hot-spot
messages using Eqs. (30)-(32);
Step 2.3: Calculate the time for the tail flit of a message to reach the destination
using Eqs. (37) and (38).
Step 3: Based on the communication latencies derived from Step 2, calculate the
communication latency for a given message in the on-chip interconnects using Eq.
(16).
5. VALIDATION OF THE MODEL 
The accuracy of the analytical model is validated by means of a discrete-event
simulator, operating at the flit level, based on the OMNeT++ simulation framework. The
communication latency is defined as the mean amount of time from the generation of
a message until the last data flit reaches the processing core of the destination.
Extensive simulation experiments have been performed to validate the accuracy of
the model for various combinations of message lengths, parameter matrices of $MMPP_s$
and hot-spot fractions. However, for the sake of specific illustration and without loss
of generality, the latency results are presented for the following cases [Moadeli et al.
2010; Salminen, Kulmala and Hamalainen 2008; Wu et al. 2011]: an 8-port 2-tree is used to
construct the underlying on-chip interconnects; message length: $\varpi$ = 16 and 32 flits;
flit length: $L_\varpi$ = 16 bytes; buffer size: $B$ = 32 flits; the bandwidth is set to be 20
messages per cycle; the point-to-point latency and switch latency are 0.2 cycles;
parameters $\varphi_{s1}$ and $\varphi_{s2}$ in the infinitesimal generator $Q_s$ of $MMPP_s$ are set to
$\varphi_{s1}$ = 0.08, $\varphi_{s2}$ = 0.04 (i.e., $\varphi_{s1} = 2\varphi_{s2}$) and $\varphi_{s1}$ = 0.09, $\varphi_{s2}$ = 0.06 (i.e., $\varphi_{s1} = 3\varphi_{s2}/2$),
representing different degrees of traffic burstiness and correlation; the hot-spot
fraction is set to $h$ = 0.05, 0.1, 0.15 and 0.2, representing different degrees of
non-uniformity of message destinations.
Figs. 3 and 4 depict the performance results for the communication latency
predicted by the analytical model plotted against those provided by the simulator as
a function of the traffic rate in the 8-port 2-tree on-chip interconnects. In these
figures, the horizontal axis represents the traffic rate, $\lambda_{s1}$, at which a processing core
injects messages into the network when $MMPP_s$ is at state 1, while the vertical axis
denotes the communication latency. For the sake of clarity of the figures, we have
deliberately set the arrival rate at state 2, $\lambda_{s2}$, to zero; otherwise three-dimensional
graphs would be needed to represent the results.
 
[Figure 3: four panels, (a)-(d), each plotting communication latency (cycles) against traffic rate (messages/cycle) for the analytical model (Ana) and the simulation (Sim) with $\varpi$ = 16 and $\varpi$ = 32.]
Fig. 3. Communication latency predicted by the analytical model against simulation experiments with $\varphi_{s1}$ = 0.08 and $\varphi_{s2}$ =
0.04: (a) h = 0.05, (b) h = 0.1, (c) h = 0.15, and (d) h = 0.2.
 
These figures reveal that the communication latency obtained from the
derived model closely matches that obtained from the simulation: the average
prediction error, calculated as $|result_{simulation} - result_{analytical}| / result_{simulation}$ over
all simulation points, is less than 6%. The tractability and accuracy of the model
make it a practical and cost-effective tool to gain insight into the performance of
on-chip interconnection networks in the presence of bursty multimedia traffic with
hot-spot destinations.
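The error metric described above can be computed as in the following Python sketch. The function name and the sample latencies are hypothetical and do not reproduce any data from Figs. 3 and 4; the use of the absolute difference is an assumption about the intended formula.

```python
def mean_relative_error(simulated, predicted):
    """Average of |simulated - predicted| / simulated over all simulation points."""
    if not simulated or len(simulated) != len(predicted):
        raise ValueError("both sequences must be non-empty and of equal length")
    total = sum(abs(s - p) / s for s, p in zip(simulated, predicted))
    return total / len(simulated)


# Hypothetical latencies in cycles.
print(mean_relative_error([100.0, 200.0], [95.0, 210.0]))
```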
 
[Figure 4: four panels, (a)-(d), each plotting communication latency (cycles) against traffic rate (messages/cycle) for the analytical model (Ana) and the simulation (Sim) with $\varpi$ = 16 and $\varpi$ = 32.]
Fig. 4. Communication latency predicted by the analytical model against simulation experiments with $\varphi_{s1}$ = 0.09 and $\varphi_{s2}$ =
0.06: (a) h = 0.05, (b) h = 0.1, (c) h = 0.15, and (d) h = 0.2.
 
6. PERFORMANCE ANALYSIS 
6.1 The Impact of Traffic Patterns on Network Performance 
Having validated the accuracy of the analytical model, we now use it to investigate the
effects of bursty multimedia traffic and hot-spot destinations, with different
degrees of traffic burstiness and correlation imposed by the MMPP input parameters
(which can be calculated by Eqs. (14) and (15)) and different hot-spot fractions, on the
performance of on-chip interconnection networks. We consider four cases of
parameter settings for various traffic patterns, as shown in Table 2: Case (I),
non-bursty Poisson traffic with uniform destinations; Case (II), non-bursty Poisson
traffic with hot-spot destinations; Case (III), bursty multimedia traffic with
uniform destinations; and Case (IV), bursty multimedia traffic with hot-spot
destinations.
Table 2 Parameter Settings for Traffic Patterns
Cases        Parameter Settings
Case (I)     Poisson traffic with uniform destinations
Case (II)    Poisson traffic with hot-spot destinations ($h$ = 0.1)
Case (III)   Bursty multimedia traffic ($\varphi_{s1}$ = 0.09, $\varphi_{s2}$ = 0.06) with uniform destinations
Case (IV)    Bursty multimedia traffic ($\varphi_{s1}$ = 0.09, $\varphi_{s2}$ = 0.06) with hot-spot destinations ($h$ = 0.1)
 
Fig. 5 depicts the results predicted by the derived analytical model in the presence
of bursty multimedia traffic with uniform and hot-spot destinations under Case (III)
and Case (IV), respectively, and those for non-bursty Poisson traffic with
uniform and hot-spot destinations under Case (I) and Case (II), respectively.
Comparing the results of Case (IV) and Case (II) in Fig. 5(a) shows that
bursty multimedia traffic degrades the network performance considerably: the
communication latency increases, especially under moderate and heavy traffic loads,
and the maximum throughput that the network is able to support decreases.
The comparison between Case (I) and Case (II) shows that the maximum network
throughput decreases significantly and the network performance degrades in the
presence of hot-spot destinations. This is because hot-spot traffic causes higher traffic
loads on network channels located closer to the hot-spot node, so these channels
become overloaded quickly. To assess the impact of hot-spot destinations on the
performance of multimedia embedded systems, we further compare the results under
Case (III) and Case (IV) and observe a trend similar to that between Case (I) and
Case (II). Examining Fig. 5(b) for a different message size reveals the same behavior.
 
[Figure 5: two panels, (a) and (b), each plotting communication latency (cycles) against traffic rate (messages/cycle) for Case (I), Case (II), Case (III) and Case (IV).]
Fig. 5. The performance comparison of on-chip interconnection networks under different traffic patterns with the
parameter setting for bursty multimedia traffic and hot-spot destinations in Table 2: (a) $\varpi$ = 16 and (b) $\varpi$ = 32.
 
From the above analysis, we find that the proposed model predicts
the increase in the communication latency and the decrease in the maximum network
throughput in the presence of bursty multimedia traffic with hot-spot destinations.
These observations highlight the importance of developing and using realistic
models for the study and optimisation of on-chip interconnection networks. The
results also demonstrate that the network suffers significant performance
degradation in the presence of bursty multimedia traffic with hot-spot destinations.
6.2 Comparison of Runtime between Analytical Model and Simulator 
To investigate the efficiency of the analytical model, in this section we compare the
runtime required by the analytical model and simulation experiments to obtain the
desired performance results. To this end, we use the scenario of Fig. 3(a) as an
example and present the runtime to obtain the analytical and simulation results with
$\varpi$ = 16 and 32. The other parameter settings are the same as those presented in
Section 5. All the results were obtained on a 32-bit PC using an Intel(R) Core(TM) 2
Quad CPU 2.66GHz with 3.46GB of RAM. Tables 3 and 4 list the runtime, for $\varpi$ = 16
and 32, respectively, required by the analytical model and simulation experiments.
The tables reveal that the total runtime required to reach reliable performance results
in simulation experiments is roughly 190 to 370 times that required by the
analytical model (475.47 s versus 1.27 s for $\varpi$ = 16, and 496.81 s versus 2.61 s for
$\varpi$ = 32). The results demonstrate that the analytical model can be used as a
cost-effective tool for performance evaluation of on-chip interconnection networks in
multimedia embedded systems.
Table 3 Comparison of Runtime between the Analytical Model and Simulator with $\varpi$ = 16
Traffic Rate (messages/cycle)   Analytical Model (seconds)   Simulation (seconds)
0.012755                        0.174067302                  16.24
0.02551                         0.101189886                  30.49
0.038265                        0.077382251                  42.92
0.05102                         0.066649709                  55.84
0.063776                        0.061063394                  66.31
0.076531                        0.060654852                  76.6
0.089286                        0.070480475                  86.92
0.102041                        0.661337912                  100.15
Total runtime                   1.27282578                   475.47
Table 4 Comparison of Runtime between the Analytical Model and Simulator with $\varpi$ = 32
Traffic Rate (messages/cycle)   Analytical Model (seconds)   Simulation (seconds)
0.00649                         0.558802337                  15.08
0.01298                         0.298164393                  28.79
0.01947                         0.213332363                  42.55
0.025961                        0.172755522                  54.88
0.032451                        0.151522343                  68.65
0.038941                        0.147551525                  80.81
0.045431                        0.173527519                  93.52
0.051921                        0.894599135                  112.53
Total runtime                   2.610255138                  496.81
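The ratio between simulation and analytical runtimes can be recomputed directly from the per-point values in Tables 3 and 4. The short Python sketch below does only that arithmetic; the variable and function names are hypothetical.

```python
# Per-point runtimes in seconds, copied from Tables 3 and 4.
analytical_16 = [0.174067302, 0.101189886, 0.077382251, 0.066649709,
                 0.061063394, 0.060654852, 0.070480475, 0.661337912]
simulation_16 = [16.24, 30.49, 42.92, 55.84, 66.31, 76.6, 86.92, 100.15]
analytical_32 = [0.558802337, 0.298164393, 0.213332363, 0.172755522,
                 0.151522343, 0.147551525, 0.173527519, 0.894599135]
simulation_32 = [15.08, 28.79, 42.55, 54.88, 68.65, 80.81, 93.52, 112.53]


def runtime_ratio(simulation, analytical):
    """Total simulation runtime divided by total analytical runtime."""
    return sum(simulation) / sum(analytical)


print(round(runtime_ratio(simulation_16, analytical_16)))  # 374
print(round(runtime_ratio(simulation_32, analytical_32)))  # 190
```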
 
7. CONCLUSIONS AND FUTURE WORK 
This paper has developed an analytical model to evaluate the performance of on-chip
interconnection networks under bursty multimedia traffic and non-uniform
destinations. The bursty traffic is modelled by the well-known MMPP and the
destination distribution is modelled by hot-spot destinations. The on-chip
network architecture is built on the popular fat-tree topology. Extensive
simulation experiments have been conducted to validate the accuracy of the model.
The tractability and accuracy of the model make it a practical and cost-effective tool
to gain insight into the performance of on-chip interconnection networks in the
presence of realistic network traffic. The model is then applied to investigate the
impact of bursty multimedia traffic with hot-spot destinations on the performance of
on-chip interconnection networks. The analytical results have shown that the
network performance degrades considerably under such traffic patterns. In
future work, we will extend the analytical model to consider the application of virtual
channel flow control. The key tasks for this extension include the calculation of the
status of each virtual channel at a given physical channel when determining the
waiting time experienced by messages and the effect of virtual-channel multiplexing
on the average latency.
REFERENCES 
ASCIA, G., CATANIA, V., PALESI, M. AND PATTI, D. 2008. Implementation and Analysis of a New 
Selection Strategy for Adaptive Routing in Networks-on-Chip. IEEE Trans. on Computers 57, 809-820. 
BENINI, L. AND MICHELI, G.D. 2002. Networks on chip: a new SoC paradigm. IEEE Computer 35, 70-78. 
BJERREGAARD, T. AND MAHADEVAN, S. 2006. A Survey of Research and Practices of Network-on-
Chip. ACM Computing Surveys 38, Article No. 1. 
DALLY, W.J. AND TOWLES, B. 2001. Route Packets, Not Wires: Onchip Interconnection Networks. In 
Proceedings of the Design Automation Conference, 684-689. 
DALLY, W.J. AND TOWLES, B.P. 2004. Principles and Practices of Interconnection Networks. Morgan
Kaufmann.
DUATO, J., YALAMANCHILI, S. AND NI, L. 2003. Interconnection Networks: An Engineering Approach. 
Morgan Kaufmann. 
FERNG, H.-W. AND CHANG, J.-F. 2001. Connection-Wise End-to-End Performance Analysis of Queuing 
Networks with MMPP Inputs. Performance Evaluation 43, 39-62. 
FISCHER, W. AND MEIER-HELLSTERN, K. 1993. The Markov-Modulated Poisson Process (MMPP) 
Cookbook. Performance Evaluation 18, 149-171. 
GRECU, C., PANDE, P.P., IVANOV, A. AND SALEH, R. 2004. Structured Interconnect Architecture: A 
Solution for the Non-Scalability of Bus-Based SoCs. In Proceedings of the 14th ACM Great Lakes 
symposium on VLSI, 192-195. 
HEFFES, H. 1980. A Class of Data Traffic Processes-Covariance Function Characterization and Related 
Queueing Results. Bell System Technical Journal 59, 897-929. 
HEFFES, H. AND LUCANTONI, D.M. 1986. A Markov Modulated Characterization of Packetized Voice 
and Data Traffic and Related Statistical Multiplexer Performance. IEEE Journal on Selected Areas in 
Communications 4, 856-867. 
JAVADI, B., AKBARI, M.K. AND ABAWAJY, J.H. 2006. A Performance Model for Analysis of 
Heterogeneous Multi-Cluster Systems. Parallel Computing 32, 831-851. 
KAPRE, N., MEHTA, N., DELORIMIER, M., RUBIN, R., BARNOR, H., WILSON, M.J., WRIGHTON, M. 
AND DEHON, A. 2006. Packet Switched vs. Time Multiplexed FPGA Overlay Networks. In 
Proceedings of the 14th Annual IEEE Symposium on Field-Programmable Custom Computing 
Machines (FCCM '06), 205-216. 
KLEINROCK, L. 1975. Queueing Systems. John Wiley, New York. 
KODI, A.K., SARATHY, A. AND LOURI, A. 2008. Adaptive Channel Buffers in On-Chip Interconnection 
Networks - A Power and Performance Analysis. IEEE Trans. on Computers 57, 1169-1181. 
LEE, H.G., OGRAS, U.Y., MARCULESCU, R. AND CHANG, N. 2006. Design Space Exploration and 
Prototyping for On-Chip Multimedia Applications. In Proceedings of the 43rd ACM/IEEE Design 
Automation Conference (DAC'06), 137-142. 
LIN, X.-Y., CHUNG, Y.-C. AND HUANG, T.-Y. 2004. A Multiple LID Routing Scheme for Fat-Tree-Based 
InfiniBand Networks. In Proceedings of the IEEE International Parallel and Distributed Processing 
Symposium (IPDPS'04), CD-ROM. 
LIU, K.-H., LING, X., SHEN, X. AND MARK, J.W. 2008. Performance Analysis of Prioritized MAC in 
UWB WPAN With Bursty Multimedia Traffic. IEEE Trans. Vehicular Technology 57, 2462-2473. 
MAJETI, D., PASALAPUDI, A. AND YALAMANCHILI, K. 2009. Low Energy Tree Based Network on 
Chip Architectures Using Homogeneous Routers for Bandwidth and Latency Constrained Multimedia 
Applications. In Proceedings of the International Conference on Emerging Trends in Engineering and 
Technology (ICETET'09), 358-363. 
MARCULESCU, R., OGRAS, U.Y., PEH, L.-S., JERGER, N.E. AND HOSKOTE, Y. 2009. Outstanding 
Research Problems in NoC Design: System, Microarchitecture, and Circuit Perspectives. IEEE Trans. 
on Computer-Aided Design of Integrated Circuits and Systems 28, 3-21. 
MATSUTANI, H., KOIBUCHI, M., YAMADA, Y., HSU, D.F. AND AMANO, H. 2009. Fat H-Tree: A Cost-
Efficient Tree-Based On-Chip Network. IEEE Trans. on Parallel and Distributed Systems 20, 1126-
1141. 
MEIER-HELLSTERN, K.S. 1989. The Analysis of A Queue Arising in Overflow Models. IEEE Trans. on 
Communications 37, 367-372. 
MIN, G. AND OULD-KHAOUA, M. 2004. Performance Modelling and Evaluation of Virtual Channels in 
Multicomputer Networks with Bursty Traffic. Performance Evaluation 58, 143-162. 
MIRZA-AGHATABAR, M., KOOHI, S., HESSABI, S. AND PEDRAM, M. 2007. An Empirical Investigation 
of Mesh and Torus NoC Topologies Under Different Routing Algorithms and Traffic Models. In 
Proceedings of the 10th Euromicro Conference on Digital System Design Architectures, Methods and 
Tools (DSD '07), 19-26. 
MOADELI, M., SHAHRABI, A., VANDERBAUWHEDE, W. AND MAJI, P. 2010. An Analytical 
Performance Model for the Spidergon NoC with Virtual Channels. Journal of Systems Architecture 56, 
16-26. 
OGRAS, U.Y. AND MARCULESCU, R. 2008. Analysis and Optimization of Prediction-Based Flow Control 
in Networks-on-Chip. ACM Trans. on Design Automation of Electronic Systems 13, Article No. 11. 
OULD-KHAOUA, M. AND SARBAZI-AZAD, H. 2001. An Analytical Model of Adaptive Wormhole Routing 
in Hypercubes in the Presence of Hot Spot Traffic. IEEE Trans. on Parallel and Distributed Systems 
12, 283-292. 
PANDE, P.P., GRECU, C., JONES, M., IVANOV, A. AND SALEH, R. 2005. Performance Evaluation and 
Design Trade-Offs for Network-on-Chip Interconnect Architectures. IEEE Trans. on Computers 54, 
1025-1040. 
PENG, H.-K. AND LIN, Y.-L. 2010. An Optimal Warning-Zone-Length Assignment Algorithm for Real-
Time and Multiple-QoS On-Chip Bus Arbitration. ACM Trans. on Embedded Computing Systems 9, 
Article No. 35. 
PFISTER, G.J. AND NORTON, V.A. 1985. Hot-Spot Contention and Combining in Multistage 
Interconnection Networks. IEEE Trans. on Computers 34, 943-948. 
SALMINEN, E., KULMALA, A. AND HAMALAINEN, T.D. 2008. Survey of Network-on-chip Proposals. 
White Paper, OCP-IP. 
SANCHEZ, D., MICHELOGIANNAKIS, G. AND KOZYRAKIS, C. 2010. An Analysis of On-Chip 
Interconnection Networks for Large-Scale Chip Multiprocessors. ACM Transactions on Architecture 
and Code Optimization 7, Article No. 4. 
SARBAZI-AZAD, H., OULD-KHAOUA, M. AND MACKENZIE, L.M. 2001. Analytical Modeling of 
Wormhole-Routed k-Ary n-Cubes in the Presence of Hot-Spot Traffic. IEEE Trans. on Computers 50, 
623-634. 
SCHROEDER, M.D., BIRRELL, A.D., BURROWS, M., MURRAY, H., NEEDHAM, R.M., RODEHEFFER, 
T.L., SATTERTHWAITE, E.H. AND THACKER, C.P. 1991. Autonet: A High-Speed, Self-Configuring 
Local Area Network using Point-to-Point Links. IEEE Journal on Selected Areas in Communications 9, 
1318-1335. 
SHAH-HEYDARI, S. AND LE-NGOC, T. 2000. MMPP Models for Multimedia Traffic. Telecommunication 
Systems 15, 273-293. 
TAKTAK, S., DESBARBIEUX, J.-L. AND ENCRENAZ, E. 2008. A Tool for Automatic Detection of 
Deadlock in Wormhole Networks on Chip. ACM Trans. on Design Automation of Electronic Systems 13, 
Article No. 6. 
VARATKAR, G. AND MARCULESCU, R. 2002. Traffic Analysis for On-Chip Networks Design of 
Multimedia Applications. In Proceedings of the 39th Annual Design Automation Conference (DAC'02), 
795-800. 
VARATKAR, G. AND MARCULESCU, R. 2004. On-Chip Traffic Modeling and Synthesis for MPEG-2 
Video Applications. IEEE Trans. on Very Large Scale Integration (VLSI) Systems 12, 108-119. 
WANG, Z., XU, J., WU, X., YE, Y., ZHANG, W., LIU, W., NIKDAST, M., WANG, X. AND WANG, Z. 2012. 
A Novel Low-Waveguide-Crossing Floorplan for Fat Tree based Optical Networks-on-Chip. In 
Proceedings of the 2012 IEEE Optical Interconnects Conference, 100-101. 
WU, Y., MIN, G., OULD-KHAOUA, M. AND YIN, H. 2008. Analytical Modelling of Pipelined Circuit 
Switching with Bursty and Hot-Spot Traffic. In Proceedings of the 10th IEEE International 
Conference on High Performance Computing and Communications (HPCC'08), IEEE Computer Society, 
Washington, DC, USA, 470-477. 
WU, Y., MIN, G., OULD-KHAOUA, M. AND YIN, H. 2011. Modelling and Analysis of Pipelined Circuit 
Switching in Interconnection Networks with Bursty Traffic and Hot-spot Destinations. Journal of 
Systems and Software 84, 2097-2106. 
XIONG, Y., LIU, S. AND SUN, P. 2001. On the Defense of the Distributed Denial of Service Attacks: An 
On-Off Feedback Control Approach. IEEE Trans. Systems Man & Cybernetics - Part A: Systems & 
Humans 31, 282-293. 
ZHANG, Y. AND JONES, A.K. 2009. Non-Uniform Fat-Meshes for Chip Multiprocessors. In Proceedings of 
the IEEE International Symposium on Parallel and Distributed Processing (IPDPS'09), 1-8. 
 
Received August 2011;  revised March 2012;  revised November 2012;  accepted March 2013 
