Computing worst-case contention delays for networks on chip by Cardona, Jordi et al.
Computing Worst-Case Contention Delays for
Networks on Chip
Jordi Cardona∗†, Carles Hernandez∗‡, Jaume Abella∗
∗Barcelona Supercomputing Center, Barcelona, Spain
†Universitat Politècnica de Catalunya, Barcelona, Spain
‡Universitat Politècnica de València, València, Spain
E-mail: {jordi.cardona, carles.hernandez, jaume.abella}@bsc.es
Keywords—NoC, Mesh, WCET, ILP, Contention
I. EXTENDED ABSTRACT
A. Introduction
Computing performance needs in domains such as automotive, avionics, railway, and space are on the rise. This is fueled by the trend towards implementing an increasing number of product functionalities in software, which ends up managing huge amounts of data and implementing complex artificial-intelligence functionalities [1], [2].
Manycores are able to satisfy, in a cost-efficient manner, the computing needs of the embedded real-time industry [3], [4]. In this line, building as much as possible on manycore solutions deployed in the high-performance (mainstream) market [5], [6] contributes to further reducing costs and increasing availability.
However, commercial off-the-shelf (COTS) manycores bring several challenges for their adoption in the critical embedded market. One of those is deriving bounds on tasks' execution times as part of the overall timing validation and verification processes [7]. In particular, the network-on-chip (NoC) has been shown to be the main resource in which contention arises, and it hence hampers deriving tight bounds on the timing of tasks [8].
For widely used wormhole NoCs (wNoCs) [5], [6], several proposals show how to compute latency upper bounds for the different flows communicating on the manycore [9], [10] under some restrictions, e.g. deterministic routing. Unfortunately, WCET estimates computed with wNoCs are generally pessimistic when, as required to achieve composable estimates, no restrictions are imposed on when and where interference occurs in the wNoC. Interestingly, wNoCs offer several software-controllable parameters that make it possible to optimize (reduce) the worst-case contention delay (WCD) that packets crossing the NoC can suffer. These include mapping, routing, and the allocation of weights (referred to as walloc) to the arbitration policies in each router. NoC contention optimization solutions have been proposed for mapping [11], [12] and for combined routing and mapping [13], [14]. Additionally, optimal allocation of weights to achieve fair bandwidth balancing has also been proposed for TDMA [15] and wNoCs [16]. In general, those solutions do not tackle all parameters at once, which leads to globally suboptimal solutions.
Overall, reducing the WCD in NoCs is indeed a multidimensional problem and, to make things worse, strong dependencies exist between the different parameters. For instance, the impact of routing on WCD is heavily affected by the mapping of tasks to cores.
Despite the interdependences among these parameters, to our knowledge, no previous work proposes an integrated solution to the problem of WCD reduction that simultaneously optimizes mapping, routing and walloc.
B. Contribution
In this study, we close this gap by proposing an ILP- and stochastic-based optimization framework for wNoCs that minimizes WCD estimates (and hence WCET estimates) of applications running on wNoC-connected manycores.
Fig. 1: Mesh manycore system and how mapping, routing and bandwidth allocation parameters impact the WCD. (a) 2D mesh router coordinates; (b) unfair round-robin bandwidth allocation under wormhole switching.
This study targets systems built with 2D mesh NoC topologies such as the one shown in Figure 1, although the same analysis can be applied to other topologies. Figure 1(a) shows a block diagram of a 2D mesh multicore system in which each node coordinate represents a router that has a processor and/or a memory element attached and is connected to the other routers, forming a mesh topology.
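To make router coordinates and routing concrete, the following minimal sketch computes the directed links a packet traverses under XY routing (first along X, then along Y) in a 2D mesh, and the links shared by two flows, which are the places where their packets may contend. It assumes routers are addressed by (x, y) coordinates as in Figure 1(a); the helpers `xy_route` and `contending_links` are hypothetical and not part of the authors' framework.

```python
from typing import List, Set, Tuple

Coord = Tuple[int, int]      # (x, y) coordinate of a router, as in Figure 1(a)
Link = Tuple[Coord, Coord]   # directed link between two neighbouring routers


def xy_route(src: Coord, dst: Coord) -> List[Link]:
    """Links traversed under XY routing: first along X, then along Y."""
    links: List[Link] = []
    x, y = src
    dst_x, dst_y = dst
    # Phase 1: move one hop at a time until the column matches.
    while x != dst_x:
        next_x = x + (1 if dst_x > x else -1)
        links.append(((x, y), (next_x, y)))
        x = next_x
    # Phase 2: move one hop at a time until the row matches.
    while y != dst_y:
        next_y = y + (1 if dst_y > y else -1)
        links.append(((x, y), (x, next_y)))
        y = next_y
    return links


def contending_links(route_a: List[Link], route_b: List[Link]) -> Set[Link]:
    """Links used by both flows, i.e. where their packets may contend."""
    return set(route_a) & set(route_b)


if __name__ == "__main__":
    flow_1 = xy_route((0, 0), (2, 0))
    flow_2 = xy_route((1, 0), (2, 1))
    print(contending_links(flow_1, flow_2))  # {((1, 0), (2, 0))}
```

Here the two flows share the link from (1, 0) to (2, 0); mapping one of the communicating tasks to a different core, or routing one flow differently, can remove that shared link, which is why mapping and routing interact.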
Our goal is to create a framework that, given a set of tasks to be run on a mesh multicore system, optimizes the mapping, routing and bandwidth allocation configuration of the mesh simultaneously so as to reduce the WCET estimates of these tasks. Figure 1(b) shows one of the possible NoC configurations output by the optimization framework: a feasible solution for the three main NoC parameters studied here, with mapping assigned on a first-come first-served (FCFS) basis, the XY routing algorithm, and round-robin bandwidth arbitration.
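The round-robin unfairness depicted in Figure 1(b) can be illustrated with a simplified fluid model of flows merging along a chain of routers (the classic parking-lot scenario). The sketch assumes saturated inputs, equal packet sizes, and arbitration only between each router's local input and the input from its upstream neighbour; it is not the WCD model used in this study. Under plain round-robin, flows injected farther from the sink get geometrically smaller shares; giving the upstream input a weight equal to the number of flows it carries (one possible walloc, assumed here for illustration) equalizes them. The function `chain_bandwidth_shares` is hypothetical.

```python
from fractions import Fraction
from typing import List


def chain_bandwidth_shares(n: int, weighted: bool = False) -> List[Fraction]:
    """Saturated output-bandwidth share of each flow in a chain of n routers.

    Simplified fluid model (an assumption, not the paper's WCD model):
    flow k is injected at router k and all flows leave through router n-1.
    Router k >= 1 arbitrates between its local input (flow k) and the
    upstream input, which carries flows 0..k-1. With weighted=False both
    inputs get equal shares (round-robin); with weighted=True the upstream
    input gets weight k and the local input weight 1.
    """
    if n < 1:
        raise ValueError("the chain needs at least one router")
    shares = [Fraction(0)] * n
    remaining = Fraction(1)  # bandwidth still available to flows 0..k
    for k in range(n - 1, 0, -1):  # walk from the last router back to router 1
        upstream_weight = k if weighted else 1
        shares[k] = remaining / (upstream_weight + 1)  # local input's share
        remaining -= shares[k]  # the rest goes to the upstream input
    shares[0] = remaining  # router 0 has no competing input
    return shares


if __name__ == "__main__":
    print([str(s) for s in chain_bandwidth_shares(4)])                 # ['1/8', '1/8', '1/4', '1/2']
    print([str(s) for s in chain_bandwidth_shares(4, weighted=True)])  # ['1/4', '1/4', '1/4', '1/4']
```

With four routers, the flow injected at router 3 gets four times the bandwidth of the flows injected at routers 0 and 1 under round-robin, while the weighted variant gives every flow the same quarter.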
In this study:
1) We analyze, separately, the main wNoC parameters that cause variability in WCD (mapping, routing, and walloc) and how they relate to each other. We propose a particular WCD-centric abstraction to address the main sources of jitter in a wNoC, namely: placement of tasks (threads) to cores, routing, and weighted bandwidth allocation (walloc).
2) We show that reducing WCD is a multidimensional
problem that we decompose into a stochastic-based
optimization and an ILP formulation. The former
covers the optimization of the routing, whereas the
latter optimizes mapping and walloc.
3) We compare the effectiveness of the ILP method with respect to hand-made setups and other approaches that optimize a subset of the parameters. Our results confirm that our multidimensional optimization approach achieves performance guarantees that outperform those of the other approaches evaluated. We also show that optimizing virtual-channel (VC) allocation provides a subset of the configurations obtained with walloc, so once walloc is optimized, VC allocation provides no additional advantage.
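The decomposition described in item 2 can be pictured as a two-level search: an outer stochastic loop samples a routing choice per flow and, for each sample, an inner step finds the best mapping. In this toy sketch, plain random search over XY/YX routing stands in for the stochastic optimizer, exhaustive enumeration of mappings stands in for the ILP (feasible only for very small meshes), walloc is omitted, and the cost is a hypothetical contention proxy (the largest number of flows sharing any link) rather than the study's WCD abstraction. All function names are illustrative assumptions.

```python
import itertools
import random
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Coord = Tuple[int, int]
Link = Tuple[Coord, Coord]
Flow = Tuple[str, str]  # (source task, destination task)


def route(src: Coord, dst: Coord, xy_first: bool) -> List[Link]:
    """Dimension-ordered route: XY routing if xy_first, otherwise YX."""
    links: List[Link] = []
    cur = list(src)
    for dim in ((0, 1) if xy_first else (1, 0)):
        while cur[dim] != dst[dim]:
            nxt = list(cur)
            nxt[dim] += 1 if dst[dim] > cur[dim] else -1
            links.append((tuple(cur), tuple(nxt)))
            cur = nxt
    return links


def max_link_load(flows: Sequence[Flow], mapping: Dict[str, Coord],
                  routing: Sequence[bool]) -> int:
    """Hypothetical contention proxy: most flows sharing a single link."""
    load: Counter = Counter()
    for (src_task, dst_task), xy_first in zip(flows, routing):
        load.update(route(mapping[src_task], mapping[dst_task], xy_first))
    return max(load.values(), default=0)


def best_mapping(tasks, flows, cores, routing):
    """Stand-in for the ILP step: try every task-to-core assignment.

    Assumes len(tasks) <= len(cores); the cost grows factorially.
    """
    best_cost, best_map = None, None
    for placement in itertools.permutations(cores, len(tasks)):
        mapping = dict(zip(tasks, placement))
        cost = max_link_load(flows, mapping, routing)
        if best_cost is None or cost < best_cost:
            best_cost, best_map = cost, mapping
    return best_cost, best_map


def optimize(tasks, flows, cores, samples: int = 20, seed: int = 0):
    """Stand-in for the stochastic step: random search over XY/YX per flow."""
    rng = random.Random(seed)
    best: Optional[Tuple[int, Dict[str, Coord], List[bool]]] = None
    for _ in range(samples):
        routing = [rng.random() < 0.5 for _ in flows]
        cost, mapping = best_mapping(tasks, flows, cores, routing)
        if best is None or cost < best[0]:
            best = (cost, mapping, routing)
    return best


if __name__ == "__main__":
    cores = [(x, y) for y in range(2) for x in range(2)]  # 2x2 mesh
    tasks = ["A", "B", "C", "D"]
    flows = [("A", "B"), ("C", "D"), ("A", "D")]
    cost, mapping, routing = optimize(tasks, flows, cores)
    print(cost, mapping, routing)
```

Because the inner step is re-run for every sampled routing, the sketch captures the dependency between routing and mapping noted earlier, at the price of a cost that a real ILP solver would be needed to keep tractable.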
We focus on high-performance wormhole NoCs in which time predictability is achieved by leveraging an optimal configuration of parameters. These include features such as arbitration and routing, which are already configurable from software in existing wNoC designs. Weight allocation, although to our knowledge not yet implemented in commercial NoCs, is widely used in high-performance routers for off-chip wormhole networks [17]. Given that the implementation cost of weighted arbitration is quite low [16], it can be included at low cost in high-performance on-chip designs. Moreover, the modifications required to implement weighted arbitration are local, in contrast to hardware proposals that require global changes such as new signals among routers and nodes, different flow control, or global clocks.
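To illustrate why weighted arbitration needs only local changes, the following sketch models a credit-based weighted round-robin arbiter for a single router output port: its entire state is a per-input credit counter and a round-robin pointer, with no signals exchanged between routers. This is one common way to realize weighted round-robin, assumed here for illustration; it is not claimed to be the design of [16], and the class name `WeightedRoundRobinArbiter` is hypothetical. Each grant is taken to cover a whole packet, since a wormhole arbiter holds the output until the packet's tail flit has passed.

```python
from typing import Optional, Sequence


class WeightedRoundRobinArbiter:
    """Credit-based weighted round-robin arbiter for one output port.

    Input i may win up to weights[i] grants per round; when no requesting
    input has credits left, all credits are refilled. All state is local.
    """

    def __init__(self, weights: Sequence[int]):
        if any(w < 1 for w in weights):
            raise ValueError("weights must be positive integers")
        self.weights = list(weights)
        self.credits = list(weights)
        self.pointer = 0  # input searched first at the next arbitration

    def grant(self, requests: Sequence[bool]) -> Optional[int]:
        """Return the index of the granted input, or None if nobody requests."""
        if not any(requests):
            return None
        n = len(self.weights)
        for _ in range(2):  # the second pass only runs after a credit refill
            for offset in range(n):
                i = (self.pointer + offset) % n
                if requests[i] and self.credits[i] > 0:
                    self.credits[i] -= 1
                    # Keep priority on i while it has credits, else move past it.
                    self.pointer = i if self.credits[i] > 0 else (i + 1) % n
                    return i
            self.credits = list(self.weights)  # all requesters ran out of credits
        return None  # unreachable when at least one input requests


if __name__ == "__main__":
    arbiter = WeightedRoundRobinArbiter([1, 3])
    print([arbiter.grant([True, True]) for _ in range(8)])  # [0, 1, 1, 1, 0, 1, 1, 1]
```

With weights 1 and 3 and both inputs always requesting, input 1 receives three times the grants of input 0. If only one input requests, it receives every grant, so the arbiter stays work-conserving.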
REFERENCES
[1] “Intel GO Automated Driving Solution Product Brief,” https://www.intel.es/content/dam/www/public/us/en/documents/platform-briefs/go-automated-accelerated-product-brief.pdf.
[2] M. Girone, “Computing Challenges at the Large Hadron Collider
(LHC),” Keynote at the HiPEAC Conference 2018, 2018.
[3] Kalray MPPA 256 Many-Core Processor, http://www.kalray.eu/products/mppa-manycore.
[4] R. Ginosar, P. Aviely, T. Israeli, and H. Meirov, “RC64: High perfor-
mance rad-hard manycore,” in 2016 IEEE Aerospace Conference, March
2016, pp. 1–9.
[5] Tilera, TILE-Gx Processors Family, http://www.tilera.com/products/TILE-Gx.php.
[6] S. Ramos and T. Hoefler, “Capability models for manycore memory
systems: A case-study with Xeon Phi KNL,” in 2017 IEEE International
Parallel and Distributed Processing Symposium (IPDPS), May 2017,
pp. 297–306.
[7] M. Paulitsch, O. M. Duarte, H. Karray, K. Mueller, D. Münch,
and J. Nowotsch, “Mixed-criticality embedded systems - A
balance ensuring partitioning and performance,” in 2015 Euromicro
Conference on Digital System Design, DSD 2015, Madeira, Portugal,
August 26-28, 2015, 2015, pp. 453–461. [Online]. Available:
https://doi.org/10.1109/DSD.2015.100
[8] M. Panic, E. Quinones, P. G. Zavkov, C. Hernandez, J. Abella, and F. J.
Cazorla, “Parallel many-core avionics systems,” in 2014 International
Conference on Embedded Software (EMSOFT), Oct 2014, pp. 1–10.
[9] M. Panic, C. Hernandez, E. Quinones, J. Abella, and F. J. Cazorla,
“Modeling high-performance wormhole NoCs for critical real-time em-
bedded systems,” in 2016 IEEE Real-Time and Embedded Technology
and Applications Symposium (RTAS), April 2016, pp. 1–12.
[10] D. Rahmati, S. Murali, L. Benini, F. Angiolini, G. De Micheli, and
H. Sarbazi-Azad, “Computing accurate performance bounds for best
effort networks-on-chip,” IEEE Transactions on Computers, vol. 62,
no. 3, pp. 452–467, March 2013.
[11] C.-L. Chou and R. Marculescu, “Contention-aware application map-
ping for network-on-chip communication architectures,” in 2008 IEEE
International Conference on Computer Design, Oct 2008, pp. 164–169.
[12] C. Zimmer and F. Mueller, “Low contention mapping of real-time tasks
onto TilePro 64 core processors,” in 2012 IEEE 18th Real Time and
Embedded Technology and Applications Symposium, April 2012, pp.
131–140.
[13] L. Yang, W. Liu, P. Chen, N. Guan, and M. Li, “Task mapping on
SMART NoC: Contention matters, not the distance,” in 2017 54th
ACM/EDAC/IEEE Design Automation Conference (DAC), June 2017,
pp. 1–6.
[14] H. Yu, Y. Ha, and B. Veeravalli, “Communication-aware application
mapping and scheduling for NoC-based MPSoCs,” in Proceedings of
2010 IEEE International Symposium on Circuits and Systems, May
2010, pp. 3232–3235.
[15] M. Shekhar, H. Ramaprasad, and F. Mueller, “Network-on-chip aware
scheduling of hard-real-time tasks,” in Proceedings of the 9th IEEE
International Symposium on Industrial Embedded Systems (SIES 2014),
June 2014, pp. 141–150.
[16] M. Panic, C. Hernandez, J. Abella, A. Roca, E. Quinones, and F. J. Cazorla, “Improving performance guarantees in wormhole mesh NoC designs,” in 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE), March 2016, pp. 1485–1488.
[17] D. Crupnicoff, S. Das, and E. Zahavi, “Deploying Quality of Service and Congestion Control in InfiniBand-based Data Center Networks,” Mellanox Technologies, 2005.
Jordi Cardona is a PhD student in the CAOS group at BSC. He obtained his M.S. degree in 2018 and graduated in Informatics Engineering in 2016, both from the Universitat Politècnica de Catalunya. He joined BSC in 2016, where he started working on the analysis of COTS networks on chip for real-time multicore systems during his master's thesis. His current research focuses on monitoring and controlling contention in shared resources of critical real-time systems.
