Proposition of a benchmark for evaluation of cores mapping onto NoC architectures by Delorme, Julien et al.
To cite this version:
Julien Delorme, Dominique Houzet, Romain Lemaire, Didier Lattard. Proposition of a
benchmark for evaluation of cores mapping onto NoC architectures. ReCoSoC, 2005, Montpellier,
France. 2005. <hal-00018049>
HAL Id: hal-00018049
https://hal.archives-ouvertes.fr/hal-00018049
Submitted on 27 Jan 2006
Proposition of a benchmark for evaluation of 
cores mapping onto NoC architectures 
 
Julien Delorme, Dominique Houzet 
INSA/ IETR Laboratory 
20 avenue des Buttes de Coësmes 
35043 Rennes Cedex, France 
julien.delorme@ens.insa-rennes.fr 
dominique.houzet@insa-rennes.fr 
Romain Lemaire, Didier Lattard 
CEA-LETI Grenoble 
Conception and system Integration Department 
17 rue des Martyrs 
38054 Grenoble Cedex 9, France 
romain.lemaire@cea.fr 
didier.lattard@cea.fr  
 
 
Abstract—Complex application-specific SoCs are often based on 
the NoC approach [1]. NoCs have been under investigation for 
several years and many architectures have been proposed [2]. 
Generic NoCs are often proposed together with a synthesis tool 
in order to rapidly tailor a solution to a specific application 
implementation [4][5]. The optimised mapping of cores onto a 
NoC [3] and the optimised configuration of the NoC, for 
instance in terms of topology, FIFO sizes and link sizes, form a 
new research area that is now being investigated in depth. 
Validation and evaluation of solutions are often conducted 
through simulations of deterministic applications. Comparing 
the proposed optimisation approaches is difficult because each 
uses its own evaluation application. Benchmarking is a classical 
way to normalise comparisons. In this paper we propose a set of 
application task behaviours for evaluating NoC topologies as 
well as NoC core mapping techniques. We illustrate this 
benchmark proposal with a simulation of a specific NoC. 
 
Index—SoC, IP, 4G, NoC, benchmark. 
 
I. INTRODUCTION 
 
Future Systems-on-Chip (SoC) for multimedia, video or 
telecommunication will contain a large number of IP blocks. 
All of these have to be connected together and require a high 
bandwidth to satisfy the Quality of Service (QoS). The 
leading features for SoC design are scalability, flexibility, 
reusability, and reprogrammability. 
For this reason, the Network-on-Chip (NoC) paradigm has been 
proposed and used to interconnect the cores and to replace 
bus topologies. NoC interconnection has several advantages, 
including better structure, performance and modularity. As a 
consequence, CAD tools have to explore NoC parameters 
before synthesis with regard to the target set of applications. 
Our approach here is to benchmark the NoC in simulation by 
connecting a communication behaviour model of each 
computing resource to a cycle-accurate model of the NoC. 
The simulations are written in SystemC. 
The proposed application is a mobile terminal MC-CDMA 
chain for the future 4G radio telecommunication standard. 
This study is part of the 4MORE [6] European project in 
which we are involved. The application is divided into 21 
cores which have been evaluated separately. Throughput, 
data size and processing latency are provided for each 
computing resource. With such data it is possible to conduct 
a full NoC architectural exploration. The second part of the 
paper illustrates the generic mapping of those cores onto a 
given NoC named FAUST [13]. 
 
II. RADIO COMMUNICATION APPLICATION 
 
Future telecommunication systems need ever higher 
performance, with greater bandwidth, higher mobility and 
autonomy, to answer the needs of a growing market. 3G 
standard technologies are not yet fully operational, but 
researchers are already working to specify the fourth 
generation of mobile systems. Performance and optimised 
power consumption are still the key factors for these systems. 
These new systems have to be as flexible as possible and 
have to support different standards to allow evolution and 
updating of SoCs. Such constraints imply a radical change in 
the current design methodologies used for future SoCs. This 
kind of application is a good candidate for NoC-based SoC 
implementation. 
A high-performance candidate for future mobile systems is 
the Multi-Carrier Code Division Multiple Access (MC-CDMA) 
technique [7]. This new modulation technique brings new 
algorithms and computation constraints, so designers have to 
face new implementation constraints in order to propose 
suitable solutions. We focus here on the physical layer, and 
more precisely on the implementation of an MC-CDMA 
transmitter and receiver in a baseband modem. This 
baseband modem implements different chained processing 
sub-systems, presented schematically in Figure 1. The 
different functions in the physical layer impose strong 
constraints on IP blocks concerning computing power, 
performance and complexity. 
 
Figure 1: Block diagram of the TX and RX MC-CDMA physical layer 
 
III. IP BLOCKS BENCHMARKING 
 
As mentioned before, the idea of our benchmarking method is 
to model each computing resource of the future 4G chain 
presented in Figure 1 with several parameters. Indeed, each 
resource requires different input data sizes and produces 
different output data sizes. Moreover, the resources do not 
have the same processing time, which implies different, 
data-dependent latencies between blocks.  
As a consequence, in a NoC implementation this kind of 
evaluation is mandatory to ensure that real-time constraints 
are not violated by excessive latency between resources. 
This is especially true in our case, where we have to respect 
the frame timing shown in Figure 2. In addition, all the 
blocks can be pipelined. The bandwidth constraint is then 
only that one OFDM symbol must be processed every 20.8 
microseconds.  
Figure 2: 4MORE frame 
We have modelled all the resources of Figure 1 and 
parameterised them with the necessary parameters: input 
data size, output data size, computing time, input data path 
(source) and output data path (destination). 
With this information we can model the global behaviour of 
each block inside the chain without modelling precisely the 
algorithms used. For instance, the parameters of the coder 
core of the TX chain can be specified as follows: 
• Input data size: 32 bits 
• Output data size: 64 bits 
• Computing time: 64 Cycles 
• Input data path: Mac Layer 
• Output data path: Bit Interleaving 
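As an illustration only, the five parameters listed above could be held in a small record such as the one sketched here. The class name `CoreModel`, its field names and the helper `expansion_ratio` are hypothetical and do not come from the benchmark itself; only the numeric values are taken from the coder example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreModel:
    """Communication behaviour of one core (hypothetical container)."""
    name: str
    input_bits: int      # data consumed per activation, in bits
    output_bits: int     # data produced per activation, in bits
    compute_cycles: int  # processing time per activation, in clock cycles
    source: str          # input data path (upstream block)
    destination: str     # output data path (downstream block)


# Values taken from the coder example in the text.
coder = CoreModel(
    name="Channel Coder",
    input_bits=32,
    output_bits=64,
    compute_cycles=64,
    source="Mac Layer",
    destination="Bit Interleaving",
)


def expansion_ratio(core: CoreModel) -> float:
    """Output bits produced per input bit consumed."""
    return core.output_bits / core.input_bits


print(coder.name, expansion_ratio(coder))
```

A record of this kind is enough to drive a traffic generator attached to a NoC model, since it captures what a core consumes, what it produces, how long it takes and where the data goes, without any algorithmic detail.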
Figure 1 gives the paths between the different resources, so 
only the input and output data sizes and the compute time of 
each resource of the TX chain are listed in Table 1. 
 
TX                 Input data (bits)   Output data (bits)   Compute time (cycles)
Channel Coder      32                  64                   64
Bit Interleaving   256                 256                  64
Mapping Unit       32                  192                  6
Spreading          256                 256                  48
MIMO encoding      1536                1536                 50
FFT 1024           768                 40960                2 620
RF to Base band    32                  32                   10
Table 1: TX chain resource parameters 
 
Input and output data sizes are expressed in bits and compute 
times in clock cycles, so as to be independent of technology 
and design frequency. In the same manner, Table 2 gives the 
parameters for the whole RX chain. 
[Figure 1 labels. Transmitter (TX): data source, coder, bit interleaving, mapping, spreading, MIMO encoding, IFFT, BB to RF. Receiver (RX): RF to BB, time/frequency synchro, frame synchro, FFT, MIMO channel estimation, MIMO decoding, de-spreading, de-mapping, de-interleaving, decoder, received data.]
[Figure 2 labels. Frame of DL and UL slots with TG intervals; TG = 20.8 µs, Tslot = 0.667 ms, TOFDM = 20.8 µs.]
 
RX                                 Input data (bits)   Output data (bits)   Compute time (cycles)
RF to Base band                    96                  96                   10
Frame synchro                      40960               32                   1 280
Frequency/timing synchronisation   44480               32                   1 043
MIMO channel estimation            88960               43008                1 390
IFFT 1024                          40960               21504                2 620
MIMO decoding                      86016               21504                2 016
De-spreading                       21504               2688                 4 032
De-mapping                         2688                2016                 504
De-interleaving                    1024                1024                 256
Convolutional Decoder              256                 32                   32
Table 2: RX chain resource parameters 
All these parameters specify the elementary processing of 
each block of the TX and RX chains but do not represent the 
processing of a complete OFDM symbol. Some computing 
resources have to iterate several times to process a complete 
OFDM symbol. Such cases with long processing times, like 
the FFT, can be a bottleneck for meeting the real-time 
constraints imposed on the whole frame. 
With these constraints, we have chosen to study the 
feasibility of integrating this 4G radiocommunication 
MIMO MC-CDMA chain on the FAUST NoC topology. 
 
IV. FAUST NOC DESCRIPTION 
 
In this study, we use this benchmark to dimension and build 
the interconnection between hardware modules with a 
Network-on-Chip (NoC) structure. NoCs are flexible and 
offer significant aggregate bandwidth since they allow 
simultaneous communications. 
As mentioned earlier, a solution based on independent data 
processing units interconnected by a NoC seems the most 
appropriate for a high data rate baseband system. In the 
context of radio telecommunication applications, specific 
characteristics can be taken into account to obtain a well 
adapted network. For this reason we plan to use the FAUST 
NoC of CEA/LETI, whose characteristics fit the 
requirements of 4G radiocommunication. The NoC 
mechanisms are described in more detail in [13]. 
First, with FAUST we can build a topology with four links 
per switch plus one link to the IP core. The mapping of the 
MC-CDMA application mainly exhibits communications 
with few neighbours for each core. Consequently, the 
network does not need a high-order topology; a 2D mesh 
topology is enough [9]. 
Secondly, data processing units are specialised and traffic is 
rather predictable here. 
Finally, communications are often bursty and short, so 
packet switching is preferable to circuit switching since it 
avoids the time-consuming connection set-up. FAUST uses 
wormhole switching [11] with credit-based flow control. It 
has low latency, saves buffer memory and, with an 
appropriate routing algorithm, avoids deadlocks [12]. 
Functional units are connected to the Network-on-Chip 
through a network interface (NI), as shown in Figure 3. The 
NI manages the NoC communication mechanism, using 
credits to fill the input FIFO and empty the output FIFO of 
the application core (HW or SW). The NoC architecture is 
also well adapted to our radiocommunication application 
because it decouples computing from communication. 
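The general idea of credit-based flow control can be sketched with a toy model. This is a generic illustration and not the FAUST protocol, whose details are in [13]; the class `CreditLink` and its methods are hypothetical, and one credit is assumed to stand for one free flit slot in the receiving FIFO.

```python
from collections import deque


class CreditLink:
    """Toy credit-based link: the sender may only send while it holds credits."""

    def __init__(self, fifo_depth: int):
        # Assumption: one credit per free slot in the receiver FIFO.
        self.credits = fifo_depth
        self.fifo = deque()

    def send(self, flit) -> bool:
        """Try to send one flit; return False if no credit is available."""
        if self.credits == 0:
            return False
        self.credits -= 1
        self.fifo.append(flit)
        return True

    def consume(self):
        """Receiver removes one flit and hands a credit back to the sender."""
        flit = self.fifo.popleft()
        self.credits += 1
        return flit


link = CreditLink(fifo_depth=2)
sent = [link.send(i) for i in range(3)]  # the third send finds no credit
first = link.consume()                   # frees one slot, returns one credit
retry = link.send(2)                     # the blocked flit can now be sent
```

Because the sender never transmits without a credit, the receiving FIFO cannot overflow, which is what allows buffers to stay small.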
In our case, we check whether the latency induced by the 
NoC communication mechanism, added to the compute time 
of the resources, respects the frame timing constraint. More 
precisely, we studied the output throughput of the most 
time-consuming core, the IFFT, to check whether the timing 
constraint of 20.8 µs per OFDM symbol (Figure 2) is 
respected. The simulations have to demonstrate that the 
system fulfils the real-time requirements.  
 
 Figure 3: FAUST communication concept 
 
Once function partitioning is done, the functional units must 
be mapped carefully in order to minimise routing path 
lengths. None of these issues is trivial, and architecture 
exploration and validation are necessary to find good 
configurations before designing the NoC. To this end, we 
propose in this study to use a first topology, shown in 
Figure 4. This mapping was chosen manually to check 
rapidly whether the timing constraints could be satisfied. 
If the timing constraints are not met, the NoC has several 
parameters (FIFO sizes, packet size, ...) that can be modified 
to meet them. In particular, changing the FIFO sizes can 
reduce traffic congestion, which is frequent in NoCs. 
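One simple way to compare candidate mappings is to weight each link of the processing chain by the number of mesh hops between its two cores. The sketch below is an assumption-laden illustration and not the method used in this study: it assumes a 2D mesh where the hop count is the Manhattan distance, it uses the per-activation output sizes of Table 1 as traffic weights, and both placements and all identifiers are hypothetical.

```python
# First five TX blocks, in chain order (short hypothetical names).
chain = ["coder", "interleaver", "mapper", "spreader", "mimo_enc"]

# Output data per activation in bits, from Table 1, used as link weights.
out_bits = {"coder": 64, "interleaver": 256, "mapper": 192, "spreader": 256}


def hops(a, b):
    """Manhattan distance between two (x, y) mesh positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def mapping_cost(placement):
    """Sum over chain links of (bits sent) x (hops travelled)."""
    total = 0
    for src, dst in zip(chain, chain[1:]):
        total += out_bits[src] * hops(placement[src], placement[dst])
    return total


# Two hypothetical placements on a 3x2 mesh.
snake = {"coder": (0, 0), "interleaver": (1, 0), "mapper": (2, 0),
         "spreader": (2, 1), "mimo_enc": (1, 1)}
scattered = {"coder": (0, 0), "interleaver": (2, 1), "mapper": (0, 1),
             "spreader": (2, 0), "mimo_enc": (1, 1)}

print(mapping_cost(snake), mapping_cost(scattered))
```

Placing consecutive blocks on adjacent nodes gives the lower cost, which matches the observation that each core communicates mainly with a few neighbours.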
 
[Figure 3 labels: Computing Units (HW Interface, SW Interface, RAM Interface); Communication Resources (node).]
Figure 4: Function mapping topology on NoC 
 
The FAUST NoC is based on 32-bit node-to-node and 
node-to-computing-resource links. As a consequence, the 
smallest entity exchanged on the NoC is 32 bits wide and is 
called a flit. For simulation purposes, the data sizes given in 
Table 1 and Table 2 therefore have to be divided by 32. 
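The conversion from bits to flits can be sketched as follows, using a few rows of Table 1. The function and variable names are hypothetical. Rounding up is an assumption made for generality; every size in the two tables is already a multiple of 32.

```python
FLIT_BITS = 32  # link width, hence flit size, in bits


def bits_to_flits(bits: int) -> int:
    """Number of 32-bit flits needed to carry `bits`, rounded up."""
    return -(-bits // FLIT_BITS)


# (input bits, output bits) for some TX cores, from Table 1.
tx_bits = {
    "Channel Coder": (32, 64),
    "Bit Interleaving": (256, 256),
    "Mapping Unit": (32, 192),
    "Spreading": (256, 256),
    "MIMO encoding": (1536, 1536),
}

tx_flits = {
    name: (bits_to_flits(i), bits_to_flits(o))
    for name, (i, o) in tx_bits.items()
}

for name, (i, o) in tx_flits.items():
    print(f"{name}: {i} flits in, {o} flits out")
```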
The next section presents the first results obtained from the 
SystemC simulations of the TX chain. 
 
V. SIMULATION RESULTS 
 
As mentioned before, the aim of our simulations is to check 
that the timing constraints of the radiocommunication frame 
are respected. 
For this purpose we have focused our study on the IFFT, 
which transmits the OFDM symbol to the DAC. We measure 
the time elapsed between the reception of data in the input 
FIFO of the NI and the complete emptying of the output 
FIFO. 
The input FIFO can be larger than the number of flits 
required by the computing resource, so the elapsed time is 
measured as follows: 
$T = T_i + T_t + T_o$ 
 
where $T_i$ is the time elapsed between the first and the last 
flit read from the input FIFO, until the amount required by 
the resource is reached; $T_t$ is the time taken to process the 
data; and $T_o$ is the time elapsed between the first and the 
last flit sent from the output FIFO to the target resource, until 
the amount specified by the resource is reached. 
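The check against the OFDM symbol period can be sketched as below. The three durations passed in the example are made-up values chosen only to exercise the function; they are not simulation results. The function names are hypothetical, and times are in nanoseconds.

```python
SYMBOL_PERIOD_NS = 20_800  # 20.8 microseconds per OFDM symbol


def elapsed_ns(t_in: int, t_proc: int, t_out: int) -> int:
    """Elapsed time: input FIFO read + processing + output FIFO emptying."""
    return t_in + t_proc + t_out


def meets_deadline(t_in: int, t_proc: int, t_out: int) -> bool:
    """True if the elapsed time fits within one OFDM symbol period."""
    return elapsed_ns(t_in, t_proc, t_out) <= SYMBOL_PERIOD_NS


# Hypothetical durations in ns, for illustration only.
total = elapsed_ns(3_000, 13_100, 4_000)
ok = meets_deadline(3_000, 13_100, 4_000)
print(total, ok)
```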
The first results are presented in Figure 5. The timing 
constraint on the IFFT is respected with regard to the frame 
timing specification, but the margin between the measured 
delays and the maximum delay shows that some effort could 
still be made on the FIFO mechanism and packet sizes. For 
this simulation, the input FIFO size is 1344 flits and the 
output FIFO size is 1280 flits. The input credit size is 92 
flits and the output credit size is 124 flits, whereas the core 
requires 672 flits in and 1280 flits out. 
Further simulations have to be run to show the impact of 
these parameters on the throughput of the IFFT and the other 
resources. 
[Figure 4 labels: nodes numbered 1 to 24; mapped blocks include RAM, CPU (AHB), CONV. CODER, BIT INTER., MAPP., CDMA MOD., MIMO Encoding, OFDM MOD., BB to RF, RF to BB, FRAME SYNC., Time/Frequency Synchro, MIMO Channel Estimation, OFDM DEM., MIMO Decoding, CDMA DEM., SOFT DE-MAPP., DE-INTER., CONV. DEC.]
[Figure 5 plot: title "TX IFFT Throughput"; x axis: OFDM symbol (1 to 29); y axis: time in ns (0 to 25000); series: OFDM symbol delay, maximum delay.]
 
Figure 5: TX IFFT throughput 
 
VI. CONCLUSIONS AND FUTURE WORK 
 
In this paper we have presented a new application 
benchmark for the evaluation of core mapping onto NoC 
architectures. The benchmark is portable to any NoC 
evaluation. We have also illustrated that a NoC architecture 
based on packet switching is realistic for future chips, 
particularly for future radiocommunication standards. We 
have shown the feasibility of mapping a 4G chain onto a 
NoC system, but more exploration is needed to improve the 
throughput. 
Our future work is to continue our study of NoC 
architectures for telecommunication systems. We are 
working on the validation of the whole chain (RX and TX) 
and on exploring NoC dimensioning with respect to FIFO 
sizes, credit sizes and the influence of topology.  
 
VII. ACKNOWLEDGMENT 
 
The work presented in this paper was supported by the 
European IST project 4MORE (4G MC-CDMA multiple 
antenna system On chip for Radio Enhancements) [13]. 
 
VIII. REFERENCES 
 
[1] L. Benini and G. De Micheli. Networks on Chip: A New SoC 
Paradigm. IEEE Computer, 2002. 
[2] P. Guerrier and A. Greiner. A generic architecture for on-chip 
packet-switched interconnections. In DATE, pp. 250–256, Mar. 
2000. 
[3] S. Murali and G. De Micheli. Bandwidth-Constrained Mapping of 
Cores onto NoC Architectures. In DATE, 2004. 
[4] D. Bertozzi, A. Jalabert, S. Murali, R. Tamhankar, S. Stergiou, L. 
Benini, and G. De Micheli. NoC Synthesis Flow for Customized 
Domain Specific Multiprocessor Systems-on-Chip. IEEE 
Transactions on Parallel and Distributed Systems, Feb. 2005. 
[5] S. Evain, J. P. Diguet, and D. Houzet. µSpider: A CAD Tool for 
Efficient NoC Design. In IEEE NORCHIP 2004, Oslo, Norway, 
Nov. 8-9, 2004. 
[6] IST 4MORE project, European project 2004-2007. 
http://4more.av.it.pt/ 
[7] A. Chouly, A. Brajal, and S. Jourdan. Orthogonal multicarrier 
techniques applied to direct sequence spread spectrum CDMA 
systems. In GLOBECOM’93, pp. 1723–1728, 1993. 
[8] SystemC. http://www.systemc.org/ 
[9] R. Thid, M. Millberg, and A. Jantsch. Evaluating NoC 
communication backbones with simulation. In IEEE NorChip 
Conference, Nov. 2003. 
[10] C. Glass and L. Ni. The Turn Model for Adaptive Routing. Journal 
of the Association for Computing Machinery, 41:874–902, Sep. 
1994. 
[11] S. Felperin, P. Raghavan, and E. Upfal. A theory of wormhole 
routing in parallel computers. IEEE Transactions on Computers, 
45:704–713, Jun. 1996. 
[12] C. Glass and L. Ni. The Turn Model for Adaptive Routing. Journal 
of the Association for Computing Machinery, 41:874–902, Sep. 
1994. 
[13] D4.4, IST 4MORE project. http://4more.av.it.pt/docs/ 
 
