A Novel Approach for Network on Chip Emulation by Genko, Nicolas et al.
Nicolas Genko, LSI/EPFL Switzerland
David Atienza, DACYA/UCM Spain
Giovanni De Micheli, LSI/EPFL Switzerland
Luca Benini, DEIS/Bologna Italy
José Mendias, DACYA/UCM Spain
Roman Hermida, DACYA/UCM Spain
Francky Catthoor, IMEC Belgium
Outline
• Introduction
• General Approach
• Applications
• Results
• Conclusion
Motivation -- NoCs
• Provide a structured methodology for
realizing on-chip communication schemes
– Modularity
– Flexibility
• Overcome the limitations of busses
– Performance and power do not scale up
• Support reliable operation
– Layered approach to error detection and correction
Motivation -- NoC Emulation
• NoCs are designed for:
– On-chip multiprocessing (regular networks)
– Specific applications (ad hoc networks)
• Design tools:
– Synthesis: create NoC circuitry from architectural templates
(e.g., Xpipes)
– Analysis: validate functionality and performance
• Software simulation (cycle accurate)
• Emulation with Field Programmable Gate Arrays (FPGAs)
Previous work
• NoC software simulation:
– High-level models in C/C++ [H.-Sheng et al; Kolso et al]
– Evaluate NoC latency [Siguenza et al; Angiolini et al]
– Evaluate NoC throughput [Wiklund et al; Pestana et al]
• NoC implementation on FPGAs:
– For functional validation [Marescaux et al; Moraes et al]
– Show the effectiveness of NoCs [Kumar et al; Pinto et al]
– Validate NoC features [Brebner et al; Zeferino et al]
NoC Emulation on FPGA
• Emulation on FPGA enables functional and
performance validation of NoC based systems
– Accurate execution model
– Probing for profiling and gathering of statistics
• Emulation can achieve significant speedups
compared to cycle-accurate simulation:
– Up to four orders of magnitude faster
– Real inputs with millions of packets can be used
General Approach
• A platform that instantiates a NoC on an FPGA
with modules for emulation:
– Traffic generators & receptors
– NoC switches
– Traffic analyzers
– Network interfaces (NIs) to cores can be included
• A system which is controlled by a processor
– The processor configures and controls the traffic
pattern to be emulated and analyzes the
statistics provided by the platform
Development board
Xilinx XUP
• Virtex-II Pro FPGA
– 2 PowerPC cores
– 3 M programmable gates
• Board connections labelled in the photo: power, programming cable, serial interface
Emulated NoC Architectures
• Processor linked to each
system component
– Monitor
– Traffic generators
– Traffic receptors
– Traffic analyzers
• Two architectures:
– Network of switches
– Network with switches
and network interfaces
The NoC Architectural Flavour
[Figure: a core connects over the Open Core Protocol (OCP) to a network interface, which connects over the network protocol to the switch network. A packet has header, payload, and tail fields and is carried as a sequence of flits.]
• Transmit
– Access routing tables
– Assemble packets
– Split into flits
• Receive
– Synchronize
– Drop routing information
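The transmit-side steps of the network interface (look up a route, assemble a packet, split it into flits) can be sketched in Python. This is only an illustration, not the platform's hardware: the routing table contents, the 4-byte flit width, the one-byte tail marker, and the names `ROUTING_TABLE`, `assemble_packet`, and `split_into_flits` are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical routing table: destination id -> output ports along the route.
ROUTING_TABLE: Dict[int, List[int]] = {0: [1, 2], 1: [0, 3]}

FLIT_BYTES = 4  # assumed flit width; the slides do not specify one


def assemble_packet(dest: int, payload: bytes) -> bytes:
    """Build header (hop count + route) + payload + tail marker."""
    route = ROUTING_TABLE[dest]          # access the routing table
    header = bytes([len(route)]) + bytes(route)
    tail = b"\xff"                       # assumed end-of-packet marker
    return header + payload + tail


def split_into_flits(packet: bytes, flit_bytes: int = FLIT_BYTES) -> List[bytes]:
    """Cut the packet into fixed-size flits, zero-padding the last one."""
    flits = []
    for start in range(0, len(packet), flit_bytes):
        flit = packet[start:start + flit_bytes]
        flits.append(flit.ljust(flit_bytes, b"\x00"))
    return flits


flits = split_into_flits(assemble_packet(dest=0, payload=b"hello"))
```

On the receive side, the inverse would strip the header (the routing information) and the padding before handing the payload to the core.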
Architecture 1 -- Network of Switches
• A Processor (PowerPC):
Orchestrates the process and
accesses each component
independently
• A Monitor:
Displays the extracted information
on the PC screen
• The Emulation Platform:
– Traffic generators
– Traffic receptors
– Network of switches
Emulation of a Network of Switches
• Several types of traffic:
– Stochastic traffic:
• Uniform model
• Burst model (with a two state Markov chain)
– Trace-driven traffic (real workload)
• Several types of statistics:
– Measurement of latency of packets
– Congestion counter (not-acknowledged flits)
• Routing policy evaluation:
– The routing policy is programmed by software
– Evaluation of many routing policies without re-synthesis
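The two stochastic traffic models for the network of switches can be sketched in software as follows. This is a minimal sketch and not the FPGA traffic generator itself: the function names, the per-cycle injection flag, and the parameters `p_start`, `p_stop`, and `rate` are assumptions chosen for illustration.

```python
import random
from typing import List


def uniform_traffic(cycles: int, rate: float, seed: int = 0) -> List[bool]:
    """Uniform model: inject a flit in each cycle with probability `rate`."""
    rng = random.Random(seed)
    return [rng.random() < rate for _ in range(cycles)]


def burst_traffic(cycles: int, p_start: float, p_stop: float,
                  seed: int = 0) -> List[bool]:
    """Burst model driven by a two-state (IDLE/BURST) Markov chain.

    In IDLE the generator enters BURST with probability p_start; in BURST
    it returns to IDLE with probability p_stop. A flit is injected in
    every cycle spent in BURST.
    """
    rng = random.Random(seed)
    in_burst = False
    injected = []
    for _ in range(cycles):
        if in_burst:
            if rng.random() < p_stop:
                in_burst = False
        elif rng.random() < p_start:
            in_burst = True
        injected.append(in_burst)
    return injected


trace = burst_traffic(cycles=1000, p_start=0.05, p_stop=0.25, seed=1)
```

With this formulation the burst length is geometrically distributed with mean `1 / p_stop`, so the "average number of flits per burst" style of parameter maps directly onto `p_stop`.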
Architecture 2 -- NoCs with interfaces
• Common components:
– Monitor
– Processor
• Additional components:
– Traffic analyzers
– NIs to cores
• Slave core receptiveness
– Modeled by a two-state (on/off)
Markov chain
• Traffic analyzers monitor
network link activity and
interface behavior
Emulation of NoC with interfaces
• Trace-based traffic:
– Master cores generate traffic according to traces provided
by the processor from real applications
• Statistics generated by this platform:
– Master cores measure average operation execution time
– Slave cores measure packet latency through the NoC
– Traffic analyzers measure ACK & NACK activity on links
• Main uses of the emulation platform:
– Tuning of a NoC for a specific application
– Latency analysis for application-specific NoCs
FPGA Reports

Emulation Architecture                    Xilinx Slices       Speed
1. Emulation of a Network of switches     7387 slices (47%)   50 MHz
   (6 switches + 4 traffic generators
   + 4 traffic receptors)
2. Emulation of a NoC with NIs            7914 slices (51%)   50 MHz
   (4 switches + 4 master cores
   + 4 slave cores)
Speed comparison in cycle-accurate NoC environments

Simulation mode               Speed (cycles/sec)   Simulation time for 16 Mpackets   Simulation time for 1000 Mpackets
Verilog (ModelSim)            3.2K                 13h53'                            36 days 4h
SystemC (MPARM)               20K                  2h13'                             5 days 19h
Our emulation architectures   50M                  3.2 sec                           3'20''
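The simulation times in the speed comparison follow from dividing a fixed cycle count by each environment's speed. The short check below assumes the 16 Mpacket workload corresponds to 160 Mcycles, a figure inferred here from 3.2 sec at 50 Mcycles/sec rather than stated on the slides; the names `simulated_seconds` and `hms` are illustrative.

```python
# Speeds in cycles per second, taken from the comparison table.
SPEEDS = {
    "Verilog (ModelSim)": 3.2e3,
    "SystemC (MPARM)": 20e3,
    "FPGA emulation": 50e6,
}

# Assumed workload: 3.2 s at 50 Mcycles/s = 160 Mcycles for 16 Mpackets.
WORKLOAD_CYCLES = 160e6


def simulated_seconds(cycles: float, cycles_per_sec: float) -> float:
    """Wall-clock time needed to run `cycles` at the given speed."""
    return cycles / cycles_per_sec


def hms(seconds: float) -> str:
    """Format a duration as hours, minutes and seconds."""
    total = int(round(seconds))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h{minutes:02d}'{secs:02d}''"


for mode, speed in SPEEDS.items():
    secs = simulated_seconds(WORKLOAD_CYCLES, speed)
    speedup = SPEEDS["FPGA emulation"] / speed
    print(f"{mode}: {hms(secs)} (emulation is {speedup:.0f}x faster)")
```

Under that assumption the Verilog and SystemC rows come out at 50,000 s and 8,000 s, matching the 13h53' and 2h13' entries, and the speed ratio against Verilog simulation is 15,625, consistent with the claim of up to four orders of magnitude.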
Emulation Network of Switches
• Example of statistics:
– Average latency of packets
• Parameters of the emulation – Burst traffic:
– Average number of packets/burst
– Average number of flits/packet
Emulation NoC with NIs
• Statistics:
– Ratio Ack/(Ack+Nack).
– Average latency of
packets on the NoC
• Emulation parameters:
– OCP activity
– Average number of R/Ws
per burst
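The two statistics reported for the NoC with NIs are simple aggregates over raw counters. A minimal sketch of how the controlling processor might compute them is shown below; the function names and the idea of reading raw ACK/NACK counts and per-packet latencies are assumptions for illustration, not the platform's actual software interface.

```python
from typing import Sequence


def ack_ratio(acks: int, nacks: int) -> float:
    """Return Ack / (Ack + Nack) for one link."""
    total = acks + nacks
    if total == 0:
        raise ValueError("no link activity recorded")
    return acks / total


def average_latency(latencies_cycles: Sequence[int]) -> float:
    """Mean packet latency, in cycles, over the observed packets."""
    if not latencies_cycles:
        raise ValueError("no packets observed")
    return sum(latencies_cycles) / len(latencies_cycles)


ratio = ack_ratio(acks=3, nacks=1)
latency = average_latency([10, 20, 30])
```

A ratio close to 1 indicates that few flits are rejected on the link, while a falling ratio signals growing congestion.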
Conclusions
• Mixed HW/SW framework that helps designers to
design and validate ad-hoc NoCs
• Two architectures:
– Emulation of a network of switches.
– Emulation of a complete NoC with OCP-compliant
interfaces
• FPGA emulation makes it possible to tune NoC parameters
with realistic inputs (experiments based on traces from
real applications with millions of packets):
– Topology efficiency
– Routing policies
– Latency effects
– OCP traffic pattern influence
Thank you
•Questions
