




Federico Angiolini, Srinivasan Murali,
Luca Benini, David Atienza, Antonio Pullini
Intel’s 80-tile NoC






Technology: 65 nm CMOS process
Interconnect: 1 poly, 8 metal (Cu)
Vangal et al., ISSCC 2007
Application domains
• Multiprocessors on chip
• Homogeneous fabric




• QoS and power constraints
• Domain-specific software
Wireless networks: mesh nodes, picocells
Picochip PC205 (Apr'06)
• 260 MHz, 31 GMAC/s, 160 GIP/s
• 64 KB I/D caches, 128 KB SRAM








• Roadmap continues: 90 → 65 → 45 nm
• “Traditional” bus-based SoCs fit in one tile!
Architecture Evolution
• Communication demand is staggering, but uneven

































• 65 nm low-power library
• Low-Vt library, high VDD: power/performance trade-off
• Very high frequencies or very long links are infeasible
• But even some feasible links burn up to 30 mW!









• 65 nm low-power library
• High-Vt library, low VDD: absolute minimum power
• Even at 250 MHz, links longer than 2 mm are infeasible
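Why long links fail at a given clock can be sketched with a first-order model: an unpipelined link is feasible only if its wire delay fits in one clock period, otherwise register stages must be inserted. The per-mm delay below is a hypothetical value, chosen only so that the 250 MHz case reproduces the 2 mm limit quoted above; the model ignores setup time, clock-to-Q and driver sizing, which real library characterization would include.

```python
import math


def max_unpipelined_length_mm(freq_mhz: float, delay_ps_per_mm: float) -> float:
    """Longest wire a signal can cross within one clock period."""
    period_ps = 1e6 / freq_mhz          # clock period 1/f, in picoseconds
    return period_ps / delay_ps_per_mm


def pipeline_stages_needed(length_mm: float, freq_mhz: float,
                           delay_ps_per_mm: float) -> int:
    """Register stages to insert so every wire segment fits in one cycle."""
    reach = max_unpipelined_length_mm(freq_mhz, delay_ps_per_mm)
    segments = math.ceil(length_mm / reach)
    return max(segments - 1, 0)


# Hypothetical wire delay for a high-Vt, low-VDD library (illustrative only).
DELAY_PS_PER_MM = 2000.0

print(max_unpipelined_length_mm(250, DELAY_PS_PER_MM))      # 2.0
print(pipeline_stages_needed(5.0, 250, DELAY_PS_PER_MM))    # 2
```

Doubling the frequency halves the reachable length, which is why very high frequencies and very long links quickly become infeasible together.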
Addressing Interconnect Issues
• High-end industrial solutions:







• Complexity: how to analyze and verify “spaghetti interconnects”?
• Scalability: a bus is bandwidth-limited, a crossbar is size-limited


















The “power of NoCs”:
• Clean separation at the session layer
• Cores issue end-to-end transactions
• Network deals with the transport, network, link and physical layers




• Physical-design aware (floorplan, global routing)
Scalability is supported from the ground up!
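The session-layer split can be illustrated by what a network interface does: the core hands over an end-to-end transaction, and the interface turns it into flits that switches forward without knowing the core's protocol. This is a minimal sketch assuming source routing (the header lists one output port per switch) and a made-up flit format; `Transaction` and `packetize` are hypothetical names, not an existing NoC library.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transaction:
    """End-to-end transaction as issued by a core (hypothetical fields)."""
    src: str
    dst: str
    address: int
    data: bytes = b""   # write payload; empty for a read request


def packetize(txn: Transaction, route: List[int], flit_bytes: int = 4) -> List[dict]:
    """Network-interface side: turn one transaction into a list of flits.

    The header carries a source route (output port per switch), so switches
    never need to understand the core's protocol.
    """
    header = {"type": "head", "route": list(route), "dst": txn.dst, "addr": txn.address}
    chunks = [txn.data[i:i + flit_bytes] for i in range(0, len(txn.data), flit_bytes)]
    flits = [header] + [{"type": "body", "payload": c} for c in chunks]
    # Mark the last flit so switches can release the path (wormhole style).
    flits[-1]["type"] = "tail" if len(flits) > 1 else "head_tail"
    return flits


write = Transaction(src="P0", dst="M3", address=0x1000, data=bytes(10))
print([f["type"] for f in packetize(write, route=[2, 0, 1])])
# ['head', 'body', 'body', 'tail']
```

Because only the interface knows about transactions, the same switches and links can serve cores with different protocols and be replicated as the design grows.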
SoC and NoC Characteristics
• Typical applications targeted by SoCs are:
  - Complex
  - Highly heterogeneous (component specialization)
  - Communication intensive
• Tailor-made interconnects for applications
• NoCs are resource constrained:
  - Power and area constraints: little buffering available
• Large available wire bandwidth
  - But tapping it with a modular, structured design is key
New design challenges
• From the multiprocessor field:
  - Assigning tasks to processors
  - Synchronization, consistency, coherency
• Networking:
  - Network topology, routing, flow control
  - Quality of Service (QoS) needs
• VLSI:
  - Floorplan in 2D, wire lengths
  - Power, area, performance



































Orthogonalize computation and communication
Why Design Automation?
• Large design space, several steps
1. Capturing application traffic
   • How to capture it?
   • How to account for burstiness and jitter?
   • What about multiple applications?
2. Which topology?
3. Mapping?
4. Which routes to use?
- Resource constrained: power, area
- Large wire bandwidth: tapping it with a modular design is key
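Step 1 (capturing application traffic) is commonly expressed as a communication graph: one edge per stream, annotated with bandwidth and latency constraints. The sketch below assumes that representation; the `Flow` fields, units and numbers are hypothetical and only show how per-core injected and delivered load falls out of the graph.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Flow:
    """One edge of the application's communication graph."""
    src: str
    dst: str
    bw_mbytes_per_s: float      # sustained bandwidth requirement
    max_latency_cycles: int     # latency constraint for this stream


def per_core_load(flows: Iterable[Flow]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Total bandwidth injected by and delivered to each core."""
    out_bw: Dict[str, float] = defaultdict(float)
    in_bw: Dict[str, float] = defaultdict(float)
    for f in flows:
        out_bw[f.src] += f.bw_mbytes_per_s
        in_bw[f.dst] += f.bw_mbytes_per_s
    return dict(out_bw), dict(in_bw)


app = [
    Flow("P0", "M0", 400, 10),
    Flow("P1", "M0", 200, 20),
    Flow("P0", "S0", 100, 30),
]
out_bw, in_bw = per_core_load(app)
print(out_bw)   # {'P0': 500.0, 'P1': 200.0}
print(in_bw)    # {'M0': 600.0, 'S0': 100.0}
```

The later steps (topology, mapping, routes) all consume this graph, which is why getting it right first matters.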
More Steps!
5. Tuning communication architecture parameters (link width, buffer sizes)
6. Verification for correctness, performance
7. Build simulation, synthesis, emulation models
8. Reliable operation under unreliable conditions
Automating and integrating these steps is essential!
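Step 5 (tuning link width) reduces, to first order, to bits per cycle: a link of width W bits at frequency f carries W·f bits/s, so the minimum width is the required bandwidth divided by f. A minimal sketch, assuming a hypothetical set of allowed widths and ignoring header overhead and any utilization cap a real tool would apply:

```python
from typing import Optional, Sequence


def min_link_width_bits(bw_mbytes_per_s: float, freq_mhz: float,
                        allowed: Sequence[int] = (16, 32, 64, 128)) -> Optional[int]:
    """Smallest allowed link width that carries the bandwidth at this frequency."""
    bits_per_cycle = bw_mbytes_per_s * 8 / freq_mhz   # (Mbit/s) / (Mcycle/s)
    for width in sorted(allowed):
        if width >= bits_per_cycle:
            return width
    return None   # no allowed width is enough: raise frequency or split the flow


def min_freq_mhz(bw_mbytes_per_s: float, width_bits: int) -> float:
    """Lowest NoC clock that lets a link of the given width carry the bandwidth."""
    return bw_mbytes_per_s * 8 / width_bits


print(min_link_width_bits(1000, 250))   # 32
print(min_link_width_bits(1000, 200))   # 64
print(min_freq_mhz(1000, 32))           # 250.0
```

Width and frequency trade against each other: a slower NoC needs wider links (more area and wiring), a faster one runs into the link-length limits discussed earlier.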










































































































• Design application-specific custom topologies
Synthesize the best topology for the application
• Objectives: power, performance (hop delay)
• Constraints: performance, power, bandwidth
• Tune the NoC frequency to match application needs
• Design a deadlock-free network
• Consider timing constraints early in the design cycle
• Use accurate floorplan information
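One standard way to check that a set of routes is deadlock-free is the Dally–Seitz condition: build the channel dependency graph (an edge from link c1 to link c2 whenever a route holds c1 while requesting c2) and verify it has no cycle. The sketch below assumes routes are given as lists of switch names; it is a generic check, not necessarily the method used by the tool in these slides, and it does not cover message-dependent (request/response) deadlocks.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Channel = Tuple[str, str]   # directed link between two switches


def channel_dependency_graph(routes: List[List[str]]) -> Dict[Channel, Set[Channel]]:
    """Edge c1 -> c2 whenever some route holds c1 while requesting c2."""
    cdg: Dict[Channel, Set[Channel]] = defaultdict(set)
    for path in routes:
        links = list(zip(path, path[1:]))           # channels used, in order
        for held, requested in zip(links, links[1:]):
            cdg[held].add(requested)
    return cdg


def has_cycle(graph: Dict[Channel, Set[Channel]]) -> bool:
    """Depth-first search with three colors; a back edge means a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color: Dict[Channel, int] = defaultdict(int)    # default WHITE

    def visit(u: Channel) -> bool:
        color[u] = GRAY
        for v in graph.get(u, ()):
            if color[v] == GRAY or (color[v] == WHITE and visit(v)):
                return True
        color[u] = BLACK
        return False

    return any(color[u] == WHITE and visit(u) for u in list(graph))


# Minimal routes around a 4-switch ring close a dependency cycle.
ring = [["A", "B", "C"], ["B", "C", "D"], ["C", "D", "A"], ["D", "A", "B"]]
print(has_cycle(channel_dependency_graph(ring)))      # True
# Drop the route that closes the loop and the dependencies become acyclic.
print(has_cycle(channel_dependency_graph(ring[:3])))  # False
```

A synthesis tool can run such a check on each candidate topology and reroute (or forbid certain turns) until the graph is acyclic.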





























• Consider bursty traffic and the criticality of streams
• Obtained from initial simulations and application knowledge
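Burstiness can be extracted from an initial simulation trace by comparing each stream's average bandwidth with its peak over short windows. A minimal sketch, assuming a hypothetical trace format of (cycle, src, dst, bytes) records and a window length picked by the designer; a high peak-to-average ratio flags streams that may need sizing for peak rather than average load, especially if they are critical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[int, str, str, int]   # (cycle, src, dst, bytes) from a simulation trace


def bandwidth_profile(trace: Iterable[Record], window_cycles: int
                      ) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Average and peak bytes/cycle per (src, dst) stream over fixed windows."""
    per_window: Dict[Tuple[str, str], Dict[int, int]] = defaultdict(lambda: defaultdict(int))
    last_cycle = 0
    for cycle, src, dst, nbytes in trace:
        per_window[(src, dst)][cycle // window_cycles] += nbytes
        last_cycle = max(last_cycle, cycle)
    span = (last_cycle // window_cycles + 1) * window_cycles   # observed cycles
    profile = {}
    for stream, windows in per_window.items():
        avg = sum(windows.values()) / span
        peak = max(windows.values()) / window_cycles
        profile[stream] = (avg, peak)
    return profile


trace = [(0, "P0", "M0", 64), (10, "P0", "M0", 64), (20, "P0", "M0", 64),
         (150, "P0", "M0", 64)]
avg, peak = bandwidth_profile(trace, window_cycles=100)[("P0", "M0")]
print(avg, peak)   # 1.28 1.92
```

Here the peak is 1.5x the average; the window length strongly affects that ratio, so it should reflect how long the NoC buffers can absorb a burst.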














































































[Figure: design flow starting from communication.xml; arrows show the direction of data flow]
[Kees Goossens, NXP]
NoC design flow
• Split the large optimization problem into smaller pieces



































































































• SoCs typically support multiple applications
• Applications can run in parallel: compound modes
• UMARS supports multiple applications
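A simple, conservative way to size a NoC for several use cases: in a compound mode the demands of the parallel applications on the same stream add up, and across modes each stream keeps its maximum. This is a sketch under that assumption, not UMARS's algorithm; the application and stream names are hypothetical. Tools that reconfigure the NoC per mode can do better than this envelope.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Stream = Tuple[str, str]


def worst_case_demand(apps: Dict[str, Dict[Stream, float]],
                      modes: Iterable[Set[str]]) -> Dict[Stream, float]:
    """Per-stream bandwidth the NoC must sustain over all use-case modes.

    A compound mode runs several applications in parallel, so their demands on
    the same stream add up; across modes we keep the maximum (conservative).
    """
    worst: Counter = Counter()
    for mode in modes:
        demand: Counter = Counter()
        for app in mode:
            demand.update(apps[app])          # parallel apps add their traffic
        for stream, bw in demand.items():
            worst[stream] = max(worst[stream], bw)
    return dict(worst)


apps = {
    "video": {("CPU", "MEM"): 100, ("DSP", "MEM"): 300},
    "audio": {("CPU", "MEM"): 50},
}
modes = [{"video"}, {"audio"}, {"video", "audio"}]   # last one is a compound mode
print(worst_case_demand(apps, modes))
# {('CPU', 'MEM'): 150, ('DSP', 'MEM'): 300}
```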















Several NoC CAD efforts
Nostrum simulation environment
NoC buffering with queueing theory [Hu]
OEDIPUS design system [Ahonen]
Case Study 1:




SunFloor vs. Manual Design
On the 30-core multimedia benchmark
P: processors, M: private memories, T: traffic generators, S: shared slaves





















Hand-mapped design:
• Design time: weeks
• 0.13 µm technology




























• Power: 277 mW (-25%)
• Cell area: 37 mm² (+4%)
• Design time: 4 hours, from design to layout
• 0.13 µm technology
Benchmark execution time comply with application 

















































Custom vs. Regular Topologies
• On average, SunFloor custom topologies have:
  - 2.75x lower power consumption
  - 1.55x lower hop delay
• Despite the large design space, maximum run

























































































































































































































• Lower power in 65 nm for the same design
• 65 nm supports 2x the bandwidth at lower power!
• The NoC for a large design (38 cores) operates at 800 MHz
• With increasing application bandwidth or core count, more switches are needed (due to the frequency limit of switches)
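The link between NoC frequency and switch count can be sketched with a port-count argument: larger switches are slower, so a higher target frequency caps the switch radix, and more switches are then needed to attach all cores. The radix/frequency table below is entirely hypothetical, and the bound assumes a tree-connected NoC with one core per port; it ignores bandwidth and floorplan, which can only raise the count.

```python
import math
from typing import List, Tuple

# Hypothetical synthesis results: (switch radix, max clock in MHz). Real numbers
# come from synthesizing the switch library in the target technology.
SWITCH_FMAX: List[Tuple[int, float]] = [(4, 1000), (6, 900), (8, 800), (10, 700), (14, 550)]


def max_radix(freq_mhz: float) -> int:
    """Largest switch that still closes timing at the requested NoC frequency."""
    ok = [radix for radix, fmax in SWITCH_FMAX if fmax >= freq_mhz]
    if not ok:
        raise ValueError("no switch size meets this frequency")
    return max(ok)


def min_switches(n_cores: int, radix: int) -> int:
    """Lower bound on switches for a tree-connected NoC with one core per port.

    k switches offer k*radix ports, and k-1 inter-switch links consume 2*(k-1)
    of them, so we need k*(radix-2) + 2 >= n_cores.
    """
    if radix <= 2:
        raise ValueError("radix must exceed 2 to connect more than one switch")
    if n_cores <= radix:
        return 1
    return math.ceil((n_cores - 2) / (radix - 2))


for f in (550, 800, 1000):
    print(f, min_switches(38, max_radix(f)))
# 550 3
# 800 6
# 1000 18
```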
dVOPD
Case Study 3:

























































Energy efficiency: 2.2 Gb/s/mW → 2.5x better than a high-performance NoC
Custom Topology Layout 
Conclusions
• Design flows and CAD tools are critical for NoCs
• Layered design flow
  - Tackle problems from several levels
• Several key steps
  - Traffic analysis, mapping, topology design, routing,…
• Integrated approach is critical
  - Interact with existing back-end tools
• Fertile ground for more R&D work:
  - Run-time configurability
  - Robustness with respect to static/dynamic variations and errors
  - Tackle floorplan and layout issues
