Systems and Networks on Chip - Challenges and Solutions by De Micheli, Giovanni





What is a System on Chip?
• What is a system?
De Micheli 3







– Multi-core hardware, multi-threaded software
• Power-consumption limited
• Very expensive to design
– Non-recurring engineering (NRE) costs








– Flexibility vs optimality
• Example:




• Economic viability requires large production
volumes
– Domain-specific hardware












– Variability and reliability
– Thermal management and networking
• Revolutionary technologies
– Nano and molecular electronics
• Summary and conclusions
[Diskobolos - Myron: 460BC]
Medium term trends
• Feature size downscaling
• Increasing transistor density and clock frequency
  → Power and thermal management
• Lower supply voltage
  → Reduced noise immunity
• Increasing the spread of physical parameters
  → Inaccurate modeling of physical behavior
  → Variability and reliability
20 nm MOSFET  (2010 ?)
50 Si atoms along the channel
4 nm MOSFET  (2020 ?)
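A quick calculation puts the two device sizes side by side. It is a sketch that assumes the linear spacing implied by the 20 nm figure (about 0.4 nm per atom) also holds at 4 nm; the function name and its defaults are illustrative, not from the slides.

```python
def atoms_along_channel(length_nm, ref_length_nm=20.0, ref_atoms=50):
    """Scale the reference atom count linearly with channel length.

    Assumption: the atom spacing implied by the reference device
    (ref_length_nm / ref_atoms) is the same for the scaled device.
    """
    return length_nm * ref_atoms / ref_length_nm


print(atoms_along_channel(20.0))  # reference device
print(atoms_along_channel(4.0))   # scaled device
```

Under that assumption a 4 nm channel is on the order of ten atoms long, which is why atomic-level fluctuations matter at this scale.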















• Models used for design are not accurate enough


















As parameters spread,




• Address variability and
robustness
• Design self-calibrating circuits
operating at the edge of failure
• Examples:
– Dynamic voltage scaling of bus
swings [Ienne – EPFL]
– Dynamic voltage scaling in
processors
• Razor [Austin – U Mich]
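The idea of operating at the edge of failure can be sketched as a closed loop: lower the supply voltage while the observed error rate stays under a target, and raise it when errors exceed the target. The error model below is a made-up monotone function, and the controller, its names and all numeric values are assumptions for illustration; this is not the Razor circuit or its actual control law.

```python
import math


def error_rate(vdd, v_crit=0.9, steepness=40.0):
    """Hypothetical model: timing-error probability rises sharply below v_crit."""
    return 1.0 / (1.0 + math.exp(steepness * (vdd - v_crit)))


def tune_voltage(v_start=1.2, v_min=0.6, v_max=1.2, step=0.01,
                 target=0.01, iterations=200):
    """Step the supply down while errors are rare, back up when they are not."""
    vdd = v_start
    for _ in range(iterations):
        if error_rate(vdd) > target:
            vdd = min(v_max, vdd + step)   # too many errors: back off
        else:
            vdd = max(v_min, vdd - step)   # margin left: scale down further
    return vdd


v_final = tune_voltage()
# Dynamic energy scales roughly with the square of the supply voltage.
print(v_final, 1.0 - (v_final / 1.2) ** 2)
```

The loop settles into a small oscillation around the voltage where the modeled error rate crosses the target, so the circuit calibrates itself to its own parameters instead of to a worst-case model.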















































































– Data corruption due to external
radiation exposure
• Crosstalk
– Data corruption due to internal
field exposure
• Both malfunctions manifest
themselves as timing errors
– Error containment























• Vary with altitude and latitude














































Shared ECC
- 4x4 bits: 5-bit ECC
  (instead of 4x3=12)
- 4x8 bits: 6-bit ECC
  (instead of 4x5=20)
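The saving from sharing one code across several words can be checked with the standard Hamming bound for single-error correction: the smallest r with 2^r >= m + r + 1 for m data bits. The sketch below assumes a plain single-error-correcting Hamming code; the function names are illustrative. Under this bound an 8-bit word needs 4 check bits, so the per-word figure of 5 quoted for 8-bit words does not follow from it, and the source does not say which code it assumes.

```python
def sec_check_bits(data_bits):
    """Smallest r such that 2**r >= data_bits + r + 1 (Hamming SEC bound)."""
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r


def shared_vs_separate(words, bits_per_word):
    """Check bits for one shared code versus one code per word."""
    shared = sec_check_bits(words * bits_per_word)
    separate = words * sec_check_bits(bits_per_word)
    return shared, separate


print(shared_vs_separate(4, 4))
print(shared_vs_separate(4, 8))
```

The shared counts (5 and 6 bits) match the figures in the list; the trade-off is that a shared code corrects one error across the whole group rather than one per word.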











































[Figure labels: scan in, scan out]


































• Keep chip as cool as possible
– Reduce failure rates and power consumption
• In a multi-processor (multi-core) system,
power management shuts down idle cores
– The temperature distribution will change in time
– Thermal stress may increase




• Use stand-by components to
replace faulty ones













From power to system management
• Analyze system-level reliability
– as a function of a power management policy
• Determine a system management policy
– to maximize reliability (over a time interval) and
minimize energy consumption
• Determine a system management policy and
system back-up topology




• Reliability and energy management can be
modeled by stochastic processes
– Stochastic optimum control for policy design
– As more accurate models are required, policy
design is harder
• Simulation of system management policies is
useful for assessing effectiveness of
redundancy and energy cost
– Simulation results show dominant effect of
temperature and its cycling on system reliability
• Optimal policy design is also possible
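A minimal sketch of the stochastic view, assuming a single core described by a two-state (active/sleep) discrete-time Markov chain. The policy is reduced to one knob, the per-step probability of going to sleep; all names and numbers are hypothetical and chosen only to show the trade-off.

```python
def stationary(p_sleep, p_wake):
    """Long-run fractions of time (active, sleep) for a two-state chain.

    p_sleep: probability per step of moving active -> sleep (the policy knob).
    p_wake:  probability per step of moving sleep -> active (the workload).
    """
    total = p_sleep + p_wake
    return p_wake / total, p_sleep / total


def average_power(p_sleep, p_wake, active_w=1.0, sleep_w=0.1):
    """Expected power, weighting each state's power by its long-run fraction."""
    pi_active, pi_sleep = stationary(p_sleep, p_wake)
    return pi_active * active_w + pi_sleep * sleep_w


def cycle_rate(p_sleep, p_wake):
    """Expected active -> sleep transitions per step (a proxy for thermal cycles)."""
    pi_active, _ = stationary(p_sleep, p_wake)
    return pi_active * p_sleep


for p_sleep in (0.1, 0.2, 0.4):
    print(p_sleep, average_power(p_sleep, 0.2), cycle_rate(p_sleep, 0.2))
```

In this toy model a more aggressive policy lowers average power but raises the number of state changes per step, which is the tension between energy and temperature cycling that the simulation results point to.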
Effect of DPM policy on MTTF
• Power and temperature gap
between active and sleep state
• Small gap
– Thermal cycle effects dominate
EM and TDDB only in the lower
temperature spectrum
– MTTF decreases/increases as
DPM gets more aggressive
• Wider gap
– Thermal cycle effects dominate
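The competing effects can be illustrated with two commonly used lifetime models: the Arrhenius temperature term of Black's equation for electromigration (EM), and a Coffin-Manson power law for thermal cycling. This is a sketch, not the model behind the slide; the activation energy and the exponent are illustrative assumptions, and TDDB is left out.

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K


def em_lifetime_ratio(t_cool_k, t_hot_k, ea_ev=0.7):
    """MTTF(t_cool) / MTTF(t_hot) from the Arrhenius term exp(Ea / kT).

    Assumes equal current density in both cases; ea_ev is an assumed value.
    """
    return math.exp(ea_ev / K_BOLTZMANN_EV * (1.0 / t_cool_k - 1.0 / t_hot_k))


def cycles_to_failure_ratio(delta_t_small, delta_t_large, q=2.35):
    """N_f(small swing) / N_f(large swing) with N_f proportional to dT**(-q).

    q is an assumed Coffin-Manson exponent.
    """
    return (delta_t_large / delta_t_small) ** q


# Sleeping cools the chip (EM lifetime improves) ...
print(em_lifetime_ratio(340.0, 360.0))
# ... but a wider active/sleep gap means larger swings (fewer cycles to failure).
print(cycles_to_failure_ratio(10.0, 20.0))
```

Both ratios exceed one: a lower average temperature helps EM, while a larger temperature swing sharply reduces the number of cycles to failure, so the net effect on MTTF depends on the gap between the two states.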






[Dorifero of Policleto]
Component-based SoC design
• SoCs are designed (re)-using large macrocells
– Processors, controllers, memories…
– Plug and play methodology is very desirable
– Components are qualified before use
• Design challenge:
– Provide a functionally-correct, reliable operation of the
interconnected components
   Critical issue:




• Communication scalability is the bottleneck
Entire chip is not reachable in one clock cycle!
[Source: Leblebici]
Why on-chip networking?
• Provide a structured methodology
 for realizing on-chip communication
– Modularity
– Flexibility
• Cope with inherent limitations of busses
– Performance and power of busses do not scale up
• Support reliable operation








The RAW architecture [MIT]
• Fully programmable SoC
– Homogeneous array of tiles:
• Processor cores
with local storage
• Each tile has a router
• The raw architecture is exposed to the compiler
– Cores and routers are programmable
– Compiler determines which wires are used at each cycle




Metrics for NoC design
• Low communication latency
– Streamlined control protocols
– Data and control signals can be separate
• High communication bandwidth
– To support demanding SW applications
– Great match to stream computing
• Low energy consumption
– Wiring switched capacitance dominates
• Error resiliency
– To compensate/correct electrical-level errors






• ECC in switches
– Global end to end




Flexibility in NoC design









• Several parameters for optimization








































• Support for several topologies
and routing functions
• Area, power, delay optimization
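As one concrete example of a routing function, dimension-order (XY) routing on a 2D mesh sends a packet along the X dimension first and then along Y. The sketch below is a generic illustration with a hypothetical function name; the slides do not state which routing functions the tool flow uses.

```python
def xy_route(src, dst):
    """Return the list of (x, y) hops from src to dst, X first, then Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1   # move one hop along X
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1   # then one hop at a time along Y
        path.append((x, y))
    return path


print(xy_route((0, 0), (2, 1)))
```

Because every packet resolves X before Y, the route depends only on source and destination, which keeps the switch logic simple and the latency predictable.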
  Comparisons
 130nm UMC library
 Cores: 1mm² obstructions (ARM cores, 32kB SRAM)










2 NIs + 1 switch
 Summary of results:
2.7% vs. 17% post P&R timing degradation → Much improved physical scalability
Clock frequency 885 vs. 400 MHz → Much faster
16% application speedup (longer latency, but more effective bandwidth)
7x more area and 5x more power (mostly due to flip-flops in buffers)
Overall better energy efficiency for > 4 W processor and memory power
Predictability is highly enhanced
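The energy-efficiency break-even can be reasoned about with a small calculation. It is a sketch under three assumptions that the slides do not state: the 5x factor applies to interconnect power only, the 16% speedup divides run time by 1.16, and energy is total power times run time. With processor and memory power P, baseline interconnect power I, power factor k and speedup s, the NoC uses less energy when (P + k*I)/s < P + I, that is when P/I > (k - s)/(s - 1).

```python
def breakeven_ratio(power_factor, speedup):
    """Minimum P/I ratio above which the faster, hungrier interconnect wins.

    power_factor: interconnect power relative to the baseline (k).
    speedup:      application speedup relative to the baseline (s > 1).
    """
    return (power_factor - speedup) / (speedup - 1.0)


ratio = breakeven_ratio(5.0, 1.16)
print(ratio)        # required ratio of processor+memory power to interconnect power
print(4.0 / ratio)  # baseline interconnect power implied by a 4 W break-even
```

Under these assumptions the break-even ratio is about 24, so a 4 W threshold would correspond to a baseline interconnect power of roughly 0.17 W. That number is a consequence of the assumptions, not a figure from the source.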










• When will current semiconductor technologies run out of steam?
• What factor will provide a radical change in technology?
– Performance, power density, cost?
• Several emerging technologies:
– Silicon nanowires, carbon nanotubes, single-electron devices,
molecular switches, quantum devices, biological computing, …
• Are these technologies compatible with silicon?
– What is the transition path?





• Self-assembly used to create structures
– Manufacturing paradigm is bottom-up
• Significant presence of physical defects
– Massively fault-tolerant design style
• Competitive advantage stems from the
 high density of computing elements





– Includes scaled-down traditional CMOS
– Challenges induced by nanometric scale
– Scaling limit?
• Molecular electronics
– Devices exploit molecular structure










• Massive parallelism and redundancy
• Local and global configuration
• Regular layout
– Exploit properties of crosspoint architectures
• E.g., Programmable Logic Arrays (PLAs)
– Wiring delays are predictable





• Device level redundancy
– Duplicate transistors to achieve broader
coverage
– Cover Boolean implicants more than once
• New paradigm for testing
– Circuit with faulty devices may still be OK
– Exploit, rather than remove, redundancy
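Covering implicants more than once can be sketched with a tiny PLA model: the output is an OR of AND terms, and each term is duplicated. The fault model here is an assumption (a faulty term is stuck at 0, so it never fires), and the function and the example terms are hypothetical. A term stuck at 1 would not be masked by this scheme.

```python
from itertools import product


def pla_output(inputs, product_terms, faulty=frozenset()):
    """OR of AND terms; terms whose index is in `faulty` are stuck at 0.

    Each term maps an input index to the bit value it requires.
    """
    return any(
        all(inputs[i] == v for i, v in term.items())
        for k, term in enumerate(product_terms)
        if k not in faulty
    )


def tolerates_single_fault(redundant_terms, reference_terms, n_inputs):
    """True if any single stuck-at-0 term leaves the function unchanged."""
    for k in range(len(redundant_terms)):
        for bits in product((0, 1), repeat=n_inputs):
            if pla_output(bits, redundant_terms, {k}) != pla_output(bits, reference_terms):
                return False
    return True


# f = (a AND b) OR ((NOT a) AND c), with inputs indexed a=0, b=1, c=2
TERMS = [{0: 1, 1: 1}, {0: 0, 2: 1}]
REDUNDANT = TERMS + TERMS  # every implicant covered twice

print(tolerates_single_fault(REDUNDANT, TERMS, 3))
print(tolerates_single_fault(TERMS, TERMS, 3))
```

The duplicated array still computes the right function with any one term dead, which is the sense in which a circuit with faulty devices may still be acceptable at test time.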







































General weighted averaging and re-scaling function used in the third layer
[Leblebici and Schmid]
Architectural implications
• Modularity, redundancy, regularity
• Cellular approach to computation
– Cellular nonlinear networks
– Stream computing
• Programming paradigms
– Designers need to think “parallel” to exploit




Summary and conclusions
• The electronic market is driven by embedded applications where
performance and reliability are key figures of merit
• Hardware systems are more prone to fail
– Variations in manufacturing
– Hard and soft malfunctions
• Reliability can be enhanced by component and communication redundancy
– System management is critical for long-lasting operation
– On-chip networks support redundancy
• Massive parallelism and redundancy are key to designing highly dependable
circuits with nano-technologies
– Sub-45 nm CMOS technologies
– Novel silicon and non-silicon based nano-technologies
