Power, performance and reliability optimisation of on-chip interconnect by adroit use of dark silicon

Abstract

Continuous transistor scaling has enabled computer architects to integrate an increasing number of cores on a chip. Packet-switched Network-on-Chip (NoC) is envisioned as a scalable and cost-effective communication fabric for multicore architectures with tens to hundreds of cores. Extreme transistor scaling (45 nm and beyond) brings its own technical challenges. For recent technology nodes, the power per transistor is not reducing at the same rate as area. The breakdown of Dennard scaling has resulted in a situation where transistors are abundant, but there is not enough power to switch them all on at the same time, a phenomenon termed dark silicon. Previous research on dark silicon concentrated on integrating application-specific accelerators or cores to improve energy efficiency and reliability, completely neglecting the interplay of dark silicon and NoC architecture. For the first time, this thesis proposes various NoC architectures that exploit dark silicon to improve the energy efficiency, performance and reliability of the on-chip interconnect. The first proposal is an on-chip interconnect, named darkNoC, that consists of multiple NoCs, where each NoC is optimised at design time using multi-Vt optimisation for a different voltage-frequency (VF) level. This architecture can provide up to 52% saving in NoC energy-delay product (EDP) for certain benchmarks, whereas a state-of-the-art DVFS scheme saved only 15% EDP. Then, the Malleable NoC architecture is proposed, which further improves the energy efficiency of darkNoC by combining multiple VF-optimised routers with per-node VF selection, and by exploiting the heterogeneity of the application workload and the application-to-core mapping. Next, this thesis proposes the SuperNet NoC architecture, which trades dark silicon for improvements in the energy, performance and reliability of the on-chip interconnect.
SuperNet consists of two parallel NoC planes that are optimised for different VF levels, and can be configured at runtime to operate in energy-efficient mode, performance mode or reliability mode. Finally, a design flow is proposed for designing custom on-chip communication for application-specific MPSoCs targeting streaming applications. A heuristic with linear time complexity is introduced for exploring the exponentially large design space, reducing framework runtime by 27x compared to a state-of-the-art heuristic.
