Feasibility of Accelerator Generation to Alleviate Dark Silicon in a Novel Architecture

Abstract

This thesis presents a novel approach to alleviating the Dark Silicon problem by reducing power density. Shrinking transistor sizes has led to an increase in power consumption. To manage the power issue, processor design has shifted from a single core to many cores. Switching on only some cores while the others are off helps the chip cool down and spreads power more evenly across it. This means that some transistors are always idle while others are working. Consequently, scaling down the chip while increasing the amount of power to be dissipated increases the number of inactive transistors. The result is Dark Silicon, which doubles with every chip generation [63]. One of the most effective techniques for dealing with Dark Silicon is to implement accelerators that execute the most energy-consuming software functions. In this way the CPU is able to dissipate more energy and reduce the Dark Silicon issue. This work explores a novel accelerator design model that can be interfaced to a stack CPU, with the aim of optimising transistor logic area and improving energy efficiency, and so tackling the Dark Silicon problem with heterogeneous multi-accelerators (co-processors) in a stack structure. The contribution of this thesis is a tool that generates co-processors from software stack machine code. The tool also employs up-to-date code optimisation strategies to enhance the code at the input stage. The generated cores are analysed using key metrics, based on 65nm synthesis experiments and industry-standard tool-sets. The thesis further introduces a novel architecture to decrease the power density of the accelerator. To test these expectations, a large-scale synthesis translation experiment was conducted, covering widely recognised benchmarks and generating a large number of cores (in the thousands).
These were evaluated for a range of key metrics: silicon die area, timing, power, instructions per clock, and power density, both with and without code optimisation applied. The results demonstrate that one of two competing core models, ‘Wavecore’ (proposed in this thesis), delivers superior power density to the standard approach (referred to here as the Composite core), and that this is achieved without significant cost in the critical metrics of overall power consumption and critical path delay. Finally, to understand the benefit of these accelerators, the auto-generated cores are compared with a standard stack-machine CPU executing the same code sequences. Both the core generation work and the benchmark CPU assume a 65nm CMOS process node and are evaluated with industry-standard design tools. It is demonstrated that the generated cores achieve better power efficiency than the CPU core.
