3 research outputs found

    Energy-efficient data prefetch buffering for low-end embedded processors

    An energy-efficient architecture should jointly optimize energy consumption and throughput, as captured by the Energy-Delay-Square Product (ED2P) metric. This paper introduces a prefetch data buffer micro-architecture that achieves this goal with the aid of software-inserted control words that govern the prefetch process. The proposed architecture is aimed at low-end embedded processors, which lack a cache-based memory hierarchy in order to reduce energy consumption. Identifying after compilation which data should be prefetched, and modifying the object code accordingly, reduces the rate of prefetch misses. Pre-computing memory addresses with auxiliary software after compilation, and again modifying the object code, avoids address computation by hardware at run time, which reduces pipeline stalls and thus improves throughput. Additionally, in the case of branches, two data items are prefetched at any one time so that alternative instruction outcomes are anticipated. The paper reports results from running seven well-known, representative benchmarks on the proposed architecture. Execution times improved by 6–20% compared to an unbuffered architecture. Furthermore, the average ED2P of the buffered architecture, normalized against the same architecture without buffering, varied between 54% and 90% depending on the benchmark, though at the cost of increased code size. That is to say, for the benchmarks tested there was a net energy-efficiency improvement of between 10% and 46% in comparison with the equivalent unbuffered architecture with a lower area overhead.
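The relationship between the normalized ED2P figures and the quoted improvement can be illustrated with a minimal Python sketch. It assumes the usual definition ED2P = E × D², with the improvement taken as one minus the normalized ratio (so 54% corresponds to 46%, and 90% to 10%). The function names and the energy and delay values are hypothetical and are not taken from the paper.

```python
def ed2p(energy: float, delay: float) -> float:
    """Energy-Delay-Square Product: energy times delay squared (lower is better)."""
    return energy * delay ** 2


def normalized_ed2p(buffered: tuple, unbuffered: tuple) -> float:
    """ED2P of the buffered design divided by that of the unbuffered baseline."""
    return ed2p(*buffered) / ed2p(*unbuffered)


# Hypothetical (energy, delay) pairs in arbitrary units, NOT measured values.
baseline = (1.00, 1.00)   # unbuffered architecture
buffered = (1.05, 0.85)   # assumed: 5% more energy, 15% shorter execution time

ratio = normalized_ed2p(buffered, baseline)
improvement = 1.0 - ratio

print(f"normalized ED2P: {ratio:.1%}")
print(f"net improvement: {improvement:.1%}")
```

Because delay enters squared, a modest reduction in execution time can outweigh a small increase in energy, which is why the metric rewards throughput gains so strongly.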

    Energy analysis and optimisation techniques for automatically synthesised coprocessors

    The primary outcome of this research project is the development of a methodology enabling fast, automated, early-stage power and energy analysis of configurable processors for system-on-chip platforms. Such a capability is essential to the process of selecting energy-efficient processors during design-space exploration, when potential savings are highest. This has been achieved by developing dynamic and static energy consumption models for the constituent blocks within the processors. Several optimisations have been identified, specifically targeting the most significant blocks in terms of energy consumption. An instruction encoding mechanism reduces both the energy and area requirements of the instruction cache; modifications to the multiplier unit reduce energy consumption during inactive cycles. Both techniques are demonstrated to offer substantial energy savings. The aforementioned techniques have undergone detailed evaluation and, based on the positive outcomes obtained, have been incorporated into Cascade, a system-on-chip coprocessor synthesis tool developed by Critical Blue, to provide automated analysis and optimisation of processor energy requirements. This thesis details the process of identifying and examining each method, along with the results obtained. Finally, a case study demonstrates the benefits of the developed functionality from the perspective of someone using Cascade to automate the creation of an energy-efficient configurable processor for system-on-chip platforms.
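The abstract does not give the form of the per-block energy models. As a rough illustration only, the Python sketch below assumes a common first-order form: dynamic energy is a per-cycle cost that differs between active and inactive cycles, and static energy is leakage power multiplied by run time. The `BlockModel` class, its fields, and every number are hypothetical; none of this reflects Cascade's actual models.

```python
from dataclasses import dataclass


@dataclass
class BlockModel:
    """Hypothetical first-order energy model for one processor block."""
    name: str
    active_energy_pj: float   # dynamic energy per active cycle (assumed)
    idle_energy_pj: float     # dynamic energy per inactive cycle (assumed)
    leakage_power_uw: float   # static (leakage) power in microwatts (assumed)


def block_energy_pj(block: BlockModel, active_cycles: int,
                    total_cycles: int, clock_hz: float) -> float:
    """Total energy in picojoules: dynamic part plus static part."""
    idle_cycles = total_cycles - active_cycles
    dynamic = (active_cycles * block.active_energy_pj
               + idle_cycles * block.idle_energy_pj)
    runtime_s = total_cycles / clock_hz
    # 1 microwatt sustained for 1 second is 1 microjoule, i.e. 1e6 picojoules.
    static = block.leakage_power_uw * runtime_s * 1e6
    return dynamic + static


# Made-up multiplier figures: active in 20% of one million cycles at 100 MHz.
multiplier = BlockModel("multiplier", 5.0, 1.0, 10.0)
quiet_multiplier = BlockModel("multiplier (quiet when idle)", 5.0, 0.25, 10.0)

baseline_pj = block_energy_pj(multiplier, 200_000, 1_000_000, 100e6)
optimised_pj = block_energy_pj(quiet_multiplier, 200_000, 1_000_000, 100e6)

print(f"baseline:  {baseline_pj:.0f} pJ")
print(f"optimised: {optimised_pj:.0f} pJ")
```

In a model of this shape, a block that is inactive for most cycles is dominated by its idle-cycle term, which is consistent with the abstract's focus on reducing multiplier energy during inactive cycles.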

    Energy characterization of hardware-based data prefetching

    This paper evaluates several hardware-based data prefetching techniques from an energy perspective and explores their energy/performance tradeoffs. We present detailed simulation results and make performance and energy comparisons between different configurations. Power characterization is provided based on HSpice circuit-level simulation of state-of-the-art low-power cache designs implemented in deep-submicron process technology. This is combined with architecture-level simulation of switching activities in the memory system. The results show that while aggressive prefetching techniques often help to improve performance, they increase energy consumption in most cases. In designs implemented in deep-submicron 100-nm BPTM process technology, cache leakage becomes one of the dominant factors in energy consumption. We have, however, found that if leakage is optimized with recently proposed circuit-level techniques, most of the energy degradation is due to prefetch-hardware-related costs and unnecessary L1 data cache lookups related to prefetches that hit in the L1 cache. This overhead on the memory system can be as much as 20%.
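The two overhead sources named above can be put into a toy accounting model. The Python sketch below is not the paper's simulation methodology. It assumes that every prefetch that hits in L1 costs one wasted lookup, that the prefetch hardware adds a fixed energy per issued prefetch, and that the baseline is the energy of demand L1 lookups alone. The function name, parameters, and numbers are all hypothetical.

```python
def prefetch_energy_overhead(demand_accesses: int,
                             prefetches_issued: int,
                             prefetch_l1_hits: int,
                             lookup_energy: float,
                             prefetch_hw_energy: float) -> float:
    """Extra energy due to prefetching, as a fraction of demand-lookup energy."""
    baseline = demand_accesses * lookup_energy
    # The line is already in L1, so a prefetch lookup that hits does no useful work.
    wasted_lookups = prefetch_l1_hits * lookup_energy
    # Assumed fixed cost per issued prefetch (tables, address generation logic).
    hardware = prefetches_issued * prefetch_hw_energy
    return (wasted_lookups + hardware) / baseline


# Made-up counts and energies in arbitrary units, NOT results from the paper.
overhead = prefetch_energy_overhead(
    demand_accesses=1_000_000,
    prefetches_issued=300_000,
    prefetch_l1_hits=150_000,
    lookup_energy=1.0,
    prefetch_hw_energy=0.1,
)
print(f"prefetch energy overhead: {overhead:.1%}")
```

Under these assumptions, filtering prefetches that would hit in L1 before they reach the cache would remove the wasted-lookup term entirely and leave only the hardware cost.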