7 research outputs found

    A system-level methodology for fast multi-objective design space exploration

    Get PDF

    Multi-objective co-exploration of source code transformations and design space architectures for low-power embedded systems

    Full text link
    The exploration of the architectural design space in terms of energy and performance is of major importance for a broad range of embedded platforms based on the System-on-Chip approach. This paper proposes a methodology for the co-exploration of the design space composed of architectural parameters and source program transformations. A heuristic technique based on Pareto Simulated Annealing (PSA) is used to efficiently span the multi-objective co-design space formed by the product of the parameters of the selected program transformations and those of the configurable architecture. The proposed framework is analysed for a parameterized superscalar architecture executing a selected set of benchmarks. The reported results show the effectiveness of the proposed co-exploration, with respect to independent exploration of the transformation and architectural spaces, in efficiently deriving approximate Pareto curves.
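The core of such a PSA-based co-exploration is maintaining an archive of non-dominated (energy, cycles) points while randomly perturbing the configuration. A minimal Python sketch of the idea, with a purely hypothetical cost model and parameter ranges (cache size and loop-unroll factor standing in for the architectural and transformation axes; none of this is the paper's actual model):

```python
import math
import random

def dominates(a, b):
    """True if cost point a Pareto-dominates b (both objectives minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_filter(points):
    """Keep only the non-dominated points (an approximate Pareto curve)."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# Hypothetical cost model mapping a (cache size, unroll factor) configuration
# to (energy, cycles); in the paper these values would come from simulation.
def cost(cache_kb, unroll):
    energy = 0.8 * cache_kb + 0.3 * unroll            # bigger cache, more unrolling: more energy
    cycles = 100.0 / (math.log2(cache_kb) + unroll)   # ...but fewer cycles
    return (energy, cycles)

def psa(steps=500, temp=5.0, cooling=0.99, seed=1):
    """Simplified Pareto Simulated Annealing over the product space."""
    random.seed(seed)
    caches, unrolls = [2, 4, 8, 16, 32], [1, 2, 4, 8]
    cur = (random.choice(caches), random.choice(unrolls))
    archive = {cur: cost(*cur)}
    for _ in range(steps):
        nxt = (random.choice(caches), random.choice(unrolls))  # random move
        # Accept a move the current point does not dominate, or,
        # occasionally, a dominated one (annealing escape).
        if not dominates(cost(*cur), cost(*nxt)) or random.random() < math.exp(-1.0 / temp):
            cur = nxt
            archive[nxt] = cost(*nxt)
        temp *= cooling
    return sorted(pareto_filter(list(archive.values())))
```

Each run returns a set of mutually non-dominated (energy, cycles) points, i.e. an approximate Pareto curve of the co-design space.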

    Towards a performance- and energy-efficient data filter cache

    Full text link
    As CPU data requests to the level-one (L1) data cache (DC) can represent as much as 25% of an embedded processor's total power dissipation, techniques that decrease L1 DC accesses can significantly enhance processor energy efficiency. Filter caches are known to efficiently decrease the number of accesses to instruction caches. However, due to the irregular access pattern of data accesses, a conventional data filter cache (DFC) has a high miss rate, which degrades processor performance. We propose to integrate a DFC with a fast address calculation technique to significantly reduce the impact of misses and to improve performance by enabling one-cycle loads. Furthermore, we show that DFC stalls can be eliminated even after unsuccessful fast address calculations, by simultaneously accessing the DFC and L1 DC on the following cycle. We quantitatively evaluate different DFC configurations, with and without the fast address calculation technique, using different write allocation policies, and qualitatively describe their impact on energy efficiency. The proposed design provides an efficient DFC that yields both energy and performance improvements.
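The mechanism can be illustrated with a toy direct-mapped filter cache and a speculative line-index guess. Sizes, the trace, and the policies below are illustrative assumptions, not the paper's actual design:

```python
class DataFilterCache:
    """Tiny direct-mapped filter cache sitting in front of the L1 DC."""
    def __init__(self, lines=8, line_bytes=16):
        self.line_bytes = line_bytes
        self.tags = [None] * lines

    def access(self, addr):
        """True on a DFC hit (the L1 DC access is avoided), False on a miss."""
        line = addr // self.line_bytes
        idx = line % len(self.tags)
        if self.tags[idx] == line:
            return True
        self.tags[idx] = line        # fill the line from the L1 DC on a miss
        return False

def fast_line_guess(base, offset, line_bytes=16):
    """Speculative fast address calculation: guess the cache line from the
    base register alone, before the full base+offset add completes."""
    return base // line_bytes

def crosses_line(base, offset, line_bytes=16):
    """The guess is wrong exactly when base+offset crosses a line boundary;
    the paper's design then accesses the DFC and L1 DC together on the
    following cycle, so no extra stall is incurred."""
    return base // line_bytes != (base + offset) // line_bytes
```

On a sequential trace, each line misses once and then hits for the remaining words, which is where the L1 DC access (and energy) savings come from.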

    Run-Time Instruction Cache Configurability For Energy Efficiency In Embedded Multitasking Workloads

    Get PDF
    In this thesis we propose a methodology for energy reduction in multitasking computing systems by addressing the energy consumption of the on-chip instruction cache. Our technique leverages recently introduced reconfigurable cache technology to partition the instruction cache at run-time using application-specific profile information. Each application is given a sub-section of the cache as its partition, which alone is kept active while the corresponding application is executed. The remaining inactive sections are kept in a low-power mode, reducing both dynamic and leakage power. Isolating tasks into disjoint cache partitions also eliminates or drastically reduces inter-task I-cache interference. No prior information about the timing of the tasks within the workload is required. In some cases, partitions may be required to overlap, which could degrade performance because of cache interference in the overlapped region. For such cases we propose and evaluate run-time partition update policies which trade off power savings to ensure guaranteed performance.
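The partitioning step itself can be sketched as follows; the proportional-allocation policy and the profile numbers are assumptions for illustration, not the thesis's exact algorithm:

```python
def partition_cache(total_sets, profiles):
    """Assign each task a disjoint block of I-cache sets, sized roughly in
    proportion to its profiled working set. Only the partition of the task
    currently running is kept active; the rest stay in a low-power mode."""
    total_demand = sum(profiles.values())
    alloc, start = {}, 0
    for task, demand in sorted(profiles.items()):
        n = max(1, round(total_sets * demand / total_demand))
        alloc[task] = (start, min(start + n, total_sets))   # half-open [lo, hi)
        start = alloc[task][1]
    return alloc

def active_fraction(alloc, task, total_sets):
    """Fraction of the I-cache that must stay powered while `task` executes."""
    lo, hi = alloc[task]
    return (hi - lo) / total_sets
```

Because the blocks are disjoint, tasks cannot evict each other's lines, which is the source of the reduced inter-task interference described above.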

    Architectural and Compiler Techniques for Energy Reduction in High-Performance Microprocessors

    No full text
    114 p. Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 1999. More specifically, we propose a technique that uses an additional mini cache located between the instruction cache (I-Cache) and the CPU core; the mini cache buffers instructions that are nested within loops and are continuously fetched from the I-Cache. This mechanism can yield very substantial energy savings, since the I-Cache unit is one of the main power consumers in most of today's high-performance microprocessors. Results are reported for the SPEC95 benchmarks on the R4400 processor, which implements the MIPS2 instruction set architecture.
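The energy benefit of such a mini cache comes from loop bodies hitting the small buffer on every iteration after the first, leaving the larger I-Cache idle. A rough Python sketch of the behaviour (the FIFO replacement and 32-entry size are assumptions, not the thesis's parameters):

```python
class MiniCache:
    """Small buffer between the I-Cache and the CPU core; instructions in
    tight loops hit here on later iterations so the I-Cache can stay idle."""
    def __init__(self, entries=32):
        self.entries = entries
        self.store = {}              # insertion-ordered: oldest entry first

    def fetch(self, pc):
        if pc in self.store:
            return True              # served by the mini cache
        if len(self.store) >= self.entries:
            self.store.pop(next(iter(self.store)))   # evict the oldest entry
        self.store[pc] = True
        return False                 # had to be fetched from the I-Cache

def simulate_loop(body_len, iterations, entries=32):
    """Count I-Cache accesses for a tight loop of `body_len` instructions
    (4-byte instruction words) executed `iterations` times."""
    mc = MiniCache(entries)
    icache_accesses = 0
    for _ in range(iterations):
        for pc in range(0, 4 * body_len, 4):
            if not mc.fetch(pc):
                icache_accesses += 1
    return icache_accesses
```

A loop body that fits in the buffer touches the I-Cache only on its first iteration; a body larger than the buffer thrashes it and gains nothing, which is why the technique targets instructions nested within small loops.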

    Architectural and Compiler Techniques for Energy Reduction in High-Performance Microprocessors

    No full text

    Energy-Performance Optimization for the Cloud

    Get PDF