31 research outputs found

    Memory and compiler optimizations for low-power and -energy.

    ICOOOLPS'2006 was co-located with the 20th European Conference on Object-Oriented Programming (ECOOP'2006). Embedded systems, especially autonomous ones, are becoming more and more widespread and are clearly tending toward ubiquity. In such systems, low power and low energy usage are ever more crucial. These issues also become paramount in (massively) multiprocessor systems, whether within a single machine or, more broadly, in a grid. The problems faced pertain to autonomy, power supply options, thermal dissipation, or even sheer energy cost. Although energy optimization has long been studied in hardware, it is more recent in software. In this paper, we therefore aim to raise awareness of low-power and low-energy issues in the language and compilation community. We broadly but briefly survey techniques and solutions to this energy issue, focusing on a few specific aspects of compiler optimizations and memory management.

    Fast, predictable and low energy memory references through architecture-aware compilation

    The design of future high-performance embedded systems is hampered by two problems. First, the required hardware needs more energy than batteries can supply. Second, current cache-based approaches for bridging the growing speed gap between processors and memories cannot guarantee predictable real-time behavior. This paper contributes to solving both problems by describing a comprehensive set of algorithms that can be applied at design time to maximally exploit scratch pad memories (SPMs). We show that energy consumption and the computed worst-case execution time (WCET) can be reduced by up to 80% and 48%, respectively, by establishing a strong link between the memory architecture and the compiler.
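
The abstract does not spell out the paper's algorithms. One common way to frame static SPM allocation, and only an assumption here, is a 0/1 knapsack: each memory object has a size and a profiled access count, and the compiler picks the subset that fits in the SPM and saves the most energy. The sketch below models energy with a fixed per-access cost and ignores WCET; the class `MemObject`, the function `allocate_spm`, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemObject:
    """A variable or array that could live in the scratch pad (hypothetical)."""
    name: str
    size: int      # bytes occupied
    accesses: int  # profiled number of accesses


def allocate_spm(objects, spm_size, e_main, e_spm):
    """Pick objects for the SPM so that the energy saved is maximal.

    Classic 0/1 knapsack over SPM bytes. e_main and e_spm are assumed
    per-access energies (arbitrary units) of main memory and the SPM.
    Returns (energy_saved, frozenset_of_chosen_names).
    """
    saving_per_access = e_main - e_spm
    # best[c] = (max saving, chosen names) using at most c bytes of SPM
    best = [(0, frozenset())] * (spm_size + 1)
    for obj in objects:
        gain = obj.accesses * saving_per_access
        # Walk capacities downward so each object is placed at most once.
        for c in range(spm_size, obj.size - 1, -1):
            candidate = best[c - obj.size][0] + gain
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - obj.size][1] | {obj.name})
    return best[spm_size]


if __name__ == "__main__":
    objs = [
        MemObject("coeffs", 256, 9000),
        MemObject("buffer", 1024, 12000),
        MemObject("lut", 512, 7000),
        MemObject("stack_tmp", 128, 3000),
    ]
    saved, chosen = allocate_spm(objs, spm_size=1024, e_main=10, e_spm=1)
    print(saved, sorted(chosen))
```

With these made-up figures, the three small objects (896 bytes in total) save more energy than the single 1 KiB buffer, which is exactly the kind of trade-off a knapsack formulation captures.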

    Compilation and low power consumption (Compilation et faible consommation)

    Column. Publication originally planned for 2006-2007, then postponed to 2008. Abstract: In autonomous embedded systems, which are increasingly present in our daily lives, energy management is a crucial issue. Although it has been widely studied from a hardware point of view, the domain remains less explored in software, especially with respect to compiler support, where optimizations have historically targeted speed. This paper therefore surveys low-power and low-energy compilation techniques. It briefly introduces low-level techniques, then presents higher-level techniques in more depth, especially those pertaining to memory management and the operation modes of resources, before outlining perspectives for the field.

    Dynamic code mapping for limited local memory systems

    Abstract: This paper presents heuristics for the dynamic management of application code on the limited local memories present in high-performance multi-core processors. Previous techniques formulate the problem using call graphs, which do not capture the temporal ordering of functions. In addition, they use only a conservative estimate of the interference cost between functions to obtain a mapping. As a result, previous techniques are unable to achieve efficient code mapping. The techniques proposed in this paper overcome both limitations and achieve superior code mapping. Experimental results from executing MiBench benchmarks on the Cell processor in the Sony PlayStation 3 demonstrate up to 29% and on average 12% performance improvement, at tolerable compile-time overhead.
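
The abstract's key point is that call graphs lose the temporal ordering of calls. A minimal sketch of one way to keep it, under assumptions of our own rather than the paper's heuristics: score a candidate mapping by replaying an execution trace (each region holds one function at a time, and calling a non-resident function reloads its whole code), then greedily place functions, largest first, into the region that minimises that replayed cost within a local-memory budget. `reload_cost`, `map_functions`, the trace, and the sizes are all hypothetical.

```python
def reload_cost(trace, sizes, region_of):
    """Bytes copied into local memory when replaying `trace`.

    Each region holds one function at a time; calling a function that is
    not currently resident in its region reloads its whole code.
    Functions missing from `region_of` are ignored (not yet mapped).
    """
    resident = {}  # region index -> function currently loaded there
    cost = 0
    for f in trace:
        r = region_of.get(f)
        if r is None:
            continue
        if resident.get(r) != f:
            cost += sizes[f]
            resident[r] = f
    return cost


def map_functions(trace, sizes, budget):
    """Greedy function-to-region mapping (hypothetical heuristic).

    Functions are placed largest first; each goes to the existing or new
    region that minimises the replayed reload cost, subject to the sum of
    region sizes (each region is as large as its largest function)
    staying within `budget` bytes.
    """
    region_of = {}
    region_size = []
    for f in sorted(sizes, key=sizes.get, reverse=True):
        best = None
        for r in range(len(region_size) + 1):
            trial = list(region_size)
            if r == len(region_size):
                trial.append(sizes[f])              # open a new region
            else:
                trial[r] = max(trial[r], sizes[f])  # share an existing one
            if sum(trial) > budget:
                continue
            region_of[f] = r
            cost = reload_cost(trace, sizes, region_of)
            if best is None or cost < best[0]:
                best = (cost, r, trial)
        if best is None:
            raise ValueError(f"{f} does not fit in the local memory budget")
        _, region_of[f], region_size = best
    return region_of, region_size


if __name__ == "__main__":
    sizes = {"main": 4, "fft": 8, "filt": 6, "io": 2}  # code sizes, any unit
    trace = ["main"] + ["fft", "main", "filt", "main"] * 10 + ["io", "main"]
    mapping, regions = map_functions(trace, sizes, budget=16)
    print(mapping, regions, reload_cost(trace, sizes, mapping))
```

In this toy trace `main` ends up sharing a region with `filt` rather than `fft`, because the replay sees that `main` is always called right after `fft`; a static call-graph edge weight would not distinguish the two cases.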

    Analysis of scratch-pad and data-cache performance using statistical methods

    Abstract: An effectively designed and efficiently used memory hierarchy, composed of scratch-pads or caches, is today seen as the key to obtaining energy and performance gains in data-dominated embedded applications. However, an open problem is how to choose between the scratch-pad and the data cache for different classes of applications. Recent studies show that applications with regular and manifest data access patterns (e.g., matrix multiplication) perform better on the scratch-pad than on the cache. For dynamic applications with irregular and non-manifest access patterns, however, it is commonly and intuitively believed that the cache would perform better. In this paper, we show through theoretical analysis and empirical results that this intuition can sometimes be misleading. When access probabilities remain fixed, we prove that the scratch-pad, with an optimal mapping, always outperforms the cache. We also demonstrate how to map dynamic applications efficiently to the scratch-pad or the cache and how to accurately predict their performance.
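
The fixed-probability claim can be illustrated, though of course not proved, with a small simulation. We assume "fixed access probabilities" means every access is drawn independently from the same distribution, and we model the cache as a fully associative LRU cache holding the same number of items as the scratch-pad; both are simplifications, and the function names and Zipf-like distribution are hypothetical. Under these assumptions the optimal static scratch-pad mapping simply keeps the most probable items, while the cache must also bring in every item that misses.

```python
import random
from collections import OrderedDict


def spm_expected_hit_ratio(probs, capacity):
    """Optimal static mapping: keep the `capacity` most probable items."""
    return sum(sorted(probs, reverse=True)[:capacity])


def lru_hit_ratio(trace, capacity):
    """Measured hit ratio of a fully associative LRU cache of `capacity` items."""
    cache = OrderedDict()
    hits = 0
    for item in trace:
        if item in cache:
            hits += 1
            cache.move_to_end(item)        # mark as most recently used
        else:
            cache[item] = None             # demand fetch: every miss is cached
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used item
    return hits / len(trace)


def irm_trace(probs, n, seed=0):
    """Trace of n accesses, each drawn independently from `probs`."""
    rng = random.Random(seed)
    return rng.choices(range(len(probs)), weights=probs, k=n)


if __name__ == "__main__":
    weights = [1 / (i + 1) for i in range(16)]  # Zipf-like, 16 data items
    total = sum(weights)
    probs = [w / total for w in weights]
    trace = irm_trace(probs, 100_000)
    print("SPM (expected):", round(spm_expected_hit_ratio(probs, 4), 3))
    print("LRU (measured):", round(lru_hit_ratio(trace, 4), 3))
```

Because the trace is random, the LRU figure varies slightly with the seed; the scratch-pad figure is an exact expectation for this distribution.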

    Towards a performance- and energy-efficient data filter cache

    As CPU data requests to the level-one (L1) data cache (DC) can account for as much as 25% of an embedded processor's total power dissipation, techniques that reduce L1 DC accesses can significantly improve processor energy efficiency. Filter caches are known to efficiently reduce the number of accesses to instruction caches. However, because data accesses follow an irregular pattern, a conventional data filter cache (DFC) has a high miss rate, which degrades processor performance. We propose integrating a DFC with a fast address calculation technique to significantly reduce the impact of misses and to improve performance by enabling one-cycle loads. Furthermore, we show that DFC stalls can be eliminated even after unsuccessful fast address calculations by accessing the DFC and the L1 DC simultaneously on the following cycle. We quantitatively evaluate different DFC configurations, with and without the fast address calculation technique and with different write allocation policies, and qualitatively describe their impact on energy efficiency. The proposed design provides an efficient DFC that yields both energy and performance improvements.
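
The abstract does not describe the fast address calculation in detail. A simplified model, assumed here and not taken from the paper: the DFC is indexed early using the tag and index bits copied straight from the base register, and only the line-offset bits are computed by a narrow adder. The speculative address is then correct exactly when adding the displacement leaves the bits above the line offset unchanged; otherwise, per the abstract, the DFC and L1 DC are accessed together on the next cycle. The 32-byte line size and all names below are hypothetical.

```python
LINE_OFFSET_BITS = 5  # assumed 32-byte DFC lines


def speculative_address(base, disp, offset_bits=LINE_OFFSET_BITS):
    """Address formed without a full-width add (simplified model)."""
    mask = (1 << offset_bits) - 1
    # Narrow adder: only the line-offset bits of base and displacement.
    low = ((base & mask) + (disp & mask)) & mask
    # Tag and index bits are taken unchanged from the base register.
    return (base & ~mask) | low


def fast_addr_ok(base, disp, offset_bits=LINE_OFFSET_BITS):
    """True if the speculative address equals the real effective address."""
    return speculative_address(base, disp, offset_bits) == base + disp


def tally(loads, offset_bits=LINE_OFFSET_BITS):
    """Count (successful, failed) speculations over (base, disp) pairs.

    In the scheme the abstract describes, a failed speculation is followed
    by a parallel DFC + L1 DC access on the next cycle instead of a stall.
    """
    ok = sum(fast_addr_ok(b, d, offset_bits) for b, d in loads)
    return ok, len(loads) - ok


if __name__ == "__main__":
    loads = [(0x1000, 8), (0x101C, 8), (0x1010, -4), (0x1000, -4)]
    print(tally(loads))
```

The model only checks whether the speculation is correct; it does not simulate DFC hits or misses, which the paper evaluates across configurations and write allocation policies.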