69 research outputs found

    Optimizing the flash-RAM energy trade-off in deeply embedded systems

    Full text link
    Deeply embedded systems often have the tightest constraints on energy consumption, requiring that they consume tiny amounts of current and run on batteries for years. However, they typically execute code directly from flash, instead of the more energy efficient RAM. We implement a novel compiler optimization that exploits the relative efficiency of RAM by statically moving carefully selected basic blocks from flash to RAM. Our technique uses integer linear programming, with an energy cost model to select a good set of basic blocks to place into RAM, without impacting stack or data storage. We evaluate our optimization on a common ARM microcontroller and succeed in reducing the average power consumption by up to 41% and reducing energy consumption by up to 22%, while increasing execution time. A case study is presented, where an application executes code then sleeps for a period of time. For this example we show that our optimization could allow the application to run on battery for up to 32% longer. We also show that for this scenario the total application energy can be reduced, even if the optimization increases the execution time of the code

    Influence of Memory Hierarchies on Predictability for Time Constrained Embedded Software

    Full text link
    Safety-critical embedded systems having to meet real-time constraints are expected to be highly predictable in order to guarantee at design time that certain timing deadlines will always be met. This requirement usually prevents designers from utilizing caches due to their highly dynamic, thus hardly predictable behavior. The integration of scratchpad memories represents an alternative approach which allows the system to benefit from a performance gain comparable to that of caches while at the same time maintaining predictability. In this work, we compare the impact of scratchpad memories and caches on worst case execution time (WCET) analysis results. We show that caches, despite requiring complex techniques, can have a negative impact on the predicted WCET, while the estimated WCET for scratchpad memories scales with the achieved Performance gain at no extra analysis cost.Comment: Submitted on behalf of EDAA (http://www.edaa.com/

    Memory Optimizations for Time-Predictable Embedded Software

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    WCET-Aware Scratchpad Memory Management for Hard Real-Time Systems

    Get PDF
    abstract: Cyber-physical systems and hard real-time systems have strict timing constraints that specify deadlines until which tasks must finish their execution. Missing a deadline can cause unexpected outcome or endanger human lives in safety-critical applications, such as automotive or aeronautical systems. It is, therefore, of utmost importance to obtain and optimize a safe upper bound of each task’s execution time or the worst-case execution time (WCET), to guarantee the absence of any missed deadline. Unfortunately, conventional microarchitectural components, such as caches and branch predictors, are only optimized for average-case performance and often make WCET analysis complicated and pessimistic. Caches especially have a large impact on the worst-case performance due to expensive off- chip memory accesses involved in cache miss handling. In this regard, software-controlled scratchpad memories (SPMs) have become a promising alternative to caches. An SPM is a raw SRAM, controlled only by executing data movement instructions explicitly at runtime, and such explicit control facilitates static analyses to obtain safe and tight upper bounds of WCETs. SPM management techniques, used in compilers targeting an SPM-based processor, determine how to use a given SPM space by deciding where to insert data movement instructions and what operations to perform at those program locations. This dissertation presents several management techniques for program code and stack data, which aim to optimize the WCETs of a given program. The proposed code management techniques include optimal allocation algorithms and a polynomial-time heuristic for allocating functions to the SPM space, with or without the use of abstraction of SPM regions, and a heuristic for splitting functions into smaller partitions. The proposed stack data management technique, on the other hand, finds an optimal set of program locations to evict and restore stack frames to avoid stack overflows, when the call stack resides in a size-limited SPM. In the evaluation, the WCETs of various benchmarks including real-world automotive applications are statically calculated for SPMs and caches in several different memory configurations.Dissertation/ThesisDoctoral Dissertation Computer Science 201

    Power-Efficient and Low-Latency Memory Access for CMP Systems with Heterogeneous Scratchpad On-Chip Memory

    Get PDF
    The gradually widening speed disparity of between CPU and memory has become an overwhelming bottleneck for the development of Chip Multiprocessor (CMP) systems. In addition, increasing penalties caused by frequent on-chip memory accesses have raised critical challenges in delivering high memory access performance with tight power and latency budgets. To overcome the daunting memory wall and energy wall issues, this thesis focuses on proposing a new heterogeneous scratchpad memory architecture which is configured from SRAM, MRAM, and Z-RAM. Based on this architecture, we propose two algorithms, a dynamic programming and a genetic algorithm, to perform data allocation to different memory units, therefore reducing memory access cost in terms of power consumption and latency. Extensive and intensive experiments are performed to show the merits of the heterogeneous scratchpad architecture over the traditional pure memory system and the effectiveness of the proposed algorithms

    Processor Energy Characterization for Compiler-Assisted Software Energy Reduction

    Get PDF

    Reducing memory space consumption through dataflow analysis

    Get PDF
    Memory is a key parameter in embedded systems since both code complexity of embedded applications and amount of data they process are increasing. While it is true that the memory capacity of embedded systems is continuously increasing, the increases in the application complexity and dataset sizes are far greater. As a consequence, the memory space demand of code and data should be kept minimum. To reduce the memory space consumption of embedded systems, this paper proposes a control flow graph (CFG) based technique. Specifically, it tracks the lifetime of instructions at the basic block level. Based on the CFG analysis, if a basic block is known to be not accessible in the rest of the program execution, the instruction memory space allocated to this basic block is reclaimed. On the other hand, if the memory allocated to this basic block cannot be reclaimed, we try to compress this basic block. This way, it is possible to effectively use the available on-chip memory, thereby satisfying most of instruction/data requests from the on-chip memory. Our experiments with this framework show that it outperforms the previously proposed CFG-based memory reduction approaches. © 2011 Elsevier Ltd. All rights reserved

    Memory Power Optimization of Java-Based Embedded Systems Exploiting Garbage Collection Information

    Get PDF
    Nowadays, Java is used in all types of embedded devices. For these memory-constrained systems, the automatic dynamicmemorymanager (Garbage Collector or GC) has been always a key factor in terms of the Java Virtual Machine (JVM) performance. Moreover, in current embedded platforms, power consumption is becoming as important as performance. Thus, in this paper we present an exploration, from an energy viewpoint, of the different possibilities of memory hierarchies for high-performance embedded systems when used by state-of-the-art GCs. This is a starting point for a better understanding of the interactions between the Java applications, the memory hierarchy and the GC. Hence, we subsequently present two techniques to reduce energy consumption on Java-based embedded systems, based on exploiting GC information. The first technique uses GC execution behavior to reduce leakage energy consumption taking advantage of the low-power mode of actual multi-banked SDRAM memories and it is intended for generational collectors. This technique can achieve a reduction up to 50% of SDRAM memory leakage. The second technique involves the inclusion of a software-controlled (scratchpad) memory that stores GC instructions under the JVM control to reduce the active energy consumption and also improve the performance of the target embedded system and it is aimed at all kind of garbage collectors. For this last technique we have experimented with two different approaches for selecting the GC code to be stored in the scratchpad memory: one static and one dynamic. Our experimental results show that the proposed dynamic scratchpad management approach for GCs enables up to 63% energy consumption reduction and 25% performance improvement during the collector phase, which means, in terms of JVM execution, a global reduction of 29% and 17% for energy and cycles respectively. Overall, this work outlines that the key for an efficient low-power implementation of Java Virtual Machines for high-performance embedded systems is the synergy between the GC choice, the memory architecture tuning, and the inclusion of power management schemes controlled by the JVM, exploiting knowledge of the GC behavior

    Gerenciamento explícito de memória auxiliar a partir de arquivos-objeto para melhoria da eficiência energética de sistemas embarcados

    Get PDF
    Dissertação (mestrado) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Ciência da Computação, Florianópolis, 2010Memórias de rascunho (Scratchpad Memories - SPM) tornaram-se populares em sistemas embarcados por conta de sua eficiência energética. A literatura sobre SPMs parece indicar que a alteração dinâmica de seu conteúdo suplanta a alocação estática. Embora técnicas overlay-based (OVB) operando em nível de código-fonte possam beneficiar-se de múltiplos hot spots para uma maior economia de energia, elas não conseguem explorar elementos de programa oriundos de bibliotecas. Entretanto, quando operam diretamente em binários, as abordagens OVB conduzem a uma menor economia, frequentemente exigem hardware dedicado e às vezes impossibilitam a alocação de dados. Por outro lado, a economia de energia reportada por todas as técnicas, até o momento, ignora o fato de que, em sistemas que possuem caches, estas deverão ser otimizadas antes da alocação para SPM. Este trabalho mostra evidência experimental de que, quando métodos non-overlay-based (NOB) são utilizados para manipulação de arquivos binários, a economia de energia em memória, por conta da alocação em SPM, varia entre 15% a 33%, e média, e é tão boa ou melhor do que a economia reportada para abordagens OVB que operam sobre binários. Como esta economia (ao contrário dos trabalhos correlatos) foi medida após o ajuste-fino das caches - quando existe menos espaço para otimização -, estes resultados estimulam o uso de métodos NOB, mais simples, para a construção de alocadores capazes de considerar elementos de bibliotecas e que não dependam de hardware especializado. Este trabalho também mostra que, dada uma capacidade CT de uma cache pré-ajustada equivalente, o tamanho ótimo de SPM reside em [CT//2, CT] para 85% dos programas avaliados. Finalmente, mostram-se evidências contra-intuitivas de que, mesmo para arquiteturas baseadas em cache contendo SPMs pequenas, é preferível utilizar-se a granularidade de procedimentos à de blocos básicos, exceto em algumas poucas aplicações que combinam elementos frequentemente acessados e taxas de faltas relativamente altas
    corecore