6 research outputs found

    Management of Scratchpad Memory Using Programming Techniques

    Get PDF
    Consuming the conventional approaches, processors are incapable to achieve effective energy reduction. In upcoming processors on-chip memory system will be the major restriction. On-chip memories are managed by the software SMCs (Software Managed Chips), and are work with caches (on-chip), where inside a block of caches software can explicitly read as well as write specific or complete memory references, or work separately just like scratchpad memory. In embedded systems Scratch memory is generally used as an addition to caches or as a substitute of cache, but due to their comprehensive ease of programmability cache containing architectures are still to be chosen in numerous applications. In contrast to conventional caches in embedded schemes because of their better energy and silicon range effectiveness SPM (Scratch-Pad Memories) are being progressively used. Power consumption of ported applications can significantly be lower as well as portability of scratchpad architectures will be advanced with the language agnostic software management method which is suggested in this manuscript. To enhance the memory configuration and optimization on relevant architectures based on SPM, the variety of current methods are reviewed for finding the chances of optimizations and usage of new methods as well as their applicability to numerous schemes of memory management are also discussed in this paper

    Processor Energy Characterization for Compiler-Assisted Software Energy Reduction

    Get PDF

    Power-Efficient and Low-Latency Memory Access for CMP Systems with Heterogeneous Scratchpad On-Chip Memory

    Get PDF
    The gradually widening speed disparity of between CPU and memory has become an overwhelming bottleneck for the development of Chip Multiprocessor (CMP) systems. In addition, increasing penalties caused by frequent on-chip memory accesses have raised critical challenges in delivering high memory access performance with tight power and latency budgets. To overcome the daunting memory wall and energy wall issues, this thesis focuses on proposing a new heterogeneous scratchpad memory architecture which is configured from SRAM, MRAM, and Z-RAM. Based on this architecture, we propose two algorithms, a dynamic programming and a genetic algorithm, to perform data allocation to different memory units, therefore reducing memory access cost in terms of power consumption and latency. Extensive and intensive experiments are performed to show the merits of the heterogeneous scratchpad architecture over the traditional pure memory system and the effectiveness of the proposed algorithms

    Application-aware Performance Optimization for Software Managed Manycore Architectures

    Get PDF
    abstract: One of the main goals of computer architecture design is to improve performance without much increase in the power consumption. It cannot be achieved by adding increasingly complex intelligent schemes in the hardware, since they will become increasingly less power-efficient. Therefore, parallelism comes up as the solution. In fact, the irrevocable trend of computer design in near future is still to keep increasing the number of cores while reducing the operating frequency. However, it is not easy to scale number of cores. One important challenge is that existing cores consume too much power. Another challenge is that cache-based memory hierarchy poses a serious limitation due to the rapidly increasing demand of area and power for coherence maintenance. In this dissertation, opportunities to resolve the aforementioned issues were explored in two aspects. Firstly, the possibility of removing hardware cache altogether, and replacing it with scratchpad memory with software management was explored. Scratchpad memory consumes much less power than caches. However, as data management logic is completely shifted to Software, how to reduce software overhead is challenging. This thesis presents techniques to manage scratchpad memory judiciously by exploiting application semantics and knowledge of data access patterns, thereby enabling optimization of data movement across the memory hierarchy. Experimental results show that the optimization was able to reduce stack data management overhead by 13X, produce better code mapping in more than 80% of the case, and improve performance by 83% in heap management. Secondly, the possibility of using software branch hinting to replace hardware branch prediction to completely eliminate power consumption on corresponding hardware components was explored. As branch predictor is removed from hardware, software logic is responsible for reducing branch penalty. Techniques to minimize the branch penalty by optimizing branch hint placement were proposed, which can reduce branch penalty by 35.4% over the state-of-the-art.Dissertation/ThesisDoctoral Dissertation Computer Science 201

    Compiler and Runtime for Memory Management on Software Managed Manycore Processors

    Get PDF
    abstract: We are expecting hundreds of cores per chip in the near future. However, scaling the memory architecture in manycore architectures becomes a major challenge. Cache coherence provides a single image of memory at any time in execution to all the cores, yet coherent cache architectures are believed will not scale to hundreds and thousands of cores. In addition, caches and coherence logic already take 20-50% of the total power consumption of the processor and 30-60% of die area. Therefore, a more scalable architecture is needed for manycore architectures. Software Managed Manycore (SMM) architectures emerge as a solution. They have scalable memory design in which each core has direct access to only its local scratchpad memory, and any data transfers to/from other memories must be done explicitly in the application using Direct Memory Access (DMA) commands. Lack of automatic memory management in the hardware makes such architectures extremely power-efficient, but they also become difficult to program. If the code/data of the task mapped onto a core cannot fit in the local scratchpad memory, then DMA calls must be added to bring in the code/data before it is required, and it may need to be evicted after its use. However, doing this adds a lot of complexity to the programmer's job. Now programmers must worry about data management, on top of worrying about the functional correctness of the program - which is already quite complex. This dissertation presents a comprehensive compiler and runtime integration to automatically manage the code and data of each task in the limited local memory of the core. We firstly developed a Complete Circular Stack Management. It manages stack frames between the local memory and the main memory, and addresses the stack pointer problem as well. Though it works, we found we could further optimize the management for most cases. Thus a Smart Stack Data Management (SSDM) is provided. In this work, we formulate the stack data management problem and propose a greedy algorithm for the same. Later on, we propose a general cost estimation algorithm, based on which CMSM heuristic for code mapping problem is developed. Finally, heap data is dynamic in nature and therefore it is hard to manage it. We provide two schemes to manage unlimited amount of heap data in constant sized region in the local memory. In addition to those separate schemes for different kinds of data, we also provide a memory partition methodology.Dissertation/ThesisPh.D. Computer Science 201

    Gerenciamento energeticamente eficiente de memória para multiprocessamento em chip explorando múltiplas scratchpads

    Get PDF
    Dissertação (mestrado) - Universidade Federal de Santa Catarina, Centro Tecnológico. Programa de Pós-Graduação em Ciência da ComputaçãoA fim de proporcionar a alta capacidade de processamento requerida pelos dispositivos eletrônicos pessoais, sem ultrapassar os limites aceitáveis de potência e de consumo de energia, os sistemas em chip (SoCs) adotam o multiprocessamento. Para tanto, os SoCs possuem 2, 4 ou mais processadores, cada um com caches L1 privativas, conectados por meio de um barramento. Como o espaço de endereçamento visto pelos processadores é único, a programação do sistema pode assumir o modelo de memória compartilhada. A coerência entre as caches geralmente é assegurada pelo protocolo snooping. Para tirar proveito do paralelismo dos SoCs multiprocessados (MPSoCs), aplicações são desenvolvidas com uso de múltiplas threads executando concorrentemente. Neste contexto, observa-se que os dados de pilha de uma dada thread são acessados somente pelo processador no qual a thread está executando. Desta forma, a relocação da pilha para memória scratchpad (SPM) pode ser explorada para reduzir a energia do subsistema de memória. Esta redução advém não apenas da menor energia gasta em cada acesso à pilha, mas também da redução das faltas nas caches L1 de dados e da penalidade imposta pelo protocolo snooping. No presente trabalho propõe-se uma técnica para o gerenciamento dinâmico de dados de pilha em múltiplas SPMs, visando redução de energia no subsistema de memória em MPSoCs. A técnica utiliza um gerenciador totalmente em software, o qual é responsável por alocar e desalocar os dados de pilha de thread em SPM. A utilização da técnica dispensa intervenção do programador, pois as alterações necessárias no código da aplicação são realizadas por um compilador adaptado. Foram obtidos resultados experimentais através da simulação de 400 aplicações geradas aleatoriamente, assumindo-se 20 plataformas multiprocessadas, totalizando 8000 casos de uso. Os resultados mostram que, variando-se o perfil das aplicações quanto à proporção de acessos a dados de pilha, a técnica proporciona reduções de energia no subsistema de memória entre 11% e 20%, em média, para plataformas com caches L1 de 32KB, e reduções entre 14,7% e 25,9%, em média, para plataformas com caches L1 de 64KB. Para plataformas com caches L1 de menor capacidade, a redução de energia é menor pois a penalidade de faltas nas caches L1 de instruções imposta pelo gerenciador torna-se relevante
    corecore