284 research outputs found

    Maximize the throughput of Linux system by using user-oriented Scratch-pad Memory

    Get PDF

    Variable-based multi-module data caches for clustered VLIW processors

    Get PDF
    Memory structures consume an important fraction of the total processor energy. One solution to reduce the energy consumed by cache memories consists of reducing their supply voltage and/or increase their threshold voltage at an expense in access time. We propose to divide the L1 data cache into two cache modules for a clustered VLIW processor consisting of two clusters. Such division is done on a variable basis so that the address of a datum determines its location. Each cache module is assigned to a cluster and can be set up as a fast power-hungry module or as a slow power-aware module. We also present compiler techniques in order to distribute variables between the two cache modules and generate code accordingly. We have explored several cache configurations using the Mediabench suite and we have observed that the best distributed cache organization outperforms traditional cache organizations by 19%-31% in energy-delay and by 11%-29% in energy-delay. In addition, we also explore a reconfigurable distributed cache, where the cache can be reconfigured on a context switch. This reconfigurable scheme further outperforms the best previous distributed organization by 3%-4%.Peer ReviewedPostprint (published version

    Compiler and Runtime for Memory Management on Software Managed Manycore Processors

    Get PDF
    abstract: We are expecting hundreds of cores per chip in the near future. However, scaling the memory architecture in manycore architectures becomes a major challenge. Cache coherence provides a single image of memory at any time in execution to all the cores, yet coherent cache architectures are believed will not scale to hundreds and thousands of cores. In addition, caches and coherence logic already take 20-50% of the total power consumption of the processor and 30-60% of die area. Therefore, a more scalable architecture is needed for manycore architectures. Software Managed Manycore (SMM) architectures emerge as a solution. They have scalable memory design in which each core has direct access to only its local scratchpad memory, and any data transfers to/from other memories must be done explicitly in the application using Direct Memory Access (DMA) commands. Lack of automatic memory management in the hardware makes such architectures extremely power-efficient, but they also become difficult to program. If the code/data of the task mapped onto a core cannot fit in the local scratchpad memory, then DMA calls must be added to bring in the code/data before it is required, and it may need to be evicted after its use. However, doing this adds a lot of complexity to the programmer's job. Now programmers must worry about data management, on top of worrying about the functional correctness of the program - which is already quite complex. This dissertation presents a comprehensive compiler and runtime integration to automatically manage the code and data of each task in the limited local memory of the core. We firstly developed a Complete Circular Stack Management. It manages stack frames between the local memory and the main memory, and addresses the stack pointer problem as well. Though it works, we found we could further optimize the management for most cases. Thus a Smart Stack Data Management (SSDM) is provided. In this work, we formulate the stack data management problem and propose a greedy algorithm for the same. Later on, we propose a general cost estimation algorithm, based on which CMSM heuristic for code mapping problem is developed. Finally, heap data is dynamic in nature and therefore it is hard to manage it. We provide two schemes to manage unlimited amount of heap data in constant sized region in the local memory. In addition to those separate schemes for different kinds of data, we also provide a memory partition methodology.Dissertation/ThesisPh.D. Computer Science 201

    Energy Saving Techniques for Phase Change Memory (PCM)

    Full text link
    In recent years, the energy consumption of computing systems has increased and a large fraction of this energy is consumed in main memory. Towards this, researchers have proposed use of non-volatile memory, such as phase change memory (PCM), which has low read latency and power; and nearly zero leakage power. However, the write latency and power of PCM are very high and this, along with limited write endurance of PCM present significant challenges in enabling wide-spread adoption of PCM. To address this, several architecture-level techniques have been proposed. In this report, we review several techniques to manage power consumption of PCM. We also classify these techniques based on their characteristics to provide insights into them. The aim of this work is encourage researchers to propose even better techniques for improving energy efficiency of PCM based main memory.Comment: Survey, phase change RAM (PCRAM

    Dynamic code mapping for limited local memory systems

    Full text link
    Abstract—This paper presents heuristics for dynamic man-agement of application code on limited local memories present in high-performance multi-core processors. Previous techniques formulate the problem using call graphs, which do not capture the temporal ordering of functions. In addition, they only use a conservative estimate of the interference cost between functions to obtain a mapping. As a result previous techniques are unable to achieve efficient code mapping. Techniques proposed in this paper overcome both these limitations and achieve superior code mapping. Experimental results from executing benchmarks from MiBench onto the Cell processor in the Sony Playstation 3 demonstrate upto 29 % and average 12 % performance improve-ment, at tolerable compile-time overhead. I

    Management of Scratchpad Memory Using Programming Techniques

    Get PDF
    Consuming the conventional approaches, processors are incapable to achieve effective energy reduction. In upcoming processors on-chip memory system will be the major restriction. On-chip memories are managed by the software SMCs (Software Managed Chips), and are work with caches (on-chip), where inside a block of caches software can explicitly read as well as write specific or complete memory references, or work separately just like scratchpad memory. In embedded systems Scratch memory is generally used as an addition to caches or as a substitute of cache, but due to their comprehensive ease of programmability cache containing architectures are still to be chosen in numerous applications. In contrast to conventional caches in embedded schemes because of their better energy and silicon range effectiveness SPM (Scratch-Pad Memories) are being progressively used. Power consumption of ported applications can significantly be lower as well as portability of scratchpad architectures will be advanced with the language agnostic software management method which is suggested in this manuscript. To enhance the memory configuration and optimization on relevant architectures based on SPM, the variety of current methods are reviewed for finding the chances of optimizations and usage of new methods as well as their applicability to numerous schemes of memory management are also discussed in this paper

    A composable, energy-managed, real-time MPSOC platform.

    Get PDF
    Multi-processors systems on chip (MPSOC) platforms emerged in embedded systems as hardware solutions to support the continuously increasing functionality and performance demands in this domain. Such a platform has to execute a mix of applications with diverse performance and timing constraints, i.e., real-time or non-real-time, thus different application schedulers should co-exist on an MPSOC. Moreover, applications share many MPSOC resources, thus their timing depends on the arbitration at these resources. Arbitration may create inter-application dependencies, e.g., the timing of a low priority application depends on the timing of all higher priority ones. Application inter-dependencies make the functional and timing verification and the integration process harder. This is especially problematic for real-time applications, for which fulfilling the time-related constraints should be guaranteed by construction. Moreover, energy and power management, commonly employed in embedded systems, make this verification even more difficult. Typically, energy and power management involves scaling the resources operating point, which has a direct impact on the resource performance, thus influences the application time behaviour. Finally, a small change in one application leads to the need to re-verify all other applications, incurring a large effort. Composability is a property meant to ease the verification and integration process. A system is composable if the functionality and the timing behaviour of each application is independent of other applications mapped on the same platform. Composability is achieved by utilising arbiters that ensure applications independence. In this paper we present the concepts behind a composable, scalable, energy-managed MPSOC platform, able to support different real-time and nonreal time schedulers concurrently, and discuss its advantages and limitations
    • …
    corecore