765 research outputs found

    Performance and Memory Space Optimizations for Embedded Systems

    Get PDF
    Embedded systems have three common principles: real-time performance, low power consumption, and low price (limited hardware). Embedded computers use chip multiprocessors (CMPs) to meet these expectations. However, one of the major problems is lack of efficient software support for CMPs; in particular, automated code parallelizers are needed. The aim of this study is to explore various ways to increase performance, as well as reducing resource usage and energy consumption for embedded systems. We use code restructuring, loop scheduling, data transformation, code and data placement, and scratch-pad memory (SPM) management as our tools in different embedded system scenarios. The majority of our work is focused on loop scheduling. Main contributions of our work are: We propose a memory saving strategy that exploits the value locality in array data by storing arrays in a compressed form. Based on the compressed forms of the input arrays, our approach automatically determines the compressed forms of the output arrays and also automatically restructures the code. We propose and evaluate a compiler-directed code scheduling scheme, which considers both parallelism and data locality. It analyzes the code using a locality parallelism graph representation, and assigns the nodes of this graph to processors.We also introduce an Integer Linear Programming based formulation of the scheduling problem. We propose a compiler-based SPM conscious loop scheduling strategy for array/loop based embedded applications. The method is to distribute loop iterations across parallel processors in an SPM-conscious manner. The compiler identifies potential SPM hits and misses, and distributes loop iterations such that the processors have close execution times. We present an SPM management technique using Markov chain based data access. We propose a compiler directed integrated code and data placement scheme for 2-D mesh based CMP architectures. Using a Code-Data Affinity Graph (CDAG) to represent the relationship between loop iterations and array data, it assigns the sets of loop iterations to processing cores and sets of data blocks to on-chip memories. We present a memory bank aware dynamic loop scheduling scheme for array intensive applications.The goal is to minimize the number of memory banks needed for executing the group of loop iterations

    Library Cache Coherence

    Get PDF
    Directory-based cache coherence is a popular mechanism for chip multiprocessors and multicores. The directory protocol, however, requires multicast for invalidation messages and the collection of acknowledgement messages, which can be expensive in terms of latency and network traffic. Furthermore, the size of the directory increases with the number of cores. We present Library Cache Coherence (LCC), which requires neither broadcast/multicast for invalidations nor waiting for invalidation acknowledgements. A library is a set of timestamps that are used to auto-invalidate shared cache lines, and delay writes on the lines until all shared copies expire. The size of library is independent of the number of cores. By removing the complex invalidation process of directory-based cache coherence protocols, LCC generates fewer network messages. At the same time, LCC also allows reads on a cache block to take place while a write to the block is being delayed, without breaking sequential consistency. As a result, LCC has 1.85X less average memory latency than a MESI directory-based protocol on our set of benchmarks, even with a simple timestamp choosing algorithm; moreover, our experimental results on LCC with an ideal timestamp scheme (though not implementable) show the potential of further improvement for LCC with more sophisticated timestamp schemes

    A survey of checkpointing algorithms for parallel and distributed computers

    Get PDF
    Checkpoint is defined as a designated place in a program at which normal processing is interrupted specifically to preserve the status information necessary to allow resumption of processing at a later time. Checkpointing is the process of saving the status information. This paper surveys the algorithms which have been reported in the literature for checkpointing parallel/distributed systems. It has been observed that most of the algorithms published for checkpointing in message passing systems are based on the seminal article by Chandy and Lamport. A large number of articles have been published in this area by relaxing the assumptions made in this paper and by extending it to minimise the overheads of coordination and context saving. Checkpointing for shared memory systems primarily extend cache coherence protocols to maintain a consistent memory. All of them assume that the main memory is safe for storing the context. Recently algorithms have been published for distributed shared memory systems, which extend the cache coherence protocols used in shared memory systems. They however also include methods for storing the status of distributed memory in stable storage. Most of the algorithms assume that there is no knowledge about the programs being executed. It is however felt that in development of parallel programs the user has to do a fair amount of work in distributing tasks and this information can be effectively used to simplify checkpointing and rollback recovery

    Energy Saving Techniques for Phase Change Memory (PCM)

    Full text link
    In recent years, the energy consumption of computing systems has increased and a large fraction of this energy is consumed in main memory. Towards this, researchers have proposed use of non-volatile memory, such as phase change memory (PCM), which has low read latency and power; and nearly zero leakage power. However, the write latency and power of PCM are very high and this, along with limited write endurance of PCM present significant challenges in enabling wide-spread adoption of PCM. To address this, several architecture-level techniques have been proposed. In this report, we review several techniques to manage power consumption of PCM. We also classify these techniques based on their characteristics to provide insights into them. The aim of this work is encourage researchers to propose even better techniques for improving energy efficiency of PCM based main memory.Comment: Survey, phase change RAM (PCRAM

    A two-level directory architecture for highly scalable cc-NUMA multiprocessors

    Full text link
    • …
    corecore