276,865 research outputs found

    Implementing time-predictable load and store operations

    Full text link
    Scratchpads have been widely proposed as an alternative to caches for embedded systems. Advantages of scratchpads in-clude reduced energy consumption in comparison to a cache and access latencies that are independent of the preceding memory access pattern. The latter property makes memory accesses time-predictable, which is useful for hard real-time tasks as the worst-case execution time (WCET) must be safely estimated in order to check that the system will meet timing requirements. However, data must be explicitly moved between scratch-pad and external memory as a task executes in order to make best use of the limited scratchpad space. When dy-namic data is moved, issues such as pointer aliasing and pointer invalidation become problematic. Previous work has proposed solutions that are not suitable for hard real-time tasks because memory accesses are not time-predictable. This paper proposes the scratchpad memory management unit (SMMU) as an enhancement to scratchpad technol-ogy. The SMMU implements an alternative solution to the pointer aliasing and pointer invalidation problems which (1) does not require whole-program pointer analysis and (2) makes every memory access operation time-predictable. This allows WCET analysis to be applied to hard-real time tasks which use a scratchpad and dynamic data, but results are also applicable in the wider context of minimizing en-ergy consumption or average execution time. Experiments using C software show that the combination of an SMMU and scratchpad compares favorably with the best and worst case performance of a conventional data cache

    Power-Adaptive Computing System Design for Solar-Energy-Powered Embedded Systems

    Get PDF

    Macroservers: An Execution Model for DRAM Processor-In-Memory Arrays

    Get PDF
    The emergence of semiconductor fabrication technology allowing a tight coupling between high-density DRAM and CMOS logic on the same chip has led to the important new class of Processor-In-Memory (PIM) architectures. Newer developments provide powerful parallel processing capabilities on the chip, exploiting the facility to load wide words in single memory accesses and supporting complex address manipulations in the memory. Furthermore, large arrays of PIMs can be arranged into a massively parallel architecture. In this report, we describe an object-based programming model based on the notion of a macroserver. Macroservers encapsulate a set of variables and methods; threads, spawned by the activation of methods, operate asynchronously on the variables' state space. Data distributions provide a mechanism for mapping large data structures across the memory region of a macroserver, while work distributions allow explicit control of bindings between threads and data. Both data and work distributuions are first-class objects of the model, supporting the dynamic management of data and threads in memory. This offers the flexibility required for fully exploiting the processing power and memory bandwidth of a PIM array, in particular for irregular and adaptive applications. Thread synchronization is based on atomic methods, condition variables, and futures. A special type of lightweight macroserver allows the formulation of flexible scheduling strategies for the access to resources, using a monitor-like mechanism

    The design and implementation of a multimedia storage server tosupport video-on-demand applications

    Get PDF
    In this paper we present the design and implementation of a client/server based multimedia architecture for supporting video-on-demand applications. We describe in detail the software architecture of the implementation along with the adopted buffering mechanism. The proposed multithreaded architecture obtains, on one hand, a high degree of parallelism at the server side, allowing both the disk controller and the network card controller work in parallel. On the other hand; at the client side, it achieves the synchronized playback of the video stream at its precise rate, decoupling this process from the reception of data through the network. Additionally, we have derived, under an engineering perspective, some services that a real-time operating system should offer to satisfy the requirements found in video-on-demand applications.This research has been supported by the Regional Research Plan of the Autonomus Community of Madrid under an F.P.I. research grant.Publicad

    The "MIND" Scalable PIM Architecture

    Get PDF
    MIND (Memory, Intelligence, and Network Device) is an advanced parallel computer architecture for high performance computing and scalable embedded processing. It is a Processor-in-Memory (PIM) architecture integrating both DRAM bit cells and CMOS logic devices on the same silicon die. MIND is multicore with multiple memory/processor nodes on each chip and supports global shared memory across systems of MIND components. MIND is distinguished from other PIM architectures in that it incorporates mechanisms for efficient support of a global parallel execution model based on the semantics of message-driven multithreaded split-transaction processing. MIND is designed to operate either in conjunction with other conventional microprocessors or in standalone arrays of like devices. It also incorporates mechanisms for fault tolerance, real time execution, and active power management. This paper describes the major elements and operational methods of the MIND architecture

    DeSyRe: on-Demand System Reliability

    No full text
    The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design, in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect and fault-free system, would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault-tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints
    • 

    corecore