11,443 research outputs found

    Parallelism-Aware Memory Interference Delay Analysis for COTS Multicore Systems

    Full text link
    In modern Commercial Off-The-Shelf (COTS) multicore systems, each core can generate many parallel memory requests at a time. The processing of these parallel requests in the DRAM controller greatly affects the memory interference delay experienced by running tasks on the platform. In this paper, we model a modern COTS multicore system which has a nonblocking last-level cache (LLC) and a DRAM controller that prioritizes reads over writes. To minimize interference, we focus on LLC and DRAM bank partitioned systems. Based on the model, we propose an analysis that computes a safe upper bound for the worst-case memory interference delay. We validated our analysis on a real COTS multicore platform with a set of carefully designed synthetic benchmarks as well as SPEC2006 benchmarks. Evaluation results show that our analysis is more accurately capture the worst-case memory interference delay and provides safer upper bounds compared to a recently proposed analysis which significantly under-estimate the delay.Comment: Technical Repor

    CAN Fieldbus Communication in the CSP-based CT Library

    Get PDF
    In closed-loop control systems several realworld entities are simultaneously communicated to through a multitude of spatially distributed sensors and actuators. This intrinsic parallelism and complexity motivates implementing control software in the form of concurrent processes deployed on distributed hardware architectures. A CSP based occam-like architecture seems to be the most convenient for such a purpose. Many, often conflicting, requirements make design and implementation of distributed real-time control systems an extremely difficult task. The scope of this paper is limited to achieving safe and real-time communication over a CAN fieldbus for an\ud existing CSP-based framework

    Energy-efficient and high-performance lock speculation hardware for embedded multicore systems

    Full text link
    Embedded systems are becoming increasingly common in everyday life and like their general-purpose counterparts, they have shifted towards shared memory multicore architectures. However, they are much more resource constrained, and as they often run on batteries, energy efficiency becomes critically important. In such systems, achieving high concurrency is a key demand for delivering satisfactory performance at low energy cost. In order to achieve this high concurrency, consistency across the shared memory hierarchy must be accomplished in a cost-effective manner in terms of performance, energy, and implementation complexity. In this article, we propose Embedded-Spec, a hardware solution for supporting transparent lock speculation, without the requirement for special supporting instructions. Using this approach, we evaluate the energy consumption and performance of a suite of benchmarks, exploring a range of contention management and retry policies. We conclude that for resource-constrained platforms, lock speculation can provide real benefits in terms of improved concurrency and energy efficiency, as long as the underlying hardware support is carefully configured.This work is supported in part by NSF under Grants CCF-0903384, CCF-0903295, CNS-1319495, and CNS-1319095 as well the Semiconductor Research Corporation under grant number 1983.001. (CCF-0903384 - NSF; CCF-0903295 - NSF; CNS-1319495 - NSF; CNS-1319095 - NSF; 1983.001 - Semiconductor Research Corporation

    Run-time implementation issues for real-time embedded Ada

    Get PDF
    A motivating factor in the development of Ada as the department of defense standard language was the high cost of embedded system software development. It was with embedded system requirements in mind that many of the features of the language were incorporated. Yet it is the designers of embedded systems that seem to comprise the majority of the Ada community dissatisfied with the language. There are a variety of reasons for this dissatisfaction, but many seem to be related in some way to the Ada run-time support system. Some of the areas in which the inconsistencies were found to have the greatest impact on performance from the standpoint of real-time systems are presented. In particular, a large part of the duties of the tasking supervisor are subject to the design decisions of the implementer. These include scheduling, rendezvous, delay processing, and task activation and termination. Some of the more general issues presented include time and space efficiencies, generic expansions, memory management, pragmas, and tracing features. As validated compilers become available for bare computer targets, it is important for a designer to be aware that, at least for many real-time issues, all validated Ada compilers are not created equal

    A hardware scheduler based on task queues for FPGA-based embedded real-time systems

    Get PDF
    A hardware scheduler is developed to improve real-time performance of soft-core processor based computing systems. A hardware scheduler typically accelerates system performance at the cost of increased hardware resources, inflexibility and integration difficulty. However, the reprogrammability of FPGA-based systems removes the problems of inflexibility and integration difficulty. This paper introduces a new task-queue architecture to better support practical task controls and maintain good resource scaling. The scheduler can be configured to support various algorithms such as time sliced priority scheduling, Earliest Deadline First and Least Slack Time. The hardware scheduler reduces scheduling overhead by more than 1,000 clock cycles and raises the system utilization bound by a maximum 19.2 percent. Scheduling jitter is reduced from hundreds of clock cycles in software to just two or three cycles for most operations. The additional resource cost is no more than 17 percent of a typical softcore system for a small scale embedded application

    Performance analysis of a hardware accelerator of dependence management for taskbased dataflow programming models

    Get PDF
    Along with the popularity of multicore and manycore, task-based dataflow programming models obtain great attention for being able to extract high parallelism from applications without exposing the complexity to programmers. One of these pioneers is the OpenMP Superscalar (OmpSs). By implementing dynamic task dependence analysis, dataflow scheduling and out-of-order execution in runtime, OmpSs achieves high performance using coarse and medium granularity tasks. In theory, for the same application, the more parallel tasks can be exposed, the higher possible speedup can be achieved. Yet this factor is limited by task granularity, up to a point where the runtime overhead outweighs the performance increase and slows down the application. To overcome this handicap, Picos was proposed to support task-based dataflow programming models like OmpSs as a fast hardware accelerator for fine-grained task and dependence management, and a simulator was developed to perform design space exploration. This paper presents the very first functional hardware prototype inspired by Picos. An embedded system based on a Zynq 7000 All-Programmable SoC is developed to study its capabilities and possible bottlenecks. Initial scalability and hardware consumption studies of different Picos designs are performed to find the one with the highest performance and lowest hardware cost. A further thorough performance study is employed on both the prototype with the most balanced configuration and the OmpSs software-only alternative. Results show that our OmpSs runtime hardware support significantly outperforms the software-only implementation currently available in the runtime system for finegrained tasks.This work is supported by the Spanish Government through Programa Severo Ochoa (SEV-2015-0493), by the Spanish Ministry of Science and Technology through TIN2015-65316-P project, by the Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272) and by the European Research Council RoMoL Grant Agreement number 321253. We also thank the Xilinx University Program for its hardware and software donations.Peer ReviewedPostprint (published version

    Running real time distributed simulations under Linux and CERTI

    Get PDF
    This paper presents some experiments and some results to enforce real time distributed simulations in accordance with the High Level Architecture (HLA). Simulations were run by using CERTI, an open source middleware, as the Run Time Infrastructure (RTI). Models were distributed over computers under various available versions of the 2.6 Linux kernel. Studies and experiments relied on a real case study. The chosen case study was the simulation of an "in formation" flight of observation satellites. This case study brings up some real applicative needs in real time distributed simulations and real configurations of simulators and models. Two simulations of "in formation" flight of satellites were studied. The study consisted in modeling the behaviour of the simulators and in running these models by using various kernel or middleware operating mechanisms and services. Time measurements were performed at each test giving some results on the ability of the simulation to meet its real time requirements
    • 

    corecore