28 research outputs found

    Predictable Implementation of Real-Time Applications on Multiprocessor Systems on Chip

    Worst-case execution time (WCET) analysis and, in general, the predictability of real-time applications implemented on multiprocessor systems has been addressed only in very restrictive and particular contexts. One important aspect that makes the analysis difficult is the estimation of the system's communication behavior. The traffic on the bus does not solely originate from data transfers due to data dependencies between tasks, but is also affected by memory transfers as a result of cache misses. As opposed to the analysis performed for a single processor system, where the cache miss penalty is constant, in a multiprocessor system each cache miss has a variable penalty, depending on the bus contention. This affects the tasks' WCET which, however, is needed in order to perform system scheduling. At the same time, the WCET depends on the system schedule due to the bus interference. In this context, we present an approach to worst-case execution time analysis and system scheduling for real-time applications implemented on multiprocessor SoC architectures. We will also address the bus scheduling policy and its optimization, which are of huge importance for the performance of such predictable multiprocessor applications.

    An analyzable memory controller for hard real-time CMPs

    Multicore processors (CMPs) represent a good solution to provide the performance required by current and future hard real-time systems. However, it is difficult to compute a tight WCET estimation for CMPs due to interferences that tasks suffer when accessing shared hardware resources. We propose an analyzable JEDEC-compliant DDRx SDRAM memory controller (AMC) for hard real-time CMPs that reduces the impact of memory interferences caused by other tasks on WCET estimation, providing a predictable memory access time and allowing the computation of tight WCET estimations. Peer reviewed. Postprint (published version).

    A Time-predictable Memory Network-on-Chip

    To derive safe bounds on worst-case execution times (WCETs), all components of a computer system need to be time-predictable: the processor pipeline, the caches, the memory controller, and memory arbitration on a multicore processor. This paper presents a solution for time-predictable memory arbitration and access for chip-multiprocessors. The memory network-on-chip is organized as a tree with time-division multiplexing (TDM) of accesses to the shared memory. The TDM-based arbitration completely decouples processor cores and allows WCET analysis of the memory accesses on individual cores without considering the tasks on the other cores. Furthermore, we perform local, distributed arbitration according to the global TDM schedule. This solution avoids a central arbiter and scales to a large number of processors.
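The decoupling property described in this abstract can be illustrated with a minimal sketch. It assumes a simplified flat TDM schedule that is not taken from the paper: each of `n_cores` cores owns one slot of `slot_len` cycles per period, an access may only start at the beginning of the owning core's slot, and it completes within that slot. The tree-structured network and distributed arbiters are not modelled, and all function names are hypothetical.

```python
def wait_for_slot(arrival: int, core: int, n_cores: int, slot_len: int) -> int:
    """Cycles from a request's arrival until the start of the core's next slot."""
    period = n_cores * slot_len      # length of one full TDM round
    slot_start = core * slot_len     # offset of this core's slot in the round
    return (slot_start - arrival) % period


def worst_case_latency(n_cores: int, slot_len: int) -> int:
    """Closed-form bound: the request just misses its slot, waits almost a
    whole period, then needs one full slot to complete the access."""
    return n_cores * slot_len - 1 + slot_len


def brute_force_latency(core: int, n_cores: int, slot_len: int) -> int:
    """Check the bound by trying every arrival offset within one period."""
    period = n_cores * slot_len
    return max(
        wait_for_slot(arrival, core, n_cores, slot_len) + slot_len
        for arrival in range(period)
    )


if __name__ == "__main__":
    for core in range(4):
        print(core, brute_force_latency(core, n_cores=4, slot_len=6))
```

Under these assumptions the bound depends only on the schedule parameters, not on what the other cores are doing, which is the property that lets each core be analysed in isolation.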

    Contention-aware performance monitoring counter support for real-time MPSoCs

    Tasks running in MPSoCs experience contention delays when accessing MPSoC’s shared resources, complicating task timing analysis and deriving execution time bounds. Understanding the Actual Contention Delay (ACD) each task suffers due to other co-running tasks, and the particular hardware shared resources in which contention occurs, is of prominent importance to increase confidence in derived execution time bounds of tasks. Whenever those bounds are violated, ACD also provides information on the reasons for overruns. Unfortunately, existing MPSoC designs considered in real-time domains offer limited hardware support to measure tasks’ ACD, losing all these potential benefits. In this paper we propose the Contention Cycle Stack (CCS), a mechanism that extends performance monitoring counters to track specific events that allow estimating the ACD that each task suffers from every contending task on every hardware shared resource. We build the CCS using a set of specialized low-overhead Performance Monitoring Counters for the Cobham Gaisler GR740 (NGMP) MPSoC (used in the space domain), for which we show CCS’s benefits. The research leading to these results has received funding from the European Space Agency under contracts 4000109680, 4000110157 and NPI 4000102880, and the Ministry of Science and Technology of Spain under contract TIN-2015-65316-P. Jaume Abella has been partially supported by the Ministry of Economy and Competitiveness under Ramon y Cajal postdoctoral fellowship number RYC-2013-14717. Peer reviewed. Postprint (author's final draft).

    Bounding memory interference delay in COTS-based multi-core systems


    Composable Virtual Memory for an Embedded SoC

    Systems on a Chip concurrently execute multiple applications that may start and stop at run-time, creating many use-cases. Composability reduces the verification effort by making the functional and temporal behaviours of an application independent of other applications. Existing approaches link applications to static address ranges that cannot be reused between applications that are not simultaneously active, wasting resources. In this paper we propose a composable virtual memory scheme that enables dynamic binding and relocation of applications. Our virtual memory is also predictable, for applications with real-time constraints. We integrated the virtual memory on CompSOC, an existing composable SoC prototyped in FPGA. The implementation indicates that virtual memory is in general expensive, because it incurs a performance loss around 39% due to address translation latency. On top of this, composability adds to virtual memory an insignificant extra performance penalty, below 1%.

    Reliable Performance Analysis of a Multicore Multithreaded System-On-Chip (with Appendix)

    Formal performance analysis is now regularly applied in the design of distributed embedded systems such as automotive electronics, where it greatly contributes to improved predictability and platform robustness of complex networked systems. Even though it might also be highly beneficial in MpSoC design, formal performance analysis could not easily be applied so far, because the classical task communication model does not cover processor-memory traffic, which is an integral part of MpSoC timing. Introducing memory accesses as individual transactions under the classical model has been shown to be inefficient, and previous approaches work well only under strict orthogonalization of different traffic streams. Recent research has presented extensions of the classical task model and a corresponding analysis that covers performance implications of shared memory traffic. In this paper we present a multithreaded multiprocessor platform and multimedia application. We conduct performance analysis using the new analysis options and specifically benchmark the quality of the available approach. Our experiments show that corner case coverage can now be supplied with very high accuracy, allowing architectural alternatives to be investigated quickly.

    Estimation of Cache Related Migration Delays for Multi-Core Processors with Shared Instruction Caches

    Multi-core architectures, which have multiple processors on a single chip, have been adopted by most chip manufacturers. In most such architectures, the different cores have private caches and also shared on-chip caches. For real-time systems to exploit multi-core architectures, it is required to obtain both tight and safe estimations of a number of metrics required to validate the system temporal behaviour in all situations, including the worst case: tasks' worst-case execution times (WCET), preemption delays and migration delays. Estimating such metrics is very challenging because of the possible interferences between cores due to shared hardware resources such as shared caches, memory bus, etc. In this paper, we propose a new method to estimate the worst-case cache reload cost due to a task migration between cores. Safe estimations of the so-called Cache-Related Migration Delay (CRMD) are obtained through static code analysis. Experimental results demonstrate the practicality of our approach by comparing predicted worst-case CRMDs with those obtained by a naive approach. To the best of our knowledge, our method is the first one to provide safe upper bounds of cache-related migration delays in multi-core architectures with shared instruction cache.

    Adapting TDMA arbitration for measurement-based probabilistic timing analysis

    Critical Real-Time Embedded Systems require functional and timing validation to prove that they will perform their functionalities correctly and in time. For timing validation, a bound to the Worst-Case Execution Time (WCET) for each task is derived and passed as an input to the scheduling algorithm to ensure that tasks execute timely. Bounds to WCET can be derived with deterministic timing analysis (DTA) and probabilistic timing analysis (PTA), each of which relies upon certain predictability properties coming from the hardware/software platform beneath. In particular, specific hardware designs are needed for both DTA and PTA, which challenges their adoption by hardware vendors. This paper makes a step towards reconciling the hardware needs of DTA and PTA timing analyses to increase the likelihood of those hardware designs being adopted by hardware vendors. In particular, we show how Time Division Multiple Access (TDMA), which has been regarded as one of the main DTA-compliant arbitration policies, can be used in the context of PTA and, in particular, of the industrially-friendly Measurement-Based PTA (MBPTA). We show how the execution time measurements taken as input for MBPTA need to be padded to obtain reliable and tight WCET estimates on top of TDMA-arbitrated hardware resources with no further hardware support. Our results show that TDMA delivers tighter WCET estimates than MBPTA-friendly arbitration policies, whereas MBPTA-friendly policies provide higher average performance. Thus, the best policy to choose depends on the particular needs of the end user. The research leading to these results has been funded by the EU FP7 under grant agreement no. 611085 (PROXIMA) and 287519 (parMERASA). This work has also been partially supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under grant TIN2015-65316-P and the HiPEAC Network of Excellence. Miloš Panić is funded by the Spanish Ministry of Education under the FPU grant FPU12/05966. Jaume Abella has been partially supported by the MINECO under Ramon y Cajal postdoctoral fellowship number RYC-2013-14717. Peer reviewed. Postprint (author's final draft).

    Real-time scheduling with resource sharing on heterogeneous multiprocessors

    Consider the problem of scheduling a task set $\tau$ of implicit-deadline sporadic tasks to meet all deadlines on a $t$-type heterogeneous multiprocessor platform where tasks may access multiple shared resources. The multiprocessor platform has $m_k$ processors of type-$k$, where $k \in \{1, 2, \ldots, t\}$. The execution time of a task depends on the type of processor on which it executes. The set of shared resources is denoted by $R$. For each task $\tau_i$, there is a resource set $R_i \subseteq R$ such that for each job of $\tau_i$, during one phase of its execution, the job requests to hold the resource set $R_i$ exclusively with the interpretation that (i) the job makes a single request to hold all the resources in the resource set $R_i$ and (ii) at all times, when a job of $\tau_i$ holds $R_i$, no other job holds any resource in $R_i$. Each job of task $\tau_i$ may request the resource set $R_i$ at most once during its execution. A job is allowed to migrate when it requests a resource set and when it releases the resource set but a job is not allowed to migrate at other times. Our goal is to design a scheduling algorithm for this problem and prove its performance. We propose an algorithm, LP-EE-vpr, which offers the guarantee that if an implicit-deadline sporadic task set is schedulable on a $t$-type heterogeneous multiprocessor platform by an optimal scheduling algorithm that allows a job to migrate only when it requests or releases a resource set, then our algorithm also meets the deadlines with the same restriction on job migration, if given processors $4 \times \left(1 + \mathrm{MAXP} \times \left\lceil \frac{|P| \times \mathrm{MAXP}}{\min\{m_1, m_2, \ldots, m_t\}} \right\rceil\right)$ times as fast. (Here $\mathrm{MAXP}$ and $|P|$ are computed based on the resource sets that tasks request.) For the special case that each task requests at most one resource, the bound of LP-EE-vpr collapses to $4 \times \left(1 + \left\lceil \frac{|R|}{\min\{m_1, m_2, \ldots, m_t\}} \right\rceil\right)$. To the best of our knowledge, LP-EE-vpr is the first algorithm with proven performance guarantee for real-time scheduling of sporadic tasks with resource sharing on $t$-type heterogeneous multiprocessors.
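To make the speed-up bound in this abstract concrete, the following sketch evaluates it numerically. It reads the flattened expression as a fraction with `min{m1, …, mt}` in the denominator, which is an assumption, although it is consistent with the special case quoted in the same abstract. The abstract does not say how MAXP and |P| are derived from the resource sets, so they are taken here as given integers; the function names are hypothetical and not from the paper.

```python
from typing import Sequence


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive b, avoiding float rounding."""
    return -(-a // b)


def speedup_general(maxp: int, num_p: int, m: Sequence[int]) -> int:
    """4 * (1 + MAXP * ceil(|P| * MAXP / min(m))), with MAXP and |P| as inputs."""
    return 4 * (1 + maxp * ceil_div(num_p * maxp, min(m)))


def speedup_single_resource(num_resources: int, m: Sequence[int]) -> int:
    """Special case: 4 * (1 + ceil(|R| / min(m)))."""
    return 4 * (1 + ceil_div(num_resources, min(m)))


if __name__ == "__main__":
    processors_per_type = [2, 4]  # illustrative m_1 = 2, m_2 = 4
    print(speedup_general(maxp=2, num_p=3, m=processors_per_type))
    print(speedup_single_resource(num_resources=3, m=processors_per_type))
```

With these illustrative inputs the general bound evaluates to 28 and the single-resource bound to 12. Setting `maxp=1` and `num_p` equal to |R| in the general form gives the same value as the special case.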