432 research outputs found
Response-time analysis for fixed-priority systems with a write-back cache
This paper introduces analyses of write-back caches integrated into response-time analysis for fixed-priority preemptive and non-preemptive scheduling. For each scheduling paradigm, we derive four different approaches to computing the additional costs incurred due to write backs. We show the dominance relationships between these different approaches and note how they can be combined to form a single state-of-the-art approach in each case. The evaluation explores the relative performance of the different methods using a set of benchmarks, as well as making comparisons with no cache and a write-through cache. We also explore the effect of write buffers used to hide the latency of write-through caches. We show that depending upon the depth of the buffer used and the policies employed, such buffers can result in domino effects. Our evaluation shows that even ignoring domino effects, a substantial write buffer is needed to match the guaranteed performance of write-back caches
Bounding Preemption Delay within Data Cache Reference Patterns for Real-Time Tasks
Caches have become invaluable for higher-end architectures to hide, in part, the increasing gap between processor speed and memory access times. While the effect of caches on timing predictability of single real-time tasks has been the focus of much research, bounding the overhead of cache warm-ups after preemptions remains a challenging problem, particularly for data caches. In this paper, we bound the penalty of cache interference for real-time tasks by providing accurate predictions of the data cache behavior across preemptions. For every task, we derive data cache reference patterns for all scalar and non-scalar references. Partial timing of a task is performed up to a preemption point using these patterns. The effects of cache interference are then analyzed using a settheoretic approach, which identifies the number and location of additional misses due to preemption. A feedback mechanism provides the means to interact with the timing analyzer, which subsequently times another interval of a task bounded by the next preemption. Our experimental results demonstrate that it is sufficient to consider the n most expensive preemption points, where n is the maximum possible number of preemptions. Further, it is shown that such accurate modeling of data cache behavior in preemptive systems significantly improves the WCET predictions for a task. To the best of our knowledge, our work of bounding preemption delay for data caches is unprecedented
Tightening the Bounds on Feasible Preemption Points
Caches have become invaluable for higher-end architectures to hide, in part, the increasing gap between processor speed and memory access times. While the effect of caches on timing predictability of single real-time tasks has been the focus of much research, bounding the overhead of cache warm-ups after preemptions remains a challenging problem, particularly for data caches. This paper makes multiple contributions. 1) We bound the penalty of cache interference for real-time tasks by providing accurate predictions of data cache behavior across preemptions, including instruction cache and pipeline effects. We show that, when considering cache preemption, the critical instant does not occur upon simultaneous release of all tasks. 2) We develop analysis methods to calculate upper bounds on the number of possible preemption points for each job of a task. To make these bounds tight, we consider the entire range between the best-case and worst-case execution times (BCET and WCET) of higher priority jobs. The effects of cache interference are integrated into the WCET calculations by using a feedback mechanism to interact with a static timing analyzer. Significant improvements in tightening bounds of up to an order of magnitude over two prior methods and up to half a magnitude over a third prior method are obtained by experiments for (a) the number of preemptions, (b) the WCET and (c) the response time of a task. Overall, this work contributes by calculating the worst-case preemption delay under consideration of data caches
Time-predictable Chip-Multiprocessor Design
Abstract—Real-time systems need time-predictable platforms to enable static worst-case execution time (WCET) analysis. Improving the processor performance with superscalar techniques makes static WCET analysis practically impossible. However, most real-time systems are multi-threaded applications and performance can be improved by using several processor cores on a single chip. In this paper we present a time-predictable chipmultiprocessor system that aims to improve system performance while still enabling WCET analysis. The proposed chip-multiprocessor (CMP) uses a shared memory with a time-division multiple access (TDMA) based memory access scheduling. The static TDMA schedule can be integrated into the WCET analysis. Experiments with a JOP based CMP showed that the memory access starts to dominate the execution time when using more than 4 processor cores. To provide a better scalability, more local memories have to be used. We add a processor local scratchpad memory and split data caches, which are still time-predictable, to the processor cores. I
Effective And Efficient Preemption Placement For Cache Overhead Minimization In Hard Real-Time Systems
Schedulability analysis for real-time systems has been the subject of prominent research over the past several decades. One of the key foundations of schedulability analysis is an accurate worst case execution time (WCET) for each task. In preemption based real-time systems, the CRPD can represent a significant component (up to 44% as documented in research literature) of variability to overall task WCET. Several methods have been employed to calculate CRPD with significant levels of pessimism that may result in a task set erroneously declared as non-schedulable. Furthermore, they do not take into account that CRPD cost is inherently a function of where preemptions actually occur. Our approach for computing CRPD via loaded cache blocks (LCBs) is more accurate in the sense that cache state reflects which cache blocks and the specific program locations where they are reloaded. Limited preemption models attempt to minimize preemption overhead (CRPD) by reducing the number of allowed preemptions and/or allowing preemption at program locations where the CRPD effect is minimized. These algorithms rely heavily on accurate CRPD measurements or estimation models in order to identify an optimal set of preemption points. Our approach improves the effectiveness of limited optimal preemption point placement algorithms by calculating the LCBs for each pair of adjacent preemptions to more accurately model task WCET and maximize schedulability as compared to existing preemption point placement approaches. We utilize dynamic programming technique to develop an optimal preemption point placement algorithm. Lastly, we will demonstrate, using a case study, improved task set schedulability and optimal preemption point placement via our new LCB characterization. We propose a new CRPD metric, called loaded cache blocks (LCB) which accurately characterizes the CRPD a real-time task may be subjected to due to the preemptive execution of higher priority tasks. We show how to integrate our new LCB metric into our newly developed algorithms that automatically
place preemption points supporting linear control flow graphs (CFGs) for limited preemption scheduling applications. We extend the derivation of loaded cache blocks (LCB), that was proposed for linear control flow graphs (CFGs) to conditional CFGs. We show how to integrate our revised LCB metric into our newly developed algorithms that automatically place preemption points supporting conditional control flow graphs (CFGs) for limited preemption scheduling applications. For future work, we will verify the correctness of our framework through other measurable physical and hardware constraints. Also, we plan to complete our work on developing a generalized framework that can be seamlessly integrated into real-time schedulability analysis
A Survey on Cache Management Mechanisms for Real-Time Embedded Systems
© ACM, 2015. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Computing Surveys, {48, 2, (November 2015)} http://doi.acm.org/10.1145/2830555Multicore processors are being extensively used by real-time systems, mainly because of their demand for
increased computing power. However, multicore processors have shared resources that affect the predictability
of real-time systems, which is the key to correctly estimate the worst-case execution time of tasks. One of
the main factors for unpredictability in a multicore processor is the cache memory hierarchy. Recently, many
research works have proposed different techniques to deal with caches in multicore processors in the context
of real-time systems. Nevertheless, a review and categorization of these techniques is still an open topic and
would be very useful for the real-time community. In this article, we present a survey of cache management
techniques for real-time embedded systems, from the first studies of the field in 1990 up to the latest research
published in 2014. We categorize the main research works and provide a detailed comparison in terms of
similarities and differences. We also identify key challenges and discuss future research directions.King Saud University
NSER
Fixed-Priority Memory-Centric Scheduler for COTS-Based Multiprocessors
Memory-centric scheduling attempts to guarantee temporal predictability on commercial-off-the-shelf (COTS) multiprocessor systems to exploit their high performance for real-time applications. Several solutions proposed in the real-time literature have hardware requirements that are not easily satisfied by modern COTS platforms, like hardware support for strict memory partitioning or the presence of scratchpads. However, even without said hardware support, it is possible to design an efficient memory-centric scheduler.
In this article, we design, implement, and analyze a memory-centric scheduler for deterministic memory management on COTS multiprocessor platforms without any hardware support. Our approach uses fixed-priority scheduling and proposes a global "memory preemption" scheme to boost real-time schedulability. The proposed scheduling protocol is implemented in the Jailhouse hypervisor and Erika real-time kernel. Measurements of the scheduler overhead demonstrate the applicability of the proposed approach, and schedulability experiments show a 20% gain in terms of schedulability when compared to contention-based and static fair-share approaches
Bundle: Taming The Cache And Improving Schedulability Of Multi-Threaded Hard Real-Time Systems
For hard real-time systems, schedulability of a task set is paramount. If a task set is not deemed schedulable under all conditions, the system may fail during operation and cannot be deployed in a high risk environment. Schedulability testing has typically been separated from worst-case execution time (WCET) analysis. Each task’s WCET value is calculated independently and provided as input to a schedulability test. However, a task’s WCET value is influenced by scheduling decisions and the impact of cache memory. Thus, schedulability tests have been augmented to include cache-related preemption delay (CRPD). From this classical perspective, the effect of cache memory on WCET and schedulability is always negative; increasing execution times and demand. In this work we propose a new positive perspective, where cache memory benefits multi-threaded tasks by scheduling threads in a manner that shares values predictably.
This positive perspective is reached by integrating, rather than separating the disciplines of schedulability analysis and worst-case execution time. These integrated techniques are referred to as the BUNDLE family of worst-case execution time and cache overhead
(WCETO) analysis and scheduling algorithm. WCETO calculation divides the task’s structure into conflict free regions and calculates a bound utilizing explicit understanding of the thread-level scheduling algorithm. Conflict free regions are utilized by the scheduling algorithm, which associates with each region a thread container called a bundle. At any time only one bundle may be active, and only threads of the active bundle may execute on the processor.
The BUNDLE family of scheduling algorithms developed in this work increase in scope from BUNDLE through ITCB-DAG. As the fundamental contribution, BUNDLE and BUNDLEP apply to a single multi-threaded task running on a uniprocessor architecture with a single level direct mapped instruction cache. NPM-BUNDLE expands the positive perspective to multiple tasks on a uniprocessor system. With ITCB-DAG bringing BUNDLE’s analysis and scheduling techniques to multi-processor systems.
Each of the scheduling algorithms require a novel hardware mechanism to anticipate execution and make scheduling decisions. To support anticipation of execution, a novel XFLICT interrupt is proposed. It is a simple mechanism that emulates the behavior of hardware breakpoints. An implementation of the BUNDLEP analytical techniques, scheduling algorithm, and XFLICT interrupt is available as a simulated platform for further research and extension.
Future work is planned to expand BUNDLE’s positive perspective and increase adoption. The most significant barrier to adoption is the ability to deploy BUNDLE’s scheduling algorithm, this mandates a viable and available hardware or software mechanism to anticipate
execution. NPM-BUNDLE is limited to non-preemptive multi-task scheduling and analysis, support for preemptive scheduling will increase the positive impact of BUNDLE’s integrated perspective
- …