917 research outputs found
MARACAS: a real-time multicore VCPU scheduling framework
This paper describes a multicore scheduling and load-balancing framework called MARACAS, to address shared cache and memory bus contention. It builds upon prior work centered around the concept of virtual CPU (VCPU) scheduling. Threads are associated with VCPUs that have periodically replenished time budgets. VCPUs are guaranteed to receive their periodic budgets even if they are migrated between cores. A load balancing algorithm ensures VCPUs are mapped to cores to fairly distribute surplus CPU cycles, after ensuring VCPU timing guarantees. MARACAS uses surplus cycles to throttle the execution of threads running on specific cores when memory contention exceeds a certain threshold. This enables threads on other cores to make better progress without interference from co-runners. Our scheduling framework features a novel memory-aware scheduling approach that uses performance counters to derive an average memory request latency. We show that latency-based memory throttling is more effective than rate-based memory access control in reducing bus contention. MARACAS also supports cache-aware scheduling and migration using page recoloring to improve performance isolation amongst VCPUs. Experiments show how MARACAS reduces multicore resource contention, leading to improved task progress.http://www.cs.bu.edu/fac/richwest/papers/rtss_2016.pdfAccepted manuscrip
Analysis of Dynamic Memory Bandwidth Regulation in Multi-core Real-Time Systems
One of the primary sources of unpredictability in modern multi-core embedded
systems is contention over shared memory resources, such as caches,
interconnects, and DRAM. Despite significant achievements in the design and
analysis of multi-core systems, there is a need for a theoretical framework
that can be used to reason on the worst-case behavior of real-time workload
when both processors and memory resources are subject to scheduling decisions.
In this paper, we focus our attention on dynamic allocation of main memory
bandwidth. In particular, we study how to determine the worst-case response
time of tasks spanning through a sequence of time intervals, each with a
different bandwidth-to-core assignment. We show that the response time
computation can be reduced to a maximization problem over assignment of memory
requests to different time intervals, and we provide an efficient way to solve
such problem. As a case study, we then demonstrate how our proposed analysis
can be used to improve the schedulability of Integrated Modular Avionics
systems in the presence of memory-intensive workload.Comment: Accepted for publication in the IEEE Real-Time Systems Symposium
(RTSS) 2018 conferenc
Bounding Worst-Case Data Cache Behavior by Analytically Deriving Cache Reference Patterns
While caches have become invaluable for higher-end architectures due to their ability to hide, in part, the gap between processor speed and memory access times, caches (and particularly data caches) limit the timing predictability for data accesses that may reside in memory or in cache. This is a significant problem for real-time systems. The objective our work is to provide accurate predictions of data cache behavior of scalar and non-scalar references whose reference patterns are known at compile time. Such knowledge about cache behavior provides the basis for significant improvements in bounding the worst-case execution time (WCET) of real-time programs, particularly for hard-to-analyze data caches. We exploit the power of the Cache Miss Equations (CME) framework but lift a number of limitations of traditional CME to generalize the analysis to more arbitrary programs. We further devised a transformation, coined “forced” loop fusion, which facilitates the analysis across sequential loops. Our contributions result in exact data cache reference patterns — in contrast to approximate cache miss behavior of prior work. Experimental results indicate improvements on the accuracy of worst-case data cache behavior up to two orders of magnitude over the original approach. In fact, our results closely bound and sometimes even exactly match those obtained by trace-driven simulation for worst-case inputs. The resulting WCET bounds of timing analysis confirm these findings in terms of providing tight bounds. Overall, our contributions lift analytical approaches to predict data cache behavior to a level suitable for efficient static timing analysis and, subsequently, real-time schedulability of tasks with predictable WCET
Removing Abstraction Overhead in the Composition of Hierarchical Real-Time System
The hierarchical real-time scheduling framework is a widely accepted model to facilitate the design and analysis of the increasingly complex real-time systems. Interface abstraction and composition are the key issues in the hierarchical scheduling framework analysis. Schedulability is essential to guarantee that the timing requirements of all components are satisfied. In order for the design to be resource efficient, the composition must be bandwidth optimal. Associativity is desirable for open systems in which components may be added or deleted at run time. Previous techniques on compositional scheduling are either not resource efficient in some aspects, or cannot achieve optimality and associativity at the same time. In this paper, several important properties regarding the periodic resource model are identified. Based on those properties, we propose a novel interface abstraction and composition framework which achieves schedulability, optimality, and associativity. Our approach eliminates abstraction overhead in the composition
WCET Derivation under Single Core Equivalence with Explicit Memory Budget Assignment
In the last decade there has been a steady uptrend in the popularity of embedded multi-core platforms. This represents a turning point in the theory and implementation of real-time systems. From a real-time standpoint, however, the extensive sharing of hardware resources (e.g. caches, DRAM subsystem, I/O channels) represents a major source of unpredictability. Budget-based memory regulation (throttling) has been extensively studied to enforce a strict partitioning of the DRAM subsystem’s bandwidth. The common approach to analyze a task under memory bandwidth regulation is to consider the budget of the core where the task is executing, and assume the worst-case about the remaining cores' budgets. In this work, we propose a novel analysis strategy to derive the WCET of a task under memory bandwidth regulation that takes into account the exact distribution of memory budgets to cores. In this sense, the proposed analysis represents a generalization of approaches that consider (i) even budget distribution across cores; and (ii) uneven but unknown (except for the core under analysis) budget assignment. By exploiting the additional piece of information, we show that it is possible to derive a more accurate WCET estimation. Our evaluations highlight that the proposed technique can reduce overestimation by 30% in average, and up to 60%, compared to the state of the art.Accepted manuscrip
Dynamic Memory Bandwidth Allocation for Real-Time GPU-Based SoC Platforms
Heterogeneous SoC platforms, comprising both general purpose CPUs and accelerators such as a GPU, are becoming increasingly attractive for real-time and mixed-criticality systems to cope with the computational demand of data parallel applications. However, contention for access to shared main memory can lead to significant performance degradation on both CPU and GPU. Existing work has shown that memory bandwidth throttling is effective in protecting real-time applications from memory-intensive, best-effort ones; however, due to the inherent pessimism involved in worst-case execution time estimation,
such approaches can unduly restrict the bandwidth available to best-effort applications. In this work, we propose a novel memory bandwidth allocation scheme where we dynamically monitor the progress of a real-time application and increase the bandwidth share of best-effort ones whenever it is safe to do so. Specifically, we demonstrate our approach by protecting a real-time GPU kernel from best-effort CPU tasks. Based on profiling information, we first build a worst case execution time estimation model for the GPU kernel. Using such model, we then show how to dynamically recompute on-line the maximum memory budget that can be allocated to best-effort tasks without exceeding the kernel’s assigned execution budget. We implement our proposed technique on NVIDIA embedded SoC and demonstrate its effectiveness on a variety of GPU and CPU benchmarks
Towards a Reconfiguration Service for Distributed Real-Time Java
REACTION 2012. 1st International workshop on Real-time and distributed computing in emerging applications. December 4th, 2012, San Juan, Puerto Rico.Ancient monolithic distributed systems were
attached to well-known development practices and offline
analysis. Current scenarios are more dynamic, and open,
plenty of applications and services which appear and
disappear dynamically at runtime. Likewise, these scenarios
require taking into account actions that were traditionally
addressed offline, this time in an online scenario. This paper
contributes a reconfiguration service in the context of
distributed real-time Java application as a means to include
real-time reconfiguration into next generation real-time Java
systems. The paper addresses the integration taking into
account changes required in the API and the cost of some
reconfiguration strategies.This research was partially supported by the European Commission (ARTIST2 NoE, ST-2004-004527; iLAND ARTEMIS-JU Call 1) and by the Spanish national project
REM4VSS (TIN-2011-28339)
IRQ Coloring: Mitigating Interrupt-Generated Interference on ARM Multicore Platforms
Mixed-criticality systems, which consolidate workloads with different criticalities, must comply with stringent spatial and temporal isolation requirements imposed by safety-critical standards (e.g., ISO26262). This, per se, has proven to be a challenge with the advent of multicore platforms due to the inner interference created by multiple subsystems while disputing access to shared resources. With this work, we pioneer the concept of Interrupt (IRQ) coloring as a novel mechanism to minimize the interference created by co-existing interrupt-driven workloads. The main idea consists of selectively deactivating specific ("colored") interrupts if the QoS of critical workloads (e.g., Virtual Machines) drops below a well-defined threshold. The IRQ Coloring approach encompasses two artifacts, i.e., the IRQ Coloring Design-Time Tool (IRQ DTT) and the IRQ Coloring Run-Time Mechanism (IRQ RTM). In this paper, we focus on presenting the conceptual IRQ coloring design, describing the first prototype of the IRQ RTM on Bao hypervisor, and providing initial evidence about the effectiveness of the proposed approach on a synthetic use case
- …