329 research outputs found
The potential of programmable logic in the middle: cache bleaching
Consolidating hard real-time systems onto modern multi-core Systems-on-Chip (SoC) is an open challenge. The extensive sharing of hardware resources at the memory hierarchy raises important unpredictability concerns. The problem is exacerbated as more computationally demanding workload is expected to be handled with real-time guarantees in next-generation Cyber-Physical Systems (CPS). A large body of works has approached the problem by proposing novel hardware re-designs, and by proposing software-only solutions to mitigate performance interference. Strong from the observation that unpredictability arises from a lack of fine-grained control over the behavior of shared hardware components, we outline a promising new resource management approach. We demonstrate that it is possible to introduce Programmable Logic In-the-Middle (PLIM) between a traditional multi-core processor and main memory. This provides the unique capability of manipulating individual memory transactions. We propose a proof-of-concept system implementation of PLIM modules on a commercial multi-core SoC. The PLIM approach is then leveraged to solve long-standing issues with cache coloring. Thanks to PLIM, colored sparse addresses can be re-compacted in main memory. This is the base principle behind the technique we call Cache Bleaching. We evaluate our design on real applications and propose hypervisor-level adaptations to showcase the potential of the PLIM approach.Accepted manuscrip
Parallelism-Aware Memory Interference Delay Analysis for COTS Multicore Systems
In modern Commercial Off-The-Shelf (COTS) multicore systems, each core can
generate many parallel memory requests at a time. The processing of these
parallel requests in the DRAM controller greatly affects the memory
interference delay experienced by running tasks on the platform. In this paper,
we model a modern COTS multicore system which has a nonblocking last-level
cache (LLC) and a DRAM controller that prioritizes reads over writes. To
minimize interference, we focus on LLC and DRAM bank partitioned systems. Based
on the model, we propose an analysis that computes a safe upper bound for the
worst-case memory interference delay. We validated our analysis on a real COTS
multicore platform with a set of carefully designed synthetic benchmarks as
well as SPEC2006 benchmarks. Evaluation results show that our analysis is more
accurately capture the worst-case memory interference delay and provides safer
upper bounds compared to a recently proposed analysis which significantly
under-estimate the delay.Comment: Technical Repor
Governing with Insights: Towards Profile-Driven Cache Management of Black-Box Applications
There exists a divide between the ever-increasing demand for high-performance embedded systems and the availability of practical methodologies to understand the interplay of complex data-intensive applications with hardware memory resources. On the one hand, traditional static analysis approaches are seldomly applicable to latest-generation multi-core platforms due to a lack of accurate micro-architectural models. On the other hand, measurement-based methods only provide coarse-grained information about the end-to-end execution of a given real-time application.
In this paper, we describe a novel methodology, namely Black-Box Profiling (BBProf), to gather fine-grained insights on the usage of cache resources in applications of realistic complexity. The goal of our technique is to extract the relative importance of individual memory pages towards the overall temporal behavior of a target application. Importantly, BBProf does not require the semantics of the target application to be known - i.e., applications are treated as black-boxes - and it does not rely on any platform-specific hardware support. We provide an open-source full-system implementation and showcase how BBProf can be used to perform profile-driven cache management
Real-time systems on multicore platforms: managing hardware resources for predictable execution
Shared hardware resources in commodity multicore processors are subject to contention from co-running threads. The resultant interference can lead to highly-variable performance for individual applications. This is particularly problematic for real-time applications, which require predictable timing guarantees. It also leads to a pessimistic estimate of the Worst Case Execution Time (WCET) for every real-time application. More CPU time needs to be reserved, thus less applications can enter the system. As the average execution time is usually far less than the WCET, a significant amount of reserved CPU resource would be wasted.
Previous works have attempted partitioning the shared resources, amongst either CPUs or processes, to improve performance isolation. However, they have not proven to be both efficient and effective. In this thesis, we propose several mechanisms and frameworks that manage the shared caches and memory buses on multicore platforms. Firstly, we introduce a multicore real-time scheduling framework with the foreground/background scheduling model. Combining real-time load balancing with background scheduling, CPU utilization is greatly improved. Besides, a memory bus management mechanism is implemented on top of the background scheduling, making sure bus contention is under control while utilizing unused CPU cycles. Also, cache partitioning is thoroughly studied in this thesis, with a cache-aware load balancing algorithm and a dynamic cache partitioning framework proposed. Lastly, we describe a system architecture to integrate the above solutions all together. It tackles one of the toughest problems in OS innovation, legacy support, by converting existing OSes into libraries in a virtualized environment. Thus, within a single multicore platform, we benefit from the fine-grained resource control of a real-time OS and the richness of functionality of a general-purpose OS
A Survey on Cache Management Mechanisms for Real-Time Embedded Systems
© ACM, 2015. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Computing Surveys, {48, 2, (November 2015)} http://doi.acm.org/10.1145/2830555Multicore processors are being extensively used by real-time systems, mainly because of their demand for
increased computing power. However, multicore processors have shared resources that affect the predictability
of real-time systems, which is the key to correctly estimate the worst-case execution time of tasks. One of
the main factors for unpredictability in a multicore processor is the cache memory hierarchy. Recently, many
research works have proposed different techniques to deal with caches in multicore processors in the context
of real-time systems. Nevertheless, a review and categorization of these techniques is still an open topic and
would be very useful for the real-time community. In this article, we present a survey of cache management
techniques for real-time embedded systems, from the first studies of the field in 1990 up to the latest research
published in 2014. We categorize the main research works and provide a detailed comparison in terms of
similarities and differences. We also identify key challenges and discuss future research directions.King Saud University
NSER
Cache-aware Interfaces for Compositional Real-Time Systems
Interface-based compositional analysis is by now a fairly established area of research in real-time systems. However, current research has not yet fully considered practical aspects, such as the effects of cache interferences on multicore platforms. This position paper discusses the analysis challenges and motivates the need for cache scheduling in this setting, and it highlights several research questions towards cache-aware interfaces for compositional systems on multicore platforms
A survey of techniques for reducing interference in real-time applications on multicore platforms
This survey reviews the scientific literature on techniques for reducing interference in real-time multicore systems, focusing on the approaches proposed between 2015 and 2020. It also presents proposals that use interference reduction techniques without considering the predictability issue. The survey highlights interference sources and categorizes proposals from the perspective of the shared resource. It covers techniques for reducing contentions in main memory, cache memory, a memory bus, and the integration of interference effects into schedulability analysis. Every section contains an overview of each proposal and an assessment of its advantages and disadvantages.This work was supported in part by the Comunidad de Madrid Government "Nuevas Técnicas de Desarrollo de Software de Tiempo Real Embarcado Para Plataformas. MPSoC de Próxima Generación" under Grant IND2019/TIC-17261
- …