563 research outputs found
WCET-aware Software Based Cache Partitioning for Multi-Task Real-Time Systems
Caches are a source of unpredictability, since it is very difficult to predict whether a memory access results in a cache hit or miss. In systems running multiple tasks managed by a preemptive scheduler, it is even impossible to determine the cache behavior, since interrupt-driven schedulers lead to unknown points in time for context switches. Partitioned caches are already used in multi-task environments to increase the cache hit ratio by avoiding mutual eviction of tasks from the cache.
For real-time systems, the upper bound of the execution time, called the Worst-Case Execution Time (WCET), is one of the most important metrics. In this paper, we use partitioning of instruction caches as a technique to achieve tighter WCET estimations, since tasks cannot be evicted from their partition by other tasks. We propose a novel WCET-aware cache partitioning algorithm, which determines the optimal partition size for each task with a focus on decreasing the system's WCET for a given set of possible partition sizes. Employing this algorithm, we are able to decrease the WCET by up to 34%, depending on the number of tasks in a set. On average, reductions between 12% and 19% can be achieved.
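Choosing one partition size per task under a total cache budget can be pictured as a knapsack-style optimization. The following is a minimal sketch, not the authors' algorithm: it assumes the system's WCET is simply the sum of the per-task WCETs, that a WCET table per task and per allowed partition size is already available from a static analyzer, and that all partitions must fit in the instruction cache. The names `choose_partitions` and `wcet_tables` are hypothetical.

```python
from math import inf


def choose_partitions(wcet_tables, sizes, capacity):
    """Pick one partition size per task so that the summed WCET is minimal.

    wcet_tables[i][s]: (hypothetical) WCET of task i with an s-byte partition.
    sizes: allowed partition sizes; capacity: total cache size in bytes.
    Returns (total_wcet, [size per task]) or (inf, None) if nothing fits.
    """
    # Dynamic programming over used capacity: for every amount of cache
    # already handed out, keep only the cheapest assignment found so far.
    best = {0: (0, [])}
    for table in wcet_tables:
        nxt = {}
        for used, (total, chosen) in best.items():
            for s in sizes:
                if used + s > capacity:
                    continue  # this partition would not fit in the cache
                cand = (total + table[s], chosen + [s])
                if used + s not in nxt or cand[0] < nxt[used + s][0]:
                    nxt[used + s] = cand
        best = nxt
    if not best:
        return inf, None
    return min(best.values(), key=lambda v: v[0])
```

For example, with two tasks whose WCETs are `{1024: 900, 2048: 600}` and `{1024: 500, 2048: 450}` and a 3072-byte cache, the sketch gives the larger partition to the first task, because it is the more cache-sensitive one: `choose_partitions(tables, [1024, 2048], 3072)` yields `(1100, [2048, 1024])`.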
Parallelism-Aware Memory Interference Delay Analysis for COTS Multicore Systems
In modern Commercial Off-The-Shelf (COTS) multicore systems, each core can generate many parallel memory requests at a time. The processing of these parallel requests in the DRAM controller greatly affects the memory interference delay experienced by tasks running on the platform. In this paper, we model a modern COTS multicore system which has a nonblocking last-level cache (LLC) and a DRAM controller that prioritizes reads over writes. To minimize interference, we focus on LLC and DRAM bank partitioned systems. Based on the model, we propose an analysis that computes a safe upper bound for the worst-case memory interference delay. We validated our analysis on a real COTS multicore platform with a set of carefully designed synthetic benchmarks as well as SPEC2006 benchmarks. Evaluation results show that our analysis more accurately captures the worst-case memory interference delay and provides safer upper bounds compared to a recently proposed analysis, which significantly underestimates the delay.
Comment: Technical Report
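Why memory-level parallelism matters for such a bound can be seen in a deliberately simplified, hypothetical model; it is not the analysis proposed in the paper. It assumes that requests from other cores are served first-come-first-served on a shared resource, that each other core has at most `outstanding` requests in flight, that each request occupies the resource for `t_read` cycles, and that, because reads are prioritized over writes, a read can still be stuck behind at most one batch of `write_batch` writes of `t_write` cycles each.

```python
def toy_interference_bound(cores, outstanding, t_read, write_batch, t_write):
    """Hypothetical upper bound (in cycles) on the extra delay of one read.

    Simplified model, not the paper's analysis:
    - every other core may have up to `outstanding` requests queued ahead;
    - each such request holds the shared resource for `t_read` cycles;
    - reads have priority over writes, but a read may arrive while the
      controller drains one write batch of `write_batch` writes.
    """
    other_reads = (cores - 1) * outstanding
    return other_reads * t_read + write_batch * t_write
```

With four cores, 4-cycle reads, and a 16-write batch of 4-cycle writes, the bound grows from 76 cycles when each core has a single request in flight to 136 cycles when each core may keep six requests in flight; an analysis that assumes one outstanding request per core would miss exactly this difference.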
A Survey on Cache Management Mechanisms for Real-Time Embedded Systems
© ACM, 2015. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Computing Surveys, 48(2), November 2015. http://doi.acm.org/10.1145/2830555
Multicore processors are being extensively used by real-time systems, mainly because these systems demand increased computing power. However, multicore processors have shared resources that affect the predictability of real-time systems, and predictability is key to correctly estimating the worst-case execution time of tasks. One of the main sources of unpredictability in a multicore processor is the cache memory hierarchy. Recently, many research works have proposed different techniques to deal with caches in multicore processors in the context of real-time systems. Nevertheless, a review and categorization of these techniques is still missing and would be very useful for the real-time community. In this article, we present a survey of cache management techniques for real-time embedded systems, from the first studies of the field in 1990 up to the latest research published in 2014. We categorize the main research works and provide a detailed comparison in terms of similarities and differences. We also identify key challenges and discuss future research directions.
King Saud University
NSER
A Multi-core processor for hard real-time systems
The increasing demand for new functionalities in current and future hard real-time embedded systems, like the ones deployed in the automotive and avionics industries, is driving an increase in the performance required of embedded processors. Multi-core processors are a good design solution to cope with such higher performance requirements due to their better performance-per-watt ratio while keeping the core design simple. Moreover, multi-cores also allow executing mixed-criticality workloads composed of tasks with and without hard real-time requirements, maximizing the utilization of the hardware resources while guaranteeing low cost and low power consumption.
Despite those benefits, current multi-core processors are less analyzable than single-core ones due to the interference between different tasks when accessing shared hardware resources. As a result, computing a meaningful Worst-Case Execution Time (WCET) estimate, i.e. an upper bound on the application's execution time, becomes extremely difficult, if not impossible, because the execution time of a task may change depending on the other threads running at the same time. This makes the WCET of a task dependent on the set of inter-task interferences introduced by the co-running tasks.
Providing a WCET estimate independent of the other tasks (the time composability property) is a key requirement in hard real-time systems.
This thesis proposes a new multi-core processor design in which time composability is achieved, hence enabling the use of multi-cores in hard real-time systems. With our proposals, the WCET estimate of a Hard Real-time Task (HRT) is independent of the other co-running tasks. To that end, we design a multi-core processor in which the maximum delay that a request from an HRT accessing a shared hardware resource can suffer due to other tasks is bounded: our processor guarantees that a request to a shared resource cannot be delayed longer than a given Upper Bound Delay (UBD).
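How a UBD yields a time-composable WCET can be illustrated with one common arbitration scheme, assumed here for illustration: a round-robin arbiter, where a request waits for at most one access from each of the other contenders. The sketch further assumes that every access holds the resource for exactly `access_latency` cycles and that the analysis charges the full UBD to every shared access; the function names are hypothetical and this is not the thesis's hardware design.

```python
def round_robin_ubd(num_hrts, access_latency):
    """UBD under round-robin arbitration: before being granted, a request
    can wait for at most one access from each of the other contenders."""
    return (num_hrts - 1) * access_latency


def composable_wcet(isolated_wcet, shared_accesses, ubd):
    """Pessimistic, time-composable WCET: every shared access is assumed to
    suffer the full UBD, whatever the co-running tasks actually do."""
    return isolated_wcet + shared_accesses * ubd
```

For example, with four HRTs and a 2-cycle access latency the UBD is 6 cycles, so a task with an isolated WCET of 10,000 cycles and 500 shared accesses would be analyzed at 13,000 cycles, a value that no longer depends on which tasks run alongside it.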
In addition, the UBD allows identifying the impact that different processor configurations may have on the WCET by determining the sensitivity of an HRT to different resource allocations. This thesis proposes an off-line task allocation algorithm (called IA3: Interference-Aware Allocation Algorithm) that allocates the tasks in a task set based on each HRT's sensitivity to different resource allocations. As a result, the shared hardware resources used by HRTs are minimized, allowing Non Hard Real-time Tasks (NHRTs) to use the remaining resources. Overall, our proposals provide analyzability for the HRTs while allowing NHRTs to be executed on the same chip without any effect on the HRTs.
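The idea of sizing each HRT's share by its sensitivity can be illustrated with a simple greedy stand-in; IA3 itself is not reproduced here. This hypothetical sketch assumes that, for each HRT, a table of WCETs per number of cache ways is available, and gives each HRT the fewest ways that still meet its deadline, leaving the rest to the NHRTs.

```python
def allocate_ways(hrt_wcet_by_ways, deadlines, total_ways):
    """Hypothetical sensitivity-based allocation (not IA3).

    hrt_wcet_by_ways[i]: dict {ways: WCET} for HRT i; deadlines[i]: its deadline.
    Returns (ways per HRT, ways left for NHRTs), or None if infeasible.
    """
    per_hrt = []
    for wcet_by_ways, deadline in zip(hrt_wcet_by_ways, deadlines):
        # Allocations (sorted by number of ways) that keep the WCET in budget.
        feasible = [w for w, c in sorted(wcet_by_ways.items()) if c <= deadline]
        if not feasible:
            return None  # this HRT misses its deadline whatever it gets
        per_hrt.append(feasible[0])  # smallest sufficient allocation
    used = sum(per_hrt)
    if used > total_ways:
        return None
    return per_hrt, total_ways - used
```

For instance, an HRT whose WCET drops from 120 to 90 cycles when going from one to two ways (deadline 100) receives two ways, while an insensitive HRT (WCET 60 with one way, deadline 70) receives one; out of eight ways, five remain for NHRTs.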
The first two proposals of this thesis focus on supporting the execution of multi-programmed workloads with mixed-criticality levels (composed of HRTs and NHRTs).
Higher performance could be achieved by implementing multi-threaded applications. As a first step towards supporting hard real-time parallel applications, this thesis proposes a new hardware/software approach to guarantee a predictable execution of software pipelined parallel programs.
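A classic bound shows why predictable stages are enough to make a software-pipelined program analyzable. The sketch assumes (these are assumptions, not details from the thesis) that each stage runs on its own core, that each stage WCET already includes bounded shared-resource delays and communication, and that stages never block on full buffers. Under these assumptions, the first item takes the sum of all stage WCETs and every later item completes at most one slowest-stage WCET after the previous one.

```python
def pipeline_wcet(stage_wcets, items):
    """Upper bound on the time to push `items` items through a linear
    software pipeline with one stage per core (simplified model)."""
    if items == 0:
        return 0
    # First item traverses every stage; the slowest stage then paces the rest.
    return sum(stage_wcets) + (items - 1) * max(stage_wcets)
```

For stage WCETs of 3, 5, and 2 time units, one item is bounded by 10 units and four items by 10 + 3 × 5 = 25 units.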
This thesis also investigates a solution to verify the timing correctness of HRTs without requiring any modification to the core design: we design a hardware unit that is interfaced with the processor and integrated into a functional-safety-aware methodology. This unit monitors the execution time of a block of instructions and detects whether it exceeds its WCET. Concretely, we show how to handle timing faults on a real industrial automotive platform.
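The monitoring idea can be modeled in software as a watchdog on a block's execution time. This is a hypothetical sketch: the real unit is a hardware block, and `read_cycles` merely stands in for a cycle counter. The monitor records the counter when the block starts and reports a timing fault as soon as the elapsed cycles exceed the block's WCET budget.

```python
class TimingMonitor:
    """Software model of an execution-time watchdog (hypothetical interface).

    `read_cycles` is any zero-argument callable returning the current
    cycle count, standing in for a hardware cycle counter.
    """

    def __init__(self, read_cycles):
        self.read_cycles = read_cycles
        self.start = None
        self.budget = None

    def enter(self, wcet_budget):
        """Call when the monitored block starts; arms the monitor."""
        self.start = self.read_cycles()
        self.budget = wcet_budget

    def overrun(self):
        """True as soon as the block has exceeded its WCET budget; can be
        polled while the block runs and again at block exit."""
        return self.read_cycles() - self.start > self.budget
```

In use, a fake counter is enough to exercise it: with readings 100, 150, and 1300 and a budget of 1000 cycles, the first check after `enter` reports no fault (50 cycles elapsed) and the second reports a timing fault (1200 cycles elapsed).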
An Overview of Approaches Towards the Timing Analysability of Parallel Architecture
In order to meet performance, low-energy, and integration requirements, parallel architectures (multithreaded cores and multi-cores) are increasingly considered in the design of embedded systems running critical software. The objective is to run several applications concurrently. When applications have strict real-time constraints, two questions arise: a) how can the worst-case execution time (WCET) of each application be computed while concurrent applications might interfere? b) how can the tasks be scheduled so that they are guaranteed to meet their deadlines? The second question has received much attention for several years [CFHS04, DaBu11]. Proposed schemes generally assume that the first question has been solved and, in addition, that these schemes do not impact the WCETs. In fact, the first question is far from being answered, even if several approaches have been proposed in the literature. In this paper, we present an overview of these approaches from the point of view of static WCET analysis techniques.
A survey of techniques for reducing interference in real-time applications on multicore platforms
This survey reviews the scientific literature on techniques for reducing interference in real-time multicore systems, focusing on the approaches proposed between 2015 and 2020. It also presents proposals that use interference reduction techniques without considering the predictability issue. The survey highlights interference sources and categorizes proposals from the perspective of the shared resource. It covers techniques for reducing contention in main memory, cache memory, and the memory bus, as well as the integration of interference effects into schedulability analysis. Every section contains an overview of each proposal and an assessment of its advantages and disadvantages.
This work was supported in part by the Comunidad de Madrid Government "Nuevas Técnicas de Desarrollo de Software de Tiempo Real Embarcado Para Plataformas MPSoC de Próxima Generación" under Grant IND2019/TIC-17261.
Cache Interference-aware Task Partitioning for Non-preemptive Real-time Multi-core Systems
Shared caches in multi-core processors introduce serious difficulties in providing guarantees on the real-time properties of embedded software due to the interaction and the resulting contention in the shared caches. Prior work has studied the schedulability analysis of global scheduling for real-time multi-core systems with shared caches. This article considers another common scheduling paradigm: partitioned scheduling in the presence of shared cache interference. To achieve this, we propose CITTA, a cache interference-aware task partitioning algorithm. We first analyze the shared cache interference between two programs for set-associative instruction and data caches. Then, an integer programming formulation is constructed to calculate the upper bound on cache interference exhibited by a task, which is required by CITTA. We conduct a schedulability analysis of CITTA and formally prove its correctness. A set of experiments is performed to evaluate the schedulability performance of CITTA against global EDF scheduling and other greedy partitioning approaches, such as First-fit and Worst-fit, over randomly generated task sets and realistic workloads in embedded systems. Our empirical evaluations show that CITTA outperforms global EDF scheduling and greedy partitioning approaches in terms of the number of task sets deemed schedulable.
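A simplified interference-aware partitioner can illustrate why charging shared-cache interference changes placement decisions. This is not CITTA, which relies on an integer-programming interference bound and a formal non-preemptive schedulability analysis. It is a hypothetical first-fit-decreasing heuristic that assumes (i) a precomputed table `interference[(victim, aggressor)]` of extra execution time per job, charged only between tasks on different cores that share the cache, and (ii) a utilization `<= 1` check as a placeholder for a real non-preemptive test.

```python
def core_utilization(core, assign, tasks, interference):
    """Utilization of `core`, charging each of its tasks the (assumed)
    cache interference from every task mapped to a different core."""
    total = 0.0
    for victim, c in assign.items():
        if c != core:
            continue
        wcet, period = tasks[victim]
        extra = sum(interference.get((victim, other), 0)
                    for other, oc in assign.items() if oc != core)
        total += (wcet + extra) / period
    return total


def interference_aware_first_fit(tasks, num_cores, interference):
    """tasks: {name: (wcet, period)}. Returns {name: core} or None."""
    assign = {}
    # Heaviest tasks first (first-fit decreasing); ties keep input order.
    order = sorted(tasks, key=lambda n: tasks[n][0] / tasks[n][1], reverse=True)
    for name in order:
        for core in range(num_cores):
            assign[name] = core
            # A placement can also hurt tasks already on other cores,
            # so every core is re-checked.
            if all(core_utilization(c, assign, tasks, interference) <= 1.0
                   for c in range(num_cores)):
                break
            del assign[name]
        else:
            return None  # no core can accept this task
    return assign
```

For instance, with tasks a (6, 10), b (3, 10), and c (3, 10) on two cores, the sketch places c alone on core 1 when no interference is declared, but returns None once a and c are declared to interfere by 2 time units each way, because core 0 would then exceed full utilization; an interference-oblivious first-fit would wrongly accept that task set.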
Holistic resource allocation for multicore real-time systems
This paper presents CaM, a holistic cache and memory bandwidth resource allocation strategy for multicore real-time systems. CaM is designed for partitioned scheduling, where tasks are mapped onto cores, and the shared cache and memory bandwidth resources are partitioned among cores to reduce resource interference due to concurrent accesses. Based on our extension of LITMUS^RT with Intel's Cache Allocation Technology and MemGuard, we present an experimental evaluation of the relationship between the allocation of cache and memory bandwidth resources and a task's WCET. Our resource allocation strategy exploits this relationship to map tasks onto cores and to compute the resource allocation for each core. By grouping tasks with similar characteristics (in terms of resource demands) on the same core, it enables the tasks on each core to fully utilize the assigned resources. In addition, based on the tasks' execution time behaviors with respect to their assigned resources, we can determine a desirable allocation that maximizes schedulability under resource constraints. Extensive evaluations using real-world benchmarks show that CaM offers near-optimal schedulability performance while being highly efficient, and that it substantially outperforms existing solutions.
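The interplay between per-core cache and bandwidth budgets can be explored with an exhaustive search once the task-to-core mapping is fixed. This hypothetical sketch is not CaM, which is designed to be far more efficient: it assumes each task's WCET is known as a function of (cache ways, bandwidth units) assigned to its core, for instance from measurements, that every core receives at least one way and one bandwidth unit, and that a per-core utilization `<= 1` check stands in for a real schedulability test.

```python
from itertools import product


def best_allocation(cores, total_ways, total_bw):
    """Split cache ways and bandwidth units among cores (brute force).

    cores: one task list per core; each task is (period, wcet_fn), where
    wcet_fn(ways, bw) is the task's (assumed) WCET under that allocation.
    Returns (max_utilization, ways_split, bw_split) minimizing the largest
    per-core utilization; the split is schedulable under the placeholder
    test iff max_utilization <= 1. Returns None if no split exists.
    """
    n = len(cores)
    best = None
    for ways in product(range(1, total_ways + 1), repeat=n):
        if sum(ways) != total_ways:
            continue
        for bw in product(range(1, total_bw + 1), repeat=n):
            if sum(bw) != total_bw:
                continue
            worst = max(
                sum(wcet_fn(w, b) / period for period, wcet_fn in tasks)
                for tasks, w, b in zip(cores, ways, bw)
            )
            if best is None or worst < best[0]:
                best = (worst, ways, bw)
    return best
```

With a cache-sensitive task on core 0 (WCET `10 - ways`, period 10) and a bandwidth-sensitive task on core 1 (WCET `9 - bw`, period 10), four ways and four bandwidth units are split as `(3, 1)` and `(1, 3)`, giving a maximum utilization of 0.7: each core receives more of the resource its tasks are sensitive to.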
Cache-aware static scheduling for hard real-time multicore systems based on communication affinities
The growing need for continuous processing capabilities has led to the development of multicore systems with a complex cache hierarchy. Such multicore systems are generally designed to improve average-case performance, while hard real-time systems must consider worst-case scenarios. An open challenge is therefore to efficiently schedule hard real-time tasks on a multicore architecture. In this work, we propose a mathematical formulation for computing a static schedule that minimizes L1 data cache misses between hard real-time tasks on a multicore architecture using communication affinities.
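The intuition behind communication affinities can be illustrated with a brute-force placement: tasks that exchange a lot of data are kept on the same core, so the shared data can stay in that core's private L1 data cache. This hypothetical sketch is not the paper's formulation. It uses the data volume exchanged across cores as a proxy for L1 data cache misses, and a per-core WCET budget (for example, one scheduling hyperperiod) as the only load constraint.

```python
from itertools import product


def place_by_affinity(wcets, affinity, num_cores, capacity):
    """Brute-force task-to-core mapping minimizing cross-core communication.

    wcets[i]: WCET of task i; affinity[(i, j)]: (hypothetical) data volume
    task i sends to task j; capacity: maximum summed WCET per core.
    Returns (cut, mapping) with mapping[i] = core of task i, or None.
    """
    best = None
    for mapping in product(range(num_cores), repeat=len(wcets)):
        load = [0] * num_cores
        for task, core in enumerate(mapping):
            load[core] += wcets[task]
        if max(load) > capacity:
            continue  # some core is overloaded
        # Data exchanged between tasks on different cores cannot be served
        # from a private L1 cache, so it is counted as the cost.
        cut = sum(v for (i, j), v in affinity.items() if mapping[i] != mapping[j])
        if best is None or cut < best[0]:
            best = (cut, mapping)
    return best
```

For four tasks of WCET 4 on two cores with a per-core budget of 8, where tasks 0 and 1 and tasks 2 and 3 communicate heavily (volume 10 each) and tasks 1 and 2 only lightly (volume 1), the sketch pairs the heavy communicators, `(0, 0, 1, 1)`, leaving only the light edge crossing cores.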
On the effectiveness of cache partitioning in hard real-time systems
In hard real-time systems, cache partitioning is often suggested as a means of increasing the predictability of caches in pre-emptively scheduled systems: when a task is assigned its own cache partition, inter-task cache eviction is avoided, and timing verification is reduced to the standard worst-case execution time analysis used in non-pre-emptive systems. The downside of cache partitioning is the potential increase in execution times. In this paper, we evaluate cache partitioning for hard real-time systems in terms of overall schedulability. To this end, we examine the sensitivity of (i) task execution times and (ii) pre-emption costs to the size of the cache partition allocated, and present a cache partitioning algorithm that is optimal with respect to taskset schedulability. We also devise an alternative algorithm which primarily optimises schedulability but also minimises processor utilization. We evaluate the performance of cache partitioning compared to state-of-the-art pre-emption cost analysis based on benchmark code and on a large number of synthetic tasksets with both fixed priority and EDF scheduling. This allows us to derive general conclusions about the usability of cache partitioning and to identify taskset and system parameters that influence the relative effectiveness of cache partitioning. We also examine the improvement in processor utilization obtained using the alternative cache partitioning algorithm, and the trade-off in terms of increased analysis time.
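Because a private partition per task removes inter-task cache eviction, the schedulability test for a given partitioning can reduce to standard response-time analysis with partition-dependent WCETs. The sketch below pairs that analysis (fixed priorities, deadlines equal to periods, no pre-emption cost) with an exhaustive search over partition sizes. It is a brute-force stand-in, not the authors' optimal algorithm, and the WCET tables in the example are hypothetical.

```python
import math
from itertools import product


def response_time(tasks, i):
    """Fixed-priority response-time analysis for task i.

    tasks: list of (wcet, period), highest priority first; deadline == period.
    Iterates R = C_i + sum_{j < i} ceil(R / T_j) * C_j to a fixed point.
    Returns the response time, or None if it exceeds the deadline.
    """
    c_i, t_i = tasks[i]
    r = c_i
    while r <= t_i:
        nxt = c_i + sum(math.ceil(r / t_j) * c_j for c_j, t_j in tasks[:i])
        if nxt == r:
            return r  # fixed point reached within the deadline
        r = nxt
    return None  # response time exceeds the deadline (= period)


def schedulable_partitioning(wcet_tables, periods, sizes, capacity):
    """Try every assignment of partition sizes (tasks in priority order) and
    return the first one fitting in `capacity` under which all tasks pass."""
    for choice in product(sizes, repeat=len(periods)):
        if sum(choice) > capacity:
            continue
        tasks = [(table[s], p) for table, s, p in zip(wcet_tables, choice, periods)]
        if all(response_time(tasks, i) is not None for i in range(len(tasks))):
            return choice
    return None
```

For example, with partition sizes of 1 or 2 units and a 3-unit cache, a high-priority task (period 5, WCET 3 or 2) and a low-priority task (period 10, WCET 5 or 4) are unschedulable with one unit each (the low-priority response time reaches 11), but schedulable when the low-priority task gets two units (response time exactly 10), so the search returns `(1, 2)`.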
- …