Search CORE

7 research outputs found

Code generation for multi-phase tasks on a multi-core distributed memory platform

Author: Forget Julien
Fort Frédéric
Publication venue: HAL CCSD
Publication date: 19/08/2019
Field of study

International audienceEnsuring temporal predictability of real-time systems on a multi-core platform is difficult, mainly due to hard to predict delays related to shared access to the main memory. Task models where computation phases and communication phases are separated (such as the PRedictable Execution Model), have been proposed to both mitigate these delays and make them easier to analyze. In this paper we present a compilation process, part of the Prelude compiler, that automatically translates a high-level synchronous data-flow system specification into a PREM-compliant C program. By automating the production of the PREM-compliant C code, low-level implementation concerns related to task communications become the responsibility of the compiler, which saves tedious and error-prone development efforts

Contending memory in heterogeneous SoCs: Evolution in NVIDIA Tegra embedded platforms

Author: Bertogna M.
Capodieci N.
Cavicchioli R.
Olmedo I. S.
Solieri M.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2020
Field of study

Modern embedded platforms are known to be constrained by size, weight and power (SWaP) requirements. In such contexts, achieving the desired performance-per-watt target calls for increasing the number of processors rather than ramping up their voltage and frequency. Hence, generation after generation, modern heterogeneous System on Chips (SoC) present a higher number of cores within their CPU complexes as well as a wider variety of accelerators that leverages massively parallel compute architectures. Previous literature demonstrated that while increasing parallelism is theoretically optimal for improving on average performance, shared memory hierarchies (i.e. caches and system DRAM) act as a bottleneck by exposing the platform processors to severe contention on memory accesses, hence dramatically impacting performance and timing predictability. In this work we characterize how subsequent generations of embedded platforms from the NVIDIA Tegra family balanced the increasing parallelism of each platform's processors with the consequent higher potential on memory interference. We also present an open-source software for generating test scenarios aimed at measuring memory contention in highly heterogeneous SoCs

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

Implementation of Memory Centric Scheduling for COTS Multi-Core Real-Time Systems

Author: Paolillo Antonio
Poczekajlo Xavier
Rivas Juan M.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 31st Euromicro Conference on Real-Time Systems (ECRTS 2019)
Publication date: 01/01/2019
Field of study

The demands for high performance computing with a low cost and low power consumption are driving a transition towards multi-core processors in many consumer and industrial applications. However, the adoption of multi-core processors in the domain of real-time systems faces a series of challenges that has been the focus of great research intensity during the last decade. These challenges arise in great part from the non real-time nature of the hardware arbiters that schedule the access to shared resources, such as the main memory. One solution proposed in the literature is called Memory Centric Scheduling, which defines a separate software scheduler for the sections of the tasks that will access the main memory, hence circumventing the low level unpredictable hardware arbiters. Several Memory Centric schedulers and associated theoretical analyses have been proposed, but as far as we know, no actual implementation of the required OS-level underpinnings to support dynamic event-driven Memory Centric Scheduling has been presented before. In this paper we aim to fill this gap, targeting cache based COTS multi-core systems. We will confirm via measurements the main theoretical benefits of Memory Centric Scheduling (e.g. task isolation). Furthermore, we will describe an effective schedulability analysis using concepts from distributed systems

Dagstuhl Research Online Publication Server

DI-fusion

PREM-Based Optimal Task Segmentation Under Fixed Priority Scheduling

Author: Pellizzoni Rodolfo
Soliman Muhammad R.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 31st Euromicro Conference on Real-Time Systems (ECRTS 2019)
Publication date: 01/01/2019
Field of study

Recently, a large number of works have discussed scheduling tasks consisting of a sequence of memory phases, where code and data are moved between main memory and local memory, and computation phases, where the task executes based on the content of local memory only; the key idea is to prevent main memory contention by scheduling the memory phase of one task in parallel with computation phases of tasks running on other cores. This paper provides two main contributions: (1) we present a compiler-level tool, based on the LLVM intermediate representation, that automatically converts a program into a conditional sequence of segments comprising memory and computation phases; (2) we propose an algorithm to find optimal segmentation decisions for a task set scheduled according to a fixed-priority partitioned scheme. Our evaluation shows that the proposed framework can be feasibly applied to realistic programs, and vastly overperforms a baseline greedy approach

Dagstuhl Research Online Publication Server

Dynamic Memory Bandwidth Allocation for Real-Time GPU-Based SoC Platforms

Author: Aghilinasab Homa
Publication venue: 'University of Waterloo'
Publication date: 14/05/2020
Field of study

Heterogeneous SoC platforms, comprising both general purpose CPUs and accelerators such as a GPU, are becoming increasingly attractive for real-time and mixed-criticality systems to cope with the computational demand of data parallel applications. However, contention for access to shared main memory can lead to significant performance degradation on both CPU and GPU. Existing work has shown that memory bandwidth throttling is effective in protecting real-time applications from memory-intensive, best-effort ones; however, due to the inherent pessimism involved in worst-case execution time estimation, such approaches can unduly restrict the bandwidth available to best-effort applications. In this work, we propose a novel memory bandwidth allocation scheme where we dynamically monitor the progress of a real-time application and increase the bandwidth share of best-effort ones whenever it is safe to do so. Specifically, we demonstrate our approach by protecting a real-time GPU kernel from best-effort CPU tasks. Based on profiling information, we first build a worst case execution time estimation model for the GPU kernel. Using such model, we then show how to dynamically recompute on-line the maximum memory budget that can be allocated to best-effort tasks without exceeding the kernel’s assigned execution budget. We implement our proposed technique on NVIDIA embedded SoC and demonstrate its effectiveness on a variety of GPU and CPU benchmarks

University of Waterloo's Institutional Repository

A survey of techniques for reducing interference in real-time applications on multicore platforms

Author: Carretero Pérez Jesús
Fernández Muñoz Javier
Lozano Santiago
Lugo Tamara
Publication venue: IEEE
Publication date: 15/02/2022
Field of study

This survey reviews the scientific literature on techniques for reducing interference in real-time multicore systems, focusing on the approaches proposed between 2015 and 2020. It also presents proposals that use interference reduction techniques without considering the predictability issue. The survey highlights interference sources and categorizes proposals from the perspective of the shared resource. It covers techniques for reducing contentions in main memory, cache memory, a memory bus, and the integration of interference effects into schedulability analysis. Every section contains an overview of each proposal and an assessment of its advantages and disadvantages.This work was supported in part by the Comunidad de Madrid Government "Nuevas Técnicas de Desarrollo de Software de Tiempo Real Embarcado Para Plataformas. MPSoC de Próxima Generación" under Grant IND2019/TIC-17261

Universidad Carlos III de Madrid e-Archivo

Automated Compilation Framework for Scratchpad-based Real-Time Systems

Author: Soliman Muhammad Refaat Sedky
Publication venue: 'University of Waterloo'
Publication date: 19/07/2019
Field of study

ScratchPad Memory (SPM) is highly adopted in real-time systems as it exhibits a predictable behaviour. SPM is software-managed by explicitly inserting instructions to move code and data transfers between the SPM and the main memory. However, it is a tedious job to decide how to manage the SPM and to manually modify the code to insert memory transfers. Hence, an automated compilation tool is essential to efficiently utilize the SPM. Another key problem with SPM is the latency suffered by the system due to memory transfers. Hiding this latency is important for high-performance systems. In this thesis, we address the problems of managing SPM and reducing the impact of memory latency. To realize the automation of our work, we develop a compilation framework based on the LLVM compiler to analyze and transform the program code. We exploit our framework to improve the performance of the execution of single and multi-tasks in real-time systems. For the single task execution, Worst-Case Execution Time (WCET) is of great importance to assure correct and safe behaviour of the system. So, we propose a WCET-driven allocation technique for data SPM that employs software prefetching to efficiently manage the SPM and to overlap the memory transfer and the task execution in a predictable way. On the other hand, multi-tasking requires the system to be schedulable such that all the tasks can meet their timing requirements. However, executing multiple tasks on a multi-processor platform suffers from the contention of the accesses to the shared main memory. To avoid the contention, several scheduling techniques adopted the 3-phase execution model which executes the task as a sequence of memory and computation phases. This provides the means to avoid the contention as well as to hide the memory latency by using a Direct Memory Access (DMA) engine. Executing memory transfers using the DMA allows overlapping the memory transfers with the computations on the processor. Using the 3-phase model in systems with limited sizes of local SPM may necessitate a segmentation of the task. Automating the segmentation process is necessary especially for systems with large task sets. Hence, we propose a set of efficient segmentation algorithms that follow the 3-phase execution model. The application of these algorithms shows a significant improvement in the system schedulability. For our segmentation algorithms to be more applicable, we extend the 3-phase model to allow programs with multiple paths represented as conditional Directed Acyclic Graphs (DAGs), unlike the previous works that targeted sequential programs. We also introduce a multi-steaming model to exploit the benefits of prefetching by overlapping the memory and computation phases of the same task, which was not allowed in the previous approaches. By combining the automated compilation with the proposed algorithms, we are able to achieve our goal to efficiently manage data SPM in real-time systems

University of Waterloo's Institutional Repository