
    Real-Time Wireless Sensor-Actuator Networks for Cyber-Physical Systems

    A cyber-physical system (CPS) employs tight integration of, and coordination between, computational, networking, and physical elements. Wireless sensor-actuator networks provide a new communication technology for a broad range of CPS applications such as process control, smart manufacturing, and data center management. Sensing and control in these systems must meet stringent real-time requirements on communication latency in challenging environments. There have been limited results on real-time scheduling theory for wireless sensor-actuator networks. Real-time transmission scheduling and analysis for these networks require new methodologies to deal with the unique characteristics of wireless communication. Furthermore, the performance of a wireless control system involves intricate interactions between real-time communication and control. This thesis research tackles these challenges and makes a series of contributions to the theory and systems for wireless CPS. (1) We establish a new real-time scheduling theory for wireless sensor-actuator networks. (2) We develop a scheduling-control co-design approach for holistic optimization of control performance in a wireless control system. (3) We design and implement a wireless sensor-actuator network for CPS in data center power management. (4) We expand our research to develop scheduling algorithms and analyses for real-time parallel computing to support computation-intensive CPS.
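    A minimal sketch of the kind of problem such a scheduling theory addresses, assuming a hypothetical single-channel, TDMA-style network: each periodic flow must relay its data over several hops, and the analysis asks whether every flow's transmissions finish by its end-to-end deadline. The Flow fields and the greedy slot assignment are illustrative only; the thesis's actual analysis also accounts for multiple channels and transmission conflicts.

```python
# Hypothetical single-channel, single-release sketch: assign each flow's hop
# transmissions to the earliest free TDMA slots in deadline-monotonic order,
# then check that the last hop lands before the flow's deadline.
from dataclasses import dataclass

@dataclass
class Flow:
    period: int    # release period in slots (unused in this one-release check)
    deadline: int  # relative end-to-end deadline, in slots
    hops: int      # transmissions needed to relay sensor data to the actuator

def first_release_schedulable(flows):
    busy = set()  # slots already claimed by some transmission
    for f in sorted(flows, key=lambda fl: fl.deadline):
        slot, remaining = 0, f.hops
        while remaining > 0:
            if slot not in busy:
                busy.add(slot)     # claim this slot for one hop
                remaining -= 1
            slot += 1
        if slot > f.deadline:      # last hop finished after the deadline
            return False
    return True

flows = [Flow(10, 10, 3), Flow(20, 15, 4)]
print(first_release_schedulable(flows))  # True: 7 slots cover both flows
```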

    Response-time analysis for fixed-priority systems with a write-back cache

    This paper introduces analyses of write-back caches integrated into response-time analysis for fixed-priority preemptive and non-preemptive scheduling. For each scheduling paradigm, we derive four different approaches to computing the additional costs incurred due to write-backs. We show the dominance relationships between these approaches and note how they can be combined to form a single state-of-the-art approach in each case. The evaluation explores the relative performance of the different methods using a set of benchmarks, as well as making comparisons with no cache and with a write-through cache. We also explore the effect of write buffers used to hide the latency of write-through caches. We show that, depending upon the depth of the buffer used and the policies employed, such buffers can result in domino effects. Our evaluation shows that even ignoring domino effects, a substantial write buffer is needed to match the guaranteed performance of write-back caches.
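    For context, the write-back terms in the paper extend the classical response-time recurrence R_i = C_i + Σ_{j∈hp(i)} ⌈R_i/T_j⌉·C_j. Below is a minimal sketch of that fixed-point iteration with a deliberately simplified, flat per-preemption write-back charge WB; the paper's four approaches bound these costs far more precisely.

```python
# Classical fixed-priority response-time analysis with a simplified write-back
# term: every preemption by a higher-priority task is charged an extra WB for
# dirty cache lines that must be written back (a deliberately crude model).
import math

def response_time(C, T, i, WB=0.0, max_iter=1000):
    """C[k], T[k]: WCET and period of task k, index 0 = highest priority.
    Returns the converged R_i, or None if R_i would exceed its period."""
    R = C[i]
    for _ in range(max_iter):
        interference = sum(
            math.ceil(R / T[j]) * (C[j] + WB)  # jobs of task j, plus write-backs
            for j in range(i)                  # all higher-priority tasks
        )
        R_next = C[i] + interference
        if R_next == R:
            return R           # fixed point reached
        if R_next > T[i]:
            return None        # deadline (= period) missed: unschedulable
        R = R_next
    return None

C, T = [1, 2, 3], [4, 10, 20]
print([response_time(C, T, i, WB=0.1) for i in range(3)])
```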

    Dynamic resource provisioning for data center workloads with data constraints

    Dynamic resource provisioning, an important data center software building block, helps to achieve high resource usage efficiency, leading to enormous monetary benefits. Most existing work on data center dynamic provisioning targets stateless servers, where any request can be routed to any server. However, the assumption of stateless behavior no longer holds for subsystems subject to data constraints, where a request may depend on a dataset stored on a small subset of servers. Routing a request to a server without the required dataset violates data locality or data availability properties, which may negatively impact response times. To solve this problem, this thesis provides a unified framework consisting of two main steps: 1) determining the proper amount of resources to serve the workload by analyzing the schedulability utilization bound; 2) avoiding transition penalties during cluster resizing operations by deliberately designing data distribution policies. We apply this framework to both storage and computing subsystems, where the former includes distributed file systems, databases, and memory caches, and the latter refers to systems such as Hadoop, Spark, and Storm. The proposed solutions are implemented in MemCached, HBase/HDFS, and Spark, and evaluated using various datasets, including Wikipedia, NYC taxi, and Twitter traces.
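    A hypothetical sketch of step 1), sizing the cluster from a utilization bound: given an aggregate request rate, pick the smallest server count whose per-server utilization stays under the bound. The bound value and the workload model are placeholders, not the thesis's actual analysis.

```python
# Placeholder sizing rule: demand measured in "servers' worth of work",
# inflated by a utilization ceiling that leaves headroom for latency goals.
import math

def servers_needed(request_rate, per_server_rate, util_bound=0.7):
    """request_rate: incoming requests/s; per_server_rate: requests/s one
    server sustains; util_bound: target per-server utilization (< 1)."""
    demand = request_rate / per_server_rate
    return math.ceil(demand / util_bound)

print(servers_needed(request_rate=12000, per_server_rate=1000))  # 18 servers
```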

    Real-time operating system support for multicore applications

    Doctoral thesis, Universidade Federal de Santa Catarina, Centro Tecnológico, Graduate Program in Automation and Systems Engineering, Florianópolis, 2014.

    Modern multicore platforms feature multiple levels of cache memory placed between the processor and main memory to hide the latency of ordinary memory systems. The primary goal of this cache hierarchy is to improve average execution time, at the cost of predictability. The uncontrolled use of the cache hierarchy by real-time tasks may impact the estimation of their worst-case execution times (WCET), especially when real-time tasks access a shared cache level, causing contention for shared cache lines and increasing the application execution time. This contention in the shared cache may lead to deadline misses, which is intolerable particularly for hard real-time (HRT) systems. Shared cache partitioning is a well-known technique used in multicore real-time systems to isolate task workloads and to improve system predictability. Presently, the state-of-the-art studies that evaluate shared cache partitioning on multicore processors lack two key issues. First, the cache partitioning mechanism is typically implemented either in a simulated environment or in a general-purpose OS (GPOS), so the impact of kernel activities, such as interrupt handling and context switching, on the task partitions tends to be overlooked. Second, the evaluation is typically restricted to either a global or a partitioned scheduler, thereby failing to compare the performance of cache partitioning when tasks are scheduled by different schedulers. Furthermore, recent works have confirmed that OS implementation aspects, such as the choice of scheduling data structures and interrupt handling mechanisms, impact real-time schedulability as much as scheduling-theoretic aspects. However, these studies also used real-time patches applied to GPOSes, which affects the run-time overhead observed in these works and consequently the schedulability of real-time tasks. Additionally, current multicore scheduling algorithms do not consider scenarios where real-time tasks access the same cache lines due to true or false sharing, which also impacts the WCET. This thesis addresses the aforementioned problems with cache partitioning techniques and multicore real-time scheduling algorithms as follows. First, real-time multicore support is designed and implemented on top of an embedded operating system designed from scratch. This support consists of several multicore real-time scheduling algorithms, such as global and partitioned EDF, and a cache partitioning mechanism based on page coloring. Second, a comparison is presented in terms of schedulability ratio, considering the run-time overhead of the implemented RTOS and of a GPOS patched with real-time extensions. In some cases, global EDF considering the overhead of the RTOS is superior to partitioned EDF considering the overhead of the patched GPOS, which clearly shows how different OSs impact hard real-time schedulers on multicore processors. Third, an evaluation of the cache partitioning impact on partitioned, clustered, and global real-time schedulers is performed. The results indicate that a lightweight RTOS does not compromise real-time guarantees, and that shared cache partitioning behaves differently depending on the scheduler and the tasks' working set sizes. Fourth, a task partitioning algorithm that assigns tasks to cores respecting their usage of cache partitions is proposed. The results show that by simply assigning tasks that share cache partitions to the same processor, it is possible to reduce contention for shared cache lines and to provide HRT guarantees. Finally, a two-phase multicore scheduler that provides HRT and soft real-time (SRT) guarantees is proposed. It is shown that by using information from hardware performance counters at run time, the RTOS can detect when best-effort tasks interfere with real-time tasks in the shared cache, and can then prevent best-effort tasks from accessing the same cache lines as real-time tasks. The results also show that the assignment of exclusive partitions to HRT tasks, together with the two-phase multicore scheduler, provides HRT and SRT guarantees even when best-effort tasks share partitions with real-time tasks.
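    For illustration, page coloring, the partitioning mechanism the thesis implements, works by observing that a physical page maps to a fixed group of cache sets, so pages of different "colors" can never evict each other's lines. A minimal sketch with an assumed cache geometry:

```python
# Page coloring with an assumed cache geometry (values are illustrative).
PAGE_SIZE  = 4096         # bytes per physical page
LINE_SIZE  = 64           # bytes per cache line
CACHE_SIZE = 2 * 1024**2  # 2 MiB shared last-level cache
WAYS       = 16           # set associativity

sets_total    = CACHE_SIZE // (LINE_SIZE * WAYS)  # 2048 sets in the cache
sets_per_page = PAGE_SIZE // LINE_SIZE            # 64 consecutive sets per page
NUM_COLORS    = sets_total // sets_per_page       # 32 usable colors

def page_color(phys_addr):
    """Pages with different colors occupy disjoint cache sets, so a task
    restricted to its own colors is isolated from the others in the cache."""
    return (phys_addr // PAGE_SIZE) % NUM_COLORS

# Pages exactly NUM_COLORS apart collide in the same sets (same color):
print(page_color(0x0), page_color(NUM_COLORS * PAGE_SIZE))  # 0 0
```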

    Automated Compilation Framework for Scratchpad-based Real-Time Systems

    ScratchPad Memory (SPM) is widely adopted in real-time systems because it exhibits predictable behaviour. SPM is software-managed: instructions are explicitly inserted to move code and data between the SPM and main memory. However, deciding how to manage the SPM and manually modifying the code to insert memory transfers is a tedious job. Hence, an automated compilation tool is essential to utilize the SPM efficiently. Another key problem with SPM is the latency the system suffers due to memory transfers; hiding this latency is important for high-performance systems. In this thesis, we address the problems of managing SPM and reducing the impact of memory latency. To automate our work, we develop a compilation framework based on the LLVM compiler to analyze and transform program code. We exploit our framework to improve the performance of single-task and multi-task execution in real-time systems. For single-task execution, the Worst-Case Execution Time (WCET) is of great importance to assure correct and safe behaviour of the system. We therefore propose a WCET-driven allocation technique for data SPM that employs software prefetching to manage the SPM efficiently and to overlap memory transfers with task execution in a predictable way. Multi-tasking, on the other hand, requires the system to be schedulable such that all tasks can meet their timing requirements; however, executing multiple tasks on a multi-processor platform suffers from contention for accesses to the shared main memory. To avoid this contention, several scheduling techniques adopt the 3-phase execution model, which executes each task as a sequence of memory and computation phases. This provides the means to avoid contention as well as to hide memory latency by using a Direct Memory Access (DMA) engine: executing memory transfers with the DMA allows them to overlap with computation on the processor. Using the 3-phase model in systems with small local SPMs may necessitate segmenting the task, and automating the segmentation process is necessary, especially for systems with large task sets. Hence, we propose a set of efficient segmentation algorithms that follow the 3-phase execution model; applying these algorithms shows a significant improvement in system schedulability. To make our segmentation algorithms more applicable, we extend the 3-phase model to allow programs with multiple paths represented as conditional Directed Acyclic Graphs (DAGs), unlike previous works that targeted sequential programs. We also introduce a multi-streaming model to exploit the benefits of prefetching by overlapping the memory and computation phases of the same task, which was not allowed in previous approaches. By combining the automated compilation with the proposed algorithms, we achieve our goal of efficiently managing data SPM in real-time systems.
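    A small sketch of the benefit of overlapping the 3-phase model's memory and computation phases, assuming a double-buffered SPM so the DMA can prefetch segment k+1 while segment k computes; the write-back phase and all timing values are simplified for illustration.

```python
# Compare a strictly serial load-then-compute schedule against one where the
# DMA prefetches segment k+1 while the CPU computes segment k (requires a
# double-buffered SPM). Write-back phases are omitted for brevity.
def makespan(segments, overlap=True):
    """segments: (load_time, compute_time) pairs for one task, in order."""
    if not overlap:
        return sum(load + compute for load, compute in segments)
    dma_free = cpu_free = 0.0
    for load, compute in segments:
        load_done = dma_free + load      # DMA prefetches as early as possible
        cpu_free = max(load_done, cpu_free) + compute
        dma_free = load_done             # DMA freed to fetch the next segment
    return cpu_free

segs = [(2, 5), (2, 5), (2, 5)]
print(makespan(segs, overlap=False), makespan(segs, overlap=True))  # 21 17.0
```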

    Deployment and Debugging of Real-Time Applications on Multicore Architectures

    It is essential to enable information extraction from software. Program tracing techniques are an example: they extract information from the program during execution. Tracing helps with the testing and validation of software to ensure that the software under test is correct. Information extraction is done by instrumenting the program. Logged information can be stored in dedicated logging memories or buffered and streamed off-chip to an external monitor. The designer inspects the trace after execution to identify potentially erroneous state information. In addition, the trace can provide the state information that serves as input to reproduce the erroneous output. Information extraction can be difficult and expensive due to the increasing size and complexity of modern software systems. For the sub-class of software systems known as real-time systems, these issues are further aggravated, because real-time systems demand timing guarantees in addition to functional correctness. Consequently, any instrumentation added to the original program code for the purpose of information extraction may affect the temporal behavior of the program. This perturbation can lead to the violation of timing constraints, which may bias the program execution and/or cause the program to miss its deadline. As a result, there is considerable interest in devising techniques that allow information extraction without missing a program's deadline, known as time-aware instrumentation. This thesis investigates time-aware instrumentation mechanisms that instrument programs while respecting their timing constraints and functional behavior. Knowledge of the underlying hardware on which the software runs enables the extraction of more information via the instrumentation process. Chip-multiprocessors offer a solution to the performance bottleneck of uni-processors; providing timing guarantees for hard real-time systems on chip-multiprocessors, however, is difficult, because conventional communication interconnects are designed to optimize average-case performance. Therefore, researchers have proposed interconnects such as priority-aware networks to satisfy the requirements of hard real-time systems. Priority-aware interconnects, however, lack the analysis techniques needed to facilitate the deployment of real-time systems. This thesis also investigates latency and buffer space analysis techniques for pipelined communication resource models, as well as algorithms for the proper deployment of real-time applications to these platforms. The analysis techniques proposed in this thesis provide guarantees on the schedulability of real-time systems on chip-multiprocessors. These guarantees are based on reducing contention in the interconnect while accurately computing the worst-case communication latencies. While these worst-case latencies provide bounds for computing the overall worst-case execution time of applications on chip-multiprocessors, they also provide a means of assigning the instrumentation budgets required by time-aware instrumentation. Leveraging these platform-specific analysis techniques for the assignment of instrumentation budgets allows more information to be extracted from the instrumentation process.
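    A hypothetical sketch of the budgeting idea behind time-aware instrumentation: a trace point fires only when the task's remaining slack covers the logging cost, so instrumentation cannot push the task past its deadline. The class name and the slack model are illustrative, not the thesis's actual interface.

```python
# Log only when slack >= log cost, where slack = time until the deadline
# minus the worst-case execution time still ahead of the task.
import time

class TimeAwareTracer:
    def __init__(self, deadline_s, log_cost_s):
        self.deadline = time.monotonic() + deadline_s  # absolute deadline
        self.log_cost = log_cost_s                     # WCET of one log action
        self.events = []

    def trace(self, event, wcet_remaining_s):
        slack = (self.deadline - time.monotonic()) - wcet_remaining_s
        if slack >= self.log_cost:
            self.events.append((time.monotonic(), event))  # safe to log
        # otherwise: silently skip; meeting the deadline takes priority

tracer = TimeAwareTracer(deadline_s=0.010, log_cost_s=0.001)
tracer.trace("checkpoint-A", wcet_remaining_s=0.005)
print(len(tracer.events))  # 1: ~4 ms of slack covered the 1 ms log cost
```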

    Scheduling Tasks on Intermittently-Powered Real-Time Systems

    Batteryless systems go through sporadic power-on and power-off phases due to intermittently available energy; thus, they are called intermittent systems. Unfortunately, this intermittence in the power supply hinders the timely execution of tasks and limits such devices' potential in certain application domains, e.g., healthcare and livestock tracking. Unlike prior work on time-aware intermittent systems that focuses on timekeeping [1, 2, 3] and discarding expired data [4], this dissertation concentrates on finishing task execution on time. I leverage the data processing and control layer of batteryless systems by developing frameworks that (1) integrate energy harvesting and real-time systems, (2) rethink machine learning algorithms for an energy-aware imprecise task scheduling framework, (3) develop scheduling algorithms that, along with deciding what to compute, answer when to compute and when to harvest, and (4) utilize distributed systems that collaboratively emulate a persistently powered system.

    Scheduling Framework for Intermittently Powered Computing Systems. Batteryless systems rely on sporadically available harvestable energy. For example, kinetic-powered motion detector sensors on impalas can only harvest energy when the impalas are moving, which cannot be ascertained in advance. This uncertainty poses a unique real-time scheduling problem where existing real-time algorithms fail due to the interruptions in execution. This dissertation proposes a unified scheduling framework that includes both harvesting and computing.

    Imprecise Deep Neural Network Inference in Deadline-Aware Intermittent Systems. This dissertation proposes Zygarde, an energy-aware and outcome-aware soft-real-time imprecise deep neural network (DNN) task scheduling framework for intermittent systems. Zygarde leverages the semantic diversity of input data and the layer-dependent expressiveness of deep features, and infers only the necessary DNN layers based on the available time and energy. Zygarde proposes a novel technique to determine the imprecise boundary at run time by exploiting clustering classifiers and specialized offline training of the DNNs to minimize the loss of accuracy due to partial execution. It also proposes a single metric, η, to represent a system's predictability, measuring how close a harvester's harvesting pattern is to a constant energy source. In addition, Zygarde includes a scheduling algorithm that takes the available time, available energy, impreciseness, and the classifier's performance into account.

    Scheduling Mutually Exclusive Computing and Harvesting Tasks in Deadline-Aware Intermittent Systems. The lack of sufficient ambient energy to directly power intermittent systems introduces mutually exclusive computing and charging cycles. This poses a challenging real-time scheduling problem where existing real-time algorithms fail because computing and charging cannot overlap. To address this, this dissertation proposes Celebi, which considers the dynamics of the available energy and schedules when to harvest and when to compute in batteryless systems. Using data-driven simulation and real-world experiments, this dissertation shows that Celebi significantly increases the number of tasks that complete execution before their deadlines when power is only available intermittently.

    Persistent System Emulation with Distributed Intermittent Systems. Intermittently powered sensing and computing systems go through sporadic power-on and power-off periods due to the uncertain availability of energy sources. Despite recent efforts to advance time-sensitive intermittent systems, such systems fail to capture important target events when energy is absent for a prolonged time. These missed events limit the potential usage of intermittent systems in fault-intolerant and safety-critical applications. To address this problem, this dissertation proposes Falinks, a framework that allows a swarm of distributed, intermittently powered nodes to collaboratively imitate the sensing and computing capabilities of a persistently powered system. The framework provides power-on and power-off schedules for the swarm of intermittent nodes, which have no communication capability with one another.
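    A hypothetical sketch of Zygarde-style imprecise inference: execute DNN layers only while the remaining time and energy budgets allow, then classify from the deepest features reached. Layer costs and the per-depth "exit" classifiers are illustrative placeholders, not Zygarde's actual design.

```python
# Run layers until the next one would exceed the remaining energy or time
# budget, then answer with the classifier attached to the deepest layer run.
def imprecise_inference(layers, energy_budget, time_budget):
    """layers: list of (energy_cost, time_cost, classify_fn); each depth has
    its own classifier trained on that depth's features (an 'exit head')."""
    result = None
    for energy_cost, time_cost, classify in layers:
        if energy_cost > energy_budget or time_cost > time_budget:
            break                      # not enough budget: exit early
        energy_budget -= energy_cost
        time_budget -= time_cost
        result = classify              # deepest classifier reached so far
    return result() if result else None

layers = [
    (1.0, 2.0, lambda: "coarse label"),   # mandatory early layers
    (2.0, 3.0, lambda: "refined label"),  # optional deeper layers
    (4.0, 5.0, lambda: "full label"),
]
print(imprecise_inference(layers, energy_budget=4.0, time_budget=6.0))
# -> "refined label": the third layer's cost exceeds the remaining budget
```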

    Efficient Synchronization for Real-Time Systems with Nested Resource Access

    Real-time systems consist of tasks, each of which must be guaranteed to meet its timing requirements. These tasks may request access to shared system components, called resources, and each such request may experience delays before resource access is granted. These delays fall into two categories: (i) those caused by the order in which tasks are granted resource access, and (ii) those caused by the time it takes to coordinate this ordering. If these delays are too large, a task may be unable to meet its timing requirements. Tasks can require access to multiple resources concurrently, acquiring them in a nested fashion. This nested resource access can cause significant delays, far exceeding those incurred when only a single resource is required at a time, as certain request orderings cause delays between tasks that do not share any resources. This dissertation presents locking protocols and a protocol-independent approach to mitigate resource-access delays. Nested resource access can increase delays for all requests, including non-nested ones. The first protocol eliminates these additional delays by separating requests by type and creating a fast-path mechanism for non-nested requests. The next two protocols both reduce delays by reordering requests: one reorders requests as they are issued, and the other uses an offline process to determine which requests may execute concurrently. These three protocols were compared against prior approaches in an evaluation across a range of task systems; all three resulted in more task systems being guaranteed to meet their timing requirements. Finally, a protocol-independent approach reduces delays by using a designated task to execute the locking protocol on behalf of other tasks. When applied to two protocol variants, this approach significantly reduced delays.
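    A minimal sketch of two classic ideas the dissertation builds on, with illustrative names throughout: a fast path that lets non-nested requests take a single lock directly, and a fixed global acquisition order so nested requests cannot deadlock. The dissertation's actual protocols bound blocking far more carefully than this.

```python
# Fast path for non-nested requests; nested requests acquire their locks in a
# fixed global (alphabetical) order, which rules out circular wait/deadlock.
import threading

resources = {name: threading.Lock() for name in ("A", "B", "C")}

def with_resources(names, critical_section):
    if len(names) == 1:                      # fast path: a single, non-nested
        with resources[names[0]]:            # request takes one lock directly
            return critical_section()
    ordered = sorted(names)                  # nested path: fixed global order
    for n in ordered:
        resources[n].acquire()
    try:
        return critical_section()
    finally:
        for n in reversed(ordered):          # release in reverse order
            resources[n].release()

print(with_resources(["B"], lambda: "non-nested"))   # fast path
print(with_resources(["C", "A"], lambda: "nested"))  # acquires A, then C
```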