134 research outputs found

    A generic framework to integrate data caches in the WCET analysis of real-time systems

    Worst-case execution time (WCET) analysis of systems with data caches is one of the key challenges in real-time systems. Caches exploit the inherent reuse properties of programs by temporarily storing certain memory contents near the processor, so that further accesses to those contents do not require costly memory transfers. Current worst-case data cache analyses focus on specific cache organizations (set-associative LRU, locked, ACDC, etc.), most of the time adapting techniques designed for instruction caches. On the other hand, there are methodologies that analyze the data reuse of a program independently of the data cache. In this paper we propose a generic WCET analysis framework that analyzes data caches by exploiting such reuse information. It includes the categorization of data references and their integration into an IPET model. We apply it to a conventional LRU cache, an ACDC, and other baseline systems, and compare them using the TACLeBench benchmark suite. Our results show that persistence-based LRU analyses discard essential information on data, and that a reuse-based analysis improves the WCET bound by around 17% on average. In general, the best WCET estimates are obtained with optimization level 2, where the ACDC cache performs 39% better than a set-associative LRU.
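
    The reuse-distance view underlying such an analysis can be pictured with a small sketch. The following Python snippet (an illustration only, not the paper's framework) classifies the accesses of a hypothetical trace as hits or misses for a set-associative LRU cache and adds assumed miss penalties to a simple per-path bound; the trace, cache geometry, and penalty values are made up.

```python
"""Illustrative sketch (not the paper's framework): classify data references
for a set-associative LRU cache from their reuse distances, then add their
miss penalties to a per-path WCET bound. Trace, geometry, and penalties are
hypothetical."""

from collections import OrderedDict

def classify_lru(trace, num_sets, ways, line_size):
    """Return 'hit'/'miss' per access by simulating an LRU stack per set.
    A reference hits iff fewer than `ways` distinct lines of the same set
    were touched since its previous use (reuse distance < associativity)."""
    stacks = [OrderedDict() for _ in range(num_sets)]
    outcome = []
    for addr in trace:
        line = addr // line_size
        s = stacks[line % num_sets]
        if line in s:
            s.move_to_end(line)           # refresh LRU position
            outcome.append('hit')
        else:
            s[line] = None
            if len(s) > ways:             # evict least recently used line
                s.popitem(last=False)
            outcome.append('miss')
    return outcome

# Hypothetical access trace of one loop iteration and assumed cache geometry.
trace = [0x1000, 0x1040, 0x1000, 0x2040, 0x1040, 0x3040, 0x1000]
outcome = classify_lru(trace, num_sets=64, ways=2, line_size=64)
MISS_PENALTY, HIT_COST = 80, 1            # cycles, assumed
bound = sum(HIT_COST + (MISS_PENALTY if o == 'miss' else 0) for o in outcome)
print(outcome, bound)
```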

    Quality of Information in Mobile Crowdsensing: Survey and Research Challenges

    Smartphones have become the most pervasive devices in people's lives, and are clearly transforming the way we live and perceive technology. Today's smartphones benefit from almost ubiquitous Internet connectivity and come equipped with a plethora of inexpensive yet powerful embedded sensors, such as accelerometers, gyroscopes, microphones, and cameras. This unique combination has enabled revolutionary applications based on the mobile crowdsensing paradigm, such as real-time road traffic monitoring, air and noise pollution monitoring, crime control, and wildlife monitoring, just to name a few. Unlike prior sensing paradigms, humans are now the primary actors of the sensing process, as they are fundamental in retrieving reliable and up-to-date information about the event being monitored. Since humans may behave unreliably or maliciously, assessing and guaranteeing Quality of Information (QoI) becomes more important than ever. In this paper, we provide a new framework for defining and enforcing QoI in mobile crowdsensing, and analyze in depth the current state of the art on the topic. We also outline novel research challenges, along with possible directions for future work. Comment: To appear in ACM Transactions on Sensor Networks (TOSN).
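
    A common building block for assessing QoI from unreliable contributors is truth discovery, which jointly estimates event values and contributor reliability. The sketch below only illustrates that generic idea, not the framework proposed in the paper; the reports, weighting rule, and iteration count are hypothetical.

```python
"""Illustrative sketch only (not the surveyed framework): a minimal
truth-discovery loop that jointly estimates per-event values and
per-contributor reliability weights from noisy crowdsensed reports.
Report data and iteration count are hypothetical."""

import math

def truth_discovery(reports, iters=20):
    """reports: {event_id: {user_id: measured_value}}."""
    users = {u for obs in reports.values() for u in obs}
    weight = {u: 1.0 for u in users}                      # start fully trusted
    for _ in range(iters):
        # 1) weighted estimate of each event's true value
        truth = {e: sum(weight[u] * v for u, v in obs.items()) /
                    sum(weight[u] for u in obs)
                 for e, obs in reports.items()}
        # 2) reliability from each user's total squared error
        err = {u: 1e-9 for u in users}
        for e, obs in reports.items():
            for u, v in obs.items():
                err[u] += (v - truth[e]) ** 2
        total = sum(err.values())
        weight = {u: -math.log(err[u] / total) for u in users}
    return truth, weight

reports = {'noise_dB_cell_17': {'alice': 62.0, 'bob': 61.5, 'mallory': 90.0},
           'noise_dB_cell_18': {'alice': 55.0, 'bob': 54.0, 'mallory': 20.0}}
truth, weight = truth_discovery(reports)
print(truth)    # estimates converge toward alice's and bob's readings
print(weight)   # mallory's implausible reports get down-weighted
```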

    Energy-efficient thermal-aware multiprocessor scheduling for real-time tasks using TCPNs

    We present an energy-efficient thermal-aware real-time global scheduler for a set of hard real-time (HRT) tasks running on a multiprocessor system. This global scheduler fulfills the thermal and temporal constraints by handling two independent variables, the task allocation time and the selection of clock frequency. To achieve its goal, the proposed scheduler is split into two stages. An off-line stage, based on a deadline-partitioning scheme, computes the cycles that the HRT tasks must run per deadline interval at the minimum clock frequency to save energy while honoring the temporal and thermal constraints, and computes the maximum frequency at which the system can run below the maximum temperature. Then, an on-line, event-driven stage performs global task allocation applying a Fixed-Priority Zero-Laxity policy, reducing the overhead of quantum-based or interval-based global schedulers. The on-line stage embodies an adaptive scheduler that accepts or rejects soft real-time aperiodic tasks, throttling the CPU to the lowest available frequency that still meets the time and thermal constraints so as to minimize power consumption. This approach leverages the best of both worlds: the off-line stage computes an ideal discrete HRT multiprocessor schedule, while the on-line stage manages soft real-time aperiodic tasks with minimum power consumption and maximum CPU utilization.
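
    The on-line selection rule can be pictured with a small sketch. The snippet below shows one plausible reading of a Fixed-Priority Zero-Laxity dispatch decision on m cores (fixed priorities, with zero-laxity jobs promoted); it is not the authors' scheduler, and the job parameters are invented.

```python
"""Minimal sketch (assumed details, not the proposed scheduler): the
Fixed-Priority Zero-Laxity (FPZL) selection rule on m cores. Jobs run by
fixed priority, except that any job whose laxity has reached zero is
promoted so it cannot miss its deadline. Task parameters are made up."""

from dataclasses import dataclass

@dataclass
class Job:
    name: str
    priority: int        # lower number = higher fixed priority
    deadline: float      # absolute deadline
    remaining: float     # remaining execution time

def fpzl_pick(jobs, now, m):
    """Return the m jobs to dispatch at time `now` under FPZL."""
    def laxity(j):
        return j.deadline - now - j.remaining
    zero_lax = [j for j in jobs if laxity(j) <= 0]       # must run now
    others = sorted((j for j in jobs if laxity(j) > 0),
                    key=lambda j: j.priority)
    return (zero_lax + others)[:m]

jobs = [Job('ctrl', 1, deadline=10, remaining=3),
        Job('log',  3, deadline=6,  remaining=6),        # laxity 0 at t=0
        Job('net',  2, deadline=12, remaining=4)]
print([j.name for j in fpzl_pick(jobs, now=0, m=2)])     # ['log', 'ctrl']
```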

    Data cache organization for accurate timing analysis


    A UNIFIED HARDWARE/SOFTWARE PRIORITY SCHEDULING MODEL FOR GENERAL PURPOSE SYSTEMS

    Migrating functionality from software to hardware has historically held the promise of enhancing performance by exploiting the inherent parallel nature of hardware. Many early exploratory efforts in repartitioning traditional software-based services into hardware were hampered by expensive ASIC development costs. Recent advancements in FPGA technology have made it more economically feasible to explore migrating functionality across the hardware/software boundary. The flexibility of the FPGA fabric and the availability of configurable soft IP components have opened the potential to rapidly and economically investigate different hardware/software partitions. Within the real-time operating systems community, there has been continued interest in applying hardware/software co-design approaches to address scheduling issues such as latency and jitter. Many hardware-based approaches have been reported to reduce the latency of computing the scheduling decision function itself. However, continued adherence to classic scheduler invocation mechanisms can still allow variable latencies to creep into the time taken to make the scheduling decision, and ultimately into application timelines. This dissertation explores how hardware/software co-design can be applied beyond the scheduling decision itself to also reduce the non-predictable delays associated with interrupts and timers. By expanding the window of hardware/software co-design to these invocation mechanisms, we seek to understand whether the jitter introduced by classical hardware/software partitionings can be removed from the timelines of critical real-time user processes. This dissertation makes a case for resetting the classic boundaries of software thread-level scheduling, software timers, hardware timers, and interrupts. We show that reworking the boundaries of the scheduling invocation mechanisms helps to rectify the current imbalance between traditional hardware invocation mechanisms (timers and interrupts) and software scheduling policy (the operating system scheduler). We refactor these mechanisms into a unified hardware/software priority scheduling model to facilitate improvements in performance, timeliness, and determinism in all domains of computing. This dissertation demonstrates and prototypes a new framework that effects this basic policy change. The advantage of this approach lies in its ability to unify, simplify, and allow for more control within the operating system's scheduling policy.
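
    The policy change argued for here can be modeled, very roughly, in a few lines: timer expirations, interrupts, and ready threads enter one priority space, so dispatching reduces to a single highest-priority selection. The sketch below is only a software illustration of that idea, not the dissertation's hardware design; the names and priorities are hypothetical.

```python
"""Illustrative model only (assumptions, not the dissertation's hardware):
treat timer expirations, device interrupts, and ready threads as events in
one priority space, so the dispatch decision reduces to a single
highest-priority selection instead of separate ISR/timer/scheduler paths."""

import heapq

class UnifiedScheduler:
    def __init__(self):
        self._ready = []         # min-heap of (priority, seq, name)
        self._seq = 0            # FIFO tie-break within a priority level

    def post(self, name, priority):
        """Timer expiry, interrupt, or thread wake-up all enter the same queue."""
        heapq.heappush(self._ready, (priority, self._seq, name))
        self._seq += 1

    def dispatch(self):
        """Pick the single highest-priority pending activity (lowest number)."""
        return heapq.heappop(self._ready)[2] if self._ready else None

s = UnifiedScheduler()
s.post('background_thread', priority=20)
s.post('nic_interrupt_handler', priority=5)    # interrupt modeled as a thread
s.post('watchdog_timer_expiry', priority=1)
print(s.dispatch(), s.dispatch(), s.dispatch())
# watchdog_timer_expiry nic_interrupt_handler background_thread
```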

    Impact of DM-LRU on WCET: A Static Analysis Approach

    Cache memories in modern embedded processors are known to improve average memory access performance. Unfortunately, they are also known to represent a major source of unpredictability for hard real-time workload. One of the main limitations of typical caches is that content selection and replacement is entirely performed in hardware. As such, it is hard to control the cache behavior in software to favor caching of blocks that are known to have an impact on an application's worst-case execution time (WCET). In this paper, we consider a cache replacement policy, namely DM-LRU, that allows system designers to prioritize caching of memory blocks that are known to have an important impact on an application's WCET. Considering a single-core, single-level cache hierarchy, we describe an abstract interpretation-based timing analysis for DM-LRU. We implement the proposed analysis in a self-contained toolkit and study its qualitative properties on a set of representative benchmarks. Apart from being useful to compute the WCET when DM-LRU or similar policies are used, the proposed analysis can allow designers to perform WCET impact-aware selection of content to be retained in cache.
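
    As background, the classic abstract-interpretation "must" analysis for a plain A-way LRU set, which an analysis of this kind plausibly builds on, can be sketched in a few lines: the abstract state keeps an upper bound on each block's age, and a block whose bound stays below the associativity on every path is a guaranteed hit. The snippet below illustrates that baseline only, not the DM-LRU analysis itself; the associativity and block names are hypothetical.

```python
"""Background sketch, not the DM-LRU analysis itself: the classic
abstract-interpretation 'must' analysis for a plain A-way LRU set. A block
whose upper age bound stays below A on every path is 'always hit'.
Associativity and block names are hypothetical."""

A = 4  # associativity (ways) of the analyzed cache set

def must_access(state, block):
    """Abstract transformer: state maps block -> upper bound on its LRU age."""
    old = state.get(block, A)                 # age A means 'possibly not cached'
    new = {}
    for b, age in state.items():
        aged = age + 1 if age < old else age  # only blocks younger than `block` age
        if aged < A:                          # ages >= A drop out of the must-set
            new[b] = aged
    new[block] = 0                            # accessed block becomes most recent
    return new

def must_join(s1, s2):
    """At control-flow joins keep only blocks present in both, at the worst age."""
    return {b: max(s1[b], s2[b]) for b in s1.keys() & s2.keys()}

then_state = must_access(must_access({}, 'x'), 'y')    # path 1 accesses x, then y
else_state = must_access({}, 'x')                      # path 2 accesses x only
merged = must_join(then_state, else_state)
print(merged)   # {'x': 1} -> the next access to x is a guaranteed hit
```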

    Real-time operating system support for multicore applications

    Doctoral thesis (Tese de doutorado) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Engenharia de Automação e Sistemas, Florianópolis, 2014. Abstract: Modern multicore platforms feature multiple levels of cache memory placed between the processor and main memory to hide the latency of ordinary memory systems. The primary goal of this cache hierarchy is to improve average execution time, at the cost of predictability. The uncontrolled use of the cache hierarchy by real-time tasks may impact the estimation of their worst-case execution times (WCET), especially when real-time tasks access a shared cache level, causing contention for shared cache lines and increasing the application execution time. This contention in the shared cache may lead to deadline misses, which is intolerable, particularly for hard real-time (HRT) systems. Shared cache partitioning is a well-known technique used in multicore real-time systems to isolate task workloads and to improve system predictability. Presently, the state-of-the-art studies that evaluate shared cache partitioning on multicore processors lack two key points. First, the cache partitioning mechanism is typically implemented either in a simulated environment or in a general-purpose OS (GPOS), so the impact of kernel activities, such as interrupt handling and context switching, on the task partitions tends to be overlooked. Second, the evaluation is typically restricted to either a global or a partitioned scheduler, thereby failing to compare the performance of cache partitioning when tasks are scheduled by different schedulers. Furthermore, recent works have confirmed that OS implementation aspects, such as the choice of scheduling data structures and interrupt handling mechanisms, impact real-time schedulability as much as scheduling-theoretic aspects. However, these studies also used real-time patches applied to GPOSes, which affects the run-time overhead observed in these works and, consequently, the schedulability of real-time tasks.
Additionally, current multicore scheduling algorithms do not consider scenarios where real-time tasks access the same cache lines due to true or false sharing, which also impacts the WCET. This thesis addresses the aforementioned problems with cache partitioning techniques and multicore real-time scheduling algorithms as follows. First, real-time multicore support is designed and implemented on top of an embedded operating system built from scratch. This support consists of several multicore real-time scheduling algorithms, such as global and partitioned EDF, and a cache partitioning mechanism based on page coloring. Second, a comparison in terms of schedulability ratio is presented between the run-time overhead of the implemented RTOS and that of a GPOS patched with real-time extensions. In some cases, global EDF considering the overhead of the RTOS is superior to partitioned EDF considering the overhead of the patched GPOS, which clearly shows how different OSs impact hard real-time schedulers. Third, an evaluation of the cache partitioning impact on partitioned, clustered, and global real-time schedulers is performed. The results indicate that a lightweight RTOS does not impact real-time tasks, and that shared cache partitioning behaves differently depending on the scheduler and the tasks' working set sizes. Fourth, a task partitioning algorithm that assigns tasks to cores respecting their usage of cache partitions is proposed. The results show that by simply assigning tasks that share cache partitions to the same processor, it is possible to reduce the contention for shared cache lines and to provide HRT guarantees. Finally, a two-phase multicore scheduler that provides HRT and soft real-time (SRT) guarantees is proposed. It is shown that by using information from hardware performance counters at run time, the RTOS can detect when best-effort tasks interfere with real-time tasks in the shared cache, and can then prevent best-effort tasks from interfering with real-time tasks. The results also show that the assignment of exclusive partitions to HRT tasks, together with the two-phase multicore scheduler, provides HRT and SRT guarantees even when best-effort tasks share partitions with real-time tasks.
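
    The page-coloring idea used by such a mechanism can be sketched briefly: frames whose physical addresses map to the same slice of shared-cache sets share a color, so tasks given disjoint colors cannot evict each other's data. The snippet below is only an illustration with assumed cache geometry, not the thesis platform.

```python
"""Minimal page-coloring sketch (assumed cache geometry, not the thesis
platform): the color of a physical frame identifies the slice of shared-cache
sets it can map to, so giving disjoint colors to different tasks keeps their
data in disjoint cache regions."""

PAGE_SIZE  = 4096                # bytes
LINE_SIZE  = 64                  # bytes
CACHE_SIZE = 2 * 1024 * 1024     # 2 MiB shared last-level cache (assumed)
WAYS       = 16
SETS       = CACHE_SIZE // (LINE_SIZE * WAYS)      # 2048 sets
COLORS     = (SETS * LINE_SIZE) // PAGE_SIZE       # 32 colors

def color_of(frame_number):
    """Frames with the same color compete for the same cache sets."""
    return frame_number % COLORS

def partition_frames(frames, task_colors):
    """Give each task only the free frames whose color it owns."""
    return {task: [f for f in frames if color_of(f) in colors]
            for task, colors in task_colors.items()}

frames = range(0, 256)                             # hypothetical free frames
alloc = partition_frames(frames, {'hrt_task': {0, 1, 2, 3},
                                  'best_effort': set(range(4, 32))})
print(len(alloc['hrt_task']), len(alloc['best_effort']))   # 32 224
```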

    Placement, ordonnancement et mécanismes de migration de tâches temps-réel pour des architectures distribuées multicoeurs

    Critical real-time embedded systems integrate an increasing number of functionalities, as shown in the automotive and aeronautics domains. These systems require high dependability, including mechanisms to handle possible failures, and have to be effective, meeting hard real-time constraints. They are also constrained by their embedded nature: resources such as memory and computing capacity are limited. In this thesis, we focus on two main problems for this type of system. The first concerns how to provide better fault tolerance in distributed real-time systems when multiple permanent hardware failures can occur. Such systems are classically designed with a static task assignment. A more flexible approach exploiting reconfigurations is useful if it allows the assignment to be optimized, at each failure, for the remaining resources. We propose an off-line approach that produces an adapted sizing, taking into account the resources necessary to execute these reconfiguration actions. The reconfigurations may require reallocating tasks or replicas if local memory capacities are limited. In a hard real-time context, we define mechanisms and migration techniques that guarantee the global schedulability of the system. The second problem focuses on optimizing task execution at the local level in a preemptive multicore context. We propose an optimal scheduling method offering better scalability than existing approaches by minimizing overheads: the number of context switches (local preemptions and migrations) and the complexity of the scheduler.
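
    One simple reading of the reconfiguration step is sketched below: after a processor failure, its tasks are reassigned first-fit by decreasing utilization to the remaining processors, accepting a placement only while each core's utilization stays at or below 1 (the partitioned-EDF bound for implicit-deadline tasks). This is an illustrative heuristic, not the thesis's algorithm; the task set and failure scenario are hypothetical.

```python
"""Illustrative reconfiguration sketch (not the thesis's method): after a
processor failure, reassign its tasks to the remaining processors first-fit
by decreasing utilization, accepting an assignment only while each core's
total utilization stays <= 1 (the partitioned-EDF feasibility bound for
implicit-deadline tasks). Task set and failure are hypothetical."""

def reassign_after_failure(assignment, utilization, failed_cpu):
    """assignment: {cpu: [task, ...]}; utilization: {task: C/T}."""
    orphans = sorted(assignment.pop(failed_cpu), key=utilization.get, reverse=True)
    load = {cpu: sum(utilization[t] for t in tasks)
            for cpu, tasks in assignment.items()}
    for task in orphans:                      # heaviest first, first-fit
        for cpu in sorted(assignment):
            if load[cpu] + utilization[task] <= 1.0:
                assignment[cpu].append(task)
                load[cpu] += utilization[task]
                break
        else:
            raise RuntimeError(f'no feasible reallocation for {task}')
    return assignment

assignment = {0: ['t1', 't2'], 1: ['t3'], 2: ['t4', 't5']}
utilization = {'t1': 0.4, 't2': 0.3, 't3': 0.5, 't4': 0.2, 't5': 0.3}
print(reassign_after_failure(assignment, utilization, failed_cpu=0))
# {1: ['t3', 't1'], 2: ['t4', 't5', 't2']}
```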