5 research outputs found
Cache-Conscious Offline Real-Time Task Scheduling for Multi-Core Processors
Most schedulability analysis techniques for multi-core architectures assume a single Worst-Case Execution Time (WCET) per task, which is valid in all execution conditions. This assumption is too pessimistic for parallel applications running on multi-core architectures with local instruction or data caches, for which the WCET of a task depends on the cache contents at the beginning of its execution, itself depending on the task that was executed before the task under study.
In this paper, we propose two scheduling techniques for multi-core architectures equipped with local instruction and data caches. The two techniques schedule a parallel application modeled as a task graph, and generate a static partitioned non-preemptive schedule. We propose an optimal method, using an Integer Linear Programming (ILP) formulation, as well as a heuristic method based on list scheduling. Experimental results show that by taking into account the effect of private caches on tasks' WCETs, the length of generated schedules is significantly reduced as compared to schedules generated by cache-unaware scheduling methods. The observed schedule length reduction on streaming applications is 11% on average for the optimal method and 9% on average for the heuristic method.
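The heuristic side of the approach can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the greedy core-selection rule, task names, and the idea of passing WCETs as a context-sensitive function are assumptions made for the example.

```python
# Illustrative sketch of cache-conscious list scheduling; not the paper's
# implementation. Task names and WCET values below are made up.

def list_schedule(tasks, deps, wcet, n_cores=2):
    """Greedy static partitioned non-preemptive schedule for a task graph.

    tasks:   task names in topological order (the "list" of list scheduling).
    deps:    dict mapping each task to its set of predecessors.
    wcet:    callable (prev_task_on_core, task) -> context-sensitive WCET;
             prev_task_on_core is None for a cold cache. Cache reuse shows
             up as a smaller WCET after a favorable co-located predecessor.
    n_cores: number of identical cores with private caches.
    """
    finish = {}                    # task -> completion time
    core_free = [0.0] * n_cores    # instant at which each core becomes idle
    core_last = [None] * n_cores   # task executed last on each core
    placement = {}                 # task -> core index
    for t in tasks:
        ready = max((finish[p] for p in deps[t]), default=0.0)
        # Choose the core minimizing t's finish time, accounting for the
        # WCET induced by the task previously executed on that core.
        best = min(range(n_cores),
                   key=lambda c: max(ready, core_free[c]) + wcet(core_last[c], t))
        start = max(ready, core_free[best])
        finish[t] = start + wcet(core_last[best], t)
        core_free[best] = finish[t]
        core_last[best] = t
        placement[t] = best
    return placement, finish
```

On a diamond graph A→{B,C}→D where running B right after A (and D right after B) on the same core lowers the WCET, this greedy pass keeps A, B, and D co-located and shortens the makespan relative to a cache-agnostic assignment, mirroring the schedule-length reductions reported above.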
Analysis and simulation of scheduling techniques for real-time embedded multi-core architectures
In this modern era of technological progress, multi-core processors have brought significant improvements in the processing potential available to the world of real-time embedded systems. These improvements are accompanied by a rapid increase in software complexity and in the processing demand placed on the underlying hardware. As a consequence, the need for efficient yet predictable multi-core scheduling techniques is on the rise.
As part of this thesis, in-depth research on currently available multi-core scheduling techniques, belonging to both partitioned and global approaches, is carried out in the context of real-time embedded systems. The emphasis is on the degree of their usability in hard real-time systems, focusing on scheduling techniques offering better processor affinity and fewer context switches. In addition, an extensive survey of currently available real-time test-beds and real-time operating systems is performed.
Finally, a subset of the analyzed multi-core scheduling techniques, comprising PSN-EDF, GSN-EDF, PD, and PD², is simulated on the real-time test-bed LITMUS^RT.
Cache-conscious Off-Line Real-Time Scheduling for Multi-Core Platforms: Algorithms and Implementation
Most schedulability analysis techniques for multi-core architectures assume a single Worst-Case Execution Time (WCET) per task, which is valid in all execution conditions. This assumption is too pessimistic for parallel applications running on multi-core architectures with local instruction or data caches, for which the WCET of a task depends on the cache contents at the beginning of its execution, itself depending on the tasks that were executed immediately before the task under study. In this paper, we propose two scheduling techniques for multi-core architectures equipped with local instruction and data caches. The two techniques schedule a parallel application modeled as a task graph, and generate a static partitioned non-preemptive schedule that benefits from cache reuse between pairs of consecutive tasks. We propose an exact method, using an Integer Linear Programming (ILP) formulation, as well as a heuristic method based on list scheduling. The efficiency of the techniques is demonstrated through an implementation of these cache-conscious schedules on real multi-core hardware: a 16-core cluster of the Kalray MPPA-256, Andey generation. We point out implementation issues that arise when implementing the schedules on this particular platform. In addition, we propose strategies to adapt the schedules to the identified implementation factors. An experimental evaluation reveals that our proposed scheduling methods significantly reduce the length of schedules as compared to cache-agnostic scheduling methods. Furthermore, our experiments show that among the identified implementation factors, shared bus contention has the most impact.
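The exact method is only named at this level of summary. As a rough illustration, a cache-conscious makespan-minimizing ILP can be sketched as follows; the notation and constraint set are simplified assumptions of ours, not the authors' exact formulation.

```latex
% Illustrative, simplified cache-conscious ILP (not the authors' exact model).
% x_{i,c} = 1 iff task i runs on core c; y_{i,j} = 1 iff task j runs
% immediately after task i on the same core; s_i = start time of task i;
% w_i = effective WCET of task i; M = schedule makespan.
\begin{align}
\min\quad & M \\
\text{s.t.}\quad & \sum_{c} x_{i,c} = 1 && \forall i \\
& w_j = \sum_{i} y_{i,j}\, w_{j \mid i}
        + \Big(1 - \sum_{i} y_{i,j}\Big)\, w_{j \mid \varnothing} && \forall j \\
& s_i + w_i \le s_j && \forall (i,j) \in E \\
& s_i + w_i \le M && \forall i
\end{align}
```

Here \(w_{j \mid i}\) is the (constant) WCET of task \(j\) when task \(i\) ran immediately before it on the same core, and \(w_{j \mid \varnothing}\) its cold-cache WCET, so the model stays linear. A complete model additionally needs non-overlap (big-M) constraints between tasks mapped to the same core and consistency constraints linking \(y\) to \(x\).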
Ordonnancement temps-réel conscient des caches dans des architectures multi-cƓurs : algorithmes et réalisation
Nowadays, real-time applications are more compute-intensive as more functionalities are introduced. Multi-core platforms have been released to satisfy the computing demand while reducing the size, weight, and power requirements. The most significant challenge when deploying real-time systems on multi-core platforms is to guarantee the real-time constraints of hard real-time applications on such platforms. This is caused by interdependent problems, referred to as a chicken-and-egg situation, which is explained as follows. Due to the effect of multi-core hardware, such as local caches and shared hardware resources, the timing behavior of tasks is strongly influenced by their execution context (i.e., co-located tasks, concurrent tasks), which is determined by scheduling strategies. Symmetrically, scheduling algorithms require the Worst-Case Execution Time (WCET) of tasks as prior knowledge to determine their allocation and their execution order. Most schedulability analysis techniques for multi-core architectures assume a single WCET per task, which is valid in all execution conditions. This assumption is too pessimistic for parallel applications running on multi-core architectures with local caches. In such architectures, the WCET of a task depends on the cache contents at the beginning of its execution, itself depending on the task that was executed before the task under study. In this thesis, we address the issue by proposing scheduling algorithms that take into account context-sensitive WCETs of tasks due to the effect of private caches. We propose two scheduling techniques for multi-core architectures equipped with local caches. The two techniques schedule a parallel application modeled as a task graph, and generate a static partitioned non-preemptive schedule. We propose an optimal method, using an Integer Linear Programming (ILP) formulation, as well as a heuristic method based on list scheduling.
Experimental results show that by taking into account the effect of private caches on tasks' WCETs, the length of generated schedules is significantly reduced as compared to schedules generated by cache-unaware scheduling methods. Furthermore, we implement time-driven cache-conscious schedules on the Kalray MPPA-256 machine, a clustered many-core platform. We first identify the practical challenges arising when implementing time-driven cache-conscious schedules on the machine, including cache pollution caused by the scheduler, shared bus contention, delays to the start times of tasks, and the absence of data cache coherence. We then propose strategies including an ILP formulation for adapting cache-conscious schedules to the identified practical factors, and a method for generating the code of applications to be executed on the machine. Experimental validation shows the functional and the temporal correctness of our implementation. Additionally, shared bus contention is observed to be the factor with the most impact on the length of adapted cache-conscious schedules.
Ordonnanceur hors-ligne temps-réel et conscient du cache ciblant les architectures multi-coeurs : algorithmes et implémentations
Nowadays, real-time applications are more compute-intensive as more functionalities are introduced. Multi-core platforms have been released to satisfy the computing demand while reducing the size, weight, and power requirements.
The most significant challenge when deploying real-time systems on multi-core platforms is to guarantee the real-time constraints of hard real-time applications on such platforms. This is caused by interdependent problems, referred to as a chicken-and-egg situation, which is explained as follows. Due to the effect of multi-core hardware, such as local caches and shared hardware resources, the timing behavior of tasks is strongly influenced by their execution context (i.e., co-located tasks, concurrent tasks), which is determined by scheduling strategies. Symmetrically, scheduling algorithms require the Worst-Case Execution Time (WCET) of tasks as prior knowledge to determine their allocation and their execution order. Most schedulability analysis techniques for multi-core architectures assume a single WCET per task, which is valid in all execution conditions. This assumption is too pessimistic for parallel applications running on multi-core architectures with local caches. In such architectures, the WCET of a task depends on the cache contents at the beginning of its execution, itself depending on the task that was executed before the task under study. In this thesis, we address the issue by proposing scheduling algorithms that take into account context-sensitive WCETs of tasks due to the effect of private caches. We propose two scheduling techniques for multi-core architectures equipped with local caches. The two techniques schedule a parallel application modeled as a task graph, and generate a static partitioned non-preemptive schedule. We propose an optimal method, using an Integer Linear Programming (ILP) formulation, as well as a heuristic method based on list scheduling. Experimental results show that by taking into account the effect of private caches on tasks' WCETs, the length of generated schedules is significantly reduced as compared to schedules generated by cache-unaware scheduling methods.
Furthermore, we implement time-driven cache-conscious schedules on the Kalray MPPA-256 machine, a clustered many-core platform. We first identify the practical challenges arising when implementing time-driven cache-conscious schedules on the machine, including cache pollution caused by the scheduler, shared bus contention, delays to the start times of tasks, and data cache inconsistency. We then propose strategies including an ILP formulation for adapting cache-conscious schedules to the identified practical factors, and a method for generating the code of applications to be executed on the machine. Experimental validation shows the functional and the temporal correctness of our implementation. Additionally, shared bus contention is observed to be the factor with the most impact on the length of adapted cache-conscious schedules.