5 research outputs found

    Cache-Conscious Offline Real-Time Task Scheduling for Multi-Core Processors

    Get PDF
    Most schedulability analysis techniques for multi-core architectures assume a single Worst-Case Execution Time (WCET) per task, valid in all execution conditions. This assumption is too pessimistic for parallel applications running on multi-core architectures with local instruction or data caches, for which the WCET of a task depends on the cache contents at the beginning of its execution, itself depending on the task that was executed before the task under study. In this paper, we propose two scheduling techniques for multi-core architectures equipped with local instruction and data caches. The two techniques schedule a parallel application modeled as a task graph, and generate a static partitioned non-preemptive schedule. We propose an optimal method, using an Integer Linear Programming (ILP) formulation, as well as a heuristic method based on list scheduling. Experimental results show that by taking into account the effect of private caches on tasks' WCETs, the length of generated schedules is significantly reduced as compared to schedules generated by cache-unaware scheduling methods. The observed schedule length reduction on streaming applications is 11% on average for the optimal method and 9% on average for the heuristic method.
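The list-scheduling heuristic described in the abstract can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the representation of context-sensitive WCETs as a table keyed by (task, predecessor-on-the-same-core) is an assumption made here for concreteness.

```python
# Hypothetical sketch of cache-conscious list scheduling: each ready task is
# placed on the core where it finishes earliest, using a WCET that depends on
# which task ran immediately before it on that core (None = cold cache).

def list_schedule(tasks, deps, wcet, n_cores):
    """Greedy static partitioned non-preemptive schedule.
    tasks: iterable of task ids; deps: {task: set of predecessors};
    wcet: {(task, prev_or_None): time}; n_cores: number of cores.
    Returns a list of (task, core, start, end) tuples."""
    core_time = [0.0] * n_cores   # next free instant on each core
    core_last = [None] * n_cores  # task last executed on each core
    finish = {}                   # task -> finish time
    schedule = []
    remaining = set(tasks)
    while remaining:
        # tasks whose predecessors have all finished
        ready = [t for t in remaining if deps.get(t, set()) <= finish.keys()]
        best = None
        for t in ready:
            # earliest start: all predecessors done
            est = max((finish[p] for p in deps.get(t, set())), default=0.0)
            for c in range(n_cores):
                start = max(est, core_time[c])
                # context-sensitive WCET: cheaper if the cache is warm,
                # falling back to the cold-cache WCET otherwise
                cost = wcet.get((t, core_last[c]), wcet[(t, None)])
                end = start + cost
                if best is None or end < best[0]:
                    best = (end, start, t, c)
        end, start, t, c = best
        schedule.append((t, c, start, end))
        finish[t] = end
        core_time[c] = end
        core_last[c] = t
        remaining.remove(t)
    return schedule
```

For instance, if task B reuses cache lines loaded by its predecessor A, scheduling B right after A on the same core uses B's reduced WCET, shortening the overall schedule relative to a cache-unaware scheduler that always charges the cold-cache WCET.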

    Analysis and simulation of scheduling techniques for real-time embedded multi-core architectures

    Get PDF
    In this modern era of technological progress, multi-core processors have brought significant and consequential improvements in the processing potential available to the world of real-time embedded systems. These improvements bring a rapid increase in software complexity as well as in the processing demand placed on the underlying hardware. As a consequence, the need for efficient yet predictable multi-core scheduling techniques is on the rise. As part of this thesis, in-depth research of currently available multi-core scheduling techniques, belonging to both partitioned and global approaches, is done in the context of real-time embedded systems. The emphasis is on the degree of their usability in hard real-time systems, focusing on the scheduling techniques offering better processor affinity and a lower number of context switches. Also, an extensive survey of currently available real-time test-beds as well as real-time operating systems is performed. Finally, a subset of the analyzed multi-core scheduling techniques comprising PSN-EDF, GSN-EDF, PD^2 and PD^{2*} is simulated on the real-time test-bed LITMUS^RT.
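The EDF family of schedulers evaluated above (PSN-EDF is partitioned EDF with synchronization support) always runs the ready job with the earliest absolute deadline. A minimal single-core, non-preemptive illustration of that dispatch rule, assuming a simple (release, cost, deadline) job model rather than the thesis's simulator, might look like:

```python
# Illustrative sketch only: non-preemptive earliest-deadline-first dispatch on
# one core, the per-partition policy underlying partitioned EDF schemes.
import heapq

def edf_simulate(jobs):
    """jobs: list of (release, cost, deadline) tuples.
    Returns (trace, met): trace is a list of (job_id, start, finish) in
    execution order; met is True iff every job finished by its deadline."""
    order = sorted(range(len(jobs)), key=lambda i: jobs[i][0])  # by release
    ready = []          # min-heap of (deadline, job_id)
    t, idx = 0.0, 0
    trace, met = [], True
    while idx < len(order) or ready:
        # admit every job released by the current time
        while idx < len(order) and jobs[order[idx]][0] <= t:
            j = order[idx]
            heapq.heappush(ready, (jobs[j][2], j))
            idx += 1
        if not ready:   # core idle: jump to the next release
            t = jobs[order[idx]][0]
            continue
        deadline, j = heapq.heappop(ready)  # earliest absolute deadline wins
        start = t
        t += jobs[j][1]                     # run to completion (non-preemptive)
        trace.append((j, start, t))
        met = met and t <= deadline
    return trace, met
```

Global variants such as GSN-EDF apply the same deadline ordering across a shared ready queue serving all cores, trading processor affinity for fewer idle instants.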

    Cache-conscious Off-Line Real-Time Scheduling for Multi-Core Platforms: Algorithms and Implementation

    Get PDF
    Most schedulability analysis techniques for multi-core architectures assume a single Worst-Case Execution Time (WCET) per task, valid in all execution conditions. This assumption is too pessimistic for parallel applications running on multi-core architectures with local instruction or data caches, for which the WCET of a task depends on the cache contents at the beginning of its execution, itself depending on the tasks that were executed immediately before the task under study. In this paper, we propose two scheduling techniques for multi-core architectures equipped with local instruction and data caches. The two techniques schedule a parallel application modeled as a task graph, and generate a static partitioned non-preemptive schedule that takes advantage of cache reuse between pairs of consecutive tasks. We propose an exact method, using an Integer Linear Programming (ILP) formulation, as well as a heuristic method based on list scheduling. The efficiency of the techniques is demonstrated through an implementation of these cache-conscious schedules on real multi-core hardware: a 16-core cluster of the Kalray MPPA-256, Andey generation. We point out implementation issues that arise when implementing the schedules on this particular platform. In addition, we propose strategies to adapt the schedules to the identified implementation factors. An experimental evaluation reveals that our proposed scheduling methods significantly reduce the length of schedules as compared to cache-agnostic scheduling methods. Furthermore, our experiments show that among the identified implementation factors, shared bus contention has the most impact.

    Cache-conscious real-time scheduling in multi-core architectures: algorithms and implementation

    Get PDF
    Nowadays, real-time applications are more compute-intensive as more functionalities are introduced. Multi-core platforms have been released to satisfy the computing demand while reducing the size, weight, and power requirements. The most significant challenge when deploying real-time systems on multi-core platforms is to guarantee the real-time constraints of hard real-time applications on such platforms. This is caused by interdependent problems, referred to as a chicken-and-egg situation, which is explained as follows. Due to the effect of multi-core hardware, such as local caches and shared hardware resources, the timing behavior of tasks is strongly influenced by their execution context (i.e., co-located tasks, concurrent tasks), which is determined by scheduling strategies. Symmetrically, scheduling algorithms require the Worst-Case Execution Time (WCET) of tasks as prior knowledge to determine their allocation and their execution order. Most schedulability analysis techniques for multi-core architectures assume a single WCET per task, valid in all execution conditions. This assumption is too pessimistic for parallel applications running on multi-core architectures with local caches. In such architectures, the WCET of a task depends on the cache contents at the beginning of its execution, itself depending on the task that was executed before the task under study. In this thesis, we address the issue by proposing scheduling algorithms that take into account context-sensitive WCETs of tasks due to the effect of private caches. We propose two scheduling techniques for multi-core architectures equipped with local caches. The two techniques schedule a parallel application modeled as a task graph, and generate a static partitioned non-preemptive schedule. We propose an optimal method, using an Integer Linear Programming (ILP) formulation, as well as a heuristic method based on list scheduling. Experimental results show that by taking into account the effect of private caches on tasks' WCETs, the length of generated schedules is significantly reduced as compared to schedules generated by cache-unaware scheduling methods. Furthermore, we implement time-driven cache-conscious schedules on the Kalray MPPA-256 machine, a clustered many-core platform. We first identify the practical challenges arising when implementing time-driven cache-conscious schedules on the machine, including cache pollution caused by the scheduler, shared bus contention, delay to the start time of tasks, and the absence of data cache coherence. We then propose strategies including an ILP formulation for adapting cache-conscious schedules to the identified practical factors, and a method for generating the code of applications to be executed on the machine. Experimental validation shows the functional and temporal correctness of our implementation. Additionally, shared bus contention is observed to be the factor with the most impact on the length of adapted cache-conscious schedules.

    Cache-conscious off-line real-time scheduler targeting multi-core architectures: algorithms and implementations

    No full text
    Nowadays, real-time applications are more compute-intensive as more functionalities are introduced. Multi-core platforms have been released to satisfy the computing demand while reducing the size, weight, and power requirements. The most significant challenge when deploying real-time systems on multi-core platforms is to guarantee the real-time constraints of hard real-time applications on such platforms. This is caused by interdependent problems, referred to as a chicken-and-egg situation, which is explained as follows. Due to the effect of multi-core hardware, such as local caches and shared hardware resources, the timing behavior of tasks is strongly influenced by their execution context (i.e., co-located tasks, concurrent tasks), which is determined by scheduling strategies. Symmetrically, scheduling algorithms require the Worst-Case Execution Time (WCET) of tasks as prior knowledge to determine their allocation and their execution order. Most schedulability analysis techniques for multi-core architectures assume a single WCET per task, valid in all execution conditions. This assumption is too pessimistic for parallel applications running on multi-core architectures with local caches. In such architectures, the WCET of a task depends on the cache contents at the beginning of its execution, itself depending on the task that was executed before the task under study. In this thesis, we address the issue by proposing scheduling algorithms that take into account context-sensitive WCETs of tasks due to the effect of private caches. We propose two scheduling techniques for multi-core architectures equipped with local caches. The two techniques schedule a parallel application modeled as a task graph, and generate a static partitioned non-preemptive schedule. We propose an optimal method, using an Integer Linear Programming (ILP) formulation, as well as a heuristic method based on list scheduling. Experimental results show that by taking into account the effect of private caches on tasks' WCETs, the length of generated schedules is significantly reduced as compared to schedules generated by cache-unaware scheduling methods. Furthermore, we implement time-driven cache-conscious schedules on the Kalray MPPA-256 machine, a clustered many-core platform. We first identify the practical challenges arising when implementing time-driven cache-conscious schedules on the machine, including cache pollution caused by the scheduler, shared bus contention, delay to the start time of tasks, and data cache inconsistency. We then propose strategies including an ILP formulation for adapting cache-conscious schedules to the identified practical factors, and a method for generating the code of applications to be executed on the machine. Experimental validation shows the functional and temporal correctness of our implementation. Additionally, shared bus contention is observed to be the factor with the most impact on the length of adapted cache-conscious schedules.